Generate TOC (table of contents) #74

Open
scandinave opened this issue Jun 17, 2020 · 14 comments

@scandinave

scandinave commented Jun 17, 2020

Is it possible to automatically generate a table of contents from all headings, for example by placing some HTML markup on the page?
Something like this:

<div class="summary"></div>

@scandinave scandinave changed the title Generate Index summary Generate table summary Jun 17, 2020
@simonhaenisch
Owner

It's not currently possible (see #9 (comment)). I'd love to do this but I personally never had the need for a table of contents, and my time is limited so I can't give you any ETA (could be months).

It's possible to write plugins for marked (see https://marked.js.org/#/USING_PRO.md), and the idea would be to write a plugin that gets all the headings, then injects the TOC (I'd prefer using a comment like <!-- TOC -->). See markedjs/marked#545 for inspiration.

Another solution would be to do this with a custom JS script that runs in the browser, queries all the headings, then injects them into div.toc. That's what I actually did in #9 and it worked pretty well, but doing it in marked is more flexible imo.
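As a rough illustration of the comment-marker idea, here is a minimal sketch that swaps a `<!-- TOC -->` marker for a TOC string that has already been generated elsewhere. The function name `injectToc` is hypothetical and is not part of marked or md-to-pdf; it is only meant to show the injection step.

```javascript
// Hypothetical helper: replaces the first marker with a prebuilt TOC string.
// slice() is used instead of String.prototype.replace() so that "$" sequences
// inside the TOC text are never interpreted as replacement patterns.
function injectToc(markdown, tocMarkdown, marker = "<!-- TOC -->") {
  const index = markdown.indexOf(marker);
  // No marker found: return the document unchanged.
  if (index === -1) return markdown;
  return markdown.slice(0, index) + tocMarkdown + markdown.slice(index + marker.length);
}

const doc = "# Title\n\n<!-- TOC -->\n\n## First\n\n## Second\n";
const toc = "- [First](#first)\n- [Second](#second)";
console.log(injectToc(doc, toc));
```

Only the first marker is replaced, so the marker text can still be mentioned later in the document without being expanded again.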

Anyway, do you feel like you're proficient enough with the matter to make a PR for this feature? I'd be able to guide you along the way.

@simonhaenisch simonhaenisch changed the title Generate table summary Generate TOC (table of contents) Jun 17, 2020
@scandinave
Author

scandinave commented Jun 17, 2020

Thanks for your feedback. I will try to find a way. If I find time to do it, I will make a PR for sure.

@pl7y

pl7y commented Apr 19, 2021

> It's possible to write plugins for marked (see https://marked.js.org/#/USING_PRO.md), and the idea would be to write a plugin that gets all the headings, then injects the TOC (I'd prefer using a comment like <!-- TOC -->). See markedjs/marked#545 for inspiration.

Hi, I'm interested in writing such an extension, but I'm still missing something: to build the TOC you first need to parse the whole file, and only then can you generate the TOC HTML. So you would need to inject the TOC HTML either during a marked run (e.g. by modifying a token) or right after it, by patching the HTML.

In md-to-pdf, how can you inject the TOC as you said?

Thank you very much in advance!

@simonhaenisch
Owner

If you just want to hack something together, you can have a look at the WIP pull request linked above your comment. You can register a custom heading renderer in Marked to generate a TOC object.

The easier solution IMO is to inject a script into the page that runs once the page is loaded and queries all headings (something like document.querySelectorAll('h1, h2, h3, h4, h5, h6'), or however deep you want it). Then place a div.toc somewhere in your doc and fill it with the TOC content generated from the headings. I did this once a very long time ago, but never got around to finishing it.

@pl7y

pl7y commented Apr 23, 2021

Thank you very much for the reply; what I don't get is where one can make the injection: doing something like inlining an onload event in the HTML? Feeding some parameters to md-to-pdf or to marked?

@simonhaenisch
Owner

simonhaenisch commented Apr 23, 2021

You can do that either via the script option or just inline it into your markdown document. Simplified example:

# My Doc

<div class="toc"></div>

## First point

Lorem ipsum...

## Second point

Dolor sit amet...

<script>
  // Note: the DOM property is innerHTML (capital "HTML"); innerHtml would be silently ignored.
  document.querySelector('div.toc').innerHTML = `
    <h2>Table of Contents</h2>
    <ol>
      ${Array.from(document.querySelectorAll('h2')).map(h => `<li>${h.textContent}</li>`).join('')}
    </ol>
  `;
</script>
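The simplified example above lists only h2 headings as plain text. A slightly fuller sketch, under the assumption that you also want h3/h4 entries and links to each heading, could separate the HTML generation (a pure function) from the DOM lookup. The names `buildTocHtml` and `escapeHtml` and the `toc-heading-N` fallback ids are made up for illustration.

```javascript
// Escapes text so heading content cannot break the generated markup.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Hypothetical helper: turns { level, text, id } records into TOC markup.
// Nesting is shown with a left margin instead of nested lists to keep it short.
function buildTocHtml(headings) {
  const minLevel = Math.min(...headings.map((h) => h.level));
  const items = headings.map((h) => {
    const indent = (h.level - minLevel) * 1.5; // em per nesting level
    return `<li style="margin-left: ${indent}em"><a href="#${h.id}">${escapeHtml(h.text)}</a></li>`;
  });
  return `<h2>Table of Contents</h2><ul>${items.join("")}</ul>`;
}

// Browser-only part: collect the headings first, then fill div.toc.
if (typeof document !== "undefined") {
  const nodes = Array.from(document.querySelectorAll("h2, h3, h4"));
  const headings = nodes.map((node, i) => {
    // Assumed fallback id scheme for headings that have no id yet.
    if (!node.id) node.id = `toc-heading-${i}`;
    return { level: Number(node.tagName[1]), text: node.textContent, id: node.id };
  });
  const target = document.querySelector("div.toc");
  if (target) target.innerHTML = buildTocHtml(headings);
}
```

The headings are queried before the TOC is written, so the "Table of Contents" heading itself does not end up in the list.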

Spoiler: getting the page numbers is not trivial. I'm not sure it's actually possible, except maybe with some very complicated maths, because it's difficult to predict how the content breaks across pages. Maybe this could help to emulate the pages with CSS: https://www.w3.org/TR/WD-CSS2-971104/page.html.

@pl7y

pl7y commented Apr 25, 2021

Thank you very much, will need to give this a try!

@robertvanhoesel

If you want to generate a ToC that respects heading levels, the following will create a markdown formatted TOC for you using the Marked parser and default slugger. It returns a list indented with each heading level, linking to the anchor of each heading in the document.

import marked from "marked";

const parseToc = (md: string) => {
    const toc: { level: number; text: string; slug: string }[] = [];

    const renderer = new marked.Renderer();

    renderer.heading = (text, level, raw, slugger) => {
        const slug = slugger.slug(raw);
        toc.push({ level, text, slug });
        return text;
    };

    marked(md, { renderer });

    return toc
        .map((t) => `${Array(t.level).join("  ")}- [${t.text}](#${t.slug})`)
        .join("\n\n");
};

So something like

# Heading 1
Foo bar

## Subheading 2
Baz bar

### Subheading 3
Baz bar

# An other heading 1

Will be transformed to...

- [Heading 1](#heading-1)
  - [Subheading 2](#subheading-2)
    - [Subheading 3](#subheading-3)
- [An other heading 1](#an-other-heading-1)
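The arguments passed to renderer.heading depend on the installed marked version, so the snippet above may need adjusting. As a dependency-free alternative (a swapped-in technique, not what the code above does), the heading levels and texts can be collected with a small line scanner. This is only a sketch: it handles `#`-style headings, skips fenced code blocks, and ignores Setext headings and other edge cases. The name `extractHeadings` is hypothetical.

```javascript
// Hypothetical helper: collects { level, text } for "#"-style headings.
function extractHeadings(markdown) {
  const headings = [];
  let inFence = false;
  for (const line of markdown.split("\n")) {
    // Toggle on fence lines (three or more backticks or tildes) and skip them.
    if (/^\s*(`{3,}|~{3,})/.test(line)) {
      inFence = !inFence;
      continue;
    }
    if (inFence) continue;
    // One to six "#" characters, whitespace, then the heading text.
    const match = /^(#{1,6})\s+(.*\S)\s*$/.exec(line);
    if (match) headings.push({ level: match[1].length, text: match[2] });
  }
  return headings;
}

const sample = "# Heading 1\nFoo bar\n\n## Subheading 2\nBaz bar\n\n### Subheading 3\nBaz bar\n\n# An other heading 1\n";
console.log(extractHeadings(sample));
```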

To use it with md-to-pdf, read the file, parse its ToC, and prepend it to the markdown you pass as the content option.

    const md = fs.readFileSync(file, "UTF-8");

    let toc = parseToc(md);

    const pdf = await mdToPdf(
        {
            content: toc + '\n\n' + md
        },
        // ... etc
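The wiring in the truncated snippet above can be sketched as a small pure helper that prepares the two arguments for mdToPdf. This assumes md-to-pdf's `content` input and `dest` config option; the helper name `buildPdfJob` and the file name `out.pdf` are made up for illustration, and the actual mdToPdf call is left as a comment so the block runs without the package installed.

```javascript
// Hypothetical helper: builds the [input, config] pair for mdToPdf(input, config).
function buildPdfJob(markdown, tocMarkdown, dest) {
  // The TOC goes first, separated from the document by a blank line.
  const content = `${tocMarkdown}\n\n${markdown}`;
  return [{ content }, { dest }];
}

const [input, config] = buildPdfJob("# Title\n\nBody", "- [Title](#title)", "out.pdf");
console.log(input.content);
console.log(config);

// With md-to-pdf installed, the call would look roughly like this
// (inside an async function):
//   const { mdToPdf } = require("md-to-pdf");
//   const pdf = await mdToPdf(input, config);
```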


@designel

designel commented Apr 7, 2022

Hello,
I am a beginner, and I would like to generate a TOC in my exported PDF files. This solution seems like a good fit for my case.
Could you please give more details about it and how it can be used?

Many thanks in advance.

Best regards

@robertvanhoesel

@designel

The parseToc(md) function takes a markdown string, walks over all headings (# title) and then returns a string of markdown that can be used directly as a table of contents. You run it separately or before rendering the PDF.

You can prepend the new piece of markdown to the actual MD you pass to Marked to render the final PDF.

How it works:

The 'renderer' is a config object for marked that takes plugins. We create a plugin that overrides the default heading rendering and instead takes all the info about each heading and puts it in an array. Level is the heading level (i.e. # is level 1, ## is level 2).

const toc: { level: number; text: string; slug: string }[] = [];
renderer.heading = (text, level, raw, slugger) => {
    const slug = slugger.slug(raw); // This creates a unique ID which we can use to link to the header
    toc.push({ level, text, slug });
    return text;
};

Which basically creates something like

const toc = [
    { level: 1, text: "Heading 1", slug: "heading-1" },
    { level: 2, text: "Subheading 2", slug: "subheading-2" },
    { level: 3, text: "Subheading 3", slug: "subheading-3" },
    { level: 1, text: "An other heading 1", slug: "an-other-heading-1" },
];
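To make the slug step more concrete, here is a simplified stand-in for a slugger. It is not marked's implementation, only a sketch of the general idea: lowercase the text, drop punctuation, turn whitespace into hyphens, and add a numeric suffix when the same slug shows up again. The name `createSlugger` is hypothetical.

```javascript
// Simplified stand-in for a slugger; NOT marked's actual implementation.
function createSlugger() {
  const seen = new Map(); // base slug -> number of times it has been issued
  return function slug(text) {
    const base = text
      .toLowerCase()
      .trim()
      .replace(/[^\p{L}\p{N}\s-]/gu, "") // drop everything except letters, digits, whitespace, hyphens
      .replace(/\s+/g, "-");
    const count = seen.get(base) || 0;
    seen.set(base, count + 1);
    // The first occurrence keeps the plain slug; repeats get -1, -2, ...
    return count === 0 ? base : `${base}-${count}`;
  };
}

const slug = createSlugger();
console.log(slug("An other heading 1"));
console.log(slug("Heading 1"));
console.log(slug("Heading 1"));
```

Keeping the counter inside one slugger instance is what makes repeated headings in the same document link to distinct anchors.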

This piece turns the above data into markdown:

return toc
        .map((t) => `${Array(t.level).join("  ")}- [${t.text}](#${t.slug})`)
        .join("\n\n");

creates:

- [Heading 1](#heading-1)
  - [Subheading 2](#subheading-2)
    - [Subheading 3](#subheading-3)
- [An other heading 1](#an-other-heading-1)
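The indentation comes from `Array(t.level).join("  ")`: an array with `level` empty slots joined by two spaces yields `level - 1` copies of the separator. A small sketch of the same line-building step, written with `String.prototype.repeat` for readability (the function name `tocLine` is made up here):

```javascript
// Hypothetical helper: formats one TOC entry as an indented markdown list item.
function tocLine(entry) {
  // Two spaces per level below the top level, same result as Array(level).join("  ").
  const indent = "  ".repeat(entry.level - 1);
  return `${indent}- [${entry.text}](#${entry.slug})`;
}

const entries = [
  { level: 1, text: "Heading 1", slug: "heading-1" },
  { level: 2, text: "Subheading 2", slug: "subheading-2" },
  { level: 3, text: "Subheading 3", slug: "subheading-3" },
];
console.log(entries.map(tocLine).join("\n"));
```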

@designel

designel commented Apr 8, 2022

Hello @robertvanhoesel,

Thank you so much for your message.
I just tested the parseToc(md) function, but I still get the following error:
[image: screenshot of the error]

Thanks in advance,
Best regards,

@robertvanhoesel

@designel I'm not so sure I can help you at this point. It seems the program you are running is not even able to open or read a file. Did you at all succeed in creating a pdf from markdown using the library in this repository?

@designel

designel commented Apr 8, 2022

@robertvanhoesel Yes, I generated a PDF file correctly using the library. I now need to complete it by generating a TOC and a cover for my document.

@robertvanhoesel

@designel The solution described above only works programmatically, where you create a Node script instead of using the CLI. Have you set that up? Can you share the code you are using?
