Guidance for converting HTML headings to heading_* Token #813
Comments
Here's what I've got so far. I'm sure there are things that need improvement, and I would love feedback:

import cheerio from 'cheerio';
import MarkdownIt from 'markdown-it';
import Token from 'markdown-it/lib/token';

export default function htmlHeaders(md: MarkdownIt) {
  md.core.ruler.after('inline', 'html-headers', (state) => {
    // Iterate over a copy: tokens are spliced into state.tokens below, and
    // mutating the live array during forEach can skip later html_block tokens.
    state.tokens.slice().forEach((blockToken) => {
      if (blockToken.type !== 'html_block') {
        return;
      }
      const $ = cheerio.load(`${blockToken.content}`, { xmlMode: true });
      const headings = $('h1,h2,h3,h4,h5,h6');
      if (!headings.length) {
        return;
      }
      const { map } = blockToken;
      headings.each((_, e) => {
        const { tagName } = e;
        // 'h2' -> level 2 -> markup '##'
        const level = parseInt(tagName.substring(1), 10);
        const markup = ''.padStart(level, '#');
        const element = $(e);

        const open = new Token('heading_open', tagName, 1);
        open.markup = markup;
        open.map = map;
        // Copy every HTML attribute onto the heading_open token.
        Object.entries(e.attribs).forEach(([key, value]) => {
          open.attrSet(key, value);
        });

        const content = new Token('text', '', 0);
        content.map = map;
        content.content = element.text() || '';

        // The inline rule has already run, so children are attached by hand.
        const body = new Token('inline', '', 0);
        body.content = content.content;
        body.map = map;
        body.children = [content];

        const close = new Token('heading_close', tagName, -1);
        close.markup = markup;

        // Insert the heading tokens just before the html_block they came from.
        const position = state.tokens.indexOf(blockToken);
        state.tokens.splice(position, 0, open, body, close);
        element.remove();
      });
      // Whatever HTML is left stays in the original html_block.
      // eslint-disable-next-line no-param-reassign
      blockToken.content = $.html();
    });
    return false;
  });
}
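A rule like the one above splices into the token array from inside a loop over that same array, so the iteration strategy matters. The following is a minimal, dependency-free sketch of that one detail. `FakeToken` and both function names are made up for illustration and are not part of markdown-it; plain objects stand in for real tokens.

```typescript
// Stand-in for a markdown-it token; only the field needed here (an assumption).
interface FakeToken {
  type: string;
}

// Inserts a marker before every 'html_block', iterating over a snapshot so
// that splicing into `tokens` does not disturb the loop.
function insertBeforeHtmlBlocks(tokens: FakeToken[]): FakeToken[] {
  for (const token of [...tokens]) {
    if (token.type !== 'html_block') continue;
    const position = tokens.indexOf(token); // index in the live array
    tokens.splice(position, 0, { type: 'marker' });
  }
  return tokens;
}

// Same logic, but iterating the live array with forEach. forEach fixes its
// index range up front, so inserted elements push later ones out of range.
function insertBeforeHtmlBlocksLive(tokens: FakeToken[]): FakeToken[] {
  tokens.forEach((token) => {
    if (token.type !== 'html_block') return;
    tokens.splice(tokens.indexOf(token), 0, { type: 'marker' });
  });
  return tokens;
}

const twoBlocks = (): FakeToken[] => [{ type: 'html_block' }, { type: 'html_block' }];

// Snapshot: each block gets exactly one marker.
console.log(insertBeforeHtmlBlocks(twoBlocks()).map((t) => t.type));
// Live: the first block is visited twice and the second one never.
console.log(insertBeforeHtmlBlocksLive(twoBlocks()).map((t) => t.type));
```

With two adjacent blocks, the snapshot version yields marker, html_block, marker, html_block, while the live version yields marker, marker, html_block, html_block.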
In general, I would propose processing HTML only after the markdown has been rendered to HTML. If you wish to process
Agreed. This is a specialized case in which we need
Thanks for the feedback. If you spot anything that may be a concern on another glance, please do let me know.
I don't see any obvious reason why your special case should not be used. It seems you are qualified and understand what you are doing. Of course, if you enable HTML, it is worth using a sanitizer to restrict the allowed tokens and attributes. But that's another story, not specific to your question. The general approach can be borrowed from npm's wrapper: they tweak markdown-it to behave very much like GitHub (to render README files on npmjs.com).
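To make the point about restricting attributes concrete, here is a small sketch of an allowlist filter that could sit in front of a call such as `open.attrSet(key, value)`. The names `ALLOWED_HEADING_ATTRS` and `pickAllowedAttrs` and the contents of the allowlist are assumptions for illustration. This is not a complete sanitizer, and a maintained sanitizing library is the safer choice for untrusted input.

```typescript
// Hypothetical allowlist; which attributes are safe depends on the application.
const ALLOWED_HEADING_ATTRS: ReadonlySet<string> = new Set(['id', 'class', 'align']);

// Returns only the [name, value] pairs whose name is on the allowlist.
// Names are compared case-insensitively; values are passed through unchanged.
function pickAllowedAttrs(
  attribs: Record<string, string>,
  allowed: ReadonlySet<string> = ALLOWED_HEADING_ATTRS,
): Array<[string, string]> {
  return Object.entries(attribs).filter(([key]) => allowed.has(key.toLowerCase()));
}

// Event handlers such as onclick are dropped; id and class survive.
console.log(pickAllowedAttrs({ id: 'intro', onclick: 'alert(1)', class: 'title' }));
```

In a plugin, the filtered pairs could then be copied onto the opening token one by one instead of copying every attribute found in the HTML.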
Haha, I had no idea that npm had done something similar. Thanks for the tip! It looks like they took a similar but slightly different path: https://github.com/npm/marky-markdown/blob/master/lib/plugin/html-heading.js
See also #28. There are probably security notes there that you should know about (and that explain why such a popular feature has not yet landed here). GitHub (and
The author of `markdown-it-anchor` and I have been discussing how to handle headings that are written as HTML in a markdown file. The Vue README has a few examples of these: https://github.com/vuejs/vue/blob/dev/README.md. It's desirable to handle an `<h2>Hello</h2>` in HTML as we would `## Hello`, and to tokenize those headings so they can be processed by other plugins.

As I'm still learning the methodologies and best practices of markdown-it, I was hoping you might be able to provide guidance on the best method for processing the HTML and inserting tokens appropriately. I have a working proof of concept that splits `html_block` tokens using cheerio and manually splices in new Tokens, but a lot of that is manual lifting, and I'd like to get your take on this before I march ahead with it. TIA