The lexer appears to handle new lines incorrectly #2340

phillipb · 2021-12-30T20:33:57Z

Marked version: 4.0.6

Describe the bug
In some cases the lexer drops a new line instead of emitting a token for it.

To Reproduce

console.log(marked.lexer("\nT\nh\n---"))

returns

[
    {
        "type": "paragraph",
        "raw": "T\nh",
        "text": "T\nh",
        "tokens": [
            {
                "type": "text",
                "raw": "T\nh",
                "text": "T\nh"
            }
        ]
    },
    {
        "type": "hr",
        "raw": "---"
    }
]

As you can see, the spacer token for the new line before the hr token is missing.

Here's a demo: https://marked.js.org/demo/?outputType=lexer&text=t%0Ah%0A----&options=%7B%0A%20%22baseUrl%22%3A%20null%2C%0A%20%22breaks%22%3A%20false%2C%0A%20%22extensions%22%3A%20null%2C%0A%20%22gfm%22%3A%20true%2C%0A%20%22headerIds%22%3A%20true%2C%0A%20%22headerPrefix%22%3A%20%22%22%2C%0A%20%22highlight%22%3A%20null%2C%0A%20%22langPrefix%22%3A%20%22language-%22%2C%0A%20%22mangle%22%3A%20true%2C%0A%20%22pedantic%22%3A%20false%2C%0A%20%22sanitize%22%3A%20false%2C%0A%20%22sanitizer%22%3A%20null%2C%0A%20%22silent%22%3A%20false%2C%0A%20%22smartLists%22%3A%20false%2C%0A%20%22smartypants%22%3A%20false%2C%0A%20%22tokenizer%22%3A%20null%2C%0A%20%22walkTokens%22%3A%20null%2C%0A%20%22xhtml%22%3A%20false%0A%7D&version=master

Expected behavior

[
    {
        "type": "paragraph",
        "raw": "T\nh",
        "text": "T\nh",
        "tokens": [
            {
                "type": "text",
                "raw": "T\nh",
                "text": "T\nh"
            }
        ]
    },
    {
        "type": "spacer",
        "raw": "\n"
    },
    {
        "type": "hr",
        "raw": "---"
    }
]
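One way to make the difference between the two outputs visible is to join the raw fields of the top-level tokens and compare the result with the source text. The sketch below is only an illustration: joinRaw is a hypothetical helper, the token arrays are trimmed copies of the two outputs above, and the source string leaves out the leading "\n" from the reproduction, since neither output contains a token for it.

```javascript
// Hypothetical helper: concatenate the raw text of top-level tokens.
function joinRaw(tokens) {
  return tokens.map((token) => token.raw).join("");
}

// Trimmed copies of the outputs shown in this issue (nested tokens omitted).
const reported = [
  { type: "paragraph", raw: "T\nh" },
  { type: "hr", raw: "---" },
];
const expected = [
  { type: "paragraph", raw: "T\nh" },
  { type: "spacer", raw: "\n" },
  { type: "hr", raw: "---" },
];

// Assumption: the source without the leading "\n" used in the reproduction.
const source = "T\nh\n---";

console.log(joinRaw(reported) === source); // false: the "\n" before the hr is lost
console.log(joinRaw(expected) === source); // true
```

A check like this only shows that a character went missing; it does not say which tokenizer dropped it.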

The issue seems to be here, in marked/src/Tokenizer.js (line 76 in 9396896):

if (cap[0].length > 1) {

Is there a reason we ignore whitespace matches that are only one character long?
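To illustrate the question, here is a minimal sketch of how a length check like the one quoted above can swallow a single new line. It is not marked's source: NEWLINE_RULE and both function names are hypothetical, the regular expression is a simplified assumption, and the "spacer" type name is copied from the expected output in this issue.

```javascript
// Assumed, simplified rule: one or more new lines at the start of the input.
const NEWLINE_RULE = /^\n+/;

// Mirrors the quoted condition: only matches longer than one character
// produce a token.
function spaceTokenWithLengthCheck(src) {
  const cap = NEWLINE_RULE.exec(src);
  if (!cap) return undefined; // no leading new line at all
  if (cap[0].length > 1) {
    return { type: "spacer", raw: cap[0] };
  }
  return null; // a single "\n" matched, but no token is emitted for it
}

// Variant without the check: every matched run of new lines becomes a token.
function spaceTokenAlways(src) {
  const cap = NEWLINE_RULE.exec(src);
  return cap ? { type: "spacer", raw: cap[0] } : undefined;
}

console.log(spaceTokenWithLengthCheck("\n---"));   // null
console.log(spaceTokenWithLengthCheck("\n\n---")); // { type: 'spacer', raw: '\n\n' }
console.log(spaceTokenAlways("\n---"));            // { type: 'spacer', raw: '\n' }
```

Under these assumptions, dropping the length check is enough to keep the single new line in the token stream. Whether that is safe in marked itself depends on the rest of the lexer and its tests.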


UziTech · 2021-12-30T20:47:14Z

Duplicate of #2134 (comment)

If you would like to work on removing that and getting all the tests to pass it would be much appreciated. 😁 👍

github-actions · 2022-01-06T15:34:44Z

🎉 This issue has been resolved in version 4.0.9 🎉

Your semantic-release bot 📦🚀

phillipb mentioned this issue Dec 31, 2021

Fix lexer and tokenizer to retain line breaks properly #2341

Merged

3 tasks

UziTech closed this as completed in #2341 Jan 6, 2022

github-actions bot added the released label Jan 6, 2022
