Fix incorrectly parsing links with nested [] #120

Merged: jgm merged 1 commit into jgm:master from notriddle/balanced-links on Oct 29, 2023

notriddle
Contributor

@notriddle notriddle commented Oct 29, 2023

Fixes #119

This markdown input:

```markdown
[[]](https://haskell.org)

[[][]](https://haskell.org)

[[[][]](https://haskell.org)

[[][][]](https://haskell.org)
```

In commonmark-hs, [it used to make this HTML](https://pandoc.org/try/?params=%7B%22text%22%3A%22%5B%5B%5D%5D%28https%3A%2F%2Fhaskell.org%29%5Cn%5Cn%5B%5B%5D%5B%5D%5D%28https%3A%2F%2Fhaskell.org%29%5Cn%5Cn%5B%5B%5B%5D%5B%5D%5D%28https%3A%2F%2Fhaskell.org%29%5Cn%5Cn%5B%5B%5D%5B%5D%5B%5D%5D%28https%3A%2F%2Fhaskell.org%29%22%2C%22to%22%3A%22html5%22%2C%22from%22%3A%22commonmark_x%22%2C%22standalone%22%3Afalse%2C%22embed-resources%22%3Afalse%2C%22table-of-contents%22%3Afalse%2C%22number-sections%22%3Afalse%2C%22citeproc%22%3Afalse%2C%22html-math-method%22%3A%22plain%22%2C%22wrap%22%3A%22auto%22%2C%22highlight-style%22%3Anull%2C%22files%22%3A%7B%7D%2C%22template%22%3Anull%7D):

```html
<p><a href="https://haskell.org">[]</a></p>
<p>[<a href="https://haskell.org">][]</a></p>
<p>[[<a href="https://haskell.org">][]</a></p>
<p>[[]<a href="https://haskell.org">][]</a></p>
```

In commonmark.js, [it makes this instead](https://spec.commonmark.org/dingus/?text=%5B%5B%5D%5D(https%3A%2F%2Fhaskell.org)%0A%0A%5B%5B%5D%5B%5D%5D(https%3A%2F%2Fhaskell.org)%0A%0A%5B%5B%5B%5D%5B%5D%5D(https%3A%2F%2Fhaskell.org)%0A%0A%5B%5B%5D%5B%5D%5B%5D%5D(https%3A%2F%2Fhaskell.org)):

```html
<p><a href="https://haskell.org">[]</a></p>
<p><a href="https://haskell.org">[][]</a></p>
<p>[<a href="https://haskell.org">[][]</a></p>
<p><a href="https://haskell.org">[][][]</a></p>
```

The commonmark.js output seems to be correct according to the specification:

> Brackets are allowed in the [link text](https://spec.commonmark.org/0.30/#link-text)
> only if (a) they are backslash-escaped or (b) they appear as a
> matched pair of brackets, with an open bracket `[`, a sequence of
> zero or more inlines, and a close bracket `]`.
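
To make the quoted rule concrete, here is a minimal sketch of the balance check applied to the four inputs above. It is not the commonmark-hs or commonmark.js implementation; `brackets_balanced` is a hypothetical helper, and it ignores code spans, autolinks, and raw HTML, which the specification says bind more tightly than brackets.

```python
def brackets_balanced(link_text: str) -> bool:
    """Return True if every unescaped bracket in link_text belongs to a matched pair.

    Hypothetical helper for illustration only: it handles backslash escapes
    and nesting, and nothing else from the CommonMark link rules.
    """
    depth = 0
    i = 0
    while i < len(link_text):
        ch = link_text[i]
        if ch == "\\" and i + 1 < len(link_text):
            i += 2  # skip the backslash and the character it escapes
            continue
        if ch == "[":
            depth += 1
        elif ch == "]":
            if depth == 0:
                return False  # closer with no matching opener
            depth -= 1
        i += 1
    return depth == 0  # any opener left unclosed makes the text invalid


# Candidate link texts: everything between the first "[" and the "]" before "(".
candidates = ["[]", "[][]", "[[][]", "[][][]"]
results = {text: brackets_balanced(text) for text in candidates}
```

Under this check, only the third candidate, `[[][]`, fails. Dropping its leading `[` leaves `[][]`, which is balanced, and that matches the commonmark.js output where the first bracket stays literal and the link text is `[][]`.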

@notriddle notriddle force-pushed the notriddle/balanced-links branch from fe1ed0d to 87dae7b Compare October 29, 2023 01:48
@notriddle
Contributor Author

`cabal bench` output

Before:
Running 1 benchmarks...
Benchmark benchmark-commonmark: RUNNING...
All
  tokenize
    tokenize sample.md: OK
      3.84 ms ± 238 μs
  parse sample.md
    commonmark default: OK
      48.4 ms ± 4.3 ms
  pathological
    nested strong emph
      commonmark
        1000:           OK
          796  μs ±  52 μs
        2000:           OK
          1.96 ms ± 131 μs
        3000:           OK
          3.11 ms ± 242 μs
        4000:           OK
          4.30 ms ± 312 μs
    many emph closers with no openers
      commonmark
        1000:           OK
          799  μs ±  51 μs
        2000:           OK
          2.02 ms ± 180 μs
        3000:           OK
          4.24 ms ± 407 μs
        4000:           OK
          5.92 ms ± 515 μs
    many emph openers with no closers
      commonmark
        1000:           OK
          791  μs ±  51 μs
        2000:           OK
          1.97 ms ± 159 μs
        3000:           OK
          4.14 ms ± 277 μs
        4000:           OK
          5.82 ms ± 567 μs
    many link closers with no openers
      commonmark
        1000:           OK
          950  μs ±  51 μs
        2000:           OK
          2.40 ms ± 173 μs
        3000:           OK
          4.63 ms ± 148 μs
        4000:           OK
          6.42 ms ± 223 μs
    many link openers with no closers
      commonmark
        1000:           OK
          972  μs ±  92 μs
        2000:           OK
          2.46 ms ± 134 μs
        3000:           OK
          4.61 ms ± 256 μs
        4000:           OK
          6.31 ms ± 171 μs
    mismatched openers and closers
      commonmark
        1000:           OK
          10.2 ms ± 944 μs
        2000:           OK
          39.4 ms ± 2.4 ms
        3000:           OK
          86.3 ms ± 8.1 ms
        4000:           OK
          149  ms ± 5.0 ms
    openers and closers multiple of 3
      commonmark
        1000:           OK
          2.73 ms ± 177 μs
        2000:           OK
          9.16 ms ± 405 μs
        3000:           OK
          20.0 ms ± 1.0 ms
        4000:           OK
          35.0 ms ± 1.9 ms
    link openers and emph closers
      commonmark
        1000:           OK
          905  μs ±  52 μs
        2000:           OK
          2.24 ms ± 106 μs
        3000:           OK
          4.10 ms ± 408 μs
        4000:           OK
          6.09 ms ± 418 μs
    nested brackets
      commonmark
        1000:           OK
          1.26 ms ±  99 μs
        2000:           OK
          2.99 ms ± 225 μs
        3000:           OK
          5.54 ms ± 509 μs
        4000:           OK
          7.79 ms ± 381 μs
    inline link openers without closers
      commonmark
        1000:           OK
          1.90 ms ± 172 μs
        2000:           OK
          4.01 ms ± 346 μs
        3000:           OK
          7.30 ms ± 346 μs
        4000:           OK
          10.0 ms ± 712 μs
    repeated pattern '[ (]('
      commonmark
        1000:           OK
          1.19 ms ±  90 μs
        2000:           OK
          2.74 ms ± 226 μs
        3000:           OK
          4.93 ms ± 406 μs
        4000:           OK
          6.78 ms ± 414 μs
    nested block quotes
      commonmark
        1000:           OK
          549  μs ±  27 μs
        2000:           OK
          1.38 ms ±  89 μs
        3000:           OK
          2.16 ms ± 192 μs
        4000:           OK
          2.96 ms ± 168 μs
    nested list
      commonmark
        1000:           OK
          469  μs ±  45 μs
        2000:           OK
          672  μs ±  50 μs
        3000:           OK
          915  μs ±  51 μs
        4000:           OK
          1.14 ms ±  86 μs
    nested list 2
      commonmark
        1000:           OK
          1.97 ms ± 106 μs
        2000:           OK
          4.13 ms ± 356 μs
        3000:           OK
          6.26 ms ± 364 μs
        4000:           OK
          9.13 ms ± 691 μs
    backticks
      commonmark
        1000:           OK
          266  μs ±  22 μs
        2000:           OK
          587  μs ±  45 μs
        3000:           OK
          1.03 ms ±  60 μs
        4000:           OK
          1.49 ms ± 105 μs
    CDATA
      commonmark
        1000:           OK
          788  μs ±  43 μs
        2000:           OK
          1.84 ms ± 131 μs
        3000:           OK
          3.00 ms ± 224 μs
        4000:           OK
          4.52 ms ± 298 μs
    <?
      commonmark
        1000:           OK
          1.56 ms ±  99 μs
        2000:           OK
          3.23 ms ± 189 μs
        3000:           OK
          5.61 ms ± 521 μs
        4000:           OK
          8.05 ms ± 771 μs
    <!A 
      commonmark
        1000:           OK
          1.17 ms ±  47 μs
        2000:           OK
          2.51 ms ± 189 μs
        3000:           OK
          4.24 ms ± 381 μs
        4000:           OK
          5.80 ms ± 523 μs

All 74 tests passed (16.91s)
Benchmark benchmark-commonmark: FINISH

After:
Running 1 benchmarks...
Benchmark benchmark-commonmark: RUNNING...
All
  tokenize
    tokenize sample.md: OK
      3.89 ms ± 378 μs
  parse sample.md
    commonmark default: OK
      47.6 ms ± 3.7 ms
  pathological
    nested strong emph
      commonmark
        1000:           OK
          811  μs ±  47 μs
        2000:           OK
          2.00 ms ± 192 μs
        3000:           OK
          3.19 ms ± 231 μs
        4000:           OK
          4.38 ms ± 344 μs
    many emph closers with no openers
      commonmark
        1000:           OK
          813  μs ±  50 μs
        2000:           OK
          2.04 ms ± 183 μs
        3000:           OK
          4.27 ms ± 298 μs
        4000:           OK
          6.21 ms ± 427 μs
    many emph openers with no closers
      commonmark
        1000:           OK
          817  μs ±  56 μs
        2000:           OK
          2.00 ms ±  30 μs
        3000:           OK
          4.14 ms ± 379 μs
        4000:           OK
          6.04 ms ± 271 μs
    many link closers with no openers
      commonmark
        1000:           OK
          983  μs ±  94 μs
        2000:           OK
          2.47 ms ± 185 μs
        3000:           OK
          4.76 ms ± 268 μs
        4000:           OK
          7.14 ms ± 690 μs
    many link openers with no closers
      commonmark
        1000:           OK
          1.00 ms ±  50 μs
        2000:           OK
          2.53 ms ± 220 μs
        3000:           OK
          5.12 ms ± 492 μs
        4000:           OK
          6.85 ms ± 610 μs
    mismatched openers and closers
      commonmark
        1000:           OK
          10.3 ms ± 797 μs
        2000:           OK
          38.9 ms ± 1.4 ms
        3000:           OK
          84.9 ms ± 6.2 ms
        4000:           OK
          149  ms ± 4.1 ms
    openers and closers multiple of 3
      commonmark
        1000:           OK
          2.88 ms ± 230 μs
        2000:           OK
          9.65 ms ± 886 μs
        3000:           OK
          20.7 ms ± 901 μs
        4000:           OK
          36.2 ms ± 2.3 ms
    link openers and emph closers
      commonmark
        1000:           OK
          935  μs ±  87 μs
        2000:           OK
          2.34 ms ± 174 μs
        3000:           OK
          4.41 ms ± 176 μs
        4000:           OK
          6.42 ms ± 426 μs
    nested brackets
      commonmark
        1000:           OK
          7.21 ms ± 716 μs
        2000:           OK
          26.3 ms ± 1.4 ms
        3000:           OK
          57.7 ms ± 2.0 ms
        4000:           OK
          101  ms ± 7.6 ms
    inline link openers without closers
      commonmark
        1000:           OK
          1.96 ms ± 184 μs
        2000:           OK
          4.15 ms ± 231 μs
        3000:           OK
          7.35 ms ± 389 μs
        4000:           OK
          10.6 ms ± 994 μs
    repeated pattern '[ (]('
      commonmark
        1000:           OK
          1.23 ms ±  99 μs
        2000:           OK
          2.82 ms ± 177 μs
        3000:           OK
          5.22 ms ± 359 μs
        4000:           OK
          7.01 ms ± 613 μs
    nested block quotes
      commonmark
        1000:           OK
          556  μs ±  50 μs
        2000:           OK
          1.39 ms ±  91 μs
        3000:           OK
          2.17 ms ± 191 μs
        4000:           OK
          2.96 ms ± 224 μs
    nested list
      commonmark
        1000:           OK
          457  μs ±  23 μs
        2000:           OK
          666  μs ±  43 μs
        3000:           OK
          901  μs ±  52 μs
        4000:           OK
          1.14 ms ±  55 μs
    nested list 2
      commonmark
        1000:           OK
          1.96 ms ± 179 μs
        2000:           OK
          4.23 ms ± 125 μs
        3000:           OK
          6.50 ms ± 533 μs
        4000:           OK
          9.30 ms ± 365 μs
    backticks
      commonmark
        1000:           OK
          268  μs ±  23 μs
        2000:           OK
          585  μs ±  46 μs
        3000:           OK
          1.03 ms ±  43 μs
        4000:           OK
          1.52 ms ± 117 μs
    CDATA
      commonmark
        1000:           OK
          804  μs ±  48 μs
        2000:           OK
          1.89 ms ±  91 μs
        3000:           OK
          3.11 ms ± 243 μs
        4000:           OK
          4.46 ms ± 413 μs
    <?
      commonmark
        1000:           OK
          1.61 ms ± 123 μs
        2000:           OK
          3.34 ms ± 271 μs
        3000:           OK
          5.58 ms ± 352 μs
        4000:           OK
          8.18 ms ± 782 μs
    <!A 
      commonmark
        1000:           OK
          1.20 ms ±  52 μs
        2000:           OK
          2.57 ms ± 205 μs
        3000:           OK
          4.28 ms ± 349 μs
        4000:           OK
          5.93 ms ± 418 μs

All 74 tests passed (16.72s)
Benchmark benchmark-commonmark: FINISH
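
For reading the two runs side by side, the following sketch computes the after/before ratio for the "nested brackets" rows. The timings are copied from the output above; the variable and function names are illustrative only, and the script says nothing about the cause of any difference.

```python
# Timings in milliseconds, copied from the "nested brackets" rows above.
sizes = [1000, 2000, 3000, 4000]
before_ms = [1.26, 2.99, 5.54, 7.79]
after_ms = [7.21, 26.3, 57.7, 101.0]


def slowdown(before, after):
    """Return after/before for each pair of timings."""
    return [a / b for b, a in zip(before, after)]


ratios = slowdown(before_ms, after_ms)
for n, r in zip(sizes, ratios):
    print(f"{n}: {r:.1f}x")
```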

@jgm jgm merged commit 7f2d008 into jgm:master Oct 29, 2023
6 checks passed
@jgm
Owner

jgm commented Oct 29, 2023

Excellent, thanks!

@notriddle notriddle deleted the notriddle/balanced-links branch October 29, 2023 18:43
Successfully merging this pull request may close these issues.

[fuzz result] parser sees links with unbalanced [] inside