Tokenize combining marks as WordChars not Symbol.
Closes #114.
jgm committed Oct 20, 2023
1 parent 68374fd commit cf945af
Showing 2 changed files with 12 additions and 3 deletions.
8 changes: 5 additions & 3 deletions commonmark/src/Commonmark/Tokens.hs
@@ -10,7 +10,7 @@ module Commonmark.Tokens
, untokenize
) where

-import Unicode.Char (isAlphaNum)
+import Unicode.Char (isAlphaNum, isMark)
import Unicode.Char.General.Compat (isSpace)
import Data.Text (Text)
import qualified Data.Text as T
@@ -42,7 +42,9 @@ tokenize name =
-- everything else gets in a token by itself.
f '\r' '\n' = True
f ' ' ' ' = True
-f x y = isAlphaNum x && isAlphaNum y
+f x y = isWordChar x && isWordChar y
+
+isWordChar c = isAlphaNum c || isMark c

go !_pos [] = []
go !pos (!t:ts) = -- note that t:ts are guaranteed to be nonempty
@@ -57,7 +59,7 @@ tokenize name =
'\n' -> Tok LineEnd pos t :
go (incSourceLine (setSourceColumn pos 1) 1) ts
thead
-| isAlphaNum thead ->
+| isWordChar thead ->
Tok WordChars pos t :
go (incSourceColumn pos (T.length t)) ts
| isSpace thead ->
7 changes: 7 additions & 0 deletions commonmark/test/regression.md
@@ -203,3 +203,10 @@ Issue #67.
<p><a href="http://www.example.com/">test </a>
<a href="http://www.example.com/"> test</a></p>
````````````````````````````````

+Issue #114.
+```````````````````````````````` example
+*.*̀.
+.
+<p>*.*̀.</p>
+````````````````````````````````
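
The effect of the new predicate can be sketched outside the library. The block below is an illustration under stated assumptions, not the code from `Tokens.hs`: it works on `String` with `Data.List.groupBy` rather than `T.groupBy`, the `Kind` type and the `classify` function are hypothetical stand-ins for the library's token types, and `isLetterOrNumber` assumes that the imported `isAlphaNum` accepts only letters and numbers (the diff implies this, since otherwise adding `isMark` would change nothing).

```haskell
import Data.Char (isLetter, isMark, isNumber, isSpace)
import Data.List (groupBy)

-- Hypothetical, simplified stand-in for the library's token kinds.
data Kind = WordChars | Spaces | Symbol
  deriving (Show, Eq)

-- Assumed behaviour of the old predicate: letters and numbers only,
-- so combining marks are rejected.
isLetterOrNumber :: Char -> Bool
isLetterOrNumber c = isLetter c || isNumber c

-- The predicate introduced by the commit, rebuilt on the assumption above.
isWordChar :: Char -> Bool
isWordChar c = isLetterOrNumber c || isMark c

-- Group consecutive spaces and consecutive word characters, then tag
-- each group by its first character. Line endings are left out for brevity.
classify :: (Char -> Bool) -> String -> [(Kind, String)]
classify isWord = map tag . groupBy sameRun
  where
    sameRun ' ' ' ' = True
    sameRun x y = isWord x && isWord y

    tag s@(c:_)
      | isWord c = (WordChars, s)
      | isSpace c = (Spaces, s)
      | otherwise = (Symbol, s)
    tag [] = (Symbol, []) -- unreachable: groupBy never yields an empty group

main :: IO ()
main = do
  -- "\768" is U+0300 COMBINING GRAVE ACCENT, as in the regression input.
  let regression = "*.*\768."
  print (classify isLetterOrNumber regression)
  print (classify isWordChar regression)
  -- "e\769t" is a decomposed e-acute followed by t.
  print (classify isLetterOrNumber "e\769t")
  print (classify isWordChar "e\769t")
```

Under these assumptions, the old predicate tags the combining mark in the regression input as `Symbol`, while the new one tags it as `WordChars`. For a decomposed accented letter, the old predicate also splits the word into three groups, whereas the new one keeps it as a single `WordChars` run.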