Enable wide Unicode support for names #24
Open
viktor-yakubiv wants to merge 9 commits into micromark:main from viktor-yakubiv:unicode-names
Commits (9)
b34ad7a Enable wide Unicode support for names (viktor-yakubiv)
ebd55c3 Add tests for decomposed accent modifiers (viktor-yakubiv)
e03860c Replace em dashes with unicode chars (viktor-yakubiv)
d9b89e5 Add tests for emojis and math symbols (viktor-yakubiv)
c400dab Adhere styleguide (viktor-yakubiv)
f4ec634 Allow punctuation at the end of name (viktor-yakubiv)
54f041d Automatic formatting (viktor-yakubiv)
4785fba Revert punctuation at the end of name (viktor-yakubiv)
73eb92a Automatic format (viktor-yakubiv)
@@ -43,12 +43,9 @@ test('micromark-extension-directive (syntax, text)', async function (t) {
     }
   )

-  await t.test(
-    'should not support a colon not followed by an alpha',
-    async function () {
-      assert.equal(micromark(':', options()), '<p>:</p>')
-    }
-  )
+  await t.test('should not support a lonely colon', async function () {
+    assert.equal(micromark(':', options()), '<p>:</p>')
+  })

   await t.test(
     'should support a colon followed by an alpha',
@@ -57,24 +54,17 @@ test('micromark-extension-directive (syntax, text)', async function (t) {
     }
   )

-  await t.test(
-    'should not support a colon followed by a digit',
-    async function () {
-      assert.equal(micromark(':9', options()), '<p>:9</p>')
-    }
-  )
+  await t.test('should support a colon followed by a digit', async function () {
+    assert.equal(micromark(':9', options()), '<p></p>')
+  })

   await t.test(
-    'should not support a colon followed by a dash',
+    'should not support a colon followed by a punctuation',
     async function () {
       assert.equal(micromark(':-', options()), '<p>:-</p>')
-    }
-  )
-
-  await t.test(
-    'should not support a colon followed by an underscore',
-    async function () {
       assert.equal(micromark(':_', options()), '<p>:_</p>')
+      assert.equal(micromark(':.', options()), '<p>:.</p>')
+      assert.equal(micromark(':—', options()), '<p>:—</p>') // Em dash
     }
   )

@@ -86,21 +76,43 @@ test('micromark-extension-directive (syntax, text)', async function (t) {
     assert.equal(micromark(':a-b', options()), '<p></p>')
   })

+  await t.test('should support unicode alphabets in name', async function () {
+    // Latin, Greek, Cyrillic respectively
+    assert.equal(micromark(':xγз', options()), '<p></p>')
+  })
+
+  await t.test('should support unicode accents inner name', async function () {
+    // (Decomposed) Combining Acute Accent in Cyrillic
+    assert.equal(micromark(':за́мок-чи-замо́к', options()), '<p></p>')
+  })
+
   await t.test(
-    'should *not* support a dash at the end of a name',
+    'should support unicode accents at the name end',
     async function () {
-      assert.equal(micromark(':a-', options()), '<p>:a-</p>')
+      // (Decomposed) Combining Circumflex Accent in Latin
+      assert.equal(micromark(':â', options()), '<p></p>')
     }
   )

-  await t.test('should support an underscore in a name', async function () {
-    assert.equal(micromark(':a_b', options()), '<p></p>')
+  await t.test('should support emojis in name', async function () {
+    assert.equal(micromark(':🌍', options()), '<p></p>')
+    assert.equal(micromark(':w🌍rld', options()), '<p></p>')
   })

+  await t.test('should support math symbols in name', async function () {
+    assert.equal(micromark(':𝜋∈ℝ', options()), '<p></p>') // Italic
+    assert.equal(micromark(':𝛑≈3.14', options()), '<p></p>') // Bold
+    assert.equal(micromark(':𝝅∉ℚ', options()), '<p></p>') // Bold italic
+    assert.equal(micromark(':𝞹≠3.14', options()), '<p></p>') // Sans bold italic
+  })
+
   await t.test(
-    'should *not* support an underscore at the end of a name',
+    'should *not* support punctuation at the end of a name',
     async function () {
+      assert.equal(micromark(':a-', options()), '<p>:a-</p>')
       assert.equal(micromark(':a_', options()), '<p>:a_</p>')
+      assert.equal(micromark(':a.', options()), '<p>:a.</p>')
+      assert.equal(micromark(':a—', options()), '<p>:a—</p>') // Em dash
     }
   )

@@ -411,25 +423,62 @@ test('micromark-extension-directive (syntax, leaf)', async function (t) {
   )

   await t.test(
-    'should not support two colons followed by a digit',
+    'should support two colons followed by a digit',
     async function () {
-      assert.equal(micromark('::9', options()), '<p>::9</p>')
+      assert.equal(micromark('::9', options()), '')
     }
   )

   await t.test(
-    'should not support two colons followed by a dash',
+    'should not support two colons followed by punctuation',
     async function () {
       assert.equal(micromark('::-', options()), '<p>::-</p>')
+      assert.equal(micromark('::_', options()), '<p>::_</p>')
+      assert.equal(micromark('::.', options()), '<p>::.</p>')
+      assert.equal(micromark('::—', options()), '<p>::—</p>') // Em dash
     }
   )

   await t.test('should support a digit in a name', async function () {
     assert.equal(micromark('::a9', options()), '')
   })

-  await t.test('should support a dash in a name', async function () {
+  await t.test('should support punctuation in a name', async function () {
     assert.equal(micromark('::a-b', options()), '')
+    assert.equal(micromark('::a-b', options()), '')
+    assert.equal(micromark('::a_b', options()), '')
+    assert.equal(micromark('::a.b', options()), '')
+    assert.equal(micromark('::a—b', options()), '')
   })

+  await t.test('should support unicode alphabets in name', async function () {
+    // Latin, Greek, Cyrillic respectively
+    assert.equal(micromark('::xγз', options()), '')
+  })
+
+  await t.test('should support unicode accents inner name', async function () {
+    // (Decomposed) Combining Acute Accent in Cyrillic
+    assert.equal(micromark('::за́мок-чи-замо́к', options()), '')
+  })
+
+  await t.test(
+    'should support unicode accents at the name end',
+    async function () {
+      // (Decomposed) Combining Circumflex Accent in Latin
+      assert.equal(micromark('::â', options()), '')
+    }
+  )
+
+  await t.test('should support emojis in name', async function () {
+    assert.equal(micromark('::🌍', options()), '')
+    assert.equal(micromark('::w🌍rld', options()), '')
+  })
+
+  await t.test('should support math symbols in name', async function () {
+    assert.equal(micromark('::𝜋∈ℝ', options()), '') // Italic
+    assert.equal(micromark('::𝛑≈3.14', options()), '') // Bold
+    assert.equal(micromark('::𝝅∉ℚ', options()), '') // Bold italic
+    assert.equal(micromark('::𝞹≠3.14', options()), '') // Sans bold italic
[Review comment on lines +478 to +481]
+  })
+
   await t.test(
@@ -773,25 +822,61 @@ test('micromark-extension-directive (syntax, container)', async function (t) {
   )

   await t.test(
-    'should not support three colons followed by a digit',
+    'should support three colons followed by a digit',
     async function () {
-      assert.equal(micromark(':::9', options()), '<p>:::9</p>')
+      assert.equal(micromark(':::9', options()), '')
     }
   )

   await t.test(
-    'should not support three colons followed by a dash',
+    'should not support three colons followed by punctuation',
     async function () {
       assert.equal(micromark(':::-', options()), '<p>:::-</p>')
+      assert.equal(micromark(':::_', options()), '<p>:::_</p>')
+      assert.equal(micromark(':::.', options()), '<p>:::.</p>')
+      assert.equal(micromark(':::—', options()), '<p>:::—</p>') // Em dash
     }
   )

   await t.test('should support a digit in a name', async function () {
     assert.equal(micromark(':::a9', options()), '')
   })

-  await t.test('should support a dash in a name', async function () {
+  await t.test('should support punctuation in a name', async function () {
     assert.equal(micromark(':::a-b', options()), '')
+    assert.equal(micromark(':::a_b', options()), '')
+    assert.equal(micromark(':::a.b', options()), '')
+    assert.equal(micromark(':::a—b', options()), '') // Em dash
   })

+  await t.test('should support unicode alphabets in name', async function () {
+    // Latin, Greek, Cyrillic respectively
+    assert.equal(micromark(':::xγз', options()), '')
+  })
+
+  await t.test('should support unicode accents inner name', async function () {
+    // (Decomposed) Combining Acute Accent in Cyrillic
+    assert.equal(micromark(':::за́мок-чи-замо́к', options()), '')
+  })
+
+  await t.test(
+    'should support unicode accents at the name end',
+    async function () {
+      // (Decomposed) Combining Circumflex Accent in Latin
+      assert.equal(micromark(':::â', options()), '')
+    }
+  )
+
+  await t.test('should support emojis in name', async function () {
+    assert.equal(micromark(':::🌍', options()), '')
+    assert.equal(micromark(':::w🌍rld', options()), '')
+  })
+
+  await t.test('should support math symbols in name', async function () {
+    assert.equal(micromark(':::𝜋∈ℝ', options()), '') // Italic
+    assert.equal(micromark(':::𝛑≈3.14', options()), '') // Bold
+    assert.equal(micromark(':::𝝅∉ℚ', options()), '') // Bold italic
+    assert.equal(micromark(':::𝞹≠3.14', options()), '') // Sans bold italic
+  })
+
   await t.test(
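
The tests in this diff depend on two Unicode details: accents stored as separate combining marks, and characters that take two UTF-16 code units. As a rough illustration only, the sketch below inspects a string code point by code point. The helper name `describeCodePoints` is made up for this example and is not part of micromark or of this PR.

```javascript
// Hypothetical helper for illustration; not taken from the PR.
// Lists each code point of a string with two Unicode property checks.
function describeCodePoints(text) {
  // Array.from iterates by code point, so astral characters stay whole.
  return Array.from(text, function (char) {
    const hex = char.codePointAt(0).toString(16).toUpperCase()
    return {
      codePoint: 'U+' + hex.padStart(4, '0'),
      isMark: /\p{M}/u.test(char), // Combining marks (accents)
      isPunctuation: /\p{P}/u.test(char)
    }
  })
}

// Precomposed "a with circumflex" (U+00E2), decomposed via NFD.
const decomposed = '\u00E2'.normalize('NFD')
console.log(describeCodePoints(decomposed))

// An emoji is one code point but two UTF-16 code units.
console.log('\u{1F30D}'.length, describeCodePoints('\u{1F30D}').length)
```

The decomposed form ends in a combining mark, which is a mark and not punctuation, so a rule that only rejects trailing punctuation would still accept such a name.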
shouldn’t dashes also be edge characters?

If you mean allowed edge characters: the spec previously forbade them. I kept that behavior, but I don't mind changing it.
Currently, a name cannot start or end with punctuation or an underscore.
Is this something you suggest changing?
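
The edge rule described in this reply can be sketched as a whole-string check. This is an illustration under stated assumptions, not the PR's tokenizer code: it assumes "punctuation" means the Unicode general category P (which already covers `_`, `-`, `.`, and the em dash), that whitespace always ends a name, and it ignores how labels and attributes terminate a name. The names `plausibleName` and `isPlausibleName` are hypothetical.

```javascript
// Hypothetical sketch; the real extension works as a state machine on
// character codes, not on a regular expression.
// First and last code points: neither punctuation nor whitespace.
// Anything except whitespace is allowed in between.
const plausibleName = /^[^\p{P}\s](?:\S*[^\p{P}\s])?$/u

function isPlausibleName(name) {
  return plausibleName.test(name)
}

const samples = ['a-b', 'a_b', '9', 'a\u0302', 'a-', 'a_', '-', '_', '\u2014']

for (const sample of samples) {
  console.log(JSON.stringify(sample), isPlausibleName(sample))
}
```

Under these assumptions the first four samples pass and the rest fail, which matches the expectations in the updated tests for names such as `:a-b`, `:9`, `:a-`, and `:_`.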

To refresh my memory: So, in `name`, this last stuff is about what is possible to exit after.
That behavior at the end is very different from whether the first character is allowed to start a name.
Before, there was a very different check compared to the check in `start`: `-` and `_` were allowed in names but not at the end.
Now they’re the same. I’m not sure if that’s useful? Perhaps the last line should just be `return ok(code)`?

I have tried `return ok(code)` at the end. Allowing the name to end with an underscore interferes with emphasis notation, so I reverted the commit.
I think there is no point in allowing punctuation at the end but not at the start. If we are going to allow punctuation, the two rules should be (almost) equal.
Possible options:
`=`, `~` (special in some flavours), `_`, `*`, parentheses and perhaps some more; basically anything in ASCII (in comparison to option 3, a blacklist under 128).
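
One reason an ASCII blacklist may need to be spelled out separately: some of the characters listed above are symbols rather than punctuation in Unicode terms. The sketch below is an illustration, not code from the PR, and assumes a check based on Unicode general categories. It splits the printable ASCII range into the two groups. The function name `classifyAscii` is hypothetical.

```javascript
// Hypothetical helper for illustration; not part of micromark.
// Groups printable ASCII characters by Unicode general category.
function classifyAscii() {
  const punctuation = []
  const symbols = []

  for (let code = 0x21; code <= 0x7e; code++) {
    const char = String.fromCharCode(code)

    if (/\p{P}/u.test(char)) {
      punctuation.push(char) // For example `_`, `-`, `.`, `*`, `(`, `)`
    } else if (/\p{S}/u.test(char)) {
      symbols.push(char) // For example `=`, `~`, `+`, `|`
    }
  }

  return {punctuation, symbols}
}

const result = classifyAscii()
console.log('punctuation:', result.punctuation.join(' '))
console.log('symbols:', result.symbols.join(' '))
```

With this split, `=` and `~` land in the symbol group, so a rule that rejects only `\p{P}` at the edges would not cover them. An explicit list of code points below 128 would.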