Treat UTF-16 surrogate pairs as single characters for string min/maxLength #88
}

fn json_simple_string(&mut self) -> NodeRef {
-    self.lexeme(&format!("\"{}*\"", CHAR_REGEX))
+    self.lexeme("(?s:.*)", true)
Note that this change is orthogonal to the underlying PR -- @mmoskal if using the CHAR_REGEX
directly is marginally more performant, I can switch it back.
)))
Ok(self.lexeme(
    &format!(
        "(?s:.{{{},{}}})",
`derivre` seems smart enough to match `\uD83D\uDCA9` with `.`, so length is counted appropriately.
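To make the length question concrete, here is a minimal sketch in plain Rust (standard library only, no `derivre` calls). The helper names `length_regex` and `lengths` are hypothetical and exist only for illustration. It shows how the `format!` string in the diff expands, and how the same emoji measures differently depending on whether bytes, UTF-16 code units, or Unicode scalar values are counted.

```rust
/// Builds a bounded-length pattern with the same format string as the diff.
/// In `format!`, `{{` and `}}` are literal braces, so this yields `(?s:.{min,max})`.
/// (Hypothetical helper, not part of the PR.)
fn length_regex(min: usize, max: usize) -> String {
    format!("(?s:.{{{},{}}})", min, max)
}

/// Returns (UTF-8 bytes, UTF-16 code units, Unicode scalar values) for `s`.
/// (Hypothetical helper, not part of the PR.)
fn lengths(s: &str) -> (usize, usize, usize) {
    (s.len(), s.encode_utf16().count(), s.chars().count())
}

fn main() {
    // U+1F4A9 is encoded in UTF-16 as the surrogate pair D83D DCA9.
    let (bytes, utf16_units, scalars) = lengths("\u{1F4A9}");
    println!("bytes={bytes} utf16={utf16_units} scalars={scalars}");
    println!("{}", length_regex(1, 3));
}
```

Under the behavior the PR title describes, the count that matters for `minLength`/`maxLength` is the last one, so the emoji contributes 1 rather than 2.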
The JSON-quoted derivre strings do not allow
It's all the same as far as JSON goes (when you read JSON
Added comment here microsoft/derivre@6062cef
Some general notes (from you-know-who):
UTF-8 in JSON
Surrogate Pairs in JSON
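As an illustration of the surrogate-pair point, the following sketch uses only the Rust standard library to show that the two UTF-16 code units written in JSON as `\uD83D\uDCA9` combine into the single scalar value U+1F4A9, while a lone surrogate does not decode at all. The function name `decode_units` is hypothetical, and this is not how `derivre` or the PR handles escapes.

```rust
/// Decodes a slice of UTF-16 code units into a String.
/// Returns None if the input contains an unpaired surrogate.
/// (Hypothetical helper for illustration only.)
fn decode_units(units: &[u16]) -> Option<String> {
    char::decode_utf16(units.iter().copied())
        .collect::<Result<String, _>>()
        .ok()
}

fn main() {
    // The code units behind the JSON escape \uD83D\uDCA9.
    let pair = decode_units(&[0xD83D, 0xDCA9]);
    println!("{:?}", pair.as_deref().map(|s| s.chars().count()));

    // A high surrogate with no matching low surrogate is rejected.
    println!("{:?}", decode_units(&[0xD83D]));
}
```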
Oh and BTW the test will pass if do |
Makes the following test pass: