New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bug: no-misleading-character-class triggers on seemingly-OK code #15080
Comments
Hi @domenic, thanks for the issue! By test cases for this rule, this seems to be intended behavior: eslint/tests/lib/rules/no-misleading-character-class.js Lines 97 to 100 in 47be800
I couldn't find any discussions about this in the original issue #10049 and PR #10511, but I think that reporting this character class makes sense - In your The rule does not warn if combining characters are not preceded by other characters in same /[\u00B7\u0300-\u036F]/u.test(""); // error
/[\u0300-\u036F\u00B7]/u.test(""); // ok
/[\u0300-\u036F]|\u00B7/u.test(""); // ok
/\u00B7|[\u0300-\u036F]/u.test(""); // ok
/[\u0300-\u036F]|[\u00B7]/u.test(""); // ok
/[\u00B7]|[\u0300-\u036F]/u.test(""); // ok In my opinion, this behavior is correct, but we should certainly improve documentation for this rule. |
Thanks for the research! Hmm. It would be nice if there were a version of this rule that did what the documentation currently says, which is prevent actually-misleading cases where the result appears as a signal grapheme. That is, taking an example from the doc: I think In other words, it would be good to have a mode for this rule that exempts escape sequences. |
I'm not opposed to adding another mode, but I still think that /ab[c\uD83D\uDC4D]/u.test("ab👍"); // `true`, this works
/ab[c\uD83D\uDC4D]/u.exec("ab👍")[0]; // 'ab👍', as expected
/ab[c\uD83D\uDC4D]/u.exec("ab👍")[0].length; // 4
/ab[c\u2747\uFE0F]/u.test("ab❇️"); // `true`, this must be working too
/ab[c\u2747\uFE0F]/u.exec("ab❇️")[0]; // 'ab❇', looks good
/ab[c\u2747\uFE0F]/u.exec("ab❇️")[0].length; // 3! the character class was misleading @eslint/eslint-tsc thoughts about this? I think the current behavior makes sense as default behavior, but we can consider adding an option to only warn when the targeted sequences appear literally as source code characters. This seems doable for regex literals, but may be problematic for |
Thanks both for the thorough explanations so far. I'm not opposed to a mode that would allow explicit code point escapes. Were this an opt-in rule, I'd say such a mode should be disabled by default for maximum safety. This being included in |
💯 |
Thanks so much all for being willing to consider a mode. My two cents:
|
Marking as accepted because there is agreement on adding a new option for this. |
I can work on this when #18082 is settled. |
Great, thanks! |
Environment
Node version: 16.9.1
npm version: 7.21.1
Local ESLint version: 7.32.0
Global ESLint version: (none)
Operating System: Windows 10
What parser are you using?
Default (Espree)
What did you do?
Configuration
Just using eslint:recommended. This can be demoed on eslint.org
What did you expect to happen?
No error, if I am understanding https://eslint.org/docs/rules/no-misleading-character-class correctly. Those docs imply the error will trigger when I include in my source text a multi-code-point character, which I have not done; I've only used Unicode escapes.
What actually happened?
Participation
Additional comments
For context, I am trying to reproduce https://www.w3.org/TR/REC-xml/#NT-NameStartChar as a RegExp. The
#xB7 | [#x0300-#x036F]
part of the grammar is what maps to my[\u00B7\u0300-\u036F]
character class.It's possible that this is a doc issue, as I'm actually doing something dangerous or unusual, and so the error should trigger? But if I am, it's not clear at all from the docs what I'm doing that's bad.
The text was updated successfully, but these errors were encountered: