Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug: no-misleading-character-class triggers on seemingly-OK code #15080

Open
1 task
domenic opened this issue Sep 18, 2021 · 11 comments · May be fixed by #18208
Open
1 task

Bug: no-misleading-character-class triggers on seemingly-OK code #15080

domenic opened this issue Sep 18, 2021 · 11 comments · May be fixed by #18208
Assignees
Labels
accepted There is consensus among the team that this change meets the criteria for inclusion enhancement This change enhances an existing feature of ESLint repro:yes rule Relates to ESLint's core rules
Projects

Comments

@domenic
Copy link

domenic commented Sep 18, 2021

Environment

Node version: 16.9.1
npm version: 7.21.1
Local ESLint version: 7.32.0
Global ESLint version: (none)
Operating System: Windows 10

What parser are you using?

Default (Espree)

What did you do?

Configuration

Just using eslint:recommended. This can be demoed on eslint.org

/[\u00B7\u0300-\u036F]/u.test("");

What did you expect to happen?

No error, if I am understanding https://eslint.org/docs/rules/no-misleading-character-class correctly. Those docs imply the error will trigger when I include in my source text a multi-code-point character, which I have not done; I've only used Unicode escapes.

What actually happened?

4:3   error  Unexpected combined character in character class  no-misleading-character-class

Participation

  • I am willing to submit a pull request for this issue.

Additional comments

For context, I am trying to reproduce https://www.w3.org/TR/REC-xml/#NT-NameStartChar as a RegExp. The #xB7 | [#x0300-#x036F] part of the grammar is what maps to my [\u00B7\u0300-\u036F] character class.

It's possible that this is a doc issue, as I'm actually doing something dangerous or unusual, and so the error should trigger? But if I am, it's not clear at all from the docs what I'm doing that's bad.

@domenic domenic added bug ESLint is working incorrectly repro:needed labels Sep 18, 2021
@eslint-github-bot eslint-github-bot bot added this to Needs Triage in Triage Sep 18, 2021
@mdjermanovic
Copy link
Member

Hi @domenic, thanks for the issue!

By test cases for this rule, this seems to be intended behavior:

{
code: "var r = /[\\u0041\\u0301]/u",
errors: [{ messageId: "combiningClass" }]
},

I couldn't find any discussions about this in the original issue #10049 and PR #10511, but I think that reporting this character class makes sense - [\u0041\u0301] might have been intended to match with "Á", the user is aware that the targeted combined character needs two code units, but maybe unaware that this doesn't match as a single character, unlike surrogate pairs.

In your /[\u00B7\u0300-\u036F]/u example, concern is that the intent might be to search for "\u00B7\u0300", ... "\u00B7\u036F" instead of any of "\u00B7", "\u0300", ... "\u036F".

The rule does not warn if combining characters are not preceded by other characters in same [], as it's clear that the intent is to match those characters separately:

/[\u00B7\u0300-\u036F]/u.test(""); // error

/[\u0300-\u036F\u00B7]/u.test(""); // ok

/[\u0300-\u036F]|\u00B7/u.test(""); // ok

/\u00B7|[\u0300-\u036F]/u.test(""); // ok

/[\u0300-\u036F]|[\u00B7]/u.test(""); // ok

/[\u00B7]|[\u0300-\u036F]/u.test(""); // ok

Online Demo

In my opinion, this behavior is correct, but we should certainly improve documentation for this rule.

@mdjermanovic mdjermanovic moved this from Needs Triage to Feedback Needed in Triage Sep 19, 2021
@domenic
Copy link
Author

domenic commented Sep 19, 2021

Thanks for the research! Hmm.

It would be nice if there were a version of this rule that did what the documentation currently says, which is prevent actually-misleading cases where the result appears as a signal grapheme. That is, taking an example from the doc: I think /^[❇️]$/u is indeed misleading because it looks like one character. But /^[\u2747\uFE0F]$/u is, in my opinion, not misleading, because it's pretty clear that there are two possible code points, of which only one will be matched. (This is despite the fact that the two regexps are equivalent semantically.)

In other words, it would be good to have a mode for this rule that exempts escape sequences.

@mdjermanovic
Copy link
Member

But /^[\u2747\uFE0F]$/u is, in my opinion, not misleading, because it's pretty clear that there are two possible code points, of which only one will be matched. (This is despite the fact that the two regexps are equivalent semantically.)

In other words, it would be good to have a mode for this rule that exempts escape sequences.

I'm not opposed to adding another mode, but I still think that [\u2747\uFE0F] can be misleading as the author may have thought it was a single code point because it appears as a single character, like [\uD83D\uDC4D] for example.

/ab[c\uD83D\uDC4D]/u.test("ab👍"); // `true`, this works
/ab[c\uD83D\uDC4D]/u.exec("ab👍")[0]; // 'ab👍', as expected
/ab[c\uD83D\uDC4D]/u.exec("ab👍")[0].length; // 4

/ab[c\u2747\uFE0F]/u.test("ab❇️"); // `true`, this must be working too
/ab[c\u2747\uFE0F]/u.exec("ab❇️")[0]; // 'ab❇', looks good
/ab[c\u2747\uFE0F]/u.exec("ab❇️")[0].length; // 3! the character class was misleading 

@eslint/eslint-tsc thoughts about this? I think the current behavior makes sense as default behavior, but we can consider adding an option to only warn when the targeted sequences appear literally as source code characters. This seems doable for regex literals, but may be problematic for RegExp("...").

@btmills
Copy link
Member

btmills commented Sep 21, 2021

Thanks both for the thorough explanations so far. I'm not opposed to a mode that would allow explicit code point escapes. Were this an opt-in rule, I'd say such a mode should be disabled by default for maximum safety. This being included in eslint:recommended gives me slight pause because we try hard to avoid any false positives in eslint:recommended, and this could be seen as a false positive. I'm still leaning toward leaving the current behavior as the default because a mode to allow this feels like a power user feature where those who need it will find it, but I wanted to call that out to see what others think.

@nzakas
Copy link
Member

nzakas commented Sep 21, 2021

leaving the current behavior as the default because a mode to allow this feels like a power user feature where those who need it will find it

💯

@domenic
Copy link
Author

domenic commented Sep 21, 2021

Thanks so much all for being willing to consider a mode. My two cents:

  • Personally it's hard to imagine that if someone wrote /[\u00B7\u0300-\u036F]/u they expect anything different than two code points being matched. That is, I cannot personally imagine when that code would be a bug and thus deserving of being warned about by ESLint.

  • However, I also agree that matching code points in this way is a power user feature. So if power users have to do an extra step to get the sensible behavior, that seems totally fine. Especially since at least some folks (per Bug: no-misleading-character-class triggers on seemingly-OK code #15080 (comment)) disagree with my assessment of the potential bugginess.

@nzakas
Copy link
Member

nzakas commented Sep 29, 2021

Marking as accepted because there is agreement on adding a new option for this.

@nzakas nzakas added enhancement This change enhances an existing feature of ESLint repro:yes rule Relates to ESLint's core rules accepted There is consensus among the team that this change meets the criteria for inclusion and removed bug ESLint is working incorrectly repro:needed labels Sep 29, 2021
@nzakas nzakas moved this from Feedback Needed to Ready to Implement in Triage Sep 29, 2021
@fasttime
Copy link
Member

fasttime commented Feb 5, 2024

I can work on this when #18082 is settled.

@nzakas
Copy link
Member

nzakas commented Mar 6, 2024

@fasttime #18082 has been merged. Are you up for looking at this?

@fasttime
Copy link
Member

fasttime commented Mar 6, 2024

@fasttime #18082 has been merged. Are you up for looking at this?

Sure! This should be doable now that we have the tooling to map characters back to source code. I'll be able to focus on this next week.

@nzakas
Copy link
Member

nzakas commented Mar 7, 2024

Great, thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
accepted There is consensus among the team that this change meets the criteria for inclusion enhancement This change enhances an existing feature of ESLint repro:yes rule Relates to ESLint's core rules
Projects
Status: Ready to Implement
Triage
Ready to Implement
Development

Successfully merging a pull request may close this issue.

5 participants