Bug: no-misleading-character-class triggers on seemingly-OK code #15080

domenic · 2021-09-18T21:58:28Z

Environment

Node version: 16.9.1
npm version: 7.21.1
Local ESLint version: 7.32.0
Global ESLint version: (none)
Operating System: Windows 10

What parser are you using?

Default (Espree)

What did you do?

Configuration

Just using eslint:recommended. This can be demoed on eslint.org

/[\u00B7\u0300-\u036F]/u.test("");

What did you expect to happen?

No error, if I am understanding https://eslint.org/docs/rules/no-misleading-character-class correctly. Those docs imply the error will trigger when I include in my source text a multi-code-point character, which I have not done; I've only used Unicode escapes.

What actually happened?

4:3   error  Unexpected combined character in character class  no-misleading-character-class

Participation

I am willing to submit a pull request for this issue.

Additional comments

For context, I am trying to reproduce https://www.w3.org/TR/REC-xml/#NT-NameStartChar as a RegExp. The #xB7 | [#x0300-#x036F] part of the grammar is what maps to my [\u00B7\u0300-\u036F] character class.

It's possible that this is a doc issue, as I'm actually doing something dangerous or unusual, and so the error should trigger? But if I am, it's not clear at all from the docs what I'm doing that's bad.

The text was updated successfully, but these errors were encountered:

mdjermanovic · 2021-09-19T03:25:40Z

Hi @domenic, thanks for the issue!

By test cases for this rule, this seems to be intended behavior:

eslint/tests/lib/rules/no-misleading-character-class.js

Lines 97 to 100 in 47be800

 { 

 code: "var r = /[\\u0041\\u0301]/u", 

 errors: [{ messageId: "combiningClass" }] 

 },

I couldn't find any discussions about this in the original issue #10049 and PR #10511, but I think that reporting this character class makes sense - [\u0041\u0301] might have been intended to match with "Á", the user is aware that the targeted combined character needs two code units, but maybe unaware that this doesn't match as a single character, unlike surrogate pairs.

In your /[\u00B7\u0300-\u036F]/u example, concern is that the intent might be to search for "\u00B7\u0300", ... "\u00B7\u036F" instead of any of "\u00B7", "\u0300", ... "\u036F".

The rule does not warn if combining characters are not preceded by other characters in same [], as it's clear that the intent is to match those characters separately:

/[\u00B7\u0300-\u036F]/u.test(""); // error

/[\u0300-\u036F\u00B7]/u.test(""); // ok

/[\u0300-\u036F]|\u00B7/u.test(""); // ok

/\u00B7|[\u0300-\u036F]/u.test(""); // ok

/[\u0300-\u036F]|[\u00B7]/u.test(""); // ok

/[\u00B7]|[\u0300-\u036F]/u.test(""); // ok

Online Demo

In my opinion, this behavior is correct, but we should certainly improve documentation for this rule.

domenic · 2021-09-19T20:15:39Z

Thanks for the research! Hmm.

It would be nice if there were a version of this rule that did what the documentation currently says, which is prevent actually-misleading cases where the result appears as a signal grapheme. That is, taking an example from the doc: I think /^[❇️]$/u is indeed misleading because it looks like one character. But /^[\u2747\uFE0F]$/u is, in my opinion, not misleading, because it's pretty clear that there are two possible code points, of which only one will be matched. (This is despite the fact that the two regexps are equivalent semantically.)

In other words, it would be good to have a mode for this rule that exempts escape sequences.

mdjermanovic · 2021-09-19T23:07:34Z

But /^[\u2747\uFE0F]$/u is, in my opinion, not misleading, because it's pretty clear that there are two possible code points, of which only one will be matched. (This is despite the fact that the two regexps are equivalent semantically.)

In other words, it would be good to have a mode for this rule that exempts escape sequences.

I'm not opposed to adding another mode, but I still think that [\u2747\uFE0F] can be misleading as the author may have thought it was a single code point because it appears as a single character, like [\uD83D\uDC4D] for example.

/ab[c\uD83D\uDC4D]/u.test("ab👍"); // `true`, this works
/ab[c\uD83D\uDC4D]/u.exec("ab👍")[0]; // 'ab👍', as expected
/ab[c\uD83D\uDC4D]/u.exec("ab👍")[0].length; // 4

/ab[c\u2747\uFE0F]/u.test("ab❇️"); // `true`, this must be working too
/ab[c\u2747\uFE0F]/u.exec("ab❇️")[0]; // 'ab❇', looks good
/ab[c\u2747\uFE0F]/u.exec("ab❇️")[0].length; // 3! the character class was misleading

@eslint/eslint-tsc thoughts about this? I think the current behavior makes sense as default behavior, but we can consider adding an option to only warn when the targeted sequences appear literally as source code characters. This seems doable for regex literals, but may be problematic for RegExp("...").

btmills · 2021-09-21T07:52:26Z

Thanks both for the thorough explanations so far. I'm not opposed to a mode that would allow explicit code point escapes. Were this an opt-in rule, I'd say such a mode should be disabled by default for maximum safety. This being included in eslint:recommended gives me slight pause because we try hard to avoid any false positives in eslint:recommended, and this could be seen as a false positive. I'm still leaning toward leaving the current behavior as the default because a mode to allow this feels like a power user feature where those who need it will find it, but I wanted to call that out to see what others think.

nzakas · 2021-09-21T21:14:28Z

leaving the current behavior as the default because a mode to allow this feels like a power user feature where those who need it will find it

💯

domenic · 2021-09-21T21:19:57Z

Thanks so much all for being willing to consider a mode. My two cents:

Personally it's hard to imagine that if someone wrote /[\u00B7\u0300-\u036F]/u they expect anything different than two code points being matched. That is, I cannot personally imagine when that code would be a bug and thus deserving of being warned about by ESLint.
However, I also agree that matching code points in this way is a power user feature. So if power users have to do an extra step to get the sensible behavior, that seems totally fine. Especially since at least some folks (per Bug: no-misleading-character-class triggers on seemingly-OK code #15080 (comment)) disagree with my assessment of the potential bugginess.

nzakas · 2021-09-29T00:50:33Z

Marking as accepted because there is agreement on adding a new option for this.

fasttime · 2024-02-05T15:53:40Z

I can work on this when #18082 is settled.

nzakas · 2024-03-06T19:57:30Z

@fasttime #18082 has been merged. Are you up for looking at this?

fasttime · 2024-03-06T22:24:52Z

@fasttime #18082 has been merged. Are you up for looking at this?

Sure! This should be doable now that we have the tooling to map characters back to source code. I'll be able to focus on this next week.

nzakas · 2024-03-07T20:01:53Z

Great, thanks!

domenic added bug ESLint is working incorrectly repro:needed labels Sep 18, 2021

eslint-github-bot bot added this to Needs Triage in Triage Sep 18, 2021

mdjermanovic moved this from Needs Triage to Feedback Needed in Triage Sep 19, 2021

nzakas added enhancement This change enhances an existing feature of ESLint repro:yes rule Relates to ESLint's core rules accepted There is consensus among the team that this change meets the criteria for inclusion and removed bug ESLint is working incorrectly repro:needed labels Sep 29, 2021

nzakas moved this from Feedback Needed to Ready to Implement in Triage Sep 29, 2021

f1sh1918 mentioned this issue Sep 21, 2022

IGAPP-1079: Ltr content rtl layouts digitalfabrik/integreat-app#762

Merged

nzakas assigned fasttime Mar 6, 2024

fasttime linked a pull request Mar 15, 2024 that will close this issue

feat: add option allowEscape to no-misleading-character-class rule #18208

Open

1 task

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Bug: no-misleading-character-class triggers on seemingly-OK code #15080

Bug: no-misleading-character-class triggers on seemingly-OK code #15080

domenic commented Sep 18, 2021 •

edited

mdjermanovic commented Sep 19, 2021

domenic commented Sep 19, 2021 •

edited

mdjermanovic commented Sep 19, 2021

btmills commented Sep 21, 2021

nzakas commented Sep 21, 2021

domenic commented Sep 21, 2021

nzakas commented Sep 29, 2021

fasttime commented Feb 5, 2024

nzakas commented Mar 6, 2024

fasttime commented Mar 6, 2024

nzakas commented Mar 7, 2024

Bug: no-misleading-character-class triggers on seemingly-OK code #15080

Bug: no-misleading-character-class triggers on seemingly-OK code #15080

Comments

domenic commented Sep 18, 2021 • edited

Environment

What parser are you using?

What did you do?

What did you expect to happen?

What actually happened?

Participation

Additional comments

mdjermanovic commented Sep 19, 2021

domenic commented Sep 19, 2021 • edited

mdjermanovic commented Sep 19, 2021

btmills commented Sep 21, 2021

nzakas commented Sep 21, 2021

domenic commented Sep 21, 2021

nzakas commented Sep 29, 2021

fasttime commented Feb 5, 2024

nzakas commented Mar 6, 2024

fasttime commented Mar 6, 2024

nzakas commented Mar 7, 2024

domenic commented Sep 18, 2021 •

edited

domenic commented Sep 19, 2021 •

edited