For a greedy lexer with the lexemes `/aaa/`, `/a+b+z/`, and `/b+x/`, we will not be able to sample the string `aaabbx`, even though it is in some sense "valid".
We start with all three lexemes as possible; after the first `a` we reject `/b+x/`, after the first `b` we also reject `/aaa/`, and at the `x` the remaining `/a+b+z/` fails as well.
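To make the failure concrete, here is a minimal sketch of such a single-pass greedy scan. It is illustrative only, not the actual implementation: the function names are made up, and it uses the third-party `regex` module for its partial-match support.

```python
import regex  # third-party `regex` module; `partial=True` lets us ask
              # whether a prefix could still grow into a full match

LEXEMES = [r"aaa", r"a+b+z", r"b+x"]

def viable(pat: str, prefix: str) -> bool:
    """True if `prefix` is a prefix of some string matching `pat`."""
    return regex.fullmatch(pat, prefix, partial=True) is not None

def greedy_scan(text: str) -> None:
    """Extend the current lexeme while any pattern stays viable;
    no backtracking to an earlier complete match."""
    prefix = ""
    for ch in text:
        prefix += ch
        alive = [p for p in LEXEMES if viable(p, prefix)]
        print(f"{prefix!r}: viable = {alive}")
        if not alive:
            raise ValueError(f"lexer stuck at {prefix!r}")

greedy_scan("aaabbx")
# 'a'      -> ['aaa', 'a+b+z']   /b+x/ rejected
# 'aa'     -> ['aaa', 'a+b+z']
# 'aaa'    -> ['aaa', 'a+b+z']   /aaa/ matches fully, but greed keeps going
# 'aaab'   -> ['a+b+z']          /aaa/ rejected
# 'aaabb'  -> ['a+b+z']
# 'aaabbx' -> []                 stuck, though "aaa" + "bbx" would be valid
```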
We require the lexer to be able to determine lexeme boundaries based on a single byte of lookahead.
Now, it seems this wouldn't be a problem for a typical programming-language lexer with `/if/`, `/while/`, etc., and an identifier rule `/[a-z]+/`, so it's probably OK.
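As a quick sanity check of the keyword/identifier case, here is a toy scan using the same partial-match trick; the rule names and the keyword-over-identifier priority are assumptions for illustration, not taken from any real grammar:

```python
import regex  # third-party module with partial-match support

def longest_token(text: str):
    """One-byte-lookahead scan with rules /if/ and /[a-z]+/: extend while
    the next byte keeps some rule viable, then prefer the keyword."""
    rules = [("IF", r"if"), ("IDENT", r"[a-z]+")]  # order = priority
    i = 0
    while i < len(text) and any(
        regex.fullmatch(p, text[: i + 1], partial=True) for _, p in rules
    ):
        i += 1
    lexeme = text[:i]
    for name, p in rules:
        if regex.fullmatch(p, lexeme):
            return name, lexeme
    raise ValueError(f"no lexeme matches {lexeme!r}")

print(longest_token("if("))   # ('IF', 'if')      -- '(' ends the lexeme
print(longest_token("ifx "))  # ('IDENT', 'ifx')  -- 'x' extends /[a-z]+/
```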
Also, for a lexeme like `/"[^"]+"/` it doesn't matter whether it's greedy or lazy, since it is "self-limiting"; this probably applies to lots of lexers.
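For instance, with Python's stdlib `re` the greedy and lazy variants of that string lexeme match exactly the same text, because the character class cannot cross the closing quote:

```python
import re

s = '"first" and "second"'
greedy = re.match(r'"[^"]+"', s).group()
lazy = re.match(r'"[^"]+?"', s).group()
assert greedy == lazy == '"first"'  # [^"] cannot pass the closing quote
```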
The issue is to document all this, and possibly (though unlikely) to change the behaviour.
@v-jkegler says that in the Marpa parser he keeps the most recent lexeme match; thus, when we get to the `x`, which would normally fail, we can fall back to the cached match (`aaa` here), scan it, and then re-run the lexer on `bbx`. If there were no `/b+x/` lexeme, this would be an error (correctly placed at the `x`, since if the `x` were a `z` instead we would be OK); otherwise we're good.
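Here is a sketch of that fallback strategy under the same toy assumptions as above; error placement is simplified relative to the description:

```python
import regex

LEXEMES = [r"aaa", r"a+b+z", r"b+x"]

def viable(pat: str, prefix: str) -> bool:
    """True if `prefix` is a prefix of some string matching `pat`."""
    return regex.fullmatch(pat, prefix, partial=True) is not None

def scan_with_fallback(text: str):
    """Greedy scan that caches the most recent complete match; on a dead
    end, emit the cached lexeme and re-run the lexer on the remainder."""
    tokens, start = [], 0
    while start < len(text):
        last = None  # (pattern, end) of the most recent complete match
        i = start
        while i < len(text):
            prefix = text[start : i + 1]
            done = [p for p in LEXEMES if regex.fullmatch(p, prefix)]
            if done:
                last = (done[0], i + 1)
            if not any(viable(p, prefix) for p in LEXEMES):
                break
            i += 1
        if last is None:
            raise ValueError(f"no lexeme matches at byte {start}")
        pat, end = last
        tokens.append((pat, text[start:end]))
        start = end
    return tokens

print(scan_with_fallback("aaabbx"))  # [('aaa', 'aaa'), ('b+x', 'bbx')]
```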
Thank you, @v-jkegler, for finding the issue.
cc @hudson-ai