For a greedy lexer with the lexemes `/aaa/`, `/a+b+z/`, and `/b+x/`, we will not be able to sample the string `aaabbx`, even though it is in some sense "valid".
We start with all three lexemes as possible; after the first `a` we reject `/b+x/`, after the first `b` we also reject `/aaa/`, and at the `x` the remaining `/a+b+z/` fails as well.
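To make the failure concrete, here is a minimal sketch of such a single-pass greedy scan. It is illustrative only, not the actual implementation: the function names are made up, and it uses the third-party `regex` module for its partial-match support.

```python
import regex  # third-party `regex` module; `partial=True` lets us ask
              # whether a prefix could still grow into a full match

LEXEMES = [r"aaa", r"a+b+z", r"b+x"]

def viable(pat: str, prefix: str) -> bool:
    """True if `prefix` is a prefix of some string matching `pat`."""
    return regex.fullmatch(pat, prefix, partial=True) is not None

def greedy_scan(text: str) -> None:
    """Extend the current lexeme while any pattern stays viable;
    no backtracking to an earlier complete match."""
    prefix = ""
    for ch in text:
        prefix += ch
        alive = [p for p in LEXEMES if viable(p, prefix)]
        print(f"{prefix!r}: viable = {alive}")
        if not alive:
            raise ValueError(f"lexer stuck at {prefix!r}")

greedy_scan("aaabbx")
# 'a'      -> ['aaa', 'a+b+z']   /b+x/ rejected
# 'aa'     -> ['aaa', 'a+b+z']
# 'aaa'    -> ['aaa', 'a+b+z']   /aaa/ matches fully, but greed keeps going
# 'aaab'   -> ['a+b+z']          /aaa/ rejected
# 'aaabb'  -> ['a+b+z']
# 'aaabbx' -> []                 stuck, though "aaa" + "bbx" would be valid
```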
We require the lexer to be able to determine lexeme boundaries based on a single byte of lookahead.
Now, it seems this wouldn't be a problem for a typical programming-language lexer with `/if/`, `/while/`, etc., and an identifier rule `/[a-z]+/`, so it's probably OK.
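As a quick sanity check of the keyword/identifier case, here is a toy scan using the same partial-match trick; the rule names and the keyword-over-identifier priority are assumptions for illustration, not taken from any real grammar:

```python
import regex  # third-party module with partial-match support

def longest_token(text: str):
    """One-byte-lookahead scan with rules /if/ and /[a-z]+/: extend while
    the next byte keeps some rule viable, then prefer the keyword."""
    rules = [("IF", r"if"), ("IDENT", r"[a-z]+")]  # order = priority
    i = 0
    while i < len(text) and any(
        regex.fullmatch(p, text[: i + 1], partial=True) for _, p in rules
    ):
        i += 1
    lexeme = text[:i]
    for name, p in rules:
        if regex.fullmatch(p, lexeme):
            return name, lexeme
    raise ValueError(f"no lexeme matches {lexeme!r}")

print(longest_token("if("))   # ('IF', 'if')      -- '(' ends the lexeme
print(longest_token("ifx "))  # ('IDENT', 'ifx')  -- 'x' extends /[a-z]+/
```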
Also, for a lexeme like `/"[^"]+"/` it doesn't matter whether it's greedy or lazy, since it is "self-limiting"; this probably applies to lots of lexers.
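For instance, with Python's stdlib `re` the greedy and lazy variants of that string lexeme match exactly the same text, because the character class cannot cross the closing quote:

```python
import re

s = '"first" and "second"'
greedy = re.match(r'"[^"]+"', s).group()
lazy = re.match(r'"[^"]+?"', s).group()
assert greedy == lazy == '"first"'  # [^"] cannot pass the closing quote
```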
The issue is to document all this, and possibly (though unlikely) to change the behaviour.
@v-jkegler says that in the Marpa parser he keeps the most recent lexeme match; thus, when we get to the `x`, which would normally fail, we can fall back to the cached match (`aaa` here), scan it, and then re-run the lexer on `bbx`. If there were no `/b+x/` lexeme, this would be an error (correctly placed at the `x`, since if the `x` were a `z` instead we would be OK); otherwise we're good.
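Here is a sketch of that fallback strategy under the same toy assumptions as above; error placement is simplified relative to the description:

```python
import regex

LEXEMES = [r"aaa", r"a+b+z", r"b+x"]

def viable(pat: str, prefix: str) -> bool:
    """True if `prefix` is a prefix of some string matching `pat`."""
    return regex.fullmatch(pat, prefix, partial=True) is not None

def scan_with_fallback(text: str):
    """Greedy scan that caches the most recent complete match; on a dead
    end, emit the cached lexeme and re-run the lexer on the remainder."""
    tokens, start = [], 0
    while start < len(text):
        last = None  # (pattern, end) of the most recent complete match
        i = start
        while i < len(text):
            prefix = text[start : i + 1]
            done = [p for p in LEXEMES if regex.fullmatch(p, prefix)]
            if done:
                last = (done[0], i + 1)
            if not any(viable(p, prefix) for p in LEXEMES):
                break
            i += 1
        if last is None:
            raise ValueError(f"no lexeme matches at byte {start}")
        pat, end = last
        tokens.append((pat, text[start:end]))
        start = end
    return tokens

print(scan_with_fallback("aaabbx"))  # [('aaa', 'aaa'), ('b+x', 'bbx')]
```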
Thank you, @v-jkegler, for finding the issue.
cc @hudson-ai