
State of discarding tokenizers is sometimes not saved #628

Open
rantvm opened this issue Nov 23, 2022 · 0 comments
rantvm commented Nov 23, 2022

I have observed that the parser sometimes ignores the state of tokenizers that silently discard some tokens. In particular, the state is ignored if the first input chunk(s) consist only of discarded tokens. This causes the position information of subsequent tokens to become desynchronized from the input. Below is an example of a tokenizer next() method that exhibits this behaviour.

const discard = { "whitespace": true, "comment": true };

function next() {
    let token;
    do {
        // readToken() stands in for pulling the next raw token
        // from the input buffer.
        token = readToken();
    } while (token && discard[token.type]);
    return token;
}
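
For context, here is a minimal sketch of a reproduction. It assumes a hypothetical compiled grammar module (./grammar.js) and uses a stripped-down stand-in lexer implementing nearley's documented custom-lexer interface (reset/next/save/formatError/has); a real tokenizer would of course be more involved.

const nearley = require("nearley");
// "./grammar.js" is a hypothetical compiled grammar module.
const grammar = require("./grammar.js");

const discard = { "whitespace": true };

// A toy lexer that discards whitespace tokens without returning
// them to the parser, as in the next() snippet above.
const lexer = {
    reset(chunk, info) {
        this.buffer = chunk;
        this.index = 0;
        this.line = info ? info.line : 1;
        this.col = info ? info.col : 1;
        return this;
    },
    save() {
        return { line: this.line, col: this.col };
    },
    next() {
        let token;
        do {
            token = this.readToken();
        } while (token && discard[token.type]);
        return token;
    },
    // Toy single-character tokens, for brevity.
    readToken() {
        if (this.index >= this.buffer.length) return undefined;
        const value = this.buffer[this.index++];
        const token = {
            type: /\s/.test(value) ? "whitespace" : "char",
            value: value,
            line: this.line,
            col: this.col,
        };
        this.col++; // single-line input, for brevity
        return token;
    },
    formatError(token) {
        return "at line " + token.line + " col " + token.col;
    },
    has(name) {
        return true;
    },
};

const parser = new nearley.Parser(nearley.Grammar.fromCompiled(grammar), { lexer: lexer });

// The first chunk consists only of discarded tokens, so the parser never
// receives a token and (per this issue) never calls lexer.save(). The
// second feed() therefore resets the lexer with a stale (undefined) state,
// and the "a" token reports col 1 instead of col 4.
parser.feed("   ");
parser.feed("ab");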

The cause appears to be the if-statement below, in combination with the defined behaviour of lexer.reset(chunk, info).

nearley/lib/nearley.js

Lines 356 to 358 in 6e24450

if (column) {
    this.lexerState = lexer.save()
}

This statement seems to assume that if there have been no tokens so far, there is no tokenizer state worth saving. Simply always executing this.lexerState = lexer.save() resolves the issue. There may be circumstances (of which I am unaware) where the current behaviour is required, so it may be prudent to define a parser option that causes the tokenizer state to always be stored.
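
Concretely, the suggested change would replace the guarded save shown above with an unconditional one:

// replacing the `if (column)` guard shown above
this.lexerState = lexer.save()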
