fix parsing of 'with' inside streaming operator #693 #697
Conversation
The stream expression has an expression followed by an optional 'with' '[' array_range_expression ']' (LRM 11.4.14).

There are a few other cases where an expression can be trailed by 'with':
* with_constraint_block: there we expect an identifier list in parentheses after 'with'
* similarly, array methods with a predicate: with '(' ...)

Since the parser with one token of look-ahead can't see beyond the 'with', it runs into a shift/reduce conflict, as it does not know whether '(' or '[' follows.

Disambiguate that in the lexer by creating a separate token TK_with__followed_by_bracket, which is a TK_with where the lexer already looked ahead and knows that it will be followed by '['. After seeing the '[', everything but the "with" is put back into the stream to be processed separately.

TODO:
[ ] Extend what can be between 'with' and '[' to include arbitrary whitespace (including comments)
[ ] Make the formatter work; currently {<< {unpacked_arr with [i*4 +: 4]}}; is formatted as {<<{unpacked_arrwith[i*4+:4]}};

Signed-off-by: Henner Zeller <[email protected]>
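As a rough illustration of the lookahead-and-putback idea described above, here is a minimal, self-contained flex sketch. It assumes placeholder token ids and a stripped-down rule set; the real token definitions come from the generated parser, and the real lexer has many more rules.

%option noyywrap

%{
#include <stdio.h>
/* Placeholder token ids for this sketch only. */
enum { TK_with = 300, TK_with__followed_by_bracket = 301 };
%}

%%

 /* 'with' followed (possibly after spaces/tabs) by '[': emit the specialized
    token.  yyless(4) keeps only the four characters of "with" and pushes
    everything after them back into the input, so the whitespace and the '['
    are lexed again as ordinary tokens afterwards. */
"with"[ \t]*"["          { yyless(4); return TK_with__followed_by_bracket; }

 /* Any other 'with' stays the ordinary keyword token. */
"with"                   { return TK_with; }

 /* Other identifiers and everything else are ignored in this sketch. */
[a-zA-Z_][a-zA-Z0-9_$]*  { }
.|\n                     { }

%%

int main(void) {
  int t;
  while ((t = yylex()) != 0) {
    printf("token %d: '%s'\n", t, yytext);
  }
  return 0;
}

The rule actually touched in this PR (see the lexer excerpt further down) generalizes the [ \t]* part so that comments may also appear between 'with' and '['.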
Implemented the suggestion. Next step is to see if it is easy to implement in flex to have everything between 'with' and '[' also cover comments. Also, the formatter needs to be able to deal with this token sequence, it seems ... right now it removes all whitespace.
Signed-off-by: Henner Zeller <[email protected]>
Upcoming: Fix formatting. Right now, for

module test;
  initial begin
    logic unpacked_arr [16] = '{0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1};
    logic [3:0] nibble_arr[4];
    for (int i = 0; i < 4; i++) begin
      nibble_arr[i] = {<< {unpacked_arr with [i*4 +: 4]}};
    end
  end
endmodule

... the token partition tree looks like:
With that, the formatting is not adding any spaces in the relevant line, resulting in the following faulty formatting:

module test;
  initial begin
    logic unpacked_arr[16] = '{0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1};
    logic [3:0] nibble_arr[4];
    for (int i = 0; i < 4; i++) begin
      nibble_arr[i] = {<<{unpacked_arrwith[i*4+:4]}};
    end
  end
endmodule
"with"[ \t]*"[" { | ||
/* TODO: formulate the ws sequence as any whitespace+comment sequence*/ | ||
yyless(4); | ||
"with"({TraditionalCommentOrSpace}|{EndOfLineComment})*"[" { |
This might be easier with an <AFTER_WITH> state, where 'with' returns a TK_with after pushing a new state. It will be easier to re-use rules inside states than to glue them together like this in monolithic regular expressions. Inside this new state, you handle whitespace, comments, and '[', and for everything else you can pop the state and rewind, to defer to the default <INITIAL> state handling (or whatever was previously on the state stack).
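A minimal flex sketch of one way to read that suggestion follows. It is only an illustration: the token ids are placeholders, the comment patterns are simplified, and the choice to defer returning the 'with' token until the following token is known is an assumption of the sketch, not necessarily what was meant here or what verible actually does.

%option noyywrap stack

%{
#include <stdio.h>
/* Placeholder token ids for this sketch only. */
enum { TK_with = 300, TK_with__followed_by_bracket = 301 };
%}

%x AFTER_WITH

%%

 /* On 'with', push a state and decide which token to return once we know what follows. */
"with"                   { yy_push_state(AFTER_WITH); }

<AFTER_WITH>{
  [ \t\r\n]+                    { /* whitespace between 'with' and what follows */ }
  "//".*                        { /* end-of-line comment */ }
  "/*"([^*]|"*"+[^*/])*"*"+"/"  { /* block comment */ }
  "["                           {
                                  /* '[' follows: emit the specialized token and put the
                                     '[' back so it is lexed as an ordinary token. */
                                  yy_pop_state();
                                  yyless(0);
                                  return TK_with__followed_by_bracket;
                                }
  .|\n                          {
                                  /* anything else: plain 'with'; rewind and defer to
                                     whatever state was active before. */
                                  yy_pop_state();
                                  yyless(0);
                                  return TK_with;
                                }
}

[a-zA-Z_][a-zA-Z0-9_$]*  { /* other identifiers: ignored in this sketch */ }
.|\n                     { /* everything else: ignored in this sketch */ }

%%

int main(void) {
  int t;
  while ((t = yylex()) != 0) printf("token %d\n", t);
  return 0;
}

Note that this sketch simply swallows the whitespace and comments between 'with' and '['; in the real lexer those would presumably have to be preserved as tokens of their own for the formatter.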
You will also want lexer unit-tests for this. :)
Ok, will need to familiarize myself with how states work. They did look somewhat complicated.
The beauty of this is that I don't have to handle anything special, including '['; I just look ahead, find that '[' is the next token, and conclude that the TK_with needs to become the new token.
(lexer unit test added)
I looked at the AFTER_DOT example, which is somewhat involved. I have to play with it to see if it is more advantageous.
Looking over this again, I still stand by this first comment. Generally, the options are:
- track state within the lexer to alter the enumeration of the latter token
- track state inside the lexical context class to alter the enumeration of the earlier token (effectively achieving lookahead) -- this approach is pretty localized: you're defining a tiny state-machine that can run alongside other independent state machines.
"{>>8 {foo, bar with \t [ a +: b]}} = nums;\n" | ||
"endfunction", | ||
"function void unpack_id_with(int nums);\n" | ||
"{>>8 {foo, bar with /* some comment */ [ a +: b]}} = nums;\n" |
More test case ideas:
- multiple comments in between 'with' and '['.
Done.
When the tokens mistakenly fused, did the formatter catch it as data corruption and bail out? (If not, that itself is a bug.) This is controlled by inter-token spacing (token_annotator). There is a general rule that keywords and identifiers require a minimum of 1 space of separation. You will likely need a classification entry to make the special token spaced correctly.
There was a special case for streaming operators that reduced spaces around the various components. However, for 'with', we need to special-special-case that to actually keep a space around it.

Signed-off-by: Henner Zeller <[email protected]>
Signed-off-by: Henner Zeller <[email protected]>
Yes, the formatter was bailing out correctly, finding that it created a syntax error after reparsing. The issue in this particular case was that there was a special case in the annotator that removed spaces around streaming-operator components. But we do have to keep them around the 'with' keyword, so that needed to be special-special-cased. Anyway, I think this is ready to review. It still uses the lexer without the extra state machine though; I will have to investigate if that would feel better.
@tgorochowik I handed the review baton to you, as David is focusing on other things currently.
To be honest, I agree with David - this solution looks pretty hacky and I think it would be best to avoid doing it like this.
However, it does solve the issue, so I guess it's okay to merge it and revisit at a later stage. So let me approve this in the hope it won't start the "broken window theory" thing.
Sorry, it's been a while since I last looked at this. I should have prefaced my initial comment with: if it can be encoded using standard techniques for expressing precedence and associativity in LR grammars, do that first. Because ...
Ok, will not submit now and revisit when there is a bit more time.
This branch got lost, so continuing in #1504.
The stream expression has an expression followed by
an optional 'with' '[' array_range_expression ']'
(LRM 11.4.14)
There are a few other cases where an expression can
be trailed by 'with':
* with_constraint_block: there we expect an
  identifier list in parentheses after 'with'
* similarly, array methods with a predicate:
  with '(' ...)
Since the parser with one look-ahead can't see beyond
the 'with', it runs into a shift/reduce conflict
as it does not know if '(' or '[' is to follow.
Disambiguate that in the lexer by creating a
separate token TK_with__followed_by_bracket which is
a TK_with, where the lexer already looked ahead and
knows that it will be followed by '['.
After seeing the '[', everything but the "with" is put
back into the stream to be processed separately.
Changed the lexer and parser, as well as associated unit tests, including the formatter.
Fixing #693
Signed-off-by: Henner Zeller [email protected]