-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add initial parser. #10
base: main
Are you sure you want to change the base?
Conversation
Initial parser for sqlair DSL. Top-down parser with precedence for disambiguation. Address most suggestions by @benhoyt from [this PR](canonical/sqlair#8) * Use `beginChildren` and `endChildren`. * Remove PERIOD and COMMA from input/output/groupedcolumns expressions. * Rename confusing variable names (`os`, `l`, etc). * Avoid `self` for receiver names. * Use `strings.Builder` for efficient printing. * Add function to pretty print the AST (needs `go test -v`). * Rework comments. * Properly format files. Note: * Not using `PassthroughExpression` yet. `IdentityExpression` seems to fit the same role. * Input and Output expressions might be collapsed into one. The structure seems identical except for the initial marker.
parse/parser_test.go
Outdated
assert.Equal(t, expected, r) | ||
} | ||
|
||
func TestParserSimpleInputSource(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One thing I'm a bit worried about here is whether we've covered all the cases, or got edge cases or termination conditions incorrect, and so on. I think we probably need a more thorough approach to testing, for example: run a large swath of valid and invalid SQL through it. Not sure where you'd get such a body: we could create our own and build on it, or maybe pull in a list from another library's tests, that sort of thing.
We could also use Go's coverage to ensure we've covered all the cases, and fuzzing to ensure we don't have unintended infinite loops or panics. Go has very good tooling for both of these, so they're easy to set up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, ideally we would have a big corpus of SQL queries with a comprehensive mix of input sources, output targets, etc.
Thanks for the updates. Looks like a good start! |
Initial parser for sqlair DSL. Top-down parser with precedence for disambiguation. Address most suggestions by @benhoyt from [this PR](canonical/sqlair#8) * Use `beginChildren` and `endChildren`. * Remove PERIOD and COMMA from input/output/groupedcolumns expressions. * Rename confusing variable names (`os`, `l`, etc). * Avoid `self` for receiver names. * Use `strings.Builder` for efficient printing. * Add function to pretty print the AST (needs `go test -v`). * Rework comments. * Properly format files. * Use pre-check for list separators. * s/infixfn/infixFn and s/prefixfn/prefixFn * Make {In/Pre}fixFunc private * Make NextToken and Precedence private. * Make Lexer private. * Add more tests. * Check for (,,,,) * Check for () * Check for (a, b... * Test for empty statement. * Test for missing PERIOD in output target. * Add makeToken helper to make tests more readable. Note: * Not using `PassthroughExpression` yet. `IdentityExpression` seems to fit the same role. * Input and Output expressions might be collapsed into one. The structure seems identical except for the initial marker.
Unbreak parser after changes in pull request canonical#10.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see what you mean about PassThroughExpression
being unused. Trying to use it makes the parsing a little more complicated with this operator-precedence approach, so we can go with what we've got here for now.
I'll do a series of follow-ups to this with some sundry things.
Nice job.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please don't merge this yet. Let's please do a review pass on it together, Fernando. There are a few details that can be improved, and that will help settling a standard for future work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here are a few early points for us to brainstorm a bit on.
) | ||
|
||
const ( | ||
LOWEST = iota |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Per standard Go conventions, Go follows a camel-case style, and we typically see some indication of what the constant is about in its name.
// This parser implements a top-down parsing strategy with operator | ||
// precedence (Pratt's parser) | ||
// https://en.wikipedia.org/wiki/Operator-precedence_parser#Pratt_parsing | ||
func NewParser(l *lexer) *Parser { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why was the Lexer made private? This particular line shows a design problem in doing so: how can we have a public function that takes a private type?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was my suggestion that we reduce the number of exported things (and the lexer is really an implementation detail). However, @niemeyer is right that we can't have an exported function taking a non-exported type. I think we should just have the parser take a []byte
or string
directly, and it can instantiate the lexer (implementation detail). This would also make usage (including tests) simpler.
Then again, now that these packages are internal
it's much less concerning to me to have a larger exported surface area.
// Fill prefix functions according to token type. | ||
// Token types that has prefixFunc do not care about the | ||
// left part of the expression. | ||
p.prefixFn = map[TokenType]prefixFunc{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
prefixFunc or prefixFn?
lex: l, | ||
} | ||
|
||
// Fill prefix functions according to token type. | ||
// Token types that has prefixFunc do not care about the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice to have some more clarity about what this means here, perhaps with some examples. Also, why are these maps instantiated as part of the parser for every instance? Do we plan to change them per-parser, or are they dynamic for some other reason?
Also, "has" => "have", but perhaps irrelevant if docs will change.
// Fill infix functions according to token type. | ||
// Tokens that has infixFunc care about the left part | ||
// of the expression. | ||
p.infixFn = map[TokenType]infixFunc{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same notes as above.
return nil | ||
} | ||
p.nextToken() | ||
field := p.parseExpression(p.precedence(p.currentToken)).(*IdentityExpression) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar concern as earlier above.
func (p *Parser) parseInputSource() Expression { | ||
marker := p.currentToken | ||
p.nextToken() | ||
name := p.parseExpression(p.precedence(p.currentToken)).(*IdentityExpression) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as earlier here and below. I won't repeat the comment, but given how widespread this is it might be worth talking about it to see if there's an easy pattern we can adopt across the code.
|
||
func (p *Parser) parseGroup() Expression { | ||
// Skip left parenthesis | ||
p.nextToken() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar issue here as earlier. There's apparently a pattern established about functions trusting external call sites to define their own internal state. This seems problematic, because it means for every function there's an implied and unspecified rule about what state the world is in so that we can call it, otherwise the parser will panic. This kind of approach does not bode well to future work on this code base, when someone else will be writing code and not know what all the expectations were. Let's please talk about this as well.
if p.currentToken.Type == RPAREN { | ||
// Empty group. | ||
p.errors = append(p.errors, fmt.Sprintf( | ||
"Line: %d, column: %d: unexpected ')', expecting LITERAL", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as earlier note on the error message. There's more below, but I'll stop repeating it. Let's please just do a full pass on the error messages.
// in this statement. | ||
for _, c := range r.Expressions() { | ||
t.Log(reflect.TypeOf(c), reflect.TypeOf(&IdentityExpression{})) | ||
assert.Equal(t, reflect.TypeOf(c), reflect.TypeOf(&IdentityExpression{})) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to organize these tests as a table so that it's much easier to see what's happening, and also much easier to repeat similar cases, and then improve the number of cases tested. Again, let's please chat about it and I'll present some analogous code as inspiration.
This adds a complete parser that parses any statement as only BypassParts.
Initial parser for sqlair DSL. Top-down parser with precedence for
disambiguation.
Address most suggestions by @benhoyt from this PR
beginChildren
andendChildren
.os
,l
, etc).self
for receiver names.strings.Builder
for efficient printing.go test -v
).Note:
PassthroughExpression
yet.IdentityExpression
seems tofit the same role.
structure seems identical except for the initial marker.