Add initial parser. #10

fernape · 2022-07-08T07:34:28Z

Initial parser for sqlair DSL. Top-down parser with precedence for
disambiguation.

Address most suggestions by @benhoyt from this PR

Use beginChildren and endChildren.
Remove PERIOD and COMMA from input/output/groupedcolumns expressions.
Rename confusing variable names (os, l, etc).
Avoid self for receiver names.
Use strings.Builder for efficient printing.
Add function to pretty print the AST (needs go test -v).
Rework comments.
Properly format files.

Note:

Not using PassthroughExpression yet. IdentityExpression seems to
fit the same role.
Input and Output expressions might be collapsed into one. The
structure seems identical except for the initial marker.

@benhoyt

Initial parser for sqlair DSL. Top-down parser with precedence for disambiguation. Address most suggestions by @benhoyt from [this PR](canonical/sqlair#8) * Use `beginChildren` and `endChildren`. * Remove PERIOD and COMMA from input/output/groupedcolumns expressions. * Rename confusing variable names (`os`, `l`, etc). * Avoid `self` for receiver names. * Use `strings.Builder` for efficient printing. * Add function to pretty print the AST (needs `go test -v`). * Rework comments. * Properly format files. Note: * Not using `PassthroughExpression` yet. `IdentityExpression` seems to fit the same role. * Input and Output expressions might be collapsed into one. The structure seems identical except for the initial marker.

parse/ast.go

parse/parser.go

parse/parser_test.go

benhoyt · 2022-07-10T23:53:55Z

parse/parser_test.go

+	assert.Equal(t, expected, r)
+}
+
+func TestParserSimpleInputSource(t *testing.T) {


One thing I'm a bit worried about here is whether we've covered all the cases, or got edge cases or termination conditions incorrect, and so on. I think we probably need a more thorough approach to testing, for example: run a large swath of valid and invalid SQL through it. Not sure where you'd get such a body: we could create our own and build on it, or maybe pull in a list from another library's tests, that sort of thing.

We could also use Go's coverage to ensure we've covered all the cases, and fuzzing to ensure we don't have unintended infinite loops or panics. Go has very good tooling for both of these, so they're easy to set up.

Yes, ideally we would have a big corpus of SQL queries with a comprehensive mix of input sources, output targets, etc.

benhoyt · 2022-07-10T23:54:47Z

Thanks for the updates. Looks like a good start!

@benhoyt

Initial parser for sqlair DSL. Top-down parser with precedence for disambiguation. Address most suggestions by @benhoyt from [this PR](canonical/sqlair#8) * Use `beginChildren` and `endChildren`. * Remove PERIOD and COMMA from input/output/groupedcolumns expressions. * Rename confusing variable names (`os`, `l`, etc). * Avoid `self` for receiver names. * Use `strings.Builder` for efficient printing. * Add function to pretty print the AST (needs `go test -v`). * Rework comments. * Properly format files. * Use pre-check for list separators. * s/infixfn/infixFn and s/prefixfn/prefixFn * Make {In/Pre}fixFunc private * Make NextToken and Precedence private. * Make Lexer private. * Add more tests. * Check for (,,,,) * Check for () * Check for (a, b... * Test for empty statement. * Test for missing PERIOD in output target. * Add makeToken helper to make tests more readable. Note: * Not using `PassthroughExpression` yet. `IdentityExpression` seems to fit the same role. * Input and Output expressions might be collapsed into one. The structure seems identical except for the initial marker.

Unbreak parser after changes in pull request canonical#10.

manadart

I see what you mean about PassThroughExpression being unused. Trying to use it makes the parsing a little more complicated with this operator-precedence approach, so we can go with what we've got here for now.

I'll do a series of follow-ups to this with some sundry things.

Nice job.

niemeyer

Please don't merge this yet. Let's please do a review pass on it together, Fernando. There are a few details that can be improved, and that will help settling a standard for future work.

niemeyer

Here are a few early points for us to brainstorm a bit on.

niemeyer · 2022-07-25T10:10:17Z

internal/parse/parser.go

+)
+
+const (
+	LOWEST = iota


Per standard Go conventions, Go follows a camel-case style, and we typically see some indication of what the constant is about in its name.

niemeyer · 2022-07-25T10:11:12Z

internal/parse/parser.go

+// This parser implements a top-down parsing strategy with operator
+// precedence (Pratt's parser)
+// https://en.wikipedia.org/wiki/Operator-precedence_parser#Pratt_parsing
+func NewParser(l *lexer) *Parser {


Why was the Lexer made private? This particular line shows a design problem in doing so: how can we have a public function that takes a private type?

It was my suggestion that we reduce the number of exported things (and the lexer is really an implementation detail). However, @niemeyer is right that we can't have an exported function taking a non-exported type. I think we should just have the parser take a []byte or string directly, and it can instantiate the lexer (implementation detail). This would also make usage (including tests) simpler.

Then again, now that these packages are internal it's much less concerning to me to have a larger exported surface area.

niemeyer · 2022-07-25T10:13:17Z

internal/parse/parser.go

+	// Fill prefix functions according to token type.
+	// Token types that has prefixFunc do not care about the
+	// left part of the expression.
+	p.prefixFn = map[TokenType]prefixFunc{


prefixFunc or prefixFn?

niemeyer · 2022-07-25T10:14:45Z

internal/parse/parser.go

 		lex: l,
 	}
+
+	// Fill prefix functions according to token type.
+	// Token types that has prefixFunc do not care about the


It would be nice to have some more clarity about what this means here, perhaps with some examples. Also, why are these maps instantiated as part of the parser for every instance? Do we plan to change them per-parser, or are they dynamic for some other reason?

Also, "has" => "have", but perhaps irrelevant if docs will change.

niemeyer · 2022-07-25T10:15:10Z

internal/parse/parser.go

+	// Fill infix functions according to token type.
+	// Tokens that has infixFunc care about the left part
+	// of the expression.
+	p.infixFn = map[TokenType]infixFunc{


Same notes as above.

niemeyer · 2022-07-25T10:53:13Z

internal/parse/parser.go

+		return nil
+	}
+	p.nextToken()
+	field := p.parseExpression(p.precedence(p.currentToken)).(*IdentityExpression)


Similar concern as earlier above.

niemeyer · 2022-07-25T10:54:19Z

internal/parse/parser.go

+func (p *Parser) parseInputSource() Expression {
+	marker := p.currentToken
+	p.nextToken()
+	name := p.parseExpression(p.precedence(p.currentToken)).(*IdentityExpression)


Same as earlier here and below. I won't repeat the comment, but given how widespread this is it might be worth talking about it to see if there's an easy pattern we can adopt across the code.

niemeyer · 2022-07-25T10:57:41Z

internal/parse/parser.go

+
+func (p *Parser) parseGroup() Expression {
+	// Skip left parenthesis
+	p.nextToken()


Similar issue here as earlier. There's apparently a pattern established about functions trusting external call sites to define their own internal state. This seems problematic, because it means for every function there's an implied and unspecified rule about what state the world is in so that we can call it, otherwise the parser will panic. This kind of approach does not bode well to future work on this code base, when someone else will be writing code and not know what all the expectations were. Let's please talk about this as well.

niemeyer · 2022-07-25T10:58:13Z

internal/parse/parser.go

+	if p.currentToken.Type == RPAREN {
+		// Empty group.
+		p.errors = append(p.errors, fmt.Sprintf(
+			"Line: %d, column: %d: unexpected ')', expecting LITERAL",


Same as earlier note on the error message. There's more below, but I'll stop repeating it. Let's please just do a full pass on the error messages.

niemeyer · 2022-07-25T11:00:42Z

internal/parse/parser_test.go

+	// in this statement.
+	for _, c := range r.Expressions() {
+		t.Log(reflect.TypeOf(c), reflect.TypeOf(&IdentityExpression{}))
+		assert.Equal(t, reflect.TypeOf(c), reflect.TypeOf(&IdentityExpression{}))


We need to organize these tests as a table so that it's much easier to see what's happening, and also much easier to repeat similar cases, and then improve the number of cases tested. Again, let's please chat about it and I'll present some analogous code as inspiration.

This adds a complete parser that parses any statement as only BypassParts.

fernape mentioned this pull request Jul 8, 2022

Define expressions and methods #8

Closed

benhoyt reviewed Jul 10, 2022

View reviewed changes

fernape added 5 commits July 12, 2022 09:24

Merge remote-tracking branch 'upstream/main' into add-parser

cc3eb01

Merge remote-tracking branch 'upstream/main' into add-parser

167c6e1

Unbreak after PR#10

daed5bf

Unbreak parser after changes in pull request canonical#10.

Remove spurious log

cab3c88

manadart approved these changes Jul 25, 2022

View reviewed changes

niemeyer requested changes Jul 25, 2022

View reviewed changes

jack-w-shaw pushed a commit to jack-w-shaw/sqlair that referenced this pull request Jun 20, 2023

Add bypass parser (canonical#10)

943d9c2

This adds a complete parser that parses any statement as only BypassParts.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add initial parser. #10

Add initial parser. #10

fernape commented Jul 8, 2022

benhoyt Jul 10, 2022

fernape Jul 12, 2022

benhoyt commented Jul 10, 2022

manadart left a comment

niemeyer left a comment

niemeyer left a comment

niemeyer Jul 25, 2022

niemeyer Jul 25, 2022

benhoyt Jul 25, 2022

niemeyer Jul 25, 2022

niemeyer Jul 25, 2022

niemeyer Jul 25, 2022

niemeyer Jul 25, 2022

niemeyer Jul 25, 2022

niemeyer Jul 25, 2022

niemeyer Jul 25, 2022

niemeyer Jul 25, 2022

Add initial parser. #10

Are you sure you want to change the base?

Add initial parser. #10

Conversation

fernape commented Jul 8, 2022

Choose a reason for hiding this comment

Choose a reason for hiding this comment

benhoyt commented Jul 10, 2022

manadart left a comment

Choose a reason for hiding this comment

niemeyer left a comment

Choose a reason for hiding this comment

niemeyer left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment