Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Remove func keyword #3430

Open
wants to merge 9 commits into
base: main
Choose a base branch
from
Open

feat: Remove func keyword #3430

wants to merge 9 commits into from

Conversation

max-sixty
Copy link
Member

@max-sixty max-sixty commented Aug 27, 2023

I'm super confused by this. The only change I made is to comment out the lexing of "func". But as a result, a huge number of tests fail; most queries can no longer be compiled. I don't see how that's possible — the AST that's being passed back from prql-parser surely doesn't change. (As expected, the only test in prql-parser fails is one that contained func, which I've commented out — which suggests that the AST likely doesn't change). Am I missing something very obvious? Is there dark magic in our codebase?

OK it was because stdlib was using it!

What do folks think about removing func now that we have line continuations? Exactly one way to do things etc etc...

@max-sixty max-sixty requested a review from aljazerzen August 27, 2023 20:47
@max-sixty max-sixty changed the title feat: Explore removing func feat: Remove func keyword Aug 28, 2023
@max-sixty max-sixty removed the request for review from not-my-profile August 28, 2023 06:04
@aljazerzen
Copy link
Member

Removing dependence on a keyword for multi-line declaration is cool, but there was another motive for func keyword.

I've re-added func keyword because there were cases where parsing was very close to being ambiguous. I was even thinking about making it required.

A case of ambiguity with param defaults:

let my_func = a b c:x -> x -> a + c b

A case of ambiguity with returning a function:

let my_func = a -> x -> a + x

The general problem (and potential performance sink) is that when the parser encounters a func call a b c, it cannot know if it is actually a func call, or a func definition.

Because of precedence, it first has to assume that it is a func definition and only when it does not find a trailing ->, it backtracks and parsers a b c as expressions, composing a function call.

@max-sixty
Copy link
Member Author

max-sixty commented Aug 28, 2023

Could we instead make it right-associative?

e.g. with:

let f = x y -> x z -> x + y + z

If -> is right associative, then this would be parsed as:

let f = x y -> (x z -> x + y + z)

If it's left associative, it would be parsed as:

let f = (x y -> x) z -> x + y + z

...which seems wrong.

To take your examples:

let my_func = a b c:x -> (x -> a + c b)  # this applies with no parentheses too

vs.

let my_func = a b -> (c:x -> a + c b)

and

let my_func = a -> x -> a + x  # this applies with no parentheses too

vs.

let my_func = a -> (x -> a + x)

?

@aljazerzen
Copy link
Member

Oh, we could absolutely make it left or right associative (one of these is the current behavior), but my point is that parsing this is much harder and less performant than it needs to be.

This applies to all function calls: when parsing join albums side:left (==album_id), the parser has to traverse all the way until ( before it can conclude that this is not a function declaration.

I'm all for minimal grammar, but I do think having a func keyword would benefit performance and ambiguity.

(even with the keyword, we still need to decide on left/right-associativity)

@max-sixty
Copy link
Member Author

I was less keen on the func keyword, but not immovably so.

I agree it slows down the parsing a bit. But the parsing is still pretty fast! And at least how I understand it, it's not ambiguous to humans? Though possibly that's because we so rarely see nested function definitions.

I also really like the clean aesthetics of the current form, and only having one form of function definition. Though aesthetics aren't that good a reason, and the latter point would hold if we mandated func...

@max-sixty
Copy link
Member Author

Presumably a language such as Haskell is similar? i.e. f x y = x + y could be lots of things until the =?

@aljazerzen
Copy link
Member

Presumably a language such as Haskell is similar? i.e. f x y = x + y could be lots of things until the =?

I don't know haskell, but I don't think that's the case. Aren't all terms before = always function name and its args?

@max-sixty
Copy link
Member Author

Aren't all terms before = always function name and its args?

Yes but same as with PRQL — until we get to the = (or it our case, the ->) — it could be lots of things.

In particular f could be a function being applied to two arguments, x and y?

@aljazerzen
Copy link
Member

I've looked into Haskell and it does not have this problem.

That's because f x y = ... cannot be an expression, only a statement. This means that in this statement:

a = f x y

... f x y cannot be a start of a function declaration, no matter what we add onto the line.

For anonymous functions, Haskell does have a leading denomination:

a = \x y -> x + y

I know that this might seem like a detail, but I can have major performance implications. An extreme example would be:

let x = a b:(1 + 2 + ...) c:(1 + 2 + ...) d:(1 + 2 + ...) ... z:(1 + 2 + ...)

Here, parser would first try to parse this as a function declaration - all of the parameters and all of their default values, only to encounter an end of the line, conclude that this is not a function declaration, throw this all away and start again, parsing as a function call.

I wish we could reuse the parsed args, but they do a slightly different syntax (type annotations), so this would be hard to pull off.

@max-sixty
Copy link
Member Author

That's because f x y = ... cannot be an expression, only a statement.

until we get to the = (or it our case, the ->) — it could be lots of things.

But f x y can be lots of things, no? Before we get to the =?

@aljazerzen
Copy link
Member

In Haskell, I don't think so, no. (I have a very limited knowledge of Haskell, correct me if I'm wrong).
If I understand correctly, a statement has to start with variable / function name, followed by parameters, followed by =.

@richb-hanover
Copy link
Contributor

richb-hanover commented Aug 30, 2023

What do folks think about removing func now that we have line continuations? Exactly one way to do things etc etc...

@max-sixty This discussion so far has been highly technical, targeted at people who're deeply familiar with PRQL innards. As someone who might wind up writing/tweaking documentation for this, I would enjoy more explanation:

  • Where is func used now? I thought it had been replaced by let...
  • What does this have to do with line continuations?

Update: A few sentences of context would be helpful. Thanks, as always.

@max-sixty
Copy link
Member Author

If I understand correctly, a statement has to start with variable / function name, followed by parameters, followed by =.

Sorry to belabor the point, but to ensure we're sharing understanding on this example. IIUC (also not an expert), this is valid Haskell:

f :: Int -> Int -> Int
f a b = a + b  -- Function definition

...but so is...

x = 10
y = 20

f x y  -- Function application

(even though that final statement isn't doing anything, it's still valid IIUC)

The point re before the = is important — i.e. once we get to the =, then we know it's a definition — in the same way once we get to the ->, we know it's a definition.

Does that make sense? (If I'm correct it doesn't mean that we have remove func obv! Instead I wanted to understand where we were vs. other languages.)

@max-sixty
Copy link
Member Author

max-sixty commented Aug 30, 2023

Where is func used now? I thought it had been replaced by let...

It broadly has! But it's still valid, and allows multi-line function definitions (check out PR for examples). So now we have line wrapping, one proposal is to remove func. Another proposal is to bring func back, to make the parser simpler...

@aljazerzen aljazerzen added the language-design Changes to PRQL-the-language label Oct 16, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
language-design Changes to PRQL-the-language
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants