Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(fmt): An attempt at aesthetic items into PL #4639

Open
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

max-sixty
Copy link
Member

@max-sixty max-sixty commented Jun 20, 2024

By adding comments (named "aesthetics" here, and includes linewraps) to PL, this is an attempt to get around the complications of combining lexer + parser output in prqlc fmt, which #4397 has hit in a few incarnations.

This very very nearly works — with chumsky we can create a function that wraps anything that might have a trailing or following comment, implement a trait on the AST items that contain it — and away we go. (though it did require lots of debugging in the end...). The AST would then be really easy to write back out.

This requires comments to lead or follow tokens that are part of an AST item. I think there's literally a single case where it doesn't work, which is when a comment follows the final trailing comma of a tuple or array. So apart from that case, a comment always leads or follows a token that's part of an AST item.

...so tests fail at the moment, on that case.

Next we need to consider:

  • Can we workaround that one case? We don't actually care about whether there's a trailing comma — it has no semantic meaning — and we're going to override that when we write it out, so we could likely hack around it...
  • Are there actually other cases of this model failing? I know this approach — of putting aesthetic items into AST — is not generally favored. And at a meta level we should be skeptical of the claim "there's just a single case of something not working" — bad models generally have multiple failures!

This is an attempt to get around the complications of managing lexer + parser output, which PRQL#4397 has hit in a few incarnations by just adding comments ('aesthetics') to PL.

This very very nearly works -- with chumsky we can create a function that wraps anything that might have a comment, implement a trait on the AST items that contain it, and away we go (though it did require a lot of debugging in the end). This would then be really easy to write back out.

I think there's literally a single case where it doesn't work -- where a comment doesn't come directly before or directly after an AST item -- in the final trailing comma of a tuple or array. So tests fail at the moment.

Next we need to consider:
- Can we workaround that one case? We don't actually care about whether there's a trailing comma, so we could likely hack around it...
- Are there actually other cases of this model failing? I know this approach -- of putting aesthetic items into AST -- is not generally favored, and it's really rare that there's even a single case of something not working.
@max-sixty
Copy link
Member Author

One thing we could do land from this is doc-comments — which must be attached to an item, and we may want to push through the AST. It's much less important to the project than getting prqlc fmt to work, but would let us merge something from this work rather than having a bunch of PRs & branches gradually accumulating merge conflicts...

@aljazerzen
Copy link
Member

Oh, this is an interesting idea: instead of discarding all comments and new lines, we keep them in the first AST, so they can be re-incorporated into codegen.

It's a shame that it doesn't work for all the cases. It would be a beautiful solution for formatting comments. Could we parse trailing comma as an aesthetic too?


re doc comments: yes, they are very similar. I think they should be allowed only on statements, so that's even simpler to parse.

@max-sixty
Copy link
Member Author

It's a shame that it doesn't work for all the cases. It would be a beautiful solution for formatting comments. Could we parse trailing comma as an aesthetic too?

Yeah, I added some more thoughts at #4397 (comment). It's possible to do it this way, but not as elegant as I first thought — I think it would require lots of backtracking and custom parsers (i.e. not just delimited_by...) to be able to distinguish between the two cases in that comment...

This was referenced Jul 1, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants