Formal grammar and parser #4613
Signed-off-by: Ben Sherman <[email protected]>
…cess inputs/outputs Signed-off-by: Ben Sherman <[email protected]>
Some sweets-infused holiday thoughts... right now I am just producing the same AST expected by the runtime to keep this PR as simple as possible. But, like I said, we can produce whatever Groovy AST we want, so we could produce Groovy code that more effectively enables new features like static types, default arguments, etc. The main example I'm thinking of is the annotation API (see nextflow-io/rnaseq-nf#24). I originally designed it as user-facing code, but it could also be an intermediate representation that is produced by the parser. If we "compile" the process and workflow definitions to actual function definitions, then we can more easily leverage the Groovy type checking. This is just an example. We may not need the annotation API exactly, but it would be good to explore alternative AST representations, perhaps in a second iteration.
Another aside... GraalVM implements an AST model for every language that it supports. Here is the Graal Python AST source code. So we could also have the parser produce a Graal/Python AST and thereby allow the pipeline code to use Python semantics instead of Groovy semantics. We would need to design a DSL syntax for processes and workflows that would make sense with Python. Likely it would look more like Snakemake. Using native Python syntax (i.e. functions with decorators) is also an option but would likely be more verbose. We would still need to implement our own IDE tooling, but centered around Python syntax instead of Groovy syntax. The point is, if we rely on the semantics (and compiler backend) of an existing language, it doesn't have to be Groovy. It could easily be any language supported by GraalVM.
This PR adds a custom parser for Nextflow scripts and uses it instead of the Groovy parser. The Nextflow parser is generated from an ANTLR grammar, which currently contains a subset of Groovy syntax with some additional rules for processes, workflows, and include statements.
To bypass the Groovy parser, we invoke the GroovyShell with a placeholder script that simply wraps the actual script in a string expression. Then in an AST transform, we extract the string value, parse it with the Nextflow parser, and insert the resulting Groovy AST into the placeholder script.
This approach allows us to control the parsing process -- including the syntax and detecting syntax errors -- while still leveraging the Groovy compiler for execution. In other words, we can define whatever grammar we want, as long as we can "compile" it into a Groovy AST. If you look at AstBuilder, you'll see that it converts processes / workflows / includes into the same Groovy AST structures produced by NextflowDSLImpl.
The hack I'm doing to make this work seems fine, but a more robust solution might be to use internal Groovy classes in a way that allows us to pass our AST directly to the Groovy compiler, instead of going through the GroovyShell and AST transforms. It will take time to understand which components we'll need to rip out. But the advantage is that we don't have to implement our own compiler backend.
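To make the placeholder trick concrete, here is a rough stdlib-only Java sketch of the wrap-and-extract round trip. It is not the code in this PR: `PlaceholderSketch`, `wrap`, `unwrap`, and `declarations` are hypothetical names, the escaping rules are a simplified assumption about single-quoted string literals, and `declarations` is only a toy stand-in for the ANTLR-generated parser. In the real flow the Groovy compiler parses the placeholder and the AST transform reads the string constant from the resulting AST.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; none of these names exist in Nextflow.
class PlaceholderSketch {

    // Wrap the real script in one single-quoted string literal, so the host
    // parser sees a trivial "placeholder script" instead of the real syntax.
    static String wrap(String script) {
        StringBuilder sb = new StringBuilder("'");
        for (int i = 0; i < script.length(); i++) {
            char c = script.charAt(i);
            switch (c) {
                case '\\': sb.append("\\\\"); break;
                case '\'': sb.append("\\'"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                default: sb.append(c);
            }
        }
        return sb.append('\'').toString();
    }

    // Stand-in for the extraction done in the AST transform: recover the
    // original script text from the placeholder literal.
    static String unwrap(String placeholder) {
        int last = placeholder.length() - 1;
        if (last < 1 || placeholder.charAt(0) != '\'' || placeholder.charAt(last) != '\'') {
            throw new IllegalArgumentException("not a placeholder script");
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 1; i < last; i++) {
            char c = placeholder.charAt(i);
            if (c != '\\') { sb.append(c); continue; }
            char next = placeholder.charAt(++i);
            switch (next) {
                case 'n': sb.append('\n'); break;
                case 'r': sb.append('\r'); break;
                default: sb.append(next); // covers \\ and \'
            }
        }
        return sb.toString();
    }

    // Toy stand-in for the custom parser: list the top-level constructs
    // that the grammar adds on top of Groovy syntax.
    static List<String> declarations(String script) {
        String[] keywords = {"process", "workflow", "include"};
        List<String> found = new ArrayList<>();
        for (String line : script.split("\n")) {
            String trimmed = line.trim();
            for (String kw : keywords) {
                if (trimmed.equals(kw) || trimmed.startsWith(kw + " ")) found.add(kw);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        String script = "process hello {\n  script:\n  'echo hi'\n}\nworkflow {\n  hello()\n}\n";
        String placeholder = wrap(script);          // what the host parser would see
        String recovered = unwrap(placeholder);     // what the transform would extract
        System.out.println(recovered.equals(script));
        System.out.println(declarations(recovered));
    }
}
```

The point of the sketch is that the host parser never sees the custom syntax, so the grammar is free to diverge from Groovy as long as the extracted text can be turned into a valid Groovy AST afterwards.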
I developed this code in a separate project and only just now incorporated it into Nextflow. I haven't tested extensively so there are likely some issues around the edges. Just wanted to finish a basic prototype before the holidays.
TODO:
AstBuilder to parity with NextflowDSLImpl