These guidelines are intended to help developers contribute to the grammar and preserve consistency across the project while doing so.
The main objective of this project is to support all versions of the Java language. This means addressing issues with the "todo" and "known error" labels. Assign yourself to any issue you're working on so that work isn't duplicated. We're also open to documentation improvements and feature enhancements, and recommend filing an issue for these.
Language constructs are grouped into the top-level categories denoting declarations, statements within methods and expressions. All granular constructs feed into those through the defined grammar hierarchy.

    rules: {
      program: $ => repeat($._statement),

      _statement: $ => prec(1, choice(
        $._expression_statement,
        $._declaration,
        $._method_statement
      )),
      // ...
    }

The `grammar.js` file follows the BNF grammar outlined in the Java Language Specification.
There are situations where we've deviated from the spec:
- Preferred naming: if common developer parlance prefers a naming convention other than the spec's, we tend to deviate. An example of this is `generic_type` as the outer wrapper for `type_arguments`, since generics are a familiar Java programming concept.
- Simplicity: the spec is convoluted and not conducive to compact, readable code. In this situation, we've preferred structuring things in a way that is more reusable throughout the grammar and also reads clearly. An example of this is our preference to use `binary_` and `unary_` expressions to model relationships between operators, as opposed to supporting the spec's `ConditionalExpression` hierarchy.
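
To illustrate the simplicity point above, here is a minimal sketch of how a single flat rule with precedence annotations can stand in for a chain of nested spec productions. It is not the project's actual rule: the `PREC` levels, the operator table, and the `$._expression` symbol are assumptions made for illustration. `choice`, `seq`, `field` and `prec.left` are the tree-sitter grammar DSL functions, which are available as globals when the tree-sitter CLI loads `grammar.js`.

```javascript
// Hypothetical precedence levels for illustration: higher numbers bind tighter.
const PREC = { OR: 1, AND: 2, EQUALITY: 3, ADD: 4, MULT: 5 };

// Hypothetical operator table; a real grammar would cover every Java operator.
const BINARY_OPERATORS = [
  ['||', PREC.OR],
  ['&&', PREC.AND],
  ['==', PREC.EQUALITY],
  ['!=', PREC.EQUALITY],
  ['+', PREC.ADD],
  ['-', PREC.ADD],
  ['*', PREC.MULT],
  ['/', PREC.MULT],
];

// One flat rule: every operator becomes a left-associative alternative.
// `$._expression` is an assumed symbol name, not confirmed by this guide.
// Inside grammar.js this function would sit under `rules`.
const binary_expression = $ => choice(
  ...BINARY_OPERATORS.map(([operator, precedence]) =>
    prec.left(precedence, seq(
      field('left', $._expression),
      field('operator', operator),
      field('right', $._expression)
    ))
  )
);
```

Keeping the operators in one table means that adding an operator is a one-line change, whereas the spec's approach needs a separate production per precedence level.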
There are situations in which we parse invalid code to support end-user experiences. For example, it's important to ensure syntax-highlighting doesn't break down for a snippet of Java code in a markdown file. For this reason, we currently allow expressions to be parsed outside of methods, even though that is not valid Java.
To know what is "valid enough", consider what good documentation would look like:
- ✅ `int x = (1 + 2);` This is invalid since it is not within a method, but still comprehensible. Parse this.
- ❌ `int x = (1 + ) =;` This is not only invalid Java, but it is invalid logic. It wouldn't make sense in documentation. Don't parse this.
Running your code using something like JavaRepl is also a good way to verify the correctness of the input program.
The recommendation is to be comprehensive in adding tests. If it's a visible node, add it to a `/corpus/` test file. It's typically a good idea to test as many permutations of a particular language construct as possible. This increases test coverage, and it also gives readers a way to examine expected outputs and understand the "edges" of a language.
Three of the "most popular" Java repositories have been cloned into the project under the `/examples` directory (where popularity is defined by the repositories that are most starred and have the highest number of active contributors within the last month). Parsing these repos allows us to gauge how well our grammar performs at parsing "real world" Java.
To test:
- `./script/parse-examples` runs the tests and writes the results to `known-errors.txt`, which lists the files that have any errors or `MISSING ;` flags.
- The goal is to drive down the errors in `known-errors.txt` to 0.
- `known-errors.txt` allows you to find erroring files and parse them individually to diagnose and debug errors.
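
When deciding which erroring files to look at first, it can help to see how the failures are spread across the example repositories. The helper below is a rough sketch, not part of the project's scripts. It assumes that each non-empty line of `known-errors.txt` begins with the path of a failing file (without spaces) and that those paths contain an `examples/<repo>/` segment. The function name `countErrorsByRepo` is hypothetical, and the file's real layout may differ.

```javascript
// ASSUMPTION: each non-empty line starts with a file path, optionally
// followed by whitespace and parser output. Adjust if the format differs.
function countErrorsByRepo(text) {
  const counts = new Map();
  for (const line of text.split('\n')) {
    // Take the first whitespace-delimited token as the file path.
    const path = line.trim().split(/\s+/)[0];
    if (!path) continue; // skip blank lines

    // Use the directory directly under examples/ as the repository name.
    const match = path.match(/(?:^|\/)examples\/([^/]+)\//);
    const repo = match ? match[1] : '(other)';
    counts.set(repo, (counts.get(repo) || 0) + 1);
  }
  return counts;
}
```

Under Node, you could pass it the contents of the file, for example `countErrorsByRepo(require('fs').readFileSync('known-errors.txt', 'utf8'))`, and print the resulting map to see which repository accounts for the most failures.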
It's worth consulting other Java parsers (such as JavaParser) to guide your own grammar development. Comparing tree structure and naming can provide valuable insight into what is usable.