Where a Gopher wanders around and meets lots of 🌳 Sitters...
First of all, giving credits where they are due:
This repository started as a fork of @smacker's go-tree-sitter repo
until I realized I don't want to also handle the bindings library itself
in the same project (i.e. the stuff in the root of the repo, exposing sitter.Language
type
itself & co.), I just want a (big) collection of all the tree-sitter
parsers I can add.
So here it is: started with the parsers and the automation from the above mentioned repo then added a bunch more parsers on top of it and updated automation (to support more parsers and also to automatically update the PARSERS.md file, git tags, etc.).
See PARSERS.md for the list of supported parsers. The end goal is (at least) parity with nvim_treesitter.
For contributing (or just to see how the automation works) see CONTRIBUTING.md.
The language name used is the same as TreeSitter language name (lower case, underscore
instead of spaces) and the same as the query folder name in nvim_treesitter
.
This keeps things simple and consistent.
In rare cases, the Go package name differs from the language name:
go
actually has the package nameGo
becausepackage go
does not go well in Go (pun intended) but otherwise the language name remains "go";func
language, same problem as above, so package name is actuallyFunC
(but everything else isfunc
as normal: folder, language name, etc.).
Also, some languages may have names that are not very straightforward acronyms.
In those cases, an altName
field will be populated, i.e. requirements
language
has an altName
of Pip requirements
, query
has Tree-Sitter Query Language
and so on. Search grammar.json for your grammar of interest.
See the README in go-tree-sitter-bare,
as well as the example_*.go
files in this repo.
This repo only gives you the GetLanguage()
function, you will still use the sibling
repo for all your interactions with the tree.
You can use the parsers in this repo in 3 main ways:
You can use the parsers one (or more) at a time, as you'd use any other Go package:
package main
import (
"context"
"fmt"
"github.com/alexaandru/go-sitter-forest/risor"
sitter "github.com/alexaandru/go-tree-sitter-bare"
)
func main() {
content := []byte("print('It works!')\n")
node, err := sitter.ParseCtx(context.TODO(), content, risor.GetLanguage())
if err != nil {
panic(err)
}
// Do something interesting with the parsed tree...
fmt.Println(node)
}
If (and only IF) you want to use ALL (or most of) the parsers (beware, your binary
size will be huge, as in 200MB+ huge) then you can use the root (forest
) package:
package main
import (
"context"
"fmt"
forest "github.com/alexaandru/go-sitter-forest"
sitter "github.com/alexaandru/go-tree-sitter-bare"
)
func main() {
content := []byte("print('It works!')\n")
parser := sitter.NewParser()
parser.SetLanguage(forest.GetLanguage("risor")())
tree, err := parser.ParseCtx(context.TODO(), nil, content)
if err != nil {
panic(err)
}
// Do something interesting with the parsed tree...
fmt.Println(tree.RootNode())
}
this way you can fetch and use any of the parsers dynamically, without having to manually import them. You should rarely need this though, unless you're writing a text editor or something.
A third way, and perhaps the most convenient (no, it's not, it's ~200MB with all
parsers built into the binary whereas all parsers built as plugins took ~1650MB
last time I built them all (which granted, was several versions ago, before upgrading
to TreeSitter
v0.22.1)), is to use the included Plugins.make
makefile, which allows easy creation of any and all plugins. Simply copy it to
your repo, and then you can easily make -f Plugins.make plugin-risor
, etc. or
use the plugin-all
target which creates all the plugins.
Then you can selectively use them in your app using the plugins mechanism.
IMPORTANT: You MUST use -trimpath
when building your app, when using plugins
(the Plugins.make file already includes it, but the app that uses them also needs it).
Each individual parser (as well as the bulk loader) offers an Info()
function
which can be used to retrieve information about a parser. It exposes it's entry
from grammars.json
either raw (as a string holding the JSON encoded entry)
or as an object (only available in bulk mode).
The returned Grammar
type implements Stringer
so it should give a nice summary
when printed (to screen or logs, etc.).
For transparency, any and all changes made to the parsers' (and, to be clear, I include in this term ALL the files coming from parsers, not just parser.c) files are documented below.
For one thing ALL changes are fully automated (any exceptions are noted below), no change is ever made manually, so inspecting the automation should give you a clear picture of all the changes performed to the code, changes which are detailed below:
- the include paths are rewritten to use a flat structure (i.e.
"tree_sitter/parser.h"
becomes"parser.h"
); This is needed so that the files are part of the same package, plus it also makes automation simpler; - for
unison
thescanner
file includesmaybe.c
which causescgo
to include the file twice and throw duplicate symbols error. The solution chosen was to copy the content of the included file into the scanner file and set the included file to zero bytes; this way all the code is in one file and the compilation is possible; - for parsers that include a
tag.h
file: theTAG_TYPES_BY_TAG_NAME
variable clashes between them (when those parsers are all included into one app). The solution chosen was to rename the variable by adding the_<lang>
suffix, i.e., we currently have:TAG_TYPES_BY_TAG_NAME_astro
;TAG_TYPES_BY_TAG_NAME_html
;TAG_TYPES_BY_TAG_NAME_svelte
;TAG_TYPES_BY_TAG_NAME_vue
;
- for parsers that define
serialize()
,deserialize()
,scan()
(and a few others) (i.e.org
,beancount
,html
& a few others): the offending identifiers are renamed by appending the_<lang>
suffix to them (i.e.serialize
->serialize_org
, etc.); See theputFile()
function ininternal/automation/main.go
for details; - some parsers'
grammar.js
files were not yet updated to work with latest TreeSitter, in which case we hot patch them before regenerating the parser. See thereplMap
indownloadGrammar()
function; - EXCEPTION MANUAL CHANGE:
poe_filter/parser.c
is currently invalid upon generation (is missing a right paren at line 6216, col 50 - I added it manually).
We are currently aligned with TreeSitter v0.22.2
: go-tree-sitter-bare
v1.1.1
uses v0.22.2
(and this project uses that latest version of
go-tree-sitter-bare
) and the included package.json
(which is used for regenerating grammars) is using the same version.
As for the parsers in this repo:
- almost of the parsers are now regenerating the
parser.{c,h}
files using the latesttree-sitter
(they are marked with a heavy checkmark in PARSERS.md and they do NOT have theskip
flag set in grammars.json); this is preferred way going further; - a few parsers could not yet be regenerated locally. They can still
be used just fine, but they will use whatever
parser.{c,h}
files they have in the upstream repo, which may or may not have been compiled with the latesttree-sitter
version (most likely not, or we'd also be able to regenerate them). In general they should work too though, that's how this project started after all, with downloaded files only.
So there it is, we try to converge towards using the same tree-sitter
version everywhere, and keeping up with it too.
- need to update the parsers automation to create a Go module for a new parser automatically;
- need to be able to auto-delete files deleted remotely (i.e. if a scanner.c or whatever is deleted from the source repo, we should also be deleting it locally).