a Go 🌳 Sitter Forest

Where a Gopher wanders around and meets lots of 🌳 Sitters...

First of all, giving credits where they are due:

This repository started as a fork of @smacker's go-tree-sitter repo, until I realized I didn't want to also maintain the bindings library itself in the same project (i.e. the code in the root of that repo, which exposes the sitter.Language type & co.). I just want a (big) collection of all the tree-sitter parsers I can add.

So here it is: it started with the parsers and the automation from the above-mentioned repo, then I added a bunch more parsers on top and updated the automation (to support more parsers and to automatically update the PARSERS.md file, git tags, etc.).

See PARSERS.md for the list of supported parsers. The end goal is (at least) parity with nvim_treesitter.

For contributing (or just to see how the automation works) see CONTRIBUTING.md.

Naming Conventions

The language name used is the same as the TreeSitter language name (lower case, underscores instead of spaces) and the same as the query folder name in nvim_treesitter.

This keeps things simple and consistent.

In rare cases, the Go package name differs from the language name:

go actually has the package name Go because package go does not go well in Go (pun intended) but otherwise the language name remains "go";
func language, same problem as above, so package name is actually FunC (but everything else is func as normal: folder, language name, etc.).

Also, some languages have names that are not very straightforward, or are acronyms. In those cases an altName field is populated, e.g. the requirements language has an altName of Pip requirements, query has Tree-Sitter Query Language, and so on. Search grammars.json for your grammar of interest.
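
The naming rule and its two exceptions can be sketched in a few lines of Go. This is only an illustration of the convention described above: languageName, packageName and packageOverrides are hypothetical helpers, not functions exported by this repo.

```go
package main

import (
	"fmt"
	"strings"
)

// packageOverrides lists the two exceptions named above. The map itself is
// a hypothetical helper for this sketch, not something the repo exports.
var packageOverrides = map[string]string{
	"go":   "Go",
	"func": "FunC",
}

// languageName normalizes a display name per the convention above:
// lower case, underscores instead of spaces.
func languageName(display string) string {
	return strings.ReplaceAll(strings.ToLower(strings.TrimSpace(display)), " ", "_")
}

// packageName returns the Go package name for a language name. It equals
// the language name except for the overrides listed above.
func packageName(lang string) string {
	if pkg, ok := packageOverrides[lang]; ok {
		return pkg
	}
	return lang
}

func main() {
	for _, display := range []string{"Go", "func", "Git Config", "risor"} {
		lang := languageName(display)
		fmt.Printf("%-12s language=%-12s package=%s\n", display, lang, packageName(lang))
	}
}
```

Note that only the package name changes for go and func; the folder and the language name stay lower case.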

Usage

See the README in go-tree-sitter-bare, as well as the example_*.go files in this repo.

This repo only gives you the GetLanguage() function; you will still use the sibling repo for all your interactions with the tree.

You can use the parsers in this repo in 3 main ways:

1. Standalone

You can use the parsers one (or more) at a time, as you'd use any other Go package:

package main

import (
	"context"
	"fmt"

	"github.com/alexaandru/go-sitter-forest/risor"
	sitter "github.com/alexaandru/go-tree-sitter-bare"
)

func main() {
	content := []byte("print('It works!')\n")
	node, err := sitter.ParseCtx(context.TODO(), content, risor.GetLanguage())
	if err != nil {
		panic(err)
	}

	// Do something interesting with the parsed tree...
	fmt.Println(node)
}

2. In Bulk

If (and only IF) you want to use ALL (or most of) the parsers (beware, your binary size will be huge, as in 200MB+ huge) then you can use the root (forest) package:

package main

import (
	"context"
	"fmt"

	forest "github.com/alexaandru/go-sitter-forest"
	sitter "github.com/alexaandru/go-tree-sitter-bare"
)

func main() {
	content := []byte("print('It works!')\n")
	parser := sitter.NewParser()
	parser.SetLanguage(forest.GetLanguage("risor")())

	tree, err := parser.ParseCtx(context.TODO(), nil, content)
	if err != nil {
		panic(err)
	}

	// Do something interesting with the parsed tree...
	fmt.Println(tree.RootNode())
}

This way you can fetch and use any of the parsers dynamically, without having to import them manually. You should rarely need this though, unless you're writing a text editor or something similar.

3. As a Plugin

A third way is to use the included Plugins.make makefile, which allows easy creation of any and all plugins. I used to call this perhaps the most convenient way, but it is not: a binary with all parsers built in is ~200MB, whereas all parsers built as plugins took ~1650MB the last time I built them all (which, granted, was several versions ago, before upgrading to TreeSitter v0.22.1). Simply copy the makefile to your repo, and then you can run make -f Plugins.make plugin-risor, etc., or use the plugin-all target, which creates all the plugins.

Then you can selectively use them in your app using the plugins mechanism.

IMPORTANT: You MUST use -trimpath when building your app, when using plugins (the Plugins.make file already includes it, but the app that uses them also needs it).
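
As a rough sketch of the loading side, the snippet below uses Go's standard plugin package to open a plugin file and look up a symbol by name. The symbol name "GetLanguage" and the plugin path passed on the command line are assumptions made for illustration; check the output of Plugins.make for the actual file names and exported symbols. The sketch deliberately stops at the lookup and does not assert the symbol's type, since that depends on the version of the parsers you built.

```go
package main

import (
	"fmt"
	"os"
	"plugin"
)

// lookupSymbol opens a Go plugin file and looks up one exported symbol.
// Both steps can fail (missing file, toolchain or flag mismatch, unknown
// symbol), so each error is wrapped with some context.
func lookupSymbol(path, name string) (plugin.Symbol, error) {
	p, err := plugin.Open(path)
	if err != nil {
		return nil, fmt.Errorf("open %s: %w", path, err)
	}

	sym, err := p.Lookup(name)
	if err != nil {
		return nil, fmt.Errorf("lookup %s in %s: %w", name, path, err)
	}

	return sym, nil
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: app <path-to-plugin.so>")
		return
	}

	// "GetLanguage" is an assumed symbol name, used here for illustration.
	sym, err := lookupSymbol(os.Args[1], "GetLanguage")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}

	// Print the dynamic type so you can see what to type-assert to.
	fmt.Printf("found symbol of type %T\n", sym)
}
```

Build the host app with the same flags as the plugins, e.g. go build -trimpath, otherwise plugin.Open is likely to reject the plugin as built against different package versions.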

Info

Each individual parser (as well as the bulk loader) offers an Info() function which can be used to retrieve information about a parser. It exposes its entry from grammars.json either raw (as a string holding the JSON-encoded entry) or as an object (only available in bulk mode).

The returned Grammar type implements Stringer so it should give a nice summary when printed (to screen or logs, etc.).
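
To illustrate the raw form, here is a minimal sketch of decoding a JSON entry into a struct that implements Stringer. The grammar type below is a hypothetical, trimmed-down stand-in: apart from altName, which is mentioned above, the field names, JSON keys and the sample URL are assumptions and may not match the real grammars.json schema.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// grammar is a hypothetical stand-in for the repo's Grammar type.
// Only altName is named in this README; the other keys are assumed.
type grammar struct {
	Language string `json:"language"`
	AltName  string `json:"altName,omitempty"`
	URL      string `json:"url"`
}

// String implements fmt.Stringer, giving a one-line summary.
func (g grammar) String() string {
	name := g.Language
	if g.AltName != "" {
		name = fmt.Sprintf("%s (%s)", g.Language, g.AltName)
	}

	return fmt.Sprintf("%s <%s>", name, g.URL)
}

// parseInfo decodes a raw JSON entry, such as the string a parser's
// Info() is described as returning, into a grammar value.
func parseInfo(raw string) (grammar, error) {
	var g grammar
	err := json.Unmarshal([]byte(raw), &g)

	return g, err
}

func main() {
	// Sample entry made up for this sketch.
	raw := `{"language":"requirements","altName":"Pip requirements","url":"https://example.com/requirements"}`

	g, err := parseInfo(raw)
	if err != nil {
		panic(err)
	}

	fmt.Println(g) // Println picks up the String() method.
}
```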

Parser Code Changes

For transparency, any and all changes made to the parsers' files are documented below (to be clear, this covers ALL the files coming from parsers, not just parser.c).

ALL changes are fully automated (any exceptions are noted below); no change is ever made manually, so inspecting the automation should give you a clear picture of everything done to the code. The changes are:

the include paths are rewritten to use a flat structure (i.e. "tree_sitter/parser.h" becomes "parser.h"); This is needed so that the files are part of the same package, plus it also makes automation simpler;
for unison, the scanner file includes maybe.c, which causes cgo to include the file twice and throw a duplicate symbols error. The solution chosen was to copy the content of the included file into the scanner file and set the included file to zero bytes; this way all the code is in one file and compilation is possible;
for parsers that include a tag.h file: the TAG_TYPES_BY_TAG_NAME variable clashes between them (when those parsers are all included into one app). The solution chosen was to rename the variable by adding the _<lang> suffix, i.e., we currently have:
- TAG_TYPES_BY_TAG_NAME_astro;
- TAG_TYPES_BY_TAG_NAME_html;
- TAG_TYPES_BY_TAG_NAME_svelte;
- TAG_TYPES_BY_TAG_NAME_vue;
for parsers that define serialize(), deserialize(), scan() and a few other such functions (e.g. org, beancount, html and a few more): the offending identifiers are renamed by appending the _<lang> suffix to them (e.g. serialize -> serialize_org); see the putFile() function in internal/automation/main.go for details;
some parsers' grammar.js files were not yet updated to work with latest TreeSitter, in which case we hot patch them before regenerating the parser. See the replMap in downloadGrammar() function;
EXCEPTION MANUAL CHANGE: poe_filter/parser.c is currently invalid upon generation (it is missing a right paren at line 6216, col 50, which I added manually).
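
Two of the rewrites above (flattening include paths and suffixing clashing identifiers) can be sketched as plain string transformations. This is not the code from putFile(); flattenIncludes and suffixIdentifiers are hypothetical helpers, and the C snippet is made up, but they show the kind of change applied to the files.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// flattenIncludes rewrites quoted includes such as "tree_sitter/parser.h"
// to the flat form "parser.h".
func flattenIncludes(src string) string {
	return strings.ReplaceAll(src, `"tree_sitter/`, `"`)
}

// suffixIdentifiers appends _<lang> to each listed identifier. The \b
// anchors match whole words only, so renaming "serialize" leaves
// "deserialize" and "..._scanner_serialize" untouched.
func suffixIdentifiers(src, lang string, idents ...string) string {
	for _, id := range idents {
		re := regexp.MustCompile(`\b` + regexp.QuoteMeta(id) + `\b`)
		src = re.ReplaceAllLiteralString(src, id+"_"+lang)
	}

	return src
}

func main() {
	// Made-up scanner fragment, for illustration only.
	src := `#include "tree_sitter/parser.h"
unsigned serialize(void *p) { return 0; }
void deserialize(void *p) {}
unsigned tree_sitter_org_external_scanner_serialize(void *p) { return serialize(p); }
`

	out := flattenIncludes(src)
	out = suffixIdentifiers(out, "org", "serialize", "deserialize")

	fmt.Print(out)
}
```

Matching on word boundaries is a design choice of this sketch: it keeps the exported tree_sitter_<lang>_external_scanner_* entry points intact while still renaming the short helper names that clash across parsers.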

Versions & Status

We are currently aligned with TreeSitter v0.22.2: go-tree-sitter-bare v1.1.1 uses v0.22.2 (and this project uses that latest version of go-tree-sitter-bare) and the included package.json (which is used for regenerating grammars) is using the same version.

As for the parsers in this repo:

almost all of the parsers now regenerate the parser.{c,h} files using the latest tree-sitter (they are marked with a heavy checkmark in PARSERS.md and they do NOT have the skip flag set in grammars.json); this is the preferred way going forward;
a few parsers could not yet be regenerated locally. They can still be used just fine, but they will use whatever parser.{c,h} files they have in the upstream repo, which may or may not have been compiled with the latest tree-sitter version (most likely not, or we'd also be able to regenerate them). In general they should work too though, that's how this project started after all, with downloaded files only.

So there it is: we try to converge towards using the same tree-sitter version everywhere, and to keep up with it too.

TODO

need to update the parsers automation to create a Go module for a new parser automatically;
need to be able to auto-delete files deleted remotely (i.e. if a scanner.c or whatever is deleted from the source repo, we should also be deleting it locally).
