Alpha Quality: This might do what you want. It's likely not to do what you want. Please open issues!
Torch-Grammar restricts a model to output a token sequence that conforms to a provided EBNF grammar.
For example:
```python
import torch
from torch_grammar import GrammarSampler
from transformers import LlamaTokenizer

tokenizer = LlamaTokenizer.from_pretrained("huggyllama/llama-7b")
with open("examples/grammar.ebnf", "r") as file:
    input_text = file.read()
grammar = GrammarSampler(input_text, "root", tokenizer)

ids = [[1]]  # start from the BOS token
logits_processor = grammar.logits_processor()
vocab_size = len(tokenizer.get_vocab())

for i in range(10):
    logits = torch.randn((1, vocab_size))       # stand-in for real model logits
    logits = logits_processor(ids, logits)      # mask tokens the grammar disallows
    token = torch.argmax(logits).item()
    # logits_processor.accept_token(token)
    ids[0].append(token)

print(f"\x1b[1mfirst 10 tokens: \x1b[1;35m{tokenizer.decode(ids[0])}\x1b[0m")
```
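Since `GrammarSampler` takes the grammar as a plain string, you can also define one inline. The rule below is purely illustrative (it is not the contents of `examples/grammar.ebnf`) and uses the llama.cpp-style rule syntax the parser was adapted from:

```python
from torch_grammar import GrammarSampler

# Hypothetical grammar: the root rule accepts either "yes" or "no".
grammar_text = 'root ::= "yes" | "no"\n'
grammar = GrammarSampler(grammar_text, "root", tokenizer)
```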
`logits_processor` is meant to be passed to `model.generate` in a HuggingFace transformers model, but this integration is not yet super clean.
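A minimal sketch of what that wiring might look like, assuming the processor behaves like a standard `(input_ids, scores)` logits processor (the model name and prompt here are illustrative):

```python
from transformers import LlamaForCausalLM, LogitsProcessorList

model = LlamaForCausalLM.from_pretrained("huggyllama/llama-7b")
inputs = tokenizer("Answer yes or no: is water wet?", return_tensors="pt")

# Wrap the grammar-constrained processor in the list that generate() expects.
output_ids = model.generate(
    **inputs,
    max_new_tokens=32,
    logits_processor=LogitsProcessorList([grammar.logits_processor()]),
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```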
Remaining work:

- UTF-8 support: a bit of fiddling, but not terribly hard.
- More expressive grammars: lookahead would be challenging.
- Easier integration with various tokenizers: LLaMA works well, T5 presents significant challenges, and we haven't tried others yet.
- Testing and automatic benchmarking.
- The binary parse format is probably not carrying its weight with all the caching and precomputation we're doing now; it should be rewritten as something less confusing. It might even work to just hijack Lark or something.
- 11833882144218229242
The code was originally adapted from ggerganov/llama.cpp#1773. In particular, the grammar parser is a pretty straightforward mechanical translation and the binary grammar format is identical.