different parsers #2

gminorcoles · 2023-07-22T17:07:50Z

This is cool, thanks for starting down this path with PyTorch. I came here from the HN discussion. I have been focusing on Llama 2 since it seems to me to be a bit of a tipping point, and it's time to dig into local LLMs.

So let's say I want to limit the output of Llama to valid Python code. The Python grammar no longer seems to be EBNF but rather PEG. For my purposes I would actually be very happy if the LLM output were constrained to a valid Python AST. Each of these requires a different parser, and thus code to handle each case.
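To make "valid Python" concrete, here is a minimal sketch of a post-hoc check using the standard library's `ast.parse`. It only filters finished candidates; it is not the token-level constraint this repo applies during sampling, and the helper names `is_valid_python` and `first_valid` are made up for illustration.

```python
import ast


def is_valid_python(source: str) -> bool:
    """Return True if `source` parses as a complete Python module."""
    try:
        ast.parse(source)
    except (SyntaxError, ValueError):
        # SyntaxError: not valid Python. ValueError: e.g. null bytes
        # in the source on some Python versions.
        return False
    return True


def first_valid(candidates):
    """Return the first candidate string that parses, or None."""
    for text in candidates:
        if is_valid_python(text):
            return text
    return None
```

A check like this could reject or re-sample bad completions, but it cannot steer generation token by token, which is why a grammar-driven parser is still needed for real constrained decoding.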

Have you thought about how to handle this potential proliferation of parser formats? I am probably going to try to copy your approach and extend this to handle python AST.

burke · 2023-08-15T15:27:04Z

Hey, sorry, I'm chronically bad at GitHub notifications.

I think that's a really cool idea. My knowledge of the theoretical CS side of parsing is fuzzy at best, so I can't think through what an implementation would look like here for PEG as opposed to EBNF/CFG, but I would be thrilled to see that work.
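For reference, one concrete way PEG differs from EBNF/CFG is that PEG choice is ordered: the first alternative that matches is committed to. The toy combinators below are a sketch for illustration only (the names `lit`, `choice`, `seq`, and `matches` are hypothetical and not part of this repo or llama.cpp).

```python
def lit(s):
    """Match the literal string `s` at the current position."""
    def parse(text, pos):
        return pos + len(s) if text.startswith(s, pos) else None
    return parse


def choice(*alts):
    """PEG ordered choice: commit to the first alternative that matches."""
    def parse(text, pos):
        for alt in alts:
            end = alt(text, pos)
            if end is not None:
                return end
        return None
    return parse


def seq(*parts):
    """Match each part in order; fail if any part fails."""
    def parse(text, pos):
        for part in parts:
            pos = part(text, pos)
            if pos is None:
                return None
        return pos
    return parse


def matches(parser, text):
    """True if `parser` consumes the whole of `text`."""
    return parser(text, 0) == len(text)


# S <- ("a" / "ab") "c"   -- "a" always wins, so "abc" is rejected.
a_first = seq(choice(lit("a"), lit("ab")), lit("c"))
# S <- ("ab" / "a") "c"   -- reordering makes both inputs match.
ab_first = seq(choice(lit("ab"), lit("a")), lit("c"))
```

Read as a CFG, both rules would accept `"ac"` and `"abc"`. Under PEG semantics `a_first` rejects `"abc"`, so a sampler built around CFG alternatives probably cannot be reused for a PEG grammar unchanged.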

The actual EBNF parsing code here was ported directly from the equivalent feature in llama.cpp with essentially no changes. I think it would be really excellent if we could support a richer format, or a handful of them (this dialect of EBNF is quite rudimentary). I don't currently have any plans to work on this myself.
