[Query] True explanation behind the Guidance project? #1040
Comments
Hi @oflakne26, thank you so much for the kind words and your interest in guidance :). Sorry to make you search -- we haven't written up a formal publication yet, but we plan to do so in the near future! In the meantime, I can share some high-level details of the project's history and how it works. Definitely feel free to ask more questions if there's anything I missed or if you'd like more detail.

The project started at Microsoft Research in ~October of 2022. A few of us were given early access to GPT-4 and started collaborating across the company to build the very first version of Copilot (back then it was called Bing Chat, or went by its internal codename, Sydney). We were all floored by the new capabilities of these models -- keep in mind that ChatGPT hadn't even been released yet -- and wrote about our research experiences in the Sparks of AGI paper: https://arxiv.org/abs/2303.12712

However, as amazed as we were by the potential, building real-world applications on top of these models was incredibly painful (even more so back then, when instruction tuning was less powerful), and integrating LLMs into traditional software stacks was hard for a number of reasons.
To address these challenges, we started building Guidance as an internal tool just for our own development. As we came to rely on it more and more, and discovered new optimization opportunities and features, we decided to open source it and share it with the community in May of 2023.

The first version of guidance was a DSL (https://en.wikipedia.org/wiki/Domain-specific_language) with a handlebars-like template syntax that looked something like this:

program = guidance("""
{{~#system~}}
You are a helpful assistant.
{{~function_def get_current_weather}}
{{~/system~}}
{{~#user~}}
Get the current weather in New York City.
{{~/user~}}
{{~#while True~}}
{{~#assistant~}}
{{gen 'answer' temperature=1.0 max_tokens=50 functions="auto"}}
{{~/assistant~}}
{{#if answer_function_call}}
{{~#function~}}
{{answer_function_call()}}
{{~/function~}}
{{else}}
{{break}}
{{/if}}
{{/while}}
""") which was great in many ways (works across languages, the programs can be serialized safely as simple strings, etc.). However, it also 1) had a steep learning curve for the python community, 2) was painful to debug for end users and 3) was a pain to keep developing and maintaining, since any time we wanted to add a new language feature (eg while loops), we had to update the entire language itself. We made the decision over the summer to switch to a more Pythonic interface, which is the current syntax you see today. We'll probably write up a blog at some point about the design decisions that went into this interface change too! Ok, all that said, how does guidance work? Fundamentally, guidance does three things today:
Ok, all that said, how does guidance work? Fundamentally, guidance does three things today: it turns your Python program into a formal grammar, uses a parser over that grammar to build token masks, and applies those masks during generation so the model can only produce valid output.

We'll write up a proper academic paper about our approach at some point, but at a high level, the Python programs users write get turned into a formal grammar (https://en.wikipedia.org/wiki/Formal_grammar) under the hood -- a formal way of specifying what is and isn't allowed to be generated by the model. Grammars are used in, e.g., programming languages to determine what is and isn't a valid program. Similarly, while users of guidance feel like they're writing plain Python, we help them formally specify exactly what the model is and isn't allowed to generate. (I'm sure you know all of this already @oflakne26, but I wrote it out for the benefit of future readers.)

Once we have a grammar, we use a modified Earley parser (https://en.wikipedia.org/wiki/Earley_parser) under the hood to build token masks. To generate text, language models do next-token prediction: they determine which token is most likely to follow the text they have already seen. To do this, they calculate a probability distribution over all possible tokens before sampling one, and then autoregressively continue the process. By "token masking", I mean that our parser manipulates this probability distribution, suppressing any tokens that aren't allowed by the grammar so that the model can only sample tokens that are valid.

We chose to use an Earley parser because it's among the most flexible parsing algorithms out there, which means users can write much more expressive constraints. This differentiates us from most other constrained-decoding libraries, which use simpler parsers that can only handle, e.g., regular grammars. Earley parsers come with tradeoffs -- they can be quite slow if implemented naively -- so we've taken great care to write a very high-performance implementation (for details, check out https://github.com/microsoft/llguidance).
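To make "token masking" concrete, here's a toy sketch of the mechanism (illustrative only -- this is not guidance's actual implementation, and in practice llguidance computes the allowed token set from the grammar far more efficiently than this):

import numpy as np

def masked_sample(logits, allowed_token_ids):
    """Suppress grammar-invalid tokens, then sample from what remains."""
    mask = np.full_like(logits, -np.inf)
    mask[list(allowed_token_ids)] = 0.0    # 0 leaves an allowed logit unchanged
    masked = logits + mask                 # disallowed tokens become -inf
    probs = np.exp(masked - masked.max())  # numerically stable softmax
    probs /= probs.sum()                   # -inf entries now have probability 0
    return int(np.random.choice(len(logits), p=probs))

# If the grammar only permits, say, the tokens for "yes" and "no" at this
# position, every other token's probability is forced to exactly zero
# before sampling. The token ids and vocabulary size here are made up.
logits = np.random.randn(32000)            # pretend vocabulary of 32k tokens
next_token = masked_sample(logits, allowed_token_ids={42, 1337})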
I talk about these concepts with a lot of heavy visuals in my //build talk from earlier this year, which might be worth a watch if you have the time: https://www.youtube.com/watch?v=qXMNPVVlCMs

This post is getting quite long, but I hope it gives you some insight! Happy to answer any further questions you have, and thanks again for your excitement and interest :)

Hello,
I am a computational linguist working on an exciting project involving NLP and machine translation.
Custom logits processors, and thereby constrained decoding, are of high interest to me, and after testing many strategies of my own, I have found Guidance's to be the most successful.
However, I cannot find a formal publication, such as a research paper, that introduces Guidance, at least not publicly. In fact, as far as I can tell, the project is nearly unknown in the mainstream media.
I would like to get to know the project better, especially how it was initially built. If you are associated with the developers and able to help, please point me to some documentation on the methodology and where it derives from. I am not the best at reading source code and understanding it immediately.
Thank you! :)