Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Port guidance's JSON implementation #48

Merged
merged 76 commits into from
Nov 20, 2024

Conversation

hudson-ai
Copy link
Collaborator

Working towards resolving #44

Super draft PR, just opening for feedback and help with my extremely amateurish rust (thanks @mmoskal!)

To run guidance's JSON tests with this as the backend, check out my (also very rough) guidance branch
https://github.com/hudson-ai/guidance/tree/llguidance_json

@hudson-ai
Copy link
Collaborator Author

@microsoft-github-policy-service agree company="Microsoft"

Copy link
Member

@mmoskal mmoskal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

overall looks great!

parser/src/json/compiler.rs Outdated Show resolved Hide resolved
right: Option<f64>,
left_inclusive: bool,
right_inclusive: bool,
) -> String {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be easy enough to use anyhow::Result<String> and than ensure!() instead of assert!() and bail!() instead of panic!(), at least in the top-level function. The inner asserts should probably never happen.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. This particular file is thankfully the result of chatgpt converting your python to rust (with some annoying debugging since it took it upon itself to incorrectly modify some of the logic...)

@hudson-ai
Copy link
Collaborator Author

overall looks great!

Thanks! Please don't approve until it's done though 😆
That being said, I can totally split this into several PRs.

@@ -247,6 +251,7 @@ struct Context<'a> {
// TODO: actually use this to handle draft-specific behavior
draft: Draft,
defs: Rc<RefCell<HashMap<String, Schema>>>,
seen: Rc<RefCell<HashSet<String>>>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

at some point, you will probably want Rc<RefCell<InnerContext>> (probably when adding normalize counters)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh thanks, I'll look into it. Was just about to start adding said counters to track the number of Schema nodes that are being constructed. Probably not to normalize itself though, as I just made that "shallow".

@hudson-ai
Copy link
Collaborator Author

@mmoskal re: consts and enums...

"Upstairs" in the grammar compiler, I'm already transforming non-atomic const nodes (objects and arrays) into strict schemas that are then being passed to gen_json_*. This is so that they're actually properly lexed and have the correct whitespace behavior.

I could do this earlier in the Schema builder, doing it for "atomic" types too, e.g.

{"const": 42} -> {"type": "integer", "minimum": 42, "maximum": 42}

This would allow our existing intersection logic to handle these cases rather than using a full JSON validator. It would be more limited in its capabilities, but it would solve the problem of not having a real schema object when we're merging.

The only complication here is bools, which have no built-in mechanism for constraining them. So I think I would just have to add a LiteralBool type.

@hudson-ai
Copy link
Collaborator Author

Hoping to merge this ASAP though, and I think I will worry about consts, enums, and refs in a follow-up PR. Will work on counters and unnecessary cloning today

@mmoskal
Copy link
Member

mmoskal commented Nov 19, 2024

I like the const->schema idea. We just need to make sure we get similarly efficient schemas from out min/max etc logic.

Also, for the case of enum it would be ideal to generate a single regex for the enum, not a grammar (grammars have a limit of at most ~200 items in a select, but regexes don't).

But all good for later PRs, let's marge this!

@hudson-ai hudson-ai changed the title [Draft] Port guidance's JSON implementation Port guidance's JSON implementation Nov 20, 2024
@hudson-ai hudson-ai merged commit fdd515e into microsoft:main Nov 20, 2024
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants