Port guidance's JSON implementation #48

Merged
merged 76 commits into from
Nov 20, 2024
Changes from 19 commits
Commits
09d8d59
implement schema JSON -> llgrammar JSON
hudson-ai Oct 25, 2024
78ccc30
Return GrammarWithLexer instead of TopLevelGrammar
hudson-ai Nov 7, 2024
b30d5e8
refactor json into mod
hudson-ai Nov 7, 2024
5f836ae
default to all types if type unspecified
hudson-ai Nov 8, 2024
e67ed0f
basic formats
hudson-ai Nov 8, 2024
261d27e
format parity
hudson-ai Nov 8, 2024
4d89e81
initial port of numeric ranges
hudson-ai Nov 8, 2024
9d58157
fix right inclusivity
hudson-ai Nov 8, 2024
c777deb
fix integer bounds from floats
hudson-ai Nov 8, 2024
1c6be00
more fixes for ints
hudson-ai Nov 8, 2024
a85000f
whitespace_flexible and separators
hudson-ai Nov 8, 2024
ba817e4
whitespace flexible consts and enums
hudson-ai Nov 8, 2024
8bbd244
required
hudson-ai Nov 9, 2024
6f1c35e
Merge branch 'main' into json_extension
hudson-ai Nov 9, 2024
f85598a
types is wonky
hudson-ai Nov 10, 2024
9aa5db5
substring (may pull out into separate PR)
hudson-ai Nov 10, 2024
49d2ca6
drop non-test
hudson-ai Nov 10, 2024
729425f
clean up
hudson-ai Nov 10, 2024
46bad35
null test
hudson-ai Nov 10, 2024
a898bf3
impl Default for JsonCompileOptions
hudson-ai Nov 11, 2024
49b8b6b
taken names
hudson-ai Nov 11, 2024
d576962
type inference
hudson-ai Nov 11, 2024
5b924aa
json_dumps
hudson-ai Nov 11, 2024
e8eb74a
cache recursive calls of ordered_sequence
hudson-ai Nov 12, 2024
f648ebe
check number bounds
hudson-ai Nov 12, 2024
b2fc186
a few UnsatisfiableSchemaError propagation cases
hudson-ai Nov 12, 2024
faa6162
ignore unsatisfiable schemas in anyOf
hudson-ai Nov 12, 2024
aeff512
remove substring.rs (for separate PR)
hudson-ai Nov 12, 2024
a3b0912
type inference
hudson-ai Nov 12, 2024
ce89057
drop todo
hudson-ai Nov 12, 2024
b94a085
normalize
hudson-ai Nov 14, 2024
ef6c066
filter
hudson-ai Nov 14, 2024
e462084
silence for now
hudson-ai Nov 14, 2024
6c694ad
start cleaning up a bit
hudson-ai Nov 14, 2024
7f2a74e
delete code!
hudson-ai Nov 14, 2024
42f526d
testish
hudson-ai Nov 14, 2024
0279fc5
normalize in try_from
hudson-ai Nov 14, 2024
64607e4
fmt normalize
hudson-ai Nov 14, 2024
cf3fe91
fix merged prefixItems by applying other schema's items
hudson-ai Nov 14, 2024
80f9ead
required as IndexSet
hudson-ai Nov 14, 2024
bdd7b46
refactor and flesh out try_type
hudson-ai Nov 14, 2024
91e9a6c
fix merge in case that a schema is unsat
hudson-ai Nov 14, 2024
3c38375
simple ref without siblings
hudson-ai Nov 14, 2024
e3ef006
use referencing
hudson-ai Nov 15, 2024
fa080a8
abspath
hudson-ai Nov 15, 2024
b7af4f3
attempt at recursive refs
hudson-ai Nov 15, 2024
063d0a1
simplify refs
hudson-ai Nov 15, 2024
c8bc68a
explicit default root uri
hudson-ai Nov 15, 2024
e0f5832
comment
hudson-ai Nov 15, 2024
87203f9
allow sibling keys on refs as long as they aren't too recursive
hudson-ai Nov 15, 2024
6f0d336
from bool
hudson-ai Nov 15, 2024
32b1cc0
use schema.rs
hudson-ai Nov 16, 2024
fb8744e
cargo fmt
hudson-ai Nov 16, 2024
1c89413
comment out block for a minute
hudson-ai Nov 16, 2024
66df303
fix defs
hudson-ai Nov 16, 2024
ea3a395
fix optional items
hudson-ai Nov 16, 2024
6e09951
validate enums and consts
hudson-ai Nov 18, 2024
6421707
check keyword validity
hudson-ai Nov 18, 2024
f050eba
cargo fmt
hudson-ai Nov 18, 2024
0e971cc
make @mmoskal less sad
hudson-ai Nov 19, 2024
be74d89
rename merge -> intersect and have it take ownership
hudson-ai Nov 19, 2024
31c3ba0
mark seen
hudson-ai Nov 19, 2024
ee592b0
make normalize shallow so we don't recursively re-normalize schemas t…
hudson-ai Nov 19, 2024
fd2e166
fix all but the most degenerate refs-with-sibling-keys
hudson-ai Nov 19, 2024
86c2bf3
cargo fmt
hudson-ai Nov 19, 2024
2aa4d87
encapsulate shared context
hudson-ai Nov 19, 2024
63507a9
rough limit on total schema size
hudson-ai Nov 19, 2024
5a884e5
fix sample_parser JsonCompileOptions
hudson-ai Nov 19, 2024
d822980
Merge branch 'main' into json_extension
hudson-ai Nov 19, 2024
a3a1847
take Value as value
hudson-ai Nov 19, 2024
6b9762f
use shallow clone
mmoskal Nov 19, 2024
6249e2c
Merge branch 'json_extension' of https://github.com/hudson-ai/llguida…
mmoskal Nov 19, 2024
6cbfa13
anyhow-ify numeric regex builders
hudson-ai Nov 19, 2024
20004c3
cargo fmt
hudson-ai Nov 20, 2024
ba50642
depend on jsonschema_validation feature
hudson-ai Nov 20, 2024
adac6bc
remove non-tests
hudson-ai Nov 20, 2024
2 changes: 1 addition & 1 deletion parser/src/earley/from_guidance.rs
@@ -97,7 +97,7 @@ fn grammar_from_json(
             input.lark_grammar.is_none(),
             "cannot have both json_schema and lark_grammar"
         );
-        let opts = JsonCompileOptions { compact: false };
+        let opts: JsonCompileOptions = JsonCompileOptions::default();
         opts.json_to_llg(json_schema)?
     } else {
         lark_to_llguidance(input.lark_grammar.as_ref().unwrap())?
2 changes: 1 addition & 1 deletion parser/src/ffi.rs
@@ -288,7 +288,7 @@ fn new_constraint_json(init: &LlgConstraintInit, json_schema: *const c_char) ->
         .map_err(|_| anyhow::anyhow!("Invalid UTF-8 in json_schema"))?;
     let json_schema = serde_json::from_str(json_schema)
         .map_err(|e| anyhow::anyhow!("Invalid JSON in json_schema: {e}"))?;
-    let opts = JsonCompileOptions { compact: false };
+    let opts = JsonCompileOptions::default();
     let grammar = opts
         .json_to_llg(&json_schema)
         .map_err(|e| anyhow::anyhow!("Error compiling JSON schema to LLG: {e}"))?;
341 changes: 259 additions & 82 deletions parser/src/json.rs → parser/src/json/compiler.rs

Large diffs are not rendered by default.

23 changes: 23 additions & 0 deletions parser/src/json/formats.rs
@@ -0,0 +1,23 @@
use lazy_static::lazy_static;
use std::collections::HashMap;

lazy_static! {
static ref FORMAT_PATTERNS: HashMap<&'static str, &'static str> = {
HashMap::from([
("date-time", r"(?P<date>[0-9]{4}-(?:0[1-9]|1[0-2])-(?:0[1-9]|[12][0-9]|3[01]))[tT](?P<time>(?:[01][0-9]|2[0-3]):[0-5][0-9]:(?:[0-5][0-9]|60)(?P<time_fraction>\.[0-9]+)?(?P<time_zone>[zZ]|[+-](?:[01][0-9]|2[0-3]):[0-5][0-9]))"),
("time", r"(?:[01][0-9]|2[0-3]):[0-5][0-9]:(?:[0-5][0-9]|60)(?P<time_fraction>\.[0-9]+)?(?P<time_zone>[zZ]|[+-](?:[01][0-9]|2[0-3]):[0-5][0-9])"),
("date", r"[0-9]{4}-(?:0[1-9]|1[0-2])-(?:0[1-9]|[12][0-9]|3[01])"),
("duration", r"P(?:(?P<dur_date>(?:(?P<dur_year>[0-9]+Y(?:[0-9]+M(?:[0-9]+D)?)?)|(?P<dur_month>[0-9]+M(?:[0-9]+D)?)|(?P<dur_day>[0-9]+D))(?:T(?:(?P<dur_hour>[0-9]+H(?:[0-9]+M(?:[0-9]+S)?)?)|(?P<dur_minute>[0-9]+M(?:[0-9]+S)?)|(?P<dur_second>[0-9]+S)))?)|(?P<dur_time>T(?:(?P<dur_hour2>[0-9]+H(?:[0-9]+M(?:[0-9]+S)?)?)|(?P<dur_minute2>[0-9]+M(?:[0-9]+S)?)|(?P<dur_second2>[0-9]+S)))|(?P<dur_week>[0-9]+W))"),
("email", r"(?P<local_part>(?P<dot_string>[^\s@\.]+(\.[^\s@\.]+)*))@((?P<domain>(?P<sub_domain>[a-zA-Z0-9]([a-zA-Z0-9-]*[a-zA-Z0-9])?)(\.(?P<sub_domain2>[a-zA-Z0-9]([a-zA-Z0-9-]*[a-zA-Z0-9])?))*)|\[(?P<ipv4>((([0-9])|(([1-9])[0-9]|(25[0-5]|(2[0-4]|(1)[0-9])[0-9])))\.){3}(([0-9])|(([1-9])[0-9]|(25[0-5]|(2[0-4]|(1)[0-9])[0-9]))))\])"),
("hostname", r"[a-zA-Z0-9]([a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(\.[a-zA-Z0-9]([a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*"),
("ipv4", r"((([0-9])|(([1-9])[0-9]|(25[0-5]|(2[0-4]|(1)[0-9])[0-9])))\.){3}(([0-9])|(([1-9])[0-9]|(25[0-5]|(2[0-4]|(1)[0-9])[0-9])))"),
("ipv6", r"(?:(?P<full>(?:[0-9a-fA-F]{1,4}:){7}[0-9a-fA-F]{1,4}))|(?:::(?:[0-9a-fA-F]{1,4}:){0,5}(?P<ls32>[0-9a-fA-F]{1,4}:[0-9a-fA-F]{1,4}))|(?:(?P<h16_1>[0-9a-fA-F]{1,4})?::(?:[0-9a-fA-F]{1,4}:){0,4}(?P<ls32_1>[0-9a-fA-F]{1,4}:[0-9a-fA-F]{1,4}))|(?:((?:[0-9a-fA-F]{1,4}:){0,1}[0-9a-fA-F]{1,4})?::(?:[0-9a-fA-F]{1,4}:){0,3}(?P<ls32_2>[0-9a-fA-F]{1,4}:[0-9a-fA-F]{1,4}))|(?:((?:[0-9a-fA-F]{1,4}:){0,2}[0-9a-fA-F]{1,4})?::(?:[0-9a-fA-F]{1,4}:){0,2}(?P<ls32_3>[0-9a-fA-F]{1,4}:[0-9a-fA-F]{1,4}))|(?:((?:[0-9a-fA-F]{1,4}:){0,3}[0-9a-fA-F]{1,4})?::[0-9a-fA-F]{1,4}:(?P<ls32_4>[0-9a-fA-F]{1,4}:[0-9a-fA-F]{1,4}))|(?:((?:[0-9a-fA-F]{1,4}:){0,4}[0-9a-fA-F]{1,4})?::(?P<ls32_5>[0-9a-fA-F]{1,4}:[0-9a-fA-F]{1,4}))|(?:((?:[0-9a-fA-F]{1,4}:){0,5}[0-9a-fA-F]{1,4})?::(?P<h16_2>[0-9a-fA-F]{1,4}))|(?:((?:[0-9a-fA-F]{1,4}:){0,6}[0-9a-fA-F]{1,4})?::)"),
("uuid", r"(?P<time_low>[0-9a-fA-F]{8})-(?P<time_mid>[0-9a-fA-F]{4})-(?P<time_high_and_version>[0-9a-fA-F]{4})-(?P<clock_seq_and_reserved>[0-9a-fA-F]{2})(?P<clock_seq_low>[0-9a-fA-F]{2})-(?P<node>[0-9a-fA-F]{12})"),
("unknown", r"(?s:.*)")
])
};
}

pub fn lookup_format(name: &str) -> Option<&str> {
FORMAT_PATTERNS.get(name).copied()
}
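As a rough illustration of how a table like `FORMAT_PATTERNS` might be consumed, here is a self-contained sketch of the same lookup shape. It deliberately swaps `lazy_static!` for the standard library's `std::sync::OnceLock` so that it builds without external crates, and it copies only the `date` and `unknown` entries from the file above. The `pattern_or_error` helper is hypothetical and is not part of the PR; how the compiler actually treats an unrecognized format is not visible in this diff.

```rust
use std::collections::HashMap;
use std::sync::OnceLock;

/// Lazily built name -> regex table. Only two entries are reproduced here;
/// the PR's table contains more.
fn format_patterns() -> &'static HashMap<&'static str, &'static str> {
    static PATTERNS: OnceLock<HashMap<&'static str, &'static str>> = OnceLock::new();
    PATTERNS.get_or_init(|| {
        HashMap::from([
            ("date", r"[0-9]{4}-(?:0[1-9]|1[0-2])-(?:0[1-9]|[12][0-9]|3[01])"),
            ("unknown", r"(?s:.*)"),
        ])
    })
}

/// Same shape as the PR's `lookup_format`, with the return lifetime spelled
/// out as `'static` since the patterns are string literals.
pub fn lookup_format(name: &str) -> Option<&'static str> {
    format_patterns().get(name).copied()
}

/// Hypothetical caller: turn a missing format into an error message
/// instead of silently accepting any string.
pub fn pattern_or_error(name: &str) -> Result<&'static str, String> {
    lookup_format(name).ok_or_else(|| format!("unsupported format: {name}"))
}
```

Returning `Option` from the lookup leaves the policy decision (reject the schema, or fall back to an unconstrained string) to the caller rather than baking it into the table.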
4 changes: 4 additions & 0 deletions parser/src/json/mod.rs
@@ -0,0 +1,4 @@
pub mod compiler;
mod formats;
mod numeric;
mod substring;