You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
We currently respect JSON Schema's semantics around the additionalProperties keyword; i.e. leaving it unset is interpreted as "any property not specified by the properties keyword (1) is allowed and (2) has no restrictions on its value (other than being valid JSON)".
These semantics are useful (see #887), and I think we should continue respecting them.
That being said, I also think that the majority of our users will expect the LLM to only produce the properties that were explicitly requested. Anything "extra" costs the user time and money, and it will likely be thrown away.
I recommend the following solution:
Recommend that all users use the pydantic interface (passing a BaseModel subclass to the json generation function) UNLESS they want the extra fine-grained control that directly passing a JSON object provides.
Write a custom BaseModel.model_json_schema implementation that sets additionalProperties to False and use it to convert the BaseModel to a JSON schema we'll generate against.
I feel far more comfortable being "opinionated" when our users use the high-level pydantic interface rather than the low-level JSON interface.
Additional notes:
currently users CAN get this behavior from pydantic if they add model_config = ConfigDict(extra="forbid") to their BaseModel
extra is allowed to take on the following values: "ignore" (default), "allow", "forbid"
when extra is "ignore", any additional properties will be discarded when validating an instance, while "allow" will propagate their values to the constructed instance
it should be safe to coerce "ignore" to "forbid" since we know anything extra is going to be discarded anyway
The text was updated successfully, but these errors were encountered:
I think of json as a bit of a "higher level" guidance function (e.g. in contrast with select), and therefore think it's OK to have reasonable but configurable defaults across both the high-level pydantic and slightly lower-level "JSON schema" interfaces.
We've chatted on this, but to continue the discussion openly, my gut feel is that some well-named kwarg on guidance.json that sets the "strict" or "ignore" style behavior is the right thing to do, and let users toggle it to match "true" json schema validation semantics maybe? I agree with your assessment that most people would be surprised to receive non-specified keys back from their constrained schema generation, so we should plan on making that opt-in. Not quite sure on the arg naming yet.
@Harsha-Nori I was swayed to your side after talking, but thank you for documenting here! Currently thinking something like strict_properties, but still undecided.
We currently respect JSON Schema's semantics around the
additionalProperties
keyword; i.e. leaving it unset is interpreted as "any property not specified by theproperties
keyword (1) is allowed and (2) has no restrictions on its value (other than being valid JSON)".These semantics are useful (see #887), and I think we should continue respecting them.
That being said, I also think that the majority of our users will expect the LLM to only produce the properties that were explicitly requested. Anything "extra" costs the user time and money, and it will likely be thrown away.
I recommend the following solution:
pydantic
interface (passing aBaseModel
subclass to thejson
generation function) UNLESS they want the extra fine-grained control that directly passing a JSON object provides.BaseModel.model_json_schema
implementation that setsadditionalProperties
toFalse
and use it to convert theBaseModel
to a JSON schema we'll generate against.I feel far more comfortable being "opinionated" when our users use the high-level
pydantic
interface rather than the low-level JSON interface.Additional notes:
model_config = ConfigDict(extra="forbid")
to theirBaseModel
extra
is allowed to take on the following values: "ignore" (default), "allow", "forbid"extra
is "ignore", any additional properties will be discarded when validating an instance, while "allow" will propagate their values to the constructed instanceThe text was updated successfully, but these errors were encountered: