Encountering $schema during meta-schema validation #1442

Open
gregsdennis opened this issue Sep 20, 2023 · 22 comments

Comments

@gregsdennis
Member

gregsdennis commented Sep 20, 2023

In #1434, @karenetheridge left a comment regarding special-casing meta-schema validations. This issue continues that conversation.

When a schema being validated against a meta-schema contains a subschema with a different $schema value, the meta-schema cannot be expected to validate the entire document. Rather, it must apply only to the appropriate portions of the schema.

An example to illustrate the problem:

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$id": "base",
  "type": "object",
  "properties": {
    "foo": { "$ref": "foo" }
  },
  "$defs": {
    "foo-def": {
      "$schema": "http://json-schema.org/draft-07/schema#",
      "$id": "foo",
      "type": "array",
      "items": [
        { "type": "string" },
        { "type": "integer" }
      ]
    }
  }
}

This schema is valid, but it changes meta-schemas in the foo-def definition. Validating it as an instance against the draft 2020-12 meta-schema would fail because foo#/items uses the array form of items, which is disallowed in 2020-12.
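To make the failure concrete, here is a minimal sketch of the one rule involved. The function names are hypothetical, and only the shape check for `items` is modelled (not the full meta-schemas): in 2020-12, `items` must itself be a schema, whereas draft-07 also accepts a non-empty array of schemas.

```python
# Hypothetical helpers; only the structural rule for `items` is modelled here.

def is_schema_shape(value):
    """A schema is a JSON object or a boolean."""
    return isinstance(value, (dict, bool))


def items_ok_in_2020_12(value):
    """2020-12: `items` must be a single schema."""
    return is_schema_shape(value)


def items_ok_in_draft_07(value):
    """Draft-07: `items` is a single schema or a non-empty array of schemas."""
    if isinstance(value, list):
        return len(value) >= 1 and all(is_schema_shape(v) for v in value)
    return is_schema_shape(value)


# The `items` value from the foo-def resource in the example above.
foo_items = [{"type": "string"}, {"type": "integer"}]

print(items_ok_in_2020_12(foo_items))   # False
print(items_ok_in_draft_07(foo_items))  # True
```

So the same value is acceptable only if the checker has switched to the dialect that foo-def declares.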

It's necessary for the validator to have special knowledge that it's performing meta-schema validation so that when it encounters a new $schema it can switch meta-schemas appropriately.

While Core 9.3.3 does address this, does it need to be clearer?

Given that a Compound Schema Document may have embedded resources which identify as using different dialects, these documents SHOULD NOT be validated by applying a meta-schema to the Compound Schema Document as an instance. It is RECOMMENDED that an alternate validation process be provided in order to validate Schema Documents. Each Schema Resource SHOULD be separately validated against its associated meta-schema.

This also ties in a little to Using $schema in JSON documents.

@gregsdennis gregsdennis added this to the stable-release milestone Sep 20, 2023
@jdesrosiers
Member

If someone has better words to suggest that they think are clearer, I'm open to that, but I think the spec already says everything it needs to say.

@gregsdennis
Member Author

@Julian do you think we can set up a test for this? I mean, I created a test case above, but:

  1. Can the test indicate that it should be validating as a meta-schema?
  2. Is it a problem that this spans multiple drafts?

We already have the cross-draft.json tests in the optional directories, but they test changing drafts through a $ref rather than being embedded.

@jdesrosiers
Member

We already have the cross-draft.json tests in the optional directories, but they test changing drafts through a $ref rather than being embedded.

The whole point of redefining $id as indicating an embedded schema was to make the behavior identical to referencing an external schema. So, I think "cross-draft" is a perfect place for these kinds of tests.

@Julian
Member

Julian commented Sep 20, 2023

I think cross-draft looks right for that test yeah, at least at a first glance.

@gregsdennis
Member Author

(It should also be noted that this can still happen in our stable spec world if the embedded schema's meta-schema declares a different vocabulary.)

@gregsdennis
Member Author

@jdesrosiers @Julian, so cross-draft seems like a good place, assuming that the current test structure can support a meta-schema validation, but.... can the current test structure support a meta-schema validation (as opposed to just a schema validation of data that happens to also be a schema)?

@jdesrosiers
Member

can the current test structure support a meta-schema validation (as opposed to just a schema validation of data that happens to also be a schema)?

I don't think it needs to. The tests don't need to validate the schema directly; what matters is the result of using the schema to validate instances. The example given at the top is a good example. An implementation that doesn't switch dialects properly will error on that schema, which means it fails the test as expected. An implementation that does switch dialects properly should be able to pass a test that expects { "foo": ["a", 42] } to validate to true.
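As a rough sketch of what such a test could look like, the snippet below builds one case in the general shape used by the test suite files (a list of cases, each with "description", "schema" and "tests"). The description strings and the second, failing instance are assumptions added for illustration; only the schema and the `{ "foo": ["a", 42] }` instance come from this thread.

```python
import json

# Schema from the opening comment: 2020-12 at the root, draft-07 embedded.
schema = {
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "$id": "base",
    "type": "object",
    "properties": {"foo": {"$ref": "foo"}},
    "$defs": {
        "foo-def": {
            "$schema": "http://json-schema.org/draft-07/schema#",
            "$id": "foo",
            "type": "array",
            "items": [{"type": "string"}, {"type": "integer"}],
        }
    },
}

# Hypothetical test case; descriptions are made up for this sketch.
case = {
    "description": "embedded draft-07 resource inside a 2020-12 schema",
    "schema": schema,
    "tests": [
        {
            "description": "array-form items is applied under draft-07",
            "data": {"foo": ["a", 42]},
            "valid": True,
        },
        {
            # Assumption: an extra negative test, not discussed in the thread.
            "description": "first item must be a string under draft-07",
            "data": {"foo": [42, "a"]},
            "valid": False,
        },
    ],
}

print(json.dumps([case], indent=2))
```

An implementation that keeps using 2020-12 for foo-def would either reject the schema or ignore the array-form `items`, and so would not produce both expected results.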

If we switch the dialects on that example and put draft-07 at the root and 2020-12 embedded, we can't test that an implementation should consider that schema invalid, but that's not a new problem for the test suite. In general we can't test for schemas that we expect to be invalid.

@gregsdennis
Member Author

Okay. So we'll have to get creative, but the current setup can handle it.

(Interestingly, my implementation fails that test. Adding that to the list of things to do today...)

I wonder if these need to be optional tests because an implementation may not support (e.g.) draft 7.

@jdesrosiers
Member

I wonder if these need to be optional tests because an implementation may not support (e.g.) draft 7.

Yeah, that's the reason I didn't contribute tests like this a long time ago. It depends on what dialects an implementation supports. We could define a custom dialect, but that leaves out implementations that don't support that, even though that functionality isn't actually a dependency of this behavior.

@Julian
Member

Julian commented Sep 26, 2023

cross-draft.json is already in the optional directory (precisely for the above reason, that it depends which drafts an implementation supports).

@gregsdennis
Member Author

I think we may need to update Core 8.1, which contains

A schema MUST successfully validate against its meta-schema

Maybe it just needs a link to 9.3.3 (add "as per section 9.3.3" or something)?

@Julian
Member

Julian commented Sep 28, 2023

Because you mention it, I'll explicitly point out that to me it's also worth considering what the schema {"$ref": "https://json-schema.org/draft/2020-12/schema"} has to say about this example.

I.e. do we consider that that schema has any magic behavior where it will switch to using the appropriate draft when it encounters $schema in the subresource?

If we don't think it has any magic behavior today based on the language of the existing spec, should it, if only to meet the requirements of the line you're pointing out?

@gregsdennis
Member Author

With the new dialect requirements, {"$ref": "https://json-schema.org/draft/2020-12/schema"} isn't strictly valid. (It would need some kind of external dialect identification, like media type parameter or implementation setting.)

Assuming that we know the dialect (and thus the meta-schema), yes, I'd expect the operational dialect to shift to 2020-12 after following the $ref. But I don't consider that "magic" behavior.

@Julian
Member

Julian commented Sep 28, 2023

What do you mean by you expect that to be the case -- are you claiming that's already specified by the existing spec? Or saying you want to add language to require that behavior in the new version?

It's certainly magical to me in that if you evaluate all the keywords in the draft2020-12 metaschema "normally" you would not get that behavior, as they specify that the values of applicator keywords are objects valid under itself (via the dynamicRef inside of it).

@gregsdennis
Member Author

I think 9.3.3 (in the opening comment) does already specify this. Maybe it could be clearer somehow, but it already defines this behavior.

I agree it's less than ideal that you can't validate a schema as an ordinary instance, though. But I don't know a way around it.

@Julian
Member

Julian commented Sep 28, 2023

I think 9.3.3 (in the opening comment) does already specify this.

Which behavior? To me, 9.3.3 says the opposite of what you said you expect the behavior to be.
To me it says that the schema {"$ref": "https://json-schema.org/draft/2020-12/schema"} when applied to your example will ignore the $schema in the subresource and continue on applying keywords as "usual", with no special behavior.

That section therefore reminds implementers that they need some separate interface for specifically validating schemas as schemas.

But if you give me the schema {"$ref": "https://json-schema.org/draft/2020-12/schema"}, that has no "special" behavior in the existing specs.

My question/point was to say "if you're discussing this, maybe we should consider making it have special behavior by requiring implementers to treat the $ref keyword specially when the reference is a metaschema known by the implementation." Or maybe not. But I don't think the spec says what you say you expect the behavior to be today.

@gregsdennis
Member Author

I've misunderstood your question, then. I thought you were asking about meta-schema validation of {"$ref": "https://json-schema.org/draft/2020-12/schema"} which needs a starting dialect, and {"$ref": "https://json-schema.org/draft/2020-12/schema"} itself should be validated according to that dialect.

My question/point was to say "if you're discussing this, maybe we should consider making it have special behavior by requiring implementers to treat the $ref keyword specially when the reference is a metaschema known by the implementation." Or maybe not. But I don't think the spec says what you say you expect the behavior to be today.

I'm not saying a $ref to a known meta-schema needs to be treated like a meta-schema validation. It's only through $schema that we get to "meta-schema validation mode."

I'd expect $ref-ing to a known meta-schema would treat that meta-schema like any other schema.

If you're then going to ask about our meta-schemas $ref-ing to each other, if you're starting in meta-schema validation mode, you stay in that mode even when following $refs.

@Julian
Member

Julian commented Sep 28, 2023

Got it, OK. At least I follow the logic now, thanks!

@jdesrosiers
Member

Maybe it just needs a link to 9.3.3 (add "as per section 9.3.3" or something)?

I think this just needs to say, "A schema resource MUST successfully validate against its meta-schema". That makes it clear that it doesn't apply to compound schema documents. If we think it's necessary, we can also mention compound schema documents and link to 9.3.3.

Just to add my perspective to the other part of the discussion. I don't think there should be any "magic" involved in following a reference. The "magic" is that the compound schema needs to be deconstructed into its schema resources. Then each schema resource gets validated against its meta-schema in the normal way.
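A minimal sketch of that deconstruction step follows. It is not how any particular implementation does it, and it makes several assumptions: `split_resources` is a hypothetical name; any nested object with a string `$id` that is not fragment-only is treated as an embedded resource (a real implementation would only descend through keywords that hold subschemas); an embedded resource without its own `$schema` inherits the enclosing dialect; and each embedded resource is replaced in its parent by a `$ref` stub so the parent can be checked on its own.

```python
def split_resources(document, default_dialect=None):
    """Return a list of (id, dialect, resource) triples, parents before children."""
    resources = []

    def is_resource_root(node):
        ident = node.get("$id")
        # Simplification: fragment-only $id values (draft-07 style anchors)
        # are not treated as new resources.
        return isinstance(ident, str) and not ident.startswith("#")

    def strip(node, dialect):
        """Copy `node`, replacing embedded resources with $ref stubs."""
        if isinstance(node, list):
            return [strip(item, dialect) for item in node]
        if not isinstance(node, dict):
            return node
        if is_resource_root(node):
            extract(node, dialect)
            return {"$ref": node["$id"]}
        return {key: strip(value, dialect) for key, value in node.items()}

    def extract(root, inherited_dialect):
        dialect = root.get("$schema", inherited_dialect)
        index = len(resources)
        resources.append(None)  # reserve the slot so parents come first
        body = {key: strip(value, dialect) for key, value in root.items()}
        resources[index] = (root.get("$id"), dialect, body)

    extract(document, default_dialect)
    return resources


compound = {
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "$id": "base",
    "type": "object",
    "properties": {"foo": {"$ref": "foo"}},
    "$defs": {
        "foo-def": {
            "$schema": "http://json-schema.org/draft-07/schema#",
            "$id": "foo",
            "type": "array",
            "items": [{"type": "string"}, {"type": "integer"}],
        }
    },
}

for ident, dialect, resource in split_resources(compound):
    print(ident, dialect)
```

Each triple can then be handed to ordinary validation: the resource as the instance, the meta-schema named by its dialect as the schema.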

@gregsdennis
Member Author

"A schema resource MUST successfully validate against its meta-schema"

@json-schema-org/core-team does it make sense to everyone that "schema resource" is exclusive of embedded schema resources? If so, I can work up a PR.

@jdesrosiers
Member

jdesrosiers commented Oct 5, 2023

That's my understanding of the concept, but reading through the spec, I don't think that's made clear. It's implied in 9.3.3, but that's all.

I think referring to a "Schema Resource" is sufficient for this issue, but I also think we should follow up later to clarify the definition of "Schema Resource" where it's defined.

@karenetheridge
Member

karenetheridge commented Oct 17, 2023

I agree it's less than ideal that you can't validate a schema as an ordinary instance, though. But I don't know a way around it.

The way around it is to forbid changing dialects in the middle of a document. If you want to combine schemas with different dialects, you need to use a $ref to a separate document.

That is - $schema can be legal at the top of any embedded schema resource, but if it is present it must match the root's $schema value, or any inferred dialect e.g. via a Content-Type header.

Is there any compelling reason why we need to allow changing dialects midway through a document? Forbidding it would make a lot of things (like this!) simpler. Bundling is the only thing I can think of, but how often does one want to bundle things together that use different dialects? Usually people just stick to one (the specification schema, or a slight variant of that e.g. one that forbids extra keywords, or defines a few "in-house" keywords that are used everywhere).
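For illustration, the proposed restriction could be checked with something like the sketch below. The function name `dialect_conflicts` is hypothetical, the walk is naive (it visits every object, not just subschema locations), and dialects are compared as plain strings with no URI normalization.

```python
def dialect_conflicts(document, default_dialect=None):
    """Return (json_pointer, dialect) for each "$schema" that differs from the root's."""
    root_dialect = document.get("$schema", default_dialect)
    conflicts = []

    def walk(node, pointer):
        if isinstance(node, dict):
            declared = node.get("$schema")
            if isinstance(declared, str) and declared != root_dialect:
                conflicts.append((pointer, declared))
            for key, value in node.items():
                # Escape "~" and "/" as JSON Pointer requires.
                token = key.replace("~", "~0").replace("/", "~1")
                walk(value, pointer + "/" + token)
        elif isinstance(node, list):
            for index, value in enumerate(node):
                walk(value, f"{pointer}/{index}")

    walk(document, "")
    return conflicts


# The compound schema from the opening comment.
compound = {
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "$id": "base",
    "type": "object",
    "properties": {"foo": {"$ref": "foo"}},
    "$defs": {
        "foo-def": {
            "$schema": "http://json-schema.org/draft-07/schema#",
            "$id": "foo",
            "type": "array",
            "items": [{"type": "string"}, {"type": "integer"}],
        }
    },
}

print(dialect_conflicts(compound))
# [('/$defs/foo-def', 'http://json-schema.org/draft-07/schema#')]
```

Under the proposed rule, a non-empty result would mean the document is rejected and the draft-07 part has to move to a separate document reached via $ref.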

Projects
Status: In Discussion