Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Canonical JSON #1226

Open
sudo-bmitch opened this issue Dec 12, 2024 · 7 comments
Open

Canonical JSON #1226

sudo-bmitch opened this issue Dec 12, 2024 · 7 comments

Comments

@sudo-bmitch
Copy link
Contributor

The spec currently recommends canonical JSON. That has the following requirement:

The members production in object must consist of keys in lexicographically sorted order.

However, a majority of the implementations use Go's encoding/json package which outputs fields in the order they are included in the struct. Should the spec be updated to align with the common behavior, and call out the OCI provided Go structs as the recommended order of the fields?

@tianon
Copy link
Member

tianon commented Dec 12, 2024

I'd honestly just axe that whole section -- JSON isn't supposed to be order-dependent (even though it is in practice).

@sudo-bmitch
Copy link
Contributor Author

I have an ideal goal of reproducible JSON, even if it doesn't follow the official canonical spec. Reproducible images today, when implemented, often depend on using the same builder, sometimes even the same version of that builder. If we can get to the point of different tools being able to reproduce each other's output, that would improve security in the ecosystem, detecting issues not just with a build host, but an issue with the build tooling itself. I'm hoping the spec can give those implementations a way to interoperate.

@tianon
Copy link
Member

tianon commented Dec 13, 2024

If we say our structs are the "canonical order", then 1) we need to evaluate all our structs to decide if we "like" their ordering and 2) decide what to do with fields we don't specify (yet?), which are valid per the spec. I don't think the benefits are worthwhile here -- if users want "reproducible" images, they'll need to choose an order, and "sorted" order is much saner and more well-supported in general than "the arbitrary order of the structs in the image-spec repo" (which in some languages/tools will be impossible to enforce/adhere to).

See golang/go#63397 (comment) for some related discussion in the Go context specifically (in the encoding/json/v2 pre-proposal discussion).

@sudo-bmitch
Copy link
Contributor Author

If we say our structs are the "canonical order", then 1) we need to evaluate all our structs to decide if we "like" their ordering

They've been widely used for a long time, so I think there's implicit agreement from the community with the current order.

and 2) decide what to do with fields we don't specify (yet?), which are valid per the spec.

I'm not sure those matter for the reproducibility issue. New fields would get added as needed, with a location decided at the time of the PR, likely with omitempty. Anyone using a new field is intentionally non-reproducible with older tools that do not use the new field. And anyone that adds a custom field is similarly non-reproducible with other tools that don't have those same custom fields.

I don't think the benefits are worthwhile here -- if users want "reproducible" images, they'll need to choose an order, and "sorted" order is much saner and more well-supported in general than "the arbitrary order of the structs in the image-spec repo" (which in some languages/tools will be impossible to enforce/adhere to).

I agree that sorted is saner, and an ideal I'd like to get to. But from the current tooling, it's largely not done, so I'm looking to document the existing state of the ecosystem.

See golang/go#63397 (comment) for some related discussion in the Go context specifically (in the encoding/json/v2 pre-proposal discussion).

If the json/v2 makes sorted struct entries easy to implement, then I'd be in favor to change the advice back to remove the exception for field ordering when that's used in the ecosystem. We could even make that an implementers note now to encourage a future migration.

@rchincha
Copy link

rchincha commented Dec 19, 2024

JSON isn't supposed to be order-dependent (even though it is in practice).

because folks use the same tooling, hence get the same order?

Also, not all tooling is expected to be in golang?

if users want "reproducible" images

hmm.. mandate this in spec and move to "ordered maps" in implementation? - so keep current spec as "canonical oci".

Guidance that "stick with a particular ecosystem/tooling" is practical advice but weird coming from a "standards" body.

Lexicographic ordering - too big a lift wrt existing tooling.

@sudo-bmitch
Copy link
Contributor Author

sudo-bmitch commented Dec 19, 2024

Looking over Go's json/v2, I'm seeing functionality for converting JSON output to canonical JSON output: https://pkg.go.dev/github.com/go-json-experiment/json/jsontext#Value.Canonicalize

And on StackOverflow, there's this suggestion for how one could implement a Canonicalized output today without too much code: https://stackoverflow.com/a/61887446/596285

func JSONRemarshal(bytes []byte) ([]byte, error) {
    var ifce interface{}
    err := json.Unmarshal(bytes, &ifce)
    if err != nil {
        return nil, err
    }
    return json.Marshal(ifce)
}

I think there are a few options:

  1. Delete the canonical recommendation, which makes reproducibility between tools a problem for individual tools to resolve.
  2. Leave the canonical recommendation as is, and leave the spec unchanged, and Go implementations ideally canonicalize their output (with either json/v2 or workarounds). This leaves a gap where our advice does not match real world usage of the spec.
  3. Change the recommendation to align with the common Go output today, with a note that canonical output will likely be recommended in the future.
  4. Reorder the structs to make Go's output match canonical output. There will be difficult edge cases in trying to implement this, particularly with the embedded Platform field in

Of these choices, I feel that 3 gives the best guidance to implementers of the spec.

(Edited to add StackOverflow hack.)

@tianon
Copy link
Member

tianon commented Dec 19, 2024 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants