Additional Metadata for Schema? #192

Open
jamesmunns opened this issue Nov 28, 2024 · 6 comments

@jamesmunns (Owner)

CC @max-heller and #179

There have been a couple of asks for additional schema metadata. Off the top of my head:

  • Things like max size (for bounded types, and possibly annotations for unbounded types)
  • Things like descriptions/comments (though some of this veers into postcard-rpc's Endpoints and Topics, which tend to benefit from metadata as well)

Open questions would be:

  • Should (some or all of) these fields affect the schema hash calculation?
  • How can users opt-in/out of sending this information over the wire?
@max-heller (Collaborator)

@jamesmunns (Owner, Author)

Whatcha mean by custom enum discriminants? (Postcard specifically states that it uses "lexical ordering")

@max-heller (Collaborator)

max-heller commented Nov 28, 2024

Whatcha mean by custom enum discriminants? (Postcard specifically states that it uses "lexical ordering")

enum Foo {
    A = 1,
    ...
}

Similar to comments, serde and postcard don't care about discriminants and use a 0-indexed "lexical ordering", but some use cases of Schema as a reflection mechanism might.
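
For illustration, a quick check of that behavior (a sketch assuming postcard's alloc feature for to_allocvec): the declared discriminants never reach the wire, only the 0-based variant index does.

use serde::Serialize;

#[derive(Serialize)]
enum Foo {
    A = 1, // explicit discriminant, ignored by serde/postcard
    B = 7,
}

fn main() {
    // postcard writes the 0-based variant index as a varint, not 1 and 7:
    assert_eq!(postcard::to_allocvec(&Foo::A).unwrap(), vec![0u8]);
    assert_eq!(postcard::to_allocvec(&Foo::B).unwrap(), vec![1u8]);
}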

@max-heller (Collaborator)

Things like max size (for bounded types, and possibly annotations for unbounded types)

Annotations as in something like this?

#[postcard(serialized_size(max = 512))]
bytes: Vec<u8>,

Should (some or all of) these fields affect the schema hash calculation?

One way this could work is by having a wrapper type for customizing hashing behavior:

struct HashBy<T> {
    // Which fields to include in the hash
    fields: Fields,
    value: T,
}
// Could be a bitset or something more compact
struct Fields {
    names: bool,
    max_size: bool,
    ...
}
impl Hash for HashBy<NamedType> {} // hash only the pieces selected by `fields`
...

How can users opt-in/out of sending this information over the wire?

I could see this working with a SerializeWith<T> wrapper (similar to the one above for hashing) combined with optional/defaulted fields for max size, comments, etc. on the deserializing side.
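
As a rough sketch of the optional/defaulted-fields half of that idea (the type and field names below are made up, not postcard-schema's real types), Option<T> keeps omitted metadata down to a single None tag byte in postcard's non-self-describing format, as long as both peers agree the field exists:

use serde::{Deserialize, Serialize};

// Hypothetical metadata-carrying schema node; not postcard-schema's actual types.
#[derive(Serialize, Deserialize)]
struct FieldMeta {
    name: String,
    doc: Option<String>,   // None => no comment sent over the wire
    max_size: Option<u32>, // None => unbounded / unannotated
}

fn main() {
    let bare = FieldMeta { name: "bytes".into(), doc: None, max_size: None };
    let full = FieldMeta {
        name: "bytes".into(),
        doc: Some("payload".into()),
        max_size: Some(512),
    };
    // Each omitted Option costs one byte (the None tag); present metadata
    // adds its encoded length on top (assumes postcard's alloc feature).
    let a = postcard::to_allocvec(&bare).unwrap();
    let b = postcard::to_allocvec(&full).unwrap();
    assert!(a.len() < b.len());
}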

One other open question:

  • How much will additional (unused) metadata affect type and binary sizes? Ideally it could be optimized out and comment strings wouldn't end up getting embedded in binaries if only the basic schema is needed, but I'm not sure just how smart the compiler is with consts

@jamesmunns (Owner, Author)

re: hashing, I specifically meant what postcard-rpc does for creating a Key from a NamedType.

re: annotations, it could mean that! It would specifically be an annotation used when deriving postcard-schema::Schema (from the postcard-derive crate). This data would show up in NamedType (or somewhere similar). It's unclear if/how this would affect postcard itself (e.g. should we reject serializing/deserializing types that exceed this annotation? for example, a String with a max of 512 that contains 600 bytes).

re: enum discriminants, hmm, that makes sense; I wonder if including them adds more confusion than value.

re: "How much will additional (unused) metadata affect type and binary sizes", I would assume for "non-postcard-rpc users", it would be elided. However postcard-rpc supports sending the schemas for all endpoints, so I would assume it would be included in those cases. I don't assume the compiler is smart enough (yet) to totally remove unused fields (only totally unused consts).

@max-heller (Collaborator)

re: hashing, I specifically meant what postcard-rpc does for creating a Key from a NamedType.

The least surprising option would probably be to consider only the pieces that break wire compatibility if changed, i.e. only the serde-relevant pieces.
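
For example (a sketch assuming postcard's alloc feature): field names are one piece that doesn't affect postcard's wire format, so a rename alone doesn't break wire compatibility.

use serde::Serialize;

#[derive(Serialize)]
struct V1 { count: u32, name: String }

// Same shape with renamed fields: identical postcard wire format.
#[derive(Serialize)]
struct V2 { total: u32, label: String }

fn main() {
    let a = postcard::to_allocvec(&V1 { count: 7, name: "x".into() }).unwrap();
    let b = postcard::to_allocvec(&V2 { total: 7, label: "x".into() }).unwrap();
    // Identical bytes: a name-only change keeps wire compatibility.
    assert_eq!(a, b);
}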

should we reject serializing/deserializing types that exceed this annotation? for example a String with a max of 512 but contains 600 bytes

It would be nice to have a serializer/deserializer flag to reject oversized inputs (re: #135), but that might be tricky to integrate with the various postcard::from_*() helpers.
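
Until something like that exists, a caller-side guard layered over the existing helpers is probably the workaround; a minimal sketch, assuming postcard's alloc feature and the hypothetical 512-byte bound from the annotation above:

use serde::Deserialize;

#[derive(Deserialize)]
struct Msg {
    bytes: Vec<u8>,
}

// Manual post-deserialization guard; postcard itself does not enforce any
// size annotation today, and 512 is the hypothetical bound from above.
fn from_bytes_bounded(input: &[u8]) -> Result<Msg, &'static str> {
    let msg: Msg = postcard::from_bytes(input).map_err(|_| "deserialize failed")?;
    if msg.bytes.len() > 512 {
        return Err("`bytes` exceeds annotated max of 512");
    }
    Ok(msg)
}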

re: enum discriminants, hmm, that makes sense, I wonder if this adds more confusion than is useful.

It might, but I wanted to mention it since discriminants are meaningful in some cases.
