Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Non-string tag values #134

Closed
occasionallydavid opened this issue Jun 28, 2022 · 10 comments · Fixed by #135
Closed

Non-string tag values #134

occasionallydavid opened this issue Jun 28, 2022 · 10 comments · Fixed by #135

Comments

@occasionallydavid
Copy link

I have an existing encoding that almost-but-not-quite fits msgspec. The "almost" is that tags are integer rather than string. Would it make sense to accept non-string tags? I think the only non-string case that would be valuable is for integers.

@jcrist
Copy link
Owner

jcrist commented Jun 29, 2022

Thanks for opening this issue - this is something I thought people may find useful, but didn't want to take the time to implement it until someone asked for it. Definitely doable though, I'll add it to the queue.

Out of curiosity - is using ints for tags something common in your org/industry/field/...? Or just something you came up with? I'm trying to determine how common this use case is.

@occasionallydavid
Copy link
Author

occasionallydavid commented Jun 29, 2022

In this project it is coming from the frontend, where all the tags are integer constants. Similarly in other projects a non-integer tag would have meant an additional mapping from those strings to however the structure is represented internally, for example an enum. I'm pretty sure there are a couple of open source RPC libraries that transcode to JSON that use something like this too. Guessing it'll be a good addition for a bunch of random use cases (mostly interop)

edit: I may be wrong, but when I was thinking "open source RPC" I suspect I might mean protobuf's JSON encoding. If it wasn't protobuf, it might have been "one of the other big N" libraries of that class

@jcrist
Copy link
Owner

jcrist commented Jul 1, 2022

Thanks. A quick bit of googling turns up other use cases (see e.g. serde-rs/serde#745), so you're definitely not the only one with this issue. I've implemented this in #135.

@jcrist
Copy link
Owner

jcrist commented Jul 1, 2022

There's a few other fixes I want to get in before the next release, but if you want to try things out by installing from github, your feedback would be welcome.

$ pip install git+https://github.com/jcrist/msgspec.git

@occasionallydavid
Copy link
Author

Whoa that's a huge diff :) This sounded straightforward but I guess not. Thanks a ton, I'll give it a whirl over the weekend

@jcrist
Copy link
Owner

jcrist commented Jul 5, 2022

This sounded straightforward but I guess not.

A lot of the components were already in place, but needed some massaging to work together. msgspec works hard to minimize allocations, so we don't allocate a PyObject (an int in this case) for the decoded tag. Instead we decode into and lookup with a standard int64_t. Most of the commit is just wiring up this code path from existing components, and tests to ensure correctness for both valid & invalid inputs.

Thanks a ton, I'll give it a whirl over the weekend.

No problem! I hope it's useful for you.

@occasionallydavid
Copy link
Author

Given:

import typing

import msgspec

from . import flags


OP_MAP = {
    'TaggedBase': 0,
    'SetThreadLabelStatus': flags.CHG_SET_THREAD_LABEL_STATUS,
}

GET_TAG = OP_MAP.__getitem__


class TaggedBase(msgspec.Struct,
                 tag=GET_TAG,
                 array_like=True):
    pass


class SetThreadLabelStatus(TaggedBase):
    thread_ids: typing.List[str] = []
    label: str
    applied: bool


Possible = typing.Union[
    SetThreadLabelStatus,
]

Root = typing.List[Possible]

An encode() call like:

encode([SetThreadLabelStatus(thread_ids=["123","234"], label='inbox',  applied=True)]))

Is producing:

b'[[2,"inbox",true,["123","234"]]]'

The docs say:

If you pass array_like=True when defining the struct type, they’re instead treated as array types during encoding/decoding (with fields serialized in their definition order).

Am I doing something wrong? thread_ids is rendering last rather than first

@jcrist
Copy link
Owner

jcrist commented Jul 6, 2022

The issue here is that thread_ids has a default value, so it's shifted to the end. We don't do keyword-only arguments, so any arguments without defaults are standard positional arguments and must be before any optional keyword arguments in the struct's __init__. The ordering used for __init__ is used everywhere else as well. You can see the field ordering via YourStruct.__struct_fields__:

>>> SetThreadLabelStatus.__struct_fields__
('label', 'applied', 'thread_ids')

So in your case if the field ordering here matters, you either need to remove the default arg for thread_ids, or add defaults for the other args.

I've tried to clarify the documentation in #137, but am not sure if that clarification is sufficient. Can you take a look at #137 and let me know if it would have answered your question (and if not, if you have any suggestions for improvement)?

@occasionallydavid
Copy link
Author

Thanks so much for this, now I'm using it in the project :)

@jcrist
Copy link
Owner

jcrist commented Jul 7, 2022

Glad to hear it! Please let me know how things work out and if you find any noticeable improvements from using msgspec (performance/maintainability/usability/...).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants