[RFC] Binary IDL With MessagePack Bytes #5742

Merged
merged 26 commits into from
Sep 18, 2024

Conversation

Future-Outlier
Member

@Future-Outlier Future-Outlier commented Sep 12, 2024

Tracking issue

#5318

Why are the changes needed?

What changes were proposed in this pull request?

How was this patch tested?

Setup process

Screenshots

Check all the applicable boxes

  • I updated the documentation accordingly.
  • All new and existing tests passed.
  • All commits are signed-off.

Related PRs

Docs link


codecov bot commented Sep 12, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 36.30%. Comparing base (e67aae0) to head (bf2a8af).
Report is 4 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #5742      +/-   ##
==========================================
+ Coverage   36.21%   36.30%   +0.08%     
==========================================
  Files        1303     1305       +2     
  Lines      109644   109991     +347     
==========================================
+ Hits        39710    39927     +217     
- Misses      65810    65909      +99     
- Partials     4124     4155      +31     
Flag Coverage Δ
unittests-datacatalog 51.37% <ø> (ø)
unittests-flyteadmin 55.62% <ø> (+0.01%) ⬆️
unittests-flytecopilot 12.17% <ø> (ø)
unittests-flytectl 62.26% <ø> (+0.04%) ⬆️
unittests-flyteidl 7.12% <ø> (ø)
unittests-flyteplugins 53.35% <ø> (ø)
unittests-flytepropeller 41.87% <ø> (+0.12%) ⬆️
unittests-flytestdlib 55.21% <ø> (ø)


@Future-Outlier Future-Outlier changed the title from "[wip] Binary IDL With MessagePack Bytes" to "[wip] [RFC] Binary IDL With MessagePack Bytes" Sep 12, 2024
@Future-Outlier Future-Outlier changed the title from "[wip] [RFC] Binary IDL With MessagePack Bytes" to "[RFC] Binary IDL With MessagePack Bytes" Sep 12, 2024
@Future-Outlier Future-Outlier marked this pull request as ready for review September 12, 2024 12:56
This was referenced Sep 14, 2024
|-----------------------------------|----------------------------------------------|
| Protobuf Struct -> JSON String -> Python Val | Binary (value: MessagePack Bytes, tag: msgpack) IDL Object -> Bytes -> (Dict ->) -> Python Val |

Note: if a Python value can't be converted directly to `MessagePack Bytes`, we can first convert it to a `Dict` and then convert that to `MessagePack Bytes`.
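
The conversion path in this note can be sketched in Python. This is a minimal illustration under stated assumptions, not flytekit's implementation: `TrainConfig` is a hypothetical type, and the packing step is only described in a comment because it relies on the third-party `msgpack` package.

```python
from dataclasses import asdict, dataclass


@dataclass
class TrainConfig:  # hypothetical user-defined type, not taken from the RFC
    learning_rate: float
    epochs: int


config = TrainConfig(learning_rate=0.01, epochs=3)

# Step 1: a value that cannot be packed directly is first reduced to a plain dict.
as_dict = asdict(config)

# Step 2 (not executed here): the dict would then be packed into MessagePack
# bytes, for example with msgpack.dumps(as_dict) from the msgpack package.

# On the way back, the decoded dict is used to rebuild the Python value.
restored = TrainConfig(**as_dict)
```
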
Contributor

@wild-endeavor wild-endeavor Sep 16, 2024

Add a note here stating very clearly that there is no JSON in the new type at all. JSON plays no part in the new spec (except for the schema).

Member Author

No problem

Contributor

@eapolinario eapolinario left a comment

I left a few comments. Overall, I like the direction.

rfc/system/5741-binary-idl-with-message-pack.md
    msgpack_bytes = lv.scalar.json.value
else:
    raise ValueError(f"{tag} is not supported to decode this Binary Literal: {lv.scalar.binary}.")
return msgpack.loads(msgpack_bytes)
Contributor

Can you add the full signature of `to_python_value`? Specifically, we can use `typing.cast(expected_python_type, msgpack.loads(msgpack_bytes))` to get type-checkers to agree with this, right?

Member Author

  1. We don't need this, because we will use `MessagePackDecoder` and `MessagePackEncoder` from `mashumaro.codecs.msgpack` to encode and decode.
    The decoder ensures the value is converted back to exactly the expected type.

  2. `msgpack.dumps` will only be used when dealing with an untyped dict.
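
The difference between the two approaches discussed in this thread can be sketched with the standard library alone. This is an illustration, not flytekit code: `Point` is a hypothetical type, the `decoded` dict stands in for the result of `msgpack.loads(msgpack_bytes)`, and `Point(**decoded)` is only a stand-in for what a typed decoder is assumed to do.

```python
from dataclasses import dataclass
from typing import Any, cast


@dataclass
class Point:  # hypothetical expected Python type
    x: int
    y: int


# Stand-in for the untyped result of decoding MessagePack bytes.
decoded: Any = {"x": 1, "y": 2}

# typing.cast only informs the type checker; it performs no runtime conversion,
# so the object is still a plain dict.
as_cast = cast(Point, decoded)

# Constructing the type explicitly yields a real Point instance at runtime.
as_built = Point(**decoded)
```

`as_cast` remains a `dict` at runtime, which is why a decoder that actually builds the target type gives a stronger guarantee than a cast.
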

rfc/system/5741-binary-idl-with-message-pack.md
}
}
// Use Message Pack as Default Tag for deserialization.
func MakeBinaryLiteral(v []byte) *core.Literal {
Contributor

This is not called anywhere yet (besides a few tests). We can make the tag part of the signature (and assume a default value).
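
A rough sketch of the suggested shape, written in Python for illustration only. The `Binary` class is a hypothetical stand-in for the IDL message, `make_binary_literal` is not the Go function from the diff, and Go itself has no default argument values, so the real change would need a different mechanism.

```python
from dataclasses import dataclass

MESSAGEPACK_TAG = "msgpack"  # tag value as it appears in the RFC table


@dataclass(frozen=True)
class Binary:  # hypothetical stand-in for the IDL Binary message
    value: bytes
    tag: str


def make_binary_literal(value: bytes, tag: str = MESSAGEPACK_TAG) -> Binary:
    """Build a Binary value; the tag is part of the signature and defaults to MessagePack."""
    return Binary(value=value, tag=tag)
```

Callers that only ever produce MessagePack can omit the tag, while tests or future encodings can pass one explicitly.
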

Comment on lines 522 to 525
MsgPack is a good choice because it's smaller and faster than a UTF-8 encoded JSON string.

You can see the performance comparison here: https://github.com/flyteorg/flyte/pull/5607#issuecomment-2333174325
We will use `msgpack` to do it.
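
The size part of this claim can be illustrated with a small sketch. The `pack_small` function below is a hand-written encoder covering only a tiny subset of the MessagePack format (booleans, small non-negative integers, short strings, small maps); it exists only to keep the example dependency-free and is not how the RFC proposes to encode values. It says nothing about speed.

```python
import json


def pack_small(obj) -> bytes:
    """Encode a very small subset of MessagePack, for illustration only."""
    if isinstance(obj, bool):  # checked before int, since bool is a subclass of int
        return b"\xc3" if obj else b"\xc2"
    if isinstance(obj, int) and 0 <= obj <= 0x7F:
        return bytes([obj])  # positive fixint: the value is its own byte
    if isinstance(obj, str):
        raw = obj.encode("utf-8")
        if len(raw) <= 31:
            return bytes([0xA0 | len(raw)]) + raw  # fixstr: 101xxxxx + bytes
    if isinstance(obj, dict) and len(obj) <= 15:
        out = bytes([0x80 | len(obj)])  # fixmap: 1000xxxx + key/value pairs
        for key, value in obj.items():
            out += pack_small(key) + pack_small(value)
        return out
    raise TypeError(f"unsupported value for this sketch: {obj!r}")


value = {"a": 1}
packed = pack_small(value)                   # 4 bytes: 81 a1 61 01
as_json = json.dumps(value).encode("utf-8")  # 8 bytes: {"a": 1}
```

For this one-key object the MessagePack form is half the size of the default `json.dumps` output; the ratio for real payloads depends on their shape.
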
Contributor

This comparison should move to the Alternatives section. The conclusion should be about how this design addresses the two problems we set out to solve (1. a better representation for JSON objects in Flyte, and 2. fixing attribute access once and for all).


1. No JSON Schema provided:

Input is expected as an `Object` (e.g., `{"a": 1}`).
Contributor

This is a JavaScript object, right? Let's add that.

Contributor

Can you also add a section about pasting? How do users paste something in? You can add it to the unresolved questions if you want. If this is a JavaScript object, we should allow JSON pasting, but what about YAML? What about msgpack bytes if they were copied from a binary file?


Input is expected as an `Object` (e.g., `{"a": 1}`).

2. JSON Schema provided:
Contributor

In both cases, with or without JSON schema... what happens after the user enters data? Can you add a short description of what happens? I assume it's some JS msgpack library that will turn the object into bytes.

wild-endeavor previously approved these changes Sep 18, 2024
eapolinario previously approved these changes Sep 18, 2024
@Future-Outlier Future-Outlier enabled auto-merge (squash) September 18, 2024 07:19
@Future-Outlier Future-Outlier enabled auto-merge (squash) September 18, 2024 07:58
Signed-off-by: Future-Outlier <[email protected]>
@Future-Outlier Future-Outlier self-assigned this Sep 18, 2024
@Future-Outlier Future-Outlier added the rfc A label for RFC issues label Sep 18, 2024
@Future-Outlier Future-Outlier merged commit 312910d into master Sep 18, 2024
47 of 48 checks passed
@Future-Outlier Future-Outlier deleted the MessagePack-IDL branch September 18, 2024 11:15
@eapolinario
Contributor

Thank you for all your work on this, @Future-Outlier! This feature is going to solve a massive pain in the Flyte ecosystem.

return true
}

return isSameTypeInJSON(upstreamMetadata, downstreamMetadata) ||\
Member

Thank you @Future-Outlier for identifying a solution to this problem 🙇

Labels
rfc A label for RFC issues
Projects
Status: Implementation in progress

4 participants