Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unify non-binary and binary data, take 2 #284

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

kelnos
Copy link
Contributor

@kelnos kelnos commented Mar 15, 2024

This allows users to create a payload struct (that implements Serialize/Deserialize) that has any binary payload data embedded in the struct directly, rather than needing to look in a "side" Vec of payload data, and have to guess what order the binary payloads fit into their data model.

To accomplish this, there are new Serializer and Deserializer implementations that mostly wrap the Ser/Deser impls provided for serde_json::Value. Unfortunately serde_json doesn't expose everything necessary to do this in a truly simple way; some duplication of serde_json's functionality is needed.

NB: due to difficulties with lifetimes, the deserializer has to currently clone the Vec of binary payloads as they get passed around to sub-deserializers. While this isn't the worst thing (cloning Bytes is cheap), it's not free, and this needs to be revisited and fixed.

Depends on #285.
Closes #276.
Supersedes #283.

I also took the opportunity to migrate both socketioxide and engineioxide from representing binary payloads as Vec<u8> to bytes::Bytes. This is more or less necessary in socketioxide, but isn't strictly necessary for engineioxide, though IMO it's nice to be consistent. I can move that to a separate PR if you'd prefer.

Some notes:

  • Aside from the very basic tests I wrote, I haven't tested this. I've done a little more ad-hoc testing and have fixed up all of the unit/doc/e2e tests, so I feel a bit more confident about the code.
  • I haven't benchmarked this; would like to solve the lifetimes issue so I can stop cloning the binary payloads first.
  • There's probably some documentation that needs to be updated, will need to go through it. At the very least there should be a mention that if anyone has a data type that contains binary payloads and wants to convert to a serde_json::Value, they should use socketioxide::to_value() instead of serde_json::to_value(), as the former will extract binary bits and replace with placeholders, while the latter will serialize the binary bits to a JSON array of numbers.
  • With this new implementation, we don't remove the Bin extractor or the .bin() methods on the socket/ack/etc. If the user is using serde_json::Value instead of a custom struct with embedded Bytes members, they'll need both of those things.
  • Despite Bin still being there, it maybe should implement FromMessageParts, since cloning Bytes is cheap, and (hopefully) the most common usage should be Data with a custom struct with embedded Bytes members. Data, in this case, should probably implement FromMessage only. As discussed before, I think TryData should remain as FromMessageParts in case users need to attempt to deserialize more than once. I haven't implemented these changes.

socketioxide/src/value/de.rs Fixed Show fixed Hide fixed
socketioxide/src/value/de.rs Fixed Show fixed Hide fixed
socketioxide/src/value/de.rs Fixed Show fixed Hide fixed
socketioxide/src/value/de.rs Fixed Show fixed Hide fixed
@kelnos kelnos force-pushed the integrated-data-binary-take-2 branch 2 times, most recently from 5d620be to 3a795d1 Compare March 15, 2024 05:31
@kelnos kelnos force-pushed the integrated-data-binary-take-2 branch from 3a795d1 to d1e30f6 Compare March 15, 2024 05:35
socketioxide/src/value/de.rs Fixed Show fixed Hide fixed
socketioxide/src/value/de.rs Fixed Show fixed Hide fixed
@kelnos kelnos force-pushed the integrated-data-binary-take-2 branch from d1e30f6 to b1f7734 Compare March 15, 2024 05:40
@kelnos
Copy link
Contributor Author

kelnos commented Mar 15, 2024

Ok, nevermind on the lifetimes thing; not sure what I was doing the first time, but I figured it out without any trouble...

@kelnos kelnos force-pushed the integrated-data-binary-take-2 branch 3 times, most recently from 7dafe8c to 9de323e Compare March 15, 2024 06:35
@kelnos
Copy link
Contributor Author

kelnos commented Mar 15, 2024

Started looking into the emit() side of things (the original changes above only handle receiving packets with binary payloads). There were a couple ways to approach it.

The first one simpler, but IMO can cause some confusion with how the API is supposed to be used. In this option, there are really no changes: emit() on Socket, ConfOperators, and BroadcastOperators now needs to call socketioxide::to_value() (instead of serde_json::to_value()) in order to extract any binary data, and convert the places where there was binary data into placeholder objects. Pretty straightforward.

The potential user confusion, to me, comes from the fact that those emit() methods all take a T: Serialize, regardless of whether or not the user has called bin() already to set binary data. So there could be a case where someone calls bin(), and then calls emit() with some struct that also has binary data embedded in it. Then what do we do? The best thing to do is probably return an error, since we can't be sure what the actually wanted: did they want to overwrite the existing binary data they added? Did they want to append the existing binary data to the binary data found in the struct? Did they want to append the binary data found in the struct to the existing binary data? Do we have to add extra array elements to the data payload and add more binary placeholders? Or does the struct they passed already have those placeholders in there? We can count the placeholder objects and decide, but it's a messy heuristic; we don't really know what the user intended.

A way to fix this problem is to have an emit() method signature that changes if bin() has been called. This is not a big deal: I can add an extra type param to ConfOperators and BroadcastOperators that results in different emit() implementations depending on if bin() has been called. So if bin() has not yet been called, emit() and emit_with_ack() take a T: Serialize, but if bin() has been called, they instead take a serde_json::Value. Then there's no confusion or possibility of error.

Initially I was thinking that leaving the API alone (the first, simpler option) would be fine, but the more I thought about the edge cases I mention above, I was less comfortable with it, so I decided to implement the second option. But I put it in its own commit and I'm happy to change it back to the simpler API surface if you'd prefer. (Note that while the diff is fairly large, a decent amount of it is just moving existing code into different impl blocks, and adding duplicate emit() and emit_with_ack() implementations that just convert & extract the data, and forward to the original implementation.)

@kelnos kelnos force-pushed the integrated-data-binary-take-2 branch 3 times, most recently from 64aa2cf to 00430b3 Compare March 15, 2024 22:26
@Totodore
Copy link
Owner

Do you think it would be possible to split the Vec<u8> to Bytes modification in another PR? Because I'm sure I will merge this (it was on my to-do list) and also it would help to reduce the size of this PR?

@Totodore
Copy link
Owner

But I put it in its own commit and I'm happy to change it back to the simpler API surface if you'd prefer.

I think it is a really good solution (I didn't think about it, maybe it would have been better to use this generic param hack for Conf/Broadcast operators also rather than splitting this into two different struct and therefore having a lot of code duplication).
I'll check this more in depth tomorrow :).

@kelnos
Copy link
Contributor Author

kelnos commented Mar 17, 2024

Do you think it would be possible to split the Vec<u8> to Bytes modification in another PR?

Done, see #285. After that's merged I'll rebase this branch on top of it.

@kelnos kelnos force-pushed the integrated-data-binary-take-2 branch 2 times, most recently from 1156c17 to c2db9a8 Compare March 18, 2024 01:02
@kelnos kelnos force-pushed the integrated-data-binary-take-2 branch 2 times, most recently from e9b5f0b to 07ef868 Compare April 15, 2024 07:38
@kelnos
Copy link
Contributor Author

kelnos commented Apr 15, 2024

Rebased against main, but now I'm seeing a failure in the new concurrent_emit test. The error I'm seeing (and CI is seeing) is that somehow it first gets Binary(b"\x01\x02\x03") instead of the Message packet. I'm not really sure how to debug this. There were quite a few conflicts I had to resolve during the rebase, so I suppose I got something wrong? I looked through it all and didn't see anything that seemed wrong, but admittedly I'm not the expert here...

@Totodore
Copy link
Owner

Totodore commented Apr 15, 2024

Rebased against main, but now I'm seeing a failure in the new concurrent_emit test. The error I'm seeing (and CI is seeing) is that somehow it first gets Binary(b"\x01\x02\x03") instead of the Message packet. I'm not really sure how to debug this. There were quite a few conflicts I had to resolve during the rebase, so I suppose I got something wrong? I looked through it all and didn't see anything that seemed wrong, but admittedly I'm not the expert here...

The message corresponding to the binary payload is incorrectly serialized:
"50-["test","bin",{"_placeholder":true,"num":0},{"_placeholder":true,"num":1}]", VS
"52-["test","bin",{"_placeholder":true,"num":0},{"_placeholder":true,"num":1}]".

For a weird reason the payload count is 0 and not 2.

        let payload_count = count_binary_payloads(&data); // This deserve some unit testing
        let bin_count = bin.len();
        dbg!(payload_count, bin_count);

@kelnos
Copy link
Contributor Author

kelnos commented Apr 15, 2024

Ah -- it's because the test is "wrong". The non-binary payload of the data is just "bin", which has no binary payloads, as it doesn't have any placeholders.

This is one of the "holes" in the usefulness of serde_json::Value here, since constructing a placeholder is manual, something that ideally a user shouldn't have to do (since that's an implementation detail of the protocol). A couple ideas:

  1. Check the non-binary data before emitting the event. If the number of placeholders in the non-binary part is less than the number of binary payloads in the packet, ensure that the non-binary data is an array, and add more placeholders to that array.
  2. Require that the user be explicit in "placing" the placeholders, but give them a convenient way to do it, like a simple function:
pub fn create_binary_placeholder(num: usize) -> serde_json::Value { .. }

Then users can do something like:

let value = Value::Array(vec![
    "bin".into(),
    create_binary_placeholder(0),
    create_binary_placeholder(1),
]);

Neither of these options is particularly great. The first one assumes too much intent from the user that may not be there (personally, I would probably advocate for .emit() to return an error when this happens). The second one feels pretty manual, and means the user can't use the json!() macro when constructing payloads (though arguably they may not be able to do that for payloads containing binary data regardless).

Looking back, part of my motivation for creating the PayloadValue stuff was because, essentially, socketio payloads are not really JSON. They're sort of a "JSON+binary", which can't really be represented by serde_json::Value, at least not without some trade offs... this issue illustrating one of them.

For users who are sending binary data, they should really be using a struct that implements Serialize, with Bytes members for the binary data. (Unfortunately there isn't a great way in rust to do this with an array instead of an object, at least not without serializable heterogeneous lists.) The .bin() and .emit(Value) methods are mainly there for back-compat, and also for things like echoing back data where you receive the Vec<Bytes> and Value (which already has the placeholders in it), and are sending it back, or perhaps forwarding it on somewhere else.

So I can either 1) attempt to find a solution (one of the above, or something else) that allows people to chain .bin().emit() while properly constructing the JSON that gets sent, or 2) we can just ignore that for now, and expect people to use structs when they want to construct their own payloads with binary data from scratch. What do you think?

Regarding the test, this fixes it, though, as I said above, it's not great to require users do stuff like this:

@@ -21,7 +21,14 @@ pub async fn emit() {
                         Bytes::from_static(&[1, 2, 3]),
                         Bytes::from_static(&[4, 5, 6]),
                     ])
-                    .emit("test", "bin".into())
+                    .emit(
+                        "test",
+                        serde_json::json!([
+                            "bin",
+                            { "_placeholder": true, "num": 0 },
+                            { "_placeholder": true, "num": 1 }
+                        ]),
+                    )
                     .unwrap();
                 }
             });

An alternate fix, which is more user-friendly, but has a different payload structure:

@@ -8,6 +8,13 @@ use bytes::Bytes;
 use engineioxide::Packet::*;
 use socketioxide::{extract::SocketRef, SocketIo};
 
+#[derive(serde::Serialize)]
+struct TestPayload {
+    bin: String,
+    binary1: Bytes,
+    binary2: Bytes,
+}
+
 #[tokio::test]
 pub async fn emit() {
     const BUFFER_SIZE: usize = 10000;
@@ -17,12 +24,12 @@ pub async fn emit() {
             let s = socket.clone();
             tokio::task::spawn_blocking(move || {
                 for _ in 0..100 {
-                    s.bin(vec![
-                        Bytes::from_static(&[1, 2, 3]),
-                        Bytes::from_static(&[4, 5, 6]),
-                    ])
-                    .emit("test", "bin".into())
-                    .unwrap();
+                    let payload = TestPayload {
+                        bin: "bin".to_string(),
+                        binary1: Bytes::from_static(&[1, 2, 3]),
+                        binary2: Bytes::from_static(&[4, 5, 6]),
+                    };
+                    s.emit("test", payload).unwrap();
                 }
             });
         }
@@ -34,7 +41,7 @@ pub async fn emit() {
     let mut count = 0;
     let mut total = 0;
     const MSG: &str =
-        r#"52-["test","bin",{"_placeholder":true,"num":0},{"_placeholder":true,"num":1}]"#;
+        r#"52-["test",{"bin":"bin","binary1":{"_placeholder":true,"num":0},"binary2":{"_placeholder":true,"num":1}}]"#;
     while let Some(msg) = srx.recv().await {
         match msg {
             Message(msg) if count == 0 && msg == MSG => {

@Totodore
Copy link
Owner

The 2nd solution seems much better, with this implementation is it possible to keep the same kind of layout ?
Like TestPayload(String, Bytes, Bytes) or TestPayload([String, Bytes, Bytes])?

@kelnos
Copy link
Contributor Author

kelnos commented Apr 15, 2024

Yes, same layout should work.

This allows users to create a payload struct (that implements
Serialize/Deserialize) that has any binary payload data embedded in the
struct directly, rather than needing to look in a "side" Vec of payload
data, and have to guess what order the binary payloads fit into their
data model.

To accomplish this, there are new Serializer and Deserializer
implementations that mostly wrap the Ser/Deser impls provided for
serde_json::Value.  Unfortunately serde_json doesn't expose everything
necessary to do this in a truly simple way; some duplication of
serde_json's functionality is needed.

Closes Totodore#276.
In order to reduce possible API-use confusion, ConfOperators and
BroadcastOperators now has a new type param, for a new (sealed) trait,
BinaryHolding, which "tells" the struct whether or not the user has
added binary data or not.

If the user has not added binary data, the operator will be
WithoutBinary, and emit() and emit_with_ack() take a T: Serialize as
usual, and those methods will attempt to extract any binary payloads
from the user-provided struct before sending the event.

If the user explicitly adds binary data (by calling the bin() method),
they will get back a new instance of the operator that is WithBinary.
Then the emit() and emit_with_ack() methods will take a
serde_json::Value, which is expected to already have placeholder objects
where any binary payload is supposed to be.
This is a convenience function so callers can create the non-binary data
to go along with their binary payloads, in a way that doesn't require
them to understand socketio protocol implementation details.
@kelnos kelnos force-pushed the integrated-data-binary-take-2 branch from 1f056ca to c4851b5 Compare April 15, 2024 23:44
@kelnos kelnos marked this pull request as ready for review April 16, 2024 00:13
@Totodore
Copy link
Owner

Totodore commented Apr 17, 2024

I dig a bit the issue of the emit with serde_json::Value. I'm afraid the solution with create_binary_placeholder would disturb the user because it is not intuitive. What I propose is :

  • Keeping the current behaviour of the bin operator at least for the next release. When you call bin, it simply add payloads at the end of the data array. We could mark the bin operator deprecated and remove it in the following release or after a given period of time if we think that it is not good to have two possible ways to send binary payloads.

  • Without the bin operator with your current implementation it is possible to do this:

#[derive(Debug, serde::Serialize)]
struct Test(Bytes, Bytes);
...
s.emit(
    "test",
    Test(
        Bytes::from_static(&[1, 2, 3]),
        Bytes::from_static(&[4, 5, 6]),
    ),
) // == "["test",{"_placeholder":true,"num":0},{"_placeholder":true,"num":1}]"

or this if the user needs to add serde_json::Value to its payload:

#[derive(Debug, serde::Serialize)]
struct Test(serde_json::Value, Bytes, Bytes);
...
s.emit(
    "test",
    Test(
        serde_json::json!({"foo":true, "bar": 1}),
        Bytes::from_static(&[1, 2, 3]),
        Bytes::from_static(&[4, 5, 6]),
    ),
) // == "[\"test\",{\"bar\":1,\"foo\":true},{\"_placeholder\":true,\"num\":0},{\"_placeholder\":true,\"num\":1}]".

(We could even remove the struct Test and directly use tuples)

In a more complicated scenario where the user needs to dynamically imbricate Bytes on a serde_json::Value it become more complicated without your helper. Maybe we could simply don't allow that? Globally I prefer to have a more stricter API rather than to much code to support edge cases and an API with 5 different ways to do the same things.

Also it appears that the placeholder is not "stripped" from the received payload if it is deserialized to serde_json::Value. This change might break some code at runtime because the data the user received is not what he expected to be.

Globally I think that we should exclude any placeholder for the user if he ser/deser his data has a dynamic serde_json::Value whether he receives it or he sends it.

Also we should maybe add some tests to be sure about each corner case of this feature:

#[derive(Debug, Serialize, Deserialize)]
struct Test(Bytes, Bytes);
#[derive(Debug, Serialize, Deserialize)]
struct Test<N: usize>([Bytes; N]);
#[derive(Debug, Serialize, Deserialize)]
struct Test((Bytes,));
#[derive(Debug, Serialize, Deserialize)]
struct Test((Value, Bytes));
#[derive(Debug, Serialize, Deserialize)]
struct Test<N: usize>((Value, [Bytes; N]));
#[derive(Debug, Serialize, Deserialize)]
struct Test { data: Value, bin: Vec<Bytes> }
#[derive(Debug, Serialize, Deserialize)]
struct Test { data: Value, data2: (String, Vec<Bytes>) }
...

@kelnos
Copy link
Contributor Author

kelnos commented Apr 17, 2024

Keeping the current behaviour of the bin operator at least for the next release. When you call bin, it simply add payloads at the end of the data array

Sounds good.

Also it appears that the placeholder is not "stripped" from the received payload if it is deserialized to serde_json::Value.

Yes, that's correct, and there's really no way around that if we want the new features to work. The problem is that we can't actually tell if the user wants a serde_json::Value or some T: DeserializeOwned in their handler. If we knew they wanted the former, sure, we could strip it out. But if they want the latter, then stripping it out means the deserializer has no idea where to put the Bytes values, and in what order.

@Totodore
Copy link
Owner

Totodore commented Apr 19, 2024

Yes, that's correct, and there's really no way around that if we want the new features to work. The problem is that we can't actually tell if the user wants a serde_json::Value or some T: DeserializeOwned in their handler. If we knew they wanted the former, sure, we could strip it out. But if they want the latter, then stripping it out means the deserializer has no idea where to put the Bytes values, and in what order.

I worked a bit on this and I found a solution:

  • When the user want to deserialize to a custom struct with a Bytes field this will go through deserialize_bytes.
  • When the user want to deserialize to a serde_json::Value there are 3 possible case:
    • The placeholder is the root object (is this case I think it is normal to throw an error because the event data is always sent as an array.
    • The placeholder is in a sequence, in this case we can add a check to the SeqAccess implementation to skip the placeholder:
impl<'a, 'de, T: serde::Deserialize<'de>> SeqAccess<'de> for SeqDeserializer<'a, T> {
    type Error = serde_json::Error;

    fn next_element_seed<S>(&mut self, seed: S) -> Result<Option<S::Value>, Self::Error>
    where
        S: DeserializeSeed<'de>,
    {
        match self.iter.next() {
            Some(value) => {
                if is_object_placeholder(&value) {
                    return self.next_element_seed(seed);
                }

                let payload = Deserializer {
                    value,
                    binary_payloads: self.binary_payloads,
                    _phantom: PhantomData::<T>,
                };
                seed.deserialize(payload).map(Some)
            }
            None => Ok(None),
        }
    }
...
  • The placeholder is in a map, in this case we can add a check to the MapAccess implementation to skip the placeholder:
impl<'a, 'de, T: serde::Deserialize<'de>> MapAccess<'de> for MapDeserializer<'a, T> {
    type Error = serde_json::Error;

    fn next_key_seed<K>(&mut self, seed: K) -> Result<Option<K::Value>, Self::Error>
    where
        K: DeserializeSeed<'de>,
    {
        match self.iter.next() {
            Some((key, value)) => {
                if is_object_placeholder(&value) {
                    return self.next_key_seed(seed);
                }
                self.value = Some(value);
                let key_deser = MapKeyDeserializer::from(key);
                seed.deserialize(key_deser).map(Some)
            }
            None => Ok(None),
        }
    }
...

We can also maybe keep track of the number of placeholders in the deserializer state so we can check the count with the number of binaries at the end.

https://github.com/Totodore/socketioxide/tree/pr-284-placeholder-detection for the implementation.

Copy link
Owner

@Totodore Totodore left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is only a small review of the value mod. Overall I think we tend to something good and mergeable so thats nice :). I see only one friction point: the need to clone each byte slice when serializing a byte array. For images and stuff like that this can become heavy to clone so much things. I didn't find a solution for this at the moment though.

Once we agree on implementation specifications we need more test and maybe a benchmark with a comparison between serde_json::{from|to}_value and socketioxide::{from|to}_value. Also memory consumption check and latency under high load with heavy binaries.

match self.value {
Value::String(s) => visitor.visit_bytes(s.as_bytes()),
Value::Object(o) => visit_value_object_for_bytes(o, self.binary_payloads, visitor),
Value::Array(a) => visit_value_array_for_bytes(a, visitor),
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Im not sure of the usefulness of visit_value_array_for_bytes, maybe I didn't understand its purpose. The Value type doesn't have any concept of "bytes" and treats everything as an array. In this case the official Value deser impl seems to defer deser to its SeqAccess impl. On our side we could:

  • In the same way defer the call to our SeqAccess impl and this would then be automatically deserialized to an array of u8 as wanted. But we could go into some issue with the _placeholder check
  • call self.value.deserialize_bytes(visitor) to defer to the inner impl (I think this is the best option).

Comment on lines +270 to +303
fn visit_value_object_for_bytes<'a: 'a, 'de, V>(
o: Map<String, Value>,
binary_payloads: &'a [Bytes],
visitor: V,
) -> Result<V::Value, serde_json::Error>
where
V: de::Visitor<'de>,
{
if !o.len() == 2
|| !o
.get("_placeholder")
.and_then(|v| v.as_bool())
.unwrap_or(false)
{
Err(serde_json::Error::invalid_type(
Unexpected::Map,
&"binary payload placeholder object",
))
} else if let Some(num) = o.get("num").and_then(|v| v.as_u64()) {
if let Some(payload) = binary_payloads.get(num as usize) {
visitor.visit_bytes(payload)
} else {
Err(serde_json::Error::invalid_value(
Unexpected::Unsigned(num),
&"a payload number in range",
))
}
} else {
Err(serde_json::Error::invalid_value(
Unexpected::Map,
&"binary payload placeholder without valid num",
))
}
}
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would refactor this for readability

Suggested change
fn visit_value_object_for_bytes<'a: 'a, 'de, V>(
o: Map<String, Value>,
binary_payloads: &'a [Bytes],
visitor: V,
) -> Result<V::Value, serde_json::Error>
where
V: de::Visitor<'de>,
{
if !o.len() == 2
|| !o
.get("_placeholder")
.and_then(|v| v.as_bool())
.unwrap_or(false)
{
Err(serde_json::Error::invalid_type(
Unexpected::Map,
&"binary payload placeholder object",
))
} else if let Some(num) = o.get("num").and_then(|v| v.as_u64()) {
if let Some(payload) = binary_payloads.get(num as usize) {
visitor.visit_bytes(payload)
} else {
Err(serde_json::Error::invalid_value(
Unexpected::Unsigned(num),
&"a payload number in range",
))
}
} else {
Err(serde_json::Error::invalid_value(
Unexpected::Map,
&"binary payload placeholder without valid num",
))
}
}
/// Visit a `serde_json::Value::Object` as a binary payload placeholder.
/// If found the data is extracted from the binary payloads and passed to the visitor.
fn visit_value_object_for_bytes<'a, 'de, V>(
o: Map<String, Value>,
binary_payloads: &'a [Bytes],
visitor: V,
) -> Result<V::Value, serde_json::Error>
where
V: de::Visitor<'de>,
{
let err = |msg: &'static str| serde_json::Error::invalid_type(Unexpected::Map, &msg);
// Expect "_placeholder: true"
let has_placeholder = o
.get("_placeholder")
.and_then(|v| v.as_bool())
.unwrap_or(false);
if !o.len() == 2 || !has_placeholder {
return Err(err("binary payload placeholder object"));
};
// Expect "num: i"
let bin_index = o
.get("num")
.and_then(|v| v.as_u64())
.ok_or_else(|| err("binary payload placeholder without valid num"))?;
if let Some(payload) = binary_payloads.get(bin_index as usize) {
visitor.visit_bytes(payload)
} else {
Err(serde_json::Error::invalid_value(
Unexpected::Unsigned(bin_index),
&"a payload number in range",
))
}
}

Comment on lines +305 to +331
fn visit_value_array_for_bytes<'de, V>(
a: Vec<Value>,
visitor: V,
) -> Result<V::Value, serde_json::Error>
where
V: de::Visitor<'de>,
{
let bytes = a
.into_iter()
.map(|v| match v {
Value::Number(n) => n
.as_u64()
.and_then(|n| u8::try_from(n).ok())
.ok_or_else(|| {
serde_json::Error::invalid_value(
Unexpected::Other("non-u8 number"),
&"number that fits in a u8",
)
}),
_ => Err(serde_json::Error::invalid_value(
unexpected_value(&v),
&"number that fits in a u8",
)),
})
.collect::<Result<Vec<u8>, _>>()?;
visitor.visit_bytes(&bytes)
}
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe not useful if we don't deser Value as a byte and only use Array representation

Comment on lines +463 to +466
match self.iter.size_hint() {
(lower, Some(upper)) if lower == upper => Some(upper),
_ => None,
}
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Iterator is an ExactSizeIterator so we can call

Suggested change
match self.iter.size_hint() {
(lower, Some(upper)) if lower == upper => Some(upper),
_ => None,
}
self.iter.len()

Comment on lines +33 to +43
// This is less efficient than it could be, but `Deserializer::from_str()` (used
// internally by `serde_json`) wants to hold a reference to `self.key` longer than is
// permitted. The methods on `Deserializer` for deserializing into a `Number` without
// that constraint are not exposed publicly.
let reader = VecRead::from(self.key);
let mut deser = serde_json::Deserializer::from_reader(reader);
let number = deser.$method(visitor)?;
let _ = deser
.end()
.map_err(|_| serde_json::Error::custom("expected numeric map key"))?;
Ok(number)
Copy link
Owner

@Totodore Totodore Apr 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might deserve a little benchmark but doing this might be faster:

        let num = serde_json::from_str(self.key.as_ref())?;
        visitor.$method(num)

Another solution is to create a custom Visitor without lifetimes:

        struct I8Visitor;
        impl de::Visitor<'_> for I8Visitor {
            type Value = i8;

            fn expecting(&self, formatter: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
                formatter.write_str("i8")
            }

            fn visit_i8<E>(self, v: i8) -> Result<Self::Value, E> {
                Ok(v)
            }
        }
        let el = serde_json::Deserializer::from_str(self.key.as_ref()).deserialize_i8(I8Visitor)?;
        visitor.visit_i8(el)

(this being generified by a macro ofc)

Comment on lines +369 to +390
fn serialize_i8(self, v: i8) -> Result<Self::Ok, Self::Error> {
Ok(v.to_string())
}

fn serialize_i16(self, v: i16) -> Result<Self::Ok, Self::Error> {
Ok(v.to_string())
}

fn serialize_i32(self, v: i32) -> Result<Self::Ok, Self::Error> {
Ok(v.to_string())
}

fn serialize_i64(self, v: i64) -> Result<Self::Ok, Self::Error> {
Ok(v.to_string())
}

fn serialize_i128(self, v: i128) -> Result<Self::Ok, Self::Error> {
Ok(v.to_string())
}

fn serialize_u8(self, v: u8) -> Result<Self::Ok, Self::Error> {
Ok(v.to_string())
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A macro for this would be nice

@kelnos
Copy link
Contributor Author

kelnos commented Apr 28, 2024

@Totodore just wanted to leave a note that I haven't abandoned this -- I'm getting married next week, so my time for open source has been unsurprisingly limited lately.

@Totodore
Copy link
Owner

@Totodore just wanted to leave a note that I haven't abandoned this -- I'm getting married next week, so my time for open source has been unsurprisingly limited lately.

No problem, I wish you a great wedding!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Relationship between Bin and the structure of Data can be ambiguous for binary packets
2 participants