Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(serde_with): make BytesOrString adjustable #793

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
5 changes: 4 additions & 1 deletion serde_with/src/de/impls.rs
Original file line number Diff line number Diff line change
Expand Up @@ -994,7 +994,10 @@ where
}

#[cfg(feature = "alloc")]
impl<'de> DeserializeAs<'de, Vec<u8>> for BytesOrString {
impl<'de, PREFERENCE> DeserializeAs<'de, Vec<u8>> for BytesOrString<PREFERENCE>
where
PREFERENCE: formats::TypePreference,
{
fn deserialize_as<D>(deserializer: D) -> Result<Vec<u8>, D::Error>
where
D: Deserializer<'de>,
Expand Down
53 changes: 53 additions & 0 deletions serde_with/src/formats.rs
Original file line number Diff line number Diff line change
Expand Up @@ -80,6 +80,59 @@ create_format!(
Unpadded
);

/// When serializing a value of a type,
/// that allows multiple types during deserialization,
/// prefer a specific type.
pub trait TypePreference: SerializeAs<[u8]> {}
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's neat to just reuse the SerializeAs trait for the behavior.

Does the trait need to be fully public? We can't make it private, since it appears in public bounds, but seal all implementations. Similar to

mod sealed {
pub trait Sealed {}
impl Sealed for super::Standard {}
impl Sealed for super::UrlSafe {}
impl Sealed for super::Crypt {}
impl Sealed for super::Bcrypt {}
impl Sealed for super::ImapMutf7 {}
impl Sealed for super::BinHex {}
}
.

I would like to keep the serialization and deserialization behavior matching if possible. For that a fully public trait feels unnecessary. Having ASCII and unicode strings supported feels sufficient for now. If it turns out that more freedom is necessary, it can be unsealed later.


/// Prefer serializing it as ASCII string.
pub struct PreferAsciiString;

impl TypePreference for PreferAsciiString {}

impl SerializeAs<[u8]> for PreferAsciiString {
fn serialize_as<S>(source: &[u8], serializer: S) -> Result<S::Ok, S::Error>
where
S: Serializer,
{
match core::str::from_utf8(source) {
Ok(text) if text.is_ascii() => serializer.serialize_str(text),
_ => serializer.serialize_bytes(source),
Copy link
Owner

@jonasbb jonasbb Oct 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed, these serialize_bytes should just be serialize.

Suggested change
_ => serializer.serialize_bytes(source),
_ => serializer.serialize(source),

}
}
}

/// Prefer serializing it as bytes.
pub struct PreferBytes;

impl TypePreference for PreferBytes {}

impl SerializeAs<[u8]> for PreferBytes {
fn serialize_as<S>(source: &[u8], serializer: S) -> Result<S::Ok, S::Error>
where
S: Serializer,
{
serializer.serialize_bytes(source)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
serializer.serialize_bytes(source)
serializer.serialize(source)

}
}

/// Prefer serializing it as string.
pub struct PreferString;

impl TypePreference for PreferString {}

impl SerializeAs<[u8]> for PreferString {
fn serialize_as<S>(source: &[u8], serializer: S) -> Result<S::Ok, S::Error>
where
S: Serializer,
{
match core::str::from_utf8(source) {
Ok(text) => serializer.serialize_str(text),
_ => serializer.serialize_bytes(source),
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
_ => serializer.serialize_bytes(source),
_ => serializer.serialize(source),

}
}
}

/// Specify how lenient the deserialization process should be
///
/// Formats which make use of this trait should specify how it affects the deserialization behavior.
Expand Down
16 changes: 15 additions & 1 deletion serde_with/src/lib.rs
Original file line number Diff line number Diff line change
Expand Up @@ -811,9 +811,23 @@ pub struct DefaultOnNull<T = Same>(PhantomData<T>);
/// assert_eq!("✨Works!".as_bytes(), &*a.bytes_or_string);
/// # }
/// ```
///
/// Often it is prefered to serialize these bytes as string again,
/// but `BytesOrString` will always return an array of integers in the range of `0–255`.
/// This can be adjusted using its generic type parameter,
/// which can be either [`PreferBytes`] (default), [`PreferAsciiString`] or [`PreferString`].
/// The latter two will try to convert arbitrary bytes to a `&str` first and will fallback to
/// serializing as array of bytes only if these bytes would form an invalid string.
/// `PreferAsciiString` will serialize strings containing non-ASCII characters as array as well.
///
/// [`String`]: std::string::String
/// [`PreferBytes`]: formats::PreferBytes
/// [`PreferAsciiString`]: formats::PreferString
/// [`PreferString`]: formats::PreferString
#[cfg(feature = "alloc")]
pub struct BytesOrString;
pub struct BytesOrString<PREFERENCE: formats::TypePreference = formats::PreferBytes>(
PhantomData<PREFERENCE>,
);

/// De/Serialize Durations as number of seconds.
///
Expand Down
5 changes: 4 additions & 1 deletion serde_with/src/schemars_0_8.rs
Original file line number Diff line number Diff line change
Expand Up @@ -430,7 +430,10 @@ impl<T> JsonSchemaAs<T> for Bytes {
forward_schema!(Vec<u8>);
}

impl JsonSchemaAs<Vec<u8>> for BytesOrString {
impl<PREFERENCE> JsonSchemaAs<Vec<u8>> for BytesOrString<PREFERENCE>
where
PREFERENCE: formats::TypePreference,
{
fn schema_name() -> String {
"BytesOrString".into()
}
Expand Down
7 changes: 5 additions & 2 deletions serde_with/src/ser/impls.rs
Original file line number Diff line number Diff line change
Expand Up @@ -595,12 +595,15 @@ where
}

#[cfg(feature = "alloc")]
impl SerializeAs<Vec<u8>> for BytesOrString {
impl<PREFERENCE> SerializeAs<Vec<u8>> for BytesOrString<PREFERENCE>
where
PREFERENCE: formats::TypePreference,
{
fn serialize_as<S>(source: &Vec<u8>, serializer: S) -> Result<S::Ok, S::Error>
where
S: Serializer,
{
source.serialize(serializer)
PREFERENCE::serialize_as(source.as_slice(), serializer)
}
}

Expand Down
62 changes: 59 additions & 3 deletions serde_with/tests/serde_as/lib.rs
Original file line number Diff line number Diff line change
Expand Up @@ -30,7 +30,7 @@ use core::{
use expect_test::expect;
use serde::{Deserialize, Serialize};
use serde_with::{
formats::{CommaSeparator, Flexible, Strict},
formats::{CommaSeparator, Flexible, PreferAsciiString, PreferString, Strict},
serde_as, BoolFromInt, BytesOrString, DisplayFromStr, IfIsHumanReadable, Map,
NoneAsEmptyString, OneOrMany, Same, Seq, StringWithSeparator,
};
Expand Down Expand Up @@ -477,7 +477,7 @@ fn test_none_as_empty_string() {
}

#[test]
fn test_bytes_or_string() {
fn test_bytes_or_string_as_bytes() {
#[serde_as]
#[derive(Debug, Serialize, Deserialize, PartialEq)]
struct S(#[serde_as(as = "BytesOrString")] Vec<u8>);
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please add a test for an explicit PreferBytes too?

Expand All @@ -491,8 +491,64 @@ fn test_bytes_or_string() {
3
]"#]],
);
check_deserialization(S(vec![72, 101, 108, 108, 111]), r#""Hello""#);
check_deserialization(S(vec![70, 111, 111, 98, 97, 114]), r#""Foobar""#);
}

#[test]
fn test_bytes_or_string_as_string() {
#[serde_as]
#[derive(Debug, Serialize, Deserialize, PartialEq)]
struct S(#[serde_as(as = "BytesOrString<PreferString>")] Vec<u8>);

is_equal(S(vec![72, 101, 108, 108, 111]), expect![[r#""Hello""#]]);

check_deserialization(S(vec![0xf0, 0x9f, 0xa6, 0xa6]), r#""🦦""#);
is_equal(S(vec![0xf0, 0x9f, 0xa6, 0xa6]), expect![[r#""🦦""#]]);

is_equal(
S(vec![0, 255]),
expect![[r#"
[
0,
255
]"#]],
);
check_deserialization(S(vec![87, 111, 114, 108, 100]), r#""World""#);
}

#[test]
fn test_bytes_or_string_as_ascii_string() {
#[serde_as]
#[derive(Debug, Serialize, Deserialize, PartialEq)]
struct S(#[serde_as(as = "BytesOrString<PreferAsciiString>")] Vec<u8>);

is_equal(S(vec![72, 101, 108, 108, 111]), expect![[r#""Hello""#]]);

check_deserialization(S(vec![0xf0, 0x9f, 0xa6, 0xa6]), r#""🦦""#);
is_equal(
S(vec![0xf0, 0x9f, 0xa6, 0xa6]),
expect![[r#"
[
240,
159,
166,
166
]"#]],
);

is_equal(
S(vec![0, 255]),
expect![[r#"
[
0,
255
]"#]],
);
check_deserialization(S(vec![87, 111, 114, 108, 100]), r#""World""#);
}

#[test]
fn test_bytes_or_string_nested() {
#[serde_as]
#[derive(Debug, Serialize, Deserialize, PartialEq)]
struct SVec(#[serde_as(as = "Vec<BytesOrString>")] Vec<Vec<u8>>);
Expand Down