Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix deserialization of empty structs and tuples in untagged enums #2805

Merged
merged 9 commits into from
Aug 24, 2024

Conversation

Mingun
Copy link
Contributor

@Mingun Mingun commented Aug 17, 2024

This PR is logical continuation of #2788 and #2804, for untagged enums. While I working on it I found a bug that was fixed.

This PR moves all tests which tests top-level untagged enums (those which is defined using #[serde(untagged)] attribute) into a dedicated file and adds tests to cover missing methods of ContentRefDeserializer that used for deserialization. Flatten tests where untagged enum is flattened into structure I consider foremost as flatten tests, so they left untouched.

Each method in ContentRefDeserializer and linked structs was marked with a comment that shows which test(s) cover it. After this work a bug was catched:

#[derive(Deserialize)]
#[serde(untagged)]
enum Untagged {
    Newtype(ExternallyTagged),
}

#[derive(Deserialize)]
enum ExternallyTagged {
    EmptyTuple(),
    EmptyStruct {},
}

The Untagged type could not be deserialized because SeqRefDeserializer (which is used for deserialization) has a special code to handle empty sequences which was not covered and never worked:

serde/serde/src/private/de.rs

Lines 2228 to 2244 in 1a9ffdb

fn deserialize_any<V>(mut self, visitor: V) -> Result<V::Value, Self::Error>
where
V: de::Visitor<'de>,
{
let len = self.iter.len();
if len == 0 {
visitor.visit_unit()
} else {
let ret = tri!(visitor.visit_seq(&mut self));
let remaining = self.iter.len();
if remaining == 0 {
Ok(ret)
} else {
Err(de::Error::invalid_length(len, &"fewer elements in array"))
}
}
}

The new tests shows that it is impossible to create such a type that would accept visit_unit() in line 2234. The condition len == 0 assumes that type would be able to deserialized from unit, but:

  1. struct variants was never able to deserialize from it (they expect only visit_map or visit_seq)
  2. tuple variants even with zero fields expect visit_seq only. The suggestion to accept visit_unit instead was rejected in Fix incorrect representation of tuples with skipped fields #2520

After appling the fix the MapRefDeserializer and SeqRefDeserializer no longer used so they are removed. Actually, because direct deserialization of maps and sequences from the ContentRefDeserializer uses value::MapDeserializer, that is just unifies the behaviour:

serde/serde/src/private/de.rs

Lines 1938 to 1946 in 1a9ffdb

fn deserialize_seq<V>(self, visitor: V) -> Result<V::Value, Self::Error>
where
V: Visitor<'de>,
{
match *self.content {
Content::Seq(ref v) => visit_content_seq_ref(v, visitor),
_ => Err(self.invalid_type(&visitor)),
}
}

serde/serde/src/private/de.rs

Lines 1967 to 1975 in 1a9ffdb

fn deserialize_map<V>(self, visitor: V) -> Result<V::Value, Self::Error>
where
V: Visitor<'de>,
{
match *self.content {
Content::Map(ref v) => visit_content_map_ref(v, visitor),
_ => Err(self.invalid_type(&visitor)),
}
}

value::MapDeserializer and value::SeqDeserializer are used:

serde/serde/src/private/de.rs

Lines 1685 to 1718 in 1a9ffdb

fn visit_content_seq_ref<'a, 'de, V, E>(
content: &'a [Content<'de>],
visitor: V,
) -> Result<V::Value, E>
where
V: Visitor<'de>,
E: de::Error,
{
let seq = content.iter().map(ContentRefDeserializer::new);
let mut seq_visitor = SeqDeserializer::new(seq);
let value = tri!(visitor.visit_seq(&mut seq_visitor));
tri!(seq_visitor.end());
Ok(value)
}
fn visit_content_map_ref<'a, 'de, V, E>(
content: &'a [(Content<'de>, Content<'de>)],
visitor: V,
) -> Result<V::Value, E>
where
V: Visitor<'de>,
E: de::Error,
{
let map = content.iter().map(|(k, v)| {
(
ContentRefDeserializer::new(k),
ContentRefDeserializer::new(v),
)
});
let mut map_visitor = MapDeserializer::new(map);
let value = tri!(visitor.visit_map(&mut map_visitor));
tri!(map_visitor.end());
Ok(value)
}

@Mingun
Copy link
Contributor Author

Mingun commented Aug 22, 2024

@dtolnay , @oli-obk , can you look at this bugfix?

Moved and renamed:
From test_annotations
- test_expecting_message_untagged_tagged_enum                  => expecting_message
- flatten::enum_::untagged::straitforward                      => contains_flatten

From test_macros
- test_untagged_newtype_struct                                 => newtype_struct
- test_untagged_enum                                           => complex
- test_untagged_enum_with_flattened_integer_key                => contains_flatten_with_integer_key
- test_enum_in_untagged_enum                                   => newtype_enum
- test_untagged_bytes                                          => string_and_bytes
- test_untagged_newtype_variant_containing_unit_struct_not_map => newtype_unit_and_empty_map
(review this commit with "ignore whitespace changes" option on)
…en==0 condition

failures (2):
    newtype_enum::empty_struct_from_seq
    newtype_enum::tuple0
SeqRefDeserializer::deserialize_any has a special condition for empty sequence, which
emits visit_unit. That condition assumes that type would be able to deserialized from
unit, but:
1) struct variants was never able to deserialize from it (they expect only visit_map or visit_seq)
2) tuple variants even with zero fields expect visit_seq only. The suggestion to accept visit_unit
   instead was rejected in serde-rs#2520

Fixes (2):
    newtype_enum::tuple0
    newtype_enum::empty_struct_from_seq
Although they are slightly different, this difference is irrelevant:
- MapDeserializer has a specialization for deserialize_seq and deserialize_tuple, but
  only MapRefDeserializer::deserialize_any is used by the code which is almost the same
- MapDeserializer checks that map was consumed after visit_map, but MapRefDeserializer
  does not. Actually, each derived implementation consumes map and each manual implementation
  also should consume it

Also, MapDeserializer already used when value deserialized from ContentRefDeserializer
directly and MapRefDeserializer was only used to deserialize Struct variants of enums.
There are no reasons why the behavior should be different in those two cases
@dtolnay dtolnay changed the title Fix deserialziation of empty structs and tuples in untagged enums Fix deserialization of empty structs and tuples in untagged enums Aug 24, 2024
Copy link
Member

@dtolnay dtolnay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@dtolnay dtolnay merged commit 9eaf7b9 into serde-rs:master Aug 24, 2024
15 checks passed
@Mingun Mingun deleted the untagged-tests branch August 24, 2024 06:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

2 participants