Add finalizer attribute hook to validate a deserialized structure #642

erickt · 2016-12-17T22:53:42Z

In servo/rust-url#252, @Manishearth is writing a serde deserializer for urls since parsing a string can be expensive. However, to do this safely from untrusted sources, they need to validate that the data is semantically correct. We unfortunately don't have an easy way of doing this without writing a custom deserializer. It'd be pretty simple to do as an attribute, however. Here's my idea:

#[derive(Serialize, Deserialize)]
#[serde(finalizer="check_greater_than_10")]
struct GreaterThan10 {
    n: usize,
}

fn greater_than_10<D>(item: GreaterThan10) -> Result<GreaterThan10, D::Error>
    where D: Deserializer
{
    if item.n > 10 {
        Ok(item)
    } else {
        Err(Error::invalid_value("less than or equal to 10"))
    }
}

dtolnay · 2016-12-28T03:55:59Z

This seems relevant: https://github.com/Keats/validator. Let's make sure to support this library in a nice way.

radix · 2017-01-04T22:43:25Z

My hacky workaround is to

don't derive Deserialize for my main data type "DataType"
copy my main data type to a "FakeDataType" which has the same fields, and make this one derive Deserialize
implement Deserialize for DataType by:
1. invoking deserialize to FakeDataType
2. validate the result
3. convert FakeDataType to DataType.

It's a dumb hack but at least it allows me to avoid rewriting all the deser code manually.

dtolnay · 2017-01-07T21:28:54Z

#626 has another use case.

I am on board with implementing exactly the approach in @erickt's comment.

oli-obk · 2017-03-29T12:33:23Z

Another use-case from irc by @alteous

I would like to obtain a maximum integer value from a tree of structures. The context is these values are offsets into arrays, and I want to check that they are all valid ahead of time. The integer values are wrapped in Index types.

beamspease · 2017-04-16T00:21:33Z

I believe I could also use this for an enum which is either "unknown" or some value fitting a specific pattern, but both are strings.

Mange · 2017-04-17T20:01:24Z

I have the same usecase as in #626. I want to load shapes, and I also want to store some precomputed attributes on them that are derived from the other fields (for example radius_squared = radius * radius).

I don't want to serialize or deserialize these fields, but I want them to be present in the finished structs. The shapes could be deeply nested in another structure that is loaded, so it's very messy to try to go down the structures in a mutable fashion and set it on the structs manually.

scottmcm · 2017-04-19T03:43:12Z

Perhaps a silly question, but why an attribute instead of a trait? I pictured:

trait SerdeFinalize: Sized {
    fn finalize<E:Error>(self) -> Result<Self, E>;
}

#[derive(Serialize, Deserialize)]
#[serde(Finalize)]
struct GreaterThan10 {
    n: usize,
}

impl SerdeFinalize for GreaterThan10 {
    fn finalize<E:Error>(self) -> Result<Self, E> {
        if self.n > 10 {
            Ok(self)
        } else {
            Err(E::invalid_value(Unexpected::Unsigned(self.n as u64), &"a number above 10"))
        }
    }
}

(And maybe #[serde(Finalize)] would no longer be needed once specialization is stable and everything could get a default no-op finalizer that's overridable if desired.)

TedDriggs · 2017-04-20T20:13:22Z

I've been working on something related to this for derive_builder. My assumption has been that the two-step process:

Gather data from some outside source into an intermediate struct
Finalize the domain type (perform semantic validations, generate cached values, RAII, etc.)

should be split across two different types, and that the sidecar type could be generated through a combination of macros, i.e. deriving a Builder which in turn derives Deserialize. In that world, field-level validations would be handled through types, rather than the attributes in the validator project.

I've run into a few hurdles so far:

I don't know of a clean way to invoke TryFrom<...> that doesn't require writing manual deserialization code. This makes fallible conversions to domain-specific types (like "Credit Card" or "Email") very tedious to write. I'm hoping that the stabilization of TryFrom will pave the way for a more reusable solution.
Surfacing errors of many different types to the outer caller requires boxing or other approaches that complicate error inspection.
Getting multiple errors during a single deserialization isn't possible.

lucab · 2017-04-28T08:12:53Z

Cross-referencing Keats/validator#22.

It looks like one of the main concerns there is how to bubble up structured/detailed errors when validation/finalization fails. If anybody has usecases/opinions, it would be nice to feed them back there.

TedDriggs · 2017-05-12T17:51:43Z

There seem to be two situations for this:

After deserialization, perform an infallible transformation `FnOnce(T) -> T
After deserialization, perform a fallible transformation FnOnce(T) -> Result<T, E: serde::de::Error>

From a naming perspective, these seem exactly like map and and_then. It probably shouldn't be valid to declare both on one type/field.

@lucab I've been going back and forth on the errors front. Having that function return a serde Error is really useful for composition with this feature; maybe users can handle that conversion in their own code?

alexreg · 2017-08-17T01:47:55Z

Any progress on this lately?

Keats · 2017-08-18T07:31:00Z

It would be nice to have a hook like the one mentioned by @scottmcm. With validator, I could do something like

#[derive(Debug, Validate, Deserialize)]
#[serde(Finalize)]
struct SignupData {
    #[validate(email)]
    mail: String,
    #[validate(url)]
    site: String,
    #[validate(length(min = "1"), custom = "validate_unique_username")]
    #[serde(rename = "firstName")]
    first_name: String,
    #[validate(range(min = "18", max = "20"))]
    age: u32,
}

// This 
impl SerdeFinalize for SignupData {
    fn finalize<ValidationErrors>(self, deserialization_errors: Option<HashMap<Cow<String>, serde::de::Error>>) -> Result<Self, ValidationErrors> {
      if let Some(de_errors) = deserialization_errors {
           // format errors and return without doing any validation?
      }
      // the current Validate::validate fn would be impl here instead
    }
}

// And use it later
let data: SignupData = serde_json::from_str(data)?;

This way it's a one step action for users. The part I'm not sure about is if a user wants to hook into the serde error messages to add i18n for example or to have only one kind of error. I believe https://docs.serde.rs/serde/de/trait.Error.html would be enough for that but those errors will need to be passed to whatever Finalize function is so the validation library can format them as seen in the snippet above (type completely made up and probably incorrect).

dtolnay · 2017-12-17T16:25:17Z

We are brainstorming some ways to generalize this in #1118.

bstrie · 2019-08-27T18:45:25Z

Note that the try_from attribute on containers may often suffice for this ( https://serde.rs/container-attrs.html#from ). This issue's original example can be written as so:

#[derive(Serialize, Deserialize)]
#[serde(try_from="usize")]
struct GreaterThan10(usize);

impl TryFrom<usize> for GreaterThan10 {
    type Error = &'static str;
    fn try_from(value: usize) -> Result<Self, Self::Error> {
        if value > 10 {
            Ok(GreaterThan10(value))
        } else {
            Err("value not greater than 10")
        }
    }
}

menixator · 2020-08-29T11:17:51Z

@bstrie's workaround is pretty good. For custom structs, you can do make a shadow type, implement deserialize on it, and do validations within the TryFrom implementation.

use serde::Deserialize;
use serde_json;
// The target is to not allow deserialization if option1 & option2 are none
#[derive(Deserialize, Debug)]
#[serde(try_from = "MyTypeShadow")]
pub struct MyType {
    option1: Option<usize>,
    option2: Option<usize>,
}

// The shadow type only has to implement Deserialize
#[derive(Deserialize)]
struct MyTypeShadow {
    option1: Option<usize>,
    option2: Option<usize>,
}

pub struct MyTypeValidationError;

// The error type has to implement Display
impl std::fmt::Display for MyTypeValidationError {
    fn fmt(&self, formatter: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        write!(formatter, "option1 and option2 cannot be null")
    }
}

impl std::convert::TryFrom<MyTypeShadow> for MyType {
    type Error = MyTypeValidationError;
    fn try_from(shadow: MyTypeShadow) -> Result<Self, Self::Error> {
        let MyTypeShadow { option1, option2 } = shadow;
        if option1.is_none() && option2.is_none() {
            return Err(MyTypeValidationError);
        }
        // Any other validations
        Ok(MyType { option1, option2 })
    }
}

fn main() {
    // This will return an Err
    println!("{:?}", serde_json::from_str::<MyType>(r##"{}"##));
    // This will work
    println!(
        "{:?}",
        serde_json::from_str::<MyType>(r##"{"option1": 20}"##)
    );
}

sffc · 2022-05-23T15:50:45Z

Would a PR implementing the approach in the OP (#642 (comment)) be accepted (a finalizer= attribute on the struct level), or is this issue blocked on something else?

MolotovCherry · 2022-06-16T18:51:26Z

A finalizer fn for deserialization is also useful for when you were forced to do a #[serde(skip)] on a field due to it being from a 3rd party lib with no serialization support, so you have to reinitialize the data once the struct is deserialized (or some other post-setup code like a field's value that is skipped, but derives its value at runtime from another serialized field)

kangalio · 2022-10-26T20:47:13Z

Another use case: Discord API libraries serenity and twilight. Some JSON payloads omit important data from nested objects because it can be inferred from top-level fields. But when deserializing into a Rust type, we want to include that important data in the nested object itself.

Serenity's current workaround is a manual Deserialize impl that deserializes into a serde_json::Value, patches that, and then continues to deserialize into the target type (1). In other places, it manually parses from the patched serde_json::Value into the target type directly (2, 3)

Twilight's current workaround is a full blown 710LOC manual Deserialize impl (1)

nathan-at-least · 2023-05-08T19:42:11Z

Just skimmed over this ticket, and IIUC, there's no built-in support for this in serde 1.0.x, correct?

The TryFrom pattern in #642 (comment) looks like a great option for my use case (which is post-deserialization validation).

Because this pattern seems useful and somewhat common, I just submitted serde-rs/serde-rs.github.io#148 to add it to https://serde.rs .

- Introduce Serializable trait, which I think will solve the problem of ephemeral struct fields getting reconstituted when we deserialize a Serde struct. (Note serde-rs/serde#642, which could eventually supersede this stuff.) - Added a few more hardcoded [MidiNote]s. - More [MusicalTime] math ops.

erickt mentioned this issue Dec 17, 2016

Add faster serde serialization, sacrificing invariant-safety for speed in release mode; test that it roudtrips servo/rust-url#252

Closed

dtolnay added the enhancement label Dec 23, 2016

dtolnay mentioned this issue Jan 7, 2017

Action immediately after deserialization? #626

Closed

lucab mentioned this issue Apr 27, 2017

serde: support for validation during deserialization Keats/validator#22

Closed

lucab mentioned this issue May 30, 2017

Support a #[serde(validate = "some_function")] attribute on fields #939

Open

dtolnay mentioned this issue Jul 24, 2017

Provide a way to "finalize" deserializaton #998

Closed

MaksymZavershynskyi mentioned this issue Mar 11, 2019

Cache hash for receipt block near/nearcore#700

Merged

axos88 mentioned this issue Apr 27, 2019

Verify deserialized value #1515

Closed

niklasf mentioned this issue Jan 27, 2021

Combination of Deserialize and from_str_unchecked() is unsound gz/rust-cpuid#43

Closed

gunvirranu mentioned this issue May 6, 2021

Add Serde support gunvirranu/block-grid#67

Merged

xlc mentioned this issue Jul 2, 2021

Allow somehow validate the decoded output paritytech/parity-scale-codec#280

Open

ggutoski mentioned this issue Oct 5, 2021

Automatically validate deserialized structs axelarnetwork/tofn#177

Closed

schneems mentioned this issue Oct 19, 2021

StackId should support the "any" stack ("*") heroku/libcnb.rs#104

Closed

leaty mentioned this issue Dec 2, 2021

Default value for socket? leaty/rkeep#10

Open

generall mentioned this issue Feb 10, 2022

[geo] validate coordinates in API qdrant/qdrant#294

Closed

cbgbt mentioned this issue Jun 6, 2022

Allow cluster-dns-ip to be set to a list of IP Addresses bottlerocket-os/bottlerocket#2176

Merged

5 tasks

kangalio mentioned this issue Nov 3, 2022

serde(with) for containers #1118

Open

t-nelson mentioned this issue Jan 19, 2023

adds new contact-info with forward compatible sockets solana-labs/solana#29596

Merged

nathan-at-least mentioned this issue May 8, 2023

Add an example for arbitrary type validation upon deserialization. serde-rs/serde-rs.github.io#148

Open

Nutomic mentioned this issue Oct 3, 2023

Compatibility with Mastodon private messages LemmyNet/lemmy#2657

Closed

qdot mentioned this issue Apr 21, 2024

Implement type shadowing and try_from for loading configurations from file buttplugio/buttplug#625

Open

Shadow53 mentioned this issue Nov 26, 2024

Custom extra validation on ParseFromJSON poem-web/poem#918

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add finalizer attribute hook to validate a deserialized structure #642

Add finalizer attribute hook to validate a deserialized structure #642

erickt commented Dec 17, 2016

dtolnay commented Dec 28, 2016

radix commented Jan 4, 2017

dtolnay commented Jan 7, 2017

oli-obk commented Mar 29, 2017

beamspease commented Apr 16, 2017

Mange commented Apr 17, 2017

scottmcm commented Apr 19, 2017

TedDriggs commented Apr 20, 2017

lucab commented Apr 28, 2017

TedDriggs commented May 12, 2017

alexreg commented Aug 17, 2017

Keats commented Aug 18, 2017

dtolnay commented Dec 17, 2017

bstrie commented Aug 27, 2019

menixator commented Aug 29, 2020 •

edited

Loading

sffc commented May 23, 2022

MolotovCherry commented Jun 16, 2022 •

edited

Loading

kangalio commented Oct 26, 2022 •

edited

Loading

nathan-at-least commented May 8, 2023 •

edited

Loading

Add finalizer attribute hook to validate a deserialized structure #642

Add finalizer attribute hook to validate a deserialized structure #642

Comments

erickt commented Dec 17, 2016

dtolnay commented Dec 28, 2016

radix commented Jan 4, 2017

dtolnay commented Jan 7, 2017

oli-obk commented Mar 29, 2017

beamspease commented Apr 16, 2017

Mange commented Apr 17, 2017

scottmcm commented Apr 19, 2017

TedDriggs commented Apr 20, 2017

lucab commented Apr 28, 2017

TedDriggs commented May 12, 2017

alexreg commented Aug 17, 2017

Keats commented Aug 18, 2017

dtolnay commented Dec 17, 2017

bstrie commented Aug 27, 2019

menixator commented Aug 29, 2020 • edited Loading

sffc commented May 23, 2022

MolotovCherry commented Jun 16, 2022 • edited Loading

kangalio commented Oct 26, 2022 • edited Loading

nathan-at-least commented May 8, 2023 • edited Loading

menixator commented Aug 29, 2020 •

edited

Loading

MolotovCherry commented Jun 16, 2022 •

edited

Loading

kangalio commented Oct 26, 2022 •

edited

Loading

nathan-at-least commented May 8, 2023 •

edited

Loading