Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add finalizer attribute hook to validate a deserialized structure #642

Open
erickt opened this issue Dec 17, 2016 · 19 comments
Open

Add finalizer attribute hook to validate a deserialized structure #642

erickt opened this issue Dec 17, 2016 · 19 comments

Comments

@erickt
Copy link
Member

erickt commented Dec 17, 2016

In servo/rust-url#252, @Manishearth is writing a serde deserializer for urls since parsing a string can be expensive. However, to do this safely from untrusted sources, they need to validate that the data is semantically correct. We unfortunately don't have an easy way of doing this without writing a custom deserializer. It'd be pretty simple to do as an attribute, however. Here's my idea:

#[derive(Serialize, Deserialize)]
#[serde(finalizer="check_greater_than_10")]
struct GreaterThan10 {
    n: usize,
}

fn greater_than_10<D>(item: GreaterThan10) -> Result<GreaterThan10, D::Error>
    where D: Deserializer
{
    if item.n > 10 {
        Ok(item)
    } else {
        Err(Error::invalid_value("less than or equal to 10"))
    }
}
@dtolnay
Copy link
Member

dtolnay commented Dec 28, 2016

This seems relevant: https://github.com/Keats/validator. Let's make sure to support this library in a nice way.

@radix
Copy link

radix commented Jan 4, 2017

My hacky workaround is to

  1. don't derive Deserialize for my main data type "DataType"
  2. copy my main data type to a "FakeDataType" which has the same fields, and make this one derive Deserialize
  3. implement Deserialize for DataType by:
    1. invoking deserialize to FakeDataType
    2. validate the result
    3. convert FakeDataType to DataType.

It's a dumb hack but at least it allows me to avoid rewriting all the deser code manually.

@dtolnay
Copy link
Member

dtolnay commented Jan 7, 2017

#626 has another use case.

I am on board with implementing exactly the approach in @erickt's comment.

@oli-obk
Copy link
Member

oli-obk commented Mar 29, 2017

Another use-case from irc by @alteous

I would like to obtain a maximum integer value from a tree of structures. The context is these values are offsets into arrays, and I want to check that they are all valid ahead of time. The integer values are wrapped in Index types.

@beamspease
Copy link

I believe I could also use this for an enum which is either "unknown" or some value fitting a specific pattern, but both are strings.

@Mange
Copy link

Mange commented Apr 17, 2017

I have the same usecase as in #626. I want to load shapes, and I also want to store some precomputed attributes on them that are derived from the other fields (for example radius_squared = radius * radius).

I don't want to serialize or deserialize these fields, but I want them to be present in the finished structs. The shapes could be deeply nested in another structure that is loaded, so it's very messy to try to go down the structures in a mutable fashion and set it on the structs manually.

@scottmcm
Copy link

Perhaps a silly question, but why an attribute instead of a trait? I pictured:

trait SerdeFinalize: Sized {
    fn finalize<E:Error>(self) -> Result<Self, E>;
}

#[derive(Serialize, Deserialize)]
#[serde(Finalize)]
struct GreaterThan10 {
    n: usize,
}

impl SerdeFinalize for GreaterThan10 {
    fn finalize<E:Error>(self) -> Result<Self, E> {
        if self.n > 10 {
            Ok(self)
        } else {
            Err(E::invalid_value(Unexpected::Unsigned(self.n as u64), &"a number above 10"))
        }
    }
}

(And maybe #[serde(Finalize)] would no longer be needed once specialization is stable and everything could get a default no-op finalizer that's overridable if desired.)

@TedDriggs
Copy link
Contributor

I've been working on something related to this for derive_builder. My assumption has been that the two-step process:

  1. Gather data from some outside source into an intermediate struct
  2. Finalize the domain type (perform semantic validations, generate cached values, RAII, etc.)

should be split across two different types, and that the sidecar type could be generated through a combination of macros, i.e. deriving a Builder which in turn derives Deserialize. In that world, field-level validations would be handled through types, rather than the attributes in the validator project.

I've run into a few hurdles so far:

  1. I don't know of a clean way to invoke TryFrom<...> that doesn't require writing manual deserialization code. This makes fallible conversions to domain-specific types (like "Credit Card" or "Email") very tedious to write. I'm hoping that the stabilization of TryFrom will pave the way for a more reusable solution.
  2. Surfacing errors of many different types to the outer caller requires boxing or other approaches that complicate error inspection.
  3. Getting multiple errors during a single deserialization isn't possible.

@lucab
Copy link

lucab commented Apr 28, 2017

Cross-referencing Keats/validator#22.

It looks like one of the main concerns there is how to bubble up structured/detailed errors when validation/finalization fails. If anybody has usecases/opinions, it would be nice to feed them back there.

@TedDriggs
Copy link
Contributor

There seem to be two situations for this:

  1. After deserialization, perform an infallible transformation `FnOnce(T) -> T
  2. After deserialization, perform a fallible transformation FnOnce(T) -> Result<T, E: serde::de::Error>

From a naming perspective, these seem exactly like map and and_then. It probably shouldn't be valid to declare both on one type/field.

@lucab I've been going back and forth on the errors front. Having that function return a serde Error is really useful for composition with this feature; maybe users can handle that conversion in their own code?

@alexreg
Copy link

alexreg commented Aug 17, 2017

Any progress on this lately?

@Keats
Copy link

Keats commented Aug 18, 2017

It would be nice to have a hook like the one mentioned by @scottmcm. With validator, I could do something like

#[derive(Debug, Validate, Deserialize)]
#[serde(Finalize)]
struct SignupData {
    #[validate(email)]
    mail: String,
    #[validate(url)]
    site: String,
    #[validate(length(min = "1"), custom = "validate_unique_username")]
    #[serde(rename = "firstName")]
    first_name: String,
    #[validate(range(min = "18", max = "20"))]
    age: u32,
}

// This 
impl SerdeFinalize for SignupData {
    fn finalize<ValidationErrors>(self, deserialization_errors: Option<HashMap<Cow<String>, serde::de::Error>>) -> Result<Self, ValidationErrors> {
      if let Some(de_errors) = deserialization_errors {
           // format errors and return without doing any validation?
      }
      // the current Validate::validate fn would be impl here instead
    }
}

// And use it later
let data: SignupData = serde_json::from_str(data)?;

This way it's a one step action for users. The part I'm not sure about is if a user wants to hook into the serde error messages to add i18n for example or to have only one kind of error. I believe https://docs.serde.rs/serde/de/trait.Error.html would be enough for that but those errors will need to be passed to whatever Finalize function is so the validation library can format them as seen in the snippet above (type completely made up and probably incorrect).

@dtolnay
Copy link
Member

dtolnay commented Dec 17, 2017

We are brainstorming some ways to generalize this in #1118.

@bstrie
Copy link

bstrie commented Aug 27, 2019

Note that the try_from attribute on containers may often suffice for this ( https://serde.rs/container-attrs.html#from ). This issue's original example can be written as so:

#[derive(Serialize, Deserialize)]
#[serde(try_from="usize")]
struct GreaterThan10(usize);

impl TryFrom<usize> for GreaterThan10 {
    type Error = &'static str;
    fn try_from(value: usize) -> Result<Self, Self::Error> {
        if value > 10 {
            Ok(GreaterThan10(value))
        } else {
            Err("value not greater than 10")
        }
    }
}

@menixator
Copy link

menixator commented Aug 29, 2020

@bstrie's workaround is pretty good. For custom structs, you can do make a shadow type, implement deserialize on it, and do validations within the TryFrom implementation.

use serde::Deserialize;
use serde_json;
// The target is to not allow deserialization if option1 & option2 are none
#[derive(Deserialize, Debug)]
#[serde(try_from = "MyTypeShadow")]
pub struct MyType {
    option1: Option<usize>,
    option2: Option<usize>,
}

// The shadow type only has to implement Deserialize
#[derive(Deserialize)]
struct MyTypeShadow {
    option1: Option<usize>,
    option2: Option<usize>,
}

pub struct MyTypeValidationError;

// The error type has to implement Display
impl std::fmt::Display for MyTypeValidationError {
    fn fmt(&self, formatter: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        write!(formatter, "option1 and option2 cannot be null")
    }
}

impl std::convert::TryFrom<MyTypeShadow> for MyType {
    type Error = MyTypeValidationError;
    fn try_from(shadow: MyTypeShadow) -> Result<Self, Self::Error> {
        let MyTypeShadow { option1, option2 } = shadow;
        if option1.is_none() && option2.is_none() {
            return Err(MyTypeValidationError);
        }
        // Any other validations
        Ok(MyType { option1, option2 })
    }
}

fn main() {
    // This will return an Err
    println!("{:?}", serde_json::from_str::<MyType>(r##"{}"##));
    // This will work
    println!(
        "{:?}",
        serde_json::from_str::<MyType>(r##"{"option1": 20}"##)
    );
}

@sffc
Copy link

sffc commented May 23, 2022

Would a PR implementing the approach in the OP (#642 (comment)) be accepted (a finalizer= attribute on the struct level), or is this issue blocked on something else?

@MolotovCherry
Copy link

MolotovCherry commented Jun 16, 2022

A finalizer fn for deserialization is also useful for when you were forced to do a #[serde(skip)] on a field due to it being from a 3rd party lib with no serialization support, so you have to reinitialize the data once the struct is deserialized (or some other post-setup code like a field's value that is skipped, but derives its value at runtime from another serialized field)

@kangalio
Copy link

kangalio commented Oct 26, 2022

Another use case: Discord API libraries serenity and twilight. Some JSON payloads omit important data from nested objects because it can be inferred from top-level fields. But when deserializing into a Rust type, we want to include that important data in the nested object itself.

Serenity's current workaround is a manual Deserialize impl that deserializes into a serde_json::Value, patches that, and then continues to deserialize into the target type (1). In other places, it manually parses from the patched serde_json::Value into the target type directly (2, 3)

Twilight's current workaround is a full blown 710LOC manual Deserialize impl (1)

@nathan-at-least
Copy link

nathan-at-least commented May 8, 2023

Just skimmed over this ticket, and IIUC, there's no built-in support for this in serde 1.0.x, correct?

The TryFrom pattern in #642 (comment) looks like a great option for my use case (which is post-deserialization validation).

Because this pattern seems useful and somewhat common, I just submitted serde-rs/serde-rs.github.io#148 to add it to https://serde.rs .

sowbug added a commit to sowbug/groove that referenced this issue Jul 11, 2023
- Introduce Serializable trait, which I think will solve the problem of
ephemeral struct fields getting reconstituted when we deserialize a
Serde struct. (Note serde-rs/serde#642, which could
eventually supersede this stuff.)
- Added a few more hardcoded [MidiNote]s.
- More [MusicalTime] math ops.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

No branches or pull requests