
Attribute to control behavior when deserializing duplicate fields #690

Closed
dtolnay opened this issue Jan 15, 2017 · 14 comments

Comments


dtolnay commented Jan 15, 2017

  • Take the first value
  • Take the last value
  • Error (current behavior and sensible default)
  • Aggregate into a vector

cc @Arnavion

@Arnavion

It might make sense to only change serde_json to allow duplicate keys and take the last value, since it seems JSON doesn't standardize one way or the other and some other JSON deserializers in other languages allow it. But I guess the deserializer is abstracted enough to not allow this.


dtolnay commented Jan 15, 2017

I would prefer to address this in serde_derive. That makes the behavior configurable, performant, and consistent across all formats.

This is the relevant part of the generated code: serde_codegen/src/de.rs

@Arnavion

Sure, it just feels to me that "this format transparently allows duplicate keys" is a property of the particular serialization format. It shouldn't be expected to be consistent across formats, nor is it something the user should have to make a decision on for each of their deserializable structs. The decision might even have different answers for different formats - it can be legal to allow duplicates in JSON, but not in some other format.

Aside: Deserializing into a serde_json::Value seems to have the expected behavior of taking the last key.
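For illustration (not part of the original comment), that aside can be checked directly; this assumes serde_json:

use serde_json::Value;

fn main() {
    // The later "k" silently overwrites the earlier one when the Value's map is built.
    let v: Value = serde_json::from_str(r#"{ "k": "first", "k": "last" }"#).unwrap();
    assert_eq!(v["k"], "last");
}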


dtolnay commented Jan 16, 2017

Agreed, and this is one of the common tradeoffs we make in Serde all the time. An existing example is #[serde(rename="newName")]. Case conventions and other aspects of naming shouldn't be expected to be consistent across formats, nor are they something the user should have to make a decision on for each of their deserializable structs. Maybe all JSON field names should be camelCase by default and all TOML field names should be snake_case by default.
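For reference (an illustration, not part of the original comment), that naming decision is made per struct with derive attributes today:

use serde::{Deserialize, Serialize};

#[derive(Serialize, Deserialize)]
#[serde(rename_all = "camelCase")]
struct Profile {
    // Appears as "firstName" in every format, JSON and TOML alike.
    first_name: String,
    // Field-level override, as mentioned above.
    #[serde(rename = "newName")]
    old_name: String,
}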

The fact that Serialize/Deserialize allows your data to be represented in any Serde format is nice, but not because people actually use it that way. The vast majority of Serialize structs are only ever used with a single format, as is the case for the one you are dealing with. The reason it is nice for Serialize/Deserialize to support every format is that the techniques you learn in one format are transferable across the rest of the ecosystem. When you eventually use a different format for a different project, there is very little you need to learn or re-learn. That is one reason to handle these things with attributes in serde_derive.

@Arnavion

That makes sense.

@nickbabcock

Came across this issue and felt like I should throw in a fourth possibility.

I have a data format akin to JSON (anything that is valid JSON is also 100% valid in it) where duplicate fields should be aggregated into a Vec<T>, like so:

{
    "core": "FOO",
    "core": "BAR"
}
#[derive(Deserialize, Debug)]
struct Cores {
    core: Vec<String>,
    // other fields ...
}
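A hedged sketch of how that aggregation can be done by hand today (not from the thread; the struct and field names are just the ones from the example above): a manual Deserialize impl whose map visitor pushes every occurrence of the key instead of erroring.

use std::fmt;

use serde::de::{Deserializer, IgnoredAny, MapAccess, Visitor};
use serde::Deserialize;

#[derive(Debug, Default)]
struct Cores {
    core: Vec<String>,
}

impl<'de> Deserialize<'de> for Cores {
    fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
    where
        D: Deserializer<'de>,
    {
        struct CoresVisitor;

        impl<'de> Visitor<'de> for CoresVisitor {
            type Value = Cores;

            fn expecting(&self, f: &mut fmt::Formatter) -> fmt::Result {
                f.write_str("a map with one or more `core` entries")
            }

            fn visit_map<A>(self, mut map: A) -> Result<Cores, A::Error>
            where
                A: MapAccess<'de>,
            {
                let mut cores = Cores::default();
                // Every occurrence of the key is pushed, so duplicates accumulate
                // instead of triggering the derived "duplicate field" error.
                while let Some(key) = map.next_key::<String>()? {
                    if key == "core" {
                        cores.core.push(map.next_value()?);
                    } else {
                        let _ = map.next_value::<IgnoredAny>()?;
                    }
                }
                Ok(cores)
            }
        }

        deserializer.deserialize_map(CoresVisitor)
    }
}

fn main() {
    let input = r#"{ "core": "FOO", "core": "BAR" }"#;
    let cores: Cores = serde_json::from_str(input).unwrap();
    assert_eq!(cores.core, ["FOO", "BAR"]);
}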


dtolnay commented Sep 5, 2017

Thanks! I added it to the list.


dtolnay commented Feb 1, 2019

This should be implemented outside of Serde as a Deserializer adapter -- similar to how serde-ignored or serde-stacker wrap an existing Deserializer into their own Deserializer with extra behavior. Possible usage:

let mut j = serde_json::Deserializer::from_str(input);
let de = WhenDuplicateField::keep_first(&mut j);
let t = T::deserialize(de)?;


dpc commented Jan 15, 2020

If there ever exists a crate that would allow rejecting deserialization of data with duplicate keys, please ping me.

@tones111

I believe you can get that behavior (error on duplicate key) if you deserialize into a struct. The repo linked in #1716 has a test that demonstrates this. The error looks like value: Error("duplicate field label", line: 4, column: 7)
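A minimal reproduction of that behavior (an illustration, not from the linked repo), assuming serde_json and a field named label to match the quoted error:

use serde::Deserialize;

#[derive(Deserialize, Debug)]
struct Record {
    label: String,
}

fn main() {
    // The derived impl remembers which fields it has already seen and errors on a repeat.
    let err = serde_json::from_str::<Record>(r#"{ "label": "a", "label": "b" }"#).unwrap_err();
    println!("{}", err); // duplicate field `label` at line 1 column ...
}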


dpc commented Jan 16, 2020

@tones111 I see. But that behavior isn't there for collections like HashMap, which I also want to use sometimes, and rejecting duplicate keys is what I want 99% of the time.
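For illustration (not from the thread), this is the default map behavior being described, assuming serde_json:

use std::collections::HashMap;

fn main() {
    // No error: pairs are inserted in order, so the later value simply overwrites the earlier one.
    let m: HashMap<String, i32> = serde_json::from_str(r#"{ "k": 1, "k": 2 }"#).unwrap();
    assert_eq!(m["k"], 2);
}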


dpc commented Jan 16, 2020

Hmmm... Now that I think about it ... couldn't we just have a

trait SerdeMap<K, V> {
    fn insert(&mut self, k: K, v: V) -> Result<(), SomeErrorKind>;
}

impl it for the default collections that serde currently supports, and change serde to insert via this trait, so people could just write newtypes that implement SerdeMap differently?

I probably don't know what I'm talking about, but it just crossed my mind.

@tones111

@dpc I think what you're looking for is implemented in serde_with

check out maps_duplicate_key_is_error
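Roughly what that looks like (a sketch based on the serde_with 1.x helper module; the exact path may differ in newer releases):

use std::collections::HashMap;

use serde::Deserialize;

#[derive(Deserialize, Debug)]
struct Config {
    // Deserialization fails with a "duplicate key" error instead of keeping the last value.
    #[serde(with = "serde_with::rust::maps_duplicate_key_is_error")]
    settings: HashMap<String, i32>,
}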


dpc commented Jan 17, 2020

@tones111 Thank you!

@serde-rs serde-rs locked and limited conversation to collaborators May 10, 2020