Avoid lossy buffering in #[serde(flatten)] #2186
Comments
This is a very interesting idea. Currently, the I think that can be fixed by adding an It is unclear to me how this could be extended to more than one flattened field. Output
|
We should be able to handle this by making a custom error type used by the MapAccess implementation that intercepts unknown field errors and delegates the rest to the inner MapAccess's error type.
Wow, I was not aware that was a thing you could do! I think you are stuck buffering there to allow the key value to be deserialized multiple times unfortunately. We could continue using the old logic in that case. |
Hmm - we can't actually filter the errors since the error will terminate the deserialization of the inner value entirely... EDIT: Ah - we may be able to take an approach something like what FlatStructAccess does of remembering the delegate's field list and checking that in our ExtendedFooFieldsSeed. |
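A minimal, std-only sketch of the field-list idea, to make the routing decision concrete. It assumes the wrapper has recorded the field list that the flattened type handed over (derived structs pass a `fields` slice to `Deserializer::deserialize_struct`). The `Route` enum and `route_key` function are hypothetical names for illustration, not serde APIs.

```rust
/// Where a map key should go (hypothetical helper type).
#[derive(Debug, PartialEq)]
enum Route {
    /// The key belongs to the outer struct and is handled there.
    Outer,
    /// The key is forwarded to the flattened (delegate) type.
    Inner,
    /// Neither side knows the key; an unknown-field error can be raised
    /// here instead of being intercepted after the fact.
    Unknown,
}

/// Classify `key` using the outer struct's own fields and the remembered
/// field list of the flattened type.
fn route_key(key: &str, outer_fields: &[&str], inner_fields: &[&str]) -> Route {
    if outer_fields.contains(&key) {
        Route::Outer
    } else if inner_fields.contains(&key) {
        Route::Inner
    } else {
        Route::Unknown
    }
}
```

With this check done up front, the wrapper never needs to filter errors coming back from the inner type.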
On the other hand, the docs on serde.rs do explicitly state that |
Here's a working implementation of that approach: https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=b9b5b874c6f4c93d6d82c1bed4e6451e. If you did want to allow unknown fields in |
I think that JSON is a bad example for demonstrating the absence of "lossy deserialization", because JSON itself carries some information about value types. Try an untyped format, like XML (but there is no good XML serde adapter, so a toy format would be better). |
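To make this concern concrete, here is a small std-only sketch of a toy untyped format in which every value arrives as raw text. The `Buffered` enum and the helper functions are hypothetical stand-ins for an intermediate buffer that has to guess a type before the target type is known; they are not serde's internal types.

```rust
/// Hypothetical buffered value whose type was guessed before the real
/// target type was known.
#[derive(Debug, PartialEq)]
enum Buffered {
    Int(i64),
    Text(String),
}

/// Guess a type for raw text, as a buffer for an untyped format must.
fn buffer_guess(raw: &str) -> Buffered {
    match raw.parse::<i64>() {
        Ok(n) => Buffered::Int(n),
        Err(_) => Buffered::Text(raw.to_owned()),
    }
}

/// Recover a `String` field from the buffer. If the guess was `Int`,
/// the original spelling of the text is already gone.
fn string_from_buffer(value: &Buffered) -> String {
    match value {
        Buffered::Int(n) => n.to_string(),
        Buffered::Text(s) => s.clone(),
    }
}

/// Streaming path: the target type is known up front, so the raw text
/// is used unchanged.
fn string_direct(raw: &str) -> String {
    raw.to_owned()
}
```

In this toy model, the raw text `007` survives the streaming path intact but comes back as `7` after a round trip through the buffer.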
I coincidentally made a fully working implementation of this this evening before even seeing this thread! You don't need to depend at all on looking for particular deserialize errors / missing_field notifications from the flattened inner type, because you already know ahead of time what the names of the fields are that you're interested in. You can minimize codegen by creating a new, private trait (I called mine

    pub enum KeyOutcome<T> {
        Accepted(T),
        Rejected,
    }

    pub trait KeyCapture<'de> {
        // Returned from `try_send_key` to communicate state to `send_value`.
        type Token;

        fn expecting(&self, formatter: &mut std::fmt::Formatter) -> std::fmt::Result;

        fn try_send_key<E>(&mut self, key: &[u8]) -> Result<KeyOutcome<Self::Token>, E>
        where
            E: de::Error;

        fn send_value<D>(&mut self, token: Self::Token, value: D) -> Result<(), D::Error>
        where
            D: de::Deserializer<'de>;
    }

In the struct macro codegen, you implement this trait such that it "rejects" keys that are not part of the struct; these keys can then be transparently forwarded to the type being flattened. Once you have this trait, you can use a series of private, reusable adapter types to deserialize the inner flattened field while simultaneously sending data into the

    // From MapAccess:
    fn next_key_seed<K>(&mut self, mut seed: K) -> Result<Option<K::Value>, Self::Error>
    where
        K: de::DeserializeSeed<'de>,
    {
        // We're extracting a key for the type. Try to send keys to the `KeyCapture`;
        // the first key that is rejected by the `KeyCapture` is instead sent to the
        // `seed`.
        loop {
            seed = match self.map.next_key_seed(ExtractKeySeed {
                seed,
                capture: &mut self.capture,
            })? {
                None => {
                    self.drained = true;
                    return Ok(None);
                }
                Some(ExtractKeySeedResult::Deserialized(value)) => return Ok(Some(value)),
                Some(ExtractKeySeedResult::Captured { seed, token }) => {
                    self.map.next_value_seed(ExtractValueSeed {
                        token,
                        capture: &mut self.capture,
                    })?;
                    seed
                }
            };
        }
    }

    // From Visitor:
    fn visit_str<E>(self, v: &str) -> Result<Self::Value, E>
    where
        E: de::Error,
    {
        match self.capture.try_send_key(v.as_bytes())? {
            KeyOutcome::Accepted(token) => Ok(ExtractKeySeedResult::Captured {
                seed: self.seed,
                token,
            }),
            KeyOutcome::Rejected => self
                .seed
                .deserialize(v.into_deserializer())
                .map(ExtractKeySeedResult::Deserialized),
        }
    }

Here's what an example codegen might look like for a type called

    struct Data2 {
        key3: String,
        // #[serde(flatten)]
        inner: Data1,
    }

    impl<'de> de::Deserialize<'de> for Data2 {
        fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
        where
            D: serde::Deserializer<'de>,
        {
            // Token identifying which of Data2's own fields a key matched.
            enum Field {
                key3,
            }

            // Holds Data2's own fields while Data1 is being deserialized.
            #[derive(Default)]
            struct Data2Capture {
                key3: Option<String>,
            }

            impl<'de, 'a> KeyCapture<'de> for &'a mut Data2Capture {
                type Token = Field;

                fn expecting(&self, formatter: &mut std::fmt::Formatter) -> std::fmt::Result {
                    write!(formatter, "a Data2 struct")
                }

                #[inline]
                fn try_send_key<E>(&mut self, key: &[u8]) -> Result<KeyOutcome<Field>, E>
                where
                    E: de::Error,
                {
                    Ok(match key {
                        b"key3" => KeyOutcome::Accepted(Field::key3),
                        _ => KeyOutcome::Rejected,
                    })
                }

                #[inline]
                fn send_value<D>(&mut self, token: Self::Token, value: D) -> Result<(), D::Error>
                where
                    D: de::Deserializer<'de>,
                {
                    match token {
                        Field::key3 => de::Deserialize::deserialize(value).map(|value| {
                            self.key3 = Some(value);
                        }),
                    }
                }
            }

            let mut capture = Data2Capture::default();

            // Data1 drives the deserializer; keys belonging to Data2 are
            // diverted into `capture` along the way.
            let inner = Data1::deserialize(ExtractDeserializer {
                deserializer,
                capture: &mut capture,
            })?;

            let key3 = match capture.key3 {
                Some(key3) => key3,
                None => return Err(de::Error::missing_field("key3")),
            };

            Ok(Self { inner, key3 })
        }
    }

I was additionally able to determine that it is outright impossible (without buffering) to have more than one flattened field. In order to pull that off, you'd need to implement
You can put together a series of wrapper types and get pretty far, but eventually you end up at this point:
It's basically the same reason that you can't unpack an |
Completed implementation for consideration: https://github.com/Lucretiel/serde-bufferless |
I think it is better to fix this generically, not just for the case of a single flattened field, because otherwise users run into friction when they discover that a small change to their type definitions breaks deserialization. To achieve that, changes on the deserializer side alone are not enough. I am working on this problem, and you can see my unfinished work at master...Mingun:flatten. The actual changes begin at 1071b39; the earlier commits are preparatory work. The key idea in that
I do not remember exactly how complete this is, but I recall that only one of the enum representations remained to be implemented (the untagged representation, it seems). The chosen approach has downsides, but they are not so big:
The two scenarios are the following:
|
Ha, reinvented the same wheel (deserializing with just one flat field without buffers) recently and only now discovered this thread. I suppose I should try to switch to serde-bufferless linked above instead. |
@Lucretiel Any chance you'd want to add a proc-macro to your repo and publish it to crates.io? |
I’ve taken a similar approach to @Mingun; this issue cannot really be solved without API changes. The deserializers need to expose what fields they can take, and they need to accept fields individually rather than an entire map. I’ve added a

    pub trait DeserializeMap<'de>: Deserialize<'de> {
        type Visitor: MapVisitor<'de, Value = Self>;

        fn visitor() -> Self::Visitor;
    }

    pub trait MapVisitor<'de> {
        type Value;

        fn accepts_field(field: &str) -> bool;

        fn list_fields(list: &mut Vec<&'static str>);

        fn visit_field<D>(self, field: &str, deserializer: D) -> Result<Self, D::Error>
        where
            Self: Sized,
            D: serde::de::Deserializer<'de>;

        fn finalize<E>(self) -> Result<Self::Value, E>
        where
            E: serde::de::Error;
    }

Note: While The Implementation here: https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=ce2619418043d821e8bc67d1fc8609a4 I’ve got derive macros implemented as well, but these are specialized for my use case. |
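As a rough illustration of why accepting fields one at a time helps, here is a std-only toy version of the same shape. Values are plain strings and errors are `String`s. The `FieldSink` trait and the `Size`/`Label` types are hypothetical and only loosely mirror `MapVisitor`; this is a sketch under those assumptions, not the linked implementation.

```rust
/// Toy analogue of a push-style field visitor (hypothetical).
trait FieldSink: Sized {
    type Value;
    fn accepts_field(field: &str) -> bool;
    fn visit_field(self, field: &str, value: &str) -> Result<Self, String>;
    fn finalize(self) -> Result<Self::Value, String>;
}

#[derive(Debug, PartialEq)]
struct Size { width: u32, height: u32 }

#[derive(Default)]
struct SizeSink { width: Option<u32>, height: Option<u32> }

fn parse_u32(field: &str, value: &str) -> Result<u32, String> {
    value.parse().map_err(|e| format!("{}: {}", field, e))
}

impl FieldSink for SizeSink {
    type Value = Size;
    fn accepts_field(field: &str) -> bool { matches!(field, "width" | "height") }
    fn visit_field(mut self, field: &str, value: &str) -> Result<Self, String> {
        match field {
            "width" => self.width = Some(parse_u32(field, value)?),
            "height" => self.height = Some(parse_u32(field, value)?),
            _ => return Err(format!("unknown field {}", field)),
        }
        Ok(self)
    }
    fn finalize(self) -> Result<Size, String> {
        Ok(Size {
            width: self.width.ok_or("missing field width")?,
            height: self.height.ok_or("missing field height")?,
        })
    }
}

#[derive(Debug, PartialEq)]
struct Label { name: String }

#[derive(Default)]
struct LabelSink { name: Option<String> }

impl FieldSink for LabelSink {
    type Value = Label;
    fn accepts_field(field: &str) -> bool { field == "name" }
    fn visit_field(mut self, field: &str, value: &str) -> Result<Self, String> {
        if field != "name" {
            return Err(format!("unknown field {}", field));
        }
        self.name = Some(value.to_owned());
        Ok(self)
    }
    fn finalize(self) -> Result<Label, String> {
        Ok(Label { name: self.name.ok_or("missing field name")? })
    }
}

/// Two "flattened" parts fed from one entry stream: each entry is routed
/// to whichever sink accepts it, and nothing is buffered.
fn read_both(entries: &[(&str, &str)]) -> Result<(Size, Label), String> {
    let mut size = SizeSink::default();
    let mut label = LabelSink::default();
    for &(field, value) in entries {
        if SizeSink::accepts_field(field) {
            size = size.visit_field(field, value)?;
        } else if LabelSink::accepts_field(field) {
            label = label.visit_field(field, value)?;
        } else {
            return Err(format!("unknown field {}", field));
        }
    }
    Ok((size.finalize()?, label.finalize()?))
}
```

Because the caller pushes fields in rather than the flattened type pulling a whole map, more than one flattened part can share the stream in this model.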
The Deserialize implementation of a type with a field annotated #[serde(flatten)] currently buffers up all of the unknown map entries in memory, and then deserializes the flattened value from them at the end. This approach has a couple of downsides:
Fortunately, there's an alternate approach that avoids all of these issues! Instead of buffering, we can create a custom MapAccess which intercepts the fields that we want to handle in the top-level type and returns the remainder to the nested deserialize implementation: https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=96546d7ba882b70042268a0b48de6286
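The interception idea can be sketched without serde at all. The std-only toy below models a map as an iterator of `(key, value)` string pairs. `Intercept`, `Inner`, `Outer`, and the two reader functions are hypothetical names chosen for illustration; the real approach wraps `MapAccess`, as in the playground link above.

```rust
/// Wraps a stream of `(key, value)` entries. Entries whose key equals
/// `outer_key` are captured on the side; all others pass through.
struct Intercept<'a, I> {
    entries: I,
    outer_key: &'a str,
    captured: Option<String>,
}

impl<'a, I> Iterator for Intercept<'a, I>
where
    I: Iterator<Item = (&'a str, &'a str)>,
{
    type Item = (&'a str, &'a str);

    fn next(&mut self) -> Option<Self::Item> {
        loop {
            let (key, value) = self.entries.next()?;
            if key == self.outer_key {
                // Handled by the top-level type; the consumer never sees it.
                self.captured = Some(value.to_owned());
            } else {
                return Some((key, value));
            }
        }
    }
}

#[derive(Debug, PartialEq)]
struct Inner { key1: String, key2: String }

#[derive(Debug, PartialEq)]
struct Outer { key3: String, inner: Inner }

/// Stand-in for the flattened type's own deserialization: it pulls
/// entries one at a time and rejects keys it does not know.
fn read_inner<'a>(
    entries: &mut impl Iterator<Item = (&'a str, &'a str)>,
) -> Result<Inner, String> {
    let (mut key1, mut key2) = (None, None);
    for (key, value) in entries {
        match key {
            "key1" => key1 = Some(value.to_owned()),
            "key2" => key2 = Some(value.to_owned()),
            other => return Err(format!("unknown field {}", other)),
        }
    }
    Ok(Inner {
        key1: key1.ok_or("missing field key1")?,
        key2: key2.ok_or("missing field key2")?,
    })
}

/// The outer type drives the inner reader through the intercepting
/// wrapper, then collects its own field once the stream is drained.
fn read_outer<'a>(
    entries: impl Iterator<Item = (&'a str, &'a str)>,
) -> Result<Outer, String> {
    let mut map = Intercept { entries, outer_key: "key3", captured: None };
    let inner = read_inner(&mut map)?;
    let key3 = map.captured.ok_or("missing field key3")?;
    Ok(Outer { key3, inner })
}
```

Each entry is visited exactly once and nothing is stored beyond the outer type's own field, which is the property the buffering implementation lacks.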