Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support #[serde(default)] for struct fields at the end of a struct #179

Closed
andrewjstone opened this issue Apr 30, 2017 · 10 comments
Closed

Comments

@andrewjstone
Copy link

Is it possible to support #[serde(default)] in structs? This would allow adding new fields to structs and still having the new code be able to read old structs missing those fields.

I'm currently getting the following error:

--- deserialize_old_version_bincode stdout ----
	thread 'deserialize_old_version_bincode' panicked at 'called `Result::unwrap()` on an `Err` value: IoError(Error { repr: Custom(Custom { kind: UnexpectedEof, error: StringError("failed to fill whole buffer") }) })', /buildslave/rust-buildbot/slave/nightly-dist-rustc-linux/build/src/libcore/result.rs:845
note: Run with `RUST_BACKTRACE=1` for a backtrace.

Let me state up front that I'm not familiar with the bincode format, but using the following logic, I think it may be possible.

I understand why using #[serde(default)] may be a problem if re-ordering of struct fields was done, since the types could either be the same, in which case the wrong field gets deserialized, or different in which case an unexpected error is returned. However, I think if default fields are only allowed to be added at the end of structs, deserialization should work fine. If the input ends early, meaning the rest of the fields aren't there, the defaults are used. This allows backwards compatibility that is very useful when building distributed systems.

Conversely, for forwards compatibility, any new fields at the end of a struct that weren't understood could just be ignored.

@tikue
Copy link
Contributor

tikue commented May 11, 2017

How would this work for nested structs?

@TyOverby
Copy link
Collaborator

TyOverby commented May 11, 2017

Bincode doesn't serialize any information that would allow it to determine if a field has been serialized or not. It assumes that everything has been serialized.

@andrewjstone
Copy link
Author

@tikue, I'm not familiar enough with the internals to know how nested structs would work. Presumably if the whole struct is missing a default would be used. If part of the nested struct is missing, that part would be defaulted as well.

@andrewjstone
Copy link
Author

It's a shame that bincode can't support this. It makes it much less useful for building long running distributed systems where all nodes can't be taken offline and upgraded at once in order to add a new field. The only option it seems is to add a new message, and additionally use a header identifying the message globally with an integer.

@tikue
Copy link
Contributor

tikue commented May 11, 2017

Unfortunately I think supporting api evolution would require fundamental changes to the binary format. Fields would have to be tagged, which goes against the "compact as possible" goal. I'd say it's probably just not the snuggest fit for large distributed systems. There are other serde-compatible binary formats that support API evolution -- have you looked into serde_cbor?

@andrewjstone
Copy link
Author

andrewjstone commented May 11, 2017 via email

@TyOverby
Copy link
Collaborator

I'm terribly sorry for what I consider to be a lack of good communication and documentation on my part.

Because bincode doesn't include any metadata about what is or isn't serialized, making backwards compatible structure changes is impossible. When I made bincode I designed it for use in cross-process communication from the same codebase, or for game communication where it is expected that everyone will be on the same version of the game.

@andrewjstone
Copy link
Author

andrewjstone commented May 12, 2017 via email

@dtolnay
Copy link
Contributor

dtolnay commented May 12, 2017

making backwards compatible structure changes is impossible.

I don't think this is accurate. Here is one possible way to solve the more general problem of changing schema (not limited to adding new fields at the end) in Bincode.

#[macro_use]
extern crate serde_derive;

extern crate serde;
extern crate bincode;

use std::fmt;

use bincode::Infinite;
use serde::de::{self, Deserialize, Deserializer, Visitor, SeqAccess};
use serde::ser::{Serialize, Serializer, SerializeTuple};

macro_rules! versioned_schema {
    ($($version:tt => $structure:ident,)+) => {
        #[derive(Debug)]
        enum Schema {
            $(
                $structure($structure),
            )+
        }

        impl Serialize for Schema {
            fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
                where S: Serializer
            {
                let mut tuple = serializer.serialize_tuple(2)?;
                match *self {
                    $(
                        Schema::$structure(ref v) => {
                            // serialize the version as one byte
                            tuple.serialize_element(&($version as u8))?;
                            // serialize the content
                            tuple.serialize_element(v)?;
                        }
                    )+
                }
                tuple.end()
            }
        }

        impl<'de> Deserialize<'de> for Schema
            where $($structure: Deserialize<'de>),+
        {
            fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
                where D: Deserializer<'de>
            {
                struct MyVisitor;

                impl<'de> Visitor<'de> for MyVisitor {
                    type Value = Schema;

                    fn expecting(&self, formatter: &mut fmt::Formatter) -> fmt::Result {
                        formatter.write_str("versioned schema")
                    }

                    fn visit_seq<S>(self, mut seq: S) -> Result<Self::Value, S::Error>
                        where S: SeqAccess<'de>
                    {
                        // deserialize the version as one byte
                        match seq.next_element::<u8>()?
                                 .ok_or_else(|| {
                                     de::Error::custom("missing schema version")
                                 })?
                        {
                            $(
                                $version => {
                                    // deserialize the content
                                    seq.next_element()?
                                       .ok_or_else(|| de::Error::custom(
                                            concat!("missing data for version ", $version)
                                       ))
                                       .map(Schema::$structure)
                                }
                            )+
                            other => {
                                Err(de::Error::custom(format_args!("unknown version: {}", other)))
                            }
                        }
                    }
                }

                deserializer.deserialize_tuple(2, MyVisitor)
            }
        }
    };
}

versioned_schema! {
    0 => Original,
    1 => Split,
    2 => Bytes,
}

#[derive(Serialize, Deserialize, Debug)]
struct Original {
    andrewjstone: u64,
}

#[derive(Serialize, Deserialize, Debug)]
struct Split {
    andrew: u32,
    j: u16,
    stone: u32,
}

#[derive(Serialize, Deserialize, Debug)]
struct Bytes {
    andrewjstone: Vec<u8>,
}

fn main() {
    let data = Schema::Original(Original { andrewjstone: 1 });
    let bytes = bincode::serialize(&data, Infinite).unwrap();
    println!("bytes: {:?}", bytes);

    let data: Schema = bincode::deserialize(&bytes).unwrap();
    println!("data: {:#?}", data);
}

@andrewjstone
Copy link
Author

Thanks @dtolnay. This does appear to solve the backwards compatibility at the cost of having to version each change. In that case, we could just have different messages as well. I was asking for a different goal of just adding new fields at the end of existing structs ala protobuf to make them extensible without versioning.

Even with the fixes for backwards compatibility here, they are not forward compatible because the new structures aren't known on older nodes and can't be decoded. This is fine for complete changes of protocol structures, when both sides have to negotiate versions and understand the changes. However, when you want to do something simple like add an extra metric or tag to a structure, the overhead of maintaining two separate messages and versioning the code becomes unreasonable.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants