Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Flatten #119

Closed
arnm opened this issue Jul 24, 2015 · 40 comments
Closed

Flatten #119

arnm opened this issue Jul 24, 2015 · 40 comments

Comments

@arnm
Copy link

arnm commented Jul 24, 2015

I am interacting with an API which returns pagination information on most of their requests. Here is an example:

{
    "limit": 25,
    "offset": 0,
    "total": 1,
    "users": [
        {...}
    ]
}

limit, offset, and total are present for many of the responses received. Here is the struct I deserialize the JSON response into:

#[derive(Debug, Serialize, Deserialize)]
pub struct Users {
    limit: i32,
    offset: i32,
    total: i32,
    users: Vec<User>
}

The problem that I'm having is that I don't know how I can avoid repeating those same 3 fields in other structs.

I would like to have something similar to this:

#[derive(Debug, Serialize, Deserialize)]
pub struct Pagination {
    offset: i32,
    limit: i32,
    total: i32
}

#[derive(Debug, Serialize, Deserialize)]
pub struct Users {
    pagination: Pagination,
    users: Vec<User>
}

Here serde would see that no field pagination was present but the type of the struct field is Pagination which contains fields which are present in the response.

I guess this could be solved with some kind of struct inheritance but Rust doesn't have that at the moment which restricts us to use composition.

Any ideas?

@arnm arnm changed the title JSON Deserialization Smarter JSON Deserialization Jul 24, 2015
@oli-obk
Copy link
Member

oli-obk commented Jul 24, 2015

i think the title should be "flattening of struct member of struct type" or sth... The issue with this kind of feature is the same that struct inheritance has: fields defined in both the parent and the child will collide. I guess it would be possible to implement some kind of #[serde(flatten)] attribute. Maybe for now it would be easier for you to create a macro that inserts those fields.

@arnm arnm changed the title Smarter JSON Deserialization flattening of struct member of struct type Jul 24, 2015
@oli-obk
Copy link
Member

oli-obk commented Jul 24, 2015

An example:

macro_rules! pagination{
    (pub struct $name:ident {$($field:ident : $ty:ty)*}) => (
        #[derive(Debug)]
        pub struct $name {
            offset: i32,
            limit: i32,
            total: i32,
            $($field : $ty),*
        }
    )
}

#[derive(Debug)]
struct User;

pagination!{
    pub struct Users {
        users: Vec<User>
    }
}

PlayPen

I don't think this is a serde-related issue, flattening could be useful for structs in general... There could be a crate for this

@Diggsey
Copy link

Diggsey commented Aug 1, 2015

You can do this the other way aroound:

struct Flattened<A, B>(pub A, pub B);

#[derive(Serialize, Deserialize, ...)]
struct Pagination {
    offset: i32,
    limit: i32,
    total: i32,
}

type Paginated<T> = Flattened<T, Pagination>;

Then manually implement Serialize/Deserialize for Flattened<A, B>, such that for each field you encounter, you first try to deserialize it to A, and then if that errors with UnknownFieldError, try to deserialize it to B. Serialization is just a case of merging the two instead.

@dtolnay dtolnay added the derive label Jun 13, 2016
@dtolnay dtolnay assigned dtolnay and unassigned dtolnay Jun 13, 2016
@oli-obk
Copy link
Member

oli-obk commented Dec 8, 2016

A use case from http://stackoverflow.com/q/41042767/1103681:

#[serde(rename)] seems to be the right option, but the documentation does not state if it is possible or how to do it.

This JSON object:

    {
       "name" : "myobject"
       "info" : 
       {
          "counter" : "3"
          "foo" : "bar"
       }
    }

The corresponding flat Rust struct should be:

    #[derive(Deserialize)]
    struct Object {
        name: String,
        #[serde(rename="info.counter")] // wrong syntax here !!
        count: i32,
        #[serde(rename="info::foo")] // neither this works
        foo: String,
    }

@dtolnay
Copy link
Member

dtolnay commented Dec 8, 2016

Looks like that is the opposite use case actually? That post has hierarchy in the JSON but not the Rust type. Here we have hierarchy in the Rust type but not the JSON. I guess we need to think about both.

@dtolnay
Copy link
Member

dtolnay commented Feb 9, 2017

We would also want to support flattening members of enum type. This came up in IRC:

struct File {
    name: String,
    path: String,
    #[serde(flatten, rename = "fType")]
    f_type: FileType,
}

enum FileType {
    #[serde(rename = "file")]
    File,
    #[serde(rename = "directory")]
    Directory {
        contents: Vec<File>,
    }
}

and on the JavaScript side the server expects:

{ "name": "...", "path": "...", "fType": "directory", "contents": [{ "..." }] }

@rpjohnst
Copy link

rpjohnst commented Feb 9, 2017

A similar use case, where only the enum tag is flattened:

struct Event {
    index: i32,
    time: u64,
    line: u32,
    #[serde(flatten, tag = "eType")]
    data: EventKind,
}

enum EventKind {
    State { state: Vec<...>, ... },
    Output { ... },
    Terminate { ... },
}

serializes to this:

{ "index": ..., "time": ..., "line": ..., "eType": "State", "data": { "state": [ ... ], ... } }

@anowell
Copy link

anowell commented Feb 22, 2017

I hit a similar case of wanting to flatten an enum variant, where I wanted to deserialize any string into this enum, opportunistically as one of the named variants, but falling back to Custom(String):

enum Model {
    ImageNet,
    ImageNetLinear,
    Bilinear,
    Custom(String),
}

I ended up just using a String field in the containing struct, but it seems like a similar #[serde(flatten)] attribute on the Custom(String) might accomplish my intent (assuming flattening a tuple struct makes sense in the more generic case).

@sanmai-NL
Copy link

Another real-world use case for this. Serializing from a nested struct in which each nested struct has a ‘superclass’ Option field. This seems to be required to serialize Rust structs to schema.org JSON-LD.

@dtolnay dtolnay changed the title flattening of struct member of struct type Flatten Apr 11, 2017
@dtolnay
Copy link
Member

dtolnay commented Apr 23, 2017

Here is a use case from "kaoD" that is currently more convenient to handle using rustc-serialize.
https://gist.github.com/anonymous/9ac364b7517cd787b1102b4849345620

@gyscos
Copy link

gyscos commented Jun 7, 2017

I think we're not short on use-cases.
This seem familiar to jackson's JsonUnwrapped (which can be used as an inspiration):
https://fasterxml.github.io/jackson-annotations/javadoc/2.2.0/com/fasterxml/jackson/annotation/JsonUnwrapped.html

"Unwrapped" has unfortunately different connotation in Rust.

@WiSaGaN
Copy link
Contributor

WiSaGaN commented Jun 15, 2017

It would be ideal for #[serde(flatten)] to play nice with #[serde(tag = "t", content = "t")] too.
Say I have some external json protocol that has sample messages like
{"protocol":"1001","data":{"d1001foo":"foo","d1001bar":"bar"},"version":"1"}
{"protocol":"1002","data":{"d1002foo":"foo","d1002bar":"bar"},"version":"1"}
I want to be able to write it as:

#[serde(Deserialize, Serialize)]
#[serde(tag = "protocol", content = "data")]
pub enum Payload {
    #[serde(rename="1001")]
    P1001{d1001foo: String, d1001bar: String},
    #[serde(rename="1002")]
    P1002{d1002foo: String, d1002bar: String},
}
#[serde(Deserialize, Serialize)]
pub struct Message {
    #[serde(flatten)]
    payload: Payload,
    version: String,
}

Seems to me there is no way to express this right now.

@dtolnay
Copy link
Member

dtolnay commented Jun 15, 2017

I think at this point we have enough use cases to start working on this. I would love a PR for flatten!

@TedDriggs
Copy link
Contributor

@dtolnay do you have any rough thoughts on how to approach this without macros v2? All the approaches I can think of need to know what fields are in the flattened object to generate the outer object's deserialize functions.

@oli-obk
Copy link
Member

oli-obk commented Jun 28, 2017

@TedDriggs: For serialization that's true, we'll need some more work like exposing some method that allows us to tell a struct to serialize its fields, but nothing else. Maybe we need to create some helper traits and implement them for everything that uses SerializeMap internally.

For deserialization you can forward all unknown fields to the field tagged with #[serde(flatten)]. There might be a few dances to be done in order to deserialize interleaved fields for two types at the same time, but it should be doable. It's certainly possible to add new methods with erroring default impls to Deserializer if some additional information exchange is needed between the first deserialization and the flattened one.

@Marwes
Copy link
Contributor

Marwes commented Jun 30, 2017

@oli-obk @TedDriggs

To make deserializing more concrete. As far as I can see there are two ways of implementing this.

The first is that unknown fields should be buffered into an intermediate structure as with untagged enums #739. Then this structure can then be passed as the Deserializer to the flattened field. It is maybe possible to reuse much of the code for untagged enums to implement this. The problem with that approach however is that it will only work for self-describing formats such as JSON.

The other approach is to add an additional trait DeserializeFlatten (or something like it) with some methods which we can forward unknown fields to as we find them.

trait DeserializeFlatten {
   type Value;
   fn init() -> Self;
   fn receive_field(&mut self, key: &str, deserializer: D) -> Result<()>;
   fn finalize(self) -> Result<Self::Value>;
}

This has the drawback flattened fields must implement this trait to work and while it can be implement as a part of any derive(Deserialize) that will obviously not work for manually implemented Deserialize instances.

@dtolnay
Copy link
Member

dtolnay commented Jun 30, 2017

Thanks for the thorough proposals! I would prefer to stick with the first way, like untagged enums. Supporting only self-describing formats is fine because in practice that's where people will care about controlling the representation this way.

@dtolnay
Copy link
Member

dtolnay commented Aug 28, 2017

@cmyr I opened #1028 to discuss a custom deserialize approach using existing functionality. Do you think this is a common pattern for JSON APIs? I have never seen it before. If it is not common, a custom Serialize and Deserialize implementation for JsonRpc in your codebase would be the way to go. If it is common, can you find some other APIs that work this way?

@cmyr
Copy link

cmyr commented Aug 28, 2017

@dtolnay This pattern is used in the JSON-RPC spec, which is used by the Language Server Protocol, and a few other places. The implementation I'm working on (for the xi editor) is derived from JSON-RPC. I have not seen this pattern outside of this context.

Other projects dealing with this protocol that I've been able to find tend to be doing custom deserialization, afaict[1].

Anyway: if there's a reasonable way for me to do this with a custom Deserialize (and your post suggests there is!) than that's good enough for me. Thanks!

  1. https://paritytech.github.io/jsonrpc/src/jsonrpc_core/types/request.rs.html#72

@Ralith
Copy link

Ralith commented Sep 10, 2017

A simple #[serde(flatten)] attribute applicable to struct-typed fields would have substantially simplified dimforge/nalgebra#251, wherein the serialization of nalgebra's (in some cases deeply nested) newtypes was fixed to forward to the innermost struct to avoid silly and user-hostile toml serializations like [[[[1, 2, 3, 4]]]] for certain cases of 1-dimensional data.

@joshlf
Copy link

joshlf commented Oct 4, 2017

@dtolnay @cmyr Another example of the "does this field exist" pattern can be seen in the GDAX "full channel" API. For example, the "change" message can either have new_size and old_size fields (indicating a limit order) or instead have new_funds and old_funds fields (indicating a market order). Similarly, the price field can be either not present or be null, which also indicates a market order.

I think that right now this could be solved with the untagged enum representation, but I'm not sure how it'd compose with this proposed flatten feature (which, btw, I'm very in favor of - it'd probably solve this problem that I have).

@dtolnay
Copy link
Member

dtolnay commented Nov 6, 2017

Do people have preferences for how flatten is treated in non-self-describing formats? In particular, consider this scenario:

struct A {
    a: String,
    #[serde(flatten)]
    inner: B,
}

enum B {
    B1(String),
    B2(String),
}

Without flatten this is serialized to JSON as:

{
    "a": "...",
    "inner": {
        "B1": "..."
    }
}

So intuitively this would flatten to:

{
    "a": "...",
    "B1": "...",
}

But a compact serializer like Bincode could choose not to serialize struct keys, losing information about whether the variant is B1 or B2. Some possibilities:

  • Disallow externally tagged enums inside of flattened value. They would return an error at runtime. Internally tagged, adjacently tagged, and untagged enums would be allowed.
  • Serialize any struct containing flatten as a map instead of a struct, forcing compact formats to store the keys as strings.
  • Ignore flatten if the data format is not "human readable". (cc @Marwes)
  • Something else

@joshlf
Copy link

joshlf commented Nov 6, 2017

Disallow externally tagged enums inside of flattened value. They would return an error at runtime.

Would there be no way to detect this at compile time?

@dtolnay
Copy link
Member

dtolnay commented Nov 6, 2017

@joshlf the type that something serializes as is determined by the code in its Serialize impl, which is executed at runtime. I suppose there could be a marker trait that is implemented for types that pinky promise to serialize in a flattenable way, but bounds in generic code using flatten would be messy and it would be more complicated to use flatten with types that have handwritten Serialize impls. In general marker traits would make Serde feel like it needs more hand-holding rather than just doing the right thing. We already use runtime errors in similar cases such as int in an internally tagged newtype variant.

type Opaque = i32;

#[derive(Serialize, Deserialize)]
struct S {
    k: String,
}

#[derive(Serialize, Deserialize)]
#[serde(tag = "type")]
enum E {
    // okay, serializes as {"type":"A","k":"..."}
    A(S),
    // runtime error, what does this even mean? 123"type":"B"456
    B(Opaque),
}

@joshlf
Copy link

joshlf commented Nov 6, 2017

Oh, right - I hadn't thought about custom impls. Nevermind!

@Marwes
Copy link
Contributor

Marwes commented Nov 7, 2017

@dtolnay

Ignore flatten if the data format is not "human readable". (cc @Marwes)

A format is free to skip writing struct keys regardless of whether it is "human readable" though so I don't see how that helps (sure, this could be documented, but I'd argue this would be fairly surprising behavior).

So intuitively this would flatten to: ...

Isn't the problem here only that flatten is applied directly to an externally tagged enum? I'd expect flatten to only work with "struct-like" types (which externally tagged enum's aren't, they have the "tag in the way of their contents").

So

Disallow externally tagged enums inside of flattened value. They would return an error at runtime. Internally tagged, adjacently tagged, and untagged enums would be allowed.

seems like the way to go to me.

@jethrogb
Copy link

jethrogb commented Nov 7, 2017

@dtolnay

But a compact serializer like Bincode could choose not to serialize struct keys, losing information about whether the variant is B1 or B2.

Doesn't it already do that for externally-tagged enums? This issue seems orthogonal to flattening.

@estokes
Copy link

estokes commented Jan 22, 2018

Another use case. In MongoDb we want to use the newtype pattern to distinguish between different document ids at compile time, however we can't easily do it with the vanilla newtype pattern because MongoDb doesn't allow us to use an array in the _id field. e.g. we want

#[derive(Serialize, Deserialize, Debug)] 
pub struct Id(ObjectId)

To completely strip the Id and just emit an ObjectId. Right now it emits an array containing the object id.

Note that in this particular case there is a work around that isn't too horrible,

#[derive(Serialize, Deserialize, Debug)]
#[serde(untagged)]
enum Id {
  Id(ObjectId)
}

Still it'd be nice to use the idiomatic newtype pattern.

@panicbit
Copy link

panicbit commented Jan 22, 2018

@estokes I think for your use case a simple workaround is to simply manually implement Serialize and just forward the calls to the inner value. It's probably a good idea to use a macro for that. (The same goes for Deserialize of course.)

@mitsuhiko
Copy link
Contributor

For what it's worth while working on implementing #941 I cam to the same conclusion that others voiced before: they are effectively the same feature and implementing flatten is what should be done. Working on changing this over to fully supporting flatten now.

@mitsuhiko
Copy link
Contributor

mitsuhiko commented Mar 15, 2018

I attempted to implement the proposal by @dtolnay from here but I can't make it work because from my understanding of what the adapter has to do it needs to "fake" the return value from SerializeMap::end which it cannot do.

I currently generate out something like this for the flattens in serialize (extra here is #[serde(flatten)]:

            let mut __serde_state = match _serde::Serializer::serialize_map(__serializer, None) {
                ::result::Result::Ok(val) => val,
                ::result::Result::Err(err) => {
                    return ::result::Result::Err(::convert::From::from(err))
                }
            };
            // ...
            match (&self.extra)
                .serialize(_serde::private::ser::FlatMapSerializer(&mut __serde_state))
            {
                ::result::Result::Ok(val) => val,
                ::result::Result::Err(err) => {
                    return ::result::Result::Err(::convert::From::from(err))
                }
            };
            _serde::ser::SerializeMap::end(__serde_state)

The FlatMapSerializer's job is to provide an adapter serializer that returns a new SerializeMapif one attempts to make one that dispatches to the underlyingSerializeMapbut ignores the calls toend(). The return value of that however is S::Okand cannot be changed. The signature ofendisfn end(self) -> Result<Self::Ok, Self::Error>`. The issue here is obviously that end can only ever be called once (which makes sense) and there is no other return value that can be provided to satisfy the adapter.

This makes me believe that a change to traits is necessary to make this work unless someone has something I missed.

@dtolnay
Copy link
Member

dtolnay commented Mar 15, 2018

From the sketch:

let mut map = serializer.serialize_map(None)?;
map.serialize_entry("outer1", &self.outer1)?;
self.inner.serialize(SerializeMapAdapter(&mut map))?;
self.other.serialize(SerializeMapAdapter(&mut map))?;
map.end()

I think we just want to ignore the Ok value from SerializeMapAdapter i.e. use () as the Ok value.

The return value of that however is S::Ok and cannot be changed.

We can't change the end() signature but we can change the choice of S::Ok right?

@mitsuhiko
Copy link
Contributor

Oh indeed. Since we're the only user of that there is no need to propagate the value.

@mitsuhiko mitsuhiko mentioned this issue Mar 16, 2018
8 tasks
@mitsuhiko
Copy link
Contributor

I have an implementation in #1179 for this now. Would love to get some input on it and suggestions of how to deal with the limitations outlined there. I think it's generally in a mergeable state feature wise but it might need some review from more experienced serde people. To support flattening enums with tag only or nesting flatten some bigger changes are needed which I won't work on until the approach is validated by someone else.

@dtolnay
Copy link
Member

dtolnay commented Mar 22, 2018

A flatten attribute has been implemented in #1179 and released in Serde 1.0.34. There are some related ideas discussed in this thread; let's file separate issues for those that have not been addressed yet by this initial implementation.

@dtolnay dtolnay closed this as completed Mar 22, 2018
@philippludwig

This comment has been minimized.

@mitsuhiko

This comment has been minimized.

@mpiannucci

This comment has been minimized.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

No branches or pull requests