Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

repeated oneof support? #2592

Closed
Xopherus opened this issue Jan 13, 2017 · 16 comments
Closed

repeated oneof support? #2592

Xopherus opened this issue Jan 13, 2017 · 16 comments

Comments

@Xopherus
Copy link

I have a use case where I need to develop protobuf specs for certain models in the Facebook API. They have one particular model (Post) which has the following pattern:

{
    # references Profile model
    # description: Information about the Profile that posted the message.
    "from": {
        "id": "user1"
    },
    
    # references Profile model
    # description: Profiles mentioned or targeted in this post.
    "to": [
        {"id": "user2"},
        {"id": "page1"},
        {"id": "user3"}
    ]
}

From the docs, a Profile could be one of 5 different types, and each of those have their own specs. Naturally, that is a perfect fit for the oneof type in proto3. Here's how I would like to write the spec:

message Post {
    string id = 1;
    string description = 2;
    string link = 3;
    string message = 4;
    string name = 5;

    # this is possible
    oneof from {
        Page page = 6
        User user = 7;
    }

    # i'd like to be able to do this
    repeated oneof to {
        Page page = 8
        User user = 9;
       // Event, Group, etc.
    }
    
    message Page {
        string id = 1;
    }
    message User {
        string id = 1;
    }
}

Now there are other workarounds that we can use, but they basically require us to deviate from the original spec. So when we have to serialize JSON data to protobuf, we have to do it manually (i.e. without using a built-in JSON-to-Proto parser (e.g. python). This becomes particularly time-consuming as there are a lot of fields in each model and the Facebook API has dozens and dozens of models. Adding more custom code to handle serialization of these edge cases takes is possible, but I'd like to find a cleaner solution.

Is there a particular reason why it's not currently possible to have this feature? The docs mention it's not possible but I haven't figured out why.

@VolkerKamin
Copy link

Is it impossible?
"You then add your oneof fields to the oneof definition. You can add fields of any type, but cannot use repeated fields." (https://developers.google.com/protocol-buffers/docs/proto#oneof)
On the outside "repeated" should be legal.

From a wire-encoding point of view, there is no reason not to allow it. "repeated" already sets the semantics that additional fields need to be collected (instead of overwriting prior values).

Note: your code will not work because "page" and "user" are part of "from" and "to". But "oneof" members are in the namespace of the message! Here you have two "page" elements and two "user" elements in the namespace of "Post".

@liujisi
Copy link
Contributor

liujisi commented Mar 6, 2017

Repeated oneof was in the initial proposal but as we later found out, there were lots many complicated corner cases in wire and API. We decided not to add repeated for oneof.

You can always define a message which wraps a oneof and have a repeated field of that message instead.

@liujisi liujisi closed this as completed Mar 6, 2017
@prismofeverything
Copy link

I just want to be able to have one of the fields in the oneof be repeated:

oneof action {
    string get = 1;
    repeated string gather = 2;
    ....
}

Wrapping a repeated value in another message is verbose and creates unwieldy schemas. Since the key is unique there is no ambiguity about which value to use. It seems like a strange constraint that also prevents me from writing protobuffer schemas for existing data structures (which have no problem with lists living under keys).

Any insight into why this is disallowed? It is very limiting and ultimately prevents me from using protobuffer in many cases where it would be useful.

Thanks for considering this!

@acozzette
Copy link
Member

I wasn't around for the initial design of the oneof feature, but my guess would be that the repeated-inside-oneof capability was left out just because the extra complexity was not considered worth it given that you can just use a wrapper message to accomplish roughly the same thing (even if it's a little more verbose).

@prismofeverything
Copy link

I understand that it may make the implementation more complex, but what it's done is create a whole class of message-shapes that cannot be expressed in protobuffer.

For instance, one of the huge use cases I see for protobuffer is to take an existing implicit schema that has already been in use for some time in a large system and define an explicit schema that can validate all of these messages flying around and bring some order to this madness. I feel this is a pretty common concern that leads people to a schema language like this.

So then the important question becomes: given any arbitrary pre-existing message shape, can I write a protobuffer schema that expresses it? It turns out that having arrays live under keys is an extremely common scenario that comes up... all the time. Not being able to put an array under a key at the top level rules out what may be one of the most common patterns found in the wild. So do you tell these people that they have to start nesting their lists under yet-another-key? And that they have to change all of this code, existing in many different places (some in scary legacy places that no one wants to touch), and go through the long debugging process of things that settled long ago and we don't want to mess with?

It seems to me this limits the applicability of protobuffer to an unreasonable degree. I can understand if you are just generating schemas from scratch, then you have no concern for nesting things however you want. If you have the luxury of creating something without regard for pre-existing conditions, maybe life is okay. But in the vast world of software persisting far beyond when it should be reasonably put down and forgotten, I feel the greatest value and potential of protobuffer is currently lost.

It seems a goal of protobuffer would be the ability to express the range of possible message shapes, but I guess that is not the case?

Thanks @acozzette for the response!

@Xopherus
Copy link
Author

That was my initial observation as well @prismofeverything : that a common use case for protobuf is to replicate an already existing schema (i.e. JSON). Doesn't mean protobuf is wrong (or that JSON is right), but it would be a nice feature to have just to make the JSON-to-protobuf conversion seamless. Fortunately I realized that all of the models which could be mapped within the to field have a common subset of attributes (id and name) so I could just make a placeholder model for that subset instead. Unfortunately that probably will not work for everyone.

For now I'd say the best you can do is try and define well-known patterns for handling this situation. Say your JSON looks like this:

// mymodel
{
  "id": 1,
  "tags": [
    
    // model A
    {
      "modelAfieldA": 2,
      "modelAfieldB": "hello,world"
    },
    // model B
    {
      "modelBfieldA": "foobar",
      "modelBfieldB": [1, 2,3]
    },

    // model C
    {
      "modelCfieldA": {"id": 1}
    }
  ]
}

One way you can structure your protobuf spec is like this:

message MyModel {
  string id = 1;
   
  repeated tags_modelA = 2;
  repeated tags_modelB = 3;
  repeated tags_modelC = 4;
}

message ModelA {
  int32 modelAfieldA = 1;
  string modelAfieldB = 2;
}

message ModelB {
  string modelBfieldA = 1;
  repeated int32 modelBfieldB = 2;
}

message ModelC {
  ModelCFieldA modelCfieldA = 1;

  message ModelCFieldA {
    int32 id = 1;
  }
}

It will require more manual effort on your part during the conversion because you'll have to try and match each tag to one of your predefined models, but assuming the JSON schema doesn't change often and you have good test coverage around it, it will work fine for your needs.

@acozzette
Copy link
Member

@prismofeverything That makes sense that it's often useful to create protobuf messages that match existing schemas which are already in use, but still I think we have to weigh the benefit of supporting more kinds of schemas against the cost of adding that support. Adding this kind of feature has a heavy cost because it has to be done independently for the ~10 languages we support, and when you consider that for some languages we're working on multiple implementations, this is logic that we would probably have to implement 15 times over just for this small feature, and then we have to commit to maintaining it forever.

I would say that another thing you could try in this situation is to just not use the oneof. E.g. instead of trying this:

oneof action {
    string get = 1;
    repeated string gather = 2;
    ....
}

Just have the two fields at the top level without a oneof:

string get = 1;
repeated string gather = 2;

Then protobuf won't enforce the invariant that only one or the other is set, but you can always enforce this in your own application logic and keep the same basic message structure you wanted to represent.

@ckiosidis
Copy link

ckiosidis commented Feb 13, 2018

You could wrap the oneof around an object:

message Post {
    ...

    repeated ToWrapper toWrapper = 1;
   
    message ToWrapper {
        oneof to {
            Page page = 8
            User user = 9;
            // Event, Group, etc.
        }
    }
    ...
}

@ghost
Copy link

ghost commented Oct 14, 2020

@acozzette @ckiosidis @jskeet @TeBoring @jtattermusch @haberman

Such issue is still a pain.

I try to have both packed (to be more space efficient) and repeated primitives types, in a oneof context.

I try the following approach :

main.proto:

message Object {
    
    string name = 1;
    repeated ArrayValue values = 2 [packed=true] (????);
    
}

message ArrayValue {
    oneof {
        string str_values = 1;
        double d_values = 2;
    }
}

However, it is impossible to have packed on things other that primitives types. So I expect to have space overhead on each value, because of the ArrayValue boxing message. Which is not compatible with the huge dataset I have to serialize.

So, in order to be space efficient, what I plan to do, is to have nested serialization with multiples proto files, like that :

main.proto:

    message Object {
        
        string name = 1;
        string data_type = 2;
        bytes array_data = 3;
        
    }

stringarray.proto:

    message RepeatedString {
        
        repeated string myarray [packed=true] = 1;
        
    }

doublearray.proto:

    message RepeatedDouble {
        
        repeated double myarray [packed=true] = 1;
        
    }

As you guess array_data will contains an other protobuff serialized blob.

Do you think it is a valid approach ?

Do you have better way to do that ?

Thanks

@isc30
Copy link

isc30 commented Mar 4, 2021

TLDR

message VehicleSearchResult {
    repeated VehicleSearchEntity entities = 1;
}

message VehicleSearchEntity {
    oneof entity {
        Car car = 1;
        Motorbike moto = 2;
    }
}

accessed via:

if (result.entities[0].car) { ... }
else if (result.entities[0].moto) { ... }

@alexagriffith
Copy link

TLDR

message VehicleSearchResult {
    repeated SearchEntity entities = 1;
}

message VehicleSearchEntity {
    oneof entity {
        Car car = 1;
        Motorbike moto = 2;
    }
}

accessed via:

if (result.entities[0].car) { ... }
else if (result.entities[0].moto) { ... }

is repeated SearchEntity entities = 1; on line 2 supposed to be repeated VehicleSearchEntity entities = 1;?
I'm running into this issue and trying to follow what you did as well. Just wanted to point this out.

@isc30
Copy link

isc30 commented Mar 12, 2021

Thanks @alexagriffith, fixed!

@yiakwy-mapping-team
Copy link

yiakwy-mapping-team commented May 19, 2021

@isc30 @Xopherus @VolkerKamin @liujisi @acozzette @alexagriffith I think the problem is not fully resolved, and could you have a look at this relevant issue #8628?

@naquin
Copy link

naquin commented Dec 16, 2021

As noted by @prismofeverything in 2017, support for repeated fields in oneof would be a tremendous design tool. The engineering effort to support this would be well worth the effort.

Still worth reconsidering.

@JasonPolisAdmin
Copy link

@liujisi - What's just one of the complicated corner cases ?

@kb-sp
Copy link

kb-sp commented Aug 15, 2023

Something to keep in mind is that protobuf is a wire protocol. Not only does that speak to a panoply of corner cases when you think about the wire format, but protobuf isn't a "schema language". I think of things like AVRO in those circumstances if you need a non-NP-complete schema shape ;), or just JSONSchema if you need some flexibility. Fast and flexible rarely go hand in hand.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests