repeated oneof support? #2592

Xopherus · 2017-01-13T17:18:22Z

I have a use case where I need to develop protobuf specs for certain models in the Facebook API. They have one particular model (Post) which has the following pattern:

{
    # references Profile model
    # description: Information about the Profile that posted the message.
    "from": {
        "id": "user1"
    },
    
    # references Profile model
    # description: Profiles mentioned or targeted in this post.
    "to": [
        {"id": "user2"},
        {"id": "page1"},
        {"id": "user3"}
    ]
}

From the docs, a Profile could be one of 5 different types, and each of those have their own specs. Naturally, that is a perfect fit for the oneof type in proto3. Here's how I would like to write the spec:

message Post {
    string id = 1;
    string description = 2;
    string link = 3;
    string message = 4;
    string name = 5;

    # this is possible
    oneof from {
        Page page = 6
        User user = 7;
    }

    # i'd like to be able to do this
    repeated oneof to {
        Page page = 8
        User user = 9;
       // Event, Group, etc.
    }
    
    message Page {
        string id = 1;
    }
    message User {
        string id = 1;
    }
}

Now there are other workarounds that we can use, but they basically require us to deviate from the original spec. So when we have to serialize JSON data to protobuf, we have to do it manually (i.e. without using a built-in JSON-to-Proto parser (e.g. python). This becomes particularly time-consuming as there are a lot of fields in each model and the Facebook API has dozens and dozens of models. Adding more custom code to handle serialization of these edge cases takes is possible, but I'd like to find a cleaner solution.

Is there a particular reason why it's not currently possible to have this feature? The docs mention it's not possible but I haven't figured out why.

The text was updated successfully, but these errors were encountered:

VolkerKamin · 2017-01-19T17:36:25Z

Is it impossible?
"You then add your oneof fields to the oneof definition. You can add fields of any type, but cannot use repeated fields." (https://developers.google.com/protocol-buffers/docs/proto#oneof)
On the outside "repeated" should be legal.

From a wire-encoding point of view, there is no reason not to allow it. "repeated" already sets the semantics that additional fields need to be collected (instead of overwriting prior values).

Note: your code will not work because "page" and "user" are part of "from" and "to". But "oneof" members are in the namespace of the message! Here you have two "page" elements and two "user" elements in the namespace of "Post".

liujisi · 2017-03-06T21:34:22Z

Repeated oneof was in the initial proposal but as we later found out, there were lots many complicated corner cases in wire and API. We decided not to add repeated for oneof.

You can always define a message which wraps a oneof and have a repeated field of that message instead.

prismofeverything · 2017-04-13T18:08:06Z

I just want to be able to have one of the fields in the oneof be repeated:

oneof action {
    string get = 1;
    repeated string gather = 2;
    ....
}

Wrapping a repeated value in another message is verbose and creates unwieldy schemas. Since the key is unique there is no ambiguity about which value to use. It seems like a strange constraint that also prevents me from writing protobuffer schemas for existing data structures (which have no problem with lists living under keys).

Any insight into why this is disallowed? It is very limiting and ultimately prevents me from using protobuffer in many cases where it would be useful.

Thanks for considering this!

acozzette · 2017-04-14T23:15:01Z

I wasn't around for the initial design of the oneof feature, but my guess would be that the repeated-inside-oneof capability was left out just because the extra complexity was not considered worth it given that you can just use a wrapper message to accomplish roughly the same thing (even if it's a little more verbose).

prismofeverything · 2017-04-14T23:49:19Z

I understand that it may make the implementation more complex, but what it's done is create a whole class of message-shapes that cannot be expressed in protobuffer.

For instance, one of the huge use cases I see for protobuffer is to take an existing implicit schema that has already been in use for some time in a large system and define an explicit schema that can validate all of these messages flying around and bring some order to this madness. I feel this is a pretty common concern that leads people to a schema language like this.

So then the important question becomes: given any arbitrary pre-existing message shape, can I write a protobuffer schema that expresses it? It turns out that having arrays live under keys is an extremely common scenario that comes up... all the time. Not being able to put an array under a key at the top level rules out what may be one of the most common patterns found in the wild. So do you tell these people that they have to start nesting their lists under yet-another-key? And that they have to change all of this code, existing in many different places (some in scary legacy places that no one wants to touch), and go through the long debugging process of things that settled long ago and we don't want to mess with?

It seems to me this limits the applicability of protobuffer to an unreasonable degree. I can understand if you are just generating schemas from scratch, then you have no concern for nesting things however you want. If you have the luxury of creating something without regard for pre-existing conditions, maybe life is okay. But in the vast world of software persisting far beyond when it should be reasonably put down and forgotten, I feel the greatest value and potential of protobuffer is currently lost.

It seems a goal of protobuffer would be the ability to express the range of possible message shapes, but I guess that is not the case?

Thanks @acozzette for the response!

Xopherus · 2017-04-15T15:08:57Z

That was my initial observation as well @prismofeverything : that a common use case for protobuf is to replicate an already existing schema (i.e. JSON). Doesn't mean protobuf is wrong (or that JSON is right), but it would be a nice feature to have just to make the JSON-to-protobuf conversion seamless. Fortunately I realized that all of the models which could be mapped within the to field have a common subset of attributes (id and name) so I could just make a placeholder model for that subset instead. Unfortunately that probably will not work for everyone.

For now I'd say the best you can do is try and define well-known patterns for handling this situation. Say your JSON looks like this:

// mymodel
{
  "id": 1,
  "tags": [
    
    // model A
    {
      "modelAfieldA": 2,
      "modelAfieldB": "hello,world"
    },
    // model B
    {
      "modelBfieldA": "foobar",
      "modelBfieldB": [1, 2,3]
    },

    // model C
    {
      "modelCfieldA": {"id": 1}
    }
  ]
}

One way you can structure your protobuf spec is like this:

message MyModel {
  string id = 1;
   
  repeated tags_modelA = 2;
  repeated tags_modelB = 3;
  repeated tags_modelC = 4;
}

message ModelA {
  int32 modelAfieldA = 1;
  string modelAfieldB = 2;
}

message ModelB {
  string modelBfieldA = 1;
  repeated int32 modelBfieldB = 2;
}

message ModelC {
  ModelCFieldA modelCfieldA = 1;

  message ModelCFieldA {
    int32 id = 1;
  }
}

It will require more manual effort on your part during the conversion because you'll have to try and match each tag to one of your predefined models, but assuming the JSON schema doesn't change often and you have good test coverage around it, it will work fine for your needs.

acozzette · 2017-04-17T16:31:20Z

@prismofeverything That makes sense that it's often useful to create protobuf messages that match existing schemas which are already in use, but still I think we have to weigh the benefit of supporting more kinds of schemas against the cost of adding that support. Adding this kind of feature has a heavy cost because it has to be done independently for the ~10 languages we support, and when you consider that for some languages we're working on multiple implementations, this is logic that we would probably have to implement 15 times over just for this small feature, and then we have to commit to maintaining it forever.

I would say that another thing you could try in this situation is to just not use the oneof. E.g. instead of trying this:

oneof action {
    string get = 1;
    repeated string gather = 2;
    ....
}

Just have the two fields at the top level without a oneof:

string get = 1;
repeated string gather = 2;

Then protobuf won't enforce the invariant that only one or the other is set, but you can always enforce this in your own application logic and keep the same basic message structure you wanted to represent.

ckiosidis · 2018-02-13T15:58:47Z

You could wrap the oneof around an object:

message Post {
    ...

    repeated ToWrapper toWrapper = 1;
   
    message ToWrapper {
        oneof to {
            Page page = 8
            User user = 9;
            // Event, Group, etc.
        }
    }
    ...
}

ghost · 2020-10-14T17:41:42Z

@acozzette @ckiosidis @jskeet @TeBoring @jtattermusch @haberman

Such issue is still a pain.

I try to have both packed (to be more space efficient) and repeated primitives types, in a oneof context.

I try the following approach :

main.proto:

message Object {
    
    string name = 1;
    repeated ArrayValue values = 2 [packed=true] (????);
    
}

message ArrayValue {
    oneof {
        string str_values = 1;
        double d_values = 2;
    }
}

However, it is impossible to have packed on things other that primitives types. So I expect to have space overhead on each value, because of the ArrayValue boxing message. Which is not compatible with the huge dataset I have to serialize.

So, in order to be space efficient, what I plan to do, is to have nested serialization with multiples proto files, like that :

main.proto:

    message Object {
        
        string name = 1;
        string data_type = 2;
        bytes array_data = 3;
        
    }

stringarray.proto:

    message RepeatedString {
        
        repeated string myarray [packed=true] = 1;
        
    }

doublearray.proto:

    message RepeatedDouble {
        
        repeated double myarray [packed=true] = 1;
        
    }

As you guess array_data will contains an other protobuff serialized blob.

Do you think it is a valid approach ?

Do you have better way to do that ?

Thanks

isc30 · 2021-03-04T09:48:21Z

TLDR

message VehicleSearchResult {
    repeated VehicleSearchEntity entities = 1;
}

message VehicleSearchEntity {
    oneof entity {
        Car car = 1;
        Motorbike moto = 2;
    }
}

accessed via:

if (result.entities[0].car) { ... }
else if (result.entities[0].moto) { ... }

alexagriffith · 2021-03-12T21:42:57Z

TLDR

message VehicleSearchResult {
    repeated SearchEntity entities = 1;
}

message VehicleSearchEntity {
    oneof entity {
        Car car = 1;
        Motorbike moto = 2;
    }
}

accessed via:

if (result.entities[0].car) { ... }
else if (result.entities[0].moto) { ... }

is repeated SearchEntity entities = 1; on line 2 supposed to be repeated VehicleSearchEntity entities = 1;?
I'm running into this issue and trying to follow what you did as well. Just wanted to point this out.

isc30 · 2021-03-12T21:45:29Z

Thanks @alexagriffith, fixed!

yiakwy-mapping-team · 2021-05-19T07:28:26Z

@isc30 @Xopherus @VolkerKamin @liujisi @acozzette @alexagriffith I think the problem is not fully resolved, and could you have a look at this relevant issue #8628?

naquin · 2021-12-16T02:23:16Z

As noted by @prismofeverything in 2017, support for repeated fields in oneof would be a tremendous design tool. The engineering effort to support this would be well worth the effort.

Still worth reconsidering.

JasonPolisAdmin · 2022-01-17T11:32:17Z

@liujisi - What's just one of the complicated corner cases ?

kb-sp · 2023-08-15T22:05:11Z

Something to keep in mind is that protobuf is a wire protocol. Not only does that speak to a panoply of corner cases when you think about the wire format, but protobuf isn't a "schema language". I think of things like AVRO in those circumstances if you need a non-NP-complete schema shape ;), or just JSONSchema if you need some flexibility. Fast and flexible rarely go hand in hand.

liujisi closed this as completed Mar 6, 2017

dio mentioned this issue Apr 23, 2018

Allow healthcheck on different port for non-EDS cluster types envoyproxy/envoy#439

Closed

ilackarms mentioned this issue Sep 4, 2019

add RouteTable api k8sgateway/k8sgateway#1112

Merged

yiakwy-mapping-team mentioned this issue May 19, 2021

Repeated of message which wraps "oneof" filed is not packed. #8628

Closed

assaframan mentioned this issue Mar 29, 2024

[Snyk] Security upgrade gulp from 3.9.1 to 5.0.0 assaframan/protobuf#24

Open

tsmith023 mentioned this issue Apr 23, 2024

Add an external gRPC method for getting tenant information weaviate/weaviate#4741

Merged

4 tasks

njhill mentioned this issue May 22, 2024

Add guided decoding to TGIS gRPC API IBM/vllm#31

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

repeated oneof support? #2592

repeated oneof support? #2592

Xopherus commented Jan 13, 2017

VolkerKamin commented Jan 19, 2017

liujisi commented Mar 6, 2017

prismofeverything commented Apr 13, 2017

acozzette commented Apr 14, 2017

prismofeverything commented Apr 14, 2017

Xopherus commented Apr 15, 2017

acozzette commented Apr 17, 2017

ckiosidis commented Feb 13, 2018 •

edited

Loading

ghost commented Oct 14, 2020 •

edited by ghost

Loading

isc30 commented Mar 4, 2021 •

edited

Loading

alexagriffith commented Mar 12, 2021

TLDR

isc30 commented Mar 12, 2021

yiakwy-mapping-team commented May 19, 2021 •

edited

Loading

naquin commented Dec 16, 2021

JasonPolisAdmin commented Jan 17, 2022

kb-sp commented Aug 15, 2023

repeated oneof support? #2592

repeated oneof support? #2592

Comments

Xopherus commented Jan 13, 2017

VolkerKamin commented Jan 19, 2017

liujisi commented Mar 6, 2017

prismofeverything commented Apr 13, 2017

acozzette commented Apr 14, 2017

prismofeverything commented Apr 14, 2017

Xopherus commented Apr 15, 2017

acozzette commented Apr 17, 2017

ckiosidis commented Feb 13, 2018 • edited Loading

ghost commented Oct 14, 2020 • edited by ghost Loading

isc30 commented Mar 4, 2021 • edited Loading

TLDR

alexagriffith commented Mar 12, 2021

TLDR

isc30 commented Mar 12, 2021

yiakwy-mapping-team commented May 19, 2021 • edited Loading

naquin commented Dec 16, 2021

JasonPolisAdmin commented Jan 17, 2022

kb-sp commented Aug 15, 2023

ckiosidis commented Feb 13, 2018 •

edited

Loading

ghost commented Oct 14, 2020 •

edited by ghost

Loading

isc30 commented Mar 4, 2021 •

edited

Loading

yiakwy-mapping-team commented May 19, 2021 •

edited

Loading