repeated oneof support? #2592
Is it impossible? From a wire-encoding point of view, there is no reason not to allow it. "repeated" already sets the semantics that additional fields need to be collected (instead of overwriting prior values). Note: your code will not work because "page" and "user" are part of "from" and "to". But "oneof" members are in the namespace of the message! Here you have two "page" elements and two "user" elements in the namespace of "Post".
Repeated oneof was in the initial proposal but as we later found out, there were many complicated corner cases in the wire format and the API. We decided not to add repeated for oneof. You can always define a message which wraps a oneof and have a repeated field of that message instead.
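A minimal sketch of that wrapper pattern (the message and field names here are hypothetical, not from the thread):

```proto
syntax = "proto3";

// The wrapper message holds the oneof; the repeated field then
// points at the wrapper instead of at the oneof itself.
message Item {
  oneof kind {
    string text = 1;
    int64 number = 2;
  }
}

message Container {
  repeated Item items = 1;  // effectively a "repeated oneof"
}
```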
I just want to be able to have one of the fields in the oneof be repeated:
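Presumably something like this (a reconstruction of the lost snippet; protoc rejects it):

```proto
message Example {
  oneof value {
    string single = 1;
    repeated string many = 2;  // rejected: "repeated" is not allowed inside a oneof
  }
}
```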
Wrapping a repeated value in another message is verbose and creates unwieldy schemas. Since the key is unique there is no ambiguity about which value to use. It seems like a strange constraint that also prevents me from writing protobuffer schemas for existing data structures (which have no problem with lists living under keys). Any insight into why this is disallowed? It is very limiting and ultimately prevents me from using protobuffer in many cases where it would be useful. Thanks for considering this!
I wasn't around for the initial design of the oneof feature, but my guess would be that the repeated-inside-oneof capability was left out just because the extra complexity was not considered worth it given that you can just use a wrapper message to accomplish roughly the same thing (even if it's a little more verbose).
I understand that it may make the implementation more complex, but what it's done is create a whole class of message shapes that cannot be expressed in protobuffer.

For instance, one of the huge use cases I see for protobuffer is to take an existing implicit schema that has already been in use for some time in a large system and define an explicit schema that can validate all of these messages flying around and bring some order to this madness. I feel this is a pretty common concern that leads people to a schema language like this. So then the important question becomes: given any arbitrary pre-existing message shape, can I write a protobuffer schema that expresses it?

It turns out that having arrays live under keys is an extremely common scenario that comes up... all the time. Not being able to put an array under a key at the top level rules out what may be one of the most common patterns found in the wild. So do you tell these people that they have to start nesting their lists under yet-another-key? And that they have to change all of this code, existing in many different places (some in scary legacy places that no one wants to touch), and go through the long debugging process of things that settled long ago and we don't want to mess with?

It seems to me this limits the applicability of protobuffer to an unreasonable degree. I can understand if you are just generating schemas from scratch, then you have no concern for nesting things however you want. If you have the luxury of creating something without regard for pre-existing conditions, maybe life is okay. But in the vast world of software persisting far beyond when it should be reasonably put down and forgotten, I feel the greatest value and potential of protobuffer is currently lost. It seems a goal of protobuffer would be the ability to express the range of possible message shapes, but I guess that is not the case?

Thanks @acozzette for the response!
That was my initial observation as well, @prismofeverything: that a common use case for protobuf is to replicate an already existing schema (i.e. JSON). That doesn't mean protobuf is wrong (or that JSON is right), but it would be a nice feature to have just to make the JSON-to-protobuf conversion seamless. Fortunately I realized that all of the models which could be mapped within the…

For now I'd say the best you can do is try and define well-known patterns for handling this situation. Say your JSON looks like this:
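The JSON sample didn't survive extraction; given the Facebook Post model discussed in this thread, it presumably looked something like this (field values are illustrative):

```json
{
  "id": "12345",
  "from": { "name": "Some Page", "category": "Business" },
  "to": [
    { "name": "Some User" },
    { "name": "Another Page", "category": "Music" }
  ]
}
```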
One way you can structure your protobuf spec is like this:
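The spec snippet is missing as well; a sketch of the pattern being described, with stubbed-out model types (the real ones would mirror the Facebook API specs):

```proto
syntax = "proto3";

// Stub models, for illustration only.
message User { string name = 1; }
message Page { string name = 1; string category = 2; }
message Group { string name = 1; }
message Event { string name = 1; }
message Application { string name = 1; }

// One wrapper whose oneof covers each predefined model; "to" can
// then be a repeated field of the wrapper.
message Profile {
  oneof kind {
    User user = 1;
    Page page = 2;
    Group group = 3;
    Event event = 4;
    Application application = 5;
  }
}

message Post {
  Profile from = 1;
  repeated Profile to = 2;
}
```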
It will require more manual effort on your part during the conversion because you'll have to try and match each tag to one of your predefined models, but assuming the JSON schema doesn't change often and you have good test coverage around it, it will work fine for your needs.
@prismofeverything That makes sense that it's often useful to create protobuf messages that match existing schemas which are already in use, but still I think we have to weigh the benefit of supporting more kinds of schemas against the cost of adding that support. Adding this kind of feature has a heavy cost because it has to be done independently for the ~10 languages we support, and when you consider that for some languages we're working on multiple implementations, this is logic that we would probably have to implement 15 times over just for this small feature, and then we have to commit to maintaining it forever. I would say that another thing you could try in this situation is to just not use the oneof. E.g. instead of trying this:
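Presumably something like this (hypothetical names; the point is the repeated field inside the oneof, which proto3 rejects):

```proto
message Response {
  oneof payload {
    string scalar = 1;
    repeated string list = 2;  // not allowed inside a oneof
  }
}
```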
Just have the two fields at the top level without a oneof:
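Sketched with the same hypothetical names:

```proto
syntax = "proto3";

message Response {
  // At most one of these should be set; without the oneof, that
  // invariant lives in application code rather than in the schema.
  string scalar = 1;
  repeated string list = 2;
}
```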
Then protobuf won't enforce the invariant that only one or the other is set, but you can always enforce this in your own application logic and keep the same basic message structure you wanted to represent.
You could wrap the oneof around an object:
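The example was lost; presumably the idea (hypothetical names again) is to move the repeated field into its own message, so the oneof only ever holds singular message fields:

```proto
syntax = "proto3";

// The list lives in its own message...
message StringList {
  repeated string values = 1;
}

// ...so the oneof wraps an object instead of a repeated field.
message Response {
  oneof payload {
    string scalar = 1;
    StringList list = 2;
  }
}
```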
@acozzette @ckiosidis @jskeet @TeBoring @jtattermusch @haberman This issue is still a pain. I'm trying to have both repeated string values and repeated double values as the alternatives of a oneof. I tried the following approach: main.proto:
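The attached snippet didn't survive extraction; given the ArrayValue boxing message mentioned next, it presumably looked roughly like this:

```proto
syntax = "proto3";

// Each element is boxed in its own ArrayValue message, because
// "repeated" cannot appear directly inside the oneof.
message ArrayValue {
  oneof value {
    string string_value = 1;
    double double_value = 2;
  }
}

message Column {
  repeated ArrayValue values = 1;  // message fields can never be packed
}
```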
However, it is impossible to have packed encoding on anything other than primitive types, so I expect space overhead on each value because of the ArrayValue boxing message, which is not compatible with the huge dataset I have to serialize. So, in order to be space-efficient, what I plan to do is nested serialization with multiple proto files, like this: main.proto:
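Again a reconstruction of the missing snippet: the arrays live in separately serialized messages and are embedded as bytes:

```proto
syntax = "proto3";

// Each bytes field holds a nested, separately serialized array
// message (see stringarray.proto and doublearray.proto below).
message Column {
  oneof values {
    bytes string_array = 1;  // serialized StringArray
    bytes double_array = 2;  // serialized DoubleArray
  }
}
```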
stringarray.proto:
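Presumably something like:

```proto
syntax = "proto3";

message StringArray {
  repeated string values = 1;
}
```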
doublearray.proto:
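And its double counterpart; a packed scalar repeated field (the proto3 default) is what recovers the space efficiency:

```proto
syntax = "proto3";

message DoubleArray {
  repeated double values = 1;  // packed by default in proto3
}
```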
As you can guess, the bytes fields hold the nested serialized StringArray and DoubleArray messages. Do you think this is a valid approach? Do you have a better way to do it? Thanks
TL;DR:

```proto
message VehicleSearchResult {
  repeated VehicleSearchEntity entities = 1;
}

message VehicleSearchEntity {
  oneof entity {
    Car car = 1;
    Motorbike moto = 2;
  }
}
```

accessed via:

```
if (result.entities[0].car) { ... }
else if (result.entities[0].moto) { ... }
```
Thanks @alexagriffith, fixed!
@isc30 @Xopherus @VolkerKamin @liujisi @acozzette @alexagriffith I think the problem is not fully resolved. Could you have a look at the related issue #8628?
As noted by @prismofeverything in 2017, support for repeated fields in oneof would be a tremendous design tool. The engineering effort to support this would be well worth it. Still worth reconsidering.
@liujisi - What's just one of the complicated corner cases?
Something to keep in mind is that protobuf is a wire protocol. Not only does that speak to a panoply of corner cases when you think about the wire format, but protobuf isn't a "schema language". I think of things like AVRO in those circumstances if you need a non-NP-complete schema shape ;), or just JSONSchema if you need some flexibility. Fast and flexible rarely go hand in hand.
I have a use case where I need to develop protobuf specs for certain models in the Facebook API. They have one particular model (Post) which has the following pattern:
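The pattern itself was lost in extraction; from the Post model being described, it presumably resembled (values illustrative):

```json
{
  "id": "12345",
  "from": { "name": "Some Page" },
  "to": [
    { "name": "Some User" },
    { "name": "Another Page" }
  ]
}
```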
From the docs, a Profile could be one of 5 different types, and each of those has its own spec. Naturally, that is a perfect fit for the `oneof` type in proto3. Here's how I would like to write the spec:
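The original snippet is missing; reconstructing from @VolkerKamin's reply above (two "page" and two "user" members under "from" and "to" in Post), it presumably read roughly like this. It is not valid proto3 on two counts: "repeated" cannot modify a oneof, and the member names collide in Post's namespace:

```proto
message Post {
  oneof from {
    User user = 1;
    Page page = 2;
  }
  repeated oneof to {
    User user = 3;
    Page page = 4;
  }
}
```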
Now there are other workarounds that we can use, but they basically require us to deviate from the original spec. So when we have to serialize JSON data to protobuf, we have to do it manually (i.e. without using a built-in JSON-to-proto parser, e.g. in Python). This becomes particularly time-consuming as there are a lot of fields in each model and the Facebook API has dozens and dozens of models. Adding more custom code to handle serialization of these edge cases is possible, but I'd like to find a cleaner solution.
Is there a particular reason why it's not currently possible to have this feature? The docs mention it's not possible but I haven't figured out why.