Consider proto.Any / proto compatible message for registered types #267
Comments
As another alternative, we can tweak the prefix bytes generation to make it proto field bytes. The available range of field numbers is 1 to 536,870,911 (2^29 − 1, with 19000–19999 reserved).
Interesting!
There's also this issue: protobufjs/protobuf.js#435. I'm not very familiar with protobuf in general, so I'm not sure how to determine whether this is supported.
Looks like it has at least some support in JS: https://github.com/protobufjs/protobuf.js#using-custom-classes
@jordansexton I think proto.Any is well supported by now in JS implementations. I have not experimented with it a lot though, at least not in JS.

@mossid Thanks! If I understand you correctly, you are suggesting to encode registered messages like this:

    message SomeRegisteredType {
      bytes prefix = 1;   // prefix & potentially disfix bytes
      ActualMsg msg = 2;  // actual concrete struct
    }

    message ActualMsg { /* ... */ }

or even (?):

    message SomeRegisteredType {
      bytes prefix = 1;  // prefix & potentially disfix bytes
      bytes value = 2;   // proto encoding of actual concrete struct
    }

The main benefit / motivation for this instead of proto.Any would be keeping the compact prefix bytes rather than a string-based type URL.
Not exactly, but I think that will also work. Something like

    message MessageContainingInterface {
      oneof interf_message {
        RegisteredTypeA rta = 3856223;
        RegisteredTypeB rtb = 21998056;
      }
    }

is closer, where the field numbers are encoded into variable-length prefix bytes. @liamsi And yes, both alternatives will take a much smaller size in encoded form than storing the full type URL. One problem with this alternative is that the field number is limited to 29 bits, so we cannot add the disamb prefix when a collision happens. Using large field numbers by default and an explicit prefix field on collision seems like a good combination of the two methods.
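For illustration only (this is not go-amino code), here is a small self-contained Go sketch of how a protobuf field key is formed: the key is `(field_number << 3) | wire_type`, encoded as a base-128 varint, which is why large field numbers effectively act as variable-length prefix bytes and why they are capped at 2^29 − 1.

```go
package main

import "fmt"

// appendVarint encodes v using protobuf's base-128 varint encoding
// (7 bits per byte, most significant bit as continuation flag).
func appendVarint(b []byte, v uint64) []byte {
	for v >= 0x80 {
		b = append(b, byte(v)|0x80)
		v >>= 7
	}
	return append(b, byte(v))
}

// fieldKey returns the wire-format key bytes for a field number and wire type.
// Wire type 2 (length-delimited) is what embedded messages use.
func fieldKey(fieldNumber uint32, wireType uint8) []byte {
	return appendVarint(nil, uint64(fieldNumber)<<3|uint64(wireType))
}

func main() {
	// The field numbers from the oneof example above; each key encodes to
	// 4 varint bytes, playing the role of amino's 4 prefix bytes.
	fmt.Printf("% x\n", fieldKey(3856223, 2))
	fmt.Printf("% x\n", fieldKey(21998056, 2))
}
```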
In your message example definition above: why would prefix bytes be necessary at all? My understanding is that the oneof field number already identifies the concrete type.
After chatting with @mossid over lunch: he is indeed proposing to use the prefixes as the field number. Some things to consider: https://developers.google.com/protocol-buffers/docs/proto#assigning-field-numbers

I think collisions are less of a concern. Wouldn't we only need to prevent collisions per interface (oneof message)? These should be less likely than "global collisions". This approach would definitely work as well and could potentially save us some bytes on the wire. Thanks Joon!
I think the first better fits with the vision that apps and app devs can easily and autonomously add their own types. Any thoughts on this?
After discussing: we can define a new message type, similar to proto.Any but keyed by the amino prefix bytes instead of a type URL.
Speaking in terms of protobuf, this means introducing another wrapper type which is very similar to proto.Any. The wrapper would be a message like this:

    message BytePrefixedAny {
      // the amino prefix bytes (default will be 4 bytes, but this is not size limited and could also include disambiguation bytes)
      bytes amino_prefix = 1;
      // Must be a valid serialized protocol buffer of the above specified type.
      bytes value = 2;
    }

Instead of switching over the URL, users will need to switch over the prefix bytes; the prefix bytes will be computed via the name as usual in amino (see lines 747 to 761 in dc14acf). We can provide minimalistic helpers for other languages that deal with these cases.
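To illustrate the prefix computation referenced above, here is a simplified, standalone Go sketch of the scheme the README describes (sha256 of the registered name, skip leading zero bytes for 3 disambiguation bytes, skip zero bytes again for 4 prefix bytes). It deliberately ignores the edge cases the real go-amino code handles, and the helper names are made up for the example.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// skipZeros drops leading 0x00 bytes.
func skipZeros(bz []byte) []byte {
	for len(bz) > 0 && bz[0] == 0x00 {
		bz = bz[1:]
	}
	return bz
}

// nameToDisfix derives 3 disambiguation bytes and 4 prefix bytes from a
// registered type name, following the simplified scheme sketched above
// (the real amino implementation has additional constraints).
func nameToDisfix(name string) (disamb [3]byte, prefix [4]byte) {
	hash := sha256.Sum256([]byte(name))
	bz := skipZeros(hash[:])
	copy(disamb[:], bz[:3])
	bz = skipZeros(bz[3:])
	copy(prefix[:], bz[:4])
	return
}

func main() {
	disamb, prefix := nameToDisfix("tendermint/PubKeyEd25519")
	fmt.Printf("disamb: %X prefix: %X\n", disamb, prefix)
}
```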
Note: @mossid also suggested that the message could look like this (to make sure the prefix bytes are indeed just 4 bytes):

    message BytePrefixedAny {
      sfixed32 prefix = 1; // the amino prefix bytes
      sfixed64 disfix = 2; // disamb + prefix
      // Must be a valid serialized protocol buffer of the above specified type.
      bytes value = 3;
    }

Joon is thinking of ways to further optimize this and will post his thoughts later.
Found out that fixed32 and fixed64 are more efficient than varint only if the value is bigger than 3 bytes 4 bits and 7 bytes, respectively, which is not the case for the prefix/disfix here, as they take 3 bytes and 7 bytes. Using varint will be more efficient.

    message BytePrefixedAny {
      oneof prefix {
        uint32 prefix = 1;
        uint64 disfix = 2;
      }
      bytes value = 3;
    }

The encoder will:
The decoder will:
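As an illustration of those encoder/decoder steps, here is a minimal Go sketch of how such a wrapper could be written and read. It does not use go-amino's API: the registry map and the Encode/Decode helpers are hypothetical, and the wrapper's fields are written by hand with the protowire package rather than via generated code (the disfix branch of the oneof is omitted for brevity).

```go
package prefixany

import (
	"fmt"

	"google.golang.org/protobuf/encoding/protowire"
	"google.golang.org/protobuf/proto"
)

// registry maps a prefix (as uint32) to a constructor for the registered
// concrete type. Purely illustrative.
var registry = map[uint32]func() proto.Message{}

// Encode writes field 1 (prefix, varint) and field 3 (value, bytes) of the
// wrapper message sketched above, with the concrete message proto-encoded
// into value.
func Encode(prefix uint32, msg proto.Message) ([]byte, error) {
	value, err := proto.Marshal(msg)
	if err != nil {
		return nil, err
	}
	var buf []byte
	buf = protowire.AppendTag(buf, 1, protowire.VarintType)
	buf = protowire.AppendVarint(buf, uint64(prefix))
	buf = protowire.AppendTag(buf, 3, protowire.BytesType)
	buf = protowire.AppendBytes(buf, value)
	return buf, nil
}

// Decode reads the prefix, looks up the registered concrete type, and
// proto-decodes value into it.
func Decode(buf []byte) (proto.Message, error) {
	var prefix uint64
	var value []byte
	for len(buf) > 0 {
		num, typ, n := protowire.ConsumeTag(buf)
		if n < 0 {
			return nil, protowire.ParseError(n)
		}
		buf = buf[n:]
		switch num {
		case 1: // prefix
			prefix, n = protowire.ConsumeVarint(buf)
		case 3: // value
			value, n = protowire.ConsumeBytes(buf)
		default: // skip unknown fields, e.g. the disfix branch
			n = protowire.ConsumeFieldValue(num, typ, buf)
		}
		if n < 0 {
			return nil, protowire.ParseError(n)
		}
		buf = buf[n:]
	}
	newMsg, ok := registry[uint32(prefix)]
	if !ok {
		return nil, fmt.Errorf("unregistered prefix %X", prefix)
	}
	msg := newMsg()
	return msg, proto.Unmarshal(value, msg)
}
```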
I still vouch for the simplest form here, from #267 (comment):

    message BytePrefixedAny {
      // the amino prefix bytes (default will be 4 bytes, but this is not size limited and could also include disambiguation bytes)
      bytes amino_prefix = 1;
      // Must be a valid serialized protocol buffer of the above specified type.
      bytes value = 2;
    }

This won't waste any bytes either, and it is similar to the current simple prefixing.
It does though... there is an extra key and length (since amino_prefix and value are now two separate byte arrays), and the BytePrefixedAny itself is also length-prefixed, so there are 2 extra length-prefix bytes, as well as the overhead of the top-level BytePrefixedAny length prefixing. I have an idea on how to extend Amino so that we can use OneOf as well as support the existing prefix bytes. I think there is real merit to the existing design, and that it should be supported, as well as the ability to compress further via OneOf.
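A rough worked example of that overhead, assuming a 4-byte prefix and a value short enough for 1-byte length prefixes: the wrapper costs tag (1) + length (1) + 4 bytes for amino_prefix, plus tag (1) + length (1) + N bytes for value, i.e. N + 8 bytes versus N + 4 bytes for simply concatenating the prefix with the value; and when the wrapper is itself embedded as a field of an outer message, its own tag and length prefix add roughly 2 more bytes.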
That's absolutely right. What I meant was that it does not waste more bytes than necessary, e.g. compared to the separate encoding of pre- and disfix as described here: #267 (comment), or to using a string-based URL scheme (like in proto.Any).

Awesome! I would like to know more. Can you provide a simple protobuf message to explain that OneOf idea?
Is something along these lines still going to happen? Has a decision on the best approach been made? I remember hearing several months ago that full proto3 compatibility was just around the corner; it seems like it's still a WIP.
@aaronc We are still working to integrate it through the stack. We have also recently resourced this work again and we should see some movement here.
If I understood the original intention, it was to have byte compatibility and the ability to (dynamically) export protobuf messages that correspond to amino structs, so it is easy to build clients in multiple languages. Amino works great as long as all code using the data is in Go. And porting amino to multiple languages seems to be a slow process... but maybe it is finally coming to JavaScript via Jordan's work. It would be great to hear about the specific goal and design of the approach. And if there is someone resourced to work on it, maybe they can chime in.
@ethanfrey the latest work I could find on this is this PR: #276. I have a bunch of thoughts on the current amino design and what is proposed in the above PR that I intend to comment on soon. Also, I have heard that @jordansexton has made progress on exporting amino types into TypeScript, or at least has the TypeScript implementation working pretty well. Maybe he can comment.
The current progress related to this issue: in the Tendermint codebase, bez is aiming to solve tendermint/tendermint#4208; then we will start back up with testing throughout the stack and writing more tests. Soon after this we hope to cut all the necessary releases to get the changes into the SDK. If you have design recommendations, please post them here or in the PR soonish so we can address them and hopefully discuss them in detail if they differ from the current design. We started a migration doc for amino for the coming months; I hope to post that in this repo in the coming weeks.
https://github.com/cosmos/amino-js seems to be working decently (based on user reports) for Hub 2 / gaia 1.x. The Go code it contains (for compiling with GopherJS) was manually stripped from the SDK and Tendermint as a proof of concept. The TypeScript types it contains were derived from these, but are fairly incomplete (only some of the interfaces have fields, and not much is documented). I'm currently working on amino-js compatibility for Hub 3 / gaia 2, which involves generating complete Go code and TypeScript types from the SDK and Tendermint. Once this is done, I will backport it and generate for previous versions of the SDK. The changes to Amino to become proto3 compatible should not affect the public API of amino-js. Eventually, once we have .proto files for the registered types, amino-js will presumably become a wrapper around generated proto3 JS code, and GopherJS won't be used anymore.
I have been writing some quite lengthy thoughts. We are just wrapping up a team retreat here but I see this conversation as quite important and definitely intend to post something this week.
Could you possibly point to where you're working on this export code, @jordansexton?
Apologies for what turned out to be a very long post. I first want to say that I understand a lot of people have invested a lot of time in this project. I've met most of you in person and appreciate all of you, and I'm trying to approach this discussion from the standpoint of trying to enable a healthy decentralized network of interoperable protocols. The current design of amino, unfortunately, seems to be a big roadblock in the ecosystem. I see some of this discussion as a step in the right direction, but it's moving slowly and I'm not sure the things that need to get addressed are going to get addressed. I feel the need to speak up about it so that Regen Network and the other projects in the Cosmos ecosystem can have a solid foundational encoding layer that they can evolve upon.

I first want to address this 4 vs 7 prefix bytes issue. The current implementation uses 4 prefix bytes by default and 7 bytes (with 3 extra disambiguation bytes) when needed. Say we have version 1 of a blockchain where none of the types need the extra 3 disamb bytes. We can add new types, and as long as we don't do anything crazy like reordering fields in a struct (which is its own issue I'll address below), we can evolve our schema and add new types. Later versions of the blockchain will still be able to read the state stored by version 1 until we register enough types that disamb bytes are now needed for some types. At this point, the new version of the blockchain will no longer be able to unambiguously read state from the old version that in the new version needs disamb bytes. The old version was fine with 4 bytes and stored those structs as such. Now the new version demands 7 bytes and fails to read the old structs (https://github.com/tendermint/go-amino/blob/master/codec.go#L407). This tradeoff is unacceptable.

There is no infrastructure that I'm aware of that would alert anyone in testing as to whether the amino codec is going to switch from 4 to 7 bytes. None of this would be detected in any testing unless there was extensive testing of an upgrade from an earlier blockchain state that happened to come across this case. Likely this would show up later in production where all of a sudden some state is unreadable.

I would like everyone to take a moment and consider what is necessary for a healthy network of interoperable, decentralized blockchains. I see a lot of discussion here about trying to optimize message size. Is it worth it to save 3 bytes some of the time if this means that protocols can break in arbitrary ways with no warning? There are a lot of other things besides message size that are important for a healthy decentralized network. I urge everyone to please put aside personal investments in particular approaches for a moment. We are trying to build the future of the internet here.

Because there is no textual representation of a schema like a .proto file, there is no easy way to detect breaking changes. Here are some places where breaking backwards compatibility can cause issues in an IBC world:
The last one regarding smart contracts seems almost impossible to fix without more discipline upstream and could be super problematic.

Coming back to this issue, I see this as being a step in the right direction. Making amino protobuf compatible will make client-side development possible on many more platforms, and having a .proto file that can be extracted will make schema evolution possible, so that breaking changes can be prevented with prototool break check.

Potential Solutions

Just use google.protobuf.Any.
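For reference, a minimal Go sketch of the google.protobuf.Any route advocated here, using the well-known anypb type; wrapperspb.StringValue stands in for a registered concrete type, since packing and unpacking work the same way for any proto message.

```go
package main

import (
	"fmt"
	"log"

	"google.golang.org/protobuf/types/known/anypb"
	"google.golang.org/protobuf/types/known/wrapperspb"
)

func main() {
	// Pack: the Any records a type URL plus the proto encoding of the
	// concrete message in its value field.
	concrete := wrapperspb.String("hello")
	packed, err := anypb.New(concrete)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("type_url:", packed.TypeUrl)

	// Unpack: callers switch on the type URL (or use MessageIs) instead of
	// on amino prefix bytes.
	if packed.MessageIs(&wrapperspb.StringValue{}) {
		var out wrapperspb.StringValue
		if err := packed.UnmarshalTo(&out); err != nil {
			log.Fatal(err)
		}
		fmt.Println("value:", out.GetValue())
	}
}
```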
The design we ended up with in #276 was mainly because it was an explicit requirement to keep exactly the same public API go-amino currently exposes, together with the 4-byte prefix (the 7-byte version is never used anyway, as far as I know). As far as I remember, this was mainly due to not wanting to change all the places where amino is currently used (while at the same time wanting full protobuf compatibility). Oh, and I think keeping the messages as short as possible was also an explicit requirement... I currently don't work on amino anymore but wanted to give this as some context. I agree with you, TBH. I'm also in favour of just using proto.Any.

Thanks for the well-worded and thoughtful write-up! ❤️
Just wanted to raise awareness of proposal protocolbuffers/protobuf#302 (see just above this); still championing that until someone changes my mind about the justifications there.
Currently "registered types" are done via prefixing the (proto) encoded struct bytes with its "prefix bytes" as described in the Readme.
An alternative we could consider is to use proto.Any:
Basically, we would use the type URL to differentiate between the actual sent bytes.
Pros:
- a standard, well-specified protobuf type with support across protobuf implementations (including JS), which should make client development easier
Cons:
- the string type URL takes more space on the wire than the current 4 prefix bytes, and it is not byte-compatible with the existing prefixing
Update: The discussion (here and on Slack) evolved quite a bit. It could be worthwhile to write our own proto wrapper for any registered type and keep the prefix bytes instead of a string-based URL scheme (like proto.Any).
Here I summarize pros/cons of both approaches:
proto.Any
Pros:
- a standard protobuf type, supported by most protobuf implementations and tooling
- type URLs are self-describing, which eases client development in other languages
Cons:
- string type URLs take noticeably more space on the wire than 4 prefix bytes
- not byte-compatible with the current prefix-bytes encoding

our own proto message / BytePrefixedAny
Pros:
- keeps the compact prefix (and optional disambiguation) bytes and stays close to the current encoding
- minimal wire overhead compared to a URL string
Cons:
- a custom wrapper, so other languages need small helpers to resolve prefix bytes to types
- slightly more overhead than plain prefixing (extra field tags and length prefixes)
to be continued