Full Proto Encoding #5444
There is one concern with using protobufs for blockchain protocols. Protobuf supports arbitrary maps, but serialization of maps is not deterministic: it usually depends on the library and the language, and some libraries sort maps before serializing them. But AFAIK there is no specification or decision on this, and developers must not depend on sorting even if a library happens to provide it. gogo-proto serializes a map in iteration order, so it is almost guaranteed that two consecutive serializations will produce different results. It is easy to imagine all sorts of bugs if someone misuses this. Consider this example:

```proto
syntax = "proto3";

message Block {
  repeated Tx tx = 1;
}

message Tx {
  map<string, string> OpCodes = 1;
}
```

```go
package types

import (
	"crypto/ed25519"
	"strconv"
	"testing"
)

func TestNondeterministic(t *testing.T) {
	tx := Tx{
		OpCodes: map[string]string{},
	}
	block := Block{
		Tx: []*Tx{&tx},
	}
	// Fill the map; Go randomizes map iteration order, and gogo-proto
	// marshals map entries in iteration order.
	for i := 0; i < 100; i++ {
		ia := strconv.Itoa(i)
		tx.OpCodes[ia] = ia
	}
	// Two marshals of the same block can therefore produce different bytes.
	data1, _ := block.Marshal()
	data2, _ := block.Marshal()
	pubk, pk, _ := ed25519.GenerateKey(nil)
	sig := ed25519.Sign(pk, data1)
	if !ed25519.Verify(pubk, data2, sig) {
		t.Error("how come?")
	}
}
```

I know that the ethereum 2 client developers opted out of using protobuf for this reason, but the protocol requirements might be a bit different. I am not sure there is a safe workaround that will make protobuf less error-prone for these use cases. |
I thought initially that the nondeterminism issue is only present if maps are used, but it turns out there are more subtle cases: https://gist.github.com/kchristidis/39c8b310fd9da43d515c4394c3cd9510 |
Thanks for sharing @dshulyak. We are aware of these concerns and the gist you linked above. First of all, we are not intending to use maps at all, but yes, there are still potential issues. Here is my current take on what can be done with protobuf:
I would love it if someone else could comment on the relative pros and cons of the above approaches. If a canonical encoding is really important, I have been arguing that we should use Cap'n Proto, where this is defined in the spec. The downsides of doing this now are (a) less mature client support in general (some implementations are active, others less so), and (b) we would likely need to fork and improve the Go implementation (no JSON support, getters are hard to use). There are, however, other benefits like zero copy which I think are not completely negated by using a packed encoding. Anyway, if there were a really strong argument for going this route I could be convinced that the benefits outweigh the effort. For the time being, my current thinking is that we can make protobuf work with one of the above approaches, but I definitely welcome other input. |
We don't and will not use maps, so that's not even a concern. Essentially, I second what @aaronc stated. |
I was mainly worried that cosmos module (non-core) developers would be able to accidentally use encoding features without fully understanding what they can lead to. |
So I actually think even maps wouldn't be an issue if we figure out the correct signature scheme; sorting map keys canonically would address this. We are currently leaning toward approach (2), FYI. |
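For illustration, here is a minimal sketch (not an agreed-upon scheme) of what sorting map keys canonically could look like in Go, reusing the `Tx` example from above; the `KV` type is a hypothetical stand-in for a repeated key/value field:

```go
package types

import "sort"

// KV is a hypothetical deterministic replacement for a proto map entry.
type KV struct {
	Key, Value string
}

// canonicalOpCodes flattens a map into key-sorted KV pairs, so that a
// repeated KV field can be marshaled deterministically instead of a
// map<string, string> whose iteration order is random.
func canonicalOpCodes(opCodes map[string]string) []KV {
	keys := make([]string, 0, len(opCodes))
	for k := range opCodes {
		keys = append(keys, k)
	}
	sort.Strings(keys) // canonical, deterministic order
	pairs := make([]KV, 0, len(keys))
	for _, k := range keys {
		pairs = append(pairs, KV{Key: k, Value: opCodes[k]})
	}
	return pairs
}
```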
Cool stuff! Why do you say making go-amino proto compatible doesn't solve any problems? It doesn't solve performance, but doesn't it solve cross-client/language support (everyone else can use protobuf libs)?

My main concern with a change like this is the impact on the codebase, in terms of large PRs and API changes. There also does seem to be a desire to consider experiments with other encoding formats, and so I would hope this work will proceed in a direction that could facilitate that (so if someone later wanted to try Cap'n Proto, it would be easier than it is today). I think this means preserving something like a codec interface.

In general, gogo-proto seems like a fine direction. It's what we use in Tendermint for the ABCI, and many popular projects use it. But in Tendermint we don't use the proto-generated types for business logic directly, only for communicating over the ABCI wire. We translate back and forth between the generated types and our Go types. While gogo-proto does have support for custom types, which in theory solves this, I recall this being a source of headaches when I experimented with it. I've found the proto-generated types much less ergonomic, and I would consider depending on the output of their generators for your everyday types to be a significant risk. So I would be a bit wary, and would probably recommend having distinct types for serialization and business logic, shuffling back and forth as necessary. Granted, shuffling back and forth between proto and native types can be a source of errors, but it can mitigate the dependency risk, and it can be very helpful for enforcing boundaries in the code and preserving APIs into the future in case there are changes, so it might be a net good thing.

As for determinism, where can I learn about the gogo-proto deterministic mode? In theory the SDK was always supposed to come with some level of static analysis tooling, since you can already shoot yourself in the foot by doing things like using maps, iteration over which is randomized by Go. Users shouldn't use maps, and we should someday have tooling to check for it, both in proto types and in Go types.

As for unknown fields, I would expect this to be covered by a strict protocol versioning policy, such that where an unknown field would cause a problem (i.e. it's expected to be in a hash), there would be a protocol version bump, so peers would already know they're incompatible. In any case, championing deterministic protobuf for use in blockchains and pushing upstream to gogo-proto seem like valiant causes.

As for CanonicalJSON, seems right, and I think we basically already do that. But note for instance in Tendermint we don't and won't use JSON for validator signing because the performance impact is too large, there are no humans in the loop, and it's much more painful to verify in the EVM. |
AFAIR, gogo-proto's deterministic mode only deals with maps. Seems like nowadays this is achievable via the marshalto plugin and the `stable_marshaler` option. |
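As a rough sketch of what that deterministic mode looks like from the Go side (assuming a gogo-generated `proto.Message` such as the `Block` type above), the `proto.Buffer` API exposes a deterministic flag that sorts map keys during marshaling:

```go
package types

import "github.com/gogo/protobuf/proto"

// marshalDeterministic marshals msg with the deterministic flag set, which
// sorts map keys. Note this only promises stable output within a single
// library version; it is not a canonical cross-implementation encoding.
func marshalDeterministic(msg proto.Message) ([]byte, error) {
	var buf proto.Buffer
	buf.SetDeterministic(true)
	if err := buf.Marshal(msg); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}
```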
Is this in the context of the necessary changes being made to amino, like @liamsi's PR? Do we not see performance as a primary motivating factor? Overall, I'm simply not optimistic about major features like this being upstreamed to amino any time soon, not to mention the lack of domain knowledge and the overall set of eyes on it.
Mine as well -- it's a very big risk. But all technical decisions come with tradeoffs. The question is, is this decision worth those tradeoffs?
What would this codec interface look like? Would it be unique per module's needs?
Can you shed some light on why this is a big risk and why you recommend doing this? What issues did you come across that would warrant this? As for determinism, protobuf provides us this already, no? The only thing that needs to be guaranteed deterministic outside of this context is the tx bytes over which you sign, and we're using canonical JSON for this. |
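As a minimal sketch of the canonical JSON idea (the exact sign-bytes convention is not shown here), one known trick in Go is to round-trip through a map, since `encoding/json` emits map keys in sorted order:

```go
package types

import "encoding/json"

// canonicalJSON re-encodes v so that object keys come out sorted at every
// nesting level, giving deterministic sign bytes. Assumes v marshals to a
// JSON object; numbers round-trip through float64, so this sketch also
// assumes large amounts are encoded as strings.
func canonicalJSON(v interface{}) ([]byte, error) {
	bz, err := json.Marshal(v)
	if err != nil {
		return nil, err
	}
	var tmp map[string]interface{}
	if err := json.Unmarshal(bz, &tmp); err != nil {
		return nil, err
	}
	return json.Marshal(tmp) // map keys are emitted in sorted order
}
```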
Yes
Sure.
Fair enough!
I think the change can be done iteratively and that doing so will probably be worth it.
I expect it would just have Marshal/Unmarshal methods, maybe one for Binary and one for JSON.
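Something like the following, perhaps (a sketch only; the names are assumptions, not a settled API):

```go
package codec

// Codec abstracts over the concrete encoding so that amino, gogo-proto, or
// another scheme could be swapped in behind the same interface.
type Codec interface {
	MarshalBinary(o interface{}) ([]byte, error)
	UnmarshalBinary(bz []byte, ptr interface{}) error
	MarshalJSON(o interface{}) ([]byte, error)
	UnmarshalJSON(bz []byte, ptr interface{}) error
}
```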
I don't remember the exact issues now, but I do remember it being very painful. Looking at the generated ABCI types, they're riddled with garbage fields, for starters. Maybe we're doing something wrong, but that's kind of the point: you become very dependent on the particulars of gogo-proto to get things working the way you want, and that's an awfully tight coupling that is almost guaranteed to cause problems, especially for a large project that is itself a dependency of others. Perhaps this can be mitigated somewhat by looking at how other similar projects are using/doing things, but I think that investigation needs to happen first to derisk the effort. And regardless, I think having separate types for use in the code and use in serialization is still a net positive and good code hygiene. It also may enable these changes to proceed more incrementally.
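To make the "separate types" recommendation concrete, here is a minimal hypothetical sketch (the `pbCoin` wire type and field layout are assumptions, standing in for a generated struct):

```go
package types

import "errors"

// pbCoin stands in for the gogo-proto generated wire type.
type pbCoin struct {
	Denom  string
	Amount int64
}

// Coin is the native type used by business logic.
type Coin struct {
	Denom  string
	Amount int64
}

// ToProto shuffles the native type into its wire representation.
func (c Coin) ToProto() *pbCoin {
	return &pbCoin{Denom: c.Denom, Amount: c.Amount}
}

// CoinFromProto shuffles back, validating at the serialization boundary.
func CoinFromProto(p *pbCoin) (Coin, error) {
	if p == nil {
		return Coin{}, errors.New("nil proto coin")
	}
	return Coin{Denom: p.Denom, Amount: p.Amount}, nil
}
```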
So AFAIK there are three sources of serialization in the SDK:
The first one is the most likely to be handled by code in many languages, and so any differences in proto impls may cause certain clients to fail. Just flagging it, but as far as I understood it should work properly ... In sum, my main recommendation is that work be done to ensure this change can proceed as incrementally as possible. This will improve the structure of the SDK codebase, make the work more manageable, and keep us flexible to experiment with other encoding schemes down the road |
Thanks @ebuchman for chiming in here! I want to say that I suspect the issue with gogo proto generated structs is mostly related to configuration. By default there are all these `XXX_*` fields, but they can be turned off with gogo proto options.

I agree it's possible to create an interface that looks sort of like amino, and that will actually make the refactoring quicker with less code change. I will give it a try in #5478.

I do think, however, that it's unrealistic to think that we can easily swap some other encoding scheme in or out. I spent some time exploring Cap'n Proto in #5438 and will maybe comment more on it in a separate post. Basically, there's going to be some overhead to whatever encoding scheme you choose, and it's good to choose wisely. Maybe there are some levels where things are hot-swappable, i.e. we can support json, yaml, xml, etc. representations on some level. But what we are discussing here is a high-performance binary encoding, and that comes at a relatively high cost.

This reminds me a bit of the phase of ORMs when people wanted to do everything they could to be compatible with all the SQL engines, only to later find out that vendor lock-in was 1) almost impossible to avoid and 2) not always a bad thing, because embracing some of the vendor's custom stuff sometimes brought a lot of value. To me, actually getting into some of these encoding decisions in practice feels more akin to marriage: you want to go in with eyes wide open and embrace what you're getting. There are subtle things at many levels of the system (how oneofs behave, determinism, etc.). Maybe it is worth discussing whether Cap'n Proto really would be the right solution, or something else. But having worked with these encodings for the past couple weeks, I think we're best off making the choice consciously and investing in the ecosystem of whatever we choose. With gogo proto, there is the issue of generating switch statements to convert `oneof` fields to and from interface types. |
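For context, the switch statements in question look roughly like this (hypothetical types, hand-written to mimic the wrapper structs gogo-proto generates for a `oneof`):

```go
package types

import "fmt"

// Msg is the native interface the business logic works with.
type Msg interface {
	Route() string
}

// MsgSend is one concrete message implementing Msg.
type MsgSend struct{}

func (m *MsgSend) Route() string { return "bank" }

// The following mimics gogo-proto's generated oneof wrapper types.
type isTxBody_Sum interface{ isTxBody_Sum() }

type TxBody struct {
	Sum isTxBody_Sum // the oneof is the only field member
}

type TxBody_Send struct{ Send *MsgSend }

func (*TxBody_Send) isTxBody_Sum() {}

// GetMsg is the hand-written switch converting the oneof back to the Msg
// interface; one case is needed per concrete message type.
func (t *TxBody) GetMsg() (Msg, error) {
	switch sum := t.Sum.(type) {
	case *TxBody_Send:
		return sum.Send, nil
	default:
		return nil, fmt.Errorf("unrecognized message type: %T", sum)
	}
}
```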
Based on other discussions that are happening, I want to share some additional thoughts that may help make this an incremental transition. We can separate where encoding is needed into client-facing and store-facing. These can actually be different things and evolve differently.

Client Encoding

Currently Cosmos supports amino binary and JSON encoding, and ripping these out would break all existing client integrations. With careful design we can likely maintain support for a client-facing amino API and add support for a client-facing protobuf API. These can likely live side by side. Maybe the existing amino API eventually becomes legacy but is still useful for the time being for existing wallets, block explorers, etc. Even if these tools were mostly using the amino JSON (for both txs and queries) and not the binary format, changing the JSON still does break things. So how about we add a client-facing protobuf API without breaking the amino API? We could also add a Cap'n Proto API. There are ways to make these things live side by side (for instance, we can prefix protobuf txs with some magic bytes). We could even support different signature formats with flags. For instance, we could have flags to support all 3 of the protobuf signing formats I outlined above: deterministic binary, canonical JSON, and raw bytes.

Store Encoding

For a given module, only a single store encoding scheme can be supported, but different modules within an app can actually use different encoding schemes. Some could use protobuf, others could use Cap'n Proto, for example. I think protobuf is actually a pretty good choice for this level because it's relatively easy to ensure all implementations encode deterministically, and protobuf provides tools for maintaining backwards compatibility. I want to re-emphasize that this is also a pretty important thing, because changing the block structure in Tendermint makes it hard to upgrade smoothly (tendermint/tendermint#4078). Having proto files would allow us to flag deprecated fields as `[deprecated = true]`.

Just a footnote here... let's all keep in mind this is all dependent on how things go in practice. Certain things are nice in theory and don't play out so well in practice for various reasons. @alexanderbez and I have been trying things in practice and are learning about the gotchas. As I noted in my previous comment, some of the consequences of amino or proto in different places are larger than one might expect, and we'll only know once we dig into the code... I would propose we give this incremental approach I'm presenting a try and see how it actually works in practice... |
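A tiny sketch of the magic-bytes routing idea (the prefix value and the decoder helpers here are assumptions for illustration, not decided constants):

```go
package client

import (
	"bytes"
	"errors"
)

// Tx stands in for the SDK's transaction type.
type Tx interface{}

// protoTxPrefix is a hypothetical magic-byte marker for protobuf txs.
var protoTxPrefix = []byte{0xCE, 0xB5}

// decodeTx routes protobuf-prefixed txs to the new decoder and everything
// else to the legacy amino decoder, so existing clients keep working.
func decodeTx(bz []byte) (Tx, error) {
	if bytes.HasPrefix(bz, protoTxPrefix) {
		return decodeProtoTx(bz[len(protoTxPrefix):])
	}
	return decodeAminoTx(bz)
}

// Stubs standing in for the real decoders.
func decodeProtoTx(bz []byte) (Tx, error) { return nil, errors.New("not implemented") }
func decodeAminoTx(bz []byte) (Tx, error) { return nil, errors.New("not implemented") }
```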
Thanks Aaron, that's helpful.
We probably shouldn't do this; it seems like unnecessary complexity, especially if it means we need to support all of these on the hardware signers! We should probably stick with the JSON-based signing we're doing now. The cosmos ledger app is awesome (thanks Juan! https://twitter.com/JTremback/status/1148756902370664448) |
Hello, everyone. We (the CoinEx Chain Team) happen to be developing a serialization library named codon (https://github.com/coinexchain/codon) to replace go-amino. It is proto3-compatible and fast (it uses code generation instead of reflection). Besides, it keeps (almost) the same API as go-amino, so the interference with Cosmos-SDK is minimized. If you want to integrate it into Cosmos-SDK, we'd be happy to help. |
Hey @wangkui0508, pretty neat! I took a look at the code and I don't really see any specification or anything like that. I know this is a direction @jaekwon wanted to take, so I would definitely recommend he take a look at it and work together with you. Can it generate proto messages/schemas? Note performance is not the only reason why we're looking to migrate. We need rich multi-language and cross-client support. That being said, I think going with protobuf encoding is still the best strategy forward atm. |
I am writing documents for it. Please wait another 12 hours. Before that, maybe you are interested in how it can be integrated into Cosmos-SDK. Here is a branch using it: https://github.com/coinexchain/cosmos-sdk/tree/use_codon It can generate proto schemas. But there are some bugs. Maybe this weekend I can fix these bugs. |
I really don't feel comfortable with this approach, as I'd rather use something more battle-tested and with richer client support. Happy to have others' opinions on this though 👍 |
I cannot fully understand why you are uncomfortable. codon uses protobuf3 encoding, and it generates .proto files so you can use any battle-tested, richer protobuf implementation in any programming language. Here is an example .proto file generated by codon: If you do not like the codec source code generated by codon, you can use gogo-proto to generate it from this types.proto file. At least codon avoids manually writing .proto files. I am still working on the document. Here is a rough introduction: https://github.com/coinexchain/codon/blob/master/README.md |
@wangkui0508 codon has merit and the work done on it is great! I just have a hard time visualizing what it really buys us. We're using gogo-proto to write our proto files, and we're doing so intentionally to utilize the rich set of features it provides us, so unless codon can hook into gogo-proto somehow, what does it give us atm? I would also argue that we should take great care in writing the proto messages manually and not have the process automated, as that may increase the surface area of bugs as developers are lazy. |
Well, it is a matter of different trade-offs. It is OK to manually manage the proto messages and use the .pb.go files generated by gogo-proto when writing modules. If go-amino had not been created and Cosmos-SDK had used gogo-proto for serialization from the beginning, there would be no argument now, because this method CAN WORK. But go-amino was created, and Cosmos-SDK used it in many places. Later projects based on Cosmos-SDK, such as CoinEx Chain, also used go-amino.

Go-amino does have advantages: simplicity and ease of use. You just register plain Go structs and interfaces and do not care much about serialization, because the codec takes care of it. When there is a lot of code based on go-amino's API, migration will be a huge pain. In CoinEx Chain we developed a lot of modules. We would have to rewrite a lot of files to migrate from go-amino to gogo-proto, and without go-amino's simplicity, we would have to write more lines of code than before. So, when we became aware of the performance and compatibility problems of go-amino, we decided to enhance it to solve them instead of abandoning it. Thus we can avoid the effort of rewriting code and continue to enjoy the simplicity of go-amino.

Of course your team has the right to decide what's the best path for Cosmos-SDK. But since Cosmos-SDK wants to be the cornerstone for many different blockchains, I think other blockchains' opinions are also important. We DO want to use our existing modules. So if Cosmos-SDK can be backward-compatible and not introduce breaking changes, we would be very thankful. And according to our practice, the rich set of features of gogo-proto is not needed for blockchain, and many of those features are golang-specific and harmful to cross-language interoperation. In a word, we think Cosmos-SDK is a great platform and good enough now. Thank you very much for developing such a good SDK. Please make it stable (to a reasonable degree) and do not make breaking changes. |
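For readers unfamiliar with the go-amino registration flow described above, it looks roughly like this (a sketch using go-amino's public API; the message type and names are made up):

```go
package app

import amino "github.com/tendermint/go-amino"

// Msg is an interface whose concrete types the codec must know about.
type Msg interface{ Route() string }

// MsgSend is a plain Go struct; no generated code is required.
type MsgSend struct{ From, To string }

func (m MsgSend) Route() string { return "bank" }

func makeCodec() *amino.Codec {
	cdc := amino.NewCodec()
	// Register the interface and each concrete implementation by name;
	// the codec then handles (de)serialization transparently.
	cdc.RegisterInterface((*Msg)(nil), nil)
	cdc.RegisterConcrete(MsgSend{}, "app/MsgSend", nil)
	return cdc
}
```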
Thanks for sharing @wangkui0508. First of all, codon looks really interesting and I applaud what you guys have come up with. I share your concerns, being a zone author building on top of the Cosmos SDK as well.

I first want to address the backwards compatibility concerns. The approach we are currently taking would preserve an interface that is compatible with amino and would preserve essentially the same structs that currently exist in the SDK. These structs will, however, be enhanced with protobuf marshal methods via gogo protobuf. We have been working towards a path that allows this smoothly. With this approach, the hope is that amino or protobuf encoding could be chosen for each module. So I believe this would mean that you could still use codon even with the gogo protobuf code added. We have also been discussing maintaining the existing APIs that use the amino JSON format for backwards compatibility. Would these address your immediate concerns? My hope long term is that an approach with

Now regarding the difference in approach between using an amino-compatible interface vs codon files, I think there are a few trade-offs. I do think your approach of generating protobuf marshal methods for amino types is very interesting and solves a lot of problems. Here are what I see as the advantages of a hand-written |
Anyway, hopefully this helps address some of your concerns and also explains some of our thinking. |
Thank you very much for the explanation @aaronc. It would be smooth to allow amino or protobuf encoding to be chosen for each module. I do hope the interface of codec.Codec will not be changed. So many functions depend on it, such as

I agree with you that codon and gogo-proto are effectively different approaches to the same encoding (proto3). So it is natural to allow module authors to choose the suitable solution for them. Simplicity or cross-chain awareness: different trade-offs. When the scheme allowing .proto files to reference each other is finally fixed, we will try to tweak codon to make it compatible. |
Thanks @aaronc and @wangkui0508 for sharing your perspectives here. We certainly are keeping backward compatibility and current applications built on the SDK in mind! That being said, don't worry @wangkui0508, the amino interface will not change. In fact, we are going for a more flexible solution in that modules and client logic will work by using an interface instead of a concrete codec type. |
- ref #5444 Signed-off-by: Marko Baricevic <[email protected]>
added ibc and capability modules for reference |
Are there any experiments on
To my knowledge, for other blockchains such as Eth, the major bottleneck is disk IO rather than the CPU cost of encoding/decoding. My concern is that without experiments with concrete profiling numbers, the gain from using gogo-proto in practice is not clear. |
@qizhou you can take a look at #4977 for some reference. In the context of the SDK, the bottleneck was Amino, and then DB IO. Amino is an extreme performance hit; we've seen 100x improvements in simulation speeds by switching to proto. After the migration to proto, yes, then the bottleneck becomes network and DB IO. |
Thanks for the info. I took a look at the benchmark. First of all, this is really good work and helps to identify the real problem by replaying mainnet blocks. However, my main concern is that replaying the current mainnet blocks may not reflect the actual bottleneck when the network is heavily used, as ETH is: the current mainnet blocks contain very few transactions per block, and thus I believe the major cost is from BeginBlock/EndBlock/CommitBlock rather than DeliverTx. So for the optimizations, my questions are:
Actually, we want to start some experiments on 1, but I am not sure if the Cosmos team has done that and whether there are any results/tools that we could re-use (like the one in #4977). Thanks. |
Great questions, but unfortunately I'm not aware of any true throughput measurements against the Gaia state machine. I would personally love to see this. I can, however, state that regardless of the ABCI method (e.g. DeliverTx vs BeginBlock), the bottleneck on the state-machine side will be Amino (e.g. getting all validators and decoding them). The low-hanging fruit on the state machine is to really improve encoding performance and then tweak various design aspects of modules (e.g. x/slashing). |
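For anyone wanting to gather such numbers, a benchmark along these lines is a starting point (a sketch reusing the `Block`/`Tx` types from the first example in this thread, not the #4977 harness):

```go
package types

import "testing"

// BenchmarkBlockProtoMarshal measures gogo-proto marshaling throughput;
// running an equivalent benchmark against an amino codec gives the
// comparison discussed above.
func BenchmarkBlockProtoMarshal(b *testing.B) {
	block := Block{Tx: []*Tx{{OpCodes: map[string]string{"op": "val"}}}}
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		if _, err := block.Marshal(); err != nil {
			b.Fatal(err)
		}
	}
}
```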
Summary
Encoding in the SDK is currently handled by the go-amino wire codec. The implementation and use of this encoding protocol has introduced several major concerns over time, with the two biggest being performance and cross-client/language support.
The context and discussions around amino are dense, so the above is solely meant to be a high-level primer.
Problem Definition
In order to facilitate easier integration for clients and drastically improve the usability of the SDK and its performance, alternative encoding methods need to be explored.
Proposal
As it stands, the proposals are as follows:

1. Make go-amino wire-compatible with protobuf.
   a. This is probably the path of least resistance, but it doesn't actually solve any of our problems directly and seems like a fragile attempt that will require handling corner cases, increasing the surface area for bugs and poor UX support.
2. Migrate to protobuf via gogo-proto.
   a. This seems to be the most viable and promising approach. It will address both performance and client UX concerns.
3. Migrate to Cap'n Proto.
   a. Similar to proposal (2) but with zero-copy and canonicalization features. However, there is some concern about "vendor lock-in" in the sense of not having as rich multi-client support as pure proto. AFAIK the encoding uses fixed-width integers, so this approach could lead to a larger encoded size. Compaction is available, but using it negates the zero-copy feature.
4. Something else entirely.
   a. Most likely too late for this (atm). It would require heavy research and pretty compelling arguments to steer in this direction. In addition, it would probably take drastically more time than any other proposal.
Accepted Proposal
Proposal (2) will be adopted. However, @jaekwon will continue to develop go-amino with the goal of it being fully wire-compatible with this proposal. The go-amino proposal is tracked in tendermint/go-amino#302.
What does this mean?

- A `oneof` implementation where the `oneof` is the only field member and the sum fields must start at 1.
- Package names of the form `cosmos_sdk.{module}.v1`.
- Use of `oneof` to handle messages.

Roadmap & Actionables
- State
- Client
- TX CLI/REST
- Tutorials
- Documentation & Upgrade Guide
- Client libs
@jordansexton @aaronc