-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
EPIC: determinism with maps #13039
Comments
I would strongly advocate for a generic ordered map so we don't suffer the performance hit of sorting maps all over the place. We just need to make sure we choose an implementation which is well tested - might make sense to bring something into our source tree even. |
I'd personally prefer to not have an added object layer. It adds complexity and makes things less portable. In my opinion, a better solution would be a linter that complains/fails when |
The big downside is you always have to do sorting at runtime and pay that performance cost. Performance has been a big issue for a while and if we want a sorted data structure, we should just use one. |
If you've got a map that is primarily used in loops, then I agree, a sorted data structure is called for. But if you only ever loop on it once or in sporadic instances (where its primary purpose is lookup), just sort the keys when you want to loop. It'd be annoying if all maps were changed to a non-native data structure just in the off chance that they're needed in a loop. |
uhh, I promise from live benchmarks, that sorting in application data OR map insertions is not showing up in any performance problems right now. I really think that its misguided to focus on performance here The only place that sorting shows as taking forever is in the CacheKV store, which is a different problem where the entire design needs to be fixed. I don;t really care about the decision of either a generics based BTree, or a Hashmap wrapper that sorts for iteration / has deterministic serialization (cref #12816 ). If the design is done right, this should really not be state machine breaking to change later, as they can have the same serialization/deserialization from proto. (Up to gas parameters, which we can just make the same) |
It should fail. It already complains, but this apparently isn't loud enough.
This is the best design to me as it's barely a design at all. If we see performance problems with map iteration outside of store/cachekv we can revisit.
A recent performance fix #12885 improved cache sorting significantly, but from what I've seen the current cache impl could be transitioned to a single BTree backed map instead of separating sorted/unsorted. This is smaller scope than the entire rewrite talked about in #12986. What do you think?
Serialization seems orthogonal to the problem at hand, am I missing something? If we always sort keys prior to iterating the form of the serialized data is irrelevant, provided we're OK with the runtime sorting performance hit. |
From what I'm gathering and my two cents, it sounds like using generics over ranges is the most straightforward approach. BTrees being a second alternative, but might be overkill for most situations. |
The Go
https://go.dev/ref/spec#RangeClause Code that needs deterministic iteration order of key-value associative containers can't use the If you want a map with ordered keys, you have basically two options. As suggested, you can use a separate type (like a BTree-based container) locally, and perform conversions to and from Go maps at API boundaries. But this is arduous, error-prone, and the impact of allocations on the GC can easily become significant. The better option is almost always to use the regular func print(m map[string]int) {
keys := make([]string, 0, len(m))
for k := range m { keys = append(keys, k) }
sort.Slice(keys, func(i, j int) bool { return keys[i] < keys[j] })
for _, k := range keys {
v := m[k]
// ...
}
} Generics can make this easier, for sure.
@ValarDragon is correct in observing that the performance cost of this O(n) range clause is almost certainly statistically zero relative to other performance inefficiencies in Cosmos, and not something worth designing around. |
Anther potential source of problem is when we return a map in a query. This must be prohibited as well. |
We don't need to prohibit that per se, since we're going with the annotation approach -- queries that are deemed safe for state machine use, will be annotated as such. |
I would prefer to make such prohibition for the sake of completeness. |
That's just not practical. It's not going to happen. |
So myself and team at Orijtech spent time and we did implement in 2022 a gosec static analyzer https://github.com/cosmos/gosec that would run on all cosmos-sdk code and it helped identify a bunch of map iterations which we sent in various PRs like #13554 but that got closed without merged. However the pass was later deemed to be noisy and disabled per cosmos/gosec@bf28a33 It could be re-enabled if folks see fit or more work done on the pass, but the tools do exist and as we saw it might have been nice to discuss but in reality doesn't seem like a priority that was cared about. Should we be closing this issue otherwise? |
Summary
As mentioned in CosmWasm/wasmd#941 non-determinism in maps is a concern for users of cosmwasm and others. There are places the Cosmos-SDK uses maps in many places we should aim to replace this with a deterministic sort with maps or something like a btree.
Problem Definition
Maps are non-deterministic if handled incorrectly. Cosmos-sdk has a few places where maps could end in non-determinism
Proposal
Replace maps with a deterministic ordering data structure or sort for maps where needed.
cc @ValarDragon @ethanfrey
The text was updated successfully, but these errors were encountered: