Optimize CBOR representation of data in Cadence storage layer #738
I'd like to provide a PR for this unless there are any concerns or objections.
This looks great, thank you for writing this up Faye, and thanks for the great call discussing it. We agreed to:
Here are the next steps:
I updated this issue's text and benchmark comparisons to include more recent info.
For reference, dapperlabs/flow-go#5422 will implement the state migration function.
@turbolent @fxamacker, is this epic good to close?
@studio1vc yes, all issues are done now 🎉
Issue To Be Solved
The Cadence storage layer's use of CBOR can be optimized to improve speed, memory use, and stored data size.
Suggestion
Performance gains are possible by tweaking the CBOR representation of Cadence `Value` types, for example by replacing CBOR maps with CBOR arrays when feasible. In addition to performance gains, replacing a CBOR map with a CBOR array preserves order (e.g. `DictionaryValue.Entries`).

UPDATE: Additional performance gains (especially for decoding) are possible by removing backwards-compatible decoding (thanks @turbolent) and not saving key strings twice (thanks @SupunS).
No changes to fxamacker/cbor are required to support these optimizations.
The current CBOR representation in Cadence is very nice — so this optimization is kind of like taking a nicely normalized RDBMS schema and denormalizing some of it to gain speed.
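To make the trade concrete, here is a minimal standalone sketch (not the actual Cadence encoder) using fxamacker/cbor's `toarray` struct tag to encode the same fields as a CBOR array instead of a CBOR map. The `dictAsMap`/`dictAsArray` types and their fields are hypothetical stand-ins for a `Value` encoding:

```go
package main

import (
	"fmt"

	"github.com/fxamacker/cbor/v2"
)

// Map-based shape: every field name is encoded as a CBOR text-string key,
// and CBOR map key order is not significant.
type dictAsMap struct {
	Keys    []string `cbor:"keys"`
	Entries []string `cbor:"entries"`
}

// Array-based shape: the special `toarray` tag makes fxamacker/cbor encode
// the struct as a fixed-order CBOR array, so the key strings disappear from
// the output and element order is inherently preserved.
type dictAsArray struct {
	_       struct{} `cbor:",toarray"`
	Keys    []string
	Entries []string
}

func main() {
	m, _ := cbor.Marshal(dictAsMap{Keys: []string{"a", "b"}, Entries: []string{"1", "2"}})
	a, _ := cbor.Marshal(dictAsArray{Keys: []string{"a", "b"}, Entries: []string{"1", "2"}})
	fmt.Printf("map:   %d bytes (%x)\n", len(m), m)
	fmt.Printf("array: %d bytes (%x)\n", len(a), a)
}
```

The array form drops every field-name string from the output and fixes element order by position, which is where both the size and speed savings come from.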
Performance Comparisons
A preliminary comparison (from PR #752) with non-production data showed potential performance improvements. Although faster execution speed was the primary objective, memory use and stored data size also improved.
Comparisons using production data (in PR #761) show the performance gains are cumulative. On a very large value (776,155 bytes), optimization of composite + dictionary types showed gains across encoding and decoding. Results were compared with `benchstat`, which reports time deltas, so "50% faster" here means a 2x speedup (the operation takes half the time).

Additional performance gains are possible by optimizing the remaining `Value` types. Smaller data will probably begin to catch up in percent improved as more of the remaining types are optimized, while larger data with big composites and dictionaries will show more modest gains from the remaining types. Cumulative performance gains from PR #752, PR #761, PR #778, and PR #788 confirmed expectations about small data catching up in performance gains.
For a 167-byte LinkValue (extracted from a mainnet snapshot):
For a 776,155-byte CompositeValue (extracted from a mainnet snapshot):
Additional performance gains are possible by:
Preliminary Comparisons (non-production dictionary data)
Stored CBOR Data Size Comparisons (for dictionaries, non-production data)
Size reduction is data dependent. Comparisons using production data are recommended.
Encoding Speed & Memory Comparisons (for dictionaries, non-production data)
Decoding Speed & Memory Comparisons (for dictionaries, non-production data)
Benchmarks used linux_amd64 with Go 1.15.10. See Caveats.
Performance of encoding and decoding is data dependent. Comparisons using production data are recommended.
Comparisons using production data (cumulative gains from PR #752, PR #761, PR #778, and PR #788):
LinkValue 🚀
Encoding Comparisons
Decoding Comparisons
Benchmarks used Go 1.15.10 on linux_amd64 using production data. See Caveats.
CompositeValue without dictionaries 🚀
Encoding Comparisons
Decoding Comparisons
Benchmarks used Go 1.15.10 on linux_amd64 using production data. See Caveats.
Large CompositeValue with dictionaries, etc. 🚀
Encoding Comparisons
Decoding Comparisons
Benchmarks used Go 1.15.10 on linux_amd64 using production data. See Caveats.
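For anyone reproducing these comparisons, a typical `benchstat` workflow looks like the sketch below; the package path, branch name, and file names are illustrative rather than taken from the PRs:

```sh
# Collect >=10 samples per benchmark so benchstat can compute statistics.
git checkout master
go test -bench=. -benchmem -count=10 ./runtime/interpreter/ > old.txt
git checkout optimize-cbor   # hypothetical branch with the optimizations
go test -bench=. -benchmem -count=10 ./runtime/interpreter/ > new.txt
benchstat old.txt new.txt    # reports deltas in time/op, B/op, allocs/op
```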
Proposed Changes
Encoding and decoding CBOR arrays is faster than encoding and decoding CBOR maps, so the proposed change replaces CBOR maps with CBOR arrays wherever feasible.
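As a rough way to verify that claim on your own data, here is a minimal micro-benchmark sketch (hypothetical, not from the PRs) round-tripping equivalent payloads through a CBOR map and a CBOR array with fxamacker/cbor/v2; run it with `go test -bench=CBOR -benchmem`:

```go
package cborshape

import (
	"testing"

	"github.com/fxamacker/cbor/v2"
)

// Equivalent payloads: the map repeats key strings in the output; the array
// carries the same strings positionally.
var mapPayload = map[string]string{"key1": "value1", "key2": "value2"}
var arrayPayload = []string{"key1", "value1", "key2", "value2"}

func BenchmarkCBORMap(b *testing.B) {
	for i := 0; i < b.N; i++ {
		data, err := cbor.Marshal(mapPayload)
		if err != nil {
			b.Fatal(err)
		}
		var out map[string]string
		if err := cbor.Unmarshal(data, &out); err != nil {
			b.Fatal(err)
		}
	}
}

func BenchmarkCBORArray(b *testing.B) {
	for i := 0; i < b.N; i++ {
		data, err := cbor.Marshal(arrayPayload)
		if err != nil {
			b.Fatal(err)
		}
		var out []string
		if err := cbor.Unmarshal(data, &out); err != nil {
			b.Fatal(err)
		}
	}
}
```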
As mentioned earlier, this change has the added benefit of preserving the order of `DictionaryValue.Entries`.

The changes are very simple, and the minimally modified tests pass. Modified files are limited to encoding.go, encoding_test.go, and decoding.go.
For example, changes to `func (e *Encoder) prepareDictionaryValue` might include:
- declaring `entries` as an array here: cadence/runtime/interpreter/encode.go, line 686 in 3bbc8c8
- replacing `Content: cborMap{` with `Content: cborArray{` here: cadence/runtime/interpreter/encode.go, lines 762 to 768 in 3bbc8c8 (see the sketch below)
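Here is a minimal before/after sketch of that shape change, assuming simplified stand-ins for the internal `cborMap` and `cborArray` helpers (the real encode.go types wrap tagged content and more fields, so this is illustrative only):

```go
package main

import (
	"fmt"

	"github.com/fxamacker/cbor/v2"
)

// Simplified stand-ins for the internal helpers used in encode.go; the real
// types wrap tagged CBOR content, which is omitted here for brevity.
type cborMap = map[string]interface{}
type cborArray = []interface{}

// Before: entries keyed by string. Key strings are encoded into the output
// and entry order is not preserved by a CBOR map.
func prepareEntriesAsMap(keys []string, values []interface{}) cborMap {
	entries := cborMap{}
	for i, key := range keys {
		entries[key] = values[i]
	}
	return entries
}

// After: entries as a CBOR array positionally aligned with the keys array.
// Order is preserved and each key string is stored only once.
func prepareEntriesAsArray(keys []string, values []interface{}) cborArray {
	entries := make(cborArray, len(values))
	copy(entries, values)
	return entries
}

func main() {
	keys := []string{"a", "b"}
	values := []interface{}{"1", "2"}
	m, _ := cbor.Marshal(prepareEntriesAsMap(keys, values))
	a, _ := cbor.Marshal(prepareEntriesAsArray(keys, values))
	fmt.Printf("map entries:   %x\n", m)
	fmt.Printf("array entries: %x\n", a)
}
```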
As mentioned earlier, additional speed gains (especially for decoding) are possible by removing backwards-compatible decoding (thanks @turbolent) and not saving key strings twice (thanks @SupunS).
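A small sketch of why dropping the backwards-compatible path helps: a dual-format decoder must sniff the incoming shape and keep a map code path alive, while a single-format decoder decodes directly. The function names and the major-type sniffing strategy below are assumptions for illustration, not Cadence's actual mechanism:

```go
package main

import (
	"fmt"

	"github.com/fxamacker/cbor/v2"
)

// decodeEntriesCompat accepts both the legacy CBOR map shape and the new
// CBOR array shape by inspecting the major type in the first byte
// (4 = array, 5 = map). The extra branch and the map path cost time.
func decodeEntriesCompat(data []byte) ([]string, error) {
	if len(data) > 0 && data[0]>>5 == 5 { // legacy map shape
		var m map[string]string
		if err := cbor.Unmarshal(data, &m); err != nil {
			return nil, err
		}
		out := make([]string, 0, len(m))
		for _, v := range m {
			out = append(out, v)
		}
		return out, nil
	}
	var a []string // new array shape
	err := cbor.Unmarshal(data, &a)
	return a, err
}

// decodeEntries handles only the new array shape; no sniffing, no map path.
func decodeEntries(data []byte) ([]string, error) {
	var a []string
	err := cbor.Unmarshal(data, &a)
	return a, err
}

func main() {
	data, _ := cbor.Marshal([]string{"1", "2"})
	v, _ := decodeEntries(data)
	fmt.Println(v)
}
```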
Caveats
Changing the CBOR representation of data is a breaking change; hopefully it's just an internal implementation detail within the storage layer. Guidance from @turbolent would be appreciated to make sure I didn't miss anything.
I don't know how closely the speed gains will translate to other platforms (other CPU architectures, compilation to WASM, etc.)
Completing this change to the CBOR representation before I proceed with API changes to my CBOR library can save time.
Performance comparisons are data dependent. Comparisons should use realistic Cadence data (the most common mix of data types, values, and sizes encountered in production).
Updates (not an exhaustive list)
April 9, 2021 at 2:30 PM PT - Include cumulative benchmark comparisons to master (commit 7c7dd55) with optimizations from PR #752, PR #761, PR #778, and PR #788 using production data (values extracted from mainnet snapshot).
April 5, 2021 at 6:03 AM PT - Include benchmark comparisons using production data. Remove section "Data Used for Benchmark Comparisons" because it described non-production data.