Optimize encoding of composites #761

fxamacker · 2021-04-04T04:06:00Z

Closes #744
Updates #738

Description

Changes

Switch from CBOR map to CBOR array to improve speed and preserve ordering.
Remove unnecessary field sorting when decoding.
Remove old backwards-compatibility decoding code (decoding type ID).
Add tests, including round-trip for decoding old format and encoding new format.

Benchmark comparisons

⚠️ NOTE: "faster" refers to time delta in `benchstat`, so "50% faster" here means 2x speedup.

Encoding speed improved by 44.35% and decoding speed improved by 34.56% for a 776,155 byte value from production data that contains both composites and dictionaries. Memory use also improved. Encoded/decoded data includes Cadence Value types that have not yet been optimized.

Speed gains were only 15-27% for encoding and 16-24% for decoding for smaller production data that did not include dictionaries. However, these will gain more from optimizing remaining Value types (outside the scope of this PR).

Click to expand:

Big Value with Composites and Dictionaries 🚀🚀

Encoding Comparisons

name                                           old time/op    new time/op    delta
EncodeComposite/870comp_837dict_776155bytes-4    35.2ms ± 1%    19.6ms ± 1%  -44.35%  (p=0.000 n=10+10)

name                                           old alloc/op   new alloc/op   delta
EncodeComposite/870comp_837dict_776155bytes-4    7.50MB ± 5%    4.35MB ± 0%  -41.99%  (p=0.000 n=10+10)

name                                           old allocs/op  new allocs/op  delta
EncodeComposite/870comp_837dict_776155bytes-4      141k ± 0%       76k ± 0%  -46.02%  (p=0.000 n=10+10)

Decoding Comparisons

name                                           old time/op    new time/op    delta
DecodeComposite/870comp_837dict_776155bytes-4    40.3ms ± 1%    26.3ms ± 1%  -34.65%  (p=0.000 n=10+10)

name                                           old alloc/op   new alloc/op   delta
DecodeComposite/870comp_837dict_776155bytes-4    15.0MB ± 0%    10.5MB ± 0%  -29.67%  (p=0.000 n=10+10)

name                                           old allocs/op  new allocs/op  delta
DecodeComposite/870comp_837dict_776155bytes-4      265k ± 0%      216k ± 0%  -18.56%  (p=0.000 n=10+10)

Values with Composites and No Dictionaries 🚀

Encoding Comparisons

name                                           old time/op    new time/op    delta
EncodeComposite/1comp_0dict_139bytes-4           16.0µs ± 1%    13.6µs ± 1%  -15.27%  (p=0.000 n=10+10)
EncodeComposite/2comp_0dict_160bytes-4           20.9µs ± 1%    15.7µs ± 1%  -25.16%  (p=0.000 n=10+10)
EncodeComposite/104comp_0dict_14850bytes-4        910µs ± 1%     657µs ± 1%  -27.78%  (p=0.000 n=9+10)

name                                           old alloc/op   new alloc/op   delta
EncodeComposite/1comp_0dict_139bytes-4           3.79kB ± 1%    2.98kB ± 2%  -21.32%  (p=0.000 n=10+10)
EncodeComposite/2comp_0dict_160bytes-4           4.85kB ± 2%    3.36kB ± 0%  -30.74%  (p=0.000 n=9+10)
EncodeComposite/104comp_0dict_14850bytes-4        247kB ± 1%     188kB ± 0%  -24.16%  (p=0.000 n=8+10)

name                                           old allocs/op  new allocs/op  delta
EncodeComposite/1comp_0dict_139bytes-4             76.0 ± 0%      62.0 ± 0%  -18.42%  (p=0.000 n=10+9)
EncodeComposite/2comp_0dict_160bytes-4              103 ± 0%        73 ± 0%  -29.13%  (p=0.000 n=7+10)
EncodeComposite/104comp_0dict_14850bytes-4        5.03k ± 0%     3.45k ± 0%  -31.28%  (p=0.000 n=8+10)

Decoding Comparisons

name                                           old time/op    new time/op    delta
DecodeComposite/1comp_0dict_139bytes-4           11.6µs ± 1%     9.7µs ± 1%  -16.38%  (p=0.000 n=9+10)
DecodeComposite/2comp_0dict_160bytes-4           16.9µs ± 1%    12.7µs ± 1%  -24.47%  (p=0.000 n=10+9)
DecodeComposite/104comp_0dict_14850bytes-4        757µs ± 1%     569µs ± 1%  -24.90%  (p=0.000 n=9+10)

name                                           old alloc/op   new alloc/op   delta
DecodeComposite/1comp_0dict_139bytes-4           3.90kB ± 0%    3.35kB ± 0%  -13.96%  (p=0.000 n=10+10)
DecodeComposite/2comp_0dict_160bytes-4           5.54kB ± 0%    4.38kB ± 0%  -20.92%  (p=0.000 n=10+10)
DecodeComposite/104comp_0dict_14850bytes-4        296kB ± 0%     238kB ± 0%  -19.61%  (p=0.000 n=10+10)

name                                           old allocs/op  new allocs/op  delta
DecodeComposite/1comp_0dict_139bytes-4             56.0 ± 0%      53.0 ± 0%   -5.36%  (p=0.000 n=10+10)
DecodeComposite/2comp_0dict_160bytes-4             97.0 ± 0%      89.0 ± 0%   -8.25%  (p=0.000 n=10+10)
DecodeComposite/104comp_0dict_14850bytes-4        5.05k ± 0%     4.65k ± 0%   -7.98%  (p=0.000 n=10+10)

Benchmarks used Go 1.15.10 on linux_amd64 using production data.

Special thanks to @turbolent for the state decoding tool in #759 and @ramtinms for production data in jsonl!

For contributor use:

Targeted PR against feature/storage-optimizations branch
Linked to Github issue with discussion and accepted design OR link to spec that describes this work
Code follows the standards mentioned here
Updated relevant documentation
Re-reviewed Files changed in the Github PR explorer
Added appropriate labels

Switch from CBOR map to CBOR array to improve speed and preserve ordering. Remove unnecessary field sorting when decoding. Remove old backwards-compatibility decoding code (decoding type ID). Add tests, including round-trip for decoding old format and encoding new format. Closes onflow#744

turbolent

Nice! Just one comment regarding the encoding of fields

turbolent · 2021-04-05T20:05:38Z

runtime/interpreter/encode.go

+	fieldPairs := v.Fields.pairs
+	fieldLen := len(fieldPairs)
+
+	// Sort field names lexicographically.
+	fieldNames := make([]string, fieldLen)
+
+	index := 0
+	for name := range fieldPairs {
+		fieldNames[index] = name
+		index++
+	}

-	for pair := v.Fields.Oldest(); pair != nil; pair = pair.Next() {
-		fieldName := pair.Key
-		value := pair.Value
+	sort.Strings(fieldNames)
+
+	// Create fields (key and value) array, sorted by field name.
+	fields := make(cborArray, fieldLen*2)


I think we can just keep the existing iteration over the ordered map, because that already results in a deterministic encoding. Sorting the fields also works, but is not necessary I think

I tried the unsorted approach (it's faster and cleaner) but it causes this test to fail:

TestInterpretCompositeValueFieldEncodingOrder in interpreter_test.go.

I strongly prefer your suggestion and am hoping the test might be obsolete or unneeded.

This is the code block incorporating the faster and cleaner code you suggested:

fields := make(cborArray, v.Fields.Len()*2) i := 0 for pair := v.Fields.Oldest(); pair != nil; pair = pair.Next() { fieldName := pair.Key value := pair.Value valuePath := append(path[:], fieldName) prepared, err := e.prepare(value, valuePath, deferrals) if err != nil { return nil, err } fields[i], fields[i+1] = fieldName, prepared i += 2 }

I suspect TestInterpretCompositeValueFieldEncodingOrder tests if encoding composites with entries of different orderings is deterministic.

Test expects:

CBOR data ["a", 1, "b", 2, "c", 3] when encoding composite with entries {"a": 1, "b": 2, "c": 3}.

CBOR data ["a", 1, "b", 2, "c", 3] when encoding composite with entries {"b": 2, "c": 3, "a": 1}.

Without sorting, this implementation produces:

CBOR data ["a", 1, "b", 2, "c", 3] when encoding composite with entries {"a": 1, "b": 2, "c": 3}.

CBOR data ["b", 2, "c", 3, "a", 1] when encoding composite with entries {"b": 2, "c": 3, "a": 1}.

Any thoughts or suggestions?

This test checks that no matter what the order of initialization is, the encoding is always the same. That's not required though, we only have to ensure the encoding order is always deterministic, and the ordered map for the fields ensures that.

I've adjusted the test case in f2a268b to prepare the expected encodings for all permutations of initializing the fields

turbolent · 2021-04-05T20:06:17Z

runtime/interpreter/encode.go


 		valuePath := append(path[:], fieldName)

 		prepared, err := e.prepare(value, valuePath, deferrals)
 		if err != nil {
 			return nil, err
 		}
-		fields[fieldName] = prepared
+
+		fields[i*2], fields[i*2+1] = fieldName, prepared


turbolent · 2021-04-05T20:08:38Z

runtime/interpreter/decode.go

-	// Decode all fields in lexicographic order
-
-	sort.Strings(fieldNames)


It's great we don't need this anymore 👍

turbolent

I've switched sorting to iterating over the ordered map of fields in f2a268b.

I enabled CI for feature branches, including from forks, CI ran, and everything is green 👏

⛵

fxamacker requested a review from turbolent as a code owner April 4, 2021 04:06

fxamacker added the Improvement label Apr 4, 2021

This was referenced Apr 5, 2021

Optimize CBOR representation of data in Cadence storage layer #738

Closed

Create CBOR encoding and decoding benchmarks using production data #0

Closed

turbolent mentioned this pull request Apr 5, 2021

Benchmark CBOR encoding/decoding using production data #765

Closed

turbolent reviewed Apr 5, 2021

View reviewed changes

turbolent added 2 commits April 5, 2021 17:28

no need to sort, fields are already ordered

f2a268b

trigger CI

1b16430

turbolent approved these changes Apr 6, 2021

View reviewed changes

turbolent merged commit d302379 into onflow:feature/storage-optimizations Apr 6, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Optimize encoding of composites #761

Optimize encoding of composites #761

fxamacker commented Apr 4, 2021 •

edited

Loading

turbolent left a comment

turbolent Apr 5, 2021

fxamacker Apr 5, 2021 •

edited

Loading

turbolent Apr 6, 2021

turbolent Apr 5, 2021

turbolent Apr 5, 2021

turbolent left a comment

		// Decode all fields in lexicographic order

		sort.Strings(fieldNames)

Optimize encoding of composites #761

Optimize encoding of composites #761

Conversation

fxamacker commented Apr 4, 2021 • edited Loading

Description

Changes

Benchmark comparisons

⚠️ NOTE: "faster" refers to time delta in benchstat, so "50% faster" here means 2x speedup.

Encoding Comparisons

Decoding Comparisons

Encoding Comparisons

Decoding Comparisons

turbolent left a comment

Choose a reason for hiding this comment

turbolent Apr 5, 2021

Choose a reason for hiding this comment

fxamacker Apr 5, 2021 • edited Loading

Choose a reason for hiding this comment

turbolent Apr 6, 2021

Choose a reason for hiding this comment

turbolent Apr 5, 2021

Choose a reason for hiding this comment

turbolent Apr 5, 2021

Choose a reason for hiding this comment

turbolent left a comment

Choose a reason for hiding this comment

fxamacker commented Apr 4, 2021 •

edited

Loading

⚠️ NOTE: "faster" refers to time delta in `benchstat`, so "50% faster" here means 2x speedup.

fxamacker Apr 5, 2021 •

edited

Loading