Optimize decoding #34

Stebalien · 2020-08-05T05:08:03Z

This optimizes:

readByte: It turns out that repeatedly type asserting an interface value to another interface type is really slow, while type asserting it to a concrete type is fast.
Deferred: This avoids most of the unnecessary allocations and copies in Deferred.UnmarshalCBOR, and switches to an iterative (rather than recursive) implementation.
ScanForLinks: As with Deferred, this is now non-recursive.

Note: depth limits have been removed from Deferred as the implementation is no longer recursive. ScanForLinks never appeared to have depth limits.

Buffering could lead to reading over the end of the object, corrupting the next object. This patch also gets rid of "PeekByte" and uses the standard ReadByte/UnreadByte interfaces. That way, we can avoid wrapping the byte reader in the happy path, saving some overhead.

Stebalien · 2020-08-05T05:09:24Z

Based on #33.

whyrusleeping

Looks pretty tasty.

Stebalien · 2020-08-06T01:29:06Z

Benchmarks:

name            old time/op    new time/op    delta
Marshaling-4       878ns ± 2%     874ns ± 2%     ~     (p=0.579 n=5+5)
Unmarshaling-4    4.96µs ± 3%    4.40µs ± 2%  -11.26%  (p=0.008 n=5+5)
LinkScan-4        4.74µs ± 1%    4.51µs ± 1%   -4.94%  (p=0.008 n=5+5)
Deferred-4        12.2µs ± 0%     3.5µs ± 1%  -71.09%  (p=0.016 n=4+5)

name            old alloc/op   new alloc/op   delta
Marshaling-4        160B ± 0%      160B ± 0%     ~     (all equal)
Unmarshaling-4    3.48kB ± 0%    3.46kB ± 0%   -0.69%  (p=0.008 n=5+5)
LinkScan-4          880B ± 0%      880B ± 0%     ~     (all equal)
Deferred-4        17.3kB ± 0%     0.1kB ± 0%  -99.45%  (p=0.008 n=5+5)

name            old allocs/op  new allocs/op  delta
Marshaling-4        10.0 ± 0%      10.0 ± 0%     ~     (all equal)
Unmarshaling-4      47.0 ± 0%      43.0 ± 0%   -8.51%  (p=0.008 n=5+5)
LinkScan-4          25.0 ± 0%      25.0 ± 0%     ~     (all equal)
Deferred-4           222 ± 0%         3 ± 0%  -98.65%  (p=0.008 n=5+5)

Stebalien added 4 commits August 4, 2020 21:55

Optimize readByte

3c783b9

Type asserting to a concrete type is ~10x faster than type asserting to an interface. This change has a significant performance impact in my test: readByte used to account for 10.8% of the time; it now accounts for 3.4%.

Optimize Deferred

cdf4113

We use this quite frequently so it should be fast. Note: this removes the depth restriction because the algorithm is no longer recursive.

Make ScanForLinks non-recursive

f6390fe

This way, we can't blow out our stack.

whyrusleeping approved these changes Aug 6, 2020

Fix benchmarks

e6c8c84

whyrusleeping merged commit 63aa96c into whyrusleeping:master Aug 6, 2020

Stebalien deleted the steb/optimize branch August 7, 2020 01:15

aschmahmann mentioned this pull request May 14, 2021

Release v0.9.0 ipfs/kubo#8058

Closed

71 tasks
