Optimize decoding #34

Merged: 5 commits, Aug 6, 2020

Stebalien
Collaborator

This optimizes:

  1. readByte: Repeatedly type-asserting an interface value to another interface type turns out to be really slow, whereas asserting it to a concrete type is fast.
  2. Deferred: This avoids most of the unnecessary allocations and copies in Deferred.UnmarshalCBOR, and switches to an iterative (rather than recursive) implementation.
  3. ScanForLinks: As with Deferred, this is now non-recursive.

Note: depth limits have been removed from Deferred as the implementation is no longer recursive. ScanForLinks never appeared to have depth limits.

Buffering could lead to reading over the end of the object, corrupting the next object.

This patch also gets rid of "PeekByte" and uses the standard ReadByte/UnreadByte
interfaces. That way, we can avoid wrapping the byte reader in the happy path,
saving some overhead.

Type asserting to a concrete type is ~10x faster than type asserting to an
interface. This change has a significant performance impact in my test. readByte
used to account for 10.8% of the time, now it accounts for 3.4%.
We use this quite frequently so it should be fast.

Note: this removes the depth restriction because the algorithm is no longer recursive.
This way, we can't blow out our stack.
@Stebalien
Collaborator Author

Based on #33.

Owner

@whyrusleeping whyrusleeping left a comment

Looks pretty tasty.

@Stebalien
Collaborator Author

Benchmarks:

name            old time/op    new time/op    delta
Marshaling-4       878ns ± 2%     874ns ± 2%     ~     (p=0.579 n=5+5)
Unmarshaling-4    4.96µs ± 3%    4.40µs ± 2%  -11.26%  (p=0.008 n=5+5)
LinkScan-4        4.74µs ± 1%    4.51µs ± 1%   -4.94%  (p=0.008 n=5+5)
Deferred-4        12.2µs ± 0%     3.5µs ± 1%  -71.09%  (p=0.016 n=4+5)

name            old alloc/op   new alloc/op   delta
Marshaling-4        160B ± 0%      160B ± 0%     ~     (all equal)
Unmarshaling-4    3.48kB ± 0%    3.46kB ± 0%   -0.69%  (p=0.008 n=5+5)
LinkScan-4          880B ± 0%      880B ± 0%     ~     (all equal)
Deferred-4        17.3kB ± 0%     0.1kB ± 0%  -99.45%  (p=0.008 n=5+5)

name            old allocs/op  new allocs/op  delta
Marshaling-4        10.0 ± 0%      10.0 ± 0%     ~     (all equal)
Unmarshaling-4      47.0 ± 0%      43.0 ± 0%   -8.51%  (p=0.008 n=5+5)
LinkScan-4          25.0 ± 0%      25.0 ± 0%     ~     (all equal)
Deferred-4           222 ± 0%         3 ± 0%  -98.65%  (p=0.008 n=5+5)

@whyrusleeping whyrusleeping merged commit 63aa96c into whyrusleeping:master Aug 6, 2020
@Stebalien Stebalien deleted the steb/optimize branch August 7, 2020 01:15
@aschmahmann aschmahmann mentioned this pull request May 14, 2021
71 tasks