Shrank EncodedLevel to speed up step_in/step_out. #113

zslayton · 2020-12-28T19:04:22Z

Description of changes:

The EncodedLevel struct is just large enough that variations of it (like Option<EncodedLevel> and Result<EncodedLevel, _>) can cross LLVM's threshold for using memcpy to move values around. This shows up in profiling as substantial overhead in the step_in and step_out functions, which move instances of EncodedLevel to and from a Vec of container levels.

In a particularly deeply nested test file (a single struct with ~800 levels of nested child structs), this caused the reader to be painfully slow. A 15MB file took ~230ms to read through with next()/step_in()/step_out(), loading each scalar value encountered.

This PR shrinks the EncodedLevel struct by:

- Replacing a number of usize offsets (8 bytes apiece on x86_64) with u8 lengths from which offsets can be calculated if necessary.
- Replacing the Vec of annotations on each EncodedLevel with a common Vec that lives on CursorState. Each EncodedLevel now tracks the number of annotations it has pushed onto that communal Vec, allowing the reader to use a single Vec/allocation across the entire stream. This dropped the size of EncodedLevel by a further 23 bytes.
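The shared-Vec idea in the second item can be sketched roughly as follows. This is a simplified illustration, not the code in this PR: `Level`, this cut-down `CursorState`, and all of the method names are hypothetical, and annotations are modeled as plain `usize` symbol IDs.

```rust
/// Hypothetical per-level record: it stores only a count, not its own Vec.
#[derive(Debug, Clone, Copy, Default)]
struct Level {
    number_of_annotations: u8,
}

/// Hypothetical cursor state that owns the single shared annotations Vec.
#[derive(Debug, Default)]
struct CursorState {
    annotations: Vec<usize>, // symbol IDs for every level currently on the stack
    parents: Vec<Level>,     // levels we have stepped into
    current: Level,          // the value the cursor is positioned on
}

impl CursorState {
    /// Replaces the current value's annotations with `ids`.
    fn set_annotations(&mut self, ids: &[usize]) {
        self.clear_current();
        // The sketch assumes a u8 count is enough.
        assert!(ids.len() <= u8::MAX as usize);
        self.annotations.extend_from_slice(ids);
        self.current.number_of_annotations = ids.len() as u8;
    }

    /// The current value's annotations are the tail of the shared Vec.
    fn current_annotations(&self) -> &[usize] {
        let count = self.current.number_of_annotations as usize;
        &self.annotations[self.annotations.len() - count..]
    }

    /// Moving a small Copy struct onto the stack; no per-level allocation.
    fn step_in(&mut self) {
        self.parents.push(self.current);
        self.current = Level::default();
    }

    /// Returns false if the cursor is already at the top level.
    fn step_out(&mut self) -> bool {
        match self.parents.pop() {
            Some(parent) => {
                // Drop the child's annotations so the parent's are the tail again.
                self.clear_current();
                self.current = parent;
                true
            }
            None => false,
        }
    }

    /// Removes the current value's annotations from the shared Vec.
    fn clear_current(&mut self) {
        let count = self.current.number_of_annotations as usize;
        self.annotations.truncate(self.annotations.len() - count);
        self.current.number_of_annotations = 0;
    }
}

fn main() {
    let mut state = CursorState::default();
    state.set_annotations(&[10, 11]);
    state.step_in();
    state.set_annotations(&[12]);
    println!("child annotations: {:?}", state.current_annotations());
    state.step_out();
    println!("parent annotations: {:?}", state.current_annotations());
}
```

Because `truncate` keeps the Vec's capacity, the same allocation is reused however many levels are entered and left.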

Performance test

15MB binary Ion test file containing a single struct with 773 levels of nested values.

Before: 230ms
After: 125ms (-45.65%)

Memory layout

Before

print-type-size type: `binary::cursor::EncodedValue`: 120 bytes, alignment: 8 bytes
print-type-size     field `.index_at_depth`: 8 bytes
print-type-size     field `.field_id`: 16 bytes
print-type-size     field `.annotations`: 24 bytes
print-type-size     field `.parent_index`: 16 bytes
print-type-size     field `.field_id_offset`: 8 bytes
print-type-size     field `.annotations_offset`: 8 bytes
print-type-size     field `.header_offset`: 8 bytes
print-type-size     field `.value_offset`: 8 bytes
print-type-size     field `.value_length`: 8 bytes
print-type-size     field `.value_end`: 8 bytes
print-type-size     field `.ion_type`: 1 bytes
print-type-size     field `.header`: 3 bytes
print-type-size     field `.is_null`: 1 bytes
print-type-size     end padding: 3 bytes

Note that, depending on layout and alignment, a size of 120 bytes means that Option<EncodedValue> and Result<EncodedValue, _> can take 128 bytes even though they add only a single discriminant byte: with 8-byte alignment, the total size is rounded up to the next multiple of 8.
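The rounding effect can be checked with `std::mem::size_of`. The sketch below uses two hypothetical 120-byte stand-ins rather than the real struct: one with no unused bit patterns, so `Option` has to add a tag, and one containing a `bool`, whose unused bit patterns the compiler can typically reuse for the tag. The exact `Option` sizes are a compiler layout decision, so treat the printed numbers as what current rustc tends to produce, not as a guarantee.

```rust
use std::mem::size_of;

/// Hypothetical 120-byte payload with no spare bit patterns ("niches").
#[allow(dead_code)]
#[repr(C, align(8))]
struct NoNiche {
    words: [u64; 15],
}

/// Hypothetical 120-byte payload whose `bool` leaves bit patterns unused.
#[allow(dead_code)]
#[repr(C, align(8))]
struct WithNiche {
    words: [u64; 14],
    flag: bool,
}

fn main() {
    // Both payloads are 120 bytes with 8-byte alignment.
    println!("NoNiche:           {} bytes", size_of::<NoNiche>());
    println!("WithNiche:         {} bytes", size_of::<WithNiche>());

    // The tag needs one byte, but the size must stay a multiple of 8.
    println!("Option<NoNiche>:   {} bytes", size_of::<Option<NoNiche>>());

    // Here the compiler may store the tag in the bool's unused values.
    println!("Option<WithNiche>: {} bytes", size_of::<Option<WithNiche>>());
}
```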

After

print-type-size type: `binary::cursor::EncodedValue`: 56 bytes, alignment: 8 bytes
print-type-size     field `.index_at_depth`: 8 bytes
print-type-size     field `.field_id`: 16 bytes
print-type-size     field `.header_offset`: 8 bytes
print-type-size     field `.value_length`: 8 bytes
print-type-size     field `.ion_type`: 1 bytes
print-type-size     field `.header`: 3 bytes
print-type-size     field `.is_null`: 1 bytes
print-type-size     field `.number_of_annotations`: 1 bytes
print-type-size     field `.field_id_length`: 1 bytes
print-type-size     field `.annotations_length`: 1 bytes
print-type-size     field `.header_length`: 1 bytes
print-type-size     end padding: 7 bytes
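For illustration, here is one way the removed offsets could be recomputed from the remaining fields. It assumes the field ID, the annotations, the header, and the value body are stored back to back in that order; that ordering, the `EncodedSpans` type, and the accessor names are assumptions made for this sketch, not taken from the PR's code.

```rust
/// Hypothetical mirror of the fields needed to recover the old offsets.
struct EncodedSpans {
    header_offset: usize,
    value_length: usize,
    field_id_length: u8,
    annotations_length: u8,
    header_length: u8,
}

impl EncodedSpans {
    /// Annotations are assumed to end where the header begins.
    fn annotations_offset(&self) -> usize {
        self.header_offset - self.annotations_length as usize
    }

    /// The field ID is assumed to end where the annotations begin.
    fn field_id_offset(&self) -> usize {
        self.annotations_offset() - self.field_id_length as usize
    }

    /// The value body is assumed to start right after the header.
    fn value_offset(&self) -> usize {
        self.header_offset + self.header_length as usize
    }

    /// One past the last byte of the value body.
    fn value_end(&self) -> usize {
        self.value_offset() + self.value_length
    }
}

fn main() {
    let spans = EncodedSpans {
        header_offset: 100,
        value_length: 20,
        field_id_length: 1,
        annotations_length: 3,
        header_length: 2,
    };
    println!("field_id_offset    = {}", spans.field_id_offset());
    println!("annotations_offset = {}", spans.annotations_offset());
    println!("value_offset       = {}", spans.value_offset());
    println!("value_end          = {}", spans.value_end());
}
```

Each accessor costs an addition or subtraction, in exchange for replacing four 8-byte fields with three 1-byte lengths.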

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

Shrank EncodedLevel to speed up step_in/step_out.

499cb86

zslayton requested a review from therapon December 28, 2020 19:04

therapon approved these changes Jan 8, 2021

zslayton merged commit 9c2c742 into master Jan 8, 2021

zslayton deleted the optimize-step-in-out branch January 8, 2021 19:39
