Static layout analysis hooks for near-zero-cost serialization/deserialization #24

pavel-kirienko · 2019-04-24T12:58:25Z

In Libuavcan, there is a huge chunk of heavily templated code responsible for data serialization, which upon monomorphization resolves into a series of invocations of a low-level, bit-level data copy routine: https://github.com/UAVCAN/libuavcan/blob/fd8ba19bc9c09c05a1ab60289b3e7158810e9bd0/libuavcan/src/marshal/uc_bit_array_copy.cpp#L12-L58. In Libcanard, there are basic functions canardEncodePrimitive(..) and canardDecodePrimitive(..) which serve a similar purpose, except that they lack any type safety because it's C.

Bit-level copying is very slow. I don't have exact numbers at hand, but I plan to obtain them later once our new UAVCAN v1.0 implementations are available. Bit-level serialization is slow, but it is also generic: it is always applicable regardless of data alignment. Yet, if one were to look very carefully at our standard definitions (and most of the known third-party definitions, e.g., ArduPilot, OlliW), one would see that the majority of fields are always at least byte-aligned, meaning that slow bit-level copying could be avoided if we could determine such always-aligned fields statically.

Earlier we invented the concept of compile-time alignment checks, which was discussed at length here: https://forum.uavcan.org/t/new-dsdl-directive-for-simple-compile-time-checks/190. This same concept can be extended to facilitate static bit layout analysis in order to allow code generators to emit the fastest possible serialization code, resorting to slower methods only when faster ones cannot be proven to be safe. One can define the following arbitrary categories of serialization approaches:

The fastest, zero-cost serialization: direct memcpy() applied to the whole data structure. This is applicable when the alignment, size, and byte order requirements of the native platform match those of UAVCAN. As you will see later, these properties are discoverable and checkable at code generation time, provided that we can make reliable assumptions about the target platform (such as the byte order and the type alignment requirement, which is usually sizeof(type); a violation of such assumptions can be detected later at compile time, so they are safe).
Field-level zero-cost serialization: direct memcpy() applied to a given field. E.g., if we have a uint64 field in a data structure, the byte order of the platform is little-endian, and the field is always aligned at the byte boundary, we can directly memcpy the field into the final buffer, again avoiding bit-level copying.
Last resort: bit-level serialization. This one is chosen when the code generator cannot determine alignment statically.
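The difference between the field-level fast path and the last-resort path can be sketched in a few lines of Python. This is only an illustration, not code from Libuavcan, Libcanard, or PyUAVCAN: the function names are hypothetical, and the bit order used by the slow path (least significant bit first) is an assumption of the sketch.

```python
import struct


def write_bits(buf: bytearray, bit_offset: int, value: int, bit_length: int) -> None:
    """Generic slow path: copies one bit at a time, works at any bit offset."""
    for i in range(bit_length):
        pos = bit_offset + i
        mask = 1 << (pos % 8)
        if (value >> i) & 1:
            buf[pos // 8] |= mask
        else:
            buf[pos // 8] &= ~mask & 0xFF


def write_u64_aligned(buf: bytearray, bit_offset: int, value: int) -> None:
    """Fast path: a plain little-endian byte copy, valid only at byte-aligned offsets."""
    assert bit_offset % 8 == 0, 'The caller must prove alignment statically'
    struct.pack_into('<Q', buf, bit_offset // 8, value)


slow = bytearray(16)
fast = bytearray(16)
write_bits(slow, 8, 0x0123456789ABCDEF, 64)
write_u64_aligned(fast, 8, 0x0123456789ABCDEF)
assert slow == fast  # Same bytes, but the fast path needs no per-bit loop.
```

When the offset is provably a multiple of 8, both paths produce identical output, so a code generator is free to emit the cheaper one.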

The proof of alignment is obtained by PyDSDL by manipulating bit length sets of serialized representations of data types. A code generator can ask PyDSDL whether there exists a serialized representation of a composite or array type which would NOT meet a specified alignment goal (say, a byte (8 bits), or 64 bits if we're serializing a field of uint64), and then use the answer to choose the appropriate (fastest safe) serialization strategy at code generation time (there are some known performance issues: #23).
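The idea behind the proof can be illustrated with plain Python sets. This is a minimal sketch and not the PyDSDL implementation: `shift` and `is_aligned_at` are hypothetical helper names, and the bit length values are made up for the example.

```python
from typing import Set


def shift(offsets: Set[int], bit_lengths: Set[int]) -> Set[int]:
    """All offsets reachable after appending an item with the given possible bit lengths."""
    return {o + n for o in offsets for n in bit_lengths}


def is_aligned_at(offsets: Set[int], bits: int) -> bool:
    """True only if no possible offset violates the alignment goal."""
    return all(o % bits == 0 for o in offsets)


start = {0}
# A variable-length item that always occupies a whole number of bytes.
after_item = shift(start, {16, 24, 32})
# A single bit appended after it breaks byte alignment for everything that follows.
after_flag = shift(after_item, {1})

assert is_aligned_at(after_item, 8)       # A byte-level copy is provably safe here.
assert not is_aligned_at(after_item, 64)  # A 64-bit alignment goal is not met.
assert not is_aligned_at(after_flag, 8)   # Fall back to bit-level copying.
```

A single counterexample in the set is enough to reject the fast path, which is why the question is phrased as "is there a representation that does NOT meet the goal".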

This new logic is exposed to the user via the following new API entries:

class BitLengthSet and its overloaded + operator.
CompositeType.iterate_fields_with_offsets(base_offset: BitLengthSet = None) -> Iterator[Tuple[Field, BitLengthSet]]
FixedLengthArrayType.enumerate_elements_with_offsets(base_offset: BitLengthSet = None) -> Iterator[Tuple[int, BitLengthSet]]
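To show what a field iterator of this kind yields, here is a simplified stand-alone model for a plain structure (not a union). It does not import PyDSDL; the `Field` alias, the `iterate_with_offsets` function, and the example field names are hypothetical stand-ins chosen for illustration, and the real API operates on PyDSDL's own types.

```python
from typing import Iterator, List, Optional, Set, Tuple

# Hypothetical stand-in for a field: (name, set of possible bit lengths).
Field = Tuple[str, Set[int]]


def iterate_with_offsets(fields: List[Field],
                         base_offset: Optional[Set[int]] = None) -> Iterator[Tuple[str, Set[int]]]:
    """Yields each field together with the set of bit offsets it may start at."""
    offset = set(base_offset) if base_offset is not None else {0}
    for name, lengths in fields:
        yield name, offset
        # A fresh set is built each time, so previously yielded sets are not mutated.
        offset = {o + n for o in offset for n in lengths}


fields = [
    ('timestamp', {56}),
    ('flag', {1}),
    ('payload', {0, 8, 16}),  # Variable length: three possible sizes.
    ('crc', {16}),
]
layout = dict(iterate_with_offsets(fields))
for name, offsets in layout.items():
    aligned = all(o % 8 == 0 for o in offsets)
    print(f'{name}: offsets={sorted(offsets)} byte-aligned={aligned}')
```

In this made-up layout the one-bit flag pushes every later field off the byte boundary, which is exactly the information a code generator needs per field.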

Early selection of the serialization strategy has an important implication for the serialization of nested data structures. A data structure can be nested into another one at an arbitrary alignment, which would defeat the purpose of static layout analysis since the code generator wouldn't be able to make any assumptions about the base offset. Additionally, the misaligned origin of a data structure does not necessarily imply that every following field of it will be misaligned as well. Consider the following example:

@union
uint8 a
uint8 b

Due to the one-bit union tag preceding the actual value, both a and b are misaligned. Now, imagine that the above union is nested into another structure as follows:

void7
U.1.0 the_union

The union is padded so that the one-bit tag brings the following data fields into alignment. The example demonstrates that in order to take full advantage of the layout analysis, a code generator must model nested object hierarchies holistically rather than atomically. The per-type generated serialization functions/methods may have some basic alignment requirement chosen by the author of the code generator (for example, they may require the serialization buffer to be always byte-aligned, or require that its alignment matches the largest alignment requirement of any nested type); the serialization code of an outer (containing) type would then determine statically whether the alignment requirement of a serialization function is met. If the alignment requirement is not met, the code generator would emit serialization code in-place, as if the definition of the included type were copy-pasted into the outer type, instead of invoking its serialization function.
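The offsets in the union example above can be checked by hand with a short sketch. The helper names are hypothetical, and the delegate-or-inline rule assumes a per-type serialization function that requires a byte-aligned origin, which is only one of the possible choices mentioned above.

```python
from typing import Set

TAG_BITS = 1  # One-bit union tag, as in the example above.


def is_byte_aligned(offsets: Set[int]) -> bool:
    return all(o % 8 == 0 for o in offsets)


def union_value_offsets(union_offsets: Set[int]) -> Set[int]:
    """Offsets of the union's value (a or b), which follows the tag."""
    return {o + TAG_BITS for o in union_offsets}


def plan(union_offsets: Set[int]) -> str:
    """Delegate to the per-type function only if its alignment requirement is met."""
    return 'delegate' if is_byte_aligned(union_offsets) else 'inline'


standalone = {0}  # The union serialized on its own.
nested = {7}      # The union placed after void7.

# Standalone: the origin is aligned, but a and b start at bit 1.
assert plan(standalone) == 'delegate'
assert not is_byte_aligned(union_value_offsets(standalone))

# Nested: the origin is misaligned, yet a and b start at bit 8.
assert plan(nested) == 'inline'
assert is_byte_aligned(union_value_offsets(nested))
```

In the nested case the generator has to inline the union, but the inlined code can then use the byte-aligned fast path for a and b, which would not be discoverable by looking at the union in isolation.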

You can see field iterators in action in this carefully crafted unit test: https://github.com/UAVCAN/pydsdl/blob/f998ad6f744b853d9b97e240ab0302df27ddd598/pydsdl/_serializable.py#L1284-L1383

...also in this Jinja2 code generation template for PyUAVCAN:

{%- macro _serialize_variable_length_array(t, ref, offset) -%}
    # Length field byte-aligned: {{ offset.is_aligned_at_byte() }}; {# -#}
     first element byte-aligned: {{ (offset + t.length_field_type.bit_length).is_aligned_at_byte() }}; {# -#}
      all elements byte-aligned: {{ (offset + t.bit_length_set).is_aligned_at_byte() }}.
    assert len({{ ref }}) <= {{ t.capacity }}, '{{ ref }}: {{ t }}'
    {{ _serialize_integer(t.length_field_type, 'len(%s)'|format(ref), offset) }}
{%- if t.element_type is BooleanType %}
    _ser_.add_{{ (offset + t.length_field_type.bit_length)|alignment_prefix }}_array_of_bits({{ ref }})
{%- elif t.element_type is PrimitiveType and t.element_type.standard_bit_length %}
    _ser_.add_{{ (offset + t.length_field_type.bit_length)|alignment_prefix -}}
          _array_of_standard_bit_length_primitives({{ ref }})
{%- else %}
    for _element_ in {{ ref }}:
        {{ _serialize_any(t.element_type, '_element_', offset + t.bit_length_set)|indent }}
{%- endif %}
{%- endmacro -%}

Where alignment_prefix is defined as: 'aligned' if offset.is_aligned_at_byte() else 'unaligned'.

In the following example (also sourced from PyUAVCAN), look at the case of CompositeType, where we check whether the offset of the current item meets the alignment requirement of the serialization method of t (which is _serialize_aligned_(..)). If it does, we invoke the method; otherwise, we emit serialization code in-place.

{%- macro _serialize_any(t, ref, offset) -%}
    {%- if offset.is_aligned_at_byte() -%}
    assert _ser_.current_bit_length % 8 == 0, '{{ ref }}: {{ t }}'
    {% endif -%}
    {%- if t is VoidType -%}                    _ser_.skip_bits({{ t.bit_length }})
    {%- elif t is BooleanType -%}               _ser_.add_unaligned_bit({{ ref }})
    {%- elif t is IntegerType -%}               {{ _serialize_integer(t, ref, offset) }}
    {%- elif t is FloatType -%}                 {{ _serialize_float(t, ref, offset) }}
    {#- Despite the apparent similarities, fixed and variable arrays are very different when it comes to serialization,
     #- mostly because of the different logic of offset computation. -#}
    {%- elif t is FixedLengthArrayType -%}      {{ _serialize_fixed_length_array(t, ref, offset) }}
    {%- elif t is VariableLengthArrayType -%}   {{ _serialize_variable_length_array(t, ref, offset) }}
    {%- elif t is CompositeType -%}
        {%- if offset.is_aligned_at_byte() -%}
    {{ ref }}._serialize_aligned_(_ser_)  # Delegating because this object is always byte-aligned.
        {%- else -%}
    # Object {{ ref }} is not always byte-aligned, serializing in-place.
    {{ _serialize_composite(t, ref, offset) }}
        {%- endif -%}
    {%- else -%}{%- assert False -%}
    {%- endif -%}
{%- endmacro -%}

One can see more examples in my PyUAVCAN repo (which is still a WIP of course): https://github.com/pavel-kirienko/pyuavcan/blob/uavcan-v1.0/pyuavcan/dsdl/_templates/serialization.j2, also https://github.com/pavel-kirienko/pyuavcan/blob/6f9234ab918beefce56ef3f58920773df29079e5/pyuavcan/dsdl/_serialized_representation/_serializer.py#L56-L187

I expect that this feature will allow us to greatly simplify implementations. In particular, libuavcan may no longer need the rather complex primitive marshaling templates, since generated serialization methods can now operate on raw byte pointers.

coveralls · 2019-04-24T13:01:28Z

Pull Request Test Coverage Report for Build 213

276 of 276 (100.0%) changed or added relevant lines in 8 files are covered.
No unchanged relevant lines lost coverage.
Overall coverage remained the same at 100.0%

Totals
Change from base Build 194:	0.0%
Covered Lines:	2652
Relevant Lines:	2652

💛 - Coveralls

coveralls · 2019-04-24T13:01:28Z

Pull Request Test Coverage Report for Build 223

287 of 287 (100.0%) changed or added relevant lines in 8 files are covered.
No unchanged relevant lines lost coverage.
Overall coverage remained the same at 100.0%

Totals
Change from base Build 194:	0.0%
Covered Lines:	2657
Relevant Lines:	2657

💛 - Coveralls

pavel-kirienko added 27 commits April 19, 2019 16:19

Version bump

3e25ea8

Renamed compute_bit_length_values() --> compute_bit_length_set() to match the specification

fa5ff87

Updated the project description for PyPI

61b7b2e

A new class - BitLengthSet - for bit length and offset representation

0598066

Bit length set computation methods moved into the BitLengthSet class

e6ea326

More structured bit length computation

dbaa288

Renamed property

97e9ea8

More detailed logging

5d24c56

More logging

54ae178

Field offset iterator WIP; more tests are needed but generally things seem to be shaping up the way I want them to.

c9e5efc

Finished the field iterator unit tests

6bc4dd3

Grammar

ab34591

Improved logging

1008837

field iterator docs

8c645f2

Had to change the print handler type because DSDLDefinition is not exported

d887504

API simplification: single property bit_length_set instead of two separate attributes compute_bit_length_set() and bit_length_range

c34e6ed

Bit length set extracted into a separate file

8803fa0

Better method names for bit length set

5716721

Significant acceleration of parsing by caching of bit_length_set value

704911f

Fixed mypy issues & docs

bc23c61

Overloaded operators and doctests for BitLengthSet

7278699

README style update

6a3c097

Updating the demo.py script

8971d06

Extended demo.py with docs

3c36a50

Allowing one-bit-long unsigned integers, automatic type construction for implicit fields (variable-length arrays and unions)

0df3beb

enumerate_elements_with_offsets()

14f7412

Using exact floating point range constants

f998ad6

pavel-kirienko added the class-feature label Apr 24, 2019

pavel-kirienko requested a review from thirtytwobits April 24, 2019 12:58

pavel-kirienko self-assigned this Apr 24, 2019

Removed an unneccessary coverage annotation and fixed a docstring

f4374e4

pavel-kirienko added 2 commits April 26, 2019 13:52

Fixed spec conformance: the max type name length is 50 characters, not 63

17d48ce

Comments on the importance and expectations of __str__() for SerializableType

17f7835

pavel-kirienko mentioned this pull request Apr 26, 2019

Chapter 3 - DSDL specification OpenCyphal/specification#46

Merged

pavel-kirienko added 2 commits April 26, 2019 15:35

Fixed issues discovered by Codacy

bc0baf1

Prevent scope leakage at __init__.py

5e5e8b0

pavel-kirienko merged commit 9cef74c into master Apr 28, 2019

pavel-kirienko deleted the static-layout-analysis-hooks branch April 28, 2019 15:45

pavel-kirienko mentioned this pull request Jun 4, 2019

Build Nunavut templates for v1 types. OpenCyphal-Garage/libcyphal#234

Closed

pavel-kirienko mentioned this pull request Jul 9, 2020

Add C serialization support OpenCyphal/nunavut#115

Closed

skeetsaz mentioned this pull request Jul 14, 2023

Implement zero-cost, memcpy style serialization for C++ OpenCyphal/nunavut#315

Open
