Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Static layout analysis hooks for near-zero-cost serialization/deserialization #24

Merged
merged 32 commits into from
Apr 28, 2019

Conversation

pavel-kirienko
Copy link
Member

@pavel-kirienko pavel-kirienko commented Apr 24, 2019

In Libuavcan, there is a huge chunk of heavily templated code responsible for data serialization, which upon monomorphization resolves into a series of invocations of low-level bit-level data copy routine: https://github.com/UAVCAN/libuavcan/blob/fd8ba19bc9c09c05a1ab60289b3e7158810e9bd0/libuavcan/src/marshal/uc_bit_array_copy.cpp#L12-L58. In Libcanard, there are basic functions canardEncodePrimitive(..) and canardDecodePrimitive(..) which serve a similar purpose, except that they lack any type safety because C.

Bit-level copying is very slow. I don't have exact numbers at hand, but I'm considering to obtain them later once our new UAVCAN v1.0 implementations are available. Bit-level serialization is slow, but it is also generic, meaning that it is applicable always regardless of data alignment; yet, if one were to cast a very careful look at our standard definitions (and most of the known third-party definitions, e.g., ArduPilot, OlliW), they would see that majority of fields are always at least byte-aligned, meaning that slow bit-level copying could be avoided if we could determine such always-aligned fields statically.

Earlier we invented the concept of compile-time alignment checks, which was discussed at length here: https://forum.uavcan.org/t/new-dsdl-directive-for-simple-compile-time-checks/190. This same concept can be extended to facilitate static bit layout analysis in order to allow code generators to emit the fastest possible serialization code, resorting to slower methods only when faster ones cannot be proven to be safe. One can define the following arbitrary categories of serialization approaches:

  • The fastest, zero-cost serialization: direct memcpy() applied to the whole data structure. This is applicable when the alignment, size, and byte order requirements of the native platform match those of UAVCAN. As you will see later, these properties are discoverable and checkable at the code generation time, provided that we can make reliable assumptions about the target platform (such as byte order and type alignment requirement, which is usually sizeof(type); violation of such assumptions can be detected later at compile time so they are safe).
  • Field-level zero-cost serialization: direct memcpy() applied to a given field. E.g., if we have a uint64 field in a data structure, the byte order of the platform is little-endian, and the field is always aligned at the byte boundary, we can directly memcpy the field into the final buffer, again avoiding bit-level copying.
  • Last resort: bit-level serialization. This one is chosen when the code generator cannot determine alignment statically.

The proof of alignment is obtained by PyDSDL by manipulating bit length sets of serialized representations of data types. A code generator can request PyDSDL to determine if there is such serialized representation of a composite or array type which would NOT meet a specified alignment goal (say, a byte (8 bit), or 64-bit if we're serializing a field of uint64), and then use the answer to choose the appropriate (fastest safe) serialization strategy at code generation time (there are some known performance issues: #23).

This new logic is exposed to the user via the following new API entries:

Early selection of the serialization strategy has an important implication on the serialization of nested data structures. A data structure can be nested into another one at an arbitrary alignment, which would defeat the purpose of static layout analysis since the code generator wouldn't be able to make any assumptions about the base offset. Additionally, the misaligned origin of a data structure does not necessarily imply that every following field of it will be misaligned as well. Consider the following example:

@union
uint8 a
uint8 b

Due to the one-bit union tag preceding the actual value, both a and b are misaligned. Now, imagine that the above union is nested into another structure as follows:

void7
U.1.0 the_union

The union is padded so that the one-bit tag brings the following data fields into alignment. The example demonstrates that in order to take full advantage of the layout analysis, a code generator must model nested object hierarchies holistically rather than atomically. The per-type generated serialization functions/methods may have some basic alignment requirement chosen by the author of the code generator (for example, they may require the serialization buffer to be always byte-aligned, or require that its alignment matches the largest alignment requirement of any nested type); the serialization code of an outer (containing) type would then determine statically whether the alignment requirement of a serialization function is met. If the alignment requirement is not met, the code generator would emit serialization code in-place, as if the definition of the included type were copy-pasted into the outer type, instead of invoking its serialization function.

You can see field iterators in action in this carefully crafted unit test: https://github.com/UAVCAN/pydsdl/blob/f998ad6f744b853d9b97e240ab0302df27ddd598/pydsdl/_serializable.py#L1284-L1383

...also in this Jinja2 code generation template for PyUAVCAN:

{%- macro _serialize_variable_length_array(t, ref, offset) -%}
    # Length field byte-aligned: {{ offset.is_aligned_at_byte() }}; {# -#}
     first element byte-aligned: {{ (offset + t.length_field_type.bit_length).is_aligned_at_byte() }}; {# -#}
      all elements byte-aligned: {{ (offset + t.bit_length_set).is_aligned_at_byte() }}.
    assert len({{ ref }}) <= {{ t.capacity }}, '{{ ref }}: {{ t }}'
    {{ _serialize_integer(t.length_field_type, 'len(%s)'|format(ref), offset) }}
{%- if t.element_type is BooleanType %}
    _ser_.add_{{ (offset + t.length_field_type.bit_length)|alignment_prefix }}_array_of_bits({{ ref }})
{%- elif t.element_type is PrimitiveType and t.element_type.standard_bit_length %}
    _ser_.add_{{ (offset + t.length_field_type.bit_length)|alignment_prefix -}}
          _array_of_standard_bit_length_primitives({{ ref }})
{%- else %}
    for _element_ in {{ ref }}:
        {{ _serialize_any(t.element_type, '_element_', offset + t.bit_length_set)|indent }}
{%- endif %}
{%- endmacro -%}

Where alignment_prefix is defined as: 'aligned' if offset.is_aligned_at_byte() else 'unaligned'.

In the following example (sourced from PyUAVCAN as well) look at the case of CompositeType, where we select whether the current item matches the alignment requirement of the serialization method of t (which is _serialize_aligned_(..)). If there is a match, we invoke the method; otherwise, we emit serialization code in-place.

{%- macro _serialize_any(t, ref, offset) -%}
    {%- if offset.is_aligned_at_byte() -%}
    assert _ser_.current_bit_length % 8 == 0, '{{ ref }}: {{ t }}'
    {% endif -%}
    {%- if t is VoidType -%}                    _ser_.skip_bits({{ t.bit_length }})
    {%- elif t is BooleanType -%}               _ser_.add_unaligned_bit({{ ref }})
    {%- elif t is IntegerType -%}               {{ _serialize_integer(t, ref, offset) }}
    {%- elif t is FloatType -%}                 {{ _serialize_float(t, ref, offset) }}
    {#- Despite the apparent similarities, fixed and variable arrays are very different when it comes to serialization,
     #- mostly because of the different logic of offset computation. -#}
    {%- elif t is FixedLengthArrayType -%}      {{ _serialize_fixed_length_array(t, ref, offset) }}
    {%- elif t is VariableLengthArrayType -%}   {{ _serialize_variable_length_array(t, ref, offset) }}
    {%- elif t is CompositeType -%}
        {%- if offset.is_aligned_at_byte() -%}
    {{ ref }}._serialize_aligned_(_ser_)  # Delegating because this object is always byte-aligned.
        {%- else -%}
    # Object {{ ref }} is not always byte-aligned, serializing in-place.
    {{ _serialize_composite(t, ref, offset) }}
        {%- endif -%}
    {%- else -%}{%- assert False -%}
    {%- endif -%}
{%- endmacro -%}

One can see more examples in my PyUAVCAN repo (which is still a WIP of course): https://github.com/pavel-kirienko/pyuavcan/blob/uavcan-v1.0/pyuavcan/dsdl/_templates/serialization.j2, also https://github.com/pavel-kirienko/pyuavcan/blob/6f9234ab918beefce56ef3f58920773df29079e5/pyuavcan/dsdl/_serialized_representation/_serializer.py#L56-L187

I expect that this feature will allow us to greatly simplify implementations. Particularly, libuavcan may no longer need the rather complex primitive marshaling templates, since generated serialization methods can operate on raw byte pointers now.

… seem to be shaping up the way I want them to.
…arate attributes compute_bit_length_set() and bit_length_range
…for implicit fields (variable-length arrays and unions)
@coveralls
Copy link

Pull Request Test Coverage Report for Build 213

  • 276 of 276 (100.0%) changed or added relevant lines in 8 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage remained the same at 100.0%

Totals Coverage Status
Change from base Build 194: 0.0%
Covered Lines: 2652
Relevant Lines: 2652

💛 - Coveralls

1 similar comment
@coveralls
Copy link

Pull Request Test Coverage Report for Build 213

  • 276 of 276 (100.0%) changed or added relevant lines in 8 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage remained the same at 100.0%

Totals Coverage Status
Change from base Build 194: 0.0%
Covered Lines: 2652
Relevant Lines: 2652

💛 - Coveralls

@coveralls
Copy link

coveralls commented Apr 24, 2019

Pull Request Test Coverage Report for Build 223

  • 287 of 287 (100.0%) changed or added relevant lines in 8 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage remained the same at 100.0%

Totals Coverage Status
Change from base Build 194: 0.0%
Covered Lines: 2657
Relevant Lines: 2657

💛 - Coveralls

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants