serialize: make new serialization traits object-safe #871

piodul · 2023-12-08T06:26:35Z

As @Lorak-mmk stated in #858 (comment), the new serialization traits are incompatible with dynamic dispatch - dyn SerializeCql/dyn SerializeRow just isn't usable, while this was supported for Value/ValueList.

This is due to the fact that Rust disallows using traits via dyn unless they meet some conditions - and our new traits don't meet them:

Both traits have two methods: preliminary_type_check and serialize. Having a separate type checking phase helps reduce the amount of actual type checking that needs to be performed when serializing (e.g. in case of collections). However, preliminary_type_check is an associated function instead of a method by design (i.e. it doesn't take a self parameter), and that prevents from making it object-safe.
The serialize method takes a generic "writer" parameter that it can use to serialize the object. The idea behind the generic writer was to have two implementations: one that doesn't write anything but only counts bytes, and the other that actually appends bytes into a byte buffer. This would allow us to use serialize for both things: calculating the needed buffer size and also to actually serialize. However, as Rust uses monomorphisation to implement generics, they cannot be directly used with dynamic dispatch.

While it would be possible to keep the current definitions intact and work around the problem by introducing "erased" versions of the traits (credit to erased_serde for inspiration), it increases complexity and could make serialization of type-erased types inefficient (due to the writers needing to be type-erased, too). Instead, let's simplify it:

In current design, it is perfectly fine to do type checking in serialize, so move all type checking there from the existing preliminary_type_check methods and then remove them. While it introduces more type checks than necessary, it is not clear whether they have a measurable performance impact and it will be possible to extend the trait to reintroduce the optimization in a backwards-compatible way - for now, let's keep it simple.
Remove the byte counting writers and only leave the "proper" ones, substituting the current writer traits with the latter ones. Later, we can consider introducing a size_hint method for the traits; it would be optional, but would have to be manually implemented (which is not a huge amount of work).

Pre-review checklist

I have split my patch into logically separate commits.
All commit messages clearly explain what they change and why.
I added relevant tests for new features and bug fixes.
All commits compile, pass static checks and pass test.
PR description sums up the changes and reasons why they should be introduced.
I have provided docstrings for the public items that I want to introduce.
~~I have adjusted the documentation in ./docs/source/.~~
~~I added appropriate Fixes: annotations to PR description.~~

They are not going to be used after in the adjusted design, so remove them.

The tests in the `writers` module used a helper function to run the same code with both counting writers and buffer-backed writers, which required implementing a trait (Rust doesn't have generic closures). This complexity is no longer necessary.

The writer traits are using a type-level trick to force implementors of `SerializeCql::serialize` to call `CellWriter::finish()`: the latter returns `CellWriter::WrittenCellProof` which the former is required to return. The fact that the proof is an associated type of the writer and the `serialize` method is generic over the writer type makes sure that the proof truly comes from the `finish()` call on the writer provided to the method and not from any other writer. We are going to change `CellWriter` into a struct, so this trick will no longer be applicable. Instead, introduce a `WrittenCellProof` struct, which is a zero-sized type generic over an invariant lifetime. The `CellWriter` struct will consume itself and return a `WrittenCellProof` with the same lifetime parameter as the original cell writer; `serialize` will require the implementor to return it. The lifetime invariance makes sure that one proof cannot be assigned to another if their lifetimes are not exactly the same.

Adjust the interface to use the buffer backed row writer. Rename BufBackedRowWriter to RowWriter to make our life easier later when we replace the writer trait with the buffer-backed row writer struct.

Adjust the interface to use the buffer backed cell writer. Rename BufBackedCellWriter to CellWriter to make our life easier later when we replace the writer trait with the buffer-backed cell writer struct.

Remove RowWriter/CellWriter/CellValueBuilder traits and rename their buffer-backed implementations to take their place as structs. Fix the existing direct uses of buf-backed structs to refer to them by their new name.

Put them in the same `impl` block as other methods of the same structs for improved visual clarity.

We are going to remove `preliminary_type_check` methods by moving the type checking logic to `serialize`. Add default `preliminary_type_check` impls in the trait definitions to make it possible to gradually remove their impls.

Moves the type checking logic of types that used `impl_exact_preliminary_type_check` into their `serialize` method.

As we get rid of `preliminary_type_check`, error variants that were used to represent type check failure of some sub-type (e.g. a tuple element) will not be used anymore. Remove them now and adjust the rest of the code.

Type checks for compound types only check that the type is of the right shape (e.g. it's a UDT, a list, etc.). This is already checked in `serialize`, so remove the `preliminary_type_check` impls.

Remaining `SerializeCql::preliminary_type_check` impls can be removed because they either not do anything or call `preliminary_type_check` on other types.

In case when serialization of one of the values fails, a row represented by a map would return a type check error. A serialization error should be returned instead.

There's not many of them, so remove them all in one go. The ColumnTypeCheckFailed serialization error variant is also removed as it is no longer used.

As we removed all the non-default impls of `preliminary_type_check`, the method now does nothing. Remove it and all the remaining callers.

The bound on `impl<T: SerializeRow> SerializeRow for &T` implicitly requires the type `T` to be sized, preventing it from being used with `dyn SerializeRow`. Relax the restriction by adding `+ ?Sized` to be able to use it with trait objects.

As a final confirmation of the work in the PR and to prevent regressions, add tests which explicitly use `dyn SerializeCql` and `dyn SerializeRow`.

piodul requested a review from Lorak-mmk December 8, 2023 06:26

piodul force-pushed the serialize-trait-genericide branch from e53348b to b945435 Compare December 8, 2023 06:36

piodul added 19 commits December 8, 2023 12:19

serialize/writers: get rid of counting writers

9119af4

They are not going to be used after in the adjusted design, so remove them.

serialize/writers: simplify tests

ff9e363

The tests in the `writers` module used a helper function to run the same code with both counting writers and buffer-backed writers, which required implementing a trait (Rust doesn't have generic closures). This complexity is no longer necessary.

serialize/row: de-genericise SerializeRow

e696663

Adjust the interface to use the buffer backed row writer. Rename BufBackedRowWriter to RowWriter to make our life easier later when we replace the writer trait with the buffer-backed row writer struct.

serialize/value: de-genericise SerializeCql

f374ba7

Adjust the interface to use the buffer backed cell writer. Rename BufBackedCellWriter to CellWriter to make our life easier later when we replace the writer trait with the buffer-backed cell writer struct.

serialize/writers: replace writer traits with buffer-backed impls

d37b2c4

Remove RowWriter/CellWriter/CellValueBuilder traits and rename their buffer-backed implementations to take their place as structs. Fix the existing direct uses of buf-backed structs to refer to them by their new name.

serialze/writers: move new/value_count methods higher

458c4de

Put them in the same `impl` block as other methods of the same structs for improved visual clarity.

serialize: provide default preliminary typecheck impls

9dc39fd

We are going to remove `preliminary_type_check` methods by moving the type checking logic to `serialize`. Add default `preliminary_type_check` impls in the trait definitions to make it possible to gradually remove their impls.

serialize/value: move exact typechecks to serialize

6d7cc98

Moves the type checking logic of types that used `impl_exact_preliminary_type_check` into their `serialize` method.

serialize/value: move CqlValue's typecheck to serialize

cc0a1a0

serialize/value: remove nested typecheck errors

69cb8f4

As we get rid of `preliminary_type_check`, error variants that were used to represent type check failure of some sub-type (e.g. a tuple element) will not be used anymore. Remove them now and adjust the rest of the code.

serialize/value: remove typechecks of compound types

53b7349

Type checks for compound types only check that the type is of the right shape (e.g. it's a UDT, a list, etc.). This is already checked in `serialize`, so remove the `preliminary_type_check` impls.

serialize/value: remove remaining preliminary type check impls

e8a92ad

Remaining `SerializeCql::preliminary_type_check` impls can be removed because they either not do anything or call `preliminary_type_check` on other types.

serialize/row: fix incorrect error being returned

6e074d5

In case when serialization of one of the values fails, a row represented by a map would return a type check error. A serialization error should be returned instead.

serialize/row: remove preliminary type check impls

5eddc08

There's not many of them, so remove them all in one go. The ColumnTypeCheckFailed serialization error variant is also removed as it is no longer used.

serialize: remove preliminary_type_check method

006dc35

As we removed all the non-default impls of `preliminary_type_check`, the method now does nothing. Remove it and all the remaining callers.

serialize/value: inline check_and_serialize into callers

ef6e428

serialize: tests for dyn SerializeCql/dyn SerializeRow

5733a2f

As a final confirmation of the work in the PR and to prevent regressions, add tests which explicitly use `dyn SerializeCql` and `dyn SerializeRow`.

piodul force-pushed the serialize-trait-genericide branch from b945435 to 5733a2f Compare December 8, 2023 11:20

Lorak-mmk approved these changes Dec 9, 2023

View reviewed changes

piodul merged commit 6c9ec8f into scylladb:main Dec 9, 2023
8 checks passed

Lorak-mmk mentioned this pull request Dec 21, 2023

Serialization refactor: add new serialization traits #801

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

serialize: make new serialization traits object-safe #871

serialize: make new serialization traits object-safe #871

piodul commented Dec 8, 2023

serialize: make new serialization traits object-safe #871

serialize: make new serialization traits object-safe #871

Conversation

piodul commented Dec 8, 2023

Pre-review checklist