Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

serialize: make new serialization traits object-safe #871

Merged
merged 19 commits into from
Dec 9, 2023

Conversation

piodul
Copy link
Collaborator

@piodul piodul commented Dec 8, 2023

As @Lorak-mmk stated in #858 (comment), the new serialization traits are incompatible with dynamic dispatch - dyn SerializeCql/dyn SerializeRow just isn't usable, while this was supported for Value/ValueList.

This is due to the fact that Rust disallows using traits via dyn unless they meet some conditions - and our new traits don't meet them:

  1. Both traits have two methods: preliminary_type_check and serialize. Having a separate type checking phase helps reduce the amount of actual type checking that needs to be performed when serializing (e.g. in case of collections). However, preliminary_type_check is an associated function instead of a method by design (i.e. it doesn't take a self parameter), and that prevents from making it object-safe.
  2. The serialize method takes a generic "writer" parameter that it can use to serialize the object. The idea behind the generic writer was to have two implementations: one that doesn't write anything but only counts bytes, and the other that actually appends bytes into a byte buffer. This would allow us to use serialize for both things: calculating the needed buffer size and also to actually serialize. However, as Rust uses monomorphisation to implement generics, they cannot be directly used with dynamic dispatch.

While it would be possible to keep the current definitions intact and work around the problem by introducing "erased" versions of the traits (credit to erased_serde for inspiration), it increases complexity and could make serialization of type-erased types inefficient (due to the writers needing to be type-erased, too). Instead, let's simplify it:

  1. In current design, it is perfectly fine to do type checking in serialize, so move all type checking there from the existing preliminary_type_check methods and then remove them. While it introduces more type checks than necessary, it is not clear whether they have a measurable performance impact and it will be possible to extend the trait to reintroduce the optimization in a backwards-compatible way - for now, let's keep it simple.
  2. Remove the byte counting writers and only leave the "proper" ones, substituting the current writer traits with the latter ones. Later, we can consider introducing a size_hint method for the traits; it would be optional, but would have to be manually implemented (which is not a huge amount of work).

Pre-review checklist

  • I have split my patch into logically separate commits.
  • All commit messages clearly explain what they change and why.
  • I added relevant tests for new features and bug fixes.
  • All commits compile, pass static checks and pass test.
  • PR description sums up the changes and reasons why they should be introduced.
  • I have provided docstrings for the public items that I want to introduce.
  • I have adjusted the documentation in ./docs/source/.
  • I added appropriate Fixes: annotations to PR description.

@piodul piodul requested a review from Lorak-mmk December 8, 2023 06:26
@piodul piodul force-pushed the serialize-trait-genericide branch from e53348b to b945435 Compare December 8, 2023 06:36
piodul added 19 commits December 8, 2023 12:19
They are not going to be used after in the adjusted design, so remove
them.
The tests in the `writers` module used a helper function to run the same
code with both counting writers and buffer-backed writers, which
required implementing a trait (Rust doesn't have generic closures). This
complexity is no longer necessary.
The writer traits are using a type-level trick to force implementors of
`SerializeCql::serialize` to call `CellWriter::finish()`: the latter
returns `CellWriter::WrittenCellProof` which the former is required to
return. The fact that the proof is an associated type of the writer and
the `serialize` method is generic over the writer type makes sure that
the proof truly comes from the `finish()` call on the writer provided to
the method and not from any other writer. We are going to change
`CellWriter` into a struct, so this trick will no longer be applicable.

Instead, introduce a `WrittenCellProof` struct, which is a zero-sized
type generic over an invariant lifetime. The `CellWriter` struct will
consume itself and return a `WrittenCellProof` with the same lifetime
parameter as the original cell writer; `serialize` will require the
implementor to return it. The lifetime invariance makes sure that one
proof cannot be assigned to another if their lifetimes are not exactly
the same.
Adjust the interface to use the buffer backed row writer. Rename
BufBackedRowWriter to RowWriter to make our life easier later when we
replace the writer trait with the buffer-backed row writer struct.
Adjust the interface to use the buffer backed cell writer. Rename
BufBackedCellWriter to CellWriter to make our life easier later when we
replace the writer trait with the buffer-backed cell writer struct.
Remove RowWriter/CellWriter/CellValueBuilder traits and rename their
buffer-backed implementations to take their place as structs.

Fix the existing direct uses of buf-backed structs to refer to them by
their new name.
Put them in the same `impl` block as other methods of the same structs
for improved visual clarity.
We are going to remove `preliminary_type_check` methods by moving the
type checking logic to `serialize`. Add default `preliminary_type_check`
impls in the trait definitions to make it possible to gradually remove
their impls.
Moves the type checking logic of types that used
`impl_exact_preliminary_type_check` into their `serialize` method.
As we get rid of `preliminary_type_check`, error variants that were used
to represent type check failure of some sub-type (e.g. a tuple element)
will not be used anymore. Remove them now and adjust the rest of the
code.
Type checks for compound types only check that the type is of the right
shape (e.g. it's a UDT, a list, etc.). This is already checked in
`serialize`, so remove the `preliminary_type_check` impls.
Remaining `SerializeCql::preliminary_type_check` impls can be removed
because they either not do anything or call `preliminary_type_check` on
other types.
In case when serialization of one of the values fails, a row represented
by a map would return a type check error. A serialization error should
be returned instead.
There's not many of them, so remove them all in one go. The
ColumnTypeCheckFailed serialization error variant is also removed as it
is no longer used.
As we removed all the non-default impls of `preliminary_type_check`, the
method now does nothing. Remove it and all the remaining callers.
The bound on `impl<T: SerializeRow> SerializeRow for &T` implicitly
requires the type `T` to be sized, preventing it from being used with
`dyn SerializeRow`. Relax the restriction by adding `+ ?Sized` to be
able to use it with trait objects.
As a final confirmation of the work in the PR and to prevent
regressions, add tests which explicitly use `dyn SerializeCql` and
`dyn SerializeRow`.
@piodul piodul force-pushed the serialize-trait-genericide branch from b945435 to 5733a2f Compare December 8, 2023 11:20
@piodul piodul merged commit 6c9ec8f into scylladb:main Dec 9, 2023
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants