Switch Session to new serialization traits #858

Lorak-mmk · 2023-11-02T16:40:53Z

This PR is based on #855 and is the main part of serialization refactor - switching public API to new trait.
The last commit makes the actual change. Previous ones are different changes that were necessary (or helpful) to make the switch.

There are still some things to do:

Comments on new traits / structs
Documentation
Rename SerializeCQL / SerializeRow / NewSerializedValues to some sensible names
(Maybe) Check who calls query_filter_keyspace_name and use prepared statements in those places to avoid unnecessary prepares.

But the code changes are pretty much ready for review.

Pre-review checklist

I have split my patch into logically separate commits.
All commit messages clearly explain what they change and why.
I added relevant tests for new features and bug fixes.
All commits compile, pass static checks and pass test.
PR description sums up the changes and reasons why they should be introduced.
I have provided docstrings for the public items that I want to introduce.
I have adjusted the documentation in ./docs/source/.
I added appropriate Fixes: annotations to PR description.

piodul

The last commit is quite large and hard to review. I think the changes there should be done in a more gradual fashion, utilizing the into_old_serialized_values method more.

scylla-cql/src/errors.rs

scylla-proxy/src/proxy.rs

scylla-cql/src/types/serialize/row.rs

scylla/src/transport/topology.rs

scylla/benches/benchmark.rs

piodul · 2023-11-16T09:37:32Z

Also, the improvements in the proxy related to the detection of control connection could be sent in a separate PR in order to merge them more quickly.

Lorak-mmk · 2023-11-20T15:24:18Z

Also, the improvements in the proxy related to the detection of control connection could be sent in a separate PR in order to merge them more quickly.

Opened #863

Lorak-mmk · 2023-11-20T15:24:30Z

Ooops, misslick

scylla-cql/src/types/serialize/row.rs

scylla-cql/src/types/serialize/mod.rs

piodul · 2023-11-23T14:22:48Z

scylla/src/transport/connection.rs

@@ -4,6 +4,7 @@ use scylla_cql::errors::TranslationError;
 use scylla_cql::frame::request::options::Options;
 use scylla_cql::frame::response::Error;
 use scylla_cql::frame::types::SerialConsistency;
+use scylla_cql::frame::value::SerializedValues;


connection.rs: Don't allow queries with values

After serialization refactor it will be impossible to perform unprepared
query with values, because serializing values will require knowing
column types.

In order to make refactor easier and better split responsibility, this
commit removes values arguments from Connection methods, so that
it is callers responsibility to prepare the query if necessary.

I'm not sure if I see why it makes the remaining refactor easier. It looks like you are simplifying the API of Connection but this is just pushing this concern to the caller. This requires modifying lots of places in the code, including tests and query_filter_keyspace_name which manually checks whether values is empty and dispatches to query or execute anyway.

I wonder whether it wouldn't be just simpler to drop this commit and then rework the last one, but I may be missing something.

My thinking is that we don't want to send values in unprepared queries, and removing such capability from Connection is a way to guarantee that. If the only users of this functionality are tests and query_filter_keyspace_name, then I think it's not worth it to have this functionality.

I'll look into this matter again - I remember that some changes were less pleasant than I hoped, maybe reverting those changes in connections would help with that.

My thinking is that we don't want to send values in unprepared queries, and removing such capability from Connection is a way to guarantee that.

What do you mean exactly by "send values in unprepared queries"? We are no longer doing that on the protocol level after #806 and #812 . If you mean to disallow calling .query() with values, then I'm not sure if it makes sense to disallow it for Connection but not for Session (and AFAIK we didn't decide to do it for Session at this point).

Removing this commit opens us up to the problems like following:
topology::query_filter_keyspace_name calls Connection::query_iter, potentially with values.
Connection::query_iter will call RowIterator::new_for_connection_query_iter, which will,
for each page, call Connection::query_with_consistency - which, if there are values,
prepares the query underneath to serialize them.
Such problems seem pretty hard to notice normally but removing values from Connection pretty much prevents them.

Currently, I tried to revert removal of values from Connection and from RowIterator::new_for_query, as it forced me to prepare a query globally in Session::query_iter if there are values - now it is again prepared for each sub query.
Doing this created another problem I'm not yet sure how to solve: - if RowIterator::new_for_query accepts values, how do we get a size of it for request span? In either variant of Connection API, new_for_query won't serialize the values, Connection will do it. There is no place where RowIterator sees serialized values, so it can't record it's size for span.

For a user of Connection struct (e.g. struct Session), API without values args covers all the use cases that the previous one did - perhaps with a bit more code sometimes, as the caller needs to prepare the query explicitly if they want to send values.

One advantage I see of removing values from Connection is that it makes the API more explicit - additional overhead of preparing queries is not hidden underneath Connection::query* methods, so we are more aware of it while writing other code.

Removing this commit opens us up to the problems like following: topology::query_filter_keyspace_name calls Connection::query_iter, potentially with values. Connection::query_iter will call RowIterator::new_for_connection_query_iter, which will, for each page, call Connection::query_with_consistency - which, if there are values, prepares the query underneath to serialize them. Such problems seem pretty hard to notice normally but removing values from Connection pretty much prevents them.

This issue is easy to fix - prepare the query if needed in Connection::query_iter and then change RowIterator::new_connection_query_iter to be RowIterator::new_connection_execute_iter.

Currently, I tried to revert removal of values from Connection and from RowIterator::new_for_query, as it forced me to prepare a query globally in Session::query_iter if there are values - now it is again prepared for each sub query.

This sounds like a harder problem because, unlike Connection::query_iter, the iterator might switch to another connection when performing a query fails. Perhaps we could be smarter here and prepare the query when RowIteratorWorker::work chooses a connection, but doing that sounds like too much work to be worth it, so your current approach looks OK for this one.

Doing this created another problem I'm not yet sure how to solve: - if RowIterator::new_for_query accepts values, how do we get a size of it for request span? In either variant of Connection API, new_for_query won't serialize the values, Connection will do it. There is no place where RowIterator sees serialized values, so it can't record it's size for span.

This problem should go away if you apply my first suggestion.

For a user of Connection struct (e.g. struct Session), API without values args covers all the use cases that the previous one did - perhaps with a bit more code sometimes, as the caller needs to prepare the query explicitly if they want to send values.

One advantage I see of removing values from Connection is that it makes the API more explicit - additional overhead of preparing queries is not hidden underneath Connection::query* methods, so we are more aware of it while writing other code.

You can look at it from another angle - having to do manual prepare in metadata fetch code sounds like polluting it with unnecessary details which should be handled on a lower level.

The examples with query_iter convince me that it might be a good idea after all to automatically prepare on the highest layer instead on the lowest, because otherwise unnecessary preparing may occur. Though I'd prefer that query_iter API did prepare automatically like I described (I see it as a user API in some sense, where the control connection is the user), there is an issue about changing internal CQL statements to use prepared statements (#417) so I suppose choosing to use Connection::{query_iter,execute_iter} explicitly is acceptable for now.

To sum up - let's not pursue my original suggestion from my first comment in this thread, I don't think it's worth it anymore.

scylla-cql/src/types/serialize/row.rs

scylla/src/statement/prepared_statement.rs

Lorak-mmk · 2023-11-30T19:22:55Z

Rebased on current ~~adjust_interface~~ main branch
Addressed most of review comments (I marked those as resolved)

piodul · 2023-12-01T05:21:04Z

scylla-cql: make SerializationError a newtype instead of alias

Please send this commit as a separate PR. It will introduce lots of conflicts with my other PRs, so let's merge this first so that I can rebase on top of it.

This commit also adjusts `test_unusual_valuelists` in session_test.rs, because it used `&dyn Value` type explicitly.

We use both APIs so that the code works before and after switching `Session::query` from accepting `ValueList` to `SerializeRow`. We can remove usage of old APIs after the change to `Session::query`.

… APIs

This is in preparation to switch `Connection::execute_iter` to new serialization APIs.

…ation API

…ew` methods to new serialization API This commit switches all those methods at once, because switching them one by one is too problematic - lifetime issue of PrimaryKey and temporary LegacySerializedValues.

…ization API

There are fallback implementations of SerializeRow for LegacySerializedValues provided to help users transition to new traits. This commit updates most of the code that uses those implementations, so that removing them in the future is easier. Uses that are not removed: - Session::batch: will be removed when batches are switched to new API - value_tests::map_value_list: will be removed while removing LegacySerializedValues - serialize/row.rs tests: will be removed while removing LegacySerializedValues

During the switch to new serialization API `#[allow(dead_code)]` was added in several places as a temporary measure. Those can now be removed.

Lorak-mmk · 2023-12-12T15:09:25Z

Addressed review comments, fixed a book build errors (in a minimal way - I only introduced changes that were strictly necessary, I don't want to expand scope of this PR)

Lorak-mmk marked this pull request as draft November 2, 2023 16:40

Lorak-mmk force-pushed the switch-serialization-traits branch 13 times, most recently from 5c2657e to ef0cb49 Compare November 8, 2023 17:12

Lorak-mmk force-pushed the switch-serialization-traits branch from ef0cb49 to 6f14f8c Compare November 12, 2023 17:19

Lorak-mmk requested a review from piodul November 12, 2023 17:37

Lorak-mmk marked this pull request as ready for review November 12, 2023 17:37

piodul reviewed Nov 16, 2023

View reviewed changes

Lorak-mmk closed this Nov 20, 2023

Lorak-mmk reopened this Nov 20, 2023

Lorak-mmk force-pushed the switch-serialization-traits branch from 6f14f8c to 0adab94 Compare November 23, 2023 02:11

Lorak-mmk requested a review from piodul November 23, 2023 12:04

Lorak-mmk changed the title ~~WIP: Switch serialization traits~~ Switch Session to new serialization traits Nov 23, 2023

piodul reviewed Nov 23, 2023

View reviewed changes

Lorak-mmk force-pushed the switch-serialization-traits branch from 0adab94 to b3a9d69 Compare November 30, 2023 19:22

Lorak-mmk added 24 commits December 12, 2023 15:57

Session: switch execute_iter to new serialization API

8675646

Session: switch execute to new serialization API

5ebb2d2

This commit also adjusts `test_unusual_valuelists` in session_test.rs, because it used `&dyn Value` type explicitly.

examples: Use both new and old serialization APIs

9bf7b77

We use both APIs so that the code works before and after switching `Session::query` from accepting `ValueList` to `SerializeRow`. We can remove usage of old APIs after the change to `Session::query`.

cql_collections_test: Use both new and old serialization APIs

6cc643a

We use both APIs so that the code works before and after switching `Session::query` from accepting `ValueList` to `SerializeRow`. We can remove usage of old APIs after the change to `Session::query`.

cql_types_test: Use both new and old serialization APIs

ea93d39

We use both APIs so that the code works before and after switching `Session::query` from accepting `ValueList` to `SerializeRow`. We can remove usage of old APIs after the change to `Session::query`.

session_test: Use both new and old serialization APIs

ac88e0a

We use both APIs so that the code works before and after switching `Session::query` from accepting `ValueList` to `SerializeRow`. We can remove usage of old APIs after the change to `Session::query`.

Session: switch query method to new serialization API

b3261cd

Book: Derive new serialization traits in udt.md and values.md

25a6d36

examples: Remove usage of old serialization APIs

c353f17

cql_collections_test: Remove usage of old serialization APIs

8a33467

cql_types_test: Remove usage of old serialization APIs

72a81ce

session_test: Remove usage of old serialization macros / traits

c7e766e

CachingSession: Use new serialization APIs

4a09baa

RowIterator: switch new_for_prepared_statement to new serialization…

3d32c1c

… APIs

topology: type-safe serialization in query_filter_keyspace_name

d3b884b

This is in preparation to switch `Connection::execute_iter` to new serialization APIs.

Connection: switch execute_iter to new serialization API

d5b8636

RowIterator: switch new_for_connection_execute_iter to new serializ…

a4a27de

…ation API

Connection: switch execute method to new serialization API

7ae1abb

Connection: switch execute_with_consistency to new serialization API

5960e3b

PreparedStatement: switch extract_partition_key* and `PrimaryKey::n…

991ea48

…ew` methods to new serialization API This commit switches all those methods at once, because switching them one by one is too problematic - lifetime issue of PrimaryKey and temporary LegacySerializedValues.

partitioner: switch calculate_token_for_partition_key to new serial…

4d5c33a

…ization API

QueryParameters: switch values to new SerializedValues

ee61cb0

Remove unnecessary #[allow(dead_code)]

64c6ac6

During the switch to new serialization API `#[allow(dead_code)]` was added in several places as a temporary measure. Those can now be removed.

Lorak-mmk force-pushed the switch-serialization-traits branch from bd5a6c1 to 64c6ac6 Compare December 12, 2023 15:08

piodul approved these changes Dec 12, 2023

View reviewed changes

piodul merged commit 969d37e into scylladb:main Dec 12, 2023
9 checks passed

Lorak-mmk mentioned this pull request Dec 13, 2023

Serialization refactor: adjust the codebase to use SerializeRow and SerializeCql in the public API #822

Closed

Lorak-mmk mentioned this pull request Dec 21, 2023

Serialization refactor: add new serialization traits #801

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Switch Session to new serialization traits #858

Switch Session to new serialization traits #858

Lorak-mmk commented Nov 2, 2023 •

edited

Loading

piodul left a comment

piodul commented Nov 16, 2023

Lorak-mmk commented Nov 20, 2023

Lorak-mmk commented Nov 20, 2023

piodul Nov 23, 2023

Lorak-mmk Nov 30, 2023

piodul Dec 1, 2023

Lorak-mmk Dec 4, 2023 •

edited

Loading

piodul Dec 5, 2023 •

edited

Loading

Lorak-mmk commented Nov 30, 2023 •

edited

Loading

piodul commented Dec 1, 2023

Lorak-mmk commented Dec 12, 2023

Switch Session to new serialization traits #858

Switch Session to new serialization traits #858

Conversation

Lorak-mmk commented Nov 2, 2023 • edited Loading

Pre-review checklist

piodul left a comment

Choose a reason for hiding this comment

piodul commented Nov 16, 2023

Lorak-mmk commented Nov 20, 2023

Lorak-mmk commented Nov 20, 2023

piodul Nov 23, 2023

Choose a reason for hiding this comment

Lorak-mmk Nov 30, 2023

Choose a reason for hiding this comment

piodul Dec 1, 2023

Choose a reason for hiding this comment

Lorak-mmk Dec 4, 2023 • edited Loading

Choose a reason for hiding this comment

piodul Dec 5, 2023 • edited Loading

Choose a reason for hiding this comment

Lorak-mmk commented Nov 30, 2023 • edited Loading

piodul commented Dec 1, 2023

Lorak-mmk commented Dec 12, 2023

Lorak-mmk commented Nov 2, 2023 •

edited

Loading

Lorak-mmk Dec 4, 2023 •

edited

Loading

piodul Dec 5, 2023 •

edited

Loading

Lorak-mmk commented Nov 30, 2023 •

edited

Loading