Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Switch Session to new serialization traits #858

Merged
merged 39 commits into from
Dec 12, 2023

Conversation

Lorak-mmk
Copy link
Collaborator

@Lorak-mmk Lorak-mmk commented Nov 2, 2023

This PR is based on #855 and is the main part of serialization refactor - switching public API to new trait.
The last commit makes the actual change. Previous ones are different changes that were necessary (or helpful) to make the switch.

There are still some things to do:

  • Comments on new traits / structs
  • Documentation
  • Rename SerializeCQL / SerializeRow / NewSerializedValues to some sensible names
  • (Maybe) Check who calls query_filter_keyspace_name and use prepared statements in those places to avoid unnecessary prepares.

But the code changes are pretty much ready for review.

Pre-review checklist

  • I have split my patch into logically separate commits.
  • All commit messages clearly explain what they change and why.
  • I added relevant tests for new features and bug fixes.
  • All commits compile, pass static checks and pass test.
  • PR description sums up the changes and reasons why they should be introduced.
  • I have provided docstrings for the public items that I want to introduce.
  • I have adjusted the documentation in ./docs/source/.
  • I added appropriate Fixes: annotations to PR description.

@Lorak-mmk Lorak-mmk marked this pull request as draft November 2, 2023 16:40
@Lorak-mmk Lorak-mmk force-pushed the switch-serialization-traits branch 13 times, most recently from 5c2657e to ef0cb49 Compare November 8, 2023 17:12
@Lorak-mmk Lorak-mmk force-pushed the switch-serialization-traits branch from ef0cb49 to 6f14f8c Compare November 12, 2023 17:19
@Lorak-mmk Lorak-mmk requested a review from piodul November 12, 2023 17:37
@Lorak-mmk Lorak-mmk marked this pull request as ready for review November 12, 2023 17:37
Copy link
Collaborator

@piodul piodul left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The last commit is quite large and hard to review. I think the changes there should be done in a more gradual fashion, utilizing the into_old_serialized_values method more.

scylla-cql/src/errors.rs Show resolved Hide resolved
scylla-proxy/src/proxy.rs Outdated Show resolved Hide resolved
scylla-cql/src/types/serialize/row.rs Outdated Show resolved Hide resolved
scylla-cql/src/types/serialize/row.rs Show resolved Hide resolved
scylla-cql/src/types/serialize/row.rs Outdated Show resolved Hide resolved
scylla/src/transport/topology.rs Outdated Show resolved Hide resolved
scylla/benches/benchmark.rs Outdated Show resolved Hide resolved
@piodul
Copy link
Collaborator

piodul commented Nov 16, 2023

Also, the improvements in the proxy related to the detection of control connection could be sent in a separate PR in order to merge them more quickly.

@Lorak-mmk
Copy link
Collaborator Author

Also, the improvements in the proxy related to the detection of control connection could be sent in a separate PR in order to merge them more quickly.

Opened #863

@Lorak-mmk Lorak-mmk closed this Nov 20, 2023
@Lorak-mmk
Copy link
Collaborator Author

Ooops, misslick

@Lorak-mmk Lorak-mmk reopened this Nov 20, 2023
@Lorak-mmk Lorak-mmk force-pushed the switch-serialization-traits branch from 6f14f8c to 0adab94 Compare November 23, 2023 02:11
@Lorak-mmk Lorak-mmk requested a review from piodul November 23, 2023 12:04
@Lorak-mmk Lorak-mmk changed the title WIP: Switch serialization traits Switch Session to new serialization traits Nov 23, 2023
scylla-cql/src/types/serialize/row.rs Show resolved Hide resolved
scylla-cql/src/types/serialize/mod.rs Outdated Show resolved Hide resolved
@@ -4,6 +4,7 @@ use scylla_cql::errors::TranslationError;
use scylla_cql::frame::request::options::Options;
use scylla_cql::frame::response::Error;
use scylla_cql::frame::types::SerialConsistency;
use scylla_cql::frame::value::SerializedValues;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

connection.rs: Don't allow queries with values

After serialization refactor it will be impossible to perform unprepared
query with values, because serializing values will require knowing
column types.

In order to make refactor easier and better split responsibility, this
commit removes values arguments from Connection methods, so that
it is callers responsibility to prepare the query if necessary.

I'm not sure if I see why it makes the remaining refactor easier. It looks like you are simplifying the API of Connection but this is just pushing this concern to the caller. This requires modifying lots of places in the code, including tests and query_filter_keyspace_name which manually checks whether values is empty and dispatches to query or execute anyway.

I wonder whether it wouldn't be just simpler to drop this commit and then rework the last one, but I may be missing something.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My thinking is that we don't want to send values in unprepared queries, and removing such capability from Connection is a way to guarantee that. If the only users of this functionality are tests and query_filter_keyspace_name, then I think it's not worth it to have this functionality.

I'll look into this matter again - I remember that some changes were less pleasant than I hoped, maybe reverting those changes in connections would help with that.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My thinking is that we don't want to send values in unprepared queries, and removing such capability from Connection is a way to guarantee that.

What do you mean exactly by "send values in unprepared queries"? We are no longer doing that on the protocol level after #806 and #812 . If you mean to disallow calling .query() with values, then I'm not sure if it makes sense to disallow it for Connection but not for Session (and AFAIK we didn't decide to do it for Session at this point).

Copy link
Collaborator Author

@Lorak-mmk Lorak-mmk Dec 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removing this commit opens us up to the problems like following:
topology::query_filter_keyspace_name calls Connection::query_iter, potentially with values.
Connection::query_iter will call RowIterator::new_for_connection_query_iter, which will,
for each page, call Connection::query_with_consistency - which, if there are values,
prepares the query underneath to serialize them.
Such problems seem pretty hard to notice normally but removing values from Connection pretty much prevents them.

Currently, I tried to revert removal of values from Connection and from RowIterator::new_for_query, as it forced me to prepare a query globally in Session::query_iter if there are values - now it is again prepared for each sub query.
Doing this created another problem I'm not yet sure how to solve: - if RowIterator::new_for_query accepts values, how do we get a size of it for request span? In either variant of Connection API, new_for_query won't serialize the values, Connection will do it. There is no place where RowIterator sees serialized values, so it can't record it's size for span.

For a user of Connection struct (e.g. struct Session), API without values args covers all the use cases that the previous one did - perhaps with a bit more code sometimes, as the caller needs to prepare the query explicitly if they want to send values.

One advantage I see of removing values from Connection is that it makes the API more explicit - additional overhead of preparing queries is not hidden underneath Connection::query* methods, so we are more aware of it while writing other code.

Copy link
Collaborator

@piodul piodul Dec 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removing this commit opens us up to the problems like following: topology::query_filter_keyspace_name calls Connection::query_iter, potentially with values. Connection::query_iter will call RowIterator::new_for_connection_query_iter, which will, for each page, call Connection::query_with_consistency - which, if there are values, prepares the query underneath to serialize them. Such problems seem pretty hard to notice normally but removing values from Connection pretty much prevents them.

This issue is easy to fix - prepare the query if needed in Connection::query_iter and then change RowIterator::new_connection_query_iter to be RowIterator::new_connection_execute_iter.

Currently, I tried to revert removal of values from Connection and from RowIterator::new_for_query, as it forced me to prepare a query globally in Session::query_iter if there are values - now it is again prepared for each sub query.

This sounds like a harder problem because, unlike Connection::query_iter, the iterator might switch to another connection when performing a query fails. Perhaps we could be smarter here and prepare the query when RowIteratorWorker::work chooses a connection, but doing that sounds like too much work to be worth it, so your current approach looks OK for this one.

Doing this created another problem I'm not yet sure how to solve: - if RowIterator::new_for_query accepts values, how do we get a size of it for request span? In either variant of Connection API, new_for_query won't serialize the values, Connection will do it. There is no place where RowIterator sees serialized values, so it can't record it's size for span.

This problem should go away if you apply my first suggestion.

For a user of Connection struct (e.g. struct Session), API without values args covers all the use cases that the previous one did - perhaps with a bit more code sometimes, as the caller needs to prepare the query explicitly if they want to send values.

One advantage I see of removing values from Connection is that it makes the API more explicit - additional overhead of preparing queries is not hidden underneath Connection::query* methods, so we are more aware of it while writing other code.

You can look at it from another angle - having to do manual prepare in metadata fetch code sounds like polluting it with unnecessary details which should be handled on a lower level.


The examples with query_iter convince me that it might be a good idea after all to automatically prepare on the highest layer instead on the lowest, because otherwise unnecessary preparing may occur. Though I'd prefer that query_iter API did prepare automatically like I described (I see it as a user API in some sense, where the control connection is the user), there is an issue about changing internal CQL statements to use prepared statements (#417) so I suppose choosing to use Connection::{query_iter,execute_iter} explicitly is acceptable for now.

To sum up - let's not pursue my original suggestion from my first comment in this thread, I don't think it's worth it anymore.

scylla-cql/src/types/serialize/row.rs Outdated Show resolved Hide resolved
scylla-cql/src/types/serialize/row.rs Show resolved Hide resolved
scylla-cql/src/types/serialize/row.rs Outdated Show resolved Hide resolved
scylla/src/statement/prepared_statement.rs Show resolved Hide resolved
@Lorak-mmk Lorak-mmk force-pushed the switch-serialization-traits branch from 0adab94 to b3a9d69 Compare November 30, 2023 19:22
@Lorak-mmk
Copy link
Collaborator Author

Lorak-mmk commented Nov 30, 2023

  • Rebased on current adjust_interface main branch
  • Addressed most of review comments (I marked those as resolved)

@piodul
Copy link
Collaborator

piodul commented Dec 1, 2023

scylla-cql: make SerializationError a newtype instead of alias

Please send this commit as a separate PR. It will introduce lots of conflicts with my other PRs, so let's merge this first so that I can rebase on top of it.

This commit also adjusts `test_unusual_valuelists` in session_test.rs,
because it used `&dyn Value` type explicitly.
We use both APIs so that the code works before and after switching
`Session::query` from accepting `ValueList` to `SerializeRow`.
We can remove usage of old APIs after the change to `Session::query`.
We use both APIs so that the code works before and after switching
`Session::query` from accepting `ValueList` to `SerializeRow`.
We can remove usage of old APIs after the change to `Session::query`.
We use both APIs so that the code works before and after switching
`Session::query` from accepting `ValueList` to `SerializeRow`.
We can remove usage of old APIs after the change to `Session::query`.
We use both APIs so that the code works before and after switching
`Session::query` from accepting `ValueList` to `SerializeRow`.
We can remove usage of old APIs after the change to `Session::query`.
This is in preparation to switch `Connection::execute_iter` to new
serialization APIs.
…ew` methods to new serialization API

This commit switches all those methods at once, because switching them one by one
is too problematic - lifetime issue of PrimaryKey and temporary LegacySerializedValues.
There are fallback implementations of SerializeRow for LegacySerializedValues
provided to help users transition to new traits. This commit updates most of the
code that uses those implementations, so that removing them in the future is easier.

Uses that are not removed:
- Session::batch: will be removed when batches are switched to new API
- value_tests::map_value_list: will be removed while removing LegacySerializedValues
- serialize/row.rs tests: will be removed while removing LegacySerializedValues
During the switch to new serialization API `#[allow(dead_code)]` was
added in several places as a temporary measure. Those can now be removed.
@Lorak-mmk Lorak-mmk force-pushed the switch-serialization-traits branch from bd5a6c1 to 64c6ac6 Compare December 12, 2023 15:08
@Lorak-mmk
Copy link
Collaborator Author

Addressed review comments, fixed a book build errors (in a minimal way - I only introduced changes that were strictly necessary, I don't want to expand scope of this PR)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants