Move non-critical metadata out of Session #595

Open · piodul opened this issue on Nov 15, 2022 · 2 comments

Labels: API-breaking, area/metadata, performance

piodul (Collaborator) commented on Nov 15, 2022:

(Proposal based on the discussion in #574, credit to @wyfo for the idea)

By default, the Session fetches full schema metadata, but only a small part of it is necessary for the session to work properly: currently, we only need keyspace replication info and per-table partitioners. If the number of items in the schema is large, the metadata may be large as well. Since not everybody needs this information, fetching it can waste cluster load, bandwidth and RAM.

Currently, we allow preventing non-essential metadata from being fetched via the fetch_schema_metadata configuration option, and restricting the keyspaces for which any metadata is fetched via keyspaces_to_fetch. However, those options are opt-out rather than opt-in, and it's hard to manage the lifetime of the metadata (somebody might want to fetch schema metadata once, consume it somehow, and then deallocate it; that's not possible right now).
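
For context, the current opt-out configuration looks roughly like this on the session builder. This is a minimal sketch assuming the builder methods named above (fetch_schema_metadata, keyspaces_to_fetch); exact signatures may differ between driver versions:

```rust
use scylla::{Session, SessionBuilder};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Opt out of fetching non-essential schema metadata and restrict
    // metadata fetching to a single keyspace (opt-out, not opt-in).
    let session: Session = SessionBuilder::new()
        .known_node("127.0.0.1:9042")
        .fetch_schema_metadata(false)
        .keyspaces_to_fetch(["my_keyspace"])
        .build()
        .await?;

    let _ = session; // ... use the session ...
    Ok(())
}
```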

To reduce the waste, we could move the logic that fetches full metadata out of the Session; the session would keep only the keyspace replication info and per-table partitioners. The ability to fetch full metadata could live in a separate module, with abstractions that give better control over what metadata is fetched, when it is fetched, and how long it lives.

Some things to consider before designing/implementing the solution:

  • The API of the CPP driver requires the metadata to be available at all times and to be updated when the session performs schema changes. Considering the cpp-rust-driver project, we need to make it possible to emulate the current semantics. We could provide a metadata manager object which manages metadata in a similar way to what the session currently does. The session would have to be able to push schema change events to the manager and synchronously wait for the manager to update the schema. A rough sketch of such a manager follows.
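
A purely hypothetical sketch of such a manager; none of these types exist in the driver, and all names (MetadataManager, FullSchemaMetadata, SchemaChangeEvent) are invented for illustration:

```rust
// Hypothetical sketch only; none of these types exist in the driver today.
use std::sync::Arc;
use tokio::sync::{mpsc, watch};

/// Invented placeholder for a full schema snapshot.
pub struct FullSchemaMetadata;

/// Invented placeholder for a schema change event pushed by the session.
pub struct SchemaChangeEvent;

pub struct MetadataManager {
    /// Latest fetched snapshot, observable by users.
    snapshot: watch::Receiver<Arc<FullSchemaMetadata>>,
    /// Channel on which the session pushes schema change events.
    events: mpsc::Sender<SchemaChangeEvent>,
}

impl MetadataManager {
    /// Called by the session after it performs a schema change: push the
    /// event, then wait until a background task publishes a refreshed
    /// snapshot, emulating the CPP driver's "always up to date" semantics.
    pub async fn on_schema_change(&mut self, event: SchemaChangeEvent) {
        let _ = self.events.send(event).await;
        let _ = self.snapshot.changed().await;
    }

    /// Users read the current snapshot and may drop it when done, which
    /// gives explicit control over the metadata's lifetime.
    pub fn current(&self) -> Arc<FullSchemaMetadata> {
        self.snapshot.borrow().clone()
    }
}
```
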
piodul added the performance label on Nov 15, 2022
piodul added the API-breaking label on Mar 28, 2023
piodul added this to the 1.0.0 milestone on Apr 5, 2023
roydahan modified the milestones: 1.0.0, 0.12.0 on Nov 12, 2023
Lorak-mmk self-assigned this on Nov 15, 2023
avelanarius modified the milestones: 0.12.0, 0.13.0 on Jan 15, 2024
roydahan (Collaborator) commented:

Is this still on track for milestone 0.13.0?

Lorak-mmk (Collaborator) commented:

I'm trying to think about how to approach this. Some constraints and observations, many of them already mentioned by you:

  • The driver needs some metadata for prepared statements to work: keyspaces with their replication strategies and tables with their partitioners. Let's call this obligatory metadata; the rest will be additional metadata (see the type sketch after this list).
  • Some users need access to metadata and some don't, so additional metadata shouldn't be fetched unconditionally.
  • A user may only need access to some parts of the metadata, so it should be possible to filter what we fetch. Imo, the higher the granularity of fetching, the better.
  • If a user has a lot of keyspaces with a lot of tables, then even the obligatory metadata may be large, yet the user may only plan to query a few tables; fetching all of the obligatory metadata would then be a waste. It would be nice to add some form of optional table/keyspace allowlist for obligatory metadata.
  • The user should have more control over the lifetime of additional metadata; it should not always stay allocated forever.
  • At the same time, it should be possible to preserve the current behavior of having metadata continuously updated.
  • We should avoid doing unnecessary queries, like fetching some parts of the metadata twice.
  • When metadata is continuously updated by the driver (as is currently the case), obligatory and additional metadata should be consistent. It would be weird and counterintuitive if additional metadata had some table but obligatory metadata hadn't fetched it yet (or vice versa).
  • Note: metadata will be inconsistent (or will fail to fetch) if we execute different fetching queries against different nodes. Right now everything is executed on the control connection.
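
To make the obligatory/additional split concrete, here is one hypothetical way the types could be carved up; all names are invented for illustration:

```rust
// Hypothetical illustration of the split; all names are invented.
use std::collections::HashMap;

/// What the session must always keep so that token-aware routing and
/// prepared statements keep working.
pub struct ObligatoryMetadata {
    /// Keyspace name -> replication strategy options.
    pub replication: HashMap<String, HashMap<String, String>>,
    /// (keyspace, table) -> partitioner name.
    pub partitioners: HashMap<(String, String), String>,
}

/// Everything else (columns, UDTs, views, ...), fetched on demand and
/// droppable by the user to reclaim memory.
pub struct AdditionalMetadata;
```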

To me it looks very hard, if even possible, to satisfy all of those constraints.
If we move additional metadata out of the session, it won't have access to the control connection, so it will be difficult to provide consistency, for two reasons:

  1. A different connection is used, and different nodes may return different schemas, IIUC.
  2. The obligatory schema will be fetched at different times (periodically and in response to events) than the additional schema.

Such an outside metadata fetcher would also need some additional APIs if it were to avoid refetching keyspaces and tables.

If someone has a nice idea on how to make this split, I'm listening. If not, maybe it's better to keep metadata fetching generally as-is, but provide (see the sketch after this list):

  • better, more granular filters for fetching,
  • the possibility to remove some or all additional metadata to reclaim memory.
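
A hedged sketch of what those two improvements could look like; the filter type and the method names (refresh_metadata_filtered, clear_additional_metadata) are hypothetical, and the Session below is a stand-in so the sketch is self-contained:

```rust
// Hypothetical API sketch; nothing here exists in the driver today.

/// Invented filter describing which parts of the schema to fetch.
pub struct MetadataFilter {
    /// None = fetch all keyspaces.
    pub keyspaces: Option<Vec<String>>,
    /// None = fetch all tables of the selected keyspaces.
    pub tables: Option<Vec<(String, String)>>,
    pub fetch_udts: bool,
    pub fetch_views: bool,
}

/// Stand-in for scylla::Session, so the sketch compiles on its own.
pub struct Session;

impl Session {
    /// Refresh only the parts of the schema selected by the filter,
    /// on the control connection, so results stay consistent.
    pub async fn refresh_metadata_filtered(&self, _filter: &MetadataFilter) {
        // ...
    }

    /// Drop additional metadata to reclaim memory; obligatory metadata
    /// (replication info, partitioners) is kept so routing keeps working.
    pub fn clear_additional_metadata(&self) {
        // ...
    }
}
```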

@wprzytula @piodul
