Random Tablet Selection causes "time inconsistent" data reads #11971

arthurschreiber · 2022-12-15T09:13:21Z

vtgate performs random tablet selection (with preference for local cell tablets) when deciding which tablet to use for serving a query. On larger MySQL setups, where there might be a high variance in replication lag between replica nodes, this behaviour can lead to
data inconsistencies that is hard to understand and work around.

Here's an example:

Let's suppose we have 2 tables, check_suites and check_runs.
A check_suites row has many associated check_runs rows.
A row in check_runs can only be created after creating the parent check_suites row.
Data is never deleted from check_suites.

Based on this description, an application that runs against a "regular" MySQL primary/replica setup, where connections are load balanced using e.g. haproxy can rely on the fact that it can read from check_runs, and every row will have a reference to a check_suites row that is guaranteed to exist.

The application will have a "time consistent" view of the data, where a read query will always see data at the same point of a previous query, or a later point in time.

When sending requests through vtgate, that expectation is broken if queries are distributed across more than one replica. 😔

The first query that is reading from check_runs might hit replica A, which has almost no replication lag. The second query, reading from check_suites, might hit replica B, which has over a second of replication lag. This can lead to the second query not being able to find any matching row in the check_suites table, which in turn can lead to errors or other weird behaviour in our application.

The application no longer has a "time consistent" view of the data, instead read queries see data at newer and older points in time.

The text was updated successfully, but these errors were encountered:

deepthi · 2022-12-15T17:03:30Z

Have you explored using transactional reads? We do support that.

arthurschreiber · 2022-12-15T17:53:44Z

I'm not sure this would solve the problem we're facing in a satisfactory way. I might be wrong on this, so here's what I think the problems with transaction reads would be.

First, it's unclear what parts of our codebase need to be wrapped in transactions, and with a codebase as large as ours, it'd be a herculean effort to figure that out.

We could wrap every web request that comes in with transaction, but that will lead to gravely reducing the multiplexing done on the vttablet. We'd hit the vttablet connection limits that we've set very quickly, as connections would be "locked" for the duration of that web request. This would be made even worse with scatter or other multi shard queries.

It's also hard for me to estimate what the change in semantics would mean for our application, and which isolation level would be the correct one to use.

Overall, it feels like a much heavier solution to this problem than what we would require. 🤔

arthurschreiber mentioned this issue Dec 15, 2022

[WIP] Consistent Hashing for tablet selection #11959

Closed

3 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Random Tablet Selection causes "time inconsistent" data reads #11971

Random Tablet Selection causes "time inconsistent" data reads #11971

arthurschreiber commented Dec 15, 2022 •

edited

Loading

deepthi commented Dec 15, 2022

arthurschreiber commented Dec 15, 2022 •

edited

Loading

Random Tablet Selection causes "time inconsistent" data reads #11971

Random Tablet Selection causes "time inconsistent" data reads #11971

Comments

arthurschreiber commented Dec 15, 2022 • edited Loading

deepthi commented Dec 15, 2022

arthurschreiber commented Dec 15, 2022 • edited Loading

arthurschreiber commented Dec 15, 2022 •

edited

Loading

arthurschreiber commented Dec 15, 2022 •

edited

Loading