Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Random Tablet Selection causes "time inconsistent" data reads #11971

Open
arthurschreiber opened this issue Dec 15, 2022 · 2 comments
Open

Random Tablet Selection causes "time inconsistent" data reads #11971

arthurschreiber opened this issue Dec 15, 2022 · 2 comments

Comments

@arthurschreiber
Copy link
Contributor

arthurschreiber commented Dec 15, 2022

vtgate performs random tablet selection (with preference for local cell tablets) when deciding which tablet to use for serving a query. On larger MySQL setups, where there might be a high variance in replication lag between replica nodes, this behaviour can lead to
data inconsistencies that is hard to understand and work around.

Here's an example:

  • Let's suppose we have 2 tables, check_suites and check_runs.
  • A check_suites row has many associated check_runs rows.
  • A row in check_runs can only be created after creating the parent check_suites row.
  • Data is never deleted from check_suites.

Based on this description, an application that runs against a "regular" MySQL primary/replica setup, where connections are load balanced using e.g. haproxy can rely on the fact that it can read from check_runs, and every row will have a reference to a check_suites row that is guaranteed to exist.

The application will have a "time consistent" view of the data, where a read query will always see data at the same point of a previous query, or a later point in time.

When sending requests through vtgate, that expectation is broken if queries are distributed across more than one replica. 😔

The first query that is reading from check_runs might hit replica A, which has almost no replication lag. The second query, reading from check_suites, might hit replica B, which has over a second of replication lag. This can lead to the second query not being able to find any matching row in the check_suites table, which in turn can lead to errors or other weird behaviour in our application.

The application no longer has a "time consistent" view of the data, instead read queries see data at newer and older points in time.

@deepthi
Copy link
Member

deepthi commented Dec 15, 2022

Have you explored using transactional reads? We do support that.

@arthurschreiber
Copy link
Contributor Author

arthurschreiber commented Dec 15, 2022

I'm not sure this would solve the problem we're facing in a satisfactory way. I might be wrong on this, so here's what I think the problems with transaction reads would be.

First, it's unclear what parts of our codebase need to be wrapped in transactions, and with a codebase as large as ours, it'd be a herculean effort to figure that out.

We could wrap every web request that comes in with transaction, but that will lead to gravely reducing the multiplexing done on the vttablet. We'd hit the vttablet connection limits that we've set very quickly, as connections would be "locked" for the duration of that web request. This would be made even worse with scatter or other multi shard queries.

It's also hard for me to estimate what the change in semantics would mean for our application, and which isolation level would be the correct one to use.

Overall, it feels like a much heavier solution to this problem than what we would require. 🤔

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants