
distsql: uncertainty reads under DistSQL don't benefit from read span refresh mechanism #24798

Open
andreimatei opened this issue Apr 13, 2018 · 11 comments
Labels
A-sql-execution Relating to SQL execution. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) C-performance Perf of queries or internals. Solution not expected to change functional behavior. T-sql-queries SQL Queries Team

Comments

@andreimatei
Contributor

andreimatei commented Apr 13, 2018

When a regular Scan encounters a ReadWithinUncertaintyInterval error, the TxnCoordSender will immediately try to refresh the txn's read spans and, if successful, retry the batch. This doesn't apply to DistSQL reads which don't go through the TxnCoordSender.
We should figure out another level at which to retry.
Separately, if the whole flow is scheduled on the gateway, everything could go through the TxnCoordSender, I think.

Jira issue: CRDB-5744

@andreimatei andreimatei added the C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) label Apr 13, 2018
@andreimatei andreimatei added this to the 2.1 milestone Apr 13, 2018
@andreimatei andreimatei self-assigned this Apr 13, 2018
@andreimatei andreimatei added A-kv-client Relating to the KV client and the KV interface. A-sql-execution Relating to SQL execution. C-performance Perf of queries or internals. Solution not expected to change functional behavior. labels May 4, 2018
@andreimatei
Contributor Author

As the only-visible-to-crdb referenced issue above shows, this is suspected to cause a significant regression in high-percentile SELECT latency on a customer workload.

@jordanlewis
Member

Hmm, @andreimatei can we talk about this? Seems like something we should tackle soon.

@tbg tbg removed the A-kv-client Relating to the KV client and the KV interface. label Aug 21, 2018
@jordanlewis jordanlewis self-assigned this Aug 21, 2018
@jordanlewis jordanlewis modified the milestones: 2.1, 2.2 Sep 26, 2018
@petermattis petermattis removed this from the 2.2 milestone Oct 5, 2018
@andreimatei
Contributor Author

Months later, DistSQL reads go through the TxnCoordSender, but the txnSpanRefresher is neutered.
Remote flows do return their read spans to the gateway, so, amusingly, the root can attempt to refresh when some other query later encounters an error, but not when the error is encountered by DistSQL itself.

@jordanlewis jordanlewis removed their assignment Feb 15, 2019
andreimatei added a commit to andreimatei/cockroach that referenced this issue Sep 30, 2019
Before this patch, races between ingesting leaf txn metadata into the
root and the root performing span refreshes could lead to the failure to
refresh some spans and thus write skew (see cockroachdb#41173).
This patch fixes that by suspending root refreshes while there are
leaves in operation, namely while DistSQL flows that use leaves (either
remotely or locally) are running. So, with this patch, there will be no
refreshes while a distributed query is running, but once it finishes and
all leaf metadata has been collected, refreshes are enabled again.

Refreshes are disabled at different levels depending on the reason:
- they're disabled at the DistSQLPlanner.Run() level for distributed
queries
- they're disabled at the FlowBase level for flows that use leaves
because of concurrency between Processors
- they're disabled at the vectorizedFlow level for vectorized flows that
use leaves internally in their operators

The former two bullets build on facilities built in the previous commit
for detecting concurrency within flows.

Fixes cockroachdb#41173
Touches cockroachdb#24798

Release justification: bug fix

Release note (bug fix): Fix a bug possibly leading to write skew after distributed
queries (cockroachdb#41173).
@knz
Contributor

knz commented Nov 25, 2019

> The only way I see is by attaching a token to every row flowing through DistSQL, tracking the highest timestamp of a read that contributed to that row.

What do you do with a filter, an aggregation or an anti-join, where the row carrying the tag is filtered out?

@andreimatei
Contributor Author

I would carry the tag forward even when a row is filtered out, by "infecting" all subsequent rows: each processor would keep track of the highest timestamp that any of its input rows have been tagged with, and would tag every output row with that (and also tag the "absence of any output rows" by including this timestamp in each processor's trailing metadata, collected when processors drain). I think that works?

Now that I think about it again, I'm not sure why I phrased this as "tagging rows" rather than describing it in terms of broadcasting metadata and taking advantage of the DistSQL ordered message streams. Processors that do KV operations (TableReader, IndexJoiner, etc.) would notice when a scan they've done was actually performed at a new (higher) timestamp, and would broadcast this information to all their consumers as a ProducerMetadata record before sending any more rows to any consumer. (The "all consumers" part is important; for example, a hash router would send this along on all its output streams.) Then every other processor would respect the convention that such a metadata record is forwarded immediately (as opposed to how we currently handle metadata, deferring its forwarding until later).

@knz
Contributor

knz commented Nov 25, 2019

A distsql processor can have no output rows. Indeed, it seems like something that's not part of the data flow but instead part of the "metadata".

(I think this word "metadata" is a bad one and should never have been used. The better abstraction is the distinction between control plane and data plane. You're playing with the control plane here, regardless of what flows data-wise.)

@knz
Contributor

knz commented Nov 25, 2019

There's another challenge in there though. Suppose you have two concurrent processors A and B.

Processor A fails with a logic error (say, some SQL built-in function errors out).
Concurrently B is scanning ahead some data.

Today the repatriation of the "metadata" payload will cause the logic error to cancel out whatever result comes from B. That would trash the information bits needed in your algorithm.

If we ever implement savepoint rollbacks in combination with txn refreshes, it's important that the magic that you want to implement does not get invalidated by such a logic error.

@github-actions

github-actions bot commented Jun 6, 2021

We have marked this issue as stale because it has been inactive for
18 months. If this issue is still relevant, removing the stale label
or adding a comment will keep it active. Otherwise, we'll close it in
5 days to keep the issue queue tidy. Thank you for your contribution
to CockroachDB!

@github-actions

We have marked this issue as stale because it has been inactive for
18 months. If this issue is still relevant, removing the stale label
or adding a comment will keep it active. Otherwise, we'll close it in
10 days to keep the issue queue tidy. Thank you for your contribution
to CockroachDB!

@github-actions github-actions bot closed this as not planned Oct 10, 2023
@knz
Contributor

knz commented Oct 10, 2023

seems still relevant

@knz knz reopened this Oct 10, 2023
@knz knz added the T-sql-queries SQL Queries Team label and removed the X-stale and no-issue-activity labels Oct 10, 2023
@github-project-automation github-project-automation bot moved this to Triage in SQL Queries Oct 10, 2023
@yuzefovich yuzefovich moved this from Triage to New Backlog in SQL Queries Oct 10, 2023