distsql: uncertainty reads under DistSQL don't benefit from read span refresh mechanism #24798
Comments
As the only-visible-to-crdb referenced issue above shows, this is suspected to cause a significant regression in high-percentile SELECT latency on a customer workload.
Hmm, @andreimatei can we talk about this? Seems like something we should tackle soon.
Months later, DistSQL reads go through the TxnCoordSender, but the
Before this patch, races between ingesting leaf txn metadata into the root and the root performing span refreshes could lead to a failure to refresh some spans and thus to write skew (see cockroachdb#41173). This patch fixes that by suspending root refreshes while there are leaves in operation, namely while DistSQL flows that use leaves (either remotely or locally) are running. So, with this patch, there are no refreshes while a distributed query is running, but once it finishes and all leaf metadata has been collected, refreshes are enabled again. Refreshes are disabled at different levels depending on the reason:
- at the DistSQLPlanner.Run() level for distributed queries
- at the FlowBase level for flows that use leaves because of concurrency between Processors
- at the vectorizedFlow level for vectorized flows that use leaves internally in their operators

The former two bullets build on facilities added in the previous commit for detecting concurrency within flows.

Fixes cockroachdb#41173
Touches cockroachdb#24798

Release justification: bug fix
Release note (bug fix): Fix a bug possibly leading to write skew after distributed queries (cockroachdb#41173).
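For illustration only, here is a minimal Go sketch of the mechanism the commit message describes, using made-up types (`rootTxn`, `startLeaf`, `finishLeaf` are not the real kvclient API): the root refuses to refresh its read spans while any leaf txn is outstanding, and re-enables refreshes once all leaf metadata has been ingested.

```go
package sketch

import "sync"

// rootTxn is a hypothetical stand-in for the root transaction coordinator.
type rootTxn struct {
	mu           sync.Mutex
	activeLeaves int      // leaf txns currently in use by DistSQL flows
	refreshesOK  bool     // whether span refreshes are currently permitted
	refreshSpans []string // spans read so far (root + ingested leaf spans)
}

// startLeaf is called when a flow creates a leaf txn; it suspends refreshes.
func (t *rootTxn) startLeaf() {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.activeLeaves++
	t.refreshesOK = false
}

// finishLeaf ingests the leaf's read spans and, once no leaves remain,
// re-enables refreshes on the root.
func (t *rootTxn) finishLeaf(leafSpans []string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.refreshSpans = append(t.refreshSpans, leafSpans...)
	t.activeLeaves--
	if t.activeLeaves == 0 {
		t.refreshesOK = true
	}
}

// canRefresh reports whether a refresh may run right now, avoiding the race
// between leaf-span ingestion and a concurrent root refresh.
func (t *rootTxn) canRefresh() bool {
	t.mu.Lock()
	defer t.mu.Unlock()
	return t.refreshesOK
}
```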
What do you do with a filter, an aggregation or an anti-join, where the row carrying the tag is filtered out?
I would carry the tag forward even when a row is filtered out, by infecting all subsequent rows. I'd have each processor keep track of the highest timestamp that any of its input rows have been tagged with, and I'd tag every output row with that (and also tag the "absence of any output rows" by including this timestamp in each processor's trailing metadata, collected when processors drain). I think that works? Now that I think about it again, I'm not sure why I phrased this as "tagging rows" rather than describing it in terms of broadcasting metadata and taking advantage of the DistSQL ordered message streams: processors that do KV operations (TableReader, IndexJoiner, etc.) would notice when a scan they've done was actually performed at a new (higher) timestamp and would broadcast this information to all their consumers as a
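To make the proposal concrete, here is a rough Go sketch with hypothetical types (`row`, `processor`, `hlcTimestamp` are illustrative, not real DistSQL types) of the "carry the highest timestamp forward" idea: a processor remembers the highest tagged timestamp on any input row, stamps it on every output row, and also reports it when draining, so the information survives filters, aggregations, and anti-joins that drop rows.

```go
package sketch

type hlcTimestamp int64 // stand-in for an HLC timestamp

type row struct {
	data        []interface{}
	refreshedTS hlcTimestamp // highest timestamp any contributing read was pushed to
}

type processor struct {
	maxSeenTS hlcTimestamp
}

// consume notes the tag on every input row, even rows that will be filtered out.
func (p *processor) consume(in row) {
	if in.refreshedTS > p.maxSeenTS {
		p.maxSeenTS = in.refreshedTS
	}
}

// emit tags each output row with the highest timestamp seen so far.
func (p *processor) emit(out row) row {
	out.refreshedTS = p.maxSeenTS
	return out
}

// drain reports the timestamp in trailing metadata, covering the case where
// the processor produced no output rows at all.
func (p *processor) drain() hlcTimestamp {
	return p.maxSeenTS
}
```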
A distsql processor can have no output rows. Indeed, this seems like something that's not part of the flow itself but instead part of the "metadata" (I really think the word "metadata" is a bad one and should never have been used; the better abstraction is the distinction between the control plane and the data plane. You're playing with the control plane here, regardless of what flows data-wise.)
There's another challenge in there, though. Suppose you have two concurrent processors A and B, and processor A fails with a logic error (say, some SQL built-in function errors out). Today, the repatriation of the "metadata" payload will cause the logic error to cancel out whatever result comes from B, which would trash the information bits needed for your algorithm. If we ever implement savepoint rollbacks in combination with txn refreshes, it's important that the magic you want to implement not be invalidated by such a logic error.
We have marked this issue as stale because it has been inactive for |
seems still relevant |
When a regular Scan encounters a `ReadWithinUncertaintyInterval` error, the `TxnCoordSender` will immediately try to refresh the txn's read spans and, if successful, retry the batch. This doesn't apply to DistSQL reads, which don't go through the `TxnCoordSender`. We should figure out another level at which to retry.

Separately, if the whole flow is scheduled on the gateway, everything could go through the `TxnCoordSender`, I think.

Jira issue: CRDB-5744
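For reference, a simplified Go sketch (hypothetical names, not the actual `TxnCoordSender` code) of the refresh-then-retry loop that regular scans get today and that DistSQL reads would need at some other level: on an uncertainty error, try to refresh the read spans to the bumped timestamp and, if that succeeds, re-issue the batch.

```go
package sketch

import "errors"

// errUncertainty stands in for a ReadWithinUncertaintyInterval error.
var errUncertainty = errors.New("ReadWithinUncertaintyInterval")

// runWithRefresh issues a batch via runBatch. On an uncertainty error it calls
// refreshSpans, which reports whether all previously read spans still see the
// same data at the higher timestamp; if so, the batch is retried, otherwise a
// retryable error is surfaced to the caller.
func runWithRefresh(runBatch func() error, refreshSpans func() bool) error {
	for {
		err := runBatch()
		if err == nil {
			return nil
		}
		if !errors.Is(err, errUncertainty) {
			return err
		}
		if !refreshSpans() {
			// Refresh failed: the transaction has to restart at a new timestamp.
			return errors.New("transaction must restart")
		}
		// Refresh succeeded: retry the batch at the higher timestamp.
	}
}
```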