Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: Client-side retries with WriteTooOldError in UPDATE #38591

Closed
bdarnell opened this issue Jul 1, 2019 · 7 comments
Closed

sql: Client-side retries with WriteTooOldError in UPDATE #38591

bdarnell opened this issue Jul 1, 2019 · 7 comments
Labels
A-kv Anything in KV that doesn't belong in a more specific category. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) no-issue-activity X-stale

Comments

@bdarnell
Copy link
Contributor

bdarnell commented Jul 1, 2019

WriteTooOldErrors are handled specially: We try to delay returning this error until the end of the transaction in order to lay down more intents and do more discovery of conflicting timestamps. However, this delay undermines the SQL-level "rewind" functionality which can transparently retry the first statement in a transaction.

We have a strong preference for transparent retries whenever possible. In #22234, we disabled the delayed WriteTooOldError behavior for CPut and Increment, because we know that a transparent RefreshSpans will fail, and if we delay the WriteTooOldError to COMMIT, it will trigger a client-visible retry. This had the effect of ensuring that if the first statement was an INSERT, we'd retry internally, but if it was an UPDATE (including an UPDATE clause in an INSERT ON CONFLICT) we may trigger a retry at COMMIT time.

I think that #22234 didn't go far enough: Nearly all Put operations currently originated by SQL are logically "conditional puts" that are done in separate Get and Put steps. The RefreshSpans is guaranteed to fail just as it is for CPut. We should be returning WriteTooOldError immediately for these regular puts too. Doing this for all Puts effectively disables the "delayed WTOE" behavior, although I think Puts where the delayed WTOE would be beneficial are rare in our current usage.

I also think that #22234 went too far in making WTOE immediate for all CPuts. In the first statement of a transaction, an immediate WTOE maximizes our chance of a transparent retry. But in later statements, CPuts as well as regular Puts benefit from continuing to lay down intents. (This would prevent the O(n^2) behavior from #22234 (comment). In scenarios with high contention on multiple keys, we currently have quadratic behavior for operations using CPut but better performance for operations using Put)

I suggest addressing this in two parts. First, disable the "delayed WTOE" behavior for Puts (and probably just for everything). I think this is probably backportable (although it has the risk of making the aforementioned. O(n^2) behavior more likely for high-contention UPDATEs). Second, introduce a new flag to be plumbed down from SQL to indicate whether we are currently able to autoretry an immediate WTOE, and use this flag instead of the type of request to decide whether WTOEs are immediate or delayed. (I'll only do the first part myself; I need to hand off the latter part to someone else).

@awoods187 awoods187 added the C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) label Jul 2, 2019
@awoods187 awoods187 added the A-kv Anything in KV that doesn't belong in a more specific category. label Jul 2, 2019
@awoods187
Copy link
Contributor

cc @andy-kimball i think we should try to get this done for 19.2

@bdarnell
Copy link
Contributor Author

bdarnell commented Jul 2, 2019

The first part of this is trivial and I intend to get it into 19.1.3. I'm not sure the second part is worth trying to get into 19.2 at this stage in the development cycle.

@bdarnell
Copy link
Contributor Author

bdarnell commented Jul 3, 2019

The fix is in #38668. After working through the tests I'm a little less eager to get this into 19.1.3. I think it's probably still safe but I missed a few things on my initial implementation so it might be better to wait.

bdarnell added a commit to bdarnell/cockroach that referenced this issue Jul 3, 2019
CPut (used by INSERT) and Put (used by UPDATE) previously handled
WriteTooOldErrors differently: CPut would return them immediately
while Put would defer them to the end of the transaction. This commit
makes Put work (mostly) like CPut.

The major benefit of this change is that if the first statement of a
transaction is an UPDATE, it will no longer be possible for it to
return transaction-retry errors due to SQL-level automatic retries.
(This was already true for INSERT. Note that an ON CONFLICT DO UPDATE
clause acts like a standalone UPDATE in this respect)

A potential downside is that UPDATE-heavy workloads experiencing high
contention on many keys may have worse performance (up to O(n^2)).

Updates cockroachdb#38591

Release note (sql change): The first statement of a transaction will
no longer return a transaction-retry error if it is an UPDATE or
DELETE (this was already true for INSERT).
@awoods187
Copy link
Contributor

Second, introduce a new flag to be plumbed down from SQL to indicate whether we are currently able to autoretry an immediate WTOE, and use this flag instead of the type of request to decide whether WTOEs are immediate or delayed

@bdarnell could you clarify how big the work effort is for the second part?

@bdarnell
Copy link
Contributor Author

bdarnell commented Jul 8, 2019

I think it's fairly small, probably a couple days of work. But testing it may be tricky and it may get more complex than expected once you start pulling on that thread.

@bdarnell
Copy link
Contributor Author

bdarnell commented Jul 9, 2019

Copying this from #38668 (review)

And FYI, we also have this code. It detects pushes while the txn can still auto-retry, but it doesn't do anything unless there's no chance for the refresh to succeed. Maybe instead of this patch we could improve that code - have it actually try a refresh (or, in the case of a write_too_old flag, just assume that the refresh will fail - which I guess would keep the write_too_old field somewhat useful). This way KV could continue to defer WTOE (and we would also defer them for CPut) and SQL would be in charge of deciding when deferring is no longer a good idea.

if !os.ImplicitTxn.Get() && txn.IsSerializablePushAndRefreshNotPossible() {
rc, canAutoRetry := ex.getRewindTxnCapability()
if canAutoRetry {
ev := eventRetriableErr{
IsCommit: fsm.FromBool(isCommit(stmt.AST)),
CanAutoRetry: fsm.FromBool(canAutoRetry),
}
txn.ManualRestart(ctx, ex.server.cfg.Clock.Now())
payload := eventRetriableErrPayload{
err: roachpb.NewTransactionRetryWithProtoRefreshError(
"serializable transaction timestamp pushed (detected by connExecutor)",
txn.ID(),
// No updated transaction required; we've already manually updated our
// client.Txn.
roachpb.Transaction{},
),
rewCap: rc,
}
return ev, payload, nil
}
}

bdarnell added a commit to bdarnell/cockroach that referenced this issue Jul 9, 2019
CPut (used by INSERT) and Put (used by UPDATE) previously handled
WriteTooOldErrors differently: CPut would return them immediately
while Put would defer them to the end of the transaction. This commit
makes Put work (mostly) like CPut.

The major benefit of this change is that if the first statement of a
transaction is an UPDATE, it will no longer be possible for it to
return transaction-retry errors due to SQL-level automatic retries.
(This was already true for INSERT. Note that an ON CONFLICT DO UPDATE
clause acts like a standalone UPDATE in this respect)

A potential downside is that UPDATE-heavy workloads experiencing high
contention on many keys may have worse performance (up to O(n^2)).

Updates cockroachdb#38591

Release note (sql change): The first statement of a transaction will
no longer return a transaction-retry error if it is an UPDATE or
DELETE (this was already true for INSERT).
craig bot pushed a commit that referenced this issue Jul 9, 2019
38668: storage: Make Put handle WriteTooOldError more like CPut r=nvanbenschoten,knz a=bdarnell

CPut (used by INSERT) and Put (used by UPDATE) previously handled
WriteTooOldErrors differently: CPut would return them immediately
while Put would defer them to the end of the transaction. This commit
makes Put work (mostly) like CPut.

The major benefit of this change is that if the first statement of a
transaction is an UPDATE, it will no longer be possible for it to
return transaction-retry errors due to SQL-level automatic retries.
(This was already true for INSERT. Note that an ON CONFLICT DO UPDATE
clause acts like a standalone UPDATE in this respect)

A potential downside is that UPDATE-heavy workloads experiencing high
contention on many keys may have worse performance (up to O(n^2)).

Updates #38591

Release note (sql change): The first statement of a transaction will
no longer return a transaction-retry error if it is an UPDATE or
DELETE (this was already true for INSERT).

Co-authored-by: Ben Darnell <[email protected]>
bdarnell added a commit to bdarnell/cockroach that referenced this issue Jul 29, 2019
CPut (used by INSERT) and Put (used by UPDATE) previously handled
WriteTooOldErrors differently: CPut would return them immediately
while Put would defer them to the end of the transaction. This commit
makes Put work (mostly) like CPut.

The major benefit of this change is that if the first statement of a
transaction is an UPDATE, it will no longer be possible for it to
return transaction-retry errors due to SQL-level automatic retries.
(This was already true for INSERT. Note that an ON CONFLICT DO UPDATE
clause acts like a standalone UPDATE in this respect)

A potential downside is that UPDATE-heavy workloads experiencing high
contention on many keys may have worse performance (up to O(n^2)).

Updates cockroachdb#38591

Release note (sql change): The first statement of a transaction will
no longer return a transaction-retry error if it is an UPDATE or
DELETE (this was already true for INSERT).
@github-actions
Copy link

github-actions bot commented Jun 4, 2021

We have marked this issue as stale because it has been inactive for
18 months. If this issue is still relevant, removing the stale label
or adding a comment will keep it active. Otherwise, we'll close it in
5 days to keep the issue queue tidy. Thank you for your contribution
to CockroachDB!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-kv Anything in KV that doesn't belong in a more specific category. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) no-issue-activity X-stale
Projects
None yet
Development

No branches or pull requests

2 participants