-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
storage: Make Put handle WriteTooOldError more like CPut #38668
Conversation
cb1d3be
to
58dcc63
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 10 of 10 files at r1.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @bdarnell)
pkg/roachpb/api.go, line 111 at r1 (raw file):
} // IsReadAndWrite returns truee if the request both reads and writes
true
pkg/storage/replica_evaluate.go, line 209 at r1 (raw file):
// retry is inevitable, it's probably best to continue and lay down // as many intents as possible before that retry (this can avoid n^2 // behavior in some scenarios with high contention on multiple keys,
This assumes that transactions are accessing these multiple keys in random orders. In the common case where transactions are highly contended but each touch keys in the same order (e.g. tpcc's newOrder txn), this won't cause n^2 behavior. That might be worth mentioning.
pkg/storage/replica_evaluate.go, line 221 at r1 (raw file):
// // TODO(bdarnell): Plumb the SQL CanAutoRetry field through to // !ba.Header.DeferWriteTooOldError.
Are you planning on addressing this immediately?
Another option which is simpler but also loses some precision is to base this flag off of the transaction's sequence number. If the sequence number is zero before the current batch, we know that this is the first writing batch of the epoch.
pkg/storage/replica_evaluate.go, line 335 at r1 (raw file):
} // If there was an EndTransaction in the batch that finalized the transaction,
How was this behaving correctly before?
pkg/storage/replica_test.go, line 4171 at r1 (raw file):
} // Send a put for keyB; by default this fails with a WriteTooOldError
Missing a period at the end.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TestDropWhileBackfill is failing consistently in CI with this change but I haven't been able to reproduce it locally; I'm still investigating.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)
pkg/storage/replica_evaluate.go, line 209 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
This assumes that transactions are accessing these multiple keys in random orders. In the common case where transactions are highly contended but each touch keys in the same order (e.g. tpcc's newOrder txn), this won't cause n^2 behavior. That might be worth mentioning.
If it's random order I wouldn't be too concerned about this since those transactions would also be very deadlock-prone. But I think the O(n^2) behavior can still manifest even if transactions are accessing their keys in order. Suppose txnA accesses keys that are multiples of two and txnB accesses keys that are multiples of 3. If the txns run to the end before the restart, they'll know that they collide on all the multiples of 6 one of the transactions will succeed on the second attempt. If we restart as soon as the conflict is detected, they could restart once for key 6, once for key 12, etc.
pkg/storage/replica_evaluate.go, line 221 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Are you planning on addressing this immediately?
Another option which is simpler but also loses some precision is to base this flag off of the transaction's sequence number. If the sequence number is zero before the current batch, we know that this is the first writing batch of the epoch.
Not immediately; see #35891. My plan is to get this in now and consider it for backporting to 19.1.x, but the plumbing changes would probably not be backportable (and I think they're lower priority - I wouldn't push for their addition to the 19.2 roadmap at this late date).
pkg/storage/replica_evaluate.go, line 335 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
How was this behaving correctly before?
I think it wasn't, but we weren't ever reaching a case where it matters. The test that fails is this one which sends a Put in the same batch as an EndTransaction. This didn't try to return the WriteTooOldError until this commit.
It's tricky to get into the situation where this matters (a write in the same batch as an EndTransaction, and NoRefreshSpans is true). 1PC write-only transactions have their EndTransactions elided. Multi-range transactions never have anything in the same batch as their EndTransaction. Single-range multi-step transactions will usually have refresh spans. SQL never uses Put without refresh spans, so you couldn't get into the erroneous case here in practice before.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
once the test failure is resolved.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @bdarnell)
pkg/storage/replica_evaluate.go, line 209 at r1 (raw file):
Previously, bdarnell (Ben Darnell) wrote…
If it's random order I wouldn't be too concerned about this since those transactions would also be very deadlock-prone. But I think the O(n^2) behavior can still manifest even if transactions are accessing their keys in order. Suppose txnA accesses keys that are multiples of two and txnB accesses keys that are multiples of 3. If the txns run to the end before the restart, they'll know that they collide on all the multiples of 6 one of the transactions will succeed on the second attempt. If we restart as soon as the conflict is detected, they could restart once for key 6, once for key 12, etc.
By "same order" I meant that conflicting transactions would be almost identical to each other. I guess it's not worth discussing in the comment though.
58dcc63
to
cc485c8
Compare
OK, the new commit appears to fix the test flakiness. @knz could you please take a look at the error changes? The error handling issues appear to be absent in the 19.1 branch, so if we backport this PR we'd only take the original commit. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes this partial revert of #35821 is adequate. Back when I implemented #35821 we did not have an adequate way to wrap errors properly. Now that we do, this can be used.
I still think that encountering a retry error in some of the places listed here was not envisioned in the design of the surrounding code (thinking about schema changes specifically, but now that I think of it, a retry during a backup is also suspicious - why is this not accessing KV at a fixed timestamp in the past?) and letting that retry error flow through without further investigation is ... worrying.
@andreimatei what do you think?
I don't have any evidence that the backup cases are actually problematic. The ones that cause problems I've seen are the ones related to schema job status, which perform UPDATE queries. These queries would previously never return WriteTooOldError; the error would be deferred to the end of the transaction (where it wouldn't get wrapped). This PR makes the UPDATE statement return the WTOE immediately. (but even before this PR, other retriable errors were possible from the UPDATE statement) |
thanks for explaining 👍 |
bors r=nvanbenschoten,knz |
Build failed (retrying...) |
Build failed (retrying...) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any interest in addressing this TODO about not always refreshing on WriteTooOldErrors
?
Line 1246 in d024107
// TODO(andrei): Chances of success for on write-too-old conditions might be |
If we're making WTOE more eager, should we also make regular timestamp pushes generate serialization errors more eagerly? In fact, can we get rid of the write_too_old
field on the txn? Since we got rid of snapshot isolation, this field doesn't add anything to the pushed timestamp, does it?
And FYI, we also have this code. It detects pushes while the txn can still auto-retry, but it doesn't do anything unless there's no chance for the refresh to succeed. Maybe instead of this patch we could improve that code - have it actually try a refresh (or, in the case of a write_too_old
flag, just assume that the refresh will fail - which I guess would keep the write_too_old
field somewhat useful). This way KV could continue to defer WTOE (and we would also defer them for CPut) and SQL would be in charge of deciding when deferring is no longer a good idea.
cockroach/pkg/sql/conn_executor_exec.go
Lines 425 to 445 in d024107
if !os.ImplicitTxn.Get() && txn.IsSerializablePushAndRefreshNotPossible() { | |
rc, canAutoRetry := ex.getRewindTxnCapability() | |
if canAutoRetry { | |
ev := eventRetriableErr{ | |
IsCommit: fsm.FromBool(isCommit(stmt.AST)), | |
CanAutoRetry: fsm.FromBool(canAutoRetry), | |
} | |
txn.ManualRestart(ctx, ex.server.cfg.Clock.Now()) | |
payload := eventRetriableErrPayload{ | |
err: roachpb.NewTransactionRetryWithProtoRefreshError( | |
"serializable transaction timestamp pushed (detected by connExecutor)", | |
txn.ID(), | |
// No updated transaction required; we've already manually updated our | |
// client.Txn. | |
roachpb.Transaction{}, | |
), | |
rewCap: rc, | |
} | |
return ev, payload, nil | |
} | |
} |
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @bdarnell and @nvanbenschoten)
Build failed (retrying...) |
Build failed (retrying...) |
Build failed |
Probably not. Regular timestamp pushes can usually be handled transparently with RefreshSpans.
It's hard to get rid of it because we can't write an intent and return an error at the same time, so we need the txn flag to pass this status out of the replica evaluation level. Also, the additional work contemplated in #38591, deferred WTOEs would come back.
Ah, that sounds like a better idea than plumbing this flag down the stack. I'll look into it. |
The AssertionError wrappers hid the retryable nature of the errors, causing failures in cases that should have been retried. Most of the affected wrappers were changed from Wrapf to NewAssertionError in cockroachdb#35821. I looked at all uses of NewAssertionErrorWithWrappedErrf and changed all the ones where a client.Txn was in scope unless they were clearly not going to be txn-related (for example, proto unmarshaling errors). Release note (bug fix): Transaction retries in schema changes are again handled correctly.
CPut (used by INSERT) and Put (used by UPDATE) previously handled WriteTooOldErrors differently: CPut would return them immediately while Put would defer them to the end of the transaction. This commit makes Put work (mostly) like CPut. The major benefit of this change is that if the first statement of a transaction is an UPDATE, it will no longer be possible for it to return transaction-retry errors due to SQL-level automatic retries. (This was already true for INSERT. Note that an ON CONFLICT DO UPDATE clause acts like a standalone UPDATE in this respect) A potential downside is that UPDATE-heavy workloads experiencing high contention on many keys may have worse performance (up to O(n^2)). Updates cockroachdb#38591 Release note (sql change): The first statement of a transaction will no longer return a transaction-retry error if it is an UPDATE or DELETE (this was already true for INSERT).
cc485c8
to
4ee7b8a
Compare
bors r=nvanbenschoten,knz Merge skew on a proto file leading to conflicts in the c++ generated code. |
38668: storage: Make Put handle WriteTooOldError more like CPut r=nvanbenschoten,knz a=bdarnell CPut (used by INSERT) and Put (used by UPDATE) previously handled WriteTooOldErrors differently: CPut would return them immediately while Put would defer them to the end of the transaction. This commit makes Put work (mostly) like CPut. The major benefit of this change is that if the first statement of a transaction is an UPDATE, it will no longer be possible for it to return transaction-retry errors due to SQL-level automatic retries. (This was already true for INSERT. Note that an ON CONFLICT DO UPDATE clause acts like a standalone UPDATE in this respect) A potential downside is that UPDATE-heavy workloads experiencing high contention on many keys may have worse performance (up to O(n^2)). Updates #38591 Release note (sql change): The first statement of a transaction will no longer return a transaction-retry error if it is an UPDATE or DELETE (this was already true for INSERT). Co-authored-by: Ben Darnell <[email protected]>
Build succeeded |
…oo-old Before this patch, any write running into a write-too-old condition resulted in a WriteTooOldError being returned by the server. Returning an error implies that no intents are left behind. This is unfortunate; we'd like to leave intents (or, in the future, other types of locks) behind so keep away other transactions. We've observed this resulting in the starvation of a class of transactions in a user's workload. This patch makes it so that blind writes (i.e. Puts - used by UPDATE, not CPuts) don't return WriteTooOldErrors any more. Instead, they return the a txn proto with the WriteTooOld flag set. This is the behavior they had before cockroachdb#38668. This patch retains the goal of cockroachdb#38668, however: the client now eagerly refreshes the transactions when it sees a WriteTooOld flag, and if the refresh succeeds, it returns a WriteTooOldError to the higher layers (SQL), allowing for automatic retries where applicable. Unfortunately, CPuts (used by INSERT) continue to return WriteTooOldErrors without leaving locks behind. Dealing with them requires more tenderness because they imply a read, and the timestamp of a read cannot be bumped as easily as that of a write. Touches cockroachdb#44653 Release note (SQL change): UPDATEs returning a serialization failure error (code 40001) now leave behind a lock, helping the transaction succeed if it retries. This prevents starvation of transactions whose UPDATEs are prone to conflicts.
…oo-old Before this patch, any write running into a write-too-old condition resulted in a WriteTooOldError being returned by the server. Returning an error implies that no intents are left behind. This is unfortunate; we'd like to leave intents (or, in the future, other types of locks) behind so keep away other transactions. We've observed this resulting in the starvation of a class of transactions in a user's workload. This patch makes it so that blind writes (i.e. Puts - used by UPDATE, not CPuts) don't return WriteTooOldErrors any more. Instead, they return the a txn proto with the WriteTooOld flag set. This is the behavior they had before cockroachdb#38668. This patch retains the goal of cockroachdb#38668, however: the client now eagerly refreshes the transactions when it sees a WriteTooOld flag, and if the refresh succeeds, it returns a WriteTooOldError to the higher layers (SQL), allowing for automatic retries where applicable. Unfortunately, CPuts (used by INSERT) continue to return WriteTooOldErrors without leaving locks behind. Dealing with them requires more tenderness because they imply a read, and the timestamp of a read cannot be bumped as easily as that of a write. Touches cockroachdb#44653 Release note (sql change): UPDATEs returning a serialization failure error (code 40001) now leave behind a lock, helping the transaction succeed if it retries. This prevents starvation of transactions whose UPDATEs are prone to conflicts.
…oo-old Before this patch, any write running into a write-too-old condition resulted in a WriteTooOldError being returned by the server. Returning an error implies that no intents are left behind. This is unfortunate; we'd like to leave intents (or, in the future, other types of locks) behind so keep away other transactions. We've observed this resulting in the starvation of a class of transactions in a user's workload. This patch makes it so that blind writes (i.e. Puts - used by UPDATE, not CPuts) don't return WriteTooOldErrors any more. Instead, they return the a txn proto with the WriteTooOld flag set. This is the behavior they had before cockroachdb#38668. This patch retains the goal of cockroachdb#38668, however: the client now eagerly refreshes the transactions when it sees a WriteTooOld flag, and if the refresh succeeds, it returns a WriteTooOldError to the higher layers (SQL), allowing for automatic retries where applicable. Unfortunately, CPuts (used by INSERT) continue to return WriteTooOldErrors without leaving locks behind. Dealing with them requires more tenderness because they imply a read, and the timestamp of a read cannot be bumped as easily as that of a write. Touches cockroachdb#44653 Release note (sql change): UPDATEs returning a serialization failure error (code 40001) now leave behind a lock, helping the transaction succeed if it retries. This prevents starvation of transactions whose UPDATEs are prone to conflicts.
…oo-old Before this patch, any write running into a write-too-old condition resulted in a WriteTooOldError being returned by the server. Returning an error implies that no intents are left behind. This is unfortunate; we'd like to leave intents (or, in the future, other types of locks) behind so keep away other transactions. We've observed this resulting in the starvation of a class of transactions in a user's workload. This patch makes it so that blind writes (i.e. Puts - used by UPDATE, not CPuts) don't return WriteTooOldErrors any more. Instead, they return the a txn proto with the WriteTooOld flag set. This is the behavior they had before cockroachdb#38668. This patch retains the goal of cockroachdb#38668, however: the client now eagerly refreshes the transactions when it sees a WriteTooOld flag, and if the refresh succeeds, it returns a WriteTooOldError to the higher layers (SQL), allowing for automatic retries where applicable. Unfortunately, CPuts (used by INSERT) continue to return WriteTooOldErrors without leaving locks behind. Dealing with them requires more tenderness because they imply a read, and the timestamp of a read cannot be bumped as easily as that of a write. Touches cockroachdb#44653 Release note (sql change): UPDATEs returning a serialization failure error (code 40001) now leave behind a lock, helping the transaction succeed if it retries. This prevents starvation of transactions whose UPDATEs are prone to conflicts.
44654: storage: leave intents behind after blind-writes experiencing write-too-old r=andreimatei a=andreimatei Before this patch, any write running into a write-too-old condition resulted in a WriteTooOldError being returned by the server. Returning an error implies that no intents are left behind. This is unfortunate; we'd like to leave intents (or, in the future, other types of locks) behind so keep away other transactions. We've observed this resulting in the starvation of a class of transactions in a user's workload. This patch makes it so that blind writes (i.e. Puts - used by UPDATE, not CPuts) don't return WriteTooOldErrors any more. Instead, they return the a txn proto with the WriteTooOld flag set. This is the behavior they had before #38668. This patch retains the goal of #38668, however: the client now eagerly refreshes the transactions when it sees a WriteTooOld flag, and if the refresh succeeds, it returns a WriteTooOldError to the higher layers (SQL), allowing for automatic retries where applicable. Unfortunately, CPuts (used by INSERT) continue to return WriteTooOldErrors without leaving locks behind. Dealing with them requires more tenderness because they imply a read, and the timestamp of a read cannot be bumped as easily as that of a write. Touches #44653 Release note (SQL change): UPDATEs returning a serialization failure error (code 40001) now leave behind a lock, helping the transaction succeed if it retries. This prevents starvation of transactions whose UPDATEs are prone to conflicts. Co-authored-by: Andrei Matei <[email protected]>
CPut (used by INSERT) and Put (used by UPDATE) previously handled
WriteTooOldErrors differently: CPut would return them immediately
while Put would defer them to the end of the transaction. This commit
makes Put work (mostly) like CPut.
The major benefit of this change is that if the first statement of a
transaction is an UPDATE, it will no longer be possible for it to
return transaction-retry errors due to SQL-level automatic retries.
(This was already true for INSERT. Note that an ON CONFLICT DO UPDATE
clause acts like a standalone UPDATE in this respect)
A potential downside is that UPDATE-heavy workloads experiencing high
contention on many keys may have worse performance (up to O(n^2)).
Updates #38591
Release note (sql change): The first statement of a transaction will
no longer return a transaction-retry error if it is an UPDATE or
DELETE (this was already true for INSERT).