Nexus should retry all transactions CRDB tells it to #3814
Comments
I think this is related to #973, which talks about having a way to put a bunch of statements into a non-interactive transaction. CockroachDB calls this batched statements. This allows CockroachDB to retry such errors automatically. This would have all the benefits mentioned in RFD 192 about avoiding interactive transactions, plus I'd expect it to minimize the window for transaction conflicts (since it avoids a round-trip to Nexus, plus all the time associated with doing the client-side retry).
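To make the batched-statements idea concrete, here is a minimal sketch of assembling several statements into one non-interactive batch that could be sent in a single round trip. The function name `batch_transaction` is hypothetical (it is not an omicron or Diesel API), and whether CockroachDB can actually retry a given batch automatically depends on server-side conditions that this sketch does not check.

```rust
/// Hypothetical helper: join individual SQL statements into a single
/// BEGIN ... COMMIT batch so the whole transaction can be sent at once,
/// with no client round trips in the middle of it.
pub fn batch_transaction(statements: &[&str]) -> String {
    let mut batch = String::from("BEGIN;\n");
    for stmt in statements {
        // Normalize each statement: drop surrounding whitespace and any
        // trailing semicolon so we can add exactly one ourselves.
        let stmt = stmt.trim().trim_end_matches(';').trim_end();
        if stmt.is_empty() {
            continue;
        }
        batch.push_str(stmt);
        batch.push_str(";\n");
    }
    batch.push_str("COMMIT;");
    batch
}
```

The point of the design is that no application logic runs between statements, which is what distinguishes this from an interactive transaction.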
@smklein raised a good question about doing client-side retries in the meantime, since it doesn't seem like we'll have the foundation we need for batched transactions like this that soon. That'd be a useful stopgap but I'm not sure how that's possible. (I don't know that it's impossible either, but I don't see how to do it.)
Following up on @davepacheco's comment:
https://www.cockroachlabs.com/docs/v23.1/advanced-client-side-transaction-retries seems a bit opinionated about how transactions should work, in a way that's slightly divergent from Diesel. CRDB recommends the following:
Diesel implements the This https://docs.diesel.rs/master/diesel/connection/trait.TransactionManager.html is a trait, admittedly, so it does seem possible for us to "roll our own" version of it? This could support existing transactions, but also add a "retryable cockroachdb transaction" method that is This would have an advantage over the "hand-rolled-retry", e.g. in omicron/nexus/db-queries/src/db/datastore/region.rs (lines 169 to 173 in d300fb8).
Pros of using CRDB savepoints / retries:
Cons of using CRDB savepoints / retries:
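For reference, the savepoint-based protocol discussed above can be sketched roughly as follows: open the transaction, set the special `cockroach_restart` savepoint, run the body, and on a `40001` error roll back to the savepoint and try again. Everything in the block (`DbError`, `SqlExec`, `ScriptedConn`, `run_with_savepoint_retry`, and the attempt cap) is a hypothetical stand-in for illustration; it is not Diesel's `TransactionManager` or any omicron code.

```rust
/// Hypothetical, simplified database error carrying a SQLSTATE code.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct DbError {
    pub sqlstate: String,
    pub message: String,
}

/// Hypothetical minimal connection interface used only by this sketch.
pub trait SqlExec {
    fn execute(&mut self, sql: &str) -> Result<(), DbError>;
}

/// Run the statements followed by RELEASE SAVEPOINT, which is the point
/// where CockroachDB may report that the transaction must be restarted.
fn run_body<C: SqlExec>(conn: &mut C, statements: &[&str]) -> Result<(), DbError> {
    for stmt in statements {
        conn.execute(stmt)?;
    }
    conn.execute("RELEASE SAVEPOINT cockroach_restart")
}

/// Returns the number of attempts used on success. The `max_attempts`
/// cap is an assumption of this sketch, added to keep the work bounded.
pub fn run_with_savepoint_retry<C: SqlExec>(
    conn: &mut C,
    statements: &[&str],
    max_attempts: u32,
) -> Result<u32, DbError> {
    conn.execute("BEGIN")?;
    conn.execute("SAVEPOINT cockroach_restart")?;
    let mut attempt = 1;
    loop {
        match run_body(conn, statements) {
            Ok(()) => {
                conn.execute("COMMIT")?;
                return Ok(attempt);
            }
            // Retryable error: rewind to the savepoint and run the body again.
            Err(e) if e.sqlstate == "40001" && attempt < max_attempts => {
                conn.execute("ROLLBACK TO SAVEPOINT cockroach_restart")?;
                attempt += 1;
            }
            // Anything else (or too many attempts): abort the transaction.
            Err(e) => {
                let _ = conn.execute("ROLLBACK");
                return Err(e);
            }
        }
    }
}

/// Test double: logs every statement and fails the first `failures_left`
/// RELEASE SAVEPOINT calls with a retryable error.
pub struct ScriptedConn {
    pub log: Vec<String>,
    pub failures_left: u32,
}

impl SqlExec for ScriptedConn {
    fn execute(&mut self, sql: &str) -> Result<(), DbError> {
        self.log.push(sql.to_string());
        if sql.starts_with("RELEASE SAVEPOINT") && self.failures_left > 0 {
            self.failures_left -= 1;
            return Err(DbError {
                sqlstate: "40001".into(),
                message: "restart transaction".into(),
            });
        }
        Ok(())
    }
}
```

Note that the body is replayed inside the same open transaction, which is the main way this differs from a hand-rolled loop that starts a fresh transaction on each attempt.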
Integrates automatic transaction retry into Nexus for most transactions. Additionally, this PR provides a "RetryHelper" object to help standardize how transaction retry is performed. Currently, after a short randomized wait (up to an upper bound), we retry unconditionally, emitting each attempt to Oximeter for further analysis.

- [x] Depends on oxidecomputer/async-bb8-diesel#58
- [x] As noted in oxidecomputer/async-bb8-diesel#58, this will require customizing CRDB session variables to work correctly. (Edit: this is done on each transaction)

Part of oxidecomputer/customer-support#46
Part of #3814
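As a rough illustration of the "short randomized wait, up to an upper bound" behavior described above, here is a minimal sketch. `RetrySketch` and `run_with_retry` are hypothetical names and not the actual `RetryHelper` from the PR; the attempt cap, the caller-supplied random source, and the in-memory `waits` log (standing in for emitting attempts to a metrics system) are all assumptions of this sketch.

```rust
use std::time::Duration;

/// Hypothetical retry bookkeeping: bounded random wait, capped attempts.
pub struct RetrySketch {
    max_wait: Duration,
    max_attempts: u32,
    /// Wait chosen before each retry; a stand-in for reporting attempts.
    pub waits: Vec<Duration>,
}

impl RetrySketch {
    pub fn new(max_wait: Duration, max_attempts: u32) -> Self {
        RetrySketch { max_wait, max_attempts, waits: Vec::new() }
    }

    /// Pick the wait before the next retry, or `None` once the attempt
    /// budget is spent. `unit_random` is expected to lie in [0, 1] and is
    /// supplied by the caller so the sketch needs no RNG dependency.
    pub fn next_wait(&mut self, unit_random: f64) -> Option<Duration> {
        // `waits.len()` retries so far means `waits.len() + 1` attempts made.
        if self.waits.len() as u32 + 1 >= self.max_attempts {
            return None;
        }
        let fraction = if unit_random.is_finite() {
            unit_random.clamp(0.0, 1.0)
        } else {
            0.0
        };
        let wait = self.max_wait.mul_f64(fraction);
        self.waits.push(wait);
        Some(wait)
    }
}

/// Run `op` until it succeeds or the helper refuses another retry.
pub fn run_with_retry<T, E>(
    helper: &mut RetrySketch,
    mut op: impl FnMut() -> Result<T, E>,
    mut unit_random: impl FnMut() -> f64,
    mut sleep: impl FnMut(Duration),
) -> Result<T, E> {
    loop {
        match op() {
            Ok(value) => return Ok(value),
            Err(err) => match helper.next_wait(unit_random()) {
                Some(wait) => sleep(wait),
                None => return Err(err),
            },
        }
    }
}
```

Passing the random source and the sleep function in as closures keeps the sketch deterministic under test; a real caller would plug in an RNG and an async or blocking sleep.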
On rack2 which is on omicron commit
There isn't much more information to go on in the saga events:
The error has more to it. I filed #4662 to provide more details.
@askfongjojo : Closing the loop, that does not look like a transaction retry error, though it does look like a bug. I'll follow up on #4662.
Correct. I tested concurrent disk provisioning/deprovisioning further over the weekend and have not run into any problems so far.
This has been tested successfully; we will open new tickets if we hit new issues or edge cases.
#3754 adds a `retry_transaction` function on `TransactionError` that returns `true` if CRDB is telling us to retry, based on the instructions at https://www.cockroachlabs.com/docs/v23.1/transaction-retry-error-reference#client-side-retry-handling.

Part of the review of that PR asked: why not do this for all transactions? If CRDB is telling us that we should be retrying a transaction, then we should. This issue tracks that work.

Retrying in general seems like a safe thing to do, but part of this issue should be convincing ourselves that we're not creating unbounded work for CRDB by retrying every transaction until it succeeds. Note this may also be impossible
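As a sketch of what "CRDB is telling us to retry" can look like in code: CockroachDB's retry errors use SQLSTATE `40001` and include the string `restart transaction` in the message. The `DbError` type and both function names below are hypothetical (this is not the `retry_transaction` from #3754), and the attempt cap is one assumed way of addressing the unbounded-work concern.

```rust
/// Hypothetical, simplified database error; a real error type would
/// carry more context than this.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct DbError {
    pub sqlstate: String,
    pub message: String,
}

/// True when the error matches CockroachDB's documented signal for a
/// client-side retry: SQLSTATE 40001 plus "restart transaction".
pub fn is_crdb_retry_error(err: &DbError) -> bool {
    err.sqlstate == "40001" && err.message.contains("restart transaction")
}

/// Retry only when the database asks for it AND the attempt budget is
/// not yet spent, so a hot conflict cannot cause endless retries.
pub fn should_retry(err: &DbError, attempts_so_far: u32, max_attempts: u32) -> bool {
    is_crdb_retry_error(err) && attempts_so_far < max_attempts
}
```

With a cap like this, a transaction that keeps conflicting eventually surfaces the error to the caller instead of retrying until it succeeds.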