storage: 1PC transactions can be applied twice #10023
This seems error prone as the code evolves.
Ditto.
1PC transactions are only allowed for write-only operations, correct? So what do we need to recover other than whether the transaction committed or not? I'm not seeing what about this problem is specific to 1PC transactions. If a KV batch contains a …
This one doesn't seem that bad to me. We just need two kinds of RPC errors instead of one: one for failing to get an RPC connection and one for an error sending the request. The question is whether that is precise enough for us, or whether the gRPC abstraction prevents us from seeing the distinction we want.
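A minimal sketch of that distinction, with hypothetical error types (not CockroachDB's actual RPC error API):

```go
package rpcerr

import "fmt"

// ConnectionError means no connection to the remote was ever obtained, so
// the remote definitely did not receive the request: a known failure.
type ConnectionError struct{ Err error }

func (e *ConnectionError) Error() string {
	return fmt.Sprintf("failed to get RPC connection: %v", e.Err)
}

// SendError means the failure happened while sending or waiting on the
// request, so the remote may have received and applied it: outcome unknown.
type SendError struct{ Err error }

func (e *SendError) Error() string {
	return fmt.Sprintf("error sending RPC: %v", e.Err)
}
```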
Yep, this seems very tricky and fragile.
It's legal at the KV layer to include a …
How would this be used? Would you propagate the error all the way back to the client in the case of an unknown transaction disposition?
I thought we used this in formatting the error when writing to a unique index, but I can't find any code in sql-land which needs the actual value.
I think a first step is to try to write some tests that replicate this problem. The 1PC issue seems obvious and easily testable, and we could see whether multi-phase transactions have a related problem.
Yeah, I think so. If a request that does not contain EndTransaction ends with an unknown disposition, the …
Ok, maybe this isn't as fragile as I was imagining. @tamird Do we get a good signal from gRPC for when an RPC failed because the remote is down (and definitely didn't receive the RPC) vs. other errors?
We definitely get that signal from our own circuit breaker. We might have to err on the side of caution for the request that trips the breaker and consider it uncertain if we can't tell for sure what gRPC's error means.
Cc @cockroachdb/stability
From the perspective of a client implementer and user, I think it's a great idea to classify every error in your client as known-failed vs indeterminate--maybe via a type (KnownFailure vs MaybeFailure) or a function (indeterminate? err). This is something every application developer has to worry about, so it makes sense to propagate all the way from internals to users. :)
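A rough Go sketch of that classification; the names KnownFailure, MaybeFailure, and Indeterminate are illustrative, not an actual client API:

```go
package client

// KnownFailure means the operation definitely did not apply; it is safe to
// retry or to report a clean error to the application.
type KnownFailure struct{ Cause error }

func (e *KnownFailure) Error() string { return "known failure: " + e.Cause.Error() }

// MaybeFailure means the outcome is unknown (for example, the network failed
// after the request may already have been delivered); a blind retry may
// apply the operation twice.
type MaybeFailure struct{ Cause error }

func (e *MaybeFailure) Error() string { return "indeterminate failure: " + e.Cause.Error() }

// Indeterminate reports whether err leaves the operation's outcome unknown,
// so callers can decide whether retrying is safe.
func Indeterminate(err error) bool {
	_, ok := err.(*MaybeFailure)
	return ok
}
```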
Note that @aphyr, I think the philosophy is that, from a client's perspective, network errors on a …
I assume this happened in practice with Jepsen testing? @bdarnell, I wouldn't expect a …
We strip the begin/end txn requests in …
The timestamp cache is populated at a higher level, so it would see the EndTransaction, and while the next retry does not execute … A quick unit test could really answer what happens here without any guessing.
Agreed about writing a test. Spencer is on it. Spencer pointed out another issue.
I see what's happening. The timestamp cache isn't being updated after success within raft in the event that the caller context is cancelled. This is a major correctness issue. The addition of context cancellations introduced a bug here, but it should be easy to fix. What this means is that Ben's case will now result in a … However, if the leadership changes, then the replay won't be able to return the …
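A hypothetical illustration of the pattern being described (not CockroachDB's actual code): if post-apply bookkeeping is skipped once the caller's context is canceled, the timestamp cache never learns about a write that nevertheless applied.

```go
package replica

import "context"

// updateTSCacheAfterApply sketches the suspect pattern: tying timestamp
// cache maintenance to the caller's context means a canceled caller leaves
// the cache stale even though the command applied successfully, so a later
// replay of the same command would not be detected.
func updateTSCacheAfterApply(ctx context.Context, tsCache map[string]int64, key string, ts int64) {
	if ctx.Err() != nil {
		return // bug: the applied write is never recorded in the cache
	}
	tsCache[key] = ts
}
```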
Yep! Here are two example failure cases, which caused single-statement inserts of unique values into a table to result in duplicate values.
@tschottdorf my analysis was not correct. The … So we're back to the more fundamental problem and unknown solution.
Do you have the unit test?
I abandoned the one I was working on because it was too close to the …

I spent a lot of time last night mulling this over and built a "txn cache" option in the code to measure performance implications. When running block writer on my local machine, performance seems to suffer by ~4%.
I don't see what's wrong with a test that's not more complicated than it needs to be, at least to get a basic idea.
I think the test needs to work using the distributed sender.
Why? There's certainly a need to validate that whatever the fix is works end-to-end, but imo the meat is understanding what happens on the Replica, not orchestrating a complex test across most of the stack right away.
Closed by #10207.
When a 1PC transaction is retried due to a network failure, the second attempt may return an error even if the first attempt succeeded (and the successful result was masked by the network failure). The most likely error to be seen in this case is `WriteTooOldError` (caused by the transaction seeing its own past write, which by now has been committed, resolved, and scrubbed of its transaction ID), which is a `transactionRestartError`. Upon restart, the transaction gets a new timestamp and may perform different writes, which may succeed on a subsequent attempt, leading to the same statement applying twice.

This is the more insidious cousin of #6053 and #7604. Those issues are about the failure leaking back to the client; this one is about the fact that if the operation is retried in some situations, it could succeed.
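As a toy illustration of the end result (a self-contained Go sketch, not CockroachDB code): when each attempt of a logically single insert generates a fresh key, a blind retry after an ambiguous failure leaves two rows behind.

```go
package main

import "fmt"

// insert simulates a single-statement insert whose key is auto-generated on
// each attempt, like an INSERT into a table keyed on a hidden rowid column.
func insert(kv map[int64]string, nextID func() int64, value string) {
	kv[nextID()] = value
}

func main() {
	kv := map[int64]string{}
	var id int64
	nextID := func() int64 { id++; return id }

	// Attempt 1 applies on the server, but the response is lost to a
	// network failure, so the client only sees an ambiguous error.
	insert(kv, nextID, "hello")

	// The client retries; the restarted transaction generates a new key,
	// and the same logical statement applies a second time.
	insert(kv, nextID, "hello")

	fmt.Println(len(kv)) // 2 rows for one logical insert
}
```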
Here is one concrete case in which the error can occur:

- … (… `rowid` column) and has no secondary indexes

The response cache (removed in #3077 because it was too expensive and retries of non-transactional requests were deemed less important; the 1PC optimization makes its requests effectively non-transactional at the KV layer) handled this by remembering the response so the retry would get the same response.
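A minimal sketch of the response-cache idea, assuming requests carry a stable client-assigned command ID (the types and names here are illustrative, not the removed implementation):

```go
package storage

// responseCache remembers the outcome of each command by its client-assigned
// ID so that a retried command returns the original response instead of
// being applied a second time.
type responseCache struct {
	byCmdID map[string][]byte // command ID -> encoded response
}

func newResponseCache() *responseCache {
	return &responseCache{byCmdID: map[string][]byte{}}
}

// Lookup returns the cached response for cmdID, if the command has already applied.
func (c *responseCache) Lookup(cmdID string) ([]byte, bool) {
	resp, ok := c.byCmdID[cmdID]
	return resp, ok
}

// Record stores the response after the command has applied.
func (c *responseCache) Record(cmdID string, resp []byte) {
	c.byCmdID[cmdID] = resp
}
```

In such a scheme, a replica would consult Lookup before applying a command and call Record afterward; persisting and replicating those entries is the cost that led to the cache's removal in #3077.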
A few possible solutions that don't require removing the 1PC optimization or bringing back the response cache: