Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add function for automatic transaction retry #58
Add function for automatic transaction retry #58
Changes from 10 commits
4709682
a784d4b
d5136d2
478534c
0acaf8a
5adc123
db8e8bd
858e5ed
14bd98b
131f285
0a048d2
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like
SerializationFailure
is defined to be synonymous with code 40001, which is the condition that the CockroachDB docs say to look for. I think we shouldn't have to inspect the message? (Not sure we shouldn't. But I'm not sure what we should do if we find a different message here.)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm happy to just ignore the error message, and rely on the error code. Not sure what to do if these are inconsistent -- I think that's more a question for "how much do we trust our database", because them being out-of-sync seems like undocumented territory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like we want to document here that this must not be used from within a transaction? Or is there some way to ensure that at compile-time?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure how to ensure this at compile-time -- whether or not we're in a transaction is information attached to each connection, but it does not change the underlying "type" of the connection. If we wanted to embed this info here, I think we'd need to:
self
, and creating a new connection type (indicating we're in a transaction)self
again and reduce the transaction levelThat said, we do check for it at run-time, and I have a test that catches this case. I'll add documentation indicating that a runtime error would be returned here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The CockroachDB docs don't address what happens if this fails. I gather from the docs that by releasing the savepoint, we've already durably committed everything we've done, so even if the COMMIT fails, the operation has happened and I guess we could ignore the error altogether. (It also seems fine to not ignore the error. The only errors I could see here are communication-level ones -- e.g., a broken TCP connection. In those cases the caller will have to assume the operation may have happened.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(added the suggested comment)
Yeah, from reading https://www.cockroachlabs.com/docs/v23.1/commit-transaction , it seems like
COMMIT
"clears the connection to allow new transactions to begin" in the context of client-side transaction retries.This is a really weird case -- I agree that the last
RELEASE
is the thing documented to actually make the transaction durable. I'm truly unsure of whether or not it makes more sense to:I kinda lean towards (1) -- I think this is the case for any DB operation (not just transactions). A request could be sent, processed by the DB, and the response could be lost. If the caller sees a TCP error (or any other "unhandled" error), they may or may not have completed the operation, and so need to accept that the state of the DB is "unknown" and in need of reconciliation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What state are we in at this point? It seems like we failed a
ROLLBACK TO SAVEPOINT
and at that point. The CockroachDB docs don't say anything about errors or failures here and I'm not sure if we're still "in" the nested transaction or what.Might we wind up returning from this function while still in the outermost transaction?
edit: having look at the other thread, I see that
AnsiTransactionManager
tries to deal with stuff like this:https://docs.diesel.rs/master/src/diesel/connection/transaction_manager.rs.html#400-401
Is that behavior a problem for us? e.g., is it possible that we try to do the
rollback to savepoint
, but diesel winds up doing that, it fails, and then it also rolls back theBEGIN
and we've lost track of what depth we're at?This is more of a general comment and I'm not sure what we can do about this: it feels like there's some critical state here (like "transaction depth" and "are we moving forward or backward") along with some guarantees made by various Diesel functions and async-bb8-functions, plus invariants maintained by the loop, etc. But I don't really understand what they all are. I don't think this is an async-bb8-diesel problem per se. But the net result is that as a reader I find it pretty hard to understand what's true at each point and have confidence in the code's correctness.
It might help [me?] to draw a state diagram. With that in hand, I wonder if an implementation could use this (maybe with something like typestates) to be more clearly, structurally correct. This sounds like a lot of work though and if we're all satisfied that this is correct as is it may not be worth it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It depends on how the
rollback_transaction
method has failed.batch_execute
is called -- which happens for any of these SQL operations (BEGIN, SAVEPOINT, ROLLBACK, COMMIT, etc) -- Diesel callsupdate_transaction_manager_status
using the returned error code, which can possibly update the transaction state: https://docs.diesel.rs/master/src/diesel/pg/connection/mod.rs.html#135set_requires_rollback_maybe_up_to_top_level
(https://docs.diesel.rs/master/src/diesel/connection/transaction_manager.rs.html#147-151) which allows SAVEPOINTS to be released without throwing subsequent errors, until we've finished rolling back to the top-level.So, to answer your original question, "what state are we in", the answer is "it depends".
1
from Diesel's perspective.ROLLBACK
after getting into this error case.In some situations: the transaction manager will be "broken" and we'll expect all further Diesel operations to return: https://docs.diesel.rs/master/diesel/result/enum.Error.html#variant.BrokenTransactionManager
In other situations: the
ROLLBACK
error could have come from cockroachdb directly.The Diesel interface tries really hard to ensure that the "transaction depth" is consistent in the success/failure cases -- basically, regardless of whether "rollback" or "commit" succeeds or fails, there are two outcomes:
set_in_error
). Once this happens, basically all subsequent diesel operations return a "broken transaction manager" error.I don't think so - I think this would violate the aforementioned principle of "only change the transaction depth by at most one level".
I agree that typestates could help here, but I'm not sure how much value they have in
async-bb8-diesel
alone -- it kinda sounds like the property you want extends into Diesel's implementation of transactions too? I have a fear of managing the "transaction depth" insideasync-bb8-diesel
via typestates, while a client using the API could just skip theasync-bb8-diesel
level and send requests directly through Diesel -- which would violate our higher-level expectations.I started to write this up in https://docs.google.com/drawings/d/1JDU5-LsohXP7kEj_9ZFJUz64XyUXNJhZTu4J02sUYV4/edit?usp=sharing -- just trying to explain what Diesel is doing -- but go figure, it's complicated. It wasn't written as a state machine, so this is basically an expansion of code logic into state machine form -- if we want something that looks like a cleaner state machine, we'll need to re-write Diesel's TransactionManager.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I took a swing at making a state machine diagram for the CockroachDB client-side retry behavior. I intended to finish it and compare it with the code, but haven't been able to prioritize it. I figured I'd drop it here for posterity in case it's useful in the future. (View source to see the underlyaing mermaid source.)