Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Transaction commit failure: avoid retry by default? #22904

Open
Tracked by #22959
Timovzl opened this issue Oct 6, 2020 · 19 comments
Open
Tracked by #22959

Transaction commit failure: avoid retry by default? #22904

Timovzl opened this issue Oct 6, 2020 · 19 comments

Comments

@Timovzl
Copy link

Timovzl commented Oct 6, 2020

With regards to failure on transaction commit, the docs state:

By default, the execution strategy will retry the operation as if the transaction was rolled back, but if it's not the case this will result in an exception if the new database state is incompatible or could lead to data corruption if the operation does not rely on a particular state, for example when inserting a new row with auto-generated key values.

In contrast to other Entity Framework defaults, this seems like an unsafe default.

If a drastic operation results in a failure on commit, which would we prefer?

  1. Risk performing the operation twice, which also implies misinforming the client.
  2. Risk being unable to tell the client what the result was.

Let's say that a client attempts to make a payment. Performing it twice could have bad consequences.

However, if we were to return an internal server error (since we did not catch the exception of this unexpected and highly unlikely failure), then we have technically informed the client that the result is unsure. Experienced developers know this, and can act accordingly. (Sure, some clients may assume that the attempt has failed, but that is a mistake. At least we have been as clear as possible.)

I'm aware that for optimal results we could manually implement verifySucceeded for each method, but that adds a lot of work and code overhead. More importantly, it's no reason to keep us from choosing the best possible defaults!

Should we have retrying execution strategies not retry by default when a failure on commit occurs?

@AndriySvyryd
Copy link
Member

@Timovzl Ideally, yes we shouldn't retry on commit failure, but if you look at https://github.com/dotnet/efcore/blob/b970bf29a46521f40862a01db9e276e6448d3cb0/src/EFCore/Storage/IExecutionStrategy.cs the Commit call is just part of the opaque operation. We could add a transaction-aware interface that would be used in SaveChanges when EF is managing the transaction, but if the user is calling Commit they would still need to be aware of this behavior.

@Timovzl
Copy link
Author

Timovzl commented Oct 7, 2020

@AndriySvyryd I completely forgot - it's the application itself that invokes the commit! Doing something only for the automatic transaction that SaveChanges creates seems too limited.

Could the provider intercept TransactionStarted and TransactionCommitted/TransactionRolledBack/TransactionFailed, and suppress retries in between?

For example, one implementation could be to catch [retryable] exceptions happening during this time, and rethrow them wrapped in an exception of a type that is not retried.

@roji
Copy link
Member

roji commented Oct 7, 2020

@Timovzl what kind of commit errors do you have in mind? As far as I know (coming from PostgreSQL), any server error raised out of commit should mean that the commit failed, rather than unsure - can you point to docs to the contrary? Loss of network connectivity could indeed generate a network error which leaves the commit status uncertain, but that's a different category of errors, and any retries will likely (but not surely) fail.

@Timovzl
Copy link
Author

Timovzl commented Oct 7, 2020

@roji I'm referring specifically to this (emphasis mine):

By default, the execution strategy will retry the operation as if the transaction was rolled back, but if it's not the case this will result in an exception if the new database state is incompatible or could lead to data corruption if the operation does not rely on a particular state, for example when inserting a new row with auto-generated key values.

Indeed, if something happens to the connection while the commit is past the point of no return, then the application will not be aware of the successful commit. As such, with an exception on commit, the application cannot know the state of the transaction in the database. (Granted, certain exception types may indicate that the transaction is definitely not committed.)

I can think of a few things that might break the connection. A severed cable. The firewall. A failing emergency power supply. Maybe a serverless database being moved onto another physical machine (although I strongly hope and suspect that the old instance is shut down cleanly).

[...] and any retries will likely (but not surely) fail.

That's... optimistic! :) If the application does not know whether the transaction was committed or not, retrying is dangerous. Doing so while the transaction was already committed has one of three potential results:

  • Best case, if the operation is idempotent, the retry succeeds and the result is correct.
  • Neutral case, the retry results in an exception.
    • For example, an entity with a code-generated ID is Added a second time.
  • Worst case, the retry results in an unintended effect.
    • For example, a monetary balance was loaded, incremented, and saved. The retry would double the effect.

While the best case is nice, the worst case can be bad for business! It would be much safer if we always got the neutral case (in this very rare fail-on-commit scenario). By not retrying during commits, and instead letting exceptions bubble up, we always get the neutral case.

I'm curious to know if you think suppressing retries between TransactionStarted and TransactionCommitted/TransactionRolledBack/TransactionFailed would do the trick.

@roji
Copy link
Member

roji commented Oct 7, 2020

The main point I was trying to make is the distinction between network errors between the driver (e.g. SqlClient) and the database (e.g. SQL Server) - which can indeed result in unknown commit state - and errors successfully reported from the database to the driver. To the extent of my knowledge, if a database successfully reports an error when commit occurs, the commit failed rather than being in an unsure state (but I'm not SQL Server expert).

In other words, if we special-case commit here in any way, that may be for client-server networking errors only.

@ajcvickers
Copy link
Member

@roji It's probably worth talking to @AndriySvyryd on this. This issue with SqlClient has been investigated extensively, and that's where this guidance comes from.

@AndriySvyryd
Copy link
Member

In other words, if we special-case commit here in any way, that may be for client-server networking errors only.

That would cover the vast majority of transient errors.

@roji
Copy link
Member

roji commented Oct 8, 2020

I think that depends, deadlocks are transient and can happen very frequently in some user scenarios (not sure those can happen on commit though); these can definitely be safely, no?

@AndriySvyryd
Copy link
Member

I think non-connection transient errors on commit are rare enough that we don't have to worry about them. And the best way of handling any commit errors would still be supplying verifySucceeded function.

@Timovzl
Copy link
Author

Timovzl commented Oct 8, 2020

I'm guessing when verifySucceeded is non-null the behavior will stay the same - whether we retry simply depends on its return value?

As for deadlocks, they usually happen earlier than commit time, right?

@AndriySvyryd
Copy link
Member

@Timovzl Yes and yes

@roji
Copy link
Member

roji commented Oct 8, 2020

We should be careful about behavior we assume across databases (e.g. what errors can occur at commit time).

Since we're concentrating on networking transient errors, what would we do if the verifySucceeded function itself fails (because the network isn't working)?

@AndriySvyryd
Copy link
Member

We should be careful about behavior we assume across databases (e.g. what errors can occur at commit time).

Not retrying should still be a safer default for any database.

Since we're concentrating on networking transient errors, what would we do if the verifySucceeded function itself fails (because the network isn't working)?

It's retried according to the current execution strategy. If a fatal exception is thrown it will not be handled.

@roji
Copy link
Member

roji commented Oct 8, 2020

OK, thanks for the explanations @AndriySvyryd.

@AndriySvyryd
Copy link
Member

With #10119 we can continue retrying safely

@stevendarby
Copy link
Contributor

On this point:

I'm aware that for optimal results we could manually implement verifySucceeded for each method, but that adds a lot of work and code overhead.

Would a simple way to do this currently be to implement a simple verifySucceeded that just returns false? And maybe a custom extension to wrap this up for less code overhead?

@Timovzl
Copy link
Author

Timovzl commented Jan 9, 2023

With #10119 we can continue retrying safely

@AndriySvyryd I really wish that were true, but I don't believe it is. IExecutionStrategy executes arbitrary user code, so we cannot make assumptions about what we are retrying.

One (very proper) example of such user code is:

  • Generate a key (in code).
  • Create an entity with the generated key value.
  • Save changes.
  • Commit.

If the commit fails, even if SaveChanges were idempotent, the encompassing execution strategy still must not retry, because we'd still have a duplicate insert on our hands.

@Timovzl
Copy link
Author

Timovzl commented Jan 9, 2023

Would a simple way to do this currently be to implement a simple verifySucceeded that just returns false? And maybe a custom extension to wrap this up for less code overhead?

@AndriySvyryd or @roji, this suggestion looks promising as a workaround. Unfortunately, verifySucceeded appears to get evaluated even if the failure did not occur while committing (judging from ExecutionStrategy and SqlSeverRetryingExecutionStrategy, at least).

Do you see any alternative workarounds?

@AndriySvyryd
Copy link
Member

@AndriySvyryd I really wish that were true, but I don't believe it is. IExecutionStrategy executes arbitrary user code, so we cannot make assumptions about what we are retrying.

You are right. That enhancement would only help the cases where the IExecutionStrategy is used by SaveChanges and doesn't contain user code

@AndriySvyryd or @roji, this suggestion looks promising as a workaround. Unfortunately, verifySucceeded appears to get evaluated even if the failure did not occur while committing (judging from ExecutionStrategy and SqlSeverRetryingExecutionStrategy, at least).

Assuming that the operation failed is dangerous (e.g. when inserting rows with generated keys) that's why we recommend going this route.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants