Transaction commit failure: avoid retry by default? #22904

Timovzl · 2020-10-06T13:44:32Z

With regards to failure on transaction commit, the docs state:

By default, the execution strategy will retry the operation as if the transaction was rolled back, but if it's not the case this will result in an exception if the new database state is incompatible or could lead to data corruption if the operation does not rely on a particular state, for example when inserting a new row with auto-generated key values.

In contrast to other Entity Framework defaults, this seems like an unsafe default.

If a drastic operation results in a failure on commit, which would we prefer?

Risk performing the operation twice, which also implies misinforming the client.
Risk being unable to tell the client what the result was.

Let's say that a client attempts to make a payment. Performing it twice could have bad consequences.

However, if we were to return an internal server error (since we did not catch the exception of this unexpected and highly unlikely failure), then we have technically informed the client that the result is unsure. Experienced developers know this, and can act accordingly. (Sure, some clients may assume that the attempt has failed, but that is a mistake. At least we have been as clear as possible.)

I'm aware that for optimal results we could manually implement verifySucceeded for each method, but that adds a lot of work and code overhead. More importantly, it's no reason to keep us from choosing the best possible defaults!

Should we have retrying execution strategies not retry by default when a failure on commit occurs?

The text was updated successfully, but these errors were encountered:

AndriySvyryd · 2020-10-06T21:20:22Z

@Timovzl Ideally, yes we shouldn't retry on commit failure, but if you look at https://github.com/dotnet/efcore/blob/b970bf29a46521f40862a01db9e276e6448d3cb0/src/EFCore/Storage/IExecutionStrategy.cs the Commit call is just part of the opaque operation. We could add a transaction-aware interface that would be used in SaveChanges when EF is managing the transaction, but if the user is calling Commit they would still need to be aware of this behavior.

Timovzl · 2020-10-07T09:11:00Z

@AndriySvyryd I completely forgot - it's the application itself that invokes the commit! Doing something only for the automatic transaction that SaveChanges creates seems too limited.

Could the provider intercept TransactionStarted and TransactionCommitted/TransactionRolledBack/TransactionFailed, and suppress retries in between?

For example, one implementation could be to catch [retryable] exceptions happening during this time, and rethrow them wrapped in an exception of a type that is not retried.

roji · 2020-10-07T14:00:03Z

@Timovzl what kind of commit errors do you have in mind? As far as I know (coming from PostgreSQL), any server error raised out of commit should mean that the commit failed, rather than unsure - can you point to docs to the contrary? Loss of network connectivity could indeed generate a network error which leaves the commit status uncertain, but that's a different category of errors, and any retries will likely (but not surely) fail.

Timovzl · 2020-10-07T14:37:49Z

@roji I'm referring specifically to this (emphasis mine):

By default, the execution strategy will retry the operation as if the transaction was rolled back, but if it's not the case this will result in an exception if the new database state is incompatible or could lead to data corruption if the operation does not rely on a particular state, for example when inserting a new row with auto-generated key values.

Indeed, if something happens to the connection while the commit is past the point of no return, then the application will not be aware of the successful commit. As such, with an exception on commit, the application cannot know the state of the transaction in the database. (Granted, certain exception types may indicate that the transaction is definitely not committed.)

I can think of a few things that might break the connection. A severed cable. The firewall. A failing emergency power supply. Maybe a serverless database being moved onto another physical machine (although I strongly hope and suspect that the old instance is shut down cleanly).

[...] and any retries will likely (but not surely) fail.

That's... optimistic! :) If the application does not know whether the transaction was committed or not, retrying is dangerous. Doing so while the transaction was already committed has one of three potential results:

Best case, if the operation is idempotent, the retry succeeds and the result is correct.
Neutral case, the retry results in an exception.
- For example, an entity with a code-generated ID is Added a second time.
Worst case, the retry results in an unintended effect.
- For example, a monetary balance was loaded, incremented, and saved. The retry would double the effect.

While the best case is nice, the worst case can be bad for business! It would be much safer if we always got the neutral case (in this very rare fail-on-commit scenario). By not retrying during commits, and instead letting exceptions bubble up, we always get the neutral case.

I'm curious to know if you think suppressing retries between TransactionStarted and TransactionCommitted/TransactionRolledBack/TransactionFailed would do the trick.

roji · 2020-10-07T15:09:30Z

The main point I was trying to make is the distinction between network errors between the driver (e.g. SqlClient) and the database (e.g. SQL Server) - which can indeed result in unknown commit state - and errors successfully reported from the database to the driver. To the extent of my knowledge, if a database successfully reports an error when commit occurs, the commit failed rather than being in an unsure state (but I'm not SQL Server expert).

In other words, if we special-case commit here in any way, that may be for client-server networking errors only.

ajcvickers · 2020-10-07T15:14:52Z

@roji It's probably worth talking to @AndriySvyryd on this. This issue with SqlClient has been investigated extensively, and that's where this guidance comes from.

AndriySvyryd · 2020-10-07T23:31:08Z

In other words, if we special-case commit here in any way, that may be for client-server networking errors only.

That would cover the vast majority of transient errors.

roji · 2020-10-08T05:55:10Z

I think that depends, deadlocks are transient and can happen very frequently in some user scenarios (not sure those can happen on commit though); these can definitely be safely, no?

AndriySvyryd · 2020-10-08T07:57:00Z

I think non-connection transient errors on commit are rare enough that we don't have to worry about them. And the best way of handling any commit errors would still be supplying verifySucceeded function.

Timovzl · 2020-10-08T08:12:39Z

I'm guessing when verifySucceeded is non-null the behavior will stay the same - whether we retry simply depends on its return value?

As for deadlocks, they usually happen earlier than commit time, right?

AndriySvyryd · 2020-10-08T17:59:43Z

@Timovzl Yes and yes

roji · 2020-10-08T18:16:36Z

We should be careful about behavior we assume across databases (e.g. what errors can occur at commit time).

Since we're concentrating on networking transient errors, what would we do if the verifySucceeded function itself fails (because the network isn't working)?

AndriySvyryd · 2020-10-08T18:22:41Z

We should be careful about behavior we assume across databases (e.g. what errors can occur at commit time).

Not retrying should still be a safer default for any database.

Since we're concentrating on networking transient errors, what would we do if the verifySucceeded function itself fails (because the network isn't working)?

It's retried according to the current execution strategy. If a fatal exception is thrown it will not be handled.

roji · 2020-10-08T18:35:57Z

OK, thanks for the explanations @AndriySvyryd.

AndriySvyryd · 2022-10-20T19:27:28Z

With #10119 we can continue retrying safely

stevendarby · 2022-10-20T20:15:14Z

On this point:

I'm aware that for optimal results we could manually implement verifySucceeded for each method, but that adds a lot of work and code overhead.

Would a simple way to do this currently be to implement a simple verifySucceeded that just returns false? And maybe a custom extension to wrap this up for less code overhead?

Timovzl · 2023-01-09T13:28:50Z

With #10119 we can continue retrying safely

@AndriySvyryd I really wish that were true, but I don't believe it is. IExecutionStrategy executes arbitrary user code, so we cannot make assumptions about what we are retrying.

One (very proper) example of such user code is:

Generate a key (in code).
Create an entity with the generated key value.
Save changes.
Commit.

If the commit fails, even if SaveChanges were idempotent, the encompassing execution strategy still must not retry, because we'd still have a duplicate insert on our hands.

Timovzl · 2023-01-09T13:41:16Z

Would a simple way to do this currently be to implement a simple verifySucceeded that just returns false? And maybe a custom extension to wrap this up for less code overhead?

@AndriySvyryd or @roji, this suggestion looks promising as a workaround. Unfortunately, verifySucceeded appears to get evaluated even if the failure did not occur while committing (judging from ExecutionStrategy and SqlSeverRetryingExecutionStrategy, at least).

Do you see any alternative workarounds?

AndriySvyryd · 2023-01-11T20:38:29Z

@AndriySvyryd I really wish that were true, but I don't believe it is. IExecutionStrategy executes arbitrary user code, so we cannot make assumptions about what we are retrying.

You are right. That enhancement would only help the cases where the IExecutionStrategy is used by SaveChanges and doesn't contain user code

@AndriySvyryd or @roji, this suggestion looks promising as a workaround. Unfortunately, verifySucceeded appears to get evaluated even if the failure did not occur while committing (judging from ExecutionStrategy and SqlSeverRetryingExecutionStrategy, at least).

Assuming that the operation failed is dangerous (e.g. when inserting rows with generated keys) that's why we recommend going this route.

Timovzl added the customer-reported label Oct 6, 2020

AndriySvyryd added area-save-changes type-enhancement labels Oct 9, 2020

AndriySvyryd added this to the Backlog milestone Oct 9, 2020

AndriySvyryd mentioned this issue Jan 14, 2022

Improve database updates #22959

Open

52 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Transaction commit failure: avoid retry by default? #22904

Transaction commit failure: avoid retry by default? #22904

Timovzl commented Oct 6, 2020

AndriySvyryd commented Oct 6, 2020

Timovzl commented Oct 7, 2020

roji commented Oct 7, 2020

Timovzl commented Oct 7, 2020

roji commented Oct 7, 2020

ajcvickers commented Oct 7, 2020

AndriySvyryd commented Oct 7, 2020

roji commented Oct 8, 2020

AndriySvyryd commented Oct 8, 2020

Timovzl commented Oct 8, 2020

AndriySvyryd commented Oct 8, 2020

roji commented Oct 8, 2020

AndriySvyryd commented Oct 8, 2020

roji commented Oct 8, 2020

AndriySvyryd commented Oct 20, 2022

stevendarby commented Oct 20, 2022

Timovzl commented Jan 9, 2023

Timovzl commented Jan 9, 2023

AndriySvyryd commented Jan 11, 2023

Transaction commit failure: avoid retry by default? #22904

Transaction commit failure: avoid retry by default? #22904

Comments

Timovzl commented Oct 6, 2020

AndriySvyryd commented Oct 6, 2020

Timovzl commented Oct 7, 2020

roji commented Oct 7, 2020

Timovzl commented Oct 7, 2020

roji commented Oct 7, 2020

ajcvickers commented Oct 7, 2020

AndriySvyryd commented Oct 7, 2020

roji commented Oct 8, 2020

AndriySvyryd commented Oct 8, 2020

Timovzl commented Oct 8, 2020

AndriySvyryd commented Oct 8, 2020

roji commented Oct 8, 2020

AndriySvyryd commented Oct 8, 2020

roji commented Oct 8, 2020

AndriySvyryd commented Oct 20, 2022

stevendarby commented Oct 20, 2022

Timovzl commented Jan 9, 2023

Timovzl commented Jan 9, 2023

AndriySvyryd commented Jan 11, 2023