TransactionAbortedException when performing queries in parallel inside a transaction scope #1675

joostmeijles · 2022-07-20T14:12:02Z

Describe the bug

When performing multiple queries in parallel, and each pair of 2 queries are inside a transaction scope, an unexpected transaction error is thrown (see below).

Exception message: System.Transactions.TransactionAbortedException: The transaction has aborted.
Stack trace:
Unhandled exception. System.Transactions.TransactionAbortedException: The transaction has aborted.
 ---> System.Transactions.TransactionPromotionException: Failure while attempting to promote transaction.
 ---> System.InvalidOperationException: The requested operation cannot be completed because the connection has been broken.
   at Microsoft.Data.SqlClient.SqlInternalConnectionTds.ExecuteTransaction(TransactionRequest transactionRequest, String name, IsolationLevel iso, SqlInternalTransaction internalTransaction, Boolean isDelegateControlRequest)
   at Microsoft.Data.SqlClient.SqlDelegatedTransaction.Promote()
   --- End of inner exception stack trace ---
   at Microsoft.Data.SqlClient.SqlDelegatedTransaction.Promote()
   at System.Transactions.TransactionStatePSPEOperation.PSPEPromote(InternalTransaction tx)
   at System.Transactions.TransactionStateDelegatedBase.EnterState(InternalTransaction tx)
   --- End of inner exception stack trace ---
   at System.Transactions.TransactionStateAborted.CheckForFinishedTransaction(InternalTransaction tx)
   at System.Transactions.EnlistableStates.Promote(InternalTransaction tx)
   at System.Transactions.Transaction.Promote()
   at System.Transactions.TransactionInterop.ConvertToDistributedTransaction(Transaction transaction)
   at System.Transactions.TransactionInterop.GetExportCookie(Transaction transaction, Byte[] whereabouts)
   at Microsoft.Data.SqlClient.SqlInternalConnection.GetTransactionCookie(Transaction transaction, Byte[] whereAbouts)
   at Microsoft.Data.SqlClient.SqlInternalConnection.EnlistNonNull(Transaction tx)
   at Microsoft.Data.ProviderBase.DbConnectionInternal.ActivateConnection(Transaction transaction)
   at Microsoft.Data.ProviderBase.DbConnectionPool.PrepareConnection(DbConnection owningObject, DbConnectionInternal obj, Transaction transaction)
   at Microsoft.Data.ProviderBase.DbConnectionPool.TryGetConnection(DbConnection owningObject, UInt32 waitForMultipleObjectsTimeout, Boolean allowCreate, Boolean onlyOneCheckConnection, DbConnectionOptions userOptions, DbConnectionInterna
l& connection)
   at Microsoft.Data.ProviderBase.DbConnectionPool.TryGetConnection(DbConnection owningObject, TaskCompletionSource`1 retry, DbConnectionOptions userOptions, DbConnectionInternal& connection)
   at Microsoft.Data.ProviderBase.DbConnectionFactory.TryGetConnection(DbConnection owningConnection, TaskCompletionSource`1 retry, DbConnectionOptions userOptions, DbConnectionInternal oldConnection, DbConnectionInternal& connection)
   at Microsoft.Data.ProviderBase.DbConnectionInternal.TryOpenConnectionInternal(DbConnection outerConnection, DbConnectionFactory connectionFactory, TaskCompletionSource`1 retry, DbConnectionOptions userOptions)
   at Microsoft.Data.SqlClient.SqlConnection.TryOpen(TaskCompletionSource`1 retry, SqlConnectionOverrides overrides)
   at Microsoft.Data.SqlClient.SqlConnection.InternalOpenAsync(CancellationToken cancellationToken)
--- End of stack trace from previous location ---

To reproduce

Below code reproduces the error. It runs 100K times 2 queries in parallel where each 2 queries are inside a single transaction scope.
Note that after each query the database connection is disposed (and thus closed). This is important to trigger the error.
When we change the code, and use 1 connection for both queries and only open 1 connection the error does not occur (see also code comments below).

using System.Transactions;
using Microsoft.Data.SqlClient;

static async Task PerformTransactionWithQuery(int num)
{
    try
    {
        using (new TransactionScope(TransactionScopeAsyncFlowOption.Enabled))
        {
            // Adding "max pool size=1000;" to the connection string seems to trigger the problem less often
            string connStr = @"Server=.\SQLEXPRESS;Database=master;Trusted_Connection=True;Encrypt=False";

            var query = "SELECT COUNT(*) FROM sys.dm_tran_active_transactions";

            await using (var dbConn = new SqlConnection(connStr))
            {
                await dbConn.OpenAsync();

                await using (var command1 = new SqlCommand(query, dbConn))
                {
                    await command1.ExecuteScalarAsync();
                }
            } // Connection is disposed (and thus closed)

            await using (var dbConn = new SqlConnection(connStr)) 
            {
                // Reopening the connection triggers the following error:
                // System.Transactions.TransactionAbortedException: The transaction has aborted.
                //
                // NB. Using a single connection and opening it once does NOT trigger the error
                await dbConn.OpenAsync();
                
                await using (var command2 = new SqlCommand(query, dbConn))
                {
                    await command2.ExecuteScalarAsync();
                }
            } // Connection is disposed (and thus closed)
            
            //Do not complete transaction
        }
    }
    catch (Exception e)
    {
        Console.WriteLine($"Failed {num}");
        throw;
    }
} 

var tasks = Enumerable.Range(0, 100000).ToList().Select(PerformTransactionWithQuery);
await Task.WhenAll(tasks);

Expected behavior

Being able to use multiple connections (with the same connection string) in sequence within the same transaction without errors.

Further technical details

Microsoft.Data.SqlClient version: 4.1.0
.NET target: .NET 6
SQL Server version: SQL Server 2019
Operating system: Windows 11

The text was updated successfully, but these errors were encountered:

DavoudEshtehari · 2022-07-22T00:22:01Z

@joostmeijles Thank you for the repro. I'll back to you if I had any questions to reproduce the issue.

joostmeijles · 2022-08-01T06:30:33Z

@DavoudEshtehari any update on an ETA to verify that this is a bug and get it fixed?

DavoudEshtehari · 2022-08-04T19:16:34Z

@joostmeijles I confirm it's reproducible with MDS and we'll return to it after the next release. I'm afraid we can't provide any estimation for a fix while it's in the investigation stage. We'll keep updating here with any findings, and we welcome any contribution as this is an open-source driver.

JRahnama · 2022-08-05T18:18:20Z

@joostmeijles can you test with a reduced MaxPoolSize to 20? I wonder if this is something similar to #422?

joostmeijles · 2022-08-08T06:44:55Z

@joostmeijles can you test with a reduced MaxPoolSize to 20? I wonder if this is something similar to #422?

Changing the pool size to 20 does not change much, but setting it very large (e.g. 1000) seems to trigger the problem less often.

JRahnama · 2022-08-08T16:24:15Z

@joostmeijles one more question, were you able to see the same problem on Windows 10?

JRahnama · 2022-08-08T18:10:03Z

@joostmeijles regarding the comment in the repro

// NB. Using a single connection and opening it once does NOT trigger the error

does this mean if you open a connection first and then try the rest of the code is working fine or just a single connection?

joostmeijles · 2022-08-09T06:28:17Z

@joostmeijles regarding the comment in the repro
// NB. Using a single connection and opening it once does NOT trigger the error
does this mean if you open a connection first and then try the rest of the code is working fine or just a single connection?

It means changing the code to:

await using (var dbConn = new SqlConnection(connStr))
{
    await dbConn.OpenAsync();

    await using (var command1 = new SqlCommand(query, dbConn))
    {
        await command1.ExecuteScalarAsync();
    }

    await using (var command2 = new SqlCommand(query, dbConn))
    {
        await command2.ExecuteScalarAsync();
    }
} // Connection is disposed (and thus closed)

does not trigger the error.

joostmeijles · 2022-08-09T11:06:37Z

@joostmeijles one more question, were you able to see the same problem on Windows 10?

Yes, same.

JRahnama · 2022-08-09T17:38:06Z

@joostmeijles couple of things I noticed:

If we set the pooling to false the issue does not show up (this confirms the issue only occurs with pooling)
if you set SqlConnection.ClearAllPools(); before the second using of dbConn your code should be working fine as well.
If we start with one connection first and then try to open the rest the connections the issue does not com up.
The above makes me suspicious that we need to adjust the code to do pool warming.
Too many parallel connections are trying to create the very same pool. If we let one connection to get opened a pool is stablished and the rest will use that.

David-Engel · 2022-08-09T18:37:32Z

Edit: Timeout isn't the issue here.

It sounds to me like your transaction is timing out. With 100,000 parallel operations and a limited connection pool and thread pool, there will likely be a significant delay between the first query completing, the connection closing, and the second connection being obtained from the pool during the transaction. Idle transactions aren't kept around forever. The default transaction timeout is only one minute.

Useful blog post: https://docs.microsoft.com/en-za/archive/blogs/dbrowne/using-new-transactionscope-considered-harmful

joostmeijles · 2022-08-10T06:37:18Z

It sounds to me like your transaction is timing out. With 100,000 parallel operations and a limited connection pool and thread pool, there will likely be a significant delay between the first query completing, the connection closing, and the second connection being obtained from the pool during the transaction. Idle transactions aren't kept around forever. The default transaction timeout is only one minute.

Useful blog post: https://docs.microsoft.com/en-za/archive/blogs/dbrowne/using-new-transactionscope-considered-harmful

An error possibly occur within seconds, so doesn't seem to hit a timeout. Also, the 100K parallel operations are just to make it easier to reproduce. I have seen the issue also with less (~100) parallel operations.

Creating the transaction scope as follows (as mentioned in the blog post + async option) results in the same error.

static TransactionScope CreateTransactionScope()
{
    var transactionOptions = new TransactionOptions();
    transactionOptions.IsolationLevel = IsolationLevel.ReadCommitted;
    transactionOptions.Timeout = TransactionManager.MaximumTimeout;
    return new TransactionScope(TransactionScopeOption.Required, transactionOptions, TransactionScopeAsyncFlowOption.Enabled);
}

joostmeijles · 2022-08-10T06:41:48Z

@joostmeijles couple of things I noticed:

If we set the pooling to false the issue does not show up (this confirms the issue only occurs with pooling)

if you set SqlConnection.ClearAllPools(); before the second using of dbConn your code should be working fine as well.

If we start with one connection first and then try to open the rest the connections the issue does not com up.
The above makes me suspicious that we need to adjust the code to do pool warming.
Too many parallel connections are trying to create the very same pool. If we let one connection to get opened a pool is stablished and the rest will use that.

Without connection pooling (on the connection string or by clearing the pool before the second connection) System.PlatformNotSupportedException: This platform does not support distributed transactions. will be thrown immediately.

DavoudEshtehari · 2022-10-25T00:32:15Z

Just posting an update:

I could reproduce this issue with Microsoft.Data.SqlClient since version 1.1.4 against supported target frameworks (.NET/.NET Core or .NET Framework). And System.Data.SqlClient doesn't throw "TransactionAbortedException".

This issue has been introduced by PR #543 for issue #237, which explains why it doesn't occur against SDS.

DavoudEshtehari · 2022-10-25T00:39:35Z

@joostmeijles Using the NuGet package from the pipeline that is introduced with PR #1814 I experienced some improvement with this issue. May I ask you to verify it with the provided link, please?

joostmeijles · 2022-10-25T09:09:39Z

@DavoudEshtehari thanks for the update! Sounds as a good change, but unfortunately I am still getting an error, although its a different one:

Unhandled exception. System.InvalidOperationException: The requested operation cannot be completed because the connection has been broken.
   at Microsoft.Data.SqlClient.SqlInternalConnectionTds.ExecuteTransaction(TransactionRequest transactionRequest, String name, IsolationLevel iso, SqlInternalTransaction internalTransaction, Boolean isDelegateControlRequest)
   at Microsoft.Data.SqlClient.SqlDelegatedTransaction.Initialize()
   at System.Transactions.TransactionStatePSPEOperation.PSPEInitialize(InternalTransaction tx, IPromotableSinglePhaseNotification promotableSinglePhaseNotification, Guid promoterType)
   at System.Transactions.TransactionStateActive.EnlistPromotableSinglePhase(InternalTransaction tx, IPromotableSinglePhaseNotification promotableSinglePhaseNotification, Transaction atomicTransaction, Guid promoterType)
   at System.Transactions.Transaction.EnlistPromotableSinglePhase(IPromotableSinglePhaseNotification promotableSinglePhaseNotification, Guid promoterType)
   at System.Transactions.Transaction.EnlistPromotableSinglePhase(IPromotableSinglePhaseNotification promotableSinglePhaseNotification)
   at Microsoft.Data.SqlClient.SqlInternalConnection.EnlistNonNull(Transaction tx)
   at Microsoft.Data.ProviderBase.DbConnectionInternal.ActivateConnection(Transaction transaction)
   at Microsoft.Data.ProviderBase.DbConnectionPool.PrepareConnection(DbConnection owningObject, DbConnectionInternal obj, Transaction transaction)
   at Microsoft.Data.ProviderBase.DbConnectionPool.TryGetConnection(DbConnection owningObject, UInt32 waitForMultipleObjectsTimeout, Boolean allowCreate, Boolean onlyOneCheckConnection, DbConnectionOptions userOptions, DbConnectionInternal& connection)
   at Microsoft.Data.ProviderBase.DbConnectionPool.TryGetConnection(DbConnection owningObject, TaskCompletionSource`1 retry, DbConnectionOptions userOptions, DbConnectionInternal& connection)
   at Microsoft.Data.ProviderBase.DbConnectionFactory.TryGetConnection(DbConnection owningConnection, TaskCompletionSource`1 retry, DbConnectionOptions userOptions, DbConnectionInternal oldConnection, DbConnectionInternal& connection)
   at Microsoft.Data.ProviderBase.DbConnectionInternal.TryOpenConnectionInternal(DbConnection outerConnection, DbConnectionFactory connectionFactory, TaskCompletionSource`1 retry, DbConnectionOptions userOptions)
   at Microsoft.Data.SqlClient.SqlConnection.TryOpen(TaskCompletionSource`1 retry, SqlConnectionOverrides overrides)
   at Microsoft.Data.SqlClient.SqlConnection.InternalOpenAsync(CancellationToken cancellationToken)

DavoudEshtehari · 2022-10-25T18:17:09Z

@joostmeijles thank you for sharing the result.
Also, I can see some improvements in the exception occurrence. With the proposed change it is less likely to happen (by increasing to over 1M iterations), which is not a constant behaviour and depends on the available machine's resources. Can you verify if you see the same progress, please?

joostmeijles · 2022-10-26T07:21:47Z

It fluctuates, doing 100K iterations. Some runs pass, others still bring up the error 1 or 2 times.

sdrapkin · 2022-11-15T22:18:19Z

I can reproduce it under both MDS-5.1.0-preview2.22314.2 and SDS-4.8.5 on Windows10, under NET5, NET6, and NET7. Making a query is not even required to reproduce (might take a few runs). Uncommenting ts.Complete() stops triggering the exceptions. IMHO this is a bug.

LinqPad:

async Task Main()
{
	var tasks = Enumerable.Range(0, 100_000).Select(PerformWork);
	await Task.WhenAll(tasks);
	"DONE!".Dump();
}
const string connString = "Server=.\\SQL2019;Database=TempDB;Application Name=TestApp;Trusted_Connection=True;TrustServerCertificate=True;Pooling=True;Encrypt=True;Max Pool Size = 5000;";

static TransactionScope CreateTransactionScope() =>
	new TransactionScope(TransactionScopeOption.Required,
		new TransactionOptions { IsolationLevel = System.Transactions.IsolationLevel.ReadCommitted }, TransactionScopeAsyncFlowOption.Enabled);

static async Task PerformWork(int num)
{
	try
	{
		using (var ts = CreateTransactionScope())
		{
			for (int i = 0; i < 2; ++i)
			{
				await using (var dbConn = new SqlConnection(connString))
				{
					await dbConn.OpenAsync();
					await Task.Delay(1);
				} // Connection is disposed (and thus closed)
			}//for
			//ts.Complete();
		}//ts
	}//try
	catch (Exception e)
	{
		Console.WriteLine($"Failed {num}");
		e.Dump(); Environment.Exit(-1);
		throw;
	}
}

cheenamalhotra · 2022-11-16T05:13:49Z

@joostmeijles @sdrapkin

What you're trying to do is unsafe and I don't agree this is bug. Old behavior before v1.1.4 was a bug where accounting was not done for externally aborted transactions. I would request a read on #237 (comment) and #237 (comment) to understand the root cause.

How distributed transactions work is that they are decoupled from your application and are running within the MS DTC service. Which means if your app tries to be super-fast and expect MS DTC to keep coordinating with SQL Server for the MISSED commit/rollbacks, it needs time to do so, as it needs to abort the transaction on the server.

From driver end, if you closed a POOLED connection, server-side connection is kept open. So now if you happen to re-open the connection BEFORE MSDTC could terminate your previous transaction, you find that AFTER opening the connection, transaction gets aborted. It should never happen that the same OLD transaction continues to be used, and I hope you'd agree with me on that! So, someone must keep things in order. Either it's your app, or the driver.

If you want driver to consistently terminate your connection alongwith Distributed transaction, DON'T use connection pool!
If you want to use connection pool for performance needs, you MUST manage your transaction lifecycle and cannot leave them to be aborted by server.

Likewise, in the above repro, if you do call ts.Complete() things do just fine!

I would certainly recommend documentation on this somewhere so it's not a surprise to end users as the default behavior of driver is with pooling enabled, which is clearly not intuitive to users. cc @David-Engel

cheenamalhotra · 2022-11-16T05:21:42Z

Also, to address @DavoudEshtehari 's PR #1814 and why it made visible difference:

When the PR changed Pool design from Stack to a Queue, it's clear that instead of fetching the latest connection from stack, you'd always get the oldest connection, which gives more time to the recently added connection in the pool to terminate transaction.

Since it's just pushing the latest connection to end of queue, it's not a solution IMO. You can still reproduce this behavior with a single connection going in and out of the pool if you limit your pool size.

joostmeijles · 2022-11-16T07:56:50Z

@cheenamalhotra

As far as I understand the provided example code should not result in a distributed transaction as the connection gets closed before the next connection is opened.

Please note that I am not using the MS DTC service (its not running on my machine). And I think that the .NET version I use does not support distributed transactions?

sdrapkin · 2022-11-16T16:28:40Z

@cheenamalhotra There are no distributed transactions in my LinqPad example. MSDTC escalation does not happen because only one connection-string is used. If you replace await dbConn.OpenAsync(); with dbConn.Open(); (ie. merely switch from async to sync opening of a connection), this code works just fine (ie. even with ts.Complete() commented out).

When the 1st connection-open request occurs, it should enlist that connection into the Transaction. Then 1st-connection-close occurs (but the inner-connection is not really closed - just returns to the pool). Then the 2nd connection-open request should receive the exact same already-open inner-connection back from the pool (ie. the same one durable resource is enlisted). This is exactly what happens when dbConn.Open() is used. There should be no observable behavior differences between dbConn.Open(); and await dbConn.OpenAsync(); - yet the async version crashes.

cheenamalhotra · 2022-11-16T17:18:23Z

If you replace await dbConn.OpenAsync(); with dbConn.Open(); (ie. merely switch from async to sync opening of a connection), this code works just fine (ie. even with ts.Complete() commented out).

This is interesting and I'd agree I didn't look into that to be the possibility, so it certainly points to something missing out with OpenAsync(), which is very strange since both APIs 'should' follow the same path. 🤔

There are no distributed transactions

I possibly was imagining .NET 7, but never mind. It's still a case of delegated transaction as we're looking at System.Transactions.TransactionPromotionException

Let me dive into it and will get back.

cheenamalhotra · 2022-11-16T19:46:52Z

I reproed again with Open() and it does seem to happen, maybe not everytime but it does sometimes fail (notice the stacktrace):

System.Transactions.TransactionAbortedException: The transaction has aborted.
 ---> System.Transactions.TransactionPromotionException: Failure while attempting to promote transaction.
 ---> System.InvalidOperationException: The requested operation cannot be completed because the connection has been broken.
   at Microsoft.Data.SqlClient.SqlInternalConnectionTds.ExecuteTransaction(TransactionRequest transactionRequest, String name, IsolationLevel iso, SqlInternalTransaction internalTransaction, Boolean isDelegateControlRequest) in E:\sqlclient\src\Microsoft.Data.SqlClient\netcore\src\Microsoft\Data\SqlClient\SqlInternalConnectionTds.cs:line 992
   at Microsoft.Data.SqlClient.SqlDelegatedTransaction.Promote() in E:\sqlclient\src\Microsoft.Data.SqlClient\netcore\src\Microsoft\Data\SqlClient\SqlDelegatedTransaction.cs:line 157
   --- End of inner exception stack trace ---
   at Microsoft.Data.SqlClient.SqlDelegatedTransaction.Promote() in E:\sqlclient\src\Microsoft.Data.SqlClient\netcore\src\Microsoft\Data\SqlClient\SqlDelegatedTransaction.cs:line 228
   at System.Transactions.TransactionStatePSPEOperation.PSPEPromote(InternalTransaction tx)
   at System.Transactions.TransactionStateDelegatedBase.EnterState(InternalTransaction tx)
   --- End of inner exception stack trace ---
   at System.Transactions.TransactionStateAborted.CheckForFinishedTransaction(InternalTransaction tx)
   at System.Transactions.EnlistableStates.Promote(InternalTransaction tx)
   at System.Transactions.Transaction.Promote()
   at System.Transactions.TransactionInterop.ConvertToDistributedTransaction(Transaction transaction)
   at System.Transactions.TransactionInterop.GetExportCookie(Transaction transaction, Byte[] whereabouts)
   at Microsoft.Data.SqlClient.SqlInternalConnection.GetTransactionCookie(Transaction transaction, Byte[] whereAbouts) in E:\sqlclient\src\Microsoft.Data.SqlClient\src\Microsoft\Data\SqlClient\SqlInternalConnection.cs:line 753
   at Microsoft.Data.SqlClient.SqlInternalConnection.EnlistNonNull(Transaction tx) in E:\sqlclient\src\Microsoft.Data.SqlClient\src\Microsoft\Data\SqlClient\SqlInternalConnection.cs:line 590
   at Microsoft.Data.SqlClient.SqlInternalConnection.Enlist(Transaction tx) in E:\sqlclient\src\Microsoft.Data.SqlClient\src\Microsoft\Data\SqlClient\SqlInternalConnection.cs:line 460
   at Microsoft.Data.SqlClient.SqlInternalConnectionTds.Activate(Transaction transaction) in E:\sqlclient\src\Microsoft.Data.SqlClient\netcore\src\Microsoft\Data\SqlClient\SqlInternalConnectionTds.cs:line 887
   at Microsoft.Data.ProviderBase.DbConnectionInternal.ActivateConnection(Transaction transaction) in E:\sqlclient\src\Microsoft.Data.SqlClient\netcore\src\Microsoft\Data\ProviderBase\DbConnectionInternal.cs:line 221
   at Microsoft.Data.ProviderBase.DbConnectionPool.PrepareConnection(DbConnection owningObject, DbConnectionInternal obj, Transaction transaction) in E:\sqlclient\src\Microsoft.Data.SqlClient\netcore\src\Microsoft\Data\ProviderBase\DbConnectionPool.cs:line 1333
   at Microsoft.Data.ProviderBase.DbConnectionPool.TryGetConnection(DbConnection owningObject, UInt32 waitForMultipleObjectsTimeout, Boolean allowCreate, Boolean onlyOneCheckConnection, DbConnectionOptions userOptions, DbConnectionInternal& connection) in E:\sqlclient\src\Microsoft.Data.SqlClient\netcore\src\Microsoft\Data\ProviderBase\DbConnectionPool.cs:line 1317
   at Microsoft.Data.ProviderBase.DbConnectionPool.TryGetConnection(DbConnection owningObject, TaskCompletionSource`1 retry, DbConnectionOptions userOptions, DbConnectionInternal& connection) in E:\sqlclient\src\Microsoft.Data.SqlClient\netcore\src\Microsoft\Data\ProviderBase\DbConnectionPool.cs:line 1132
   at Microsoft.Data.ProviderBase.DbConnectionFactory.TryGetConnection(DbConnection owningConnection, TaskCompletionSource`1 retry, DbConnectionOptions userOptions, DbConnectionInternal oldConnection, DbConnectionInternal& connection) in E:\sqlclient\src\Microsoft.Data.SqlClient\netcore\src\Microsoft\Data\ProviderBase\DbConnectionFactory.cs:line 122
   at Microsoft.Data.ProviderBase.DbConnectionInternal.TryOpenConnectionInternal(DbConnection outerConnection, DbConnectionFactory connectionFactory, TaskCompletionSource`1 retry, DbConnectionOptions userOptions) in E:\sqlclient\src\Microsoft.Data.SqlClient\netcore\src\Common\src\Microsoft\Data\ProviderBase\DbConnectionInternal.cs:line 341
   at Microsoft.Data.ProviderBase.DbConnectionClosed.TryOpenConnection(DbConnection outerConnection, DbConnectionFactory connectionFactory, TaskCompletionSource`1 retry, DbConnectionOptions userOptions) in E:\sqlclient\src\Microsoft.Data.SqlClient\netcore\src\Common\src\Microsoft\Data\ProviderBase\DbConnectionClosed.cs:line 39
   at Microsoft.Data.SqlClient.SqlConnection.TryOpen(TaskCompletionSource`1 retry, SqlConnectionOverrides overrides) in E:\sqlclient\src\Microsoft.Data.SqlClient\netcore\src\Microsoft\Data\SqlClient\SqlConnection.cs:line 1844
   at Microsoft.Data.SqlClient.SqlConnection.Open(SqlConnectionOverrides overrides) in E:\sqlclient\src\Microsoft.Data.SqlClient\netcore\src\Microsoft\Data\SqlClient\SqlConnection.cs:line 1333
   at Microsoft.Data.SqlClient.SqlConnection.Open() in E:\sqlclient\src\Microsoft.Data.SqlClient\netcore\src\Microsoft\Data\SqlClient\SqlConnection.cs:line 1310

The problem seems to be the fact the driver or System.Transactions (to be precise) even attempts to promote transaction, where it should not and does not in a normal flow. The trigger scenario is when below call fails to enlist connection into delegated transaction and returns false which is possible if transaction is not yet aborted:

SqlClient/src/Microsoft.Data.SqlClient/src/Microsoft/Data/SqlClient/SqlInternalConnection.cs

Line 529 in eb1b922

hasDelegatedTransaction = tx.EnlistPromotableSinglePhase(delegatedTransaction);

The transaction promotion is triggered by System.Transactions when the driver attempts to grab transaction cookie here:

SqlClient/src/Microsoft.Data.SqlClient/src/Microsoft/Data/SqlClient/SqlInternalConnection.cs

Line 586 in eb1b922

cookie = GetTransactionCookie(tx, _whereAbouts);

As you can see it leads to calling at System.Transactions.TransactionInterop.ConvertToDistributedTransaction(Transaction transaction) in the stacktrace above.

cc @roji do you know why would System.Transactions try to make the transaction distributed, calling promote, in .NET 6?

jetizz · 2023-02-06T06:33:40Z

@DavoudEshtehari Tried setting the timeout to 180 seconds, issue still occurs. Also tried 60 seconds and 10 minutes (TransactionManager.DefaultTimeout and TransactionManager.MaximumTimeout) - both appear to make no difference.
As @ByteMe-web said, we can't apply previously stated workaround as we're running under linux (aspnet:7.0-bullseye-slim as base to be precise).
Again, I do not have a suite to measure and reliably repro this issue, but reading from logs, it occurs 2-3 times per 15k transactions. We do however use up to 20 (very short lived) connections during single transaction. Thought of rewriting how connections are used (roughly - scope to request instead of transient), but it would require significant effort. It should result in lower pool usage but no guarantee it would fully resolve this issue, so I'm reluctant.

arical · 2023-03-15T22:12:02Z

Mostly noting this down as another point of data for the team.

After upgrading our project to EF 7/SqlClient 5.1.0 from EF 6/SqlClient 5.0.1, we've started receiving this exception intermittently on our CI while running our test suite. It happens roughly once in 3 runs, which is annoying enough to be noticeable given we have had a fairly stable build up until this point.

Setup:

EF 7 / SqlClient 5.1.0
Our CI build agent is .NET 7 on linux (aws/codebuild/standard:6.0 image)
Not using distributed transaction
Not using MARS
Using connection pooling (max 200)
Timeout 30 seconds
SQL Server 2017 running in docker

DavoudEshtehari · 2023-06-06T23:17:49Z

I can't see the issue with the repro by upgrading to Windows 11 version 21H2 (OS Build 22000.1936). Please, share your results.

Update:
By increasing the number of cnn.OpenAsyn() calls to 20 on each thread, it started to break with the same exception after a few runs.

DavoudEshtehari · 2024-01-18T17:17:52Z

This package from the PR #2301 can be used for verification. Please, give it a try as this issue varies across different machines.

sdrapkin · 2024-01-18T22:28:30Z

@DavoudEshtehari I have tested Nuget 5.20.0-pull.101801 (Windows 10/64bit with .NET 8.0.1) and it consistently does not throw (ie. works). It also works for me on latest .NET 7 and .NET 6. 💚

I have also tested the exact same repro-code with Nuget 5.2.0-preview4.233342.2, and it consistently throws on every run. 💥
Thus, so far this PR is very encouraging, thanks for the work.

Can this fix please be ported to the System.Data.SqlClient as well?

DavoudEshtehari · 2024-01-18T22:55:07Z

@sdrapkin thanks for the prompt reply.
This issue doesn't apply to S.D.SqlClient because it hasn't received another fix through PR #543.
Generally, S.D.SqlClient is currently under support and only addresses significant bugs and security issues.

sdrapkin · 2024-01-18T23:02:24Z

Generally, S.D.SqlClient is currently under support and only addresses significant bugs and security issues.

IMHO this is a significant bug, and Microsoft should fix it in S.D.SqlClient as well (which a vast number of enterprise software still depends on).

DavoudEshtehari · 2024-01-19T00:36:14Z

I'll bring up your concern in the next bug triage meeting and will update here if there's a different outcome.

joostmeijles · 2024-01-19T10:39:37Z

This package from the PR #2301 can be used for verification. Please, give it a try as this issue varies across different machines.

Verified on my machine:
.NET target: .NET 6
SQL Server version: SQL Server 2019
Operating system: Windows 11

Works like a charm! Fantastic work @DavoudEshtehari 🥇

razvalex · 2024-01-25T16:23:05Z

First thank you very much for prioritizing this issue appropriately. I just wanted to share what we've encountered till now in our tests;

Some time ago we migrated from System.Data.SqlClient to Microsoft.Data.SqlClient 3.1.1, which worked as expected for us (we're using transaction scope quite intensively and we're trying to migrate away);

Recently, due to https://msrc.microsoft.com/update-guide/vulnerability/CVE-2024-0056, we tried to upgrade from 3.1.1 to 3.1.5;

We started to encounter two different issues from time to time (~20 times per day out of ~2M requests & tens of thousands background tasks):

System.InvalidOperationException: The operation is not valid for the current state of the enlistment.

[{"assembly":"System.Transactions.Local, Version=8.0.0.0, Culture=neutral, PublicKeyToken=cc7b13ffcd2ddd51","method":"System.Transactions.EnlistmentState.Committed","level":0,"line":39,"fileName":"/_/src/runtime/artifacts/source-build/self/src/src/libraries/System.Transactions.Local/src/System/Transactions/EnlistmentState.cs"},{"assembly":"System.Transactions.Local, Version=8.0.0.0, Culture=neutral, PublicKeyToken=cc7b13ffcd2ddd51","method":"System.Transactions.SinglePhaseEnlistment.Committed","level":1,"line":55,"fileName":"/_/src/runtime/artifacts/source-build/self/src/src/libraries/System.Transactions.Local/src/System/Transactions/SinglePhaseEnlistment.cs"},{"assembly":"Microsoft.Data.SqlClient, Version=3.0.0.0, Culture=neutral, PublicKeyToken=23ec7fc2d6eaa4a5","method":"Microsoft.Data.SqlClient.SqlDelegatedTransaction.SinglePhaseCommit","level":2,"line":0},{"assembly":"System.Transactions.Local, Version=8.0.0.0, Culture=neutral, PublicKeyToken=cc7b13ffcd2ddd51","method":"System.Transactions.DurableEnlistmentCommitting.EnterState","level":3,"line":160,"fileName":"/_/src/runtime/artifacts/source-build/self/src/src/libraries/System.Transactions.Local/src/System/Transactions/DurableEnlistmentState.cs"},{"assembly":"System.Transactions.Local, Version=8.0.0.0, Culture=neutral, PublicKeyToken=cc7b13ffcd2ddd51","method":"System.Transactions.CommittableTransaction.Commit","level":4,"line":82,"fileName":"/_/src/runtime/artifacts/source-build/self/src/src/libraries/System.Transactions.Local/src/System/Transactions/CommittableTransaction.cs"},{"assembly":"System.Transactions.Local, Version=8.0.0.0, Culture=neutral, PublicKeyToken=cc7b13ffcd2ddd51","method":"System.Transactions.TransactionScope.InternalDispose","level":5,"line":800,"fileName":"/_/src/runtime/artifacts/source-build/self/src/src/libraries/System.Transactions.Local/src/System/Transactions/TransactionScope.cs"},{"assembly":"System.Transactions.Local, Version=8.0.0.0, Culture=neutral, PublicKeyToken=cc7b13ffcd2ddd51","method":"System.Transactions.TransactionScope.Dispose","level":6,"line":738,"fileName":"/_/src/runtime/artifacts/source-build/self/src/src/libraries/System.Transactions.Local/src/System/Transactions/TransactionScope.cs"}]

System.InvalidOperationException: The requested operation cannot be completed because the connection has been broken.

[{"severityLevel":"Error","outerId":"0","message":"System.Reflection.TargetInvocationException: Exception has been thrown by the target of an invocation.","type":"System.Reflection.TargetInvocationException","id":"64648675","parsedStack":[{"assembly":"System.Private.CoreLib, Version=8.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e","method":"System.Reflection.MethodBaseInvoker.InvokeWithNoArgs","level":0,"line":61,"fileName":"/_/src/runtime/artifacts/source-build/self/src/src/libraries/System.Private.CoreLib/src/System/Reflection/MethodBaseInvoker.cs"},{"assembly":"Microsoft.Data.SqlClient, Version=3.0.0.0, Culture=neutral, PublicKeyToken=23ec7fc2d6eaa4a5","method":"Microsoft.Data.SqlClient.SqlInternalConnection.EnlistNonNull","level":1,"line":0},{"assembly":"Microsoft.Data.SqlClient, Version=3.0.0.0, Culture=neutral, PublicKeyToken=23ec7fc2d6eaa4a5","method":"Microsoft.Data.ProviderBase.DbConnectionPool.PrepareConnection","level":2,"line":0},{"assembly":"Microsoft.Data.SqlClient, Version=3.0.0.0, Culture=neutral, PublicKeyToken=23ec7fc2d6eaa4a5","method":"Microsoft.Data.ProviderBase.DbConnectionPool.TryGetConnection","level":3,"line":0},{"assembly":"Microsoft.Data.SqlClient, Version=3.0.0.0, Culture=neutral, PublicKeyToken=23ec7fc2d6eaa4a5","method":"Microsoft.Data.ProviderBase.DbConnectionFactory.TryGetConnection","level":4,"line":0},{"assembly":"Microsoft.Data.SqlClient, Version=3.0.0.0, Culture=neutral, PublicKeyToken=23ec7fc2d6eaa4a5","method":"Microsoft.Data.ProviderBase.DbConnectionInternal.TryOpenConnectionInternal","level":5,"line":0},{"assembly":"Microsoft.Data.SqlClient, Version=3.0.0.0, Culture=neutral, PublicKeyToken=23ec7fc2d6eaa4a5","method":"Microsoft.Data.SqlClient.SqlConnection.InternalOpenAsync","level":6,"line":0},{"assembly":"System.Private.CoreLib, Version=8.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e","method":"System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw","level":7,"line":53,"fileName":"/_/src/runtime/artifacts/source-build/self/src/src/libraries/System.Private.CoreLib/src/System/Runtime/ExceptionServices/ExceptionDispatchInfo.cs"},{"assembly":"System.Private.CoreLib, Version=8.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e","method":"System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess","level":8,"line":154,"fileName":"/_/src/runtime/artifacts/source-build/self/src/src/libraries/System.Private.CoreLib/src/System/Runtime/CompilerServices/TaskAwaiter.cs"},{"assembly":"System.Private.CoreLib, Version=8.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e","method":"System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification","level":9,"line":118,"fileName":"/_/src/runtime/artifacts/source-build/self/src/src/libraries/System.Private.CoreLib/src/System/Runtime/CompilerServices/TaskAwaiter.cs"},...,"method":"System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess","level":12,"line":154,"fileName":"/_/src/runtime/artifacts/source-build/self/src/src/libraries/System.Private.CoreLib/src/System/Runtime/CompilerServices/TaskAwaiter.cs"},...,{"severityLevel":"Error","outerId":"64648675","message":"The transaction has aborted.","type":"System.Transactions.TransactionAbortedException","id":"53313168","parsedStack":[{"assembly":"System.Transactions.Local, Version=8.0.0.0, Culture=neutral, PublicKeyToken=cc7b13ffcd2ddd51","method":"System.Transactions.TransactionStateAborted.CheckForFinishedTransaction","level":0,"line":1493,"fileName":"/_/src/runtime/artifacts/source-build/self/src/src/libraries/System.Transactions.Local/src/System/Transactions/TransactionState.cs"},{"assembly":"System.Transactions.Local, Version=8.0.0.0, Culture=neutral, PublicKeyToken=cc7b13ffcd2ddd51","method":"System.Transactions.EnlistableStates.PromotedToken","level":1,"line":662,"fileName":"/_/src/runtime/artifacts/source-build/self/src/src/libraries/System.Transactions.Local/src/System/Transactions/TransactionState.cs"},{"assembly":"System.Transactions.Local, Version=8.0.0.0, Culture=neutral, PublicKeyToken=cc7b13ffcd2ddd51","method":"System.Transactions.Transaction.GetPromotedToken","level":2,"line":462,"fileName":"/_/src/runtime/artifacts/source-build/self/src/src/libraries/System.Transactions.Local/src/System/Transactions/Transaction.cs"},{"assembly":"System.Private.CoreLib, Version=8.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e","method":"InvokeStub_Transaction.GetPromotedToken","level":3,"line":0},{"assembly":"System.Private.CoreLib, Version=8.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e","method":"System.Reflection.MethodBaseInvoker.InvokeWithNoArgs","level":4,"line":0}]},{"severityLevel":"Error","outerId":"53313168","message":"Failure while attempting to promote transaction.","type":"System.Transactions.TransactionPromotionException","id":"58585774","parsedStack":[{"assembly":"Microsoft.Data.SqlClient, Version=3.0.0.0, Culture=neutral, PublicKeyToken=23ec7fc2d6eaa4a5","method":"Microsoft.Data.SqlClient.SqlDelegatedTransaction.Promote","level":0,"line":0},{"assembly":"System.Transactions.Local, Version=8.0.0.0, Culture=neutral, PublicKeyToken=cc7b13ffcd2ddd51","method":"System.Transactions.TransactionStatePSPEOperation.PSPEPromote","level":1,"line":4423,"fileName":"/_/src/runtime/artifacts/source-build/self/src/src/libraries/System.Transactions.Local/src/System/Transactions/TransactionState.cs"},{"assembly":"System.Transactions.Local, Version=8.0.0.0, Culture=neutral, PublicKeyToken=cc7b13ffcd2ddd51","method":"System.Transactions.TransactionStateDelegatedNonMSDTC.EnterState","level":2,"line":4264,"fileName":"/_/src/runtime/artifacts/source-build/self/src/src/libraries/System.Transactions.Local/src/System/Transactions/TransactionState.cs"}]},{"severityLevel":"Error","outerId":"58585774","message":"The requested operation cannot be completed because the connection has been broken.","type":"System.InvalidOperationException","id":"40702182","parsedStack":[{"assembly":"Microsoft.Data.SqlClient, Version=3.0.0.0, Culture=neutral, PublicKeyToken=23ec7fc2d6eaa4a5","method":"Microsoft.Data.SqlClient.SqlInternalConnectionTds.ExecuteTransaction","level":0,"line":0},{"assembly":"Microsoft.Data.SqlClient, Version=3.0.0.0, Culture=neutral, PublicKeyToken=23ec7fc2d6eaa4a5","method":"Microsoft.Data.SqlClient.SqlDelegatedTransaction.Promote","level":1,"line":0}]}]

We also tried to migrate to 5.1.x/5.2.x, but I think that these issues were much more common (~30 times per day); Of course all these metrics might be influenced by actual usage (we tried the migration in various environments which are more or less similar in terms of infrastructure, etc);

Based on my understanding this PR: #1801 (3.1.x: #1912) introduced a race condition between connection.CleanupConnectionOnTransactionCompletion and enlistment.Committed (which was moved outside of connection lock) as after the initial cleanup, the connection will be returned to the default pool, enabling other threads to utilize it, while the subsequent cleanup will identify another ongoing transaction.

I saw that 5.2.0-preview5.24024.3 is up for grabs and we will give it a try these days (max. early next week);

I wanted to ask if there are plans to backport #2301 to older versions like 3.1.x. @DavoudEshtehari Do you know something about it at this moment, or is it too early?

Setup:
EF 6 / SqlClient 3.1.5 or SqlClient 5.1.x / 5.2.x
.NET 8
Not using distributed transaction
Not using MARS
Using connection pooling
Timeout 30 seconds
Azure SQL Database

DavoudEshtehari · 2024-01-25T19:27:53Z

Hi @razvalex, This fix is slated to be considered for backporting to all the supported servicing versions, with the initial release expected to be 5.1.5. However, there is no specific timeline set for these releases as of now.

razvalex · 2024-01-31T09:27:15Z

@DavoudEshtehari, I wanted to inform you that we evaluated the performance of https://github.com/dotnet/SqlClient/releases/tag/v5.2.0-preview5 in one of our environments, which handles approximately 2 million daily requests and tens of thousands of background tasks. The results were very positive. Thanks again!

For us it would be helpful to also have #2301 backported to 3.1.x, but I think we can also manage to update everything to 5.1.5 if that won't be the case.

ErikEJ · 2024-01-31T09:51:32Z

@razvalex 5.1.5 has been released

razvalex · 2024-01-31T10:47:22Z

@ErikEJ Thank you. This week, we're aiming to upgrade the majority of our projects to version 5.1.5. I'll provide you with an update on the outcomes soon (mid-February). Backporting #2301 to version 3.1.x would help us (for example) to avoid extra handling for Encrypt (#1210) in certain older projects.

JRahnama · 2024-01-31T17:13:07Z

@razvalex eventually the fix will be backported to 3.1, date yet to be determined.

DavoudEshtehari · 2024-01-31T18:46:17Z

@razvalex Thanks for sharing the result. It sounds promising.
I understand the challenges of upgrading with breaking changes, and I encourage you to consider moving on to the latest stable version sooner rather than later to gain the benefits of the improvements.

razvalex · 2024-03-19T16:22:23Z

We were able to upgrade the majority of our projects from 3.1.5 to version 5.1.5 with great success; This week we are planning to upgrade from 5.1.5 to 5.2.0, mainly due to increased performance in some key areas (ex: #1544).

I was wondering if there are any plans for backporting #2301 to 3.1.x as some of our older projects might benefit having this fix in place. Also, upgrading them to a newer version means dealing with breaking changes (ex: Encrypt) and might be riskier.

joostmeijles changed the title ~~TransactionAbortedException when querying in parallel inside a transaction~~ TransactionAbortedException when performing queries in parallel inside a transaction scope Jul 20, 2022

DavoudEshtehari mentioned this issue Oct 24, 2022

Improved general connection pool #1814

Closed

lcheunglci assigned DavoudEshtehari Oct 25, 2022

neilzzy mentioned this issue Jan 12, 2024

Solution with IPromotableSinglePhaseNotification and TransactionCompletion against a promoted transaction to MS-DTC dotnet/runtime#96881

Open

DavoudEshtehari mentioned this issue Jan 18, 2024

Fix | Invalid transaction exception against the connections and distributed transactions #2301

Merged

DavoudEshtehari closed this as completed in #2301 Jan 19, 2024

JRahnama mentioned this issue Jan 25, 2024

[5.1.5] Fix | Invalid transaction exception against the connections and distributed transactions (#2301) #2321

Merged

TransactionAbortedException when performing queries in parallel inside a transaction scope #1675

TransactionAbortedException when performing queries in parallel inside a transaction scope #1675

Comments

joostmeijles commented Jul 20, 2022 • edited Loading

Describe the bug

To reproduce

Expected behavior

Further technical details

DavoudEshtehari commented Jul 22, 2022

joostmeijles commented Aug 1, 2022

DavoudEshtehari commented Aug 4, 2022

JRahnama commented Aug 5, 2022

joostmeijles commented Aug 8, 2022 • edited Loading

JRahnama commented Aug 8, 2022

JRahnama commented Aug 8, 2022

joostmeijles commented Aug 9, 2022

joostmeijles commented Aug 9, 2022

JRahnama commented Aug 9, 2022 • edited Loading

David-Engel commented Aug 9, 2022 • edited Loading

joostmeijles commented Aug 10, 2022 • edited Loading

joostmeijles commented Aug 10, 2022

DavoudEshtehari commented Oct 25, 2022 • edited Loading

DavoudEshtehari commented Oct 25, 2022

joostmeijles commented Oct 25, 2022 • edited Loading

DavoudEshtehari commented Oct 25, 2022

joostmeijles commented Oct 26, 2022 • edited Loading

sdrapkin commented Nov 15, 2022

cheenamalhotra commented Nov 16, 2022 • edited Loading

cheenamalhotra commented Nov 16, 2022 • edited Loading

joostmeijles commented Nov 16, 2022

sdrapkin commented Nov 16, 2022

cheenamalhotra commented Nov 16, 2022

cheenamalhotra commented Nov 16, 2022 • edited Loading

jetizz commented Feb 6, 2023

arical commented Mar 15, 2023 • edited Loading

DavoudEshtehari commented Jun 6, 2023 • edited Loading

DavoudEshtehari commented Jan 18, 2024

sdrapkin commented Jan 18, 2024

DavoudEshtehari commented Jan 18, 2024

sdrapkin commented Jan 18, 2024

DavoudEshtehari commented Jan 19, 2024

joostmeijles commented Jan 19, 2024

razvalex commented Jan 25, 2024 • edited Loading

DavoudEshtehari commented Jan 25, 2024

razvalex commented Jan 31, 2024 • edited Loading

ErikEJ commented Jan 31, 2024

razvalex commented Jan 31, 2024 • edited Loading

JRahnama commented Jan 31, 2024

DavoudEshtehari commented Jan 31, 2024

razvalex commented Mar 19, 2024

joostmeijles commented Jul 20, 2022 •

edited

Loading

joostmeijles commented Aug 8, 2022 •

edited

Loading

JRahnama commented Aug 9, 2022 •

edited

Loading

David-Engel commented Aug 9, 2022 •

edited

Loading

joostmeijles commented Aug 10, 2022 •

edited

Loading

DavoudEshtehari commented Oct 25, 2022 •

edited

Loading

joostmeijles commented Oct 25, 2022 •

edited

Loading

joostmeijles commented Oct 26, 2022 •

edited

Loading

cheenamalhotra commented Nov 16, 2022 •

edited

Loading

cheenamalhotra commented Nov 16, 2022 •

edited

Loading

cheenamalhotra commented Nov 16, 2022 •

edited

Loading

arical commented Mar 15, 2023 •

edited

Loading

DavoudEshtehari commented Jun 6, 2023 •

edited

Loading

razvalex commented Jan 25, 2024 •

edited

Loading

razvalex commented Jan 31, 2024 •

edited

Loading

razvalex commented Jan 31, 2024 •

edited

Loading