Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Release connection lock before signaling SinglePhaseCommit completion #1801

Merged
merged 2 commits into from
Dec 20, 2022

Conversation

roji
Copy link
Member

@roji roji commented Oct 9, 2022

This is a possible fix for #1800, introduced by #1042; but it may re-introduce the issue that #1042 was meant to fix.

Fixes #1800

@codecov
Copy link

codecov bot commented Oct 9, 2022

Codecov Report

Base: 70.68% // Head: 70.02% // Decreases project coverage by -0.66% ⚠️

Coverage data is based on head (3552268) compared to base (03f3053).
Patch coverage: 100.00% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1801      +/-   ##
==========================================
- Coverage   70.68%   70.02%   -0.67%     
==========================================
  Files         292      292              
  Lines       61727    61727              
==========================================
- Hits        43634    43226     -408     
- Misses      18093    18501     +408     
Flag Coverage Δ
addons 92.38% <ø> (ø)
netcore 73.33% <100.00%> (-0.78%) ⬇️
netfx 68.83% <ø> (-0.32%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...icrosoft/Data/SqlClient/SqlDelegatedTransaction.cs 45.12% <100.00%> (ø)
...c/Microsoft/Data/SqlClient/TdsParserStateObject.cs 23.30% <0.00%> (-22.95%) ⬇️
...c/Microsoft/Data/SqlClient/TdsParserStateObject.cs 65.68% <0.00%> (-7.70%) ⬇️
...ta/SqlClient/SqlConnectionPoolGroupProviderInfo.cs 42.50% <0.00%> (-5.00%) ⬇️
...re/src/Microsoft/Data/SqlClient/SNI/SNINpHandle.cs 73.77% <0.00%> (-4.92%) ⬇️
.../src/Microsoft/Data/SqlClient/SNI/SNIMarsHandle.cs 87.89% <0.00%> (-4.49%) ⬇️
...rc/Microsoft/Data/ProviderBase/DbConnectionPool.cs 85.02% <0.00%> (-0.84%) ⬇️
...c/Microsoft/Data/SqlClient/SqlConnectionFactory.cs 68.85% <0.00%> (-0.82%) ⬇️
.../Microsoft/Data/SqlClient/SNI/SNIMarsConnection.cs 76.72% <0.00%> (-0.63%) ⬇️
...osoft/Data/SqlClient/TdsParserStateObjectNative.cs 86.84% <0.00%> (-0.53%) ⬇️
... and 5 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

@cheenamalhotra
Copy link
Member

Hi @roji

Apologies for the delay, I've been heads down in release work. I tested these changes today with multiple cases and the fix looks good to me overall.

However, since we don't have TransactionFlow attribute support in .NET 7 yet, we cannot replicate the scenario from .NET Framework where external service would interfere with distributed transactions. But I verified the change in .NET Framework, and I don't see any regression with this change either so I would like to believe it should be safe.

I couldn't reproduce this issue in .NET Framework too with the repro provided, so I'm not sure if we want to backport this fix just yet as Framework's been stable with the changes so I would be hesitant to touch it. I think due to lack of Distribution Transaction support in .NET Core, we couldn't really test this back then. But I'll be watchful of if/when TransactionFlow attribute support arrives with WCF services, we should validate #237 in .NET 7+ once.

Copy link
Member

@cheenamalhotra cheenamalhotra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would suggest adding a test case for future proofing this scenario with the repro from #1800 (comment)

@roji
Copy link
Member Author

roji commented Nov 17, 2022

Thanks for reviewing @cheenamalhotra!

Would suggest adding a test case for future proofing this scenario with the repro from #1800 (comment)

Giving that this is a timing-sensitive race condition, that repro runs in a while (true) loop and isn't suitable as a regression test that always runs, I think... Or are you thinking about a manually-run test? Can you point me to where to add that etc.?

@cheenamalhotra
Copy link
Member

cheenamalhotra commented Nov 19, 2022

Hi @roji

I got the repro simplified and can almost reproduce the issue everytime with this code. Tested with .NET 7

    string connString = "Server=localhost; Integrated Security=true; Pooling=false;"; // remember to disable pooling

    TransactionManager.ImplicitDistributedTransactions = true;
    using var transaction = new CommittableTransaction();

    // Uncommenting the following makes the deadlock go away as a workaround. If the transaction is promoted before
    // the first SqlClient enlistment, it never goes into the delegated state.
    // _ = TransactionInterop.GetTransmitterPropagationToken(transaction);
    await using var conn = new SqlConnection(connString);
    await conn.OpenAsync();
    conn.EnlistTransaction(transaction);
    await conn.CloseAsync();

    // Enlisting the transaction in second connection causes the transaction to be promoted.
    // After this, the transaction state will be "delegated" (delegated to SQL Server), and the commit below will
    // trigger a call to SqlDelegatedTransaction.SinglePhaseCommit.
    await using var conn2 = new SqlConnection(connString);
    await conn2.OpenAsync();
    conn2.EnlistTransaction(transaction);

    // Enlisting the transaction in third connection causes the deadlock on transaction.Commit() below (issue 1800)
    await using var conn3 = new SqlConnection(connString);
    await conn3.OpenAsync();
    await conn2.CloseAsync();
    conn3.EnlistTransaction(transaction);

    Console.WriteLine("Committing transaction...");
    transaction.Commit();
    // Never gets printed due to deadlock.
    Console.WriteLine("Transaction committed.");

There may be 1% chance it may not repro, but even if it does 99% of the time, we still have a possibility of capturing it if anything changes in future. I'd recommend adding this to test case with useful comments. 🙂

And once again, thank you for looking after this ❤️

@roji
Copy link
Member Author

roji commented Nov 21, 2022

OK, sounds good! Can you point me to where you guys put this kind of test? Or if you prefer, you can also just push a commit doing that to this PR.

@roji roji force-pushed the DelegatedTransactionDeadlock branch from 8f38613 to 8116d72 Compare November 22, 2022 08:43
@roji
Copy link
Member Author

roji commented Nov 22, 2022

@cheenamalhotra thanks, I've pushed the test. Some notes:

  • The test is under NET7_0_OR_GREATER, but the SqlClient tests currently target net6.0, so it's never run by default. I hacked around the project system and confirmed that when targeting net7.0, the test fails without the fix and passes with it. I'd recommend targeting net7.0 in the tests (possibly in addition to net6.0), but that's beyond the scope of this PR.
  • Your proposed test code above - with pooling disabled and a 3rd connection - actually passed without the fix; however, my original code sample (with pooling, two connections) very reliably failed without the fix and passed with it. So I just pushed that.
  • Just a note: various other tests under ManualTests/TransactionTest failed for me, I'm hoping that's expected (I'm guessing these aren't run as part of your CI build, hence the name ManualTests?)

@David-Engel and others, I really hope this can be patched in 5.0; the bug effectively prevents people from using distributed transactions with SqlClient on .NET 7 (the primary use case); in EF we won't be able to take a dependency on SqlClient 5.1 (7.0 is out, and patch releases generally can only bump patch versions of dependencies).

@lcheunglci
Copy link
Contributor

Since we're in the middle of releasing the 5.1 GA, I'll need to wait for approval from @David-Engel before we can merge this.

@DavoudEshtehari
Copy link
Contributor

@roji Adding support .NET 7.0 to the pipelines is on our plan, but it has the known issue #4014 on forwarding MSBuild's properties using dotnet test command, which is a blocker.

@JRahnama
Copy link
Contributor

@roji failures are not related to your changes. I have added windows 11 and apparently there is a failure due to some access restriction for Namedpipes. I am addressing them.

@DavoudEshtehari DavoudEshtehari merged commit 064c835 into dotnet:main Dec 20, 2022
@DavoudEshtehari DavoudEshtehari added the 🐛 Bug! Issues that are bugs in the drivers we maintain. label Dec 20, 2022
@DavoudEshtehari DavoudEshtehari added this to the 5.1.0 milestone Dec 20, 2022
@DavoudEshtehari DavoudEshtehari added the Area\Netcore Issues that are apply only to .NET runtime or the 'netcore' project folder. label Dec 20, 2022
@roji roji deleted the DelegatedTransactionDeadlock branch December 21, 2022 00:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area\Netcore Issues that are apply only to .NET runtime or the 'netcore' project folder. 🐛 Bug! Issues that are bugs in the drivers we maintain.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Deadlock when using SinglePhaseCommit with distributed transactions
6 participants