Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Concurrent connection usage and MARS #1234

Closed
roji opened this issue Aug 29, 2021 · 40 comments · Fixed by #1357
Closed

Concurrent connection usage and MARS #1234

roji opened this issue Aug 29, 2021 · 40 comments · Fixed by #1357
Labels
💥 Regression Issues that are regressions introduced from earlier PRs. 📈 Performance Issues that are targeted to performance improvements.

Comments

@roji
Copy link
Member

roji commented Aug 29, 2021

As part of running some compatibility checks between EF Core 6.0 and SqlClient 3.0.0 and 4.0.0-preview1.21237.2 (dotnet/efcore#25766), I came across what looks like a considerable perf regression. While the EF test suite completed in around 4m30s with SqlClient 3.0.0, it took around 12 minutes with 4.0.0-preview1.21237.2.

Some digging seems to point at test QueryBugsTest.Thread_safety_in_relational_command_cache; with 3.0.0 the total run completes in 20 seconds (including all startup), whereas with 4.0.0-preview1.21237.2 it takes around 5m15s. This test runs 100 parallel queries - no big data is being transferred, and purely sync I/O is used.

To run this, you can check out the EF Core code base, run build.cmd to get the latest dotnet core, go inside test/EFCore.SqlServer.FunctionalTests and run dotnet test --filter Thread_safety_in_relational_command_cache

@Wraith2
Copy link
Contributor

Wraith2 commented Aug 29, 2021

Windows or Linux will be an important factor.

@roji
Copy link
Member Author

roji commented Aug 30, 2021

Good point, this happened on my machine running on Ubuntu 21.04.

@ajcvickers
Copy link

I will do a test run on my Windows laptop.

@ajcvickers
Copy link

I can't get any of the SQL Server tests to run on my machine using 4.0.0-preview1.21237.2. I get:

System.ComponentModel.Win32Exception : The certificate chain was issued by an authority that is not trusted.

Is there something else I need to configure on my machine to make these tests run?

Full output:

Test run for C:\dotnet\efcore\artifacts\bin\EFCore.SqlServer.FunctionalTests\Debug\net6.0\Microsoft.EntityFrameworkCore.SqlServer.FunctionalTests.dll (.NETCoreApp,Version=v6.0)
Microsoft (R) Test Execution Command Line Tool Version 17.0.0-preview-20210817-02
Copyright (c) Microsoft Corporation.  All rights reserved.

Starting test execution, please wait...
A total of 1 test files matched the specified pattern.
[xUnit.net 00:00:09.14]     Microsoft.EntityFrameworkCore.Query.QueryBugsTest.Thread_safety_in_relational_command_cache [FAIL]
  Failed Microsoft.EntityFrameworkCore.Query.QueryBugsTest.Thread_safety_in_relational_command_cache [346 ms]
  Error Message:
   Microsoft.Data.SqlClient.SqlException : A connection was successfully established with the server, but then an error occurred during the login process. (provider: SSL Provider, error: 0 - The certificate chain was issued by an authority that is not trusted.)
---- System.ComponentModel.Win32Exception : The certificate chain was issued by an authority that is not trusted.
  Stack Trace:
     at Microsoft.Data.SqlClient.SqlInternalConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)
   at Microsoft.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj, Boolean callerHasConnectionLock, Boolean asyncClose)
   at Microsoft.Data.SqlClient.TdsParserStateObject.ThrowExceptionAndWarning(Boolean callerHasConnectionLock, Boolean asyncClose)
   at Microsoft.Data.SqlClient.TdsParserStateObject.SNIWritePacket(PacketHandle packet, UInt32& sniError, Boolean canAccumulate, Boolean callerHasConnectionLock)
   at Microsoft.Data.SqlClient.TdsParserStateObject.WriteSni(Boolean canAccumulate)
   at Microsoft.Data.SqlClient.TdsParserStateObject.WritePacket(Byte flushMode, Boolean canAccumulate)
   at Microsoft.Data.SqlClient.TdsParser.TdsLogin(SqlLogin rec, FeatureExtension requestedFeatures, SessionData recoverySessionData, FederatedAuthenticationFeatureExtensionData fedAuthFeatureExtensionData)
   at Microsoft.Data.SqlClient.SqlInternalConnectionTds.Login(ServerInfo server, TimeoutTimer timeout, String newPassword, SecureString newSecurePassword)
   at Microsoft.Data.SqlClient.SqlInternalConnectionTds.AttemptOneLogin(ServerInfo serverInfo, String newPassword, SecureString newSecurePassword, Boolean ignoreSniOpenTimeout, TimeoutTimer timeout, Boolean withFailover)
   at Microsoft.Data.SqlClient.SqlInternalConnectionTds.LoginNoFailover(ServerInfo serverInfo, String newPassword, SecureString newSecurePassword, Boolean redirectedUserInstance, SqlConnectionString connectionOptions, SqlCredential credential, TimeoutTimer timeout)
   at Microsoft.Data.SqlClient.SqlInternalConnectionTds.OpenLoginEnlist(TimeoutTimer timeout, SqlConnectionString connectionOptions, SqlCredential credential, String newPassword, SecureString newSecurePassword, Boolean redirectedUserInstance)
   at Microsoft.Data.SqlClient.SqlInternalConnectionTds..ctor(DbConnectionPoolIdentity identity, SqlConnectionString connectionOptions, SqlCredential credential, Object providerInfo, String newPassword, SecureString newSecurePassword, Boolean redirectedUserInstance, SqlConnectionString userConnectionOptions, SessionData reconnectSessionData, Boolean applyTransientFaultHandling, String accessToken, DbConnectionPool pool)
   at Microsoft.Data.SqlClient.SqlConnectionFactory.CreateConnection(DbConnectionOptions options, DbConnectionPoolKey poolKey, Object poolGroupProviderInfo, DbConnectionPool pool, DbConnection owningConnection, DbConnectionOptions userOptions)
   at Microsoft.Data.ProviderBase.DbConnectionFactory.CreatePooledConnection(DbConnectionPool pool, DbConnection owningObject, DbConnectionOptions options, DbConnectionPoolKey poolKey, DbConnectionOptions userOptions)
   at Microsoft.Data.ProviderBase.DbConnectionPool.CreateObject(DbConnection owningObject, DbConnectionOptions userOptions, DbConnectionInternal oldConnection)
   at Microsoft.Data.ProviderBase.DbConnectionPool.UserCreateRequest(DbConnection owningObject, DbConnectionOptions userOptions, DbConnectionInternal oldConnection)
   at Microsoft.Data.ProviderBase.DbConnectionPool.TryGetConnection(DbConnection owningObject, UInt32 waitForMultipleObjectsTimeout, Boolean allowCreate, Boolean onlyOneCheckConnection, DbConnectionOptions userOptions, DbConnectionInternal& connection)
   at Microsoft.Data.ProviderBase.DbConnectionPool.TryGetConnection(DbConnection owningObject, TaskCompletionSource`1 retry, DbConnectionOptions userOptions, DbConnectionInternal& connection)
   at Microsoft.Data.ProviderBase.DbConnectionFactory.TryGetConnection(DbConnection owningConnection, TaskCompletionSource`1 retry, DbConnectionOptions userOptions, DbConnectionInternal oldConnection, DbConnectionInternal& connection)
   at Microsoft.Data.ProviderBase.DbConnectionInternal.TryOpenConnectionInternal(DbConnection outerConnection, DbConnectionFactory connectionFactory, TaskCompletionSource`1 retry, DbConnectionOptions userOptions)
   at Microsoft.Data.ProviderBase.DbConnectionClosed.TryOpenConnection(DbConnection outerConnection, DbConnectionFactory connectionFactory, TaskCompletionSource`1 retry, DbConnectionOptions userOptions)
   at Microsoft.Data.SqlClient.SqlConnection.TryOpen(TaskCompletionSource`1 retry, SqlConnectionOverrides overrides)
   at Microsoft.Data.SqlClient.SqlConnection.Open(SqlConnectionOverrides overrides)
   at Microsoft.Data.SqlClient.SqlConnection.Open()
   at Microsoft.EntityFrameworkCore.TestUtilities.SqlServerTestStore.ExecuteCommand[T](DbConnection connection, Func`2 execute, String sql, Boolean useTransaction, Object[] parameters) in C:\dotnet\efcore\test\EFCore.SqlServer.FunctionalTests\TestUtilities\SqlServerTestStore.cs:line 330
   at Microsoft.EntityFrameworkCore.TestUtilities.SqlServerTestStore.<>c__37`1.<Execute>b__37_0(<>f__AnonymousType83`5 state) in C:\dotnet\efcore\test\EFCore.SqlServer.FunctionalTests\TestUtilities\SqlServerTestStore.cs:line 316
   at Microsoft.EntityFrameworkCore.ExecutionStrategyExtensions.<>c__DisplayClass12_0`2.<Execute>b__0(DbContext c, TState s) in C:\dotnet\efcore\src\EFCore\Storage\ExecutionStrategyExtensions.cs:line 341
   at Microsoft.EntityFrameworkCore.Storage.ExecutionStrategy.<>c__DisplayClass28_0`2.<Execute>b__0(DbContext context, TState state) in C:\dotnet\efcore\src\EFCore\Storage\ExecutionStrategy.cs:line 161
   at Microsoft.EntityFrameworkCore.Storage.ExecutionStrategy.ExecuteImplementation[TState,TResult](Func`3 operation, Func`3 verifySucceeded, TState state) in C:\dotnet\efcore\src\EFCore\Storage\ExecutionStrategy.cs:line 176
   at Microsoft.EntityFrameworkCore.Storage.ExecutionStrategy.Execute[TState,TResult](TState state, Func`3 operation, Func`3 verifySucceeded) in C:\dotnet\efcore\src\EFCore\Storage\ExecutionStrategy.cs:line 160
   at Microsoft.EntityFrameworkCore.ExecutionStrategyExtensions.Execute[TState,TResult](IExecutionStrategy strategy, TState state, Func`2 operation, Func`2 verifySucceeded) in C:\dotnet\efcore\src\EFCore\Storage\ExecutionStrategyExtensions.cs:line 339
   at Microsoft.EntityFrameworkCore.ExecutionStrategyExtensions.Execute[TState,TResult](IExecutionStrategy strategy, TState state, Func`2 operation) in C:\dotnet\efcore\src\EFCore\Storage\ExecutionStrategyExtensions.cs:line 288
   at Microsoft.EntityFrameworkCore.TestUtilities.SqlServerTestStore.Execute[T](DbConnection connection, Func`2 execute, String sql, Boolean useTransaction, Object[] parameters) in C:\dotnet\efcore\tes\TestUtilities\SqlServerTestStore.cs:line 247
   at Microsoft.EntityFrameworkCore.TestUtilities.SqlServerTestStore.CreateDatabase(Action`1 clean) in C:\dotnet\efcore\test\EFCore.SqlServer.FunctionalTests\TestUtilities\SqlServerTestStore.cs:line 110
   at Microsoft.EntityFrameworkCore.TestUtilities.SqlServerTestStore.Initialize(Func`1 createContext, Action`1 seed, Action`1 clean) in C:\dotnet\efcore\test\EFCore.SqlServer.FunctionalTests\TestUtilities\SqlServerTestStore.cs:line 86
.cs:line 52
   at Microsoft.EntityFrameworkCore.TestUtilities.TestStore.Initialize(IServiceProvider serviceProvider, Func`1 createContext, Action`1 seed, Action`1 clean) in C:\dotnet\efcore\test\EFCore.Specification.Tests\TestUtilities\TestStore.cs:line 45
   at Microsoft.EntityFrameworkCore.TestUtilities.RelationalTestStore.Initialize(IServiceProvider serviceProvider, Func`1 createContext, Action`1 seed, Action`1 clean) in C:\dotnet\efcore\test\EFCore.Relational.Specification.Tests\TestUtilities\RelationalTestStore.cs:line 43
   at Microsoft.EntityFrameworkCore.TestUtilities.SqlServerTestStore.InitializeSqlServer(IServiceProvider serviceProvider, Func`1 createContext, Action`1 seed) in C:\dotnet\efcore\test\EFCore.SqlServer.FunctionalTests\TestUtilities\SqlServerTestStore.cs:line 76
   at Microsoft.EntityFrameworkCore.TestUtilities.SqlServerTestStore.CreateInitialized(String name, Boolean useFileName, Nullable`1 multipleActiveResultSets) in C:\dotnet\efcore\test\EFCore.SqlServer.FunctionalTests\TestUtilities\SqlServerTestStore.cs:line 44
   at Microsoft.EntityFrameworkCore.Query.QueryBugsTest.CreateTestStore() in C:\dotnet\efcore\test\EFCore.SqlServer.FunctionalTests\Query\QueryBugsTest.cs:line 10401
   at Microsoft.EntityFrameworkCore.NonSharedModelTestBase.Initialize[TContext](Action`1 onModelCreating, Action`1 onConfiguring, Action`1 addServices, Func`2 shouldLogCategory, Func`1 createTestStore, Boolean usePooling) in C:\dotnet\efcore\test\EFCore.Specification.Tests\NonSharedModelTestBase.cs:line 86
   at Microsoft.EntityFrameworkCore.NonSharedModelTestBase.InitializeAsync[TContext](Action`1 onModelCreating, Action`1 onConfiguring, Action`1 addServices, Action`1 seed, Func`2 shouldLogCategory, Func`1 createTestStore, Boolean usePooling) in C:\dotnet\efcore\test\EFCore.Specification.Tests\NonSharedModelTestBase.cs:line 67
   at Microsoft.EntityFrameworkCore.Query.QueryBugsTest.Thread_safety_in_relational_command_cache() in C:\dotnet\efcore\test\EFCore.SqlServer.FunctionalTests\Query\QueryBugsTest.cs:line 7092
--- End of stack trace from previous location ---
----- Inner Stack Trace -----


Failed!  - Failed:     1, Passed:     0, Skipped:     0, Total:     1, Duration: < 1 ms - Microsoft.EntityFrameworkCore.SqlServer.FunctionalTests.dll (net6.0)

@roji
Copy link
Member Author

roji commented Aug 30, 2021

Yes, you need to add Encrypt=false to your connection string; SqlClient 4.0.0 has Encrypt set to true by default (#1210).

@Wraith2
Copy link
Contributor

Wraith2 commented Aug 30, 2021

Preview 4 has a security change which forces encryption and trust. You either need to disable encryption (not recommended because it'll be what users have enabled) or add TrustServerCertificate=True; to your connection string.

@ajcvickers
Copy link

Not seeing a regression on Windows:

With 2.1:

Test run for C:\dotnet\efcore\artifacts\bin\EFCore.SqlServer.FunctionalTests\Debug\net6.0\Microsoft.EntityFrameworkCore.SqlServer.FunctionalTests.dll (.NETCoreApp,Version=v6.0)
Microsoft (R) Test Execution Command Line Tool Version 17.0.0-preview-20210817-02
Copyright (c) Microsoft Corporation.  All rights reserved.

Starting test execution, please wait...
A total of 1 test files matched the specified pattern.

Passed!  - Failed:     0, Passed:     1, Skipped:     0, Total:     1, Duration: < 1 ms - Microsoft.EntityFrameworkCore.SqlServer.FunctionalTests.dll (net6.0)

With 4.0 preview:

Test run for C:\dotnet\efcore\artifacts\bin\EFCore.SqlServer.FunctionalTests\Debug\net6.0\Microsoft.EntityFrameworkCore.SqlServer.FunctionalTests.dll (.NETCoreApp,Version=v6.0)
Microsoft (R) Test Execution Command Line Tool Version 17.0.0-preview-20210817-02
Copyright (c) Microsoft Corporation.  All rights reserved.

Starting test execution, please wait...
A total of 1 test files matched the specified pattern.

Passed!  - Failed:     0, Passed:     1, Skipped:     0, Total:     1, Duration: < 1 ms - Microsoft.EntityFrameworkCore.SqlServer.FunctionalTests.dll (net6.0)

@Wraith2
Copy link
Contributor

Wraith2 commented Aug 30, 2021

Ok, Managed SNI it is. @roji were your tests with Encryption off? and do you know if MARS was on or off?

@roji
Copy link
Member Author

roji commented Aug 30, 2021

Yeah, Encrypt=false and no MARS. Can also run any other tests here - just let me know.

@Wraith2
Copy link
Contributor

Wraith2 commented Aug 30, 2021

No Mars is bad news. That means it's not just the usual threadpool exhaustion because with mars off sync IO is really synchronous in managed SNI.

@ajcvickers
Copy link

Okay, so ran with managed SNI on Windows using:

static SqlServerTestStore()
{
    var ManagedNetworkingAppContextSwitch = "Switch.Microsoft.Data.SqlClient.UseManagedNetworkingOnWindows";
    AppContext.SetSwitch(ManagedNetworkingAppContextSwitch, true);
}

My connection string is:

Server=localhost;Database=master;Trusted_Connection=True;TrustServerCertificate=True;

Now the test runs for about 10-15 seconds, but then consistently fails with:

Microsoft.EntityFrameworkCore.Query.QueryBugsTest.Thread_safety_in_relational_command_cache

Microsoft.Data.SqlClient.SqlException: A network-related or instance-specific error occurred while establishing a connection to SQL Server. The server...

Microsoft.Data.SqlClient.SqlException
A network-related or instance-specific error occurred while establishing a connection to SQL Server. The server was not found or was not accessible. Verify that the instance name is correct and that SQL Server is configured to allow remote connections. (provider: TCP Provider, error: 40 - Could not open a connection to SQL Server: Could not open a connection to SQL Server)
   at Microsoft.Data.SqlClient.SqlInternalConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)
   at Microsoft.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj, Boolean callerHasConnectionLock, Boolean asyncClose)
   at Microsoft.Data.SqlClient.TdsParser.Connect(ServerInfo serverInfo, SqlInternalConnectionTds connHandler, Boolean ignoreSniOpenTimeout, Int64 timerExpire, Boolean encrypt, Boolean trustServerCert, Boolean integratedSecurity, Boolean withFailover, SqlAuthenticationMethod authType)
   at Microsoft.Data.SqlClient.SqlInternalConnectionTds.AttemptOneLogin(ServerInfo serverInfo, String newPassword, SecureString newSecurePassword, Boolean ignoreSniOpenTimeout, TimeoutTimer timeout, Boolean withFailover)
   at Microsoft.Data.SqlClient.SqlInternalConnectionTds.LoginNoFailover(ServerInfo serverInfo, String newPassword, SecureString newSecurePassword, Boolean redirectedUserInstance, SqlConnectionString connectionOptions, SqlCredential credential, TimeoutTimer timeout)
   at Microsoft.Data.SqlClient.SqlInternalConnectionTds.OpenLoginEnlist(TimeoutTimer timeout, SqlConnectionString connectionOptions, SqlCredential credential, String newPassword, SecureString newSecurePassword, Boolean redirectedUserInstance)
   at Microsoft.Data.SqlClient.SqlInternalConnectionTds..ctor(DbConnectionPoolIdentity identity, SqlConnectionString connectionOptions, SqlCredential credential, Object providerInfo, String newPassword, SecureString newSecurePassword, Boolean redirectedUserInstance, SqlConnectionString userConnectionOptions, SessionData reconnectSessionData, Boolean applyTransientFaultHandling, String accessToken, DbConnectionPool pool)
   at Microsoft.Data.SqlClient.SqlConnectionFactory.CreateConnection(DbConnectionOptions options, DbConnectionPoolKey poolKey, Object poolGroupProviderInfo, DbConnectionPool pool, DbConnection owningConnection, DbConnectionOptions userOptions)
   at Microsoft.Data.ProviderBase.DbConnectionFactory.CreatePooledConnection(DbConnectionPool pool, DbConnection owningObject, DbConnectionOptions options, DbConnectionPoolKey poolKey, DbConnectionOptions userOptions)
   at Microsoft.Data.ProviderBase.DbConnectionPool.CreateObject(DbConnection owningObject, DbConnectionOptions userOptions, DbConnectionInternal oldConnection)
   at Microsoft.Data.ProviderBase.DbConnectionPool.UserCreateRequest(DbConnection owningObject, DbConnectionOptions userOptions, DbConnectionInternal oldConnection)
   at Microsoft.Data.ProviderBase.DbConnectionPool.TryGetConnection(DbConnection owningObject, UInt32 waitForMultipleObjectsTimeout, Boolean allowCreate, Boolean onlyOneCheckConnection, DbConnectionOptions userOptions, DbConnectionInternal& connection)
   at Microsoft.Data.ProviderBase.DbConnectionPool.TryGetConnection(DbConnection owningObject, TaskCompletionSource`1 retry, DbConnectionOptions userOptions, DbConnectionInternal& connection)
   at Microsoft.Data.ProviderBase.DbConnectionFactory.TryGetConnection(DbConnection owningConnection, TaskCompletionSource`1 retry, DbConnectionOptions userOptions, DbConnectionInternal oldConnection, DbConnectionInternal& connection)
   at Microsoft.Data.ProviderBase.DbConnectionInternal.TryOpenConnectionInternal(DbConnection outerConnection, DbConnectionFactory connectionFactory, TaskCompletionSource`1 retry, DbConnectionOptions userOptions)
   at Microsoft.Data.ProviderBase.DbConnectionClosed.TryOpenConnection(DbConnection outerConnection, DbConnectionFactory connectionFactory, TaskCompletionSource`1 retry, DbConnectionOptions userOptions)
   at Microsoft.Data.SqlClient.SqlConnection.TryOpen(TaskCompletionSource`1 retry, SqlConnectionOverrides overrides)
   at Microsoft.Data.SqlClient.SqlConnection.Open(SqlConnectionOverrides overrides)
   at Microsoft.Data.SqlClient.SqlConnection.Open()
   at Microsoft.EntityFrameworkCore.TestUtilities.SqlServerTestStore.ExecuteCommand[T](DbConnection connection, Func`2 execute, String sql, Boolean useTransaction, Object[] parameters) in C:\dotnet\efcore\test\EFCore.SqlServer.FunctionalTests\TestUtilities\SqlServerTestStore.cs:line 336
   at Microsoft.EntityFrameworkCore.TestUtilities.SqlServerTestStore.<>c__38`1.<Execute>b__38_0(<>f__AnonymousType83`5 state) in C:\dotnet\efcore\test\EFCore.SqlServer.FunctionalTests\TestUtilities\SqlServerTestStore.cs:line 322
   at Microsoft.EntityFrameworkCore.ExecutionStrategyExtensions.<>c__DisplayClass12_0`2.<Execute>b__0(DbContext c, TState s) in C:\dotnet\efcore\src\EFCore\Storage\ExecutionStrategyExtensions.cs:line 341
   at Microsoft.EntityFrameworkCore.Storage.ExecutionStrategy.<>c__DisplayClass28_0`2.<Execute>b__0(DbContext context, TState state) in C:\dotnet\efcore\src\EFCore\Storage\ExecutionStrategy.cs:line 161
   at Microsoft.EntityFrameworkCore.Storage.ExecutionStrategy.ExecuteImplementation[TState,TResult](Func`3 operation, Func`3 verifySucceeded, TState state) in C:\dotnet\efcore\src\EFCore\Storage\ExecutionStrategy.cs:line 176
   at Microsoft.EntityFrameworkCore.Storage.ExecutionStrategy.Execute[TState,TResult](TState state, Func`3 operation, Func`3 verifySucceeded) in C:\dotnet\efcore\src\EFCore\Storage\ExecutionStrategy.cs:line 160
   at Microsoft.EntityFrameworkCore.ExecutionStrategyExtensions.Execute[TState,TResult](IExecutionStrategy strategy, TState state, Func`2 operation, Func`2 verifySucceeded) in C:\dotnet\efcore\src\EFCore\Storage\ExecutionStrategyExtensions.cs:line 339
   at Microsoft.EntityFrameworkCore.ExecutionStrategyExtensions.Execute[TState,TResult](IExecutionStrategy strategy, TState state, Func`2 operation) in C:\dotnet\efcore\src\EFCore\Storage\ExecutionStrategyExtensions.cs:line 288
   at Microsoft.EntityFrameworkCore.TestUtilities.SqlServerTestStore.Execute[T](DbConnection connection, Func`2 execute, String sql, Boolean useTransaction, Object[] parameters) in C:\dotnet\efcore\test\EFCore.SqlServer.FunctionalTests\TestUtilities\SqlServerTestStore.cs:line 313
   at Microsoft.EntityFrameworkCore.TestUtilities.SqlServerTestStore.ExecuteScalar[T](DbConnection connection, String sql, Object[] parameters) in C:\dotnet\efcore\test\EFCore.SqlServer.FunctionalTests\TestUtilities\SqlServerTestStore.cs:line 253
   at Microsoft.EntityFrameworkCore.TestUtilities.SqlServerTestStore.CreateDatabase(Action`1 clean) in C:\dotnet\efcore\test\EFCore.SqlServer.FunctionalTests\TestUtilities\SqlServerTestStore.cs:line 116
   at Microsoft.EntityFrameworkCore.TestUtilities.SqlServerTestStore.Initialize(Func`1 createContext, Action`1 seed, Action`1 clean) in C:\dotnet\efcore\test\EFCore.SqlServer.FunctionalTests\TestUtilities\SqlServerTestStore.cs:line 92
   at Microsoft.EntityFrameworkCore.TestUtilities.TestStore.<>c__DisplayClass13_0.<Initialize>b__1() in C:\dotnet\efcore\test\EFCore.Specification.Tests\TestUtilities\TestStore.cs:line 45
   at Microsoft.EntityFrameworkCore.TestUtilities.TestStoreIndex.CreateNonShared(String name, Action initializeDatabase) in C:\dotnet\efcore\test\EFCore.Specification.Tests\TestUtilities\TestStoreIndex.cs:line 52
   at Microsoft.EntityFrameworkCore.TestUtilities.TestStore.Initialize(IServiceProvider serviceProvider, Func`1 createContext, Action`1 seed, Action`1 clean) in C:\dotnet\efcore\test\EFCore.Specification.Tests\TestUtilities\TestStore.cs:line 45
   at Microsoft.EntityFrameworkCore.TestUtilities.RelationalTestStore.Initialize(IServiceProvider serviceProvider, Func`1 createContext, Action`1 seed, Action`1 clean) in C:\dotnet\efcore\test\EFCore.Relational.Specification.Tests\TestUtilities\RelationalTestStore.cs:line 43
   at Microsoft.EntityFrameworkCore.TestUtilities.SqlServerTestStore.InitializeSqlServer(IServiceProvider serviceProvider, Func`1 createContext, Action`1 seed) in C:\dotnet\efcore\test\EFCore.SqlServer.FunctionalTests\TestUtilities\SqlServerTestStore.cs:line 82
   at Microsoft.EntityFrameworkCore.TestUtilities.SqlServerTestStore.CreateInitialized(String name, Boolean useFileName, Nullable`1 multipleActiveResultSets) in C:\dotnet\efcore\test\EFCore.SqlServer.FunctionalTests\TestUtilities\SqlServerTestStore.cs:line 50
   at Microsoft.EntityFrameworkCore.Query.QueryBugsTest.CreateTestStore() in C:\dotnet\efcore\test\EFCore.SqlServer.FunctionalTests\Query\QueryBugsTest.cs:line 10401
   at Microsoft.EntityFrameworkCore.NonSharedModelTestBase.Initialize[TContext](Action`1 onModelCreating, Action`1 onConfiguring, Action`1 addServices, Func`2 shouldLogCategory, Func`1 createTestStore, Boolean usePooling) in C:\dotnet\efcore\test\EFCore.Specification.Tests\NonSharedModelTestBase.cs:line 86
   at Microsoft.EntityFrameworkCore.NonSharedModelTestBase.InitializeAsync[TContext](Action`1 onModelCreating, Action`1 onConfiguring, Action`1 addServices, Action`1 seed, Func`2 shouldLogCategory, Func`1 createTestStore, Boolean usePooling) in C:\dotnet\efcore\test\EFCore.Specification.Tests\NonSharedModelTestBase.cs:line 67
   at Microsoft.EntityFrameworkCore.Query.QueryBugsTest.Thread_safety_in_relational_command_cache() in C:\dotnet\efcore\test\EFCore.SqlServer.FunctionalTests\Query\QueryBugsTest.cs:line 7092
   at Xunit.Sdk.TestInvoker`1.<>c__DisplayClass48_1.<<InvokeTestMethodAsync>b__1>d.MoveNext() in C:\Dev\xunit\xunit\src\xunit.execution\Sdk\Frameworks\Runners\TestInvoker.cs:line 264
--- End of stack trace from previous location ---
   at Xunit.Sdk.ExecutionTimer.AggregateAsync(Func`1 asyncAction) in C:\Dev\xunit\xunit\src\xunit.execution\Sdk\Frameworks\ExecutionTimer.cs:line 48
   at Xunit.Sdk.ExceptionAggregator.RunAsync(Func`1 code) in C:\Dev\xunit\xunit\src\xunit.core\Sdk\ExceptionAggregator.cs:line 90

@roji
Copy link
Member Author

roji commented Aug 30, 2021

Sounds very similar/related...

@Wraith2
Copy link
Contributor

Wraith2 commented Aug 30, 2021

That does smell like threadpool exhaustion..
It'll start multiple connections and rely on a task callback to complete the connection but because the threadpool hasn't ramped up you haven't got enough threads to schedule the completion and when one becomes available as it is added to the threadpool it'll likely pick up a thread that's spend so long connecting that it's passed the timeout and will fail. If you make it through that phase things will sort of work. That's the exact presentation in #422 but it looks like it's been extended into the pool now.

The immediate easy test it to artificially inflate the threadpool minthreads to the number of connections you expect to open, in this case 100, and then see if it still fails. If it all works then we know what the problem is. Fixing it though, that's a different matter.

@ajcvickers
Copy link

Some more info: this fails in exactly the same way with SqlClient 2.1 on Windows with managed SNI. Increasing thread pool size doesn't make any difference.

static SqlServerTestStore()
{
    var ManagedNetworkingAppContextSwitch = "Switch.Microsoft.Data.SqlClient.UseManagedNetworkingOnWindows";
    AppContext.SetSwitch(ManagedNetworkingAppContextSwitch, true);
    ThreadPool.GetMaxThreads(out var workerThreads, out var completionPortThreads);
    ThreadPool.SetMaxThreads(workerThreads * 20, completionPortThreads * 20);
}

@roji
Copy link
Member Author

roji commented Aug 30, 2021

@Wraith2 I agree this indeed smells like TP exhaustion... But @ajcvickers I don't think this test was slow for me on Linux with SqlClient 2.1 (which is what we're currently using in EF Core)? Weird...

@ajcvickers
Copy link

@roji There is clearly some Windows/Linux difference here, not just managed/native difference.

@Wraith2
Copy link
Contributor

Wraith2 commented Aug 30, 2021

The areas that are affected by platform are mainly how pool keys are derived and how networking is implemented, both are switched on the appcontext switch you've used so you're effectively running in linux mode on windows when using that script, I've been able to repro linux-only problems on windows using that switch. Of course there could be some api call which isn't behaving the same way but that would be a bug in it's own right that needs addressing.

for the original reported issue would it be possible to get an EF-free repro?

@JRahnama
Copy link
Contributor

Hi @roji, @ajcvickers and @Wraith2
This seems related to TP exhaustion and as much as I try to avoid relating issues to #422 I end up there, but as @Wraith2 mentioned is it possible to have an EF-free repro? It makes it much easier to follow if there is a regression.

@roji
Copy link
Member Author

roji commented Aug 30, 2021

Sure, I'll do that in the coming days.

@roji
Copy link
Member Author

roji commented Sep 1, 2021

Sat down to do a minimal repro (see below), and found some interesting things - I was unfamiliar with the environment of the test in question:

  • Contrary to what I wrote above, the test does run in MARS, always.
  • The test suite actually has the test reuse the same DbConnection instance - in parallel - rather than separate DbConnection instances per thread. I am not sure to what extent this is a supported feature - to my understand MARS was about being able to execute a new command while an existing reader is still open - but that's not the same as making the connection thread-safe/concurrent.
  • However, this does seem to work 100% reliably on Windows with unmanaged networking (which is why our test generally passes well).
  • Even with managed networking (both Linux and Windows), both SqlClient 2.1.3 and 3.0.0 usually execute this successfully - an error will only occur after many iterations.
  • In 4.0.0-preview.1, some sort of change makes this block or error very quickly.

Stepping back... if this (IMHO questionable!) practice is officially supported, then you guys seem to have a bug in current versions, which manifests much more reliably/often in 4.0.0-preview.1. Since this seems to work flawlessly on Windows with unmanaged networking, I suspect users porting to non-Windows platforms are hitting this from time to time.

See below for the repro: leave it running against 3.0.0 and 4.0.0-preview.1, and also compare with unmanaged networking.

Repro
AppContext.SetSwitch("Switch.Microsoft.Data.SqlClient.UseManagedNetworkingOnWindows", true);
ThreadPool.SetMinThreads(200, 200);

const string ConnectionString =
    "Server=localhost;Database=test;User=SA;Password=Abcd5678;Connect Timeout=60;ConnectRetryCount=0;Encrypt=false;Multiple Active Result Sets=true";

using (var conn = new SqlConnection(ConnectionString))
{
    conn.Open();
    using var cmd = new SqlCommand
    {
        Connection = conn,
        CommandText = @"
DROP TABLE IF EXISTS [Lists];
CREATE TABLE [Lists] (
    [Id] INTEGER,
    [IsDeleted] BIT
)"
    };

    cmd.ExecuteNonQuery();
}

while (true)
{
    var stopWatch = Stopwatch.StartNew();

    using var conn = new SqlConnection(ConnectionString);
    conn.Open();

    Parallel.For(
        0, 100,
        i =>
        {
            using var cmd = new SqlCommand
            {
                Connection = conn,
                CommandText = @"
SELECT [l].[Id], [l].[IsDeleted]
FROM [Lists] AS [l]
WHERE ([l].[IsDeleted] = CAST(0 AS bit)) AND [l].[Id] IN (1, 2, 3)"
            };

            using var reader = cmd.ExecuteReader();
            // using var context = contextFactory.CreateContext();
            // var query = context.Lists.Where(l => !l.IsDeleted && ids.Contains(l.Id)).ToList();
        });

    Console.WriteLine($"Completed in {stopWatch.ElapsedMilliseconds}ms");
}

@roji roji changed the title Extreme slowdown in parallelized EF Core test when switching from 3.0.0 to 4.0.0-preview1.21237.2 Concurrent connection usage and MARS Sep 1, 2021
@Wraith2
Copy link
Contributor

Wraith2 commented Sep 1, 2021

Contrary to what I wrote above, the test does run in MARS, always.

That's a relief, sort of. It means that longstanding behaviour I don't know how to fix is still the problem and that we don't have another.

The test suite actually has the test reuse the same DbConnection instance - in parallel - rather than separate DbConnection instances per thread.

Ew. Not supported. If you were actually making concurrent calls to anything other than Open I expect you'd see really scary exceptions that warn about concurrent usage but since this is about Open it's not unsafe or easily detectable until you've opened.

In 4.0.0-preview.1, some sort of change makes this block or error very quickly.

That should make it easier to trace which is good.

@roji
Copy link
Member Author

roji commented Sep 1, 2021

Ew. Not supported.

FWIW I completely agree it shouldn't (and it's even documented that way) - so this is our bad on the EF Core.

But I'm a little worried by the fact that the scenario works perfectly on unmanaged Windows - it may indicate that thread-safety was somehow an original goal. But more importantly, the distinction between thread-safety and multiple result sets is a bit subtle for the unsuspecting new user, and I'm guessing there's lots of code out there which does concurrent usage (and would break when ported.

In 4.0.0-preview.1, some sort of change makes this block or error very quickly.

Definitely agree.

@Wraith2
Copy link
Contributor

Wraith2 commented Sep 1, 2021

From what I remember mars is there to allow sloppy programming. COM ADO had a fun behaviour whereby is you had an open ResultSet and tried to open another it would use a new thread to open a second one for you silently in the background and handle the marshalling for you (at least in vb6) which let you do things like open a transaction and then start updating one resultset while opening a second resultset inside that same transaction to lookup data.

People liked this because it means that the code to do such things was a lot smaller than having to think and code fully transactionally and deal with rollbacks etc. As you can imagine it got into all sorts of Line Of Business and asp programs. When the migration from .net was being pushed it was a major upgrade blocker if you couldn't do the same thing on .net so it had to be implemented. I think MARS is a terrible idea and that no-one should use it. Sadly there's no way anyone will ever let me take it out of the driver.

Mars in SqlClient strictly applies to allowing multiple SqlDataReader instances to be active at one time on the same connection and works by multiplexing the traffic using the MC-SMP protocol. In practical terms it changes the way data is received by decoupling the packet receipt loop from the reader because otherwise you can deadlock the two SqlDataReaders, the receive loop with Mars enabled is implemented with async callback so just having mars enabled even if you aren't using it puts you at the mercy of threadpool exhaustion.

TLDR; just don't use mars, it's not the right solution to whatever problem you're having.

As far as thread safety goes. I'm not aware of anything being thread safe in SqlClient, but also nothing cares about what thread it's running on (apart from connecting with impersonation) so as long as your access is sequential you're unlikely to have a problem. I suspect the test case is serializing connection attempts at the pool level so even though the caller is concurrent the internals are likely sequential so you see no error. You'll probably want an official answer on this, I'm just recalling what I've seen in the code.

@cheenamalhotra
Copy link
Member

This is indeed #422.

TLDR; just don't use mars, it's not the right solution to whatever problem you're having.

I wouldn't say that because this is a client design issue, I have a feeling it's related to SNIMarsHandle queue design. And the problem is not with multiple data reader design, problem is when connection.Open() is called with MARS enabled in managed SNI.

We need to remember MARS is enabled using "async" reads, which uses the blocking queue design and that design has been my suspicion is the culprit. But the bigger challenge is we've only known this 1 design pattern in Managed SNI, so we need to really break down the design pattern and come up with something that would solve this sync/async contradiction or do things synchronously in that space.

@JRahnama
Copy link
Contributor

JRahnama commented Sep 1, 2021

we need to really break down the design pattern and come up with something that would solve this sync/async contradiction or do things synchronously in that space.

I agree. IMHO, I think we should separate sync and async calls. async is contagious and effects the entire environment ( not sure but thread pool management could be included in the environment change) , plus in MS docs it has been suggested as the best practices to avoid mixing sync/async, but on the other hand if we do so, we will end up having two sets of codes and it is hard to maintain. I remember that @roji mentioned in some other PR that they are maintaining sync/async in the very same code in their product with an async validator boolean, when async, for example, is true other part of the code runs. if I am not mistaken, I wonder if they ever noticed any issue with this pattern?

@cheenamalhotra
Copy link
Member

The problems trigger with #933 as exceptions start to occur.
@Wraith2 could you investigate design changes made that could contribute to this?

@Wraith2
Copy link
Contributor

Wraith2 commented Sep 1, 2021

It immediately hits an assert in debug mode
SNIMarsHandle.HandleRecieveComplete should be called while holding the SNIMarsConnection.DemuxerSync because it can cause deadlocks so yeah, saw that one coming 😁

Add this to the list of reasons we need to be able to run tests in debug mode.

@Wraith2
Copy link
Contributor

Wraith2 commented Sep 1, 2021

@roji what exception are we expecting to see here? or aren't we? In debug mode the only problem it's having is caused by the fast re-use of a single session. The session identifier is limited to a 16 bit integer and if you create and destroy sessions fast enough you eventually wrap around and hit a number that's already in use which will bubble up as a connection failure.

@cheenamalhotra
Copy link
Member

cheenamalhotra commented Sep 1, 2021

If you run the repro from Shay, you won't get exception using v3.0.0, and queries continue (even though slowly) without exception.
I think there's some kind of deadlock with new changes that happens randomly, and code terminates abruptly with exception.

I'd say compare behavior before and after your change to understand it's impact.

@Wraith2
Copy link
Contributor

Wraith2 commented Sep 1, 2021

I'm using Main which is 4.0 preview currently. No crash. I get a slow start because I'm not forcing the threadpool min as I wanted to see any long blocking behaviour but the only crash is from exceeding 65000+ sessions.

@cheenamalhotra
Copy link
Member

cheenamalhotra commented Sep 1, 2021

image
Left > With head just before #933 was committed.(6a5a670)
Right > With latest "main"

Running repro on .NET 5 with Managed SNI enabled.
The only change in my repro is I commented out ThreadPool.SetMinThreads(200, 200); to use default settings.

@Wraith2
Copy link
Contributor

Wraith2 commented Sep 1, 2021

Ok, that's just normal timing out which is what you'd expect. It looks like I can just run it fast enough to succeed most of the time. I've seen large numbers of threads waiting

which is expected behaviour. They've requested a packet and are waiting for the dispatch loop to wake them, it can't because it can't be scheduled due to lack of threads and we have no way to give it any sort of high priority because tasks don't have priority.

@cheenamalhotra
Copy link
Member

cheenamalhotra commented Sep 1, 2021

I don't think that's a fair analysis or "normal" behavior.

The exception is consistently reproducible with new change and the code also "consistently" passes with previous design.
So it is a regression that needs to be addressed.

There's likely a deadlock in SNI layer, if threads are waiting and not receiving packets.

@roji
Copy link
Member Author

roji commented Sep 1, 2021

@roji what exception are we expecting to see here? or aren't we?

So I think it's OK to say "if you use SqlConnection concurrently, the behavior is undefined" (though again, keep in mind this seems totally supported on unmanaged Windows). A better option is to actually detect these cases and throw a "connection used concurrently" InvalidOperationException, which would really help users find errors in their code. Npgsql does this but in a way that isn't 100% reliable, and I've generally lost a lot of time over the years helping users finding their accidental concurrent usage.

If you run the repro from Shay, you won't get exception using v3.0.0, and queries continue (even though slowly) without exception.

This was true for me - but only on unmanaged Windows. 3.0.0 with managed does eventually throw an exception for me, though it took a lot more time than 4.0.0; if that's true, then there may not be an actual regression here...

I remember that @roji mentioned in some other PR that they are maintaining sync/async in the very same code in their product with an async validator boolean, when async, for example, is true other part of the code runs. if I am not mistaken, I wonder if they ever noticed any issue with this pattern?

Yeah, this is how Npgsql is architected, and it has worked quite well - it's a good way to fully support sync and async without code duplication, and you can still split implementations on a per-function basis where you need to.

@Wraith2
Copy link
Contributor

Wraith2 commented Sep 1, 2021

I don't think that's a fair analysis or "normal" behavior.

I look forward to seeing your analysis.

There's likely a deadlock in SNI layer, if threads are waiting and not receiving packets.

I can't see one or any evidence of anything other than behaviour consistent with not being able to schedule a task.

It took me months to get it this far and then more months to get it reviewed. If there is evidence of a problem with it then I'm happy to help out but I can't see any. There's a performance problem but lacking a reproduction of that problem and not just #422 I can't do much about it.

I'm pretty sure the assertions of demuxer lock entry in the SNIMarsHandle methods are outdated and can be removed, the demuxer lock only has to guard the annotated variables and anything that sends a packet including the SNIMarsHandle ctor because someone was stupid when they wrote it.

@JRahnama
Copy link
Contributor

JRahnama commented Sep 3, 2021

I have tried @roji's repro with version 3.0.0 and it works fine, but with the newest preview it fails right away as below:

image

@Wraith2 I am afraid, but I believe this is a regression and has eliminated the only workaround we had for issue 422.

Something interesting: I was trying to compare behavior with PLINQ, ( the below code) as the original issue #422 was with PLINQ and Parallel.For was the provided workaround. It was working slow, but was running till completion. I was just trying to rearrange the code to add more to it. I surprisingly noticed it works fine with the latest release. This might be because of my mistake. I really appreciate if anybody can try it and confirm or deny my findings.

Repro
        static void Main(string[] args)
        {
            AppContext.SetSwitch("Switch.Microsoft.Data.SqlClient.UseManagedNetworkingOnWindows", true);
            //ThreadPool.SetMinThreads(200, 200);

            SqlConnectionStringBuilder builder = new()
            {
                DataSource = "localhost",
                InitialCatalog = <database name>,
                UserID = "sa",
                Password = <your password>,
                ConnectTimeout = 60,
                ConnectRetryCount = 0,
                Encrypt = false,
                MultipleActiveResultSets = true
            };
            while (true)
            {
                RunWithParallelFor(builder.ConnectionString);
            }
            //RunWithPLinq(builder.ConnectionString);
        }

        private static void RunWithPLinq(string connString)
        {

            using SqlConnection conn = new(connString);

            conn.Open();
            Enumerable.Range(0, count).
               AsParallel().
               WithDegreeOfParallelism(count).
               ForAll(n => RunSqlCommand(conn));
        }

        private static void RunWithParallelFor(string connString)
        {
            using var conn = new SqlConnection(connString);

            conn.Open();

            Parallel.For(
                0, 100,
                i =>
                {
                    RunSqlCommand(conn);
                });
        }

        private static void RunSqlCommand(SqlConnection conn)
        {
            var stopWatch = Stopwatch.StartNew();
            using var cmd = new SqlCommand
            {
                Connection = conn,
                CommandText = @"
                                 SELECT [l].[Id], [l].[IsDeleted]
                                 FROM [Lists] AS [l]
                                 WHERE ([l].[IsDeleted] = CAST(0 AS bit)) AND [l].[Id] IN (1, 2, 3)"
            };
            try
            {
                using SqlDataReader reader = cmd.ExecuteReader();
                Console.WriteLine($"Current thread {Thread.CurrentThread.ManagedThreadId} Completed in {stopWatch.ElapsedMilliseconds}ms");
            }
            catch (SqlException ex)
            {
                Console.WriteLine(ex.Message);
            }

        }

        private static void TableMaintainer(string connString)
        {
            using var conn = new SqlConnection(connString);
            try
            {
                conn.Open();
                using var cmd = new SqlCommand()
                {
                    Connection = conn,
                    CommandText = @"
                                    DROP TABLE IF EXISTS [Lists];
                                    CREATE TABLE [Lists] (
                                    [Id] INTEGER,
                                    [IsDeleted] BIT
                                   )"
                };

                cmd.ExecuteNonQuery();
            }
            catch (SqlException ex)
            {
                Console.WriteLine(ex.Message);
            }
        }

@Wraith2
Copy link
Contributor

Wraith2 commented Sep 3, 2021

@Wraith2 I am afraid, but I believe this is a regression

Yes, it is a regression, I thought that much was clear from the very start of the discussion.

Assuming it's from the mars rework is unproven. I agree that it's possible and maybe even likely but so far no evidence. It won't crash with anything other than duplicate key under a debugger for me and we're going to need a debugger to investigate.

and has eliminated the only workaround we had for issue 422.

What workaround for #422? the only workaround I knew of was to not starve async of threads.

I managed to implement a custom scheduler and get it working so that all mars async io is done on dedicated threads. Still incredibly slow and the profiler shows the contention time allocated to the _parserLock which makes sense to me since we've got up to 100 threads contending for the same parser.

@Wraith2
Copy link
Contributor

Wraith2 commented Sep 3, 2021

Here's what i see on current main:

Overall:
image

breakdown of lock contention:

image

@roji
Copy link
Member Author

roji commented Sep 3, 2021

I have tried @roji's repro with version 3.0.0 and it works fine, but with the newest preview it fails right away as below:

Just to note that running my repro above against 3.0.0 (with managed networking) does fails - it just takes a bit of time (the program eventually blocks and then throws unpredictably). So I'm not sure there's a regression here in 4.0.0-preview.1.

@Wraith2
Copy link
Contributor

Wraith2 commented Sep 3, 2021

Interestingly if you manage to force mars async io off threadpool threads and onto dedicated threads the number of thredpool threads required (and created) drops significantly. The time taken to execute doesn't drop because the majority of it is taken up on the parser lock but in general the profile looks a lot cleaner. I couldn't work out how to do this before but working through some of the postgres issues you had with the context switcher let me learn what I needed.

image

@cheenamalhotra cheenamalhotra added 💥 Regression Issues that are regressions introduced from earlier PRs. 📈 Performance Issues that are targeted to performance improvements. labels Sep 30, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
💥 Regression Issues that are regressions introduced from earlier PRs. 📈 Performance Issues that are targeted to performance improvements.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants