
Improved general connection pool #1814

Conversation

DavoudEshtehari
Contributor

While investigating issue #1675, I noticed that the underlying data structure used at DBConnectionPool for storing internal connection objects is a Stack, which won't be the best choice in a high-demand multi-thread application.

@DavoudEshtehari DavoudEshtehari added the 💡 Enhancement Issues that are feature requests for the drivers we maintain. label Oct 24, 2022
@Wraith2
Contributor

Wraith2 commented Oct 25, 2022

Imagine a situation where you have an application which uses a large number of connections for a short time, meaning that the pool has a relatively high population, and then the app calms down to a steady state where it uses connections more slowly.

In this case with a stack, a small number of connections will be pushed and popped off the stack regularly, and the other items will get older and eventually be removed by the pool cleanup because they're inactive.

In this case with a queue, all connections are kept alive. You will retain all of the items in the pool, not aging any out (depending on the number of items in the pool), so pools will be more highly populated.
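The aging behaviour described above can be seen in a minimal simulation (illustrative Python, not the driver's code): under steady low demand, a LIFO pool cycles only the most recently returned connections, while a FIFO pool spreads usage across every connection and keeps them all "fresh".

```python
from collections import deque

def simulate(kind, pool_size=10, steady_demand=2, ticks=100):
    """Hand out `steady_demand` connections per tick, then return them,
    counting how often each pooled connection gets used."""
    pool = deque(range(pool_size))
    use_counts = [0] * pool_size
    for _ in range(ticks):
        taken = []
        for _ in range(steady_demand):
            # A stack pops the most recently returned connection (LIFO);
            # a queue pops the one that has been idle longest (FIFO).
            conn = pool.pop() if kind == "stack" else pool.popleft()
            use_counts[conn] += 1
            taken.append(conn)
        for conn in taken:
            pool.append(conn)  # return the connection to the pool
    return use_counts

stack_counts = simulate("stack")  # only `steady_demand` connections ever cycle
queue_counts = simulate("queue")  # usage spreads over the whole pool
```

With the stack, the remaining eight connections never get used and would look idle to any age-based cleanup; with the queue, none of them do.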

@codecov

codecov bot commented Oct 25, 2022

Codecov Report

Base: 71.38% // Head: 71.34% // Decreases project coverage by -0.03% ⚠️

Coverage data is based on head (f1aa0d4) compared to base (c6821c3).
Patch coverage: 100.00% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1814      +/-   ##
==========================================
- Coverage   71.38%   71.34%   -0.04%     
==========================================
  Files         290      290              
  Lines       61236    61234       -2     
==========================================
- Hits        43712    43690      -22     
- Misses      17524    17544      +20     
Flag | Coverage Δ
addons | 92.38% <ø> (ø)
netcore | 74.98% <100.00%> (-0.01%) ⬇️
netfx | 69.24% <100.00%> (-0.05%) ⬇️

Flags with carried forward coverage won't be shown.

Impacted Files | Coverage Δ
...rc/Microsoft/Data/ProviderBase/DbConnectionPool.cs | 85.00% <100.00%> (-0.20%) ⬇️
...rc/Microsoft/Data/ProviderBase/DbConnectionPool.cs | 85.90% <100.00%> (-0.32%) ⬇️
...ata/SqlClient/SqlConnectionTimeoutErrorInternal.cs | 30.35% <0.00%> (-14.29%) ⬇️
...Microsoft/Data/ProviderBase/DbConnectionFactory.cs | 17.04% <0.00%> (-3.41%) ⬇️
...Client/Reliability/Common/SqlRetryLogicProvider.cs | 89.13% <0.00%> (-2.18%) ⬇️
...crosoft/Data/SqlClient/SqlInternalConnectionTds.cs | 72.95% <0.00%> (-0.40%) ⬇️
...netcore/src/Microsoft/Data/SqlClient/SqlCommand.cs | 83.44% <0.00%> (-0.33%) ⬇️
...c/Microsoft/Data/SqlClient/TdsParserStateObject.cs | 81.65% <0.00%> (-0.30%) ⬇️
...ient/netfx/src/Microsoft/Data/SqlClient/SqlUtil.cs | 58.04% <0.00%> (-0.25%) ⬇️
... and 8 more


@David-Engel
Contributor

> Imagine a situation where you have an application which uses a large number of connections for a short time, meaning that the pool has a relatively high population, and then the app calms down to a steady state where it uses connections more slowly.
>
> In this case with a stack, a small number of connections will be pushed and popped off the stack regularly, and the other items will get older and eventually be removed by the pool cleanup because they're inactive.
>
> In this case with a queue, all connections are kept alive. You will retain all of the items in the pool, not aging any out (depending on the number of items in the pool), so pools will be more highly populated.

I've been looking at CleanupCallback() trying to think about your scenario, and I don't think that's how it works. The pool pruning logic doesn't care when a connection was last used. If it is idle in the pool when the method runs, it is subject to cleanup regardless of when it was last used. Yes, using a stack (as-is) will favor keeping the last-used connection as the most used and least likely to be cleaned up, but changing to a queue won't keep idle connections from being pruned. It will just more evenly distribute connection use across all of the items in the pool.
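The pruning behaviour described here can be illustrated with a minimal, hypothetical sketch (Python; the names and logic are illustrative, not SqlClient's actual CleanupCallback() implementation): eligibility for pruning depends only on being idle when the cleanup timer fires, not on how recently the connection was last handed out.

```python
def cleanup_pass(idle_pool, min_pool_size):
    """Hypothetical pruning pass: any connection idle when the timer fires
    is eligible for removal, down to min_pool_size, regardless of when it
    was last handed out."""
    closed = []
    while len(idle_pool) > min_pool_size:
        closed.append(idle_pool.pop())  # close a surplus idle connection
    return closed

# Both connections are idle right now, so both are equally eligible,
# even though one was in use moments ago.
idle_pool = ["used-moments-ago", "idle-for-an-hour"]
closed = cleanup_pass(idle_pool, min_pool_size=1)
```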

@Wraith2
Contributor

Wraith2 commented Oct 25, 2022

It has been some time since I tried to understand the connection pool logic and it's very possible that I misunderstood the structure. I thought this was how it worked. Since you've looked through the code with that scenario in mind and say it's fine then I'm ok with the change. I just wanted to flag up what I thought might be the reason for the stack.

@roji
Member

roji commented Oct 25, 2022

@DavoudEshtehari I'm on vacation at the moment, and once I'm back it's very unlikely I'll be able to look at this before .NET Conf (at least before November 10th). I'd be happy to take a look afterwards though, if that's not too far away.

@DavoudEshtehari
Contributor Author

@roji thank you for the prompt reply. I wish you the best with .NET Conf, and I hope we can benefit from your thoughts afterwards as well.

@JRahnama
Contributor

/AzurePipelines run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@cheenamalhotra
Member

cheenamalhotra commented Nov 16, 2022

@DavoudEshtehari have you measured performance by running parallel async tests with these changes? I tried, and I saw the current implementation being slightly more performant. You may want to benchmark to get consistent results (including non-pooled connections).

Here's my repro (just a basic app): Test SqlConnection.OpenAsync performance. Feel free to modify and add output per connection attempt, and the difference would be visible when connecting to Azure SQL.

This is an important use case I would recommend keeping close and not regressing.

@cheenamalhotra
Member

Looking at the changes, I don't think this should make any significant difference, negative or positive, so you may disregard my last comment. Slight movement in results is possible due to network jitter so that should be fine.

Stacks should be preferred for the pool when you're optimizing for access speed: fetching a pool element doesn't need to move other items around, unlike a queue, so fetching or adding an element in a heavily populated pool should take less time, I think. Applications creating and pooling a large number of connections would most likely notice that, so I would be considerate of that.

I would like to understand why you think changing it to a queue/bag would benefit a multi-threaded app, in case I'm missing some logical reasoning.

@DavoudEshtehari
Contributor Author

> Stacks should be preferred for the pool when you're optimizing for access speed: fetching a pool element doesn't need to move other items around, unlike a queue, so fetching or adding an element in a heavily populated pool should take less time, I think. Applications creating and pooling a large number of connections would most likely notice that, so I would be considerate of that.

To compare Stack and Queue, I don't have knowledge of their internal implementations, so I'll leave that comparison to those who do. Even if they don't have significant perf differences, I believe using a queue can still be beneficial in a multi-threaded app.

> I would like to understand why you think changing it to a queue/bag would benefit a multi-threaded app, in case I'm missing some logical reasoning.

When I was running the repro, I noticed a reduction in the number of failures, plus a difference in the stack traces. You can see this by comparing the originally reported stack trace with the result of this change.
Also, from the logs, I found that the failure occurs when a connection that was owned by another thread is taken by a new thread before it has been returned to the pool.

@roji
Member

roji commented Nov 17, 2022

This is something we looked into in Npgsql a while ago as well. A long time ago, the pool was implemented as a stack to promote good pruning (for the same reasoning @Wraith2 explains in this comment). However, we ended up moving away from that, and not relying on the data structure for pruning. In other words, the current pruning algorithm samples the idle/busy state several times, and if it determines that pruning should occur, closes any idle connection (we don't care whether it's one that happened to get handed out recently or not).

The current Npgsql implementation uses System.IO.Channels, which is a conceptual queue that manages efficient hand-off of elements (connections) across threads, as well as efficient waiting when the queue is empty (i.e. when Max Pool Size has been reached).
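The channel-based approach described here can be sketched in a language-neutral way as a bounded blocking queue: renters block when no idle connection is available (all Max Pool Size connections are rented out), and a return hands the connection straight to a waiter. This is an illustrative Python sketch under that assumption, not Npgsql's actual implementation (which also opens connections lazily rather than up front).

```python
import queue

class QueuePool:
    """Sketch of a queue-backed pool: rent() blocks while the pool is
    empty, i.e. while all max_pool_size connections are rented out."""
    def __init__(self, max_pool_size, connect):
        self._idle = queue.Queue()
        # Simplification: open all connections up front; a real pool
        # would open them lazily on demand, up to Max Pool Size.
        for _ in range(max_pool_size):
            self._idle.put(connect())

    def rent(self, timeout=None):
        return self._idle.get(timeout=timeout)  # blocks when pool is empty

    def give_back(self, conn):
        self._idle.put(conn)  # wakes one waiting renter, if any

pool = QueuePool(max_pool_size=2, connect=object)
a = pool.rent()
b = pool.rent()      # pool now empty; a third rent() would block
pool.give_back(a)
c = pool.rent()      # FIFO: receives the connection just returned
```

The blocking `get` is what a channel gives for free: waiting renters are parked efficiently instead of spinning, and a returned connection is handed directly to the longest-waiting consumer.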

In any case, I'd highly recommend measuring perf via benchmarking before making any actual changes. Would also be happy to continue talking about this topic (pooling implementation is very interesting and not at all trivial).
