Using a NullPool for external connections by default #4251
Merged
Conversation
Currently, even though `get_sqla_engine` calls get memoized, engines are still short-lived since they are attached to a `models.Database` ORM object. All engines created through this method have the scope of a web request. Knowing that the SQLAlchemy objects are short-lived means that a related connection pool would also be short-lived and mostly useless. It's pretty rare that connections get reused within the context of a view or Celery worker task.

We've noticed on Redshift that Superset was leaving many connections open (hundreds). This is probably due to a combination of the current process not garbage-collecting connections properly, and perhaps the absence of a connection timeout on the Redshift side of things. It could also be related to the fact that we experience web request timeouts (enforced by gunicorn), and that process killing may not allow SQLAlchemy to clean up connections as it occurs (which this PR may not help fix...).

For all these reasons, it seems like the right thing to do to use `NullPool` for external connections (but not for our connection to the metadata db!).

Opening the PR for conversation. Putting this change into our staging today to run some tests.
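The change boils down to passing `poolclass=NullPool` when building engines for external databases. A minimal sketch of the behavior, assuming SQLAlchemy 1.4+ and using an in-memory SQLite URL as a stand-in for the external database URL (which in Superset comes from the `models.Database` record, not a hardcoded string):

```python
from sqlalchemy import create_engine, text
from sqlalchemy.pool import NullPool

# Stand-in for an external database URI (hypothetical; Superset derives
# this from the models.Database record).
external_url = "sqlite://"

# NullPool opens a fresh DBAPI connection on each checkout and closes it
# on release, so no connections linger after the web request or Celery
# task that created the engine is gone.
engine = create_engine(external_url, poolclass=NullPool)

with engine.connect() as conn:
    result = conn.execute(text("SELECT 1")).scalar()
    print(result)  # 1

# Once the context manager exits, the underlying connection is closed
# outright rather than being returned to a pool for reuse.
```

The metadata-database engine keeps its default `QueuePool`, since that connection is long-lived and genuinely benefits from pooling.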
🚢
Tested in our staging; connections appear to have dramatically reduced in our Redshift cluster.
mistercrunch added a commit that referenced this pull request on Jan 27, 2018
This reverts commit 4b11f45.
michellethomas pushed a commit to michellethomas/panoramix that referenced this pull request on May 24, 2018
wenchma pushed a commit to wenchma/incubator-superset that referenced this pull request on Nov 16, 2018
mistercrunch added the 🏷️ bot and 🚢 0.23.0 labels on Feb 27, 2024