Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Using a NullPool for external connections by default #4251

Merged
merged 1 commit into from
Jan 23, 2018

Conversation

mistercrunch
Copy link
Member

Currently, even though get_sqla_engine calls get memoized, engines are
still short lived since they are attached to an models.Database ORM
object. All engines created through this method have the scope of a web
request.

Knowing that the SQLAlchemy objects are short lived means that
a related connection pool would also be short lived and mostly useless.
I think it's pretty rare that connections get reused within the context
of a view or Celery worker task.

We've noticed on Redshift that Superset was leaving many connections
opened (hundreds). This is probably due to a combination of the current
process not garbage collecting connections properly, and perhaps the
absence of connection timeout on the redshift side of things. This
could also be related to the fact that we experience web requests timeouts
(enforced by gunicorn) and that process-killing may not allow SQLAlchemy
to clean up connections as they occur (which this PR may not help
fixing...)

For all these reasons, it seems like the right thing to do to use
NullPool for external connection (but not for our connection to the metadata
db!).

Opening the PR for conversation. Putting this query into our staging
today to run some tests.

Currently, even though `get_sqla_engine` calls get memoized, engines are
still short lived since they are attached to an models.Database ORM
object. All engines created through this method have the scope of a web
request.

Knowing that the SQLAlchemy objects are short lived means that
a related connection pool would also be short lived and mostly useless.
I think it's pretty rare that connections get reused within the context
of a view or Celery worker task.

We've noticed on Redshift that Superset was leaving many connections
opened (hundreds). This is probably due to a combination of the current
process not garbage collecting connections properly, and perhaps the
absence of connection timeout on the redshift side of things. This
could also be related to the fact that we experience web requests timeouts
(enforced by gunicorn) and that process-killing may not allow SQLAlchemy
to clean up connections as they occur (which this PR may not help
fixing...)

For all these reasons, it seems like the right thing to do to use
NullPool for external connection (but not for our connection to the metadata
db!).

Opening the PR for conversation. Putting this query into our staging
today to run some tests.
@hughhhh
Copy link
Member

hughhhh commented Jan 23, 2018

🚢

@mistercrunch
Copy link
Member Author

Tested in our staging, connections appear to have dramatically reduce in our Redshift cluster.

@mistercrunch mistercrunch merged commit 4b11f45 into apache:master Jan 23, 2018
@mistercrunch mistercrunch deleted the nullpool branch January 23, 2018 23:13
mistercrunch added a commit that referenced this pull request Jan 27, 2018
michellethomas pushed a commit to michellethomas/panoramix that referenced this pull request May 24, 2018
Currently, even though `get_sqla_engine` calls get memoized, engines are
still short lived since they are attached to an models.Database ORM
object. All engines created through this method have the scope of a web
request.

Knowing that the SQLAlchemy objects are short lived means that
a related connection pool would also be short lived and mostly useless.
I think it's pretty rare that connections get reused within the context
of a view or Celery worker task.

We've noticed on Redshift that Superset was leaving many connections
opened (hundreds). This is probably due to a combination of the current
process not garbage collecting connections properly, and perhaps the
absence of connection timeout on the redshift side of things. This
could also be related to the fact that we experience web requests timeouts
(enforced by gunicorn) and that process-killing may not allow SQLAlchemy
to clean up connections as they occur (which this PR may not help
fixing...)

For all these reasons, it seems like the right thing to do to use
NullPool for external connection (but not for our connection to the metadata
db!).

Opening the PR for conversation. Putting this query into our staging
today to run some tests.
wenchma pushed a commit to wenchma/incubator-superset that referenced this pull request Nov 16, 2018
Currently, even though `get_sqla_engine` calls get memoized, engines are
still short lived since they are attached to an models.Database ORM
object. All engines created through this method have the scope of a web
request.

Knowing that the SQLAlchemy objects are short lived means that
a related connection pool would also be short lived and mostly useless.
I think it's pretty rare that connections get reused within the context
of a view or Celery worker task.

We've noticed on Redshift that Superset was leaving many connections
opened (hundreds). This is probably due to a combination of the current
process not garbage collecting connections properly, and perhaps the
absence of connection timeout on the redshift side of things. This
could also be related to the fact that we experience web requests timeouts
(enforced by gunicorn) and that process-killing may not allow SQLAlchemy
to clean up connections as they occur (which this PR may not help
fixing...)

For all these reasons, it seems like the right thing to do to use
NullPool for external connection (but not for our connection to the metadata
db!).

Opening the PR for conversation. Putting this query into our staging
today to run some tests.
@mistercrunch mistercrunch added 🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels 🚢 0.23.0 labels Feb 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels 🚢 0.23.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants