
Idle database connections not reaped #871

Closed
andrewgodwin opened this issue Feb 6, 2018 · 54 comments

@andrewgodwin
Member

Initially reported in django/daphne#163

I suspect this is due to not calling close_old_connections, as Channels 1 did here: https://github.com/django/channels/blob/1.x/channels/signals.py#L10

@andrewgodwin
Member Author

@michaelhays If you want to verify this would fix it, you can try calling close_old_connections yourself at the end of your consumer's method (but likely not in disconnect, as django/daphne#152 will prevent that). I'll look at solving both issues later in the week.

@michaelhays

close_old_connections didn't work for me where I used it. I tried it in a couple of places.

End of the consumer method:

# consumers.py
from channels.generic.websocket import JsonWebsocketConsumer
from django.db import close_old_connections

from .models import Chat

class ChatbotConsumer(JsonWebsocketConsumer):
    def connect(self):
        self.accept()

    def receive_json(self, content):
        # create Chat in scope if one doesn't exist
        if 'chat' not in self.scope:
            chat = Chat.objects.create_chat(self.scope['user'])
            self.scope['chat'] = chat

        # process chatbot response, store messages, and send response to client
        response_data = some_func(content['message'])
        self.scope['chat'].add_message(content['message'])
        self.scope['chat'].add_message(response_data)
        self.send_json(response_data)
        close_old_connections()

End of the model method:

# models.py
from django.contrib.postgres.fields import JSONField
from django.db import close_old_connections, models

class Chat(models.Model):
    body = JSONField()
    ...

    def add_message(self, message):
        self.body.append(message)
        self.save()
        close_old_connections()

Setting CONN_MAX_AGE to 0 has helped a bit, in that certain actions like reloading the page remove 2 or 3 of the "COMMIT" idle connections (but only sometimes?), and occasionally the idle connections are reaped. But I've yet to notice any rhyme or reason to it.

@andrewgodwin
Member Author

Probably need the startup method connected as well. I'll look into it more tomorrow when I have time to work on Channels!

@andrewgodwin
Member Author

OK, I have replicated this, and the temporary workaround is calling django.db.close_old_connections right after you do an ORM call, inside the same sync context (calling it anywhere else hides the connections due to thread safety).

Working on a proper fix now.
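A sketch of that workaround (the Chat model and the create_chat_sync helper are hypothetical names, not from this thread): the call has to sit directly after the ORM work, in the same synchronous function.

```python
# Sketch only: Chat and create_chat_sync are illustrative names.
from django.db import close_old_connections

from myapp.models import Chat  # hypothetical app/model


def create_chat_sync(user):
    chat = Chat.objects.create_chat(user)
    # Django's connections are thread-local, so this must run in the
    # same thread (sync context) that just made the ORM call; called
    # anywhere else, it simply won't see this connection.
    close_old_connections()
    return chat
```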

@andrewgodwin
Member Author

OK, this should be fixed in 6afd423 - any code inside of a SyncConsumer will work fine, and anything you're calling from an async context should use the new channels.db.database_sync_to_async (can either be called with a function or be used as a function/method decorator). You'll also need the latest release of asgiref (2.1.4) to make method decorators work correctly.
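As a minimal sketch of the async-context pattern (the consumer and model names here are illustrative, not from this issue):

```python
# Sketch only: assumes Channels 2.x and asgiref >= 2.1.4; the Chat
# model and ChatConsumer name are hypothetical.
from channels.db import database_sync_to_async
from channels.generic.websocket import AsyncJsonWebsocketConsumer

from myapp.models import Chat  # hypothetical app/model


class ChatConsumer(AsyncJsonWebsocketConsumer):
    @database_sync_to_async
    def create_chat(self, user):
        # Runs in a worker thread; the wrapper cleans up the thread's
        # database connection when the call finishes.
        return Chat.objects.create_chat(user)

    async def receive_json(self, content):
        chat = await self.create_chat(self.scope["user"])
        await self.send_json({"chat": chat.pk})
```

database_sync_to_async can also wrap a plain callable, e.g. `get_chat = database_sync_to_async(Chat.objects.get)`.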

If someone could confirm this fixes it for them, I can close this.

@michaelhays

Confirmed to fix the issue for me!

@andrewgodwin
Member Author

Great! I'll put this out in a release in the next day or two, depending on what else I land.

@LeonidIvanov

LeonidIvanov commented Mar 1, 2018

Hey, @andrewgodwin! I'm still having the same problem, and it's not only in consumers. I was following your tutorial (multichat). I have the database_sync_to_async decorator wrapping get_room_or_error in chat.utils. The main thing is that every page refresh (not only the chat page) creates a new database connection! Django + PostgreSQL + Daphne + Heroku.

@andrewgodwin
Member Author

If you can reduce it to a simple set of code that reliably reproduces the problem, using the latest versions of all of the repositories, then please open a new issue with that code snippet and steps to replicate locally (I can't debug things that only happen on Heroku).

@LeonidIvanov

LeonidIvanov commented Mar 1, 2018

It's the exact same issue as @michaelhays described at django/daphne#163. In my case there is no such issue locally, but there is on Heroku.
And I understand that you can't debug things that only happen on Heroku... but there's no support channel where I can raise this issue. You have more influence to report it to them, and more people updating from Channels 1.0 to 2.0 will suffer this problem...

@LeonidIvanov

LeonidIvanov commented Mar 1, 2018

I just recreated everything from your clean multichat example and deployed it to Heroku at https://django-channels-2-test.herokuapp.com/ (user: demo, pass: demo), and got the same issue. Refresh the page 20 times and you'll get:

OperationalError at /
FATAL:  too many connections for role "jdknihvnjyczlm"

@andrewgodwin
Member Author

Can you confirm what exact versions of each channels package you're running using a pip freeze from the Heroku environment? I just tried this again locally with multichat and verified that it only ever opens a single database connection.

@LeonidIvanov

LeonidIvanov commented Mar 1, 2018

aioredis==1.0.0
asgiref==2.1.6
async-timeout==2.0.0
attrs==17.4.0
autobahn==17.10.1
Automat==0.6.0
channels==2.0.2
channels-redis==2.1.0
constantly==15.1.0
daphne==2.0.4
dj-database-url==0.4.2
Django==2.0.2
hiredis==0.2.0
hyperlink==18.0.0
idna==2.6
incremental==17.5.0
msgpack==0.5.6
psycopg2==2.7.4
pytz==2018.3
six==1.11.0
Twisted==17.9.0
txaio==2.8.2
whitenoise==3.3.1
zope.interface==4.4.3

Looks like the packages on Heroku are exactly the same as in my local env... And yes, the problem only appears on Heroku; it works fine locally...

@andrewgodwin
Member Author

Unless I can replicate it locally I'm afraid I can't fix it, sorry - I'm not sure what's different about the Heroku environment that would cause this.

@kennethreitz

Hmmmm

@burcaw

burcaw commented Mar 8, 2018

Same issue, Django + PostgreSQL + Daphne + Heroku.

Originally had this issue with the consumers, but that was fixed with the prior release.

On any DB lookup within the Django app, Postgres connections are left idle and never closed. This occurs on any DB lookup, not just the narrow area where I use Channels.

Relevant versions:
asgiref==2.2.0
channels==2.0.2
channels-redis==2.1.0
daphne==2.1.0
dj-database-url==0.5.0
Django==2.0
psycopg2==2.7.4

One additional item: I verified that these idle connections don't close even after the conn_max_age of 500 seconds has elapsed.

db_from_env = dj_database_url.config(conn_max_age=500)
DATABASES['default'].update(db_from_env)

UPDATE: Perhaps this article provides a clue: https://devcenter.heroku.com/articles/forked-pg-connections

@kennethreitz

what happens if you drop conn_max_age from the dj_database_url call?

@seanwiseman

I was experiencing the exact same problem @burcaw described. I can confirm @kennethreitz, dropping conn_max_age from the dj_database_url.config call has resolved the issue for me. No more idle db connections are left open 👍.
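For reference, the change being discussed is a one-line difference in settings.py. A sketch, assuming dj-database-url (whose conn_max_age parameter defaults to 0):

```python
# settings.py (sketch)
import dj_database_url

# Before: persistent connections (up to 500s), which leak under
# Channels' worker threads:
# DATABASES['default'].update(dj_database_url.config(conn_max_age=500))

# After: omit conn_max_age, falling back to the default of 0 so the
# connection is closed at the end of each request/call:
DATABASES['default'].update(dj_database_url.config())
```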

@kennethreitz

✨🍰✨

@bsr3000

bsr3000 commented Apr 25, 2018

aioredis==1.0.0
asgi-redis==1.3.0
asgiref==2.3.0
async-timeout==2.0.1
attrs==17.4.0
autobahn==18.4.1
Automat==0.6.0
channels==2.1.1
channels-redis==2.1.1
daphne==2.1.1
Django==1.11
psycopg2==2.7.4
six==1.11.0
Twisted==17.9.0
txaio==2.9.0
txredisapi==1.4.4
ujson==1.35
uWSGI==2.0.15
zope.interface==4.4.3

Hello, I have encountered the same issue as @burcaw, with Nginx + uWSGI (WSGI layer) + Daphne (ASGI layer only). I have already updated all packages to the latest versions, but even after I dropped CONN_MAX_AGE, it creates and keeps idle connections to the database without terminating them, so when a few clients connect to websockets, Postgres reaches the max connection limit.

Is there any workaround for this issue?

Thanks in advance

@andrewgodwin
Member Author

If you are still having the issue, it means you are using code that talks to a database outside of database_sync_to_async. Make sure all your database access is wrapped in it (consumer methods on sync consumers already are).

@bsr3000

bsr3000 commented Apr 26, 2018

Thanks for your reply. I have double-checked my code and didn't find any database access in methods without the database_sync_to_async decorator.

So the problem was in my server configuration and the ASGI_THREADS env var:

  1. I didn't set the environment variable in the Docker image.
  2. I had 10 instances of Daphne with max workers set to None. If I understand correctly, each Daphne had its own thread pool with the default max_workers of (os.cpu_count() or 1) * 5, and the test server's os.cpu_count() was 16, so the max worker count for each Daphne server was 80. That's why I had so many connections to the database.
  3. Setting ASGI_THREADS and "CONN_MAX_AGE": 60 works fine now; each worker reuses its database connections and cleans everything up. With my current settings of ASGI_THREADS=20 and 4 Daphne instances on different ports, the number of database connections doesn't exceed 45 while load testing with artillery.io.

P.S. Maybe it would be useful to add some info about the ASGI_THREADS var to the documentation.
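The arithmetic in point 2 can be sketched like this (a back-of-the-envelope check, not Daphne's actual code; it assumes the default ThreadPoolExecutor sizing of (os.cpu_count() or 1) * 5):

```python
import os


def default_asgi_threads(cpu_count=None):
    # Default max_workers when ASGI_THREADS is unset:
    # (os.cpu_count() or 1) * 5 threads per process.
    cpus = cpu_count if cpu_count is not None else (os.cpu_count() or 1)
    return cpus * 5


def worst_case_db_connections(instances, threads_per_instance):
    # Worst case: every worker thread holds one idle connection.
    return instances * threads_per_instance


# The setup described above: 16 CPUs -> 80 threads per Daphne
# instance, so 10 instances could open up to 800 connections.
print(default_asgi_threads(cpu_count=16))   # 80
print(worst_case_db_connections(10, 80))    # 800
# The later settings: ASGI_THREADS=20 with 4 instances caps it at 80.
print(worst_case_db_connections(4, 20))     # 80
```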

Also, one more question: while load testing with an external tool, I sometimes get

ERROR    Exception in callback AsyncioSelectorReactor.callLater.<locals>.run() at /venv/lib/python3.5/site-packages/twisted/internet/asyncioreactor.py:287
handle: <TimerHandle when=1713807.003341125 AsyncioSelectorReactor.callLater.<locals>.run() at /venv/lib/python3.5/site-packages/twisted/internet/asyncioreactor.py:287>
Traceback (most recent call last):
  File "/usr/lib/python3.5/asyncio/events.py", line 126, in _run
    self._callback(*self._args)
  File "/venv/lib/python3.5/site-packages/twisted/internet/asyncioreactor.py", line 290, in run
    f(*args, **kwargs)
  File "/venv/lib/python3.5/site-packages/twisted/internet/tcp.py", line 289, in connectionLost
    protocol.connectionLost(reason)
  File "/venv/lib/python3.5/site-packages/autobahn/twisted/websocket.py", line 128, in connectionLost
    self._connectionLost(reason)
  File "/venv/lib/python3.5/site-packages/autobahn/websocket/protocol.py", line 2467, in _connectionLost
    WebSocketProtocol._connectionLost(self, reason)
  File "/venv/lib/python3.5/site-packages/autobahn/websocket/protocol.py", line 1096, in _connectionLost
    self._onClose(self.wasClean, WebSocketProtocol.CLOSE_STATUS_CODE_ABNORMAL_CLOSE, "connection was closed uncleanly (%s)" % self.wasNotCleanReason)
  File "/venv/lib/python3.5/site-packages/autobahn/twisted/websocket.py", line 171, in _onClose
    self.onClose(wasClean, code, reason)
  File "/venv/lib/python3.5/site-packages/daphne/ws_protocol.py", line 146, in onClose
    self.application_queue.put_nowait({
AttributeError: 'WebSocketProtocol' object has no attribute 'application_queue'

Is this a Channels issue, or just a problem with the testing tool not closing its WebSocket connections properly?

@SHxKM

SHxKM commented Apr 29, 2019

For me, the only thing that worked was setting ASGI_THREADS (as an env var) to a very low number (6). Daphne never closes these connections, by the way. I'm not sure if this is the expected behaviour, @andrewgodwin?
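For anyone trying the same thing: ASGI_THREADS is read from the environment by asgiref's thread pool, so it just needs to be set where Daphne starts. A sketch (the project module name is illustrative):

```shell
# Cap the sync thread pool at 6 threads before starting Daphne
export ASGI_THREADS=6
daphne myproject.asgi:application --bind 0.0.0.0 --port 8000
```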

@andrewgodwin
Member Author

Daphne and Channels both don't have enough visibility into the threadpool system to close any connections in there, sadly. This is something that has to be remedied in Django itself, which I am in the process of doing.

@vkolova

vkolova commented May 9, 2019

I removed conn_max_age, tried close_old_connections, and also set ASGI_THREADS to an even lower number (3), and the issue is still here for me, for both REST requests and websockets.

channels-redis==2.3.3
channels==2.1.7
dj-database-url==0.5.0
Django==2.1.7

@LgArcida

LgArcida commented Jun 4, 2019

Any update on this issue? @andrewgodwin

@carltongibson
Member

@LgArcida It's a big project. The first steps are made here: https://github.com/django/deps/blob/master/draft/0009-async.rst

@joshmarlow

For those who keep finding this issue, here's what I did:

I then was able to send 1,000 messages through my channels code on Heroku and it all worked fine (previously I would have issues after 20-30 messages).

@SHxKM

SHxKM commented Sep 23, 2019

I have to say I'm still a little confused by this issue, even though setting ASGI_THREADS to a low number has "solved" the issue for me. I'm using a synchronous consumer.

Where am I supposed to use close_old_connections()? Here's my workflow with channels:

  1. User connects, starts a long-running celery task
  2. Celery task writes to DB, possibly thousands of times, calls Channels consumer each time with details of (updated) message to be sent to client
  3. Consumer sends a JSON string to the client each time a message is sent by Celery task, update is displayed to the user

Where does close_old_connections() go here?

@Paulcappaert

I was having this problem with Channels and Heroku Postgres, but it seems to have resolved itself when I changed my runtime from 3.6 to 3.8.1. I don't need to use close_old_connections().

@carltongibson
Member

@Paulcappaert I don’t suppose we have any idea what changed there, do we?

@Paulcappaert

@carltongibson No idea, all I know is that I made that single change and the problem went away.

@carltongibson
Member

Hmmm. Yes... 🙂 Thanks for the follow-up.

(asyncio is changing rapidly, still, but 🤷‍♂️)

@sagardspeed2

@carltongibson No idea, all I know is that I made that single change and the problem went away.

Which change did you perform so that the problem went away?

@Paulcappaert

@carltongibson No idea, all I know is that I made that single change and the problem went away.

Which change did you perform so that the problem went away?

I updated the runtime to python 3.8.1

@sagardspeed2

@carltongibson No idea, all I know is that I made that single change and the problem went away.

Which change did you perform so that the problem went away?

I updated the runtime to python 3.8.1

OK, thanks. I also updated the runtime to Python 3.8.2 and it's done.

@Hardeepsingh980

I was having this problem with channels and heroku postgres but it seems to have resolved itself when I changed my runtime from 3.6 to 3.8.1. I don't need to use close_old_connections()

You know what? I was struggling with this stupid error for the last two days. I am glad you fixed it. Thanks a lot, I am really happy.

@SHxKM

SHxKM commented Apr 12, 2020

I'm happy to see @Paulcappaert's suggestion to switch runtimes helped some here; unfortunately, for some of us that's impossible, as we may be using other libraries that don't support 3.8 yet. I for one rely on Celery for background processing and can't really update.

@andrewgodwin
Member Author

For what it's worth, merely getting off of 3.6 may be enough - asyncio was a little underbaked in 3.6, and is much better in 3.7, and so most issues just need 3.7, not 3.8.

@SHxKM

SHxKM commented Apr 13, 2020

For what it's worth, merely getting off of 3.6 may be enough - asyncio was a little underbaked in 3.6, and is much better in 3.7, and so most issues just need 3.7, not 3.8.

OK, just a heads up that in my experience, that's not true. I was (and still am) using Channels 1, and when I upgraded to Python 3.7.2 to use a Celery feature, I also had to upgrade to Channels 2 (all sorts of dependency clashes). That's when I encountered this issue again, and I reverted my whole setup so Channels 2 didn't kill my server. I'm hoping to give this upgrade another chance soon, after multiple failed attempts.

I'm hosted at Heroku; YMMV.

@andrewgodwin
Member Author

Channels 2 versus Channels 1 is a very different conversation given that they use completely different asynchronous styles under the hood.

Generally, this issue is caused by Django not identifying the connections with threads or coroutines that are already-dead. For Django itself this was fixed in the latest asgiref release, but Channels does not use the Local implementation in there so it doesn't get the same fixes. I also feel like Heroku does something to the Python environment based on a whole slew of Heroku-specific bugs that I used to get, but I have no time to verify that, unfortunately.

@SHxKM

SHxKM commented Apr 13, 2020

Channels 2 versus Channels 1 is a very different conversation given that they use completely different asynchronous styles under the hood.

Generally, this issue is caused by Django not identifying the connections with threads or coroutines that are already-dead. For Django itself this was fixed in the latest asgiref release, but Channels does not use the Local implementation in there so it doesn't get the same fixes. I also feel like Heroku does something to the Python environment based on a whole slew of Heroku-specific bugs that I used to get, but I have no time to verify that, unfortunately.

Let me clarify: maybe I shouldn't have mentioned Channels 1 at all. What I was talking about is Channels 2 on Python 3.7.2; that's why I had to revert back to Channels 1, which is working perfectly fine.

@phamhm

phamhm commented Apr 23, 2020

I'm using Python 3.7.6 on Heroku. My main server runs fine with gunicorn. I observed this issue when I ran Daphne on my dev server.

On my dev server, I incorporated Channels 2 + WebSocket and Daphne into my app and saw this issue whenever I refreshed my page. I changed my app to use AJAX instead and still saw this issue. I only have one view function, which returns a JsonResponse for the AJAX call. That AJAX view function was decorated with login_required and performs a database update. So I did the following:

Using AJAX + daphne:

  1. Comment out the database update in my AJAX function => same issue with multiple AJAX calls.
  2. Comment out the login_required decorator on my AJAX function => no fatal on multiple AJAX calls.

Using websocket(Channels2) + daphne

  1. 20+ WebSocket calls with no database update in the WebSocketConsumer => no fatal.
  2. 20+ WebSocket calls with a database update in the WebSocketConsumer => fatal.
  3. Refresh the login_required view 20+ times => fatal.

I looked at the Heroku logs and found that in all instances, the fatal error occurred when Django was doing session validation. I don't observe this behavior on my main server when I refresh the login_required view 20+ times. My main server uses gunicorn.

File "/app/.heroku/python/lib/python3.7/site-packages/django/contrib/auth/__init__.py", line 58, in _get_user_session_key
    return get_user_model()._meta.pk.to_python(request.session[SESSION_KEY])
File "/app/.heroku/python/lib/python3.7/site-packages/django/contrib/sessions/backends/base.py", line 64, in __getitem__
    return self._session[key]

I believe that Django session management is causing this issue when I use Daphne. When either a Django view or a WebSocket consumer requires validation, a connection is opened to the database but it never closes. This behavior is not apparent on the local machine because we don't have a limit on database connections there.

@carltongibson
Member

Hi. Are you able to reproduce on a laptop, or only on Heroku? (That still seems to be the issue...)

@phamhm

phamhm commented Apr 23, 2020

Hi. Are you able to reproduce on a laptop, or only on Heroku? (That still seems to be the issue...)

I only observe this behavior on Heroku. I don't know how to limit database connections on my laptop.

@Hardeepsingh980

Hi. Are you able to reproduce on a laptop, or only on Heroku? (That still seems to be the issue...)

I only observe this behavior on heroku. I don't know how to limit database connection on my laptop.

Hey, I fixed this by changing my Heroku runtime to Python 3.8, and it resolved the issue. So it is worth trying.

@phamhm

phamhm commented May 4, 2020

Hi. Are you able to reproduce on a laptop, or only on Heroku? (That still seems to be the issue...)

I only observe this behavior on heroku. I don't know how to limit database connection on my laptop.

Hey i fixed this by changing my heroku runtime to python 3.8. This fixed the issue. So, it is worth trying.

I switched to 3.8 and it has fixed my issue. That was very helpful, thank you.

@dctalbot

So glad I found this thread. Just chiming in here - the following manifested this behavior:

Heroku Postgres, Heroku Redis
Python 3.7.7
Django 3.0.3

channels==2.4.0
channels-redis==2.4.2
daphne==2.5.0

Dropping conn_max_age from the dj_database_url call solves my immediate issue, but this just defaults it to 0 such that connections are never reused.

I haven't tried upgrading Python yet, but I should do that and report back.

EchNet added a commit to EchNet/d20hex that referenced this issue Jul 22, 2020