
Idle database connections not reaped #871

Closed
andrewgodwin opened this issue Feb 6, 2018 · 54 comments

@andrewgodwin
Member

Initially reported in django/daphne#163

I suspect this is due to not calling close_old_connections, as Channels 1 did here: https://github.com/django/channels/blob/1.x/channels/signals.py#L10

@andrewgodwin
Member Author

@michaelhays If you want to verify this would fix it, you can try calling close_old_connections yourself at the end of your consumer's method (but likely not in disconnect, as django/daphne#152 will prevent that). I'll look at solving both issues later in the week.

@michaelhays

close_old_connections didn't work for me where I used it. I tried it in a couple of places.

End of the consumer method:

# consumers.py
from channels.generic.websocket import JsonWebsocketConsumer
from django.db import close_old_connections

from .models import Chat

class ChatbotConsumer(JsonWebsocketConsumer):
    def connect(self):
        self.accept()

    def receive_json(self, content):
        # create Chat in scope if one doesn't exist
        if 'chat' not in self.scope:
            chat = Chat.objects.create_chat(self.scope['user'])
            self.scope['chat'] = chat

        # process chatbot response, store messages, and send response to client
        response_data = some_func(content['message'])
        self.scope['chat'].add_message(content['message'])
        self.scope['chat'].add_message(response_data)
        self.send_json(response_data)
        close_old_connections()

End of the model method:

# models.py
from django.contrib.postgres.fields import JSONField
from django.db import close_old_connections, models

class Chat(models.Model):
    body = JSONField()
    ...

    def add_message(self, message):
        self.body.append(message)
        self.save()
        close_old_connections()

Setting CONN_MAX_AGE to 0 has helped a bit, in that certain actions like reloading the page remove 2 or 3 of the "COMMIT" idle connections (but only sometimes?), and occasionally the idle connections are reaped. But I've yet to notice any rhyme or reason to it.

@andrewgodwin
Member Author

Probably need the startup method connected as well. I'll look into it more tomorrow when I have time to work on Channels!

@andrewgodwin
Member Author

OK, I have replicated this, and the temporary workaround is calling django.db.close_old_connections right after you do an ORM call, inside the same sync context (calling it anywhere else hides the connections due to thread safety).

Working on a proper fix now.
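A sketch of that workaround (the Chat model and the create_chat_sync helper are hypothetical names, not from this thread): the call has to sit directly after the ORM work, in the same synchronous function.

```python
# Sketch only: Chat and create_chat_sync are illustrative names.
from django.db import close_old_connections

from myapp.models import Chat  # hypothetical app/model


def create_chat_sync(user):
    chat = Chat.objects.create_chat(user)
    # Django's connections are thread-local, so this must run in the
    # same thread (sync context) that just made the ORM call; called
    # anywhere else, it simply won't see this connection.
    close_old_connections()
    return chat
```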

@andrewgodwin
Member Author

OK, this should be fixed in 6afd423 - any code inside of a SyncConsumer will work fine, and anything you're calling from an async context should use the new channels.db.database_sync_to_async (can either be called with a function or be used as a function/method decorator). You'll also need the latest release of asgiref (2.1.4) to make method decorators work correctly.
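As a minimal sketch of the async-context pattern (the consumer and model names here are illustrative, not from this issue):

```python
# Sketch only: assumes Channels 2.x and asgiref >= 2.1.4; the Chat
# model and ChatConsumer name are hypothetical.
from channels.db import database_sync_to_async
from channels.generic.websocket import AsyncJsonWebsocketConsumer

from myapp.models import Chat  # hypothetical app/model


class ChatConsumer(AsyncJsonWebsocketConsumer):
    @database_sync_to_async
    def create_chat(self, user):
        # Runs in a worker thread; the wrapper cleans up the thread's
        # database connection when the call finishes.
        return Chat.objects.create_chat(user)

    async def receive_json(self, content):
        chat = await self.create_chat(self.scope["user"])
        await self.send_json({"chat": chat.pk})
```

database_sync_to_async can also wrap a plain callable, e.g. `get_chat = database_sync_to_async(Chat.objects.get)`.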

If someone could confirm this fixes it for them, I can close this.

@michaelhays

Confirmed to fix the issue for me!

@andrewgodwin
Member Author

Great! I'll put this out in a release in the next day or two, depending on what else I land.

@LeonidIvanov

LeonidIvanov commented Mar 1, 2018

Hey, @andrewgodwin! I'm still having the same problem, and it's not only in consumers. I was following your tutorial (multichat). I have the database_sync_to_async decorator wrapping get_room_or_error in chat.utils. The main thing is that every page refresh (not only the chat page) creates a new database connection! Django + PostgreSQL + Daphne + Heroku.

@andrewgodwin
Member Author

If you can reduce it to a simple set of code that reliably reproduces the problem, using the latest versions of all of the repositories, then please open a new issue with that code snippet and steps to replicate locally (I can't debug things that only happen on Heroku).

@LeonidIvanov

LeonidIvanov commented Mar 1, 2018

It's the exact same issue as @michaelhays described at django/daphne#163. In my case there is no such issue locally, but there is on Heroku.
And I understand that you can't debug things that only happen on Heroku... but there's no support channel where I can raise this issue. You have more influence to report it to them, and more people updating from Channels 1.0 to 2.0 will suffer this problem...

@LeonidIvanov

LeonidIvanov commented Mar 1, 2018

I just recreated everything from your clean multichat example and deployed it to Heroku at https://django-channels-2-test.herokuapp.com/ (user: demo, pass: demo), and got the same issue. Refresh the page 20 times and you'll get:

OperationalError at /
FATAL:  too many connections for role "jdknihvnjyczlm"

@andrewgodwin
Member Author

Can you confirm what exact versions of each channels package you're running using a pip freeze from the Heroku environment? I just tried this again locally with multichat and verified that it only ever opens a single database connection.

@LeonidIvanov

LeonidIvanov commented Mar 1, 2018

aioredis==1.0.0
asgiref==2.1.6
async-timeout==2.0.0
attrs==17.4.0
autobahn==17.10.1
Automat==0.6.0
channels==2.0.2
channels-redis==2.1.0
constantly==15.1.0
daphne==2.0.4
dj-database-url==0.4.2
Django==2.0.2
hiredis==0.2.0
hyperlink==18.0.0
idna==2.6
incremental==17.5.0
msgpack==0.5.6
psycopg2==2.7.4
pytz==2018.3
six==1.11.0
Twisted==17.9.0
txaio==2.8.2
whitenoise==3.3.1
zope.interface==4.4.3

Looks like the packages on Heroku are exactly the same as in my local env... And yes, the problem only appears on Heroku; it works fine locally...

@andrewgodwin
Member Author

Unless I can replicate it locally I'm afraid I can't fix it, sorry - I'm not sure what's different about the Heroku environment that would cause this.

@kennethreitz

Hmmmm

@burcaw

burcaw commented Mar 8, 2018

Same issue, Django + PostgreSQL + Daphne + Heroku.

Originally had this issue with the consumers, but that was fixed with the prior release.

On any DB lookup within the Django app, Postgres connections are left idle and never closed. This occurs on any DB lookup, not just the narrow area where I use Channels.

Relevant versions:
asgiref==2.2.0
channels==2.0.2
channels-redis==2.1.0
daphne==2.1.0
dj-database-url==0.5.0
Django==2.0
psycopg2==2.7.4

One additional item: I verified that these idle connections don't close even after the conn_max_age of 500 seconds has elapsed.

db_from_env = dj_database_url.config(conn_max_age=500)
DATABASES['default'].update(db_from_env)

UPDATE: Perhaps this article provides a clue: https://devcenter.heroku.com/articles/forked-pg-connections

@kennethreitz

what happens if you drop conn_max_age from the dj_database_url call?

@seanwiseman

I was experiencing the exact same problem @burcaw described. I can confirm @kennethreitz, dropping conn_max_age from the dj_database_url.config call has resolved the issue for me. No more idle db connections are left open 👍.
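For reference, the change being discussed is a one-line difference in settings.py. A sketch, assuming dj-database-url (whose conn_max_age parameter defaults to 0):

```python
# settings.py (sketch)
import dj_database_url

# Before: persistent connections (up to 500s), which leak under
# Channels' worker threads:
# DATABASES['default'].update(dj_database_url.config(conn_max_age=500))

# After: omit conn_max_age, falling back to the default of 0 so the
# connection is closed at the end of each request/call:
DATABASES['default'].update(dj_database_url.config())
```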

@kennethreitz

✨🍰✨

@bsr3000

bsr3000 commented Apr 25, 2018

aioredis==1.0.0
asgi-redis==1.3.0
asgiref==2.3.0
async-timeout==2.0.1
attrs==17.4.0
autobahn==18.4.1
Automat==0.6.0
channels==2.1.1
channels-redis==2.1.1
daphne==2.1.1
Django==1.11
psycopg2==2.7.4
six==1.11.0
Twisted==17.9.0
txaio==2.9.0
txredisapi==1.4.4
ujson==1.35
uWSGI==2.0.15
zope.interface==4.4.3

Hello, I have encountered the same issue as @burcaw, with Nginx + uWSGI (WSGI layer) + Daphne (ASGI layer only). I have already updated all packages to the latest versions, but even after I dropped CONN_MAX_AGE, it creates and keeps idle connections to the database without terminating them, so when a few clients connect to websockets, Postgres reaches the max connection limit.

Is there any workaround for this issue?

Thanks in advance

@andrewgodwin
Member Author

If you are still having the issue, it means you are using code that talks to a database outside of database_sync_to_async. Make sure all your database access is wrapped in it (consumer methods on sync consumers already are).

@bsr3000

bsr3000 commented Apr 26, 2018

Thanks for your reply. I have double-checked my code and didn't find any database access in methods without the database_sync_to_async decorator.

So the problem was in my server configuration and the ASGI_THREADS env var:

  1. I didn't set the environment variable in the Docker image.
  2. I had 10 instances of Daphne with max workers set to None. If I understand correctly, each Daphne had its own thread pool with the default max_workers of (os.cpu_count() or 1) * 5, and the test server's os.cpu_count() was 16, so the max worker count for each Daphne server was 80. That's why I had so many connections to the database.
  3. Setting ASGI_THREADS and "CONN_MAX_AGE": 60 works fine now; each worker reuses its database connections and cleans everything up. With my current settings of ASGI_THREADS=20 and 4 Daphne instances on different ports, the number of database connections doesn't exceed 45 while load testing with artillery.io.

P.S. Maybe it would be useful to add some info about the ASGI_THREADS var to the documentation.
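The arithmetic in point 2 can be sketched like this (a back-of-the-envelope check, not Daphne's actual code; it assumes the default ThreadPoolExecutor sizing of (os.cpu_count() or 1) * 5):

```python
import os


def default_asgi_threads(cpu_count=None):
    # Default max_workers when ASGI_THREADS is unset:
    # (os.cpu_count() or 1) * 5 threads per process.
    cpus = cpu_count if cpu_count is not None else (os.cpu_count() or 1)
    return cpus * 5


def worst_case_db_connections(instances, threads_per_instance):
    # Worst case: every worker thread holds one idle connection.
    return instances * threads_per_instance


# The setup described above: 16 CPUs -> 80 threads per Daphne
# instance, so 10 instances could open up to 800 connections.
print(default_asgi_threads(cpu_count=16))   # 80
print(worst_case_db_connections(10, 80))    # 800
# The later settings: ASGI_THREADS=20 with 4 instances caps it at 80.
print(worst_case_db_connections(4, 20))     # 80
```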

Also, one more question: while load testing with an external tool, I sometimes get

ERROR    Exception in callback AsyncioSelectorReactor.callLater.<locals>.run() at /venv/lib/python3.5/site-packages/twisted/internet/asyncioreactor.py:287
handle: <TimerHandle when=1713807.003341125 AsyncioSelectorReactor.callLater.<locals>.run() at /venv/lib/python3.5/site-packages/twisted/internet/asyncioreactor.py:287>
Traceback (most recent call last):
  File "/usr/lib/python3.5/asyncio/events.py", line 126, in _run
    self._callback(*self._args)
  File "/venv/lib/python3.5/site-packages/twisted/internet/asyncioreactor.py", line 290, in run
    f(*args, **kwargs)
  File "/venv/lib/python3.5/site-packages/twisted/internet/tcp.py", line 289, in connectionLost
    protocol.connectionLost(reason)
  File "/venv/lib/python3.5/site-packages/autobahn/twisted/websocket.py", line 128, in connectionLost
    self._connectionLost(reason)
  File "/venv/lib/python3.5/site-packages/autobahn/websocket/protocol.py", line 2467, in _connectionLost
    WebSocketProtocol._connectionLost(self, reason)
  File "/venv/lib/python3.5/site-packages/autobahn/websocket/protocol.py", line 1096, in _connectionLost
    self._onClose(self.wasClean, WebSocketProtocol.CLOSE_STATUS_CODE_ABNORMAL_CLOSE, "connection was closed uncleanly (%s)" % self.wasNotCleanReason)
  File "/venv/lib/python3.5/site-packages/autobahn/twisted/websocket.py", line 171, in _onClose
    self.onClose(wasClean, code, reason)
  File "/venv/lib/python3.5/site-packages/daphne/ws_protocol.py", line 146, in onClose
    self.application_queue.put_nowait({
AttributeError: 'WebSocketProtocol' object has no attribute 'application_queue'

Is this a Channels issue, or just a problem with the testing tool not closing its WebSocket connections properly?

@SHxKM

SHxKM commented Apr 29, 2019

For me, the only thing that worked was setting ASGI_THREADS (as an env var) to a very low number (6). Daphne never closes these connections, by the way. I'm not sure if this is the expected behaviour, @andrewgodwin?
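For anyone trying the same thing: ASGI_THREADS is read from the environment by asgiref's thread pool, so it just needs to be set where Daphne starts. A sketch (the project module name is illustrative):

```shell
# Cap the sync thread pool at 6 threads before starting Daphne
export ASGI_THREADS=6
daphne myproject.asgi:application --bind 0.0.0.0 --port 8000
```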

@andrewgodwin
Member Author

Daphne and Channels both don't have enough visibility into the threadpool system to close any connections in there, sadly. This is something that has to be remedied in Django itself, which I am in the process of doing.

@vkolova

vkolova commented May 9, 2019

I removed conn_max_age, tried close_old_connections, and also set ASGI_THREADS to an even lower number (3), and the issue is still here for me, for both REST requests and websockets.

channels-redis==2.3.3
channels==2.1.7
dj-database-url==0.5.0
Django==2.1.7

@LgArcida

LgArcida commented Jun 4, 2019

Any update on this issue? @andrewgodwin

@carltongibson
Member

@LgArcida It's a big project. The first steps are made here: https://github.com/django/deps/blob/master/draft/0009-async.rst

@joshmarlow

For those who keep finding this issue, here's what I did:

I then was able to send 1,000 messages through my channels code on Heroku and it all worked fine (previously I would have issues after 20-30 messages).

@SHxKM

SHxKM commented Sep 23, 2019

I have to say I'm still a little confused by this issue, even though setting ASGI_THREADS to a low number has "solved" the issue for me. I'm using a synchronous consumer.

Where am I supposed to use close_old_connections()? Here's my workflow with channels:

  1. User connects, starts a long-running celery task
  2. Celery task writes to DB, possibly thousands of times, calls Channels consumer each time with details of (updated) message to be sent to client
  3. Consumer sends a JSON string to the client each time a message is sent by Celery task, update is displayed to the user

Where does close_old_connections() go here?

@Paulcappaert

I was having this problem with Channels and Heroku Postgres, but it seems to have resolved itself when I changed my runtime from 3.6 to 3.8.1. I don't need to use close_old_connections().

@carltongibson
Member

@Paulcappaert I don’t suppose we have any idea what changed there, do we?

@Paulcappaert

@carltongibson No idea, all I know is that I made that single change and the problem went away.

@carltongibson
Member

Hmmm. Yes... 🙂 Thanks for the follow-up.

(asyncio is changing rapidly, still, but 🤷‍♂️)

@sagardspeed2

@carltongibson No idea, all I know is that I made that single change and the problem went away.

Which change did you perform so that the problem went away?

@Paulcappaert

@carltongibson No idea, all I know is that I made that single change and the problem went away.

Which change did you perform so that the problem went away?

I updated the runtime to python 3.8.1

@sagardspeed2

@carltongibson No idea, all I know is that I made that single change and the problem went away.

Which change did you perform so that the problem went away?

I updated the runtime to python 3.8.1

OK, thanks. I also updated the runtime to Python 3.8.2 and it's done.

@Hardeepsingh980

I was having this problem with channels and heroku postgres but it seems to have resolved itself when I changed my runtime from 3.6 to 3.8.1. I don't need to use close_old_connections()

You know what? I was struggling with this stupid error for the last two days. I am glad you fixed it. Thanks a lot, I am really happy.

@SHxKM

SHxKM commented Apr 12, 2020

I'm happy to see @Paulcappaert's suggestion to switch runtimes helped some here; unfortunately, for some of us that's impossible, as we may be using other libraries that don't support 3.8 yet. I for one rely on Celery for background processing and can't really update.

@andrewgodwin
Member Author

For what it's worth, merely getting off of 3.6 may be enough - asyncio was a little underbaked in 3.6, and is much better in 3.7, and so most issues just need 3.7, not 3.8.

@SHxKM

SHxKM commented Apr 13, 2020

For what it's worth, merely getting off of 3.6 may be enough - asyncio was a little underbaked in 3.6, and is much better in 3.7, and so most issues just need 3.7, not 3.8.

OK, just a heads up that in my experience, that's not true. I was (and still am) using Channels 1, and when I upgraded to Python 3.7.2 to use a Celery feature, I also had to upgrade to Channels 2 (all sorts of dependency clashes). That's when I encountered this issue again, and I reverted my whole setup so Channels 2 didn't kill my server. I'm hoping to give this upgrade another chance soon, after multiple failed attempts.

I'm hosted at Heroku; YMMV.

@andrewgodwin
Member Author

Channels 2 versus Channels 1 is a very different conversation given that they use completely different asynchronous styles under the hood.

Generally, this issue is caused by Django not identifying the connections with threads or coroutines that are already-dead. For Django itself this was fixed in the latest asgiref release, but Channels does not use the Local implementation in there so it doesn't get the same fixes. I also feel like Heroku does something to the Python environment based on a whole slew of Heroku-specific bugs that I used to get, but I have no time to verify that, unfortunately.

@SHxKM

SHxKM commented Apr 13, 2020

Channels 2 versus Channels 1 is a very different conversation given that they use completely different asynchronous styles under the hood.

Generally, this issue is caused by Django not identifying the connections with threads or coroutines that are already-dead. For Django itself this was fixed in the latest asgiref release, but Channels does not use the Local implementation in there so it doesn't get the same fixes. I also feel like Heroku does something to the Python environment based on a whole slew of Heroku-specific bugs that I used to get, but I have no time to verify that, unfortunately.

Let me clarify: maybe I shouldn't have mentioned Channels 1 at all. What I was talking about is Channels 2 on Python 3.7.2; that's why I had to revert back to Channels 1, which is working perfectly fine.

@phamhm

phamhm commented Apr 23, 2020

I'm using Python 3.7.6 on Heroku. My main server runs fine with gunicorn. I observed this issue when I ran Daphne on my dev server.

On my dev server, I incorporated Channels 2 + WebSocket and Daphne into my app and saw this issue whenever I refreshed my page. I changed my app to use AJAX instead and still saw this issue. I only have one view function, which returns a JsonResponse for the AJAX call. That AJAX view function was decorated with login_required and performs a database update. So I did the following:

Using AJAX + daphne:

  1. Comment out the database update in my AJAX function => same issue with multiple AJAX calls.
  2. Comment out the login_required decorator on my AJAX function => no fatal on multiple AJAX calls.

Using websocket(Channels2) + daphne

  1. 20+ WebSocket calls with no database update in the WebSocketConsumer => no fatal.
  2. 20+ WebSocket calls with a database update in the WebSocketConsumer => fatal.
  3. Refresh the login_required view 20+ times => fatal.

I looked at the Heroku logs and found that in all instances, the fatal error occurred when Django was doing session validation. I don't observe this behavior on my main server when I refresh the login_required view 20+ times. My main server uses gunicorn.

File "/app/.heroku/python/lib/python3.7/site-packages/django/contrib/auth/__init__.py", line 58, in _get_user_session_key
    return get_user_model()._meta.pk.to_python(request.session[SESSION_KEY])
File "/app/.heroku/python/lib/python3.7/site-packages/django/contrib/sessions/backends/base.py", line 64, in __getitem__
    return self._session[key]

I believe that Django session management is causing this issue when I use Daphne. When either a Django view or a WebSocket consumer requires validation, a connection is opened to the database but it never closes. This behavior is not apparent on the local machine because we don't have a limit on database connections there.

@carltongibson
Member

Hi. Are you able to reproduce on a laptop, or only on Heroku? (That still seems to be the issue...)

@phamhm

phamhm commented Apr 23, 2020

Hi. Are you able to reproduce on a laptop, or only on Heroku? (That still seems to be the issue...)

I only observe this behavior on Heroku. I don't know how to limit database connections on my laptop.

@Hardeepsingh980

Hi. Are you able to reproduce on a laptop, or only on Heroku? (That still seems to be the issue...)

I only observe this behavior on heroku. I don't know how to limit database connection on my laptop.

Hey, I fixed this by changing my Heroku runtime to Python 3.8, and it resolved the issue. So it is worth trying.

@phamhm

phamhm commented May 4, 2020

Hi. Are you able to reproduce on a laptop, or only on Heroku? (That still seems to be the issue...)

I only observe this behavior on heroku. I don't know how to limit database connection on my laptop.

Hey i fixed this by changing my heroku runtime to python 3.8. This fixed the issue. So, it is worth trying.

I switched to 3.8 and it has fixed my issue. That was very helpful, thank you.

@dctalbot

So glad I found this thread. Just chiming in here - the following manifested this behavior:

Heroku Postgres, Heroku Redis
Python 3.7.7
Django 3.0.3

channels==2.4.0
channels-redis==2.4.2
daphne==2.5.0

Dropping conn_max_age from the dj_database_url call solves my immediate issue, but this just defaults it to 0 such that connections are never reused.

I haven't tried upgrading Python yet, but I should do that and report back.

EchNet added a commit to EchNet/d20hex that referenced this issue Jul 22, 2020