OperationalError: sorry, too many clients already

Nope, I haven’t set that up yet. I have been trying everything else first because I believe that using pgbouncer would just mask the underlying cause.

Looking at the Postgres logs, the issue first appeared after I switched from Gunicorn to Daphne, on a day with a lot of traffic.

Now that I’ve gotten rid of Channels, I think the next move is to go back to Gunicorn.

If that doesn’t work either, the last resort would be to set CONN_MAX_AGE = 0 and use pgbouncer.

Looking at my output from pg_stat_activity, does anything jump out at you? I still haven’t figured out how to interpret the values in the query column. As I understand it, for an idle connection it shows the last query that connection ran before going idle?
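For reference, this is the kind of check I’ve been running. It’s a minimal sketch, run from a Django shell against the same database the app uses; the columns are standard pg_stat_activity fields.

# Sketch: list idle connections to the current database and the last
# statement each one executed. Run from `python manage.py shell`.
from django.db import connection

with connection.cursor() as cursor:
    cursor.execute("""
        SELECT pid, state_change, query
        FROM pg_stat_activity
        WHERE datname = current_database()
          AND state = 'idle'
        ORDER BY state_change
    """)
    for pid, state_change, query in cursor.fetchall():
        print(pid, state_change, query)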

The conversation in the other thread made it clear that you need to set CONN_MAX_AGE = 0. I’m not sure why you think that would just “mask the underlying cause”.

Using pgbouncer will reduce (if not eliminate) the latency of opening a new connection on every request.
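Concretely, the change looks something like the sketch below. It’s not a drop-in config: the host, port, and database names are placeholder assumptions for a local pgbouncer.

# settings.py sketch. HOST/PORT point at a local pgbouncer (assumed to be
# running in transaction pooling mode) rather than at Postgres directly.
# CONN_MAX_AGE = 0 tells Django to close its connection after each request,
# delegating all pooling to pgbouncer. Names here are placeholders.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "mydb",
        "USER": "myuser",
        "HOST": "127.0.0.1",   # pgbouncer address
        "PORT": "6432",        # pgbouncer's default port
        "CONN_MAX_AGE": 0,
        # With transaction pooling, server-side cursors must be disabled:
        "DISABLE_SERVER_SIDE_CURSORS": True,
    }
}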

My understanding is that I need to do that if I’m using Channels, right?

But under “normal” use, i.e. without Channels, I would expect Django to close old connections properly.

Because I eliminated every usage of Channels (allegedly the culprit, since it wasn’t closing its connections) and the issue still happened. From that I assumed that if my app had a connection leak somewhere, using pgbouncer would only delay the issue.

Anyway, I’ve switched back from Daphne to Gunicorn. We’ll see how it goes tomorrow under traffic.

You are still using Daphne. I don’t believe Channels by itself is the root cause; my impression from the other thread is that the problem lies with Daphne.

Well, after one day of running Gunicorn with 5 workers, even with slightly fewer than a hundred simultaneous users sending concurrent requests, the number of connections never surpassed 58.

I guess Daphne really was the cause of this.

To be accurate, it’s using Daphne with CONN_MAX_AGE > 0 that causes this; Daphne with CONN_MAX_AGE = 0 shouldn’t exhibit the behavior. Presumably that’s because Django’s database connections are per-thread, and persistent connections opened in the short-lived threads an ASGI server uses to run ORM code are never reused or closed.


Thanks for this thread; I’m running into the same issue. I’ll try Uvicorn when I get a chance to see if the issue carries over.

I’m facing this in development under runserver while using Daphne with CONN_MAX_AGE = 0. Running Uvicorn exhibits the same “too many clients” condition.

In my application, I use Django to proxy most of my requests to a Node.js server using httpx in an async view. The Node.js application receiving the proxied request then executes GraphQL operations back against Django.
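Roughly, the async proxy looks like the sketch below. It’s simplified, and the upstream URL and header handling are placeholders rather than my actual code.

# Sketch of the async proxy view. NODE_URL is a placeholder; the real
# view forwards more request metadata than this.
import httpx
from django.http import HttpResponse

NODE_URL = "http://localhost:3000"  # hypothetical Node.js upstream

async def proxy(request):
    async with httpx.AsyncClient() as client:
        upstream = await client.request(
            request.method,
            NODE_URL + request.path,
            content=request.body,
        )
    return HttpResponse(
        upstream.content,
        status=upstream.status_code,
        content_type=upstream.headers.get("content-type"),
    )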

I’m wondering if I have some lingering async connections that aren’t being closed properly. Most of the values in the query column of pg_stat_activity are SELECTs of a user coming from the auth middleware.

EDIT: I swapped the async proxy view for a sync one and the connections are no longer building up. I’ll investigate further.
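For comparison, the sync version is just the blocking equivalent (same placeholders as the sketch above):

# Synchronous equivalent of the async proxy sketch above; with this
# version the connection count stays flat. NODE_URL is still a placeholder.
import httpx
from django.http import HttpResponse

NODE_URL = "http://localhost:3000"

def proxy(request):
    with httpx.Client() as client:
        upstream = client.request(
            request.method,
            NODE_URL + request.path,
            content=request.body,
        )
    return HttpResponse(
        upstream.content,
        status=upstream.status_code,
        content_type=upstream.headers.get("content-type"),
    )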


I will have to add some Channels-based functionality to my app, so I’m once again faced with having to go back to an ASGI server.

Has anybody found a workaround for this issue? I was going to use Uvicorn instead of Daphne, but from @marty0678’s response I see that wouldn’t fix it.

I don’t understand how this hasn’t gotten more attention, as it seems to be a pretty big issue for anybody running ASGI.


Unfortunately, I’ve never seen this issue. Granted, I don’t typically see more than 25-30 concurrent connections at any one time, but my environments, which run a mix of uWSGI and Daphne behind nginx, have never had anything like this problem and have been up continuously for months at a time.

I’m also facing this issue. I originally thought it might be exclusive to the dev environment, but it has now appeared in production as well. I had CONN_MAX_AGE set to 60 in production, yet I still see idle connections from 17+ hours ago in the database, most of them showing COMMIT as their last query.
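If anyone needs a stopgap, something like the sketch below can clear out the stale ones (assuming your database role is allowed to terminate its own backends):

# Stopgap sketch: terminate connections to this database that have been
# idle for over an hour. Requires permission to signal those backends
# (e.g. same role, or membership in pg_signal_backend).
from django.db import connection

with connection.cursor() as cursor:
    cursor.execute("""
        SELECT pg_terminate_backend(pid)
        FROM pg_stat_activity
        WHERE datname = current_database()
          AND state = 'idle'
          AND state_change < now() - interval '1 hour'
          AND pid <> pg_backend_pid()
    """)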

I do not have ATOMIC_REQUESTS set to True in dev, but I do use it in prod. My prod environment only has about 5 users so far, with 6 Daphne workers and 3 Celery workers; Postgres’s default of 100 connections should be plenty. I’ve deployed a lot of projects, but this is quite a terrifying outcome since moving to Channels and Daphne.

Other than that, I’m also querying the user model, following the django-channels WebSocket example:

from channels.generic.websocket import WebsocketConsumer

class FooConsumer(WebsocketConsumer):

    def connect(self):
        # scope['user'] is populated by Channels' AuthMiddlewareStack,
        # which queries the user table on each new connection.
        self.user = self.scope['user']
        ...
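For context, the consumer is wired up the standard way. This is a sketch; the project and app module names are placeholders for mine.

# asgi.py sketch of the standard Channels wiring. "myproject" and
# "myapp" are placeholder module names.
import os
from django.core.asgi import get_asgi_application

os.environ.setdefault("DJANGO_SETTINGS_MODULE", "myproject.settings")
django_asgi_app = get_asgi_application()  # initialize Django before app imports

from channels.auth import AuthMiddlewareStack
from channels.routing import ProtocolTypeRouter, URLRouter
from django.urls import path
from myapp.consumers import FooConsumer

application = ProtocolTypeRouter({
    "http": django_asgi_app,
    "websocket": AuthMiddlewareStack(
        URLRouter([path("ws/foo/", FooConsumer.as_asgi())])
    ),
})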

Has anyone been able to resolve this issue? Thanks.
