ASGI application with pgbouncer is suddenly hitting the max_client_conn limit

Hi,

we have multiple projects/setups running Django+Wagtail (5.2/7.0) with gunicorn+uvicorn (latest versions, no threading) as ASGI applications with django-channels. We use pgbouncer, and in Django we use CONN_MAX_AGE = 0. We have been running these setups for about 4 years now and never had any issues with connection limits.
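For illustration, the relevant database settings look roughly like this (host, port, and names here are only placeholders; the point is CONN_MAX_AGE = 0 behind pgbouncer):

    DATABASES = {
        "default": {
            "ENGINE": "django.db.backends.postgresql",
            "NAME": "app",
            "HOST": "127.0.0.1",   # pgbouncer listens here, in front of Postgres
            "PORT": "6432",        # pgbouncer's default port
            "CONN_MAX_AGE": 0,     # close the connection at the end of each request
        }
    }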

In the last few weeks, I assume due to package updates (we've upgraded Django from 4.2 to 5.2), the setups started raising FATAL: no more connections allowed (max_client_conn) (so we're hitting the relatively high limit on pgbouncer's client connections) after bots crawled many unknown URLs concurrently. Such crawling is not unusual and never resulted in connection issues in the past.

One of those setups is very small: the instance has 2 CPUs and Python 3.12; it runs gunicorn with 2 workers, no threading, and 1 AsyncHttpConsumer without auth or database access (which was also never hit by the crawlers), and the rest is a typical Django/Wagtail site with synchronous middleware and views. pgbouncer is configured with max_client_conn = 100.

Even this simple, small setup ran into no more connections allowed (max_client_conn) after being crawled.

Now, I’m trying to understand why this is possible.

My understanding of the ASGI setup was that the threads created by asgiref(?) to handle requests are limited by the default thread pool being used (which, according to the channels docs, could be sized manually with ASGI_THREADS).
So in my understanding, with 2 CPUs the default thread pool should have a size of 6 (min(32, CPU count + 4) on Python 3.8+), and the 2 workers/processes together should therefore hold a maximum of 12 connections, since with CONN_MAX_AGE = 0 the connections are closed after every request.
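To double-check that, something like this can be run on the instance (a quick sketch; _max_workers is a private attribute, but it shows the effective default size):

    import os
    from concurrent.futures import ThreadPoolExecutor

    # Python 3.8+ sizes the default pool as min(32, os.cpu_count() + 4)
    print(os.cpu_count())                     # 2 on this instance
    print(ThreadPoolExecutor()._max_workers)  # 6 with 2 CPUs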

Since the application is hitting the 100-connection limit, I'm probably missing something here, or there is an issue with database connections not being closed.
To me it looks like the request handling is leaking those connections somewhere, or it takes longer to close these connections while other threads are demanding new ones…

We actually wanted to switch to Django's native connection pooling for performance, but as long as I don't understand the current setup correctly, this switch would probably make it worse: with native pooling, and therefore direct connections to the db, we would have to use smaller connection pools (since the db is shared with other projects), which would get exhausted even faster.
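For reference, the native pooling we had in mind would look roughly like this (Django 5.1+ with psycopg 3; the pool sizes below are only example values, not what we would actually use):

    DATABASES = {
        "default": {
            "ENGINE": "django.db.backends.postgresql",
            # ... connection details ...
            "OPTIONS": {
                # passed through to psycopg_pool.ConnectionPool; example sizes only
                "pool": {"min_size": 2, "max_size": 10, "timeout": 10},
            },
        }
    }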

Can you help me?

  • Is my understanding of the ASGI setup wrong? Is this normal behavior, i.e. are the threads used for requests, and therefore the connections they use, actually never limited?
  • Is there a way to determine the maximum number of threads used?
  • Can we limit those threads and therefore the maximum number of connections used?

Hey @th3hamm0r — there’s not really enough to see what’s going on here — but it should work.

You can get the event loop and manually set the default executor like this:

    import asyncio
    from concurrent.futures import ThreadPoolExecutor

    # size the loop's default executor explicitly
    loop = asyncio.get_event_loop()
    loop.set_default_executor(ThreadPoolExecutor(max_workers=4))

(This is what Daphne does at startup.)

Otherwise, yes, there’s something going on, but can’t immediately say what. (Some logging normally goes a long way.)

4.2 → 5.2 — I can’t recall a change in Django that’s relevant here. (:thinking:) But there was also a jump in asgiref version there, so some implicit behaviour may have changed. (Must have somewhere, since you’re seeing a new outcome.)

Sorry that’s not more help, but there’s not a lot to go on.

Hey @carltongibson , thanks, appreciate your input!

I’ve started debugging the thread handling, and I now think I’ve really misunderstood how Django handles threads (or it changed at some point). When concurrent requests to synchronous code are handled, Django actually creates one thread(-pool) per request. The relevant asgiref code sits here.

This also matches the documentation:

If there is a piece of synchronous middleware, then Django must use a thread per request to safely emulate a synchronous environment for it.

Since the majority of the project is a Wagtail site with synchronous views, there seems to be no limit to the number of threads. I really thought there was a shared thread pool somewhere that actually limits the concurrency of synchronous code in Django…
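A simplified sketch of what I believe asgiref is doing (not the real code, names are approximate): Django's ASGIHandler enters one ThreadSensitiveContext per request, and each context gets its own single-thread executor.

    from concurrent.futures import ThreadPoolExecutor

    context_to_executor = {}  # one entry per active request/context

    def executor_for(request_context):
        # every concurrent synchronous request ends up with its own
        # single-thread executor, so there is no shared, bounded pool
        if request_context not in context_to_executor:
            context_to_executor[request_context] = ThreadPoolExecutor(max_workers=1)
        return context_to_executor[request_context]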

So it doesn’t actually look like a bug; it seems like performance has degraded somehow, and we’ve now just finally hit the limit of pgbouncer’s client connections :expressionless:

Especially for database-heavy synchronous sites, and since the available db connections are always limited, it would make sense to be able to limit the number of threads (and their resources) with a thread pool somehow, and keep excess requests queued in the ASGI server in front :thinking:

Hang on…

This is connections to pgbouncer, right? (Docs) Not the pool size.

Can you observe the actual number of DB connections? And is the pool not serving to limit concurrency?

In theory you can raise max_client_conn significantly, but I’d probably want my reverse proxy to shelter my app some. (How hard is this crawler hitting you?)

If you can narrow it down to something specific in Django, I’m very happy to have a look! (Possibly some kind of semaphore on the number of ThreadSensitiveContexts could come into play, but… I’d need to see a proof of concept to say anything sensible.)

Yes, the app is only hitting the client connection limit to pgbouncer (so overall it’s not a big issue).

We don’t have a problem with the db connections themselves, as those are limited by pgbouncer. For example, in one project we have a max_client_conn of 1000 and an actual pool size for the db connections of 20 (default_pool_size). Still, there seem to be request peaks that then hit the max_client_conn.

I just wanted to understand why the Django app is actually hitting the connection limit to pgbouncer, as I expected the synchronous requests to be limited somehow by a thread pool in Django.
But as posted above, I think this just works as expected(?). There is no thread limit and therefore, theoretically, no limit on db connections (to pgbouncer) (?)

I’m not sure if this is necessary anymore, because I think this is just the way Django deals with synchronous views (one thread per request, no limit)(?) :grimacing:

Yes, that’s right, and at normal levels the desired behaviour.

I’m just half-pondering now what a limit here would look like (but also thinking it’s probably better handled at the reverse proxy level)
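As a very rough sketch of the idea (purely illustrative; the middleware name and the limit of 10 are made up), it could be something like an ASGI wrapper with a semaphore:

    import asyncio

    class MaxConcurrencyMiddleware:
        """Illustrative only: cap concurrent HTTP requests per process."""

        def __init__(self, app, limit=10):
            self.app = app
            self.semaphore = asyncio.Semaphore(limit)

        async def __call__(self, scope, receive, send):
            if scope["type"] != "http":
                return await self.app(scope, receive, send)
            async with self.semaphore:
                # excess requests wait here instead of each spawning a thread
                return await self.app(scope, receive, send)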

Until those issues occurred, I had it in the back of my mind that one Django process always handles just one synchronous view at a time and that concurrent requests simply queue up in the ASGI server, but that is obviously not the case (anymore) :sweat_smile:
Also, the channels docs probably falsely reassured me that there was a limit, since they talk about ASGI_THREADS being configurable, which isn’t the case anymore since this change.

I’m also not sure about a limit. On the one hand it is nice that, if you have x threads waiting for a db connection, another new thread that doesn’t need the db can still be handled in the meantime. On the other hand, not having any limit on the number of threads concerns me a bit, as at some point the context switches and the resources allocated by all those threads will start to have an impact, and it’s still Python threading :grin:

I think Django should, as a safe default, use a ThreadPoolExecutor (with Python’s defaults based on the CPU count) for synchronous code instead of creating unlimited threads, but allow the developer to override this. But I’m also quite sure there are good reasons why this hasn’t been implemented yet :sweat_smile:

Probably because everything in here is monstrously hard. :sweat_smile: I’m happy to consider suggestions on the asgiref repo.

I look at this slightly differently… There always have been, and almost certainly always will be, any number of ways whereby it’s pretty trivial to DoS your app. (And that’s not Django specific.)

So normally the approach is to limit connections coming in from the reverse proxy. (Nginx, say, lets you set max connections, and queue size, to an upstream.) This is a much more robust approach, and lets the application focus on what it’s good at. (And the server too.)


I think that, especially with ASGI applications, this isn’t really possible anymore. Django applications can now easily handle many thousands of concurrent async requests/connections and, on top of that, still serve synchronous views (which is a huge win!). But they cannot handle thousands of those synchronous views in parallel.
It actually makes no sense to create thousands of threads for those requests, since it just increases the overhead. I think synchronous tasks/views must be seen as blocking IO, and as with every async framework (asyncio, Node.js), such blocking IO is typically handled in a limited thread pool whose size is configured based on the CPU cores or other factors.
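For comparison, a small sketch of how plain asyncio treats blocking work: it goes through a bounded default executor (min(32, os.cpu_count() + 4) workers), so only a limited number of blocking calls run at once and the rest queue up.

    import asyncio
    import time

    def blocking_io():
        time.sleep(1)  # stands in for a blocking call, e.g. a sync db query

    async def main():
        # asyncio.to_thread() hands the call to the loop's default executor,
        # a bounded ThreadPoolExecutor, so the 100 calls below are processed
        # a handful at a time rather than on 100 threads at once
        await asyncio.gather(*(asyncio.to_thread(blocking_io) for _ in range(100)))

    asyncio.run(main())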

Of course, configuring limits in the reverse proxy is still important, but with asyncio/ASGI the responsibility has shifted a bit towards Django, as only the application really “understands” those requests (sync vs. async).

Unfortunately, as you’ve noted, the responsible code is very hard to understand, and I don’t have a solution at the moment :cry:
At the point where Django handles the synchronous code, a ThreadPoolExecutor should probably be used :sweat_smile: