Final Draft: GSOC 2023 Database Connection Pool

valleyofblackpanther · April 4, 2023, 9:12am

Hi, @carltongibson @apollo13 this is the final draft of the Database Connection Pool.

Any feedback will be appreciated.

Thanks,

Kaushik

rafalp · April 5, 2023, 9:13pm

psycopg requirement to be able to use connection pool in MySQL seems very odd to me.

Also, what if one day this base class changes in a way where its no longer usable by the other backends?

valleyofblackpanther · April 6, 2023, 3:30am

Hi @rafalp. Thanks for taking the time to see my proposal.

Most of the pool code in Psycog is independent of PostgreSQL and Psycopg. And to answer your question in MySQL, there are drivers that implement the Python database API described in PEP 249. One of the drivers that we already use in django is MySQL/Connector Python and moreover the pooling implementation across database drivers has a similar approach, for example, in MySQL/Connector Python, we have a module called mysql.connector.pooling which has a lot of similarities with psycopg (see Connector/Python Connection Pooling). The similar architecture is because of the PEP 249, and Pscycopg is tightly paired with it. So as long as the modules follow PEP 249, there won’t be any nature of obsoleteness in the code we are trying to wrap in psycopg.

Thanks,

Kaushik

rafalp · April 6, 2023, 3:52pm

Don’t other DBAL libraries provide their own connection pool classes that their respective Django backends could use instead of relying on psycopg?

bluetech · April 8, 2023, 10:20am

Hi Kaushik,

Unfortunately I don’t have time to read the full document, I only read a few choice paragraphes, but here are some comments on it.

Django uses transaction pooling

Django does not use transaction pooling. Transaction pooling is a feature of pgbouncer, which the user may use, but Django doesn’t know anything about it.

Django’s transaction middleware is enabled

Django’s ATOMIC_REQUESTS is off by default, and in my experience most users do not enable it.
But I think it is mostly orthogonal to connection pooling. I would omit this sentence.

The goal of persistent connection is to save the cost (in response time) of establishing the database connection, without compromising reliability (see #733)

This is true, although “without compromising reliability” seems redundant to me (is there an approach which does compromise reliability?), so I would omit that part.

But now the issue begins with ASGI, which uses an asynchronous event loop to manage requests, whereas Django’s persistent database connections rely on a single database connection per thread. When using ASGI, multiple requests may be handled by the same thread asynchronously. This means that if one request is waiting for a database query to complete, other requests may be blocked until the query completes, leading to poor performance.

There is some confusion here. It is not accurate to say that “Django’s persistent database connections rely on a single database connection per thread”, when “thread” refers to an OS thread. Django’s persistent connection is stored in an asgiref local, which means that under ASGI/async, the connection is per asyncio Task, not OS thread.

The real problem which makes persistent connections useless in async is that, while in sync Django the same OS thread is used to process multiple requests (a “worker thread” kept in a thread pool, or just one thread in a single-threaded process worker), in async reusing Tasks is a bad practice. Instead, a task is created to handle a single request, then it dies. This renders Django’s persistent connection ineffective, because a new DB connection is created for each request which is what we wanted to avoid.

I’ll also mention a more subtle problem: one of the advantages of async is that it allows you to efficiently and ergonomically use concurrency within a single request, e.g. using a TaskGroup to perform e.g. 2 heavy queries against the DB concurrently such that the latency is max(A, B) instead of A+B. But the way concurrency is achieved in asyncio is for the task to create sub-tasks, and each sub-task needs its own DB connection, which exacerbates the problem.

To use pgbouncer with Django’s persistent database connections under ASGI, we need to configure Django to use a custom database backend that supports pgbouncer

This is incorrect, you don’t need a custom backend to use pgbouncer. pgbouncer is transparent to Django, you only need to change the DB connection string.

The package you link django-db-pool seems to use psycopg2’s pool to implement a per-process DB connection pool, no relation to pgbouncer.

Really, use of pgbouncer fixes the problem, and in fact does so better than whatever Django can come up with (because it’s host-level and not process-level; because it implements features like transaction pooling that Django is never going to implement; etc…).

The only real problem with pgbouncer (and similar solutions for other DBs which I’m not familiar with) is the complexity it adds. I’m in favor of adding DB pool support to Django so that pgbouncer is not required (in fact, I already roll my own using psycopg3’s pool…) for achieving a non-broken setup.

Hope that helps.

Topic		Replies	Views
Database Connection Pool GSoC 2023 Mentorship	1	505	March 30, 2023
Regarding database connection pool in Django GSOC 2023 Mentorship	0	374	March 30, 2023
Possible performance improvements of psycopg3 support in Django 4.2 ORM	14	10782	March 14, 2023
Using psycopg connection pooling ("pool": True) causes too eagerly discarded connections Deployment	0	224	December 31, 2024
Native connection pooling and other questions Getting Started	0	59	November 26, 2024

Final Draft: GSOC 2023 Database Connection Pool

Related topics