Calling for your help here.
What we are experiencing a lot is “InterfaceError: connection already closed” and “TransactionManagementError”. They are not necessarily accompany each other, but both are consistently happening.
PG logs are not showing anything unexpected or suspicious.
Affected endpoints are different, and I see no common behaviour in them. For example, it sometimes fails even in the admin interface during GET requests.
It does not happen locally.
Our setup pretty standard:
Python 3.11, Django 4.1. PostgreSQL 14.5.
We are hosted on Heroku, waitress works as a webserver (6 threads), 2-5 servers (dynos) could be running.
On top of that, we run 1-2 dramatiq workers (1 process, 4 threads each).
Database statement timeout is set to 10000 (10 seconds).
And we are using persistent database connections (we tried setting DATABASE_CONN_MAX_AGE between 10 and 600 seconds).
We are not using PGBouncer.
There are no issues with reaching connection limits, we have plenty of space there.
I am currently out of ideas how to debug this, and would love your advices or any help.
I did skim through our packages and there is nothing that could mess up with database connections.
Also, as @codingjoe mentioned that symptoms would be different - I do not see any burst of obsolete connections, but rather ones I am using are being randomly dropped.
Regarding Python and Django versions - we are running current latest Django (4.1.3) and this issue was there on Python 3.10.6 / Django 4.0.7 as well.
Okay, my latest findings show that Django opens up a connection and almost instantly closes it.
This happens in this part:
Close the current connection if unrecoverable errors have occurred
or if it outlived its maximum age.
if self.connection is not None:
self.health_check_done = False
# If the application didn't restore the original autocommit setting,
# don't take chances, drop the connection.
if self.get_autocommit() != self.settings_dict["AUTOCOMMIT"]:
I guess, it might be due to atomic transactions (the only thing I know that will turn off autocommit mode), but I am not sure how and why. As in case of GET request to admin object detail page, it does not make any sense…