InterfaceError and TransactionManagementError (PostgreSQL)

Greetings dear community.

Calling for your help here.
What we are experiencing a lot is “InterfaceError: connection already closed” and “TransactionManagementError”. They are not necessarily accompany each other, but both are consistently happening.
PG logs are not showing anything unexpected or suspicious.
Affected endpoints are different, and I see no common behaviour in them. For example, it sometimes fails even in the admin interface during GET requests.
It does not happen locally.

Our setup pretty standard:

  • Python 3.11, Django 4.1. PostgreSQL 14.5.
  • We are hosted on Heroku, waitress works as a webserver (6 threads), 2-5 servers (dynos) could be running.
  • On top of that, we run 1-2 dramatiq workers (1 process, 4 threads each).
  • Database statement timeout is set to 10000 (10 seconds).
  • And we are using persistent database connections (we tried setting DATABASE_CONN_MAX_AGE between 10 and 600 seconds).
  • We are not using PGBouncer.
  • There are no issues with reaching connection limits, we have plenty of space there.

I am currently out of ideas how to debug this, and would love your advices or any help.

1 Like

Here, I am going to share example tracebacks captured by Sentry:

  1. InterfaceError

The only custom code:

  1. TransactionManagementError:

Django third-party libraries are rarely guaranteed to be thread safe. Depending upon what Django-related packages you have installed, one of them may be the source of the problem.

I always suggest running multiple separate processes for your Django instances with single threads per process.

Also, Python 3.11 compatibility isn’t supported in 4.1 before 4.1.3 - make sure your Django is up-to-date.

Multithreading wouldn’t explain dropped connections, thought, right? It would only explain obsolete open connections. However, those wouldn’t be used in a running thread.

Thanks for the answer!

I did skim through our packages and there is nothing that could mess up with database connections.
Also, as @codingjoe mentioned that symptoms would be different - I do not see any burst of obsolete connections, but rather ones I am using are being randomly dropped.

Regarding Python and Django versions - we are running current latest Django (4.1.3) and this issue was there on Python 3.10.6 / Django 4.0.7 as well.

That’s not the symptoms that we have seen in that case. The connections can be opened / closed irregularly based upon load and activity.

I don’t have an explanation, merely the observation that our connection-related issues went away (along with a number of other occasional “odd” errors) when we switched to a “process-only” deployment.

Okay, my latest findings show that Django opens up a connection and almost instantly closes it.
This happens in this part:

    def close_if_unusable_or_obsolete(self):
        Close the current connection if unrecoverable errors have occurred
        or if it outlived its maximum age.
        if self.connection is not None:
            self.health_check_done = False
            # If the application didn't restore the original autocommit setting,
            # don't take chances, drop the connection.
            if self.get_autocommit() != self.settings_dict["AUTOCOMMIT"]:

I guess, it might be due to atomic transactions (the only thing I know that will turn off autocommit mode), but I am not sure how and why. As in case of GET request to admin object detail page, it does not make any sense…