Seemingly random PostgreSQL connection drops while handling Django requests

My first reaction on seeing this error is to ask whether you’re running gunicorn with more than one thread per worker process, or whether you’re spawning your own threads within your code.

If you’re running gunicorn with the --threads parameter, I’d try it without.

If you’re spawning off your own threads within your process, well, you really don’t want to do that.

If neither of these is true, then we need to keep looking.

I’m running gunicorn without --threads. The default value is 1 on Heroku (some gunicorn defaults are different on Heroku). I’m also not spawning any threads within my code.

As far as I know, psycopg2’s InterfaceError is only meant to be raised when there’s an issue on the client side. Still, I keep wondering whether the problem could be on the server side. The only reason I’m considering that is that we’ve had a slight increase in load and we haven’t changed any database-related code. I’m running the weakest Postgres instance on Heroku (which exposes no specs or logs…).

As a temporary hack, I’m considering adding some sort of middleware that would catch this exception and redirect the request back to itself to retry. That should (hopefully) be fine, as this issue has never happened on two requests in a row.
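A framework-agnostic sketch of that retry idea (the `InterfaceError` class here is a stand-in for `django.db.InterfaceError` so the snippet runs standalone, and the request/handler shapes are made up for illustration; in a real project this would be a Django middleware):

```python
class InterfaceError(Exception):
    """Stand-in for django.db.InterfaceError so the sketch runs standalone."""


def retry_once_on_interface_error(get_response):
    """Wrap a request handler; retry the request once on InterfaceError.

    Retrying is only safe for idempotent requests, so anything other
    than GET is re-raised instead of retried.
    """
    def handler(request):
        try:
            return get_response(request)
        except InterfaceError:
            if request.get("method") != "GET":
                raise
            # The stale connection is discarded after the first failure,
            # so the second attempt gets a fresh one.
            return get_response(request)
    return handler


# Simulate a view whose first call hits a dropped connection.
attempts = []

def flaky_view(request):
    attempts.append(1)
    if len(attempts) == 1:
        raise InterfaceError("connection already closed")
    return "200 OK"

view = retry_once_on_interface_error(flaky_view)
```

This papers over the symptom rather than fixing the cause, which is why it only makes sense as a stopgap.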

I’d also give it a shot with CONN_MAX_AGE=0 instead of None.
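In settings.py that would look something like this (a sketch; only the relevant keys are shown, and the database name is a placeholder):

```python
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "mydb",  # placeholder
        # 0 closes the connection at the end of every request instead of
        # keeping a persistent connection around that may have gone stale.
        "CONN_MAX_AGE": 0,
    }
}
```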

Just tried it an hour ago, still getting the same error.

Do you have any middleware / third-party software that attempts to manage connection pools?

Side note: to answer your previous question - personally, I have never seen this caused by the PostgreSQL side; it has always been related to something happening within Django. That doesn’t mean it can’t be caused from the other end, I’ve just never seen it happen.

I don’t have any connection management middleware.

For the record, my middleware config is like this:

MIDDLEWARE = [
    "django.middleware.security.SecurityMiddleware",
    "whitenoise.middleware.WhiteNoiseMiddleware",
    "django.middleware.gzip.GZipMiddleware",
    "django.contrib.sessions.middleware.SessionMiddleware",
    "django.middleware.locale.LocaleMiddleware",
    "django.middleware.common.CommonMiddleware",
    "django.middleware.csrf.CsrfViewMiddleware",
    "django.contrib.auth.middleware.AuthenticationMiddleware",
    "django.contrib.auth.middleware.RemoteUserMiddleware",
    "django.contrib.messages.middleware.MessageMiddleware",
    "django.middleware.clickjacking.XFrameOptionsMiddleware",
    "tolink.middleware.HostRedirectionMiddleware",
    "tolink.middleware.LanguageMiddleware",
    "tolink.middleware.OnboardingRedirectMiddleware",
    "htmlmin.middleware.HtmlMinifyMiddleware",
    "htmlmin.middleware.MarkRequestMiddleware",
    "debug_toolbar.middleware.DebugToolbarMiddleware",
]

Is it possible to print/log a stack trace whenever the connection close method is called? If yes, then I could find out what’s causing the connection drop.

You may be able to monkey-patch django.db.backends.postgresql.base.DatabaseWrapper to inject a close() method that does this; alternatively, you could directly edit the code in your virtual environment, or copy the existing engine, patch your copy, and configure Django to use that instead.

But I’ve never tried this and have no idea what sort of issues you may encounter when doing so.
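A pure-Python sketch of the monkey-patch idea (the helper name is made up; in a real project you would apply it to django.db.backends.postgresql.base.DatabaseWrapper’s close() at startup, e.g. in an AppConfig.ready()):

```python
import functools
import traceback


def log_stack_on_call(cls, method_name):
    """Replace cls.method_name with a wrapper that prints the call stack."""
    original = getattr(cls, method_name)

    @functools.wraps(original)
    def wrapper(self, *args, **kwargs):
        print(f"{cls.__name__}.{method_name}() called from:")
        traceback.print_stack()
        return original(self, *args, **kwargs)

    setattr(cls, method_name, wrapper)


# Demo on a toy class; with Django you'd patch DatabaseWrapper instead.
class FakeConnection:
    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


log_stack_on_call(FakeConnection, "close")
conn = FakeConnection()
conn.close()  # prints the stack that led to close()
```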

I’ve started getting different errors related to the DB as well. I’m guessing it’s related to the other errors I’ve been getting.

TransactionManagementError
An error occurred in the current transaction. You can't execute queries until the end of the 'atomic' block.

with a stack trace like this:

KeyError: 'card_customization'
  File "django/db/models/fields/related_descriptors.py", line 444, in __get__
    rel_obj = self.related.get_cached_value(instance)
  File "django/db/models/fields/mixins.py", line 15, in get_cached_value
    return instance._state.fields_cache[cache_name]

TransactionManagementError: An error occurred in the current transaction. You can't execute queries until the end of the 'atomic' block.
  File "django/core/handlers/exception.py", line 56, in inner
    response = get_response(request)
  File "django/core/handlers/base.py", line 197, in _get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
  File "contextlib.py", line 79, in inner
    return func(*args, **kwds)
  File "tolink/views/purchase.py", line 138, in cart
    card_customization = order.products.first().card_customization
  File "django/db/models/fields/related_descriptors.py", line 452, in __get__
    rel_obj = self.get_queryset(instance=instance).get(**filter_args)
  File "django/db/models/query.py", line 646, in get
    num = len(clone)
  File "django/db/models/query.py", line 376, in __len__
    self._fetch_all()
  File "django/db/models/query.py", line 1867, in _fetch_all
    self._result_cache = list(self._iterable_class(self))
  File "django/db/models/query.py", line 87, in __iter__
    results = compiler.execute_sql(
  File "django/db/models/sql/compiler.py", line 1398, in execute_sql
    cursor.execute(sql, params)
  File "django/db/backends/utils.py", line 67, in execute
    return self._execute_with_wrappers(
  File "django/db/backends/utils.py", line 80, in _execute_with_wrappers
    return executor(sql, params, many, context)
  File "django/db/backends/utils.py", line 83, in _execute
    self.db.validate_no_broken_transaction()
  File "django/db/backends/base/base.py", line 520, in validate_no_broken_transaction
    raise TransactionManagementError(

I’ve also been getting TransactionManagementError on its own, without any key errors (unlike the stack trace above). Is it possible that the database is out of sync with the migrations? Or is some middleware performing failing queries inside a try/except block, leaving the transaction marked as broken?

However, the middleware theory doesn’t really hold up: if middleware were the cause, I would always get the error on the first query within my views, which is not the case - I sometimes get this error on the 3rd or 4th query.

Another stack trace that might give more clues:

KeyError: 'REMOTE_USER'
  File "django/contrib/auth/middleware.py", line 59, in process_request
    username = request.META[self.header]

AttributeError: 'SessionStore' object has no attribute '_session_cache'
  File "django/contrib/sessions/backends/base.py", line 187, in _get_session
    return self._session_cache

InterfaceError: cursor already closed
  File "django/db/backends/utils.py", line 89, in _execute
    return self.cursor.execute(sql, params)

InterfaceError: cursor already closed
  File "django/core/handlers/exception.py", line 56, in inner
    response = get_response(request)
  File "django/utils/deprecation.py", line 135, in __call__
    response = self.process_request(request)
  File "django/contrib/auth/middleware.py", line 64, in process_request
    if self.force_logout_if_no_header and request.user.is_authenticated:
  File "django/utils/functional.py", line 266, in inner
    self._setup()
  File "django/utils/functional.py", line 419, in _setup
    self._wrapped = self._setupfunc()
  File "django/contrib/auth/middleware.py", line 25, in <lambda>
    request.user = SimpleLazyObject(lambda: get_user(request))
  File "django/contrib/auth/middleware.py", line 11, in get_user
    request._cached_user = auth.get_user(request)
  File "django/contrib/auth/__init__.py", line 191, in get_user
    user_id = _get_user_session_key(request)
  File "django/contrib/auth/__init__.py", line 60, in _get_user_session_key
    return get_user_model()._meta.pk.to_python(request.session[SESSION_KEY])
  File "django/contrib/sessions/backends/base.py", line 53, in __getitem__
    return self._session[key]
  File "django/contrib/sessions/backends/base.py", line 192, in _get_session
    self._session_cache = self.load()
  File "django/contrib/sessions/backends/db.py", line 42, in load
    s = self._get_session_from_db()
  File "django/contrib/sessions/backends/db.py", line 32, in _get_session_from_db
    return self.model.objects.get(
  File "django/db/models/manager.py", line 85, in manager_method
    return getattr(self.get_queryset(), name)(*args, **kwargs)
  File "django/db/models/query.py", line 646, in get
    num = len(clone)
  File "django/db/models/query.py", line 376, in __len__
    self._fetch_all()
  File "django/db/models/query.py", line 1867, in _fetch_all
    self._result_cache = list(self._iterable_class(self))
  File "django/db/models/query.py", line 87, in __iter__
    results = compiler.execute_sql(
  File "django/db/models/sql/compiler.py", line 1398, in execute_sql
    cursor.execute(sql, params)
  File "django/db/backends/utils.py", line 67, in execute
    return self._execute_with_wrappers(
  File "django/db/backends/utils.py", line 80, in _execute_with_wrappers
    return executor(sql, params, many, context)
  File "django/db/backends/utils.py", line 84, in _execute
    with self.db.wrap_database_errors:
  File "django/db/utils.py", line 91, in __exit__
    raise dj_exc_value.with_traceback(traceback) from exc_value
  File "django/db/backends/utils.py", line 89, in _execute
    return self.cursor.execute(sql, params)

Seems like someone else had the same issue with an identical setup (with no solution): https://groups.google.com/g/django-users/c/oYeh9KPxlLc
InterfaceError and TransactionManagementError (PostgreSQL)

Except that he is pointing out that he’s running with multiple threads per worker process - which you have ruled out previously.

All of the above tracebacks lead me to believe that the root issue is something outside the sequence of events in that view - which is what got us to the point of ruling out threads in our runtime.

Any luck with this? We’re encountering the same two errors just with vanilla Django model admin queries. Can’t figure out what it is.

The setup is:

  • Django 4.1.9
  • gunicorn 20.1.0
  • psycopg2 2.9.6
  • Postgres 14.6

The errors we encounter randomly, but frequently, are:

  • InterfaceError connection already closed
  • TransactionManagementError An error occurred in the current transaction. You can’t execute queries until the end of the ‘atomic’ block.

I managed to fix the problem with a hacky solution. I never understood the root cause of the connection drops, as they seemed to occur outside of Django (and definitely not within our code).

I pasted the relevant code for the fix below. Unfortunately, I don’t really remember how it works off the top of my head, and I am terrible at keeping records (or comments).

I can look into this further when I have more time. Let me know if this doesn’t make sense or doesn’t work.

settings.py

DATABASES["default"]["ENGINE"] = "project_name.db"

project_name/db/base.py

import time
import threading
import traceback

from django.db.backends.postgresql.base import (
    DatabaseWrapper as PostgreSQLDatabaseWrapper,
)


class DatabaseWrapper(PostgreSQLDatabaseWrapper):
    """Workarounds to debug database "connection already closed" errors."""

    def close_if_unusable_or_obsolete(self, close=False):
        """Only consider closing when explicitly asked to (close=True).

        Django calls this method with no arguments from its own
        close_old_connections() handler, so those calls become no-ops;
        the request_started handler in signal_handlers.py is what
        passes close=True.
        """
        if close:
            if self.connection is not None:
                self.health_check_done = False
                # If the application didn't restore the original autocommit
                # setting, don't take chances: log the stack and the live
                # threads, then drop the connection.
                if self.get_autocommit() != self.settings_dict["AUTOCOMMIT"]:
                    traceback.print_stack()
                    for thread in threading.enumerate():
                        print(f"Thread after connection: {thread.name}")
                    self.close()
                    return

                # If an exception other than DataError or IntegrityError occurred
                # since the last commit / rollback, check if the connection works.
                if self.errors_occurred:
                    if self.is_usable():
                        self.errors_occurred = False
                        self.health_check_done = True
                    else:
                        self.close()
                        return

                if self.close_at is not None and time.monotonic() >= self.close_at:
                    self.close()
                    return

signal_handlers.py

import logging

from django.core.signals import request_started
# Use the shared handler; a fresh ConnectionHandler() instance would have
# its own empty per-thread storage and see no initialized connections.
from django.db import connections
from django.dispatch import receiver

logger = logging.getLogger(__name__)

@receiver(request_started)
def request_started_callback(sender, **kwargs):
    logger.debug("Request started")
    for conn in connections.all(initialized_only=True):
        logger.debug("Closing if unusable or obsolete")
        conn.close_if_unusable_or_obsolete(close=True)

I’m hesitant to implement this as it does feel “hacky”. This seems to be an underlying issue in one of the dependencies, yet I’m not getting it on our other Django installations. I’ve even matched the versions of Django, gunicorn, and psycopg2 to the ones in problem-free Django apps we have in the wild, and it still occurs, which is odd. It’s a showstopper for sure, as it’s very frequent: hit a Django admin list view, boom, 500 InterfaceError connection closed. Refresh, no problem. It’s hit or miss.

Hi!
Exact same issues here, and some other related:

  • InterfaceError: connection already closed
  • TransactionManagementError An error occurred in the current transaction. You can't execute queries until the end of the 'atomic' block.
  • InterfaceError: cursor already closed
  • 'NoneType' object has no attribute 'cursor'

My setup:
- Django 4.1.1
- Gunicorn 20.1.0 (and then 21.2.0)
- psycopg2-binary 2.9.3
- Postgres 14.9 ? (managed)

The two non-Django-native middlewares that I use among those mentioned by @lukas-slezevicius are whitenoise (6.5.0) and debug_toolbar (3.6.0).

I am also using a managed Postgres database instead of hosting it locally. I suspected the service provider to be the cause, but looking at this thread I’m not really sure anymore.

Since we all have the same gunicorn version, I tried upgrading it, but I still get the same issues.

After experiencing this issue on several projects, we were able to pin it down to WhiteNoise, specifically something that changed between 5.2.0 and 6.6.0.

We’re going to triage further and see if we can narrow it down to the specific version, and maybe work out some of the why (I’m guessing middleware changes, but that’s just speculation). Not sure when we’ll have time to do that, so I’m noting it here for others who stumble on this issue.

Hey there!

Thank you for the hint - we pinned the whitenoise version to 5.3.0, and it indeed solved our random connection errors.
So there appears to be a relation; however, it is tricky (in our case) to pin it down to a certain version/commit/functionality.

I also posted this as an issue in whitenoise repo:

(The only thing is that I am struggling to reproduce it outside our production instance.)

Best,
Rust