DB connection closed by uWSGI in the middle of handling a request?!

I’m using an nginx, uWSGI, django, postgresql stack deployed on Ubuntu 20.04. I’m seeing random (in time and code location) errors because the database connection is unexpectedly closed during the handling of a request. I think I’ve figured out why, but it’s so weird I don’t know who’s ‘wrong’. Here’s what I’ve found out…

In my DB settings I have CONN_MAX_AGE=None, so in routine operation connections stay alive forever. There are no errors in any logs (uWSGI, Postgresql, my internal logging) - there was no reported reason why the DB connection was being closed. So I hacked the Django source to print a stack trace in the BaseDatabaseWrapper._close() method, using traceback.print_stack().

This trace makes no sense - here’s an example:

...
File "/deployment/v_env/lib/python3.8/site-packages/django/db/models/sql/compiler.py", line 1208, in execute_sql
  return list(result)
File "/deployment/v_env/lib/python3.8/site-packages/django/db/models/sql/compiler.py", line 1646, in cursor_iter
  for rows in iter((lambda: cursor.fetchmany(itersize)), sentinel):
File "/deployment/v_env/lib/python3.8/site-packages/django/db/models/sql/compiler.py", line 1646, in <lambda>
  for rows in iter((lambda: cursor.fetchmany(itersize)), sentinel):
File "/deployment/v_env/lib/python3.8/site-packages/django/http/response.py", line 292, in close
  signals.request_finished.send(sender=self._handler_class)
File "/deployment/v_env/lib/python3.8/site-packages/django/dispatch/dispatcher.py", line 180, in send
  return [
File "/deployment/v_env/lib/python3.8/site-packages/django/dispatch/dispatcher.py", line 181, in <listcomp>
  (receiver, receiver(signal=self, sender=sender, **named))
File "/deployment/v_env/lib/python3.8/site-packages/django/db/__init__.py", line 38, in close_old_connections
  conn.close_if_unusable_or_obsolete()
...

In the middle of Django building a query, we have a call to HttpResponseBase.close(). This makes no sense - there’s no way that the caller shown in the stack made this call. I have many other examples with different upper call stacks - in all cases the call to close() appears to be injected from nowhere.

Now this method is called by uWSGI from the C runtime…so I’m thinking, is this normal? If you have a separate C thread call Python at some random time, does it simply inject into the call stack in this way? I’ve not been able to find out if that’s the case.

Let’s assume this is what is happening and that it’s actually okay - a big assumption but I’ve looked at the uWSGI source and it certainly takes lots of precautions to acquire the GIL before interacting with Python so it looks okay.

So maybe the HTTP client is closing the connection early, uWSGI closes the response, and we end up with that stack trace I’m seeing. So why is Django closing the DB connection if this is all correct?

Well, it’s doing that because Django’s end-of-request handling will close the DB connection if required or it’s in a mess. A DB transaction was in effect before the HttpResponseBase.close() call hijacked the call stack, and the BaseDatabaseWrapper.close_if_unusable_or_obsolete method has this line:

# If the application didn't restore the original autocommit setting,
# don't take chances, drop the connection.
if self.get_autocommit() != self.settings_dict['AUTOCOMMIT']:
   self.close()

Autocommit is set to False during a transaction, but the setting is True. So that’s why the DB connection is closed. When this injected close() call unwinds and the interrupted call stack continues, code accessing the DB finds the connection unexpectedly closed, and I get django.db.utils.InterfaceError: cursor already closed exceptions in my application code.

So if the call to HttpResponseBase.close() jumping into the call stack is legit, this code is wrong, and we have a bug in Django. If not, I guess there’s a bug in uWSGI.

Any insight greatly appreciated.

Thinking about it, I think this call stack injection must be the way calling from C to python works. How else could it work? The GIL could be obtained anytime, and Python isn’t going to start a new interpreter per call, so it’s got to just run in the current context.

Also I don’t think this is a bug in Django - the client has disconnected so there’s no way to write anything back - pulling the DB plug immediately makes some sense.

So how to save other folks spending ages trying to figure out what’s going wrong when this happens? Some documentation would be nice. I’ll raise a request.

Except, the client isn’t connected to uwsgi - they’re connected to nginx. And nginx doesn’t disconnect from uwsgi based upon client action - at least, not that I’ve ever seen evidence of. (I don’t know if there’s a difference with this among the different ways for nginx to connect to uwsgi - e.g. socket vs tcp, or uwsgi vs proxy.)

Now, if there’s some degree of unreliability between nginx and uwsgi, I guess this makes sense.

But my general expectation would be that nginx would be serving as an isolation layer between the client and uwsgi.

(Admitttedly, this is all mostly conjecture on my part - not direct knowledge.)

I’d have to say that based on my direct experience nginx is passing on the disconnect - that’s the only explanation of what I’m seeing.

@KenWhitesell Perhaps this section of my nginx config explains this:

  location / {
    include uwsgi_params;
    uwsgi_param UWSGI_SCHEME $scheme;
    uwsgi_pass unix:///tmp/app-uwsgi.sock;
    uwsgi_buffering off;
  }

I explicitly turn off buffering as I sometimes use streaming responses.

Just to finish this thread off…

I turned uwsgi_buffering back on in my nginx config but it made no difference.

I’ve deployed lots of django apps in the past but this is the first time this has been observed - I think there’s something a little unusual about how the client environment closes connections (a GameMaker app deployed in a custom Android environment).

Anyways…I ended up adding a handler for Django’s request_finished signal and used that to set a “request finished” flag in my code - if my request handler raises an unhandled exception and that flag is set, I know that the HTTP connection was terminated before the request handler finished, and I just ignore whatever exception was thrown.

Thanks for detailed explanation, I think we’re occasionally hitting this as well at Weblate. What we’ve figured out as well is that switching to gunicorn makes these errors disappear, what is most likely caused by the fact that it always completes the request even if the client has closed the connection (see Gunicorn processes requests from clients that have already timed out · Issue #2577 · benoitc/gunicorn · GitHub).

So far, we’ve seen that only in the Docker environment, so that probably makes it more likely that the request is prematurely closed.

Anyway, I find it unfortunate that Django throws an exception in this case - it does not make sense to everybody hitting this duplicating @richardthegit workaround to silence all exceptions in case the HTTP connection was terminated.

@richardthegit Do you have a full traceback to share? I’d be curious to see the whole trace.

@nijel

Anyway, I find it unfortunate that Django throws an exception in this case

What else could Django possibly do? If uWSGI hijacks the executing thread plenty of things can go wrong; closed database connections is just one example.

I wonder what the WSGI spec says with regard to closing the response.

So I just tested this with:

uwsgi --thunder-lock --http :9090 --master --processes 1 --wsgi testing.wsgi

a view like this:

def view(request):
    def test():
        c = connection.cursor()
        c.execute("SELECT * FROM auth_user")
        print(connection.connection)
        time.sleep(5)
        print(c.fetchone())
        print("done")
        yield "test"
        c = connection.cursor()
        c.execute("SELECT * FROM auth_user")
        print(c.fetchone())
        print(connection.connection)
        yield "test2"

    return StreamingHttpResponse(test())

(ugly but should do for testing) I did use a Streaming response to give the webserver (uWSGI) the possibility to close the connection early after the first yield (which I think it might be allowed to do).

and:

curl -v --max-time 1 http://localhost:9090/test/               

Even though the view took five seconds to finish the response did not get closed before (verified via print in HttpResponse.close). I am currently trying to figure out if I can get uWSGI to abort the request but I doubt it. --harakiri 2 just kills the worker.

Any tips on how to reproduce that would be great.

I managed to get uWSGI close the response mid request with something like this (and more than one user in the db):

def view(request):
    def test():
        c = connection.cursor()
        print(connection.connection)
        c.execute("SELECT * FROM auth_user")
        yield "1"
        while c.fetchone():
            time.sleep(5)
            yield "test"
    return StreamingHttpResponse(test())

Results in the following logs:

<sqlite3.Connection object at 0x7fcf6fdb5990>
(0.000) SELECT * FROM auth_user; args=None; alias=default
Thu Dec 23 18:57:08 2021 - SIGPIPE: writing to a closed pipe/socket/fd (probably the client disconnected) !!!
Thu Dec 23 18:57:08 2021 - uwsgi_response_write_body_do(): Broken pipe [core/writer.c line 429] during GET /test/ (127.0.0.1)
OSError: write error
  File "/home/florian/sources/django.git/django/http/response.py", line 276, in close
    import traceback; traceback.print_stack()
Closing connection
[pid: 109890|app: 0|req: 3/3] 127.0.0.1 () {30 vars in 324 bytes} [Thu Dec 23 18:56:58 2021] GET /test/ => generated 5 bytes in 10012 msecs (HTTP/1.1 200) 5 headers in 186 bytes (2 switches on core 0)

but I do not get it to interrupt python code yet. I wonder if I somehow need code that actually releases the GIL (dunno?)

@apollo13 I never did find a reproducible way to trigger this problem, so I’m afraid I can’t help on that front.

I wasn’t really happy with my work-arounds for this issue so like @nijel I switched to Gunicorn. I’ve not had the problem since and honestly not looked back - Gunicorn is consistently faster and more uniform in its response time (uwsgi would occasionally spike with multi-second response times), and moreover it just seems friendlier to work with. I’d recommend it to anyone! :slight_smile:

For everyone running into this; this will be fixed via plugins/python: Properly call `.close()` as mandated by WSGI specs. by apollo13 · Pull Request #2423 · unbit/uwsgi · GitHub

4 Likes

Hi everyone,
First of all I have one question, is this deployment running on Azure VM?

We also have a deployment in Azure with django, grpc, postgres and nginx. We have been facing lot of connection issue with grpc lately. rpc connections will automatically get clossed with inactivity period of 4 min. After lots of debugging we found that the issue is with Azure VM ubuntu image(btw it works perfectly in aws). Azure uses a custom kernel for their VM. To solve that we had to ping our grpc server every 3 min, just to keep the connection alive.

After solving that we face the same error you guys are getting, connection with postgress is getting clossed. I think there is something in azure kernel that causes this connection issue. As far as I know the azure kernel is specifically trailored by ubuntu to get better at networking.

Also if you guys figured any solution to this issue, please share it here.

Thanks, Nipun Shaji