Debugging "The client was disconnected by the server because of inactivity"

I’m using Django 4.2 connected to a MySQL 8 database. The server is nginx with gunicorn (3 workers), using WSGI.

I’m running asyncio for a function I really need to run asynchronously:

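import asyncio

from django.db import connection

from .models import ModelExample  # app model; import path assumed


async def example(request, value):
    # Async ORM lookup: returns the first matching row, or None
    return await ModelExample.objects.filter(example=value).afirst()


# Called from a regular (synchronous) view; `value` is a placeholder for the lookup value
example_object = asyncio.run(example(request, value))
connection.close()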

I’m getting the following error:
"The client was disconnected by the server because of inactivity. See wait_timeout and interactive_timeout for configuring this behavior"

Initially the code works fine; the error only happens a few hours (maybe 8h, matching the default wait_timeout) after I restart the (nginx) server.
It’s a low traffic website so it’s very possible the function is executed only once a day.
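To make the "inactivity" part concrete, here is a minimal, stdlib-only simulation (not real MySQL, driver, or Django code) of a server-side idle limit like MySQL's `wait_timeout` acting on a connection that the client keeps open between uses. `FakeServerConnection` and `ServerClosedConnection` are hypothetical names, time is passed in explicitly in seconds, and, unlike a real server (which closes the idle connection on its own), the simulation only notices the timeout when the next query arrives.

```python
from dataclasses import dataclass

DEFAULT_WAIT_TIMEOUT = 28_800  # MySQL's default wait_timeout: 8 hours, in seconds
ONE_DAY = 86_400               # "the function is executed only once a day"


class ServerClosedConnection(Exception):
    """Raised when the simulated server has already dropped an idle connection."""


@dataclass
class FakeServerConnection:
    # Hypothetical stand-in for one client connection kept open between uses
    wait_timeout: int       # server-side idle limit, in seconds
    last_used: float = 0.0  # simulated time of the last query, in seconds

    def is_expired(self, now: float) -> bool:
        # True once the connection has been idle longer than wait_timeout,
        # i.e. the server would already have closed it
        return now - self.last_used > self.wait_timeout

    def run_query(self, now: float) -> str:
        if self.is_expired(now):
            raise ServerClosedConnection(
                "The client was disconnected by the server because of inactivity"
            )
        self.last_used = now
        return "ok"


# Default timeout: the first query works, the next one a day later fails
conn = FakeServerConnection(wait_timeout=DEFAULT_WAIT_TIMEOUT)
print(conn.run_query(now=0))
try:
    conn.run_query(now=ONE_DAY)
except ServerClosedConnection as exc:
    print("error:", exc)

# A longer timeout (e.g. 604800 s = 7 days) keeps a once-a-day connection alive
conn = FakeServerConnection(wait_timeout=604_800)
print(conn.run_query(now=0), conn.run_query(now=ONE_DAY))
```

With the default of 28800 s (8 hours) and a function that runs roughly once a day (86400 s), any connection left open between runs would already be gone by the next run, which matches the timing described above.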

In my understanding, Django always opens a connection, executes a query and then closes the connection. So I thought it could be some issue with afirst(), and I opened this same issue in the Django issue tracker, but it was closed because I didn’t explain it in enough detail.

I want to do everything possible to solve this issue, but I am still learning Django. I would be very grateful if you could help me debug this and work out which details would be helpful to provide:

  1. It was mentioned that a small project would help. But because of the nature of the error (the server needs to be live and inactive for about 8 hours for it to occur), I guess it would not be helpful for reproducing it locally.
  2. Then maybe it would be more helpful to simply share the Sentry logs directly? Would that help?
  3. It was stated that "this has nothing to do with afirst()". I don’t understand why; can you explain?
  4. What other ways can I debug this and provide relevant details?

Thank you so much in advance.


Other things that I tried:

  • Before adding connection.close(), I would get an error like: "MySQL server has gone away"
  • Adding close_old_connections() before asyncio.run. I don’t understand why there is an inactive connection left open at all.
  • Increasing the wait_timeout and interactive_timeout variables in my MySQL config file. I find it very strange that this had no impact at all, even though the SHOW VARIABLES command shows they are currently set to 31536000.
  • Then I thought that maybe the connection from Django is somehow independent of that, and tried setting the CONN_HEALTH_CHECKS option to True, in the hope that "if the health check fails, the connection will be re-established without failing the request".
  • Changing CONN_MAX_AGE from the default 0 to None in the Django settings file, which according to the Django docs means an unlimited persistent database connection, but then I would get "Lost connection to MySQL server during query".
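For reference, the two Django options mentioned above are set per entry in `DATABASES`. Below is a minimal sketch of a `settings.py` fragment; the database name is a placeholder, credentials are omitted, and the values are illustrative rather than a recommendation. As far as I understand, both options are applied through Django's request-start/request-end handling (`close_old_connections`), which runs in the thread that serves the request.

```python
# settings.py (fragment); NAME is a placeholder, credentials omitted
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.mysql",
        "NAME": "example_db",  # placeholder
        # 0 (the default): close the connection at the end of each request.
        # None: keep it open indefinitely (persistent connection).
        "CONN_MAX_AGE": 0,
        # Django 4.1+: test a persistent connection before reusing it
        # in a new request, and reconnect if the test fails.
        "CONN_HEALTH_CHECKS": True,
    }
}
```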

Can you be more specific here?

Is this supposed to be a “persistent” process that is always running? If so, you do not want to do this within the context of your Django process. You’d want to set this up as an external process.

If this is just a long-running task that doesn’t need to return data as a response to a particular request, that’s the common case for Celery. If it’s an “always running” process then you might consider creating a custom management command.


@KenWhitesell sure, will try to be more specific:

The function’s responsibility is to save data from a request into a DB object (the query is there to find that object).

The reason I need it to be async is that this function can be called from an asynchronous API broker, so it needs to be able to handle that asynchronously.

But to reduce complexity I’m not even using the API broker here; right now it’s just a simple button that runs this function via asyncio.run() directly.

So no, nothing “persistent” like that. Just a button that runs an async function with a query, and the query fails if the button is clicked more than 8 hours after restarting the server.

Here is a more complete piece of the real code in case it helps:

This is the function called from the views.py:

import asyncio

from django.contrib.auth.decorators import login_required
from django.db import connection

from .models import ApplicationOutput  # import path assumed


@login_required
def submit_application_output(request, application_id, project_id):
    question = request.POST.get('question')
    answer_html = request.POST.get('answer_html')
    if request.method == 'POST' and question and answer_html:
        # Run the async helper to completion from this synchronous (WSGI) view
        application_output = asyncio.run(save_application_output(request, application_id, project_id, question, answer_html))
        # Closes the DB connection belonging to this (request) thread
        connection.close()

        if application_output:
            (...)


async def save_application_output(request, application_id, project_id, question, answer_html):
    """Save the ApplicationOutput model
    - Check if the application output already exists
        - If it exists, update it
        - If it does not exist, create it
    - Save the model
    """
    application_output = await ApplicationOutput.objects.filter(application_id=application_id, question=question, project_id=project_id).afirst()

    try:
        (...)

I’d think about making this a celery task then - it seems to me to be an ideal case for it.

I mean, sure, changing the tech would be a solution. I’m not sure it’s the best one given the context of the project, though (time, budget, etc.).

But then again, it wouldn’t be needed if the query kept working, as it does just fine within the first 8 hours.

If the answer to “.afirst() stops working after a while” is just “don’t use it”, then I don’t understand why this Django feature was launched in the first place.

Or are you hinting that this is somehow the expected behavior and maybe I’m using it wrong? If that’s the case please explain why so I understand…

Basically, my general principle is that “you” don’t control the Django process. The wsgi container (gunicorn in your case) is in control.
I never recommend starting any kind of thread or process from within Django itself - there are too many ways that things can go wrong because you’re not in charge of the base process.
I don’t see “afirst” as being the issue. I see the root problem as being the launch of a subprocess from within a process you have no control over.
You have no way of knowing when or why gunicorn may decide to restart a worker…

(You can search through the forum here to find multiple cases where I’ve made that point in other circumstances. I just don’t believe it’s a wise idea to try and do it in a production environment.)
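To illustrate the point that gunicorn, not the application, decides when workers are restarted: these decisions are driven by gunicorn's own settings. A minimal `gunicorn.conf.py` sketch (only `workers = 3` comes from the setup described above; the other values shown are gunicorn's defaults as I understand them):

```python
# gunicorn.conf.py (sketch); gunicorn reads this file as Python
workers = 3              # the three workers described in the question
timeout = 30             # a worker silent for longer than this (seconds) is killed and restarted
max_requests = 0         # 0 disables restarting a worker after a fixed number of requests
max_requests_jitter = 0  # random extra requests added to max_requests (unused when it is 0)
```

Besides these settings, a reload or restart of the gunicorn service also replaces every worker, and with it anything the worker process was holding (threads, open database connections).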


Thank you for your insights. I think I got the overall root concern of lack of control over the WSGI container.

But even if there are better solutions, considering that in this case:

  1. I’m using asyncio (no new threads)
  2. the process is really short-lived (just a save button that triggers a query)
  3. the disconnection error mentions “inactivity”…

Why should we just assume the fault lies with gunicorn randomly restarting workers?

Couldn’t it have something to do with idle DB connections not being closed, timeout settings, etc.? Wouldn’t the fault be with Django then?

Apologies for the stubbornness, but my basic premise is: “if Django offers async WSGI and ORM support, shouldn’t I be able to use it?”

@bernardotavares in my humble opinion, the main issue here is that you seem to be using asyncio without a proper reactor or main loop driving/handling your async code.

From https://docs.djangoproject.com/en/4.2/topics/async/:

Async views will still work under WSGI, but with performance penalties, and without the ability to have efficient long-running requests.

Regarding:

“if Django offers async WSGI and ORM support shouldn’t I be able to use it?”

I think the support for async views requires an async capable web server runner, which I don’t think WSGI provides?

Yes, you should - but within the context for which it’s designed. In this case, it’s for potentially running multiple queries in parallel within the view.

While my information is more empirical than based on direct knowledge, I’ve learned that it’s best to stick to the principle that you only have control between the time that a request is received and handed off to a view and the time the response is returned. Trying to rely upon anything not between the time of those two events tends to cause problems.

I would totally get it if this were a long-running request. But it’s just a button that triggers a query. Also, it’s not really a performance issue: once this error occurs, it stops working completely…

Also from the same link:

This is exactly my use case: an API request and saving to the DB. It says WSGI should work.

That’s certainly the goal. But if I cannot run an isolated query, how would I run multiple ones? I don’t get it.

I don’t believe that a query stopping working after a while is just “working as expected” in Django.

I honestly want to help improve the Django project, but I get the feeling you’re being too quick to brush off any potential issues with Django without going further into debugging.

afirst() was implemented not so long ago; why is it so hard to consider that it might have a problem?

If this is not a bug, at the very least something is not clear in the docs for me…

Is there any way you can see for me to debug this further, in order to understand the “inactivity” part?

Please help me dig deeper into the technical side here.
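One concrete thing worth checking is which thread actually owns the database connection used by afirst(). Django keeps a separate connection per thread, and in Django 4.2 the async queryset methods such as afirst() run the synchronous method through asgiref's `sync_to_async`. As far as I understand asgiref, with no enclosing `async_to_sync` (as with a bare `asyncio.run()` in a WSGI view), those calls run on a single shared worker thread rather than on the request thread. If so, `connection.close()` and `close_old_connections()` in the view only touch the request thread's connection, while the worker thread's connection stays open and idle until the server's `wait_timeout` drops it; that would also explain why raising `wait_timeout` helped once it was actually applied. The sketch below is a hypothetical, stdlib-only model of that situation, not Django or asgiref code: `threading.local()` stands in for Django's per-thread connections and a single-thread `ThreadPoolExecutor` stands in for the worker thread.

```python
import asyncio
import threading
from concurrent.futures import ThreadPoolExecutor

# Stand-in for Django's per-thread connection handling (hypothetical model)
_local = threading.local()
opened = []  # every fake connection ever opened, kept for inspection

# Stand-in for the single worker thread that runs the synchronous query
worker = ThreadPoolExecutor(max_workers=1, thread_name_prefix="orm-worker")


class FakeConnection:
    def __init__(self):
        self.thread = threading.current_thread().name
        self.closed = False

    def close(self):
        self.closed = True


def get_connection():
    # Like Django: lazily open one connection per thread, then reuse it
    if getattr(_local, "conn", None) is None:
        _local.conn = FakeConnection()
        opened.append(_local.conn)
    return _local.conn


def first_row():
    # Synchronous "query": uses the connection of whichever thread runs it
    get_connection()
    return "row"


async def afirst():
    # Rough analogue of sync_to_async(first_row): hop to the worker thread
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(worker, first_row)


def view():
    get_connection()  # e.g. session/auth lookups already used the view thread's connection
    row = asyncio.run(afirst())
    _local.conn.close()  # analogue of connection.close(): only this thread's connection
    return row


view()
for conn in opened:
    print(conn.thread, "closed" if conn.closed else "still open")
```

In the real project, one way to check this is to add `%(threadName)s` to the logging format and enable the `django.db.backends` logger (which only logs queries when `DEBUG = True`), then compare the thread shown for the afirst() query with the thread handling the view.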

Answering this for anyone who runs into the same problem in the future while using DigitalOcean.

It appears that the initial approach of configuring MySQL was correct, but it was not actually being applied to the DB. If you are using DigitalOcean, applying configurations via SSH to the droplet does not correspond to the actual configurations in the database cluster.

To resolve this, you need to use DigitalOcean’s API to configure the database. The following changes to the variables (all values in seconds) seemed to solve the issue for me:

  • connect_timeout to 60
  • interactive_timeout to 604800
  • wait_timeout to 2147483
  • net_read_timeout to 120
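The API call itself is not shown above. As a hedged sketch: DigitalOcean's public API has a database cluster configuration endpoint, assumed here to be `PATCH https://api.digitalocean.com/v2/databases/{cluster_id}/config` with a JSON body of the form `{"config": {...}}`; please verify the path and field names against DigitalOcean's current API reference. The code only builds the request with `urllib` and does not send it; `CLUSTER_ID`, `TOKEN`, and `build_config_request` are hypothetical placeholders.

```python
import json
import urllib.request

CLUSTER_ID = "your-cluster-uuid"  # placeholder
TOKEN = "your-api-token"          # placeholder; keep real tokens out of source code

# Values from the list above, all in seconds
mysql_config = {
    "connect_timeout": 60,
    "interactive_timeout": 604800,  # 7 days
    "wait_timeout": 2147483,        # about 24.9 days
    "net_read_timeout": 120,
}


def build_config_request(cluster_id: str, token: str, config: dict) -> urllib.request.Request:
    # Assumed endpoint and body shape; check DigitalOcean's API reference
    url = f"https://api.digitalocean.com/v2/databases/{cluster_id}/config"
    body = json.dumps({"config": config}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PATCH",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


req = build_config_request(CLUSTER_ID, TOKEN, mysql_config)
print(req.get_method(), req.full_url)
# To actually send it: urllib.request.urlopen(req)
```

Afterwards, running SHOW GLOBAL VARIABLES LIKE '%timeout%' against the database cluster itself (not the droplet) should confirm whether the new values are in effect.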

With these configurations in place, Django’s async functionalities worked seamlessly on my low-traffic website.

I want to emphasize that while Django may not be directly at fault, understanding how Django can work effectively with the appropriate database configurations is crucial.

I would suggest adding a note to the docs about the possibility of idle connections in MySQL when using async, if the DB is not configured with longer timeouts. If you think it makes sense, let me know and I can try to propose the wording.

Kind regards.
