Django can't start multiple Threads

Hi, I’m trying to build a Django application that needs to spawn multiple threads in addition to the main thread of the Django server itself.

However, I’m encountering this weird behaviour: Django is able to start one of the threads that my application requires, but when I add the other two threads to both apps.py and settings.py, I’m greeted with the following error in the Django console:

django.core.exceptions.ImproperlyConfigured: Cannot import ‘grafico’. Check that ‘pages.apps.InizializzaThreadGrafico.name’ is correct

I’ve checked the code for typos, but I haven’t found any so far.

I also can’t use external task-queue libraries such as Celery due to project constraints.

I’ve included the code for both my apps.py (thread declarations) and settings.py below.

I’ve also included the Stack Overflow links that I used to implement the multithreading part with Django:

[Link 1] (python - How to start a background thread when django server is up? - Stack Overflow)

[Link 2] (python - How to avoid AppConfig.ready() method running twice in Django - Stack Overflow)

Thank you in advance for the support

apps.py

import os
from threading import Thread
from django.apps import AppConfig

# pages application declaration
class PagesConfig(AppConfig):
    default_auto_field = 'django.db.models.BigAutoField'
    name = 'pages'

# filtra_dati_unico application declaration
class InizializzaThreadFiltraDati(AppConfig):
    name = 'filtra_dati_unico'
    
    def ready(self):
        FiltraDatiThread.daemon = True
        run_once = os.environ.get('CMDLINERUNNER_RUN_ONCE') 
        if run_once is not None:
            return
        os.environ['CMDLINERUNNER_RUN_ONCE'] = 'True'
        FiltraDatiThread().start()

# Application declaration grafico_impianto
class InizializzaThreadGrafico(AppConfig):
    name = 'grafico_impianto'
    
    def ready(self):
        GraficoThread.daemon = True
        run_once = os.environ.get('CMDLINERUNNER_RUN_ONCE') 
        if run_once is not None:
            return
        os.environ['CMDLINERUNNER_RUN_ONCE'] = 'True'
        GraficoThread().start()

# Thread FiltraDati declaration
class FiltraDatiThread(Thread):
    def run(self):
        print('Thread FiltraDati initialized')

# Thread GraficoThread Initialization
class GraficoThread(Thread):
    def run(self):
        print('Thread GraficoThread initialized')

settings.py

INSTALLED_APPS = [
    'django.contrib.admin',
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.messages',
    'django.contrib.staticfiles',
    'bootstrap5',
    'pages.apps.PagesConfig',
    # Custom app definition (Threads)
    'pages.apps.InizializzaThreadFiltraDati', 
    'pages.apps.InizializzaThreadGrafico',
]
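For context on the error above: each AppConfig.name must be the dotted path of a Python package that Django can actually import. InizializzaThreadGrafico sets name = 'grafico_impianto', so Django effectively runs "import grafico_impianto", which fails unless a package by that name exists on the path. A minimal sketch of that check outside Django (stdlib only; the helper name is made up for illustration):

```python
import importlib

def app_name_is_importable(name):
    """Rough stand-in for Django's check: AppConfig.name must import cleanly."""
    try:
        importlib.import_module(name)
        return True
    except ImportError:
        return False

# A module that exists imports fine:
print(app_name_is_importable('threading'))  # → True
# 'grafico_impianto' only imports if a package by that name is on sys.path:
print(app_name_is_importable('grafico_impianto'))
```

Since all three AppConfig classes live in pages/apps.py but two of them point their name at packages that may not exist, this is a likely source of the ImproperlyConfigured error.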

Don’t try to do this.

This is an excuse, not a reason.

Always keep in mind that “you” do not have control over the process itself in which Django runs. Your Django code runs within some type of wsgi container such as gunicorn or uwsgi. As a result, you have no control over when this process is started or stopped.

If you need a task to run longer than the life of an individual request, use Celery or some other kind of worker job queue.
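A minimal sketch of what that looks like with Celery (hypothetical task and module names; the ImportError fallback exists only so this sketch can be read and run without Celery installed):

```python
# tasks.py -- sketch of moving the background work into a Celery task.
try:
    from celery import shared_task
except ImportError:  # fallback so the sketch runs without Celery installed
    def shared_task(func):
        return func

@shared_task
def filtra_dati(record_ids):
    """Hypothetical stand-in for whatever FiltraDatiThread was doing."""
    # ... filter records, write results, etc. ...
    return len(record_ids)

# In a view you would enqueue it instead of starting a thread:
#     filtra_dati.delay([1, 2, 3])
# A separate worker process (e.g. `celery -A proj worker`) picks it up,
# so the task's lifetime is independent of the web server process.
print(filtra_dati([1, 2, 3]))  # direct synchronous call, for illustration
```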

Hi,

thank you for the feedback! I appreciate it.

Since it’s my first Django project, I will need some time to understand how to do things in the best way possible.

I will try to use Celery in my project like you suggested.

Thank you again for the feedback

From your code it appears that you’re trying to run several different services from the same codebase, but each in a different thread?

Consider creating a separate WSGI app for each of your other applications. Look at the files Django generates. You’d then still keep all of your applications in the same codebase.

You can execute each application separately. Threads might not be the right way to isolate your services, because you ideally want each service to run with as many threads as it needs.
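The separate-entry-point idea might look like this (hypothetical module names, modelled on the wsgi.py that startproject generates; each service gets its own settings module and runs in its own container process):

```python
# grafico_impianto/wsgi.py -- one WSGI entry point per service, same codebase.
import os

from django.core.wsgi import get_wsgi_application

# Each service points at its own settings module (hypothetical name):
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'grafico_impianto.settings')

application = get_wsgi_application()

# Each service then runs as its own process, e.g.:
#   gunicorn grafico_impianto.wsgi
#   gunicorn filtra_dati_unico.wsgi
```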

———

Why not? It’ll be more productive to explain to OP why their approach is unusual and suboptimal rather than issuing what sounds like an edict.

I do not understand why you think a developer that configures and executes their application in production doesn’t have control over the process where the application executes.

Both of these are highly configurable, providing the developer with a high degree of control.

You can run uwsgi and disable threading.

There is a spectrum of options available to the developer to run their jobs everywhere from within a sliver of a request, to over or after several requests. The environment and specifics of how their request and app works are all configurable and Django is just a Python app.

It doesn’t make sense why you would post a blanket statement like this. Django is not a serverless framework. Should cache operations end after a request is completed? Should the database be reinitialized for each request?

I’m sorry but I was confused by your response when I stumbled across this thread for a completely unrelated reason.

Note that the word “you” is in quotes. This is a reference to your Django code running within the Django framework. Your code - your views and models - have no control over the process in which it is running.

But not from within your Django code.

You need to understand my response from within the context of the question from a person writing code in a view to start and stop threads.

Yes, taken out of context, my answer would be misleading.

However, given that the question is asking about running threads from within the app, I stand by my answer as written. Starting background threads within your app is a bad idea.

@KenWhitesell Even for very short-lived tasks (let’s say under 50ms each), wouldn’t Celery be overkill?

For example, if we have multiple I/O-bound JSON dumps to a file in an API view, couldn’t ThreadPoolExecutor handle this efficiently?

Thank you in advance.
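For concreteness, the pattern being asked about might be sketched like this (hypothetical file names and payloads; a temporary directory stands in for the real destination):

```python
import json
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

def dump_json(path, payload):
    """Write one JSON payload to disk; return the path for bookkeeping."""
    with open(path, 'w') as fh:
        json.dump(payload, fh)
    return path

payloads = {f'record_{i}.json': {'id': i} for i in range(3)}
outdir = Path(tempfile.mkdtemp())  # stand-in for the real output directory

# Leaving the `with` block joins the pool, so every write has finished
# (or raised) before a view would go on to build its response.
with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(dump_json, outdir / name, data)
               for name, data in payloads.items()]
    written = [f.result() for f in futures]  # re-raises any worker exception

print(len(written))  # → 3
```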

Welcome @gregverm0-create !

“Efficiently”? Under some definition, yes.

Safely? No.

What percentage of failed tasks are you willing to accept?

Nice intuition, thank you @KenWhitesell.

So between the “heavy” Celery and “somewhat unsafe” ThreadPoolExecutor, is there anything in between that fits well with Django for very lightweight tasks?

I’m sorry, I don’t accept the premise that Celery is “heavyweight”. Yes, I have seen alternatives mentioned here and other places, but I’m not personally familiar with them. But they all share the common (and required) characteristic that they are an external process that exists to manage tasks requested through a message broker.

A Celery worker using the redis backend uses effectively no CPU and less than 100K of memory for my basic tasks.
(More memory is consumed when the tasks are “active”, but that memory will be used regardless of whether the task is being run by Celery or something else.)

So even for ultra-lightweight tasks (taking, let’s say, under 50 ms), Celery would still be your go-to solution, right?

This is an architectural decision that would depend upon the larger context of the system requirements and the runtime environment.

If it’s a 50 ms task (total “wall-clock” time, not CPU time), I’m more likely to do it inline.

If there are a lot of these to be performed in a single request, and if they are truly I/O bound*, I may consider using an asynchronous view to initiate these requests in parallel.

*Note: There are a lot of situations that people historically think of as being “I/O bound”, but in reality end up not being so. For example, depending upon your precise hardware and software configuration, writing a file might not be I/O bound - you might be writing to an asynchronous device driver that is buffering the data, and get the response before the I/O actually occurs.
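A sketch of that parallel approach with plain asyncio (hypothetical helper and file names; inside an async Django view you would await a helper like dump_all instead of calling asyncio.run):

```python
import asyncio
import json
import tempfile
from pathlib import Path

def write_json(path, payload):
    # Blocking file write; asyncio.to_thread (Python 3.9+) moves it off
    # the event loop so the writes can overlap.
    with open(path, 'w') as fh:
        json.dump(payload, fh)
    return path

async def dump_all(outdir, payloads):
    # Start all writes concurrently and wait for every result, so the
    # data is guaranteed on disk before the response is built.
    tasks = [asyncio.to_thread(write_json, outdir / name, data)
             for name, data in payloads.items()]
    return await asyncio.gather(*tasks)

outdir = Path(tempfile.mkdtemp())
paths = asyncio.run(dump_all(outdir, {'a.json': {'x': 1}, 'b.json': {'x': 2}}))
print([p.name for p in paths])  # → ['a.json', 'b.json']
```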

If it’s “generally” a 50 ms task, but has the theoretical potential of taking 500 ms or more, I’m definitely going to spawn it off to an external process.

If it’s highly I/O bound and not CPU bound, I might use an asynchronous Channels worker.

I also wouldn’t automatically rule out the possibility of making an internal request to another defined endpoint.

However, trying to run a background thread from within a Django view is never an option.

Could you share a minimal snippet for some I/O bound tasks (let’s say, writing multiple JSON objects to files)? Would you suggest using asyncio for this? What if the results of the tasks are needed before proceeding?

*Note: There are a lot of situations that people historically think of as being “I/O bound”, but in reality end up not being so. For example, depending upon your precise hardware and software configuration, writing a file might not be I/O bound - you might be writing to an asynchronous device driver that is buffering the data, and get the response before the I/O actually occurs.

I also wouldn’t automatically rule out the possibility of making an internal request to another defined endpoint.

Really eye-opening insights.

If it’s highly I/O bound and not CPU bound, I might use an asynchronous Channels worker.

If they were CPU bound, you would propose something else?

However, trying to run a background thread from within a Django view is never an option.

Clearly put. Thank you @KenWhitesell for your detailed analysis.

I don’t actually have one - I’ve never needed to do it this way. (All my current solutions are either / both Celery or Channels based.)

This is a different issue. The risks of running view-managed threads are greatly reduced since those threads should exit before the view returns a response. In that situation, I can see where the maintenance of a thread pool may be advantageous if the environment surrounding the use of that thread pool is thread-safe.

Again, I’ve never needed to do this. I’ve never worked with a project that both:

  • Needs to spawn off multiple tasks
    and
  • Needs to get the results to prepare the response

Absolutely - Celery. In this type of situation, I want the Operating System(s) to manage the CPU allocation - I don’t want any part of that. I’ll let Celery spawn off “x” number of worker processes to ensure that no individual CPU is over-subscribed to the point that it affects the general response of the web site. And, if this is a frequent and large-enough event, I’d probably be looking at having a separate server processing those tasks.
