Many abnormal runserver processes

Hello everyone,
I have a task that connects to thousands of servers through the asyncssh library and executes commands on them. Normally it finishes within one minute, and it is scheduled through the APScheduler library to run once a minute. After running normally for a while, the job randomly starts being skipped with the error "maximum number of running instances reached (1)". At that point there are up to 70-plus identical python manage.py runserver processes on the Linux server, and if I kill them they immediately come back. What are these processes, and why doesn't the scheduled task end? (Another scheduled task of mine that pings servers in batches never shows this.) How can I troubleshoot from Django what these processes are doing or where they are stuck?

dxgmapp 3281146 3262243 0 22:00 ? 00:00:01 /appdata/mainProject/server/venv/bin/python manage.py runserver xxx.xxx.xxx.xxx:8000
dxgmapp 3281147 3262243 0 22:00 ? 00:00:01 /appdata/mainProject/server/venv/bin/python manage.py runserver xxx.xxx.xxx.xxx:8000
dxgmapp 3281148 3262243 0 22:00 ? 00:00:01 /appdata/mainProject/server/venv/bin/python manage.py runserver xxx.xxx.xxx.xxx:8000
dxgmapp 3281149 3262243 0 22:00 ? 00:00:01 /appdata/mainProject/server/venv/bin/python manage.py runserver xxx.xxx.xxx.xxx:8000
dxgmapp 3281150 3262243 0 22:00 ? 00:00:01 /appdata/mainProject/server/venv/bin/python manage.py runserver xxx.xxx.xxx.xxx:8000
dxgmapp 3281151 3262243 0 22:00 ? 00:00:01 /appdata/mainProject/server/venv/bin/python manage.py runserver xxx.xxx.xxx.xxx:8000
dxgmapp 3281152 3262243 0 22:00 ? 00:00:01 /appdata/mainProject/server/venv/bin/python manage.py runserver xxx.xxx.xxx.xxx:8000
dxgmapp 3281153 3262243 0 22:00 ? 00:00:01 /appdata/mainProject/server/venv/bin/python manage.py runserver xxx.xxx.xxx.xxx:8000
dxgmapp 3281154 3262243 0 22:00 ? 00:00:01 /appdata/mainProject/server/venv/bin/python manage.py runserver xxx.xxx.xxx.xxx:8000
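For context on the skip message: it comes from APScheduler's max_instances guard, which defaults to 1. A minimal sketch of the kind of schedule involved, where batch_ssh_task is a placeholder name, not the poster's actual function:

from apscheduler.schedulers.background import BackgroundScheduler

scheduler = BackgroundScheduler()
# max_instances defaults to 1: if the previous run has not finished when
# the next minute fires, APScheduler skips the new run and logs
# "maximum number of running instances reached (1)".
scheduler.add_job(batch_ssh_task, "interval", minutes=1, max_instances=1)  # batch_ssh_task is hypothetical
scheduler.start()

The skip itself is therefore only a symptom: some earlier run never finished.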

I think we’re going to need a clearer description of the situation with more specific details about what’s happening.

What task? Is it a bash script, a Python program, or something else?

How is it being run? (What’s running it? APScheduler?)

If this is an external process that is running external tasks, how is Django involved here? (Yes, I know you’re showing a log with a number of runserver commands executing, but it’s not - or shouldn’t be - a Django process doing this. If something is running runserver, it should be done outside the context of Django itself.)

Thank you for your reply.
The task executes commands (a bash script) on other Linux servers through the asyncssh library and then collects the command output. Because of the large number of servers, I used multiprocessing + asyncio and set up a scheduled job through APScheduler to run once a minute. At first it ran without problems; at some point the task suddenly got stuck and never completes. Only this task has the problem; my other tasks are fine (they also execute commands over SSH, but against far fewer servers).
I have been testing bit by bit. So far I only know that some hosts get stuck while executing commands in bulk; if I only connect to the hosts without executing any command, there are no exceptions, so it is probably the bulk command execution that hangs. Are there system logs, or anything I can query from Django, that would reveal more about all these abnormal processes? (Could the use of multiprocessing be what creates the multiple processes?)

This is the SSHClient class:

import asyncio
import asyncssh
import socket


class SSHClient:
    def __init__(self, ip, username, password, port):
        self.ip = ip
        self.username = username
        self.password = password
        self.port = port
        self.known_hosts = None
        self.connect_timeout = 20
        self.command_timeout = 30
        self.conn = None
        self.result = None

    async def connect(self):
        try:
            self.conn = await asyncssh.connect(
                self.ip,
                username=self.username,
                password=self.password,
                port=self.port,
                connect_timeout=self.connect_timeout,
                known_hosts=self.known_hosts,
            )
            self.result = {"code": 0, "ip": self.ip, "data": "Successfully connected"}
        except Exception as e:
            self.result = {"code": -1, "ip": self.ip, "data": str(e)}
            
    async def run_command(self, command):
        if self.conn is not None:
            try:
                results = await self.conn.run(
                    "source /etc/profile >/dev/null 2>&1 && " + command,
                    timeout=self.command_timeout,
                )
                stdout = results.stdout
                stderr = results.stderr
                status = results.exit_status
                data = stdout if stdout else stderr
                self.result = {"code": status, "ip": self.ip, "data": data}
            except Exception as e:
                self.result = {"code": -1, "ip": self.ip, "data": str(e)}

    async def close(self):
        # Close the connection and wait until it is fully torn down.
        if self.conn is not None:
            self.conn.close()
            await self.conn.wait_closed()

This is how I get the result:

ssh_client = SSHClient(ip, username, password, port)
await ssh_client.connect()
if ssh_client.result["code"] == 0:
    await ssh_client.run_command(command)
    await ssh_client.close()
    return ssh_client.result
else:
    await ssh_client.close()
    return ssh_client.result

What are the commands that you are issuing to each system? I’m trying to understand where these runserver commands are coming from.

Keep in mind that runserver is a persistent command. It’s designed to not terminate. It doesn’t make sense to me that you would be trying to run it through a script like this.

Thank you for your reply.
The bash commands sent are very ordinary, such as 'pwd'. I ran another test, and the problem only seems to occur with a large number of connections (about 2000). After a few hours the stuck task ends by itself, and then it happens again. The number of abnormal processes equals the pool size I configured in multiprocessing, so I suspect multiprocessing is not stopping the task properly. Is there any way to tell where it's stuck?
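That head count matching the pool size fits how multiprocessing's default fork start method works on Linux: each pool worker is a copy of the parent process, so ps shows every worker with the parent's exact command line, which here is manage.py runserver. A minimal standalone sketch, independent of this project, that demonstrates the effect:

import multiprocessing
import os


def show_cmdline(_):
    # Under the default "fork" start method on Linux, each worker is a
    # clone of the parent, so /proc/<pid>/cmdline repeats the parent's
    # command line (e.g. "python manage.py runserver ...").
    with open(f"/proc/{os.getpid()}/cmdline", "rb") as f:
        print(os.getpid(), f.read().replace(b"\x00", b" ").decode())


if __name__ == "__main__":
    with multiprocessing.Pool(4) as pool:
        pool.map(show_cmdline, range(4))

Every worker that picks up a task prints the same command line as the parent script, which matches the identical runserver entries in the ps output above.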
This is the batch class:

import asyncio
import multiprocessing


class BatchTask:
    def __init__(self, data, task, *args, **kwargs):  # data is a list; task is an async function
        self.data = data
        self.split_num = 30
        self.args = args
        self.kwargs = kwargs
        self.task = task
        self.result = None

    def startInfo(self):
        if len(self.data) > self.split_num:
            cpu_count = multiprocessing.cpu_count()  # my system has 80 cores
            data_len = len(self.data)
            # despite the name, this is the number of worker *processes*
            thread_num = min(round(data_len / self.split_num), cpu_count)

            # split the data into one chunk per process; the last chunk
            # absorbs the remainder
            part_size = data_len // thread_num
            data_part = []
            for i in range(thread_num):
                start = i * part_size
                end = data_len if i == thread_num - 1 else start + part_size
                data_part.append(self.data[start:end])

            # multiprocessing + asyncio
            resl = []
            pool = multiprocessing.Pool(thread_num)
            for item in data_part:
                reslPool = pool.apply_async(self.asyncrun, args=(item,))
                resl.append(reslPool)
            pool.close()
            pool.join()
            result = []
            for res in resl:
                result.append(res.get())
            result = [num for sublist in result for num in sublist]  # flatten the per-chunk results
            self.result = result
        else:
            self.result = self.asyncrun(self.data)

    def asyncrun(self, async_data):
        # each worker process runs its chunk in its own event loop
        return asyncio.run(self.runInfo(async_data))

    async def runInfo(self, servers):
        coroutines = [
            self.task(server, *self.args, **self.kwargs) for server in servers
        ]
        return await asyncio.gather(*coroutines)
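One detail worth flagging in runInfo: gather() has no overall bound, so a single SSH session that never resolves keeps the event loop, the pool worker, and therefore the APScheduler job alive indefinitely. A hedged variant with a hard per-host deadline, assuming self.task returns the result dict shown earlier (the 120-second value is an arbitrary assumption):

    async def runInfo(self, servers):
        async def bounded(server):
            try:
                # cap each host so one unresponsive server cannot hang
                # the whole batch
                return await asyncio.wait_for(
                    self.task(server, *self.args, **self.kwargs), timeout=120
                )
            except asyncio.TimeoutError:
                return {"code": -1, "ip": str(server), "data": "hard timeout"}

        return await asyncio.gather(*(bounded(server) for server in servers))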

I’m sorry, I’m completely lost here. I’m guessing I’m missing something, but I’m still not seeing or understanding how any of this is possibly a Django issue.

There are indeed many conditions and moving parts involved here, and it may well not be a Django problem. I am just investigating every angle, and since there really are many Django service processes appearing on the system, I wanted to see whether I could get any information out of Django. Thanks.

A waste of effort. Django will have no visibility into whatever processes are trying to run it.
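To see where a stuck worker actually is, you would inspect the process itself rather than ask Django. One OS-level approach, as a minimal sketch: register a faulthandler signal handler once at startup (with fork it is inherited by the workers), then signal a stuck PID from a shell.

import faulthandler
import signal

# Call once before the pool is created; fork-started workers inherit it.
# Afterwards,  kill -USR1 <stuck pid>  makes that process write every
# thread's current Python stack to its stderr, showing exactly which
# call it is blocked in.
faulthandler.register(signal.SIGUSR1, all_threads=True)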

Thank you for your reply.
It may be a problem with multiprocessing. After reading this page, I saw that it describes my problem, and I suspect that the fork start method can sometimes cause deadlocks. Following the page's solution, I tried starting my processes with the "spawn" method:

multiprocessing.set_start_method("spawn")
pool = multiprocessing.Pool(thread_num)

When the multiprocess task ran, Django reported an error. How do I change the multiprocessing start method in a Django project?

Process SpawnPoolWorker-2:
Process SpawnPoolWorker-1:
Traceback (most recent call last):
Traceback (most recent call last):
  File "/usr/local/python3/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/usr/local/python3/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/usr/local/python3/lib/python3.9/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/python3/lib/python3.9/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/python3/lib/python3.9/multiprocessing/pool.py", line 114, in worker
    task = get()
  File "/usr/local/python3/lib/python3.9/multiprocessing/pool.py", line 114, in worker
    task = get()
  File "/usr/local/python3/lib/python3.9/multiprocessing/queues.py", line 368, in get
    return _ForkingPickler.loads(res)
  File "/usr/local/python3/lib/python3.9/multiprocessing/queues.py", line 368, in get
    return _ForkingPickler.loads(res)
  File "/appdata/mainProject/server/teleapps/host/views.py", line 13, in <module>
    from teleapps.host import models
  File "/appdata/mainProject/server/teleapps/host/views.py", line 13, in <module>
    from teleapps.host import models
  File "/appdata/mainProject/server/teleapps/host/models.py", line 8, in <module>
    class Host(models.Model):
  File "/appdata/mainProject/server/teleapps/host/models.py", line 8, in <module>
    class Host(models.Model):
  File "/appdata/mainProject/server/venv/lib/python3.9/site-packages/django/db/models/base.py", line 127, in __new__
    app_config = apps.get_containing_app_config(module)
  File "/appdata/mainProject/server/venv/lib/python3.9/site-packages/django/db/models/base.py", line 127, in __new__
    app_config = apps.get_containing_app_config(module)
  File "/appdata/mainProject/server/venv/lib/python3.9/site-packages/django/apps/registry.py", line 260, in get_containing_app_config
    self.check_apps_ready()
  File "/appdata/mainProject/server/venv/lib/python3.9/site-packages/django/apps/registry.py", line 260, in get_containing_app_config
    self.check_apps_ready()
  File "/appdata/mainProject/server/venv/lib/python3.9/site-packages/django/apps/registry.py", line 138, in check_apps_ready
    raise AppRegistryNotReady("Apps aren't loaded yet.")
  File "/appdata/mainProject/server/venv/lib/python3.9/site-packages/django/apps/registry.py", line 138, in check_apps_ready
    raise AppRegistryNotReady("Apps aren't loaded yet.")
django.core.exceptions.AppRegistryNotReady: Apps aren't loaded yet.
django.core.exceptions.AppRegistryNotReady: Apps aren't loaded yet.

thanks
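For the record, the traceback points at what has to change: with spawn, each worker starts a fresh interpreter and re-imports whatever the pickled task references (here views.py, which pulls in the models) before Django's app registry is ready. One common pattern, sketched here with a hypothetical settings module path, is to initialize Django in a pool initializer and take the spawn context locally instead of calling set_start_method() globally:

import multiprocessing
import os


def init_django_worker():
    # Runs once in each freshly spawned worker, before any task is
    # unpickled, so the model imports find a ready app registry.
    import django
    os.environ.setdefault("DJANGO_SETTINGS_MODULE", "server.settings")  # hypothetical path
    django.setup()


# get_context() scopes the start method to this one pool and avoids the
# RuntimeError that set_start_method() raises if called more than once.
ctx = multiprocessing.get_context("spawn")
pool = ctx.Pool(thread_num, initializer=init_django_worker)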