I have a long-running process in a Django function. It uses an iterator to process 100,000 records.
The processing code does a number of queries: standard stuff such as filtering, updating, and creating new records.
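For context, the loop is roughly shaped like this (a simplified sketch; Record, needs_processing and process_record stand in for my real model, filter and processing logic):

    from myapp.models import Record  # placeholder for my actual model

    def run():
        # filter() builds the queryset lazily; iterator() streams rows in chunks
        # instead of caching the whole result set on the queryset
        qs = Record.objects.filter(needs_processing=True)
        for record in qs.iterator(chunk_size=2000):
            process_record(record)  # does further filter/update/create queries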
The problem I'm having is that every time the loop runs, memory usage goes up by about 4 MB, and it eventually hits 64 GB, at which point the process is killed. To put it in perspective, my complete database is only 300 MB.
Looking for any tips to debug this.
What I have done so far is:
- Checked the local and global variables: combined they come to less than 1 MB, so I don't believe the leak is in my own Python objects (rough check shown below).
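That check was a rough shallow sum of object sizes with sys.getsizeof, something along these lines (rough_size is just a throwaway helper, and it does not measure retained memory):

    import sys

    def rough_size(namespace):
        # shallow sizes only; does not follow references into nested objects
        return sum(sys.getsizeof(v) for v in namespace.values())

    print("locals:", rough_size(locals()), "globals:", rough_size(globals()))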
I used tracemalloc to print the top 10 stats at the start of every loop, and it shows django/db/utils.py increasing by roughly 4 MB with every iteration:
    Top 10 memory consuming lines:
    /home/patrick/.local/share/virtualenvs/envision-oiVnV5sy/lib/python3.13/site-packages/django/db/utils.py:98: size=74.4 MiB, count=22598, average=3453 B

    Top 10 memory consuming lines:
    /home/patrick/.local/share/virtualenvs/envision-oiVnV5sy/lib/python3.13/site-packages/django/db/utils.py:98: size=78.1 MiB, count=22870, average=3581 B
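The tracemalloc instrumentation is roughly this (same placeholder qs and process_record as in the sketch above):

    import tracemalloc

    tracemalloc.start()

    for record in qs.iterator(chunk_size=2000):
        # snapshot at the start of each iteration and print the top lines
        snapshot = tracemalloc.take_snapshot()
        print("Top 10 memory consuming lines:")
        for stat in snapshot.statistics("lineno")[:10]:
            print(stat)
        process_record(record)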
I have tried adding the following to the start of each loop to try to free some memory (imports shown for completeness):

    import gc
    import django.db
    from django.core.cache import cache
    gc.collect()
    cache.clear()
    django.db.reset_queries()
I have also tried turning off caching entirely (dummy cache backend), and DEBUG is False.
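For reference, the relevant settings are roughly this (the "default" cache alias is an assumption, but it is the standard setup for the dummy backend):

    # settings.py
    DEBUG = False

    CACHES = {
        "default": {
            "BACKEND": "django.core.cache.backends.dummy.DummyCache",
        }
    }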
Nothing seems to improve the situation.
Anything else I can try?
Thanks!