Reloading the same page drives memory usage up until the workers die

I’m (now) on Django 4.0.1 and deploying to Google App Engine instance class B2. I’ve been having problems with my worker processes regularly quitting and getting restarted. I finally was able to find a debugging message explaining that they are being terminated due to OOM:

2022-01-26 12:38:05.664 PST Exceeded soft memory limit of 512 MB with 515 MB after servicing 86 requests total. Consider setting a larger instance class in app.yaml.
2022-01-26 12:38:05.664 PST After handling this request, the process that handled this request was found to be using too much memory and was terminated. This is likely to cause a new process to be used for the next request to your application. If you see this message frequently, you may have a memory leak in your application or may be using an instance with insufficient memory. Consider setting a larger instance class in app.yaml.

If I reload one of my particularly intensive pages several times in a row I can kill the workers within 2-4 page loads. I have a couple questions here:

  1. Why would the memory usage increase over time? If responding to a single request required more than 512 MB this would make sense, but if that were the case my page would never load at all. Is there some way I could be leaking memory, so that after this (admittedly somewhat heavy) page is loaded the memory isn’t released? Aside from pandas, I’m not really using any external libraries. How is it even possible to leak memory in a Python app like this?

  2. On my laptop I’m not encountering this issue, probably because I have a lot more memory than the instance running on GAE. Is there a way to see where all of my memory is going? I’m using PyCharm Professional as my IDE, but it’s not clear how to use it to track down memory usage. How would you recommend starting the debugging process here? Considering that I’m blowing past 512 MB in just a couple of reloads of this page, there has to be something pretty dramatic going on.

  3. Are there any really dangerous anti-patterns with the ORM that could increase memory usage in this way? I enabled QueryCountMiddleware to keep track of the number of queries, and the numbers aren’t too crazy. I wonder if something is systematically wrong, because although I can trigger the OOM faster with a couple of very intensive pages, I eventually see the workers crash and reboot on their own even without going into those corners of the app.

Thanks for the help, people have been really wonderful in this forum and I appreciate it.

The short answer is “Yes, it’s very easy to blow by 512 MB”, and “Yes, there are reasons why the garbage collector may not free all the memory being used”, and “Yes, external libraries (such as pandas) can be a significant contributing factor”, and “Yes, large queries can very much eat up the memory.”
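To make the “large queries eat memory” point concrete, here is a plain-Python sketch (not Django-specific — the fake `rows()` generator below stands in for a database cursor): materializing an entire result set in a list keeps every row alive at once, while streaming it keeps only one row live. This is the same distinction Django exposes with `QuerySet.iterator()` versus evaluating the whole queryset.

```python
import tracemalloc

def rows(n):
    # Stand-in for a database cursor yielding one row at a time.
    for i in range(n):
        yield {"id": i, "payload": "x" * 100}

tracemalloc.start()

materialized = list(rows(50_000))   # whole result set alive at once
_, peak_all = tracemalloc.get_traced_memory()

del materialized
tracemalloc.reset_peak()

# Streamed: only one row is alive at any moment.
total = sum(len(r["payload"]) for r in rows(50_000))
_, peak_stream = tracemalloc.get_traced_memory()
tracemalloc.stop()

print(f"materialized peak: {peak_all:,} bytes")
print(f"streamed peak:     {peak_stream:,} bytes")
```

In a Django view, the streamed version roughly corresponds to `for obj in MyModel.objects.all().iterator(chunk_size=2000): ...` — the queryset result cache is what holds onto the memory otherwise.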

There are reasons why tools such as uWSGI have options like max-requests to restart a worker periodically.
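For example, if you happen to be serving with gunicorn (which has a direct equivalent of uWSGI’s max-requests), a minimal config file might look like this — the values here are illustrative, not recommendations:

```python
# gunicorn.conf.py -- illustrative values, tune for your workload.
# Recycle each worker after roughly 1000 requests so that slow leaks
# never accumulate past the instance's memory limit.
max_requests = 1000

# Add jitter so all workers don't restart at the same moment.
max_requests_jitter = 50
```

The worker is killed and replaced cleanly after it finishes its Nth request, so whatever memory it accumulated is returned to the OS.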

See the Python gc module for some facilities available for tracking memory usage.
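A minimal sketch of what that looks like in practice (tracemalloc lives in the standard library alongside gc, and is usually the easier starting point for “where did the memory go?”):

```python
import gc
import tracemalloc

tracemalloc.start()

# Simulate a view that builds a large intermediate structure.
cache = [{"row": i, "payload": "x" * 200} for i in range(10_000)]

snapshot = tracemalloc.take_snapshot()

# Top allocation sites by source line -- this is where the memory went.
for stat in snapshot.statistics("lineno")[:3]:
    print(stat)

print("objects tracked by gc:", len(gc.get_objects()))
tracemalloc.stop()
```

Taking a snapshot at the end of a suspect request and comparing it (`snapshot.compare_to()`) against one taken earlier is a quick way to spot allocations that survive between requests.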

After a request is rendered, how could we hold onto memory in a queryset or a data frame? All of the local variables are out of scope at that point, so shouldn’t it be fairly straightforward? (Clearly not, I suppose; my question is how this is happening.)

Unfortunately, the type of detailed answer you may be looking for is
(A) At or beyond my personal knowledge
(B) Far more intricate a topic than what would be suitable here.

I take a very pragmatic approach to such things.
I know it happens.
I’ve read enough about it to have a general feel why.
I know what to do about it as far as it affects me - and yes, in this case it translates to “throwing hardware at the problem”.
And I let it go at that because I generally have more urgent items to address.

There are enough resources out there (documentation, blog posts, etc.) that you should be able to find the answers you’re looking for, along with your options for addressing the problem.

Just keep in mind that this is a Python issue more than a Django-specific one. Any long-running, non-trivial Python program doing a lot of object creation will tend to grow larger than you might otherwise expect.