I’m now on Django 4.0.1 and deploying to Google App Engine with instance class B2. My worker processes have been regularly quitting and getting restarted, and I was finally able to find log messages showing that they are being terminated for running out of memory:
2022-01-26 12:38:05.664 PST Exceeded soft memory limit of 512 MB with 515 MB after servicing 86 requests total. Consider setting a larger instance class in app.yaml.
2022-01-26 12:38:05.664 PST After handling this request, the process that handled this request was found to be using too much memory and was terminated. This is likely to cause a new process to be used for the next request to your application. If you see this message frequently, you may have a memory leak in your application or may be using an instance with insufficient memory. Consider setting a larger instance class in app.yaml.
If I reload one of my particularly intensive pages several times in a row, I can kill the workers within 2-4 page loads. I have a few questions here:
- Why would the memory usage increase over time? If responding to a single request required more than 512 MB this would make sense, but in that case my page would never load at all. Is there some way I could be leaking memory, so that the memory is not released after this (admittedly somewhat heavy) page is loaded? Aside from pandas I’m not really using any external libraries. How is it even possible to leak memory in a Python app like this?
- On my laptop I’m not encountering this issue, probably because it has a lot more memory than the GAE instance that is running out. Is there a way to see where all of my memory is going? I’m using PyCharm Professional as my IDE, but it’s not clear how to use it to track memory usage. How would you recommend beginning the debugging process here? Considering that I’m blowing past 512 MB in just a couple of reloads of this page, something must be using memory pretty dramatically.
- Are there any really dangerous anti-patterns with the ORM that could increase memory usage in this way? I enabled QueryCountMiddleware to keep track of the number of queries, and the numbers aren’t too crazy. I wonder if something is systematically wrong, because although I can trigger the OOM faster with a couple of very intensive pages, the workers also eventually crash and reboot on their own when I stay away from those corners of the app.
Thanks for the help. People have been really wonderful in this forum, and I appreciate it.