File upload blocks memory regardless of UploadHandler

Flauschbaellchen · October 14, 2021, 5:56am

Hi all. I have a question about file uploads.

As this is a feature used by more or less every webserver, I feel that the question might be a bit stupid but I hope you can point me in the right direction.

When I upload large files, it seems that the full request is saved into memory and parsed first, before the upload handlers are called.
This results in an OOM kill of my application if there is not enough memory, regardless of whether the upload handler would write the file directly to disk or not.

I tested it with a Docker container limited to 2 GB of RAM, uploading a 6 GB file with either the default upload handlers or my own, and via rest_framework, GraphQL, and the Django admin panel directly.
Even if memory is sufficient, it takes a long time: the file is first read into memory, Django waits until the upload has finished, and only then is the handler called and the file written to disk.

I expected the file to be written to disk as soon as the upload starts.

Is this intended behavior? If so, how can I upload large files when I have only limited memory?

The upload needs to happen within a single request/transaction and without any additional client-side libraries (like JS code), as the endpoints are provided as an API to third-party tools that call them with, e.g., curl.

I am very grateful for any help.

KenWhitesell · October 14, 2021, 12:03pm

I don’t know just how large a file you’re trying to upload, but the answer to your question is more complex than it may appear.

An HTTP request does not go directly to your application code. In a typical production environment, it goes to a web server (Apache, nginx, LiteSpeed, etc.) before being handed off to an “application container” (gunicorn, uwsgi, etc.), and from there it is passed to your application.

It’s not until the request gets passed to your application that it can even begin to know how or where that file is supposed to be stored - by which time the entire file has already been uploaded. (Your code doesn’t even begin to execute until the complete request has been received.)

Most of these types of connections (Browser → Web server, Web server → application server) assume that the entire request is passed in memory from one step to the next.

A memory-constrained system where you don’t have complete configuration control over the environment leaves you with very few options - none of them “pretty” or “easy”.

Flauschbaellchen · October 15, 2021, 11:21am

Hi Ken,

thanks for your answer.
You are correct. I completely forgot about the webserver (in my case this is uvicorn).
I will try to narrow down where the memory is actually allocated in my case.

At least within the ASGI handler, it seems that Django tries to move the body to disk as soon as it hits the memory limit (defaulting to 2.5 MB) [1].
However, this line is only reached after the client has finished the upload, which is too late.
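
To illustrate the spill-to-disk behaviour mentioned above, here is a minimal sketch using Python's standard `tempfile.SpooledTemporaryFile`, which keeps data in memory until a size limit is exceeded and then rolls over to a real temporary file. The `spool` helper and the 1024-byte limit are made up for this example, and the assumption that the handler relies on this kind of spooled file is mine; treat it as an illustration rather than the handler's actual code.

```python
import tempfile

# Hypothetical limit for the example; the thread mentions a 2.5 MB default.
MAX_MEMORY_SIZE = 1024


def spool(chunks, max_size=MAX_MEMORY_SIZE):
    """Write chunks into a spooled temp file and report whether it hit disk."""
    body = tempfile.SpooledTemporaryFile(max_size=max_size, mode="w+b")
    for chunk in chunks:
        # Once the written size exceeds max_size, the data is rolled
        # over from an in-memory buffer to a temporary file on disk.
        body.write(chunk)
    # _rolled is a private CPython attribute, read here only to observe
    # the rollover; do not depend on it in real code.
    return body, body._rolled


small, small_on_disk = spool([b"x" * 512])
large, large_on_disk = spool([b"x" * 512] * 3)
```

With these inputs the first body stays in memory and the second one rolls over to disk. This only helps if the chunks are written as they arrive; if the whole body has already been buffered upstream, the rollover comes too late.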

Testing the same upload against another project (Puma + Rails), the memory footprint stays constant, so there must be a solution for this issue.

Flauschbaellchen · October 19, 2021, 7:07am

I found an issue in the uvicorn repository describing exactly this kind of problem:
“WSGI middleware should stream request body, rather than loading it all at once” (Issue #371, encode/uvicorn on GitHub).
There is an open draft which will hopefully fix this: “Improved wsgi middleware, mainly a copy of a2wsgi” by euri10 (Pull Request #1049, encode/uvicorn on GitHub).
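
For readers wondering what "streaming the request body" means at the ASGI level: the ASGI HTTP spec delivers the body as a series of `http.request` messages, each carrying a `body` chunk and a `more_body` flag. The sketch below writes each chunk to disk as it arrives instead of collecting them all first. It is not the code from the issue or the pull request; `stream_body_to_file` and `fake_receive` are hypothetical names used only for this illustration.

```python
import asyncio
import os
import tempfile


async def stream_body_to_file(receive, dest_path):
    """Write each ASGI http.request chunk to disk as soon as it arrives."""
    total = 0
    with open(dest_path, "wb") as out:
        while True:
            message = await receive()
            if message["type"] == "http.disconnect":
                raise ConnectionError("client disconnected during upload")
            chunk = message.get("body", b"")
            out.write(chunk)  # only one chunk is held in memory at a time
            total += len(chunk)
            if not message.get("more_body", False):
                break
    return total


def fake_receive(chunks):
    """Hypothetical stand-in for the server's ASGI receive callable."""
    messages = iter(
        {"type": "http.request", "body": c, "more_body": i < len(chunks) - 1}
        for i, c in enumerate(chunks)
    )

    async def receive():
        return next(messages)

    return receive


path = os.path.join(tempfile.mkdtemp(), "upload.bin")
written = asyncio.run(
    stream_body_to_file(fake_receive([b"a" * 10, b"b" * 5]), path)
)
```

Memory use here is bounded by the chunk size rather than the upload size, which is the behaviour the issue asks for. Whether a given server actually hands chunks over incrementally is a separate question.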
