I’m trying to send a big file (more than 2.7 GB) with axios to Django:
const formData = new FormData()
formData.append('myFile', myFile)
const config = {...}
axios.post(url, formData, config)...
Now, it sends all the data, but the memory usage starts growing even before the view starts!
from django.http import HttpRequest, HttpResponse

def my_view(request: HttpRequest) -> HttpResponse:
    print('Starts the view')
    ...
If the file is small, the message prints correctly, but with the huge file the server crashes from memory usage before the print is reached. I've tried changing the upload handler so it only uses disk, in settings.py:
FILE_UPLOAD_HANDLERS = ['django.core.files.uploadhandler.TemporaryFileUploadHandler']
But I had the same result. I don't know what's happening, and I can't even try this solution, as none of the view's code is executed. What am I missing? Any kind of help would be really appreciated.
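In case they matter, the only other upload-related settings I'm aware of are the memory thresholds; I'm listing them here with Django's documented defaults, purely for reference:

# Django's defaults, shown only for reference
FILE_UPLOAD_MAX_MEMORY_SIZE = 2621440  # 2.5 MB: uploads larger than this are streamed to disk
DATA_UPLOAD_MAX_MEMORY_SIZE = 2621440  # 2.5 MB: cap on the non-file part of the request body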
You need to construct your POST such that the data gets submitted as “Content-Type: multipart/form-data”; this allows Django to see the multipart header and write the intermediate data out to disk.
(Note, your observation is correct, all this happens before your view is invoked. The server needs to receive the entire request before it can be dispatched to your view.)
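As a quick check once that's in place, and assuming the field is still named 'myFile' as in your FormData, the view can confirm the upload landed on disk with something along these lines (just a sketch):

from django.http import HttpRequest, HttpResponse

def my_view(request: HttpRequest) -> HttpResponse:
    upload = request.FILES['myFile']
    # With the TemporaryFileUploadHandler, this is a TemporaryUploadedFile,
    # so it can report where on disk the data was written.
    print(upload.temporary_file_path(), upload.size)
    return HttpResponse()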
Hi Ken, first of all thank you so much for answering. Unfortunately the Content-Type header didn't do the trick. Here are two pictures showing the headers and the memory usage. The server consumes 2.5 GB until the OS kills the process.
Do you have any other idea of what could be going on?
What server are you using to run your application? (Apache? nginx? Gunicorn?) It’s possible that that may be caching the entire request in memory before even handing it off to Python.
I’m using the Django development server.
I noticed something interesting: I made my own upload handler, because maybe there was some bug that made Django consider the uploaded file as one that fits in memory. So I defined the following class:
from django.core.files.uploadhandler import TemporaryFileUploadHandler

class MyUploadHandler(TemporaryFileUploadHandler):
    def new_file(self, *args, **kwargs):
        print('Creating new file')
        super().new_file(*args, **kwargs)

    def receive_data_chunk(self, raw_data, start):
        print('Received data chunk')
        return super().receive_data_chunk(raw_data, start)

    def file_complete(self, file_size):
        print('File completed')
        return super().file_complete(file_size)
And set it in settings.py:
FILE_UPLOAD_HANDLERS = ['api_service.utils.MyUploadHandler']
If the file is small it works correctly, printing the message at every stage. But if the file is big it doesn't print anything! So, in my opinion, that indicates there's a memory leak somewhere between the user's request and the upload handler. What do you think? Should I report this on the Django issues page?
Sorry for my ignorance
I do not think there’s a memory leak involved. I think it’s just an issue of where the data is being held as it’s passed from stage to stage. Again, the symptoms you’re describing are explained by the idea that an earlier stage of the process is holding all the data before passing it along to the next stage.
I’d give this a try using either gunicorn or uwsgi behind nginx, making the appropriate configurations for each.
Sorry, but I don't understand. If it's not a memory leak, then it would be incorrect usage of an UploadHandler; but what would be the point of it if it can't manage big files, which is what it was created for?
Unfortunately, I've just tried with NGINX + Daphne, with the same result. It seems to be a framework problem.
No, you’re thinking that it’s in the upload handler, when I’m saying it’s happening before that. The http server has to receive the entire request before it hands it off to Django. Your code is never seeing the data because the server is exhausting memory before the request is completely received.
Do you have the nginx upload module installed and configured? If so, can you post your configuration for it? (Hmmm, I’ve never used Daphne behind it either, I don’t know if there are any configuration options for it.)
If by “framework problem”, you mean that it’s an artifact of the design of the HTTP protocol, you’d be correct. Web servers from the very beginning were designed to cache the entire request in memory before handing it along to the next stage of the process. There’s a reason why many tools for web-based file management set upload limits in the 10 - 250 MB range.
Sorry if I am expressing myself badly; English is not my native language. I understand perfectly, and I agree with you, that the UploadHandler code is never executed.
It is a pity that the problem cannot be solved in a natural way; 2.5 GB is not a big file these days. I will try the module you recommended and see if I can upload the file in batches.
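Probably something along these lines, as a rough and untested sketch (all the names are placeholders I'm making up):

import os

from django.http import HttpRequest, HttpResponse

UPLOAD_DIR = '/tmp/uploads'  # placeholder destination directory

def upload_chunk(request: HttpRequest) -> HttpResponse:
    # The client splits the big file into small slices and POSTs them in order,
    # so each request is small enough for the normal upload handlers.
    upload_id = request.headers['X-Upload-Id']  # placeholder header set by the client
    chunk = request.FILES['chunk']
    os.makedirs(UPLOAD_DIR, exist_ok=True)
    # Append this slice to the partial file (a real version would validate upload_id).
    with open(os.path.join(UPLOAD_DIR, upload_id), 'ab') as destination:
        for block in chunk.chunks():
            destination.write(block)
    return HttpResponse(status=204)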
Thank you very much for your help