I need to implement a Django endpoint that receives a large, unsorted JSON payload, sorts it, and returns it. My plan was:
- `ijson` streams over the JSON array, yielding items without loading the whole payload.
- Each chunk is sorted in memory and written to a temporary file.
- `heapq.merge` then merges the sorted chunks, like an external sort.
- The result is returned using `StreamingHttpResponse`.
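Roughly what I have in mind, as a sketch (names like `sort_large_json`, `_spill` and `CHUNK_SIZE` are just placeholders, and it assumes a top-level JSON array whose items are mutually comparable):

```python
import heapq
import json
import tempfile

import ijson
from django.http import StreamingHttpResponse
from django.views.decorators.csrf import csrf_exempt

CHUNK_SIZE = 50_000  # items per in-memory chunk; tune to the memory limit


def _spill(chunk):
    # Sort one chunk in memory and write it to a temp file, one JSON doc per line.
    # Assumes items are directly comparable (numbers/strings); otherwise pass key=...
    f = tempfile.TemporaryFile(mode="w+")
    for item in sorted(chunk):
        f.write(json.dumps(item) + "\n")
    f.seek(0)
    return f


@csrf_exempt  # PoC only
def sort_large_json(request):
    # Phase 1: stream items off the request body, spill sorted runs to disk.
    # use_float=True (recent ijson versions) avoids Decimal values that
    # json.dumps can't serialize.
    run_files = []
    chunk = []
    for item in ijson.items(request, "item", use_float=True):
        chunk.append(item)
        if len(chunk) >= CHUNK_SIZE:
            run_files.append(_spill(chunk))
            chunk = []
    if chunk:
        run_files.append(_spill(chunk))

    # Phase 2: k-way merge of the sorted runs, streamed back to the client.
    def merged():
        runs = [(json.loads(line) for line in f) for f in run_files]
        yield "["
        first = True
        for item in heapq.merge(*runs):
            yield ("" if first else ",") + json.dumps(item)
            first = False
        yield "]"

    return StreamingHttpResponse(merged(), content_type="application/json")
```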
But I'm currently stuck on getting the data in. I'm using the Django dev server, and I think the issue is that it buffers the entire request body before handing it to Django: incoming chunks aren't available incrementally during the request, so a large JSON payload ends up fully in memory before my view ever touches it.
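To illustrate what I mean, this is roughly how I'd probe whether the body arrives incrementally (the view name is just for illustration): even reading the raw body in small pieces, everything seems to be buffered already by the time the view runs.

```python
import logging

from django.http import JsonResponse
from django.views.decorators.csrf import csrf_exempt

logger = logging.getLogger(__name__)


@csrf_exempt  # PoC only
def probe_upload(request):
    # Read the raw body in small pieces; HttpRequest exposes a file-like read().
    # Under runserver the whole body appears to be buffered before this runs.
    total = 0
    while True:
        piece = request.read(64 * 1024)
        if not piece:
            break
        total += len(piece)
        logger.info("read %d bytes (running total %d)", len(piece), total)
    return JsonResponse({"bytes_received": total})
```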
So my questions are: is this a viable approach, and do I need something like gunicorn for it? I'm not looking to build a production-grade system, just a working PoC.
The task is homework: just sort and return a "large" JSON document under a memory constraint. So, the way I see it, I don't have much choice but to use disk space in the form of temporary files, and I don't want to get bogged down in infrastructure complexity.
Thanks in advance. I’d be very grateful for any tips, ideas or just being pointed in the right direction.