Streaming download performance question

I discovered today that one of our users was scripting our site to perform downloads. They were behind the numerous 503/504 errors we’d been getting over the past few days: they had 20 concurrent workers, each running a download. They were working around a streaming download performance issue that I have yet to address, and it got me thinking.

The download functionality on our advanced search interface was originally implemented about 3 or 4 years ago, when I was just learning not only Django but Python as well. I really didn’t know what I was doing (although I did know how to do it in a different language), and I implemented the streaming download in a sub-optimal way.

In another thread on here earlier this year, I was advised to use Python’s csv module for another streaming download interface, the one on our ListView pages (which have a simple search interface), because what I’d done on the advanced search (gasp!) used templates.
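For reference, the csv-based streaming approach usually looks something like the sketch below. It mirrors the "pseudo-buffer" pattern from Django’s documentation on streaming large CSV files: a `csv.writer` is pointed at an object whose `write()` simply returns the formatted line, so each row can be yielded as soon as it is produced. The names `Echo` and `csv_rows` are illustrative, not taken from the original code.

```python
import csv
from typing import Iterable, Iterator, Sequence


class Echo:
    """Pseudo-buffer: write() hands the value back instead of storing it."""

    def write(self, value: str) -> str:
        return value


def csv_rows(header: Sequence[str], rows: Iterable[Sequence[object]]) -> Iterator[str]:
    """Yield one CSV-encoded line at a time, so the full output is never held in memory."""
    writer = csv.writer(Echo())
    # writerow() returns whatever the underlying write() returns (Python 3.5+),
    # which for Echo is the formatted line, including its "\r\n" terminator.
    yield writer.writerow(header)
    for row in rows:
        yield writer.writerow(row)
```

In a view, the generator would be handed to `StreamingHttpResponse(csv_rows(header, rows), content_type="text/csv")`, with `rows` coming from something like a hypothetical `Result.objects.values_list("id", "name").iterator(chunk_size=2000)`, so that Django fetches rows from the database in chunks instead of caching the whole queryset.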

So my question is: when it comes to downloading the results of an ORM query, are there other strategies I should consider instead of streaming strings produced by a generator that uses the csv module? One of my colleagues recommended writing to a file first and then serving the file, so that the user would get a progress bar, but I don’t know whether that’s worth exploring.
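The progress-bar argument comes down to `Content-Length`: a streamed response does not know its total size up front, so browsers typically show bytes received rather than a percentage, whereas a finished file has a known size. A minimal sketch of the "write to a file first" variant, assuming a temporary file on local disk is acceptable (`write_csv_to_tempfile` is a hypothetical helper name):

```python
import csv
import os
import tempfile
from typing import Iterable, Sequence, Tuple


def write_csv_to_tempfile(
    header: Sequence[str], rows: Iterable[Sequence[object]]
) -> Tuple[str, int]:
    """Write the whole result set to a temp file and return (path, size in bytes).

    Because the size is known before the response starts, the server can send a
    Content-Length header and the browser can display a real progress bar.
    """
    fd, path = tempfile.mkstemp(suffix=".csv")
    # newline="" lets the csv module control the line endings itself.
    with os.fdopen(fd, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(header)
        writer.writerows(rows)
    return path, os.path.getsize(path)
```

In Django, the file could then be served with `FileResponse(open(path, "rb"), as_attachment=True, filename="results.csv")`, which fills in `Content-Length` for a regular file. The trade-off is that nothing reaches the client until the entire query has run and the file is written, so a slow query may still hit the proxy timeouts behind 504s, and the temporary file has to be deleted afterwards.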

With what I did for the ListView pages, some pages are slower than others because of the inclusion of some many-related data (rendered as delimited values in the output columns) and some aggregate annotations. If I were just downloading basic data from a single table, there would be no issue at all with the csv strategy; it has generally been lightning fast until the query goes beyond a single table. So I’m also considering adding some indexes and other enhancements. I did go through general optimization strategies for the ListView downloads, like using the Django Debug Toolbar and eliminating duplicate queries, and I need to do the same on the advanced search interface. But all things being equal, and assuming I optimize things well, what is the best-practice general strategy for downloading search results?
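For the many-related columns specifically, one general way to avoid a query per row is to process the main rows in chunks and fetch the related values for a whole chunk at once, which is roughly what `prefetch_related()` does. The framework-agnostic sketch below assumes a hypothetical `fetch_related` callable that, in Django, might run something like `Tag.objects.filter(result_id__in=ids).values_list("result_id", "name")` and group the pairs by id; `Row`, `Tag`, and the helper names are illustrative only.

```python
from itertools import islice
from typing import Callable, Dict, Iterable, Iterator, List, Sequence, Tuple

Row = Tuple[int, str]  # hypothetical (primary key, name) pairs from the main table


def chunked(iterable: Iterable[Row], size: int) -> Iterator[List[Row]]:
    """Split an iterable into lists of at most `size` items."""
    it = iter(iterable)
    while chunk := list(islice(it, size)):
        yield chunk


def rows_with_related(
    rows: Iterable[Row],
    fetch_related: Callable[[Sequence[int]], Dict[int, List[str]]],
    chunk_size: int = 2000,
    delimiter: str = "; ",
) -> Iterator[Tuple[int, str, str]]:
    """Append a delimited many-related column using one lookup per chunk, not per row."""
    for chunk in chunked(rows, chunk_size):
        # A single batched lookup covers every primary key in this chunk.
        related = fetch_related([pk for pk, _ in chunk])
        for pk, name in chunk:
            yield pk, name, delimiter.join(related.get(pk, []))
```

The output of `rows_with_related` can be fed straight into the csv generator, so streaming is preserved. Two built-in alternatives may make this unnecessary: since Django 4.1, `QuerySet.iterator()` honors `prefetch_related()` when `chunk_size` is given, and on PostgreSQL `StringAgg` from `django.contrib.postgres.aggregates` can build the delimited value in the database (with `distinct=True` if other joins would otherwise duplicate values).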