I have a long-running simulation task that executes via a Django management command. It takes about 2 hours to run. The task reads a bunch of data from the DB, then progressively writes results back to the DB.
Because of the nature of the simulation, it runs as a single process, so the process is either doing CPU-based calculations or writing to the database. Profiling shows roughly 60% of the time is spent on calculation and 40% on writing to the DB.
It’s not very efficient, as I have 15 cores idling while one core switches back and forth between calculating and writing.
Does anyone have any suggestions on a good way to offload the DB-writing work to another process?
There are at least three approaches that I’m aware of. The choice between them depends in part on your comfort level with each, how much data you’re generating to be written, and what facilities you’re already using within your project.
- Use the multiprocessing module to spawn the writing task off into its own process and use a Queue to pass the data from the calculator to the writer (first sketch below).
- Set up the writing task as a Celery worker and let the calculator call it through Celery (second sketch below).
- Set up the writing task as a separate Django management command, then use Redis as a queue to pass data between the two (third sketch below).
(There are probably minor variations on each of these ideas, along with some others, but these are the three basic approaches I’ve used in the past for similar requirements.)
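For the multiprocessing route, a minimal sketch might look like the following. The `Result` model and the calculation loop are placeholders for whatever your project actually has; the one Django-specific detail worth noting is closing the inherited DB connections before forking, so the parent and the child each open their own connection lazily:

```python
import multiprocessing as mp

from django import db


def writer(queue):
    """Runs in its own process: drains the queue and writes to the DB."""
    from myapp.models import Result  # hypothetical results model

    while True:
        rows = queue.get()
        if rows is None:  # sentinel: the calculator is finished
            break
        Result.objects.bulk_create([Result(**row) for row in rows])


def run_simulation():
    db.connections.close_all()  # don't share a DB connection across fork
    queue = mp.Queue()
    proc = mp.Process(target=writer, args=(queue,))
    proc.start()

    for step in range(1000):  # stand-in for the real 2-hour loop
        rows = [{"step": step, "value": step * 0.5}]  # placeholder calculation
        queue.put(rows)

    queue.put(None)  # tell the writer to stop
    proc.join()
```

The calculator never blocks on the DB here; it just enqueues rows and keeps computing, while the writer process handles all the inserts.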
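For the Celery route, the writer becomes a task and the calculator just calls `.delay()`. A rough sketch, again with a hypothetical `Result` model; note that the task payload goes through a message broker, so pass plain serializable dicts rather than model instances, and batch rows so you're not sending one task per row:

```python
# tasks.py
from celery import shared_task


@shared_task
def write_results(rows):
    """Celery worker writes a batch of result rows to the DB."""
    from myapp.models import Result  # hypothetical results model

    Result.objects.bulk_create([Result(**row) for row in rows])


# In the management command's calculation loop:
#     write_results.delay(rows)  # rows is a list of plain dicts
```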
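And for the Redis-queue route, the two management commands can talk through a Redis list with `RPUSH`/`BRPOP`. A sketch using redis-py, with the key name and payload shape made up for illustration:

```python
import json

import redis

r = redis.Redis()  # assumes Redis running on localhost:6379

# Producer (inside the simulation command's loop):
r.rpush("sim:results", json.dumps({"step": 1, "value": 0.5}))

# Consumer (the writer command's handle() loop):
while True:
    _key, raw = r.brpop("sim:results")  # blocks until a row arrives
    row = json.loads(raw)
    # e.g. Result.objects.create(**row)  # hypothetical model
```

This one has the most moving parts, but it also decouples the two processes completely, so the writer can be restarted independently of the simulation.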
Thanks @KenWhitesell, some great ideas there that I can investigate.