We have several expensive operations in our Django app. They perform heavy scheduling-related work that can take a long time to run and involves a lot of processing and many I/O operations.
Very rarely, a user request will time out because of how long these can run, so we were thinking about sending their execution to the background with Celery, where we can tune the time limit specifically for these tasks.
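Roughly what we have in mind is something like this (just a sketch; the task name and limits are placeholders, not our real code):

```python
from celery import shared_task
from celery.exceptions import SoftTimeLimitExceeded

# 30-minute soft limit and 31-minute hard limit, set just for this task.
@shared_task(bind=True, soft_time_limit=1800, time_limit=1860)
def rebuild_schedule(self, schedule_id):
    try:
        ...  # heavy scheduling work: lots of processing and I/O
    except SoftTimeLimitExceeded:
        ...  # persist partial progress / clean up before the hard limit kicks in
```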
However, this means suddenly offloading a large portion of the workload from our web server and sending it to Celery.
I am not sure Celery is designed to handle this much work and this many tasks, rather than just executing light background tasks. Surely the web server is better suited and optimized?
Hi,
we use it a lot: we have thousands of different tasks and millions of executions per minute, and we use it for reports and other things we call “heavy”. The longest task I know of takes 45 minutes, but I think it is better to split it into multiple tasks.
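For example, instead of one 45-minute task, you could chain smaller stages with Celery’s canvas; a rough sketch (the task names are made up):

```python
from celery import chain, shared_task

@shared_task
def extract(report_id):
    ...            # pull the raw data
    return report_id

@shared_task
def transform(report_id):
    ...            # crunch the numbers
    return report_id

@shared_task
def publish(report_id):
    ...            # write the finished report somewhere

# Each stage is a separate task with its own time limit and its own retries.
chain(extract.s(42), transform.s(), publish.s()).delay()
```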
Among other things, Celery is designed to be scaled horizontally. We have a data collection tool that requires processing on incoming data. The data comes in at a peak of about 15,000,000 requests / day. In order to keep up with that load, we run 4 instances of the celery worker on up to 4 different machines. Celery has managed that perfectly for years. (It’s no longer operating on that peak load, but it could.)
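As a rough sketch of what that setup looks like (the app name, broker URL and worker flags here are placeholders, not our exact config):

```python
from celery import Celery

# All machines point at the same broker; Celery spreads the queued tasks
# across whichever workers are connected.
app = Celery("proj", broker="amqp://rabbit.internal:5672//")

# Each of the 4 machines then runs the same worker command, e.g.:
#   celery -A proj worker --concurrency=8 --loglevel=info
# Scaling up is just starting another worker against the same broker.
```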
Yes, Celery is a very good choice for offloading work; imho it’s the best you’ll get in Python land.
I stumbled over a few of your remarks:
many I/O operations
For very I/O-heavy work, keep in mind that the brokers are meant to move the work data through messaging in the first place. So if your work relies heavily on data transformations with filesystem access, you might want to check whether the data transport becomes a new bottleneck, and whether you can make the data available by other means (an NFS share?).
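One common way around that (a sketch, with made-up names): send only a reference through the broker and let the worker fetch the heavy data itself.

```python
from celery import shared_task

@shared_task
def transform_dataset(dataset_id):
    # The worker loads the data from storage it can reach directly
    # (a DB row, an object store key, or an NFS share mounted on every node)
    # instead of receiving megabytes through the message broker.
    path = f"/mnt/shared/datasets/{dataset_id}.csv"
    ...

# Enqueue with just the id, not the file contents.
transform_dataset.delay(42)
```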
Surely the web server is better suited and optimized?
Well, that depends on your workload and the bare-metal specs of your webserver vs. the worker nodes. If your webserver is a 32-core Zen machine and your worker nodes are 10 Raspberry Pis, then probably keep things on the webserver. If your workers are equally specced, but all the data resides on the webserver with no easy way to get it over to the workers, then you’d have to find the sweet spot between costly data transport and webserver CPU utilization.
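One way to steer that trade-off, sketched with made-up queue and task names: route only the genuinely heavy tasks to a dedicated queue that the worker nodes consume, and keep everything else local to the webserver.

```python
from celery import Celery

app = Celery("proj", broker="amqp://rabbit.internal:5672//")

# Heavy tasks go to their own queue; everything else stays on the default queue.
app.conf.task_routes = {
    "reports.tasks.rebuild_schedule": {"queue": "heavy"},
}

# Worker nodes:        celery -A proj worker -Q heavy --concurrency=4
# Webserver's worker:  celery -A proj worker -Q celery --concurrency=2
```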
Last but not least, when scaling across different processes/machines you may run into persistence issues and the question of how to safely merge things back together when done. If that’s an issue for you, maybe look into HA and replication with Celery. This may also apply to your DB setup (single instance vs. master/slave vs. sharded) and your transaction strategy in particular.
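If the merge problem is mostly about gathering parallel results back into one place, a chord is the usual Celery building block for that; a loose sketch (it needs a result backend, and the names here are hypothetical):

```python
from celery import Celery, chord, shared_task

app = Celery("proj",
             broker="redis://localhost:6379/0",
             backend="redis://localhost:6379/1")

@shared_task
def process_chunk(chunk_id):
    return {"chunk": chunk_id, "rows": 100}

@shared_task
def merge(results):
    # One place where the partial results get persisted,
    # e.g. inside a single DB transaction.
    return sum(r["rows"] for r in results)

# Run the chunks in parallel, then hand all their results to merge().
chord(process_chunk.s(i) for i in range(10))(merge.s())
```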