I wrote up an article on monitoring and alerting based on queue wait times in Celery, meaning how long a task sits in the queue before a worker picks it up. There are a lot of important metrics to watch, but I think queue wait time is a good reflection of what your users are actually experiencing.
It’s a little tricky getting queue wait time set up as a metric to monitor, so if nothing else, this post will serve as a good set of notes for myself!
Sharing here in case anyone else finds this useful.
Monitoring Celery in Production
Happy to hear feedback or alternative ideas on Celery monitoring!