Recommendations on how to do a repeated background task

Hi, I have used Django for a couple of projects, mainly with Django REST Framework. I’m not a seasoned developer by any means.

Right now I basically need to call an external API to get some data, but I need to do it with a long polling approach. So I’ll need to set up a task that repeats roughly every 5 seconds and makes a request to the API. I will then filter some fields from the body of the response and store this data (in a global Python variable or some form of cache, I guess), so that when the client makes a request to my DRF endpoint I can return the data saved from the previous request to the external API. Right now I only need one task for a single specific endpoint of the external API, but this could grow to around 4 or 5 tasks that do something similar for different endpoints of the same external API.

Searching online, the use of Celery with Redis came up a lot. If I go down this route, Redis will also be the cache where I store the data from the requests to the external API. But for this I will probably also have to set up a separate machine (a cloud VM/server) to run the Celery workers, or at least I think that would be the better option.

I also saw that there are other tools for setting up recurring background tasks in Django, for example Huey, django-rq and async_rq. So my question is: from your experience, what do you think is the best way to approach this problem?

Also, I understand that going the Celery/Redis way is going to be more expensive for the deployment. Do you think it is worth it?

Thanks a lot for your help, guys.

You would use Celery if the tasks are to be run “on demand”. If you’re already using Celery for on-demand tasks, then you could use Celery Beat for scheduled tasks.

However, for scheduled (periodic) tasks, my first and primary recommendations remain the use of standard OS-provided facilities. Cron would be my first choice, a Systemd timer task would be my second choice, and one of the other native periodic task schedulers would be my third choice.

I absolutely do not, under any circumstance, recommend the use of an “in-Django” tool for this. There are too many potential issues and edge-cases for me to seriously consider one of them.

So in this case it would be, for example, a cron job on a Linux AWS EC2 VM that executes a Python script. I have a doubt here: how well will this play with the cache (Redis or something else)? It seems like the main way this Python script can let the Django server know the result of filtering the data from the API response is via the cache.

Or, perhaps more precisely, a custom Django management command.

Don’t have any doubts. This works extremely well.
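To make the cache handoff concrete, here is a minimal sketch of the two sides: the script publishes a record, and the view reads it back. The key name, the `DictCache` stand-in, and the `publish` / `read_latest` helpers are all made up for illustration. With a real Redis you would pass a redis-py client instead, which has `get` and `set` methods of the same shape.

```python
import json
import time

CACHE_KEY = "external_api:latest"  # hypothetical key name


class DictCache:
    """In-memory stand-in exposing only the get/set subset used below.

    A redis-py client has get(name) and set(name, value) as well; its get()
    returns bytes unless the client was created with decode_responses=True,
    which json.loads accepts either way.
    """

    def __init__(self):
        self._data = {}

    def set(self, name, value):
        self._data[name] = value

    def get(self, name):
        return self._data.get(name)


def publish(cache, payload, now=None):
    """Writer side (the polling script): store the payload with a timestamp."""
    fetched_at = time.time() if now is None else now
    cache.set(CACHE_KEY, json.dumps({"fetched_at": fetched_at, "data": payload}))


def read_latest(cache, max_age_seconds=30, now=None):
    """Reader side (the Django view): return the data, or None if missing or stale."""
    raw = cache.get(CACHE_KEY)
    if raw is None:
        return None  # the poller has not written anything yet
    record = json.loads(raw)
    current = time.time() if now is None else now
    if current - record["fetched_at"] > max_age_seconds:
        return None  # the poller has probably stopped; do not serve old data
    return record["data"]
```

Storing a timestamp next to the data is a design choice, not a requirement. It lets the view notice when the background process has died instead of silently returning old values.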

I’ve referenced a couple times here a project that I worked on, where a Raspberry Pi was collecting data. The management command (a persistent process) would read the sensor data and update a redis cache. The Django app (also running on the Pi) would read the cache in the view to prepare the page. (There was more to it than that, but that’s the basic idea.)
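Since cron cannot fire more often than once per minute, a 5-second interval points toward a persistent process like the one described above. A rough sketch of such a loop follows, under some assumptions: `run_poller` and its parameters are hypothetical names, and `fetch`, `extract` and `store` are callables you would supply (the HTTP request, the filtering, and the cache write). In a Django project this loop would be called from the `handle()` method of a custom management command.

```python
import time


def run_poller(fetch, extract, store, interval=5.0, max_cycles=None,
               sleep=time.sleep, clock=time.monotonic):
    """Call fetch() every `interval` seconds and store the filtered result.

    fetch      -- callable returning the decoded API response (hypothetical)
    extract    -- callable that keeps only the fields of interest
    store      -- callable that writes the result to the cache
    max_cycles -- stop after this many cycles; None means run forever
    """
    cycles = 0
    while max_cycles is None or cycles < max_cycles:
        started = clock()
        try:
            store(extract(fetch()))
        except Exception as exc:
            # A failed request must not kill the long-running process.
            print(f"poll failed: {exc!r}")
        cycles += 1
        # Sleep only for what is left of the interval so drift stays small.
        remaining = interval - (clock() - started)
        if remaining > 0:
            sleep(remaining)
    return cycles
```

Going from one endpoint to four or five could be as simple as running one such process per endpoint, or looping over a list of `(fetch, extract, store)` triples inside a single cycle. The process itself can be kept alive by systemd or a similar supervisor.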