Hourly Tasks

Hi,

Question on running hourly batch jobs.

Every hour I collect data from various data sources and insert it into my Django DB using SQL commands run from separate Python functions in AKS; there are around 10 of these functions.

I’d rather do this using Django functions, but I was concerned about the load impacting the app’s performance for my end users.

The data being inserted isn’t a lot, maybe 1,000 rows of 10 columns each, and I’m doing batch inserts, so I don’t think the inserts would affect database/app performance too much.
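For reference, the batch inserts done via the ORM would look roughly like this (the model and field names are just placeholders):

```python
from myapp.models import SourceReading  # hypothetical model


def store_batch(rows):
    """Insert one hourly batch of dicts collected from a source."""
    objs = [SourceReading(**row) for row in rows]
    # One multi-row INSERT per chunk instead of one INSERT per row
    SourceReading.objects.bulk_create(objs, batch_size=500)
```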

I’m thinking that if I just used Celery to execute my Django functions, would this be OK? What do other people do in this scenario?
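To be concrete, what I have in mind is wrapping each collector function in a task, roughly like this (module paths are placeholders):

```python
from celery import shared_task

from myapp.collectors import collect_source_a  # hypothetical collector
from myapp.ingest import store_batch  # the helper sketched above


@shared_task
def hourly_collect_source_a():
    store_batch(collect_source_a())
```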

Thanks

Tom

Tom,

You could use Celery, but then you would still need to implement Celery Beat to have these run on a schedule.

We do it an easier way. We set up a cron job to run tasks on a regular and periodic basis. Those jobs run custom management commands to do different things, like send out scheduled emails. (We don’t have any hourly jobs, but we do have daily and weekly jobs.)

A mere 1,000 rows as a batch insert is generally going to be negligible for your application. Fortunately, it’s easy enough to test - especially if you set it up as a custom management command.
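Schematically, one of those jobs is just a management command plus a crontab entry; the names and paths below are made up:

```python
# myapp/management/commands/collect_hourly.py
from django.core.management.base import BaseCommand

from myapp.collectors import collect_all  # hypothetical: runs all the collectors


class Command(BaseCommand):
    help = "Collect data from the external sources and batch-insert it"

    def handle(self, *args, **options):
        count = collect_all()
        self.stdout.write(self.style.SUCCESS(f"Inserted {count} rows"))
```

and in the crontab, something like `0 * * * * /path/to/venv/bin/python /path/to/project/manage.py collect_hourly`.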

I already run Celery and Celery Beat in my app to do some simple overnight calcs. So I could expand that out to run more tasks? I suppose I would just create more workers?
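By expanding, I mean just adding more entries to the existing beat schedule, something like this (the task path is made up):

```python
from celery.schedules import crontab

from myproject.celery import app  # hypothetical: the existing Celery app

app.conf.beat_schedule = {
    # ... the existing overnight-calc entry stays here ...
    "collect-source-a-hourly": {
        "task": "myapp.tasks.hourly_collect_source_a",
        "schedule": crontab(minute=0),  # top of every hour
    },
}
```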

But let’s say my app grows to needing to insert 10,000 rows of data each hour. What would be a good solution to handle that?

I feel like some kind of staging database might be sensible?

Whether you need to “expand” that depends upon what you’re currently doing.

Whether more workers are needed depends upon what you’re currently doing.

There’s no one “pat answer” for questions like this. Anything said here would be raw conjecture.

You’re only going to be able to determine this by testing. The possible solutions for this are also going to be sensitive to the context in which this is being done.

In the general sense, no - I don’t see where a staging database is going to be particularly helpful. You’re taking the system load of updating one database and doubling it.

Remember, both your Celery workers and cron jobs are running externally to your Django project. They are separate processes that are not going to directly interfere with your website processes. The only point of intersection between these is going to be the database itself.

Understood. Thanks, Ken.

I think I will test with Celery/Beat and see how I get on.

Thank you