How to update all database entries at intervals

Hey folks,

I have this Model:

from django.db import models


class App(models.Model):
    github_url = models.URLField(default="", blank=False, max_length=300)

    repo = models.CharField(blank=True, null=True, editable=False, max_length=300)
    owner = models.CharField(blank=True, null=True, editable=False, max_length=300)

    github_stars = models.PositiveIntegerField(blank=True, null=True)

Only github_url is populated when creating a new entry. repo and owner are automatically populated by splitting the github_url and will be saved in the database as well.
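
The splitting itself is just string handling on the URL path. A simplified sketch (split_github_url is only an illustrative name, and it assumes URLs of the form https://github.com/<owner>/<repo>):

from urllib.parse import urlparse


def split_github_url(github_url):
    # Illustrative helper: return (owner, repo) from a URL like
    # https://github.com/<owner>/<repo>; real parsing may be stricter.
    parts = urlparse(github_url).path.strip("/").split("/")
    return (parts[0], parts[1]) if len(parts) >= 2 else (None, None)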

I need the owner and repo for the PyGithub client to get data from the GitHub API.
I have tested this in the shell.
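
What I tested in the shell is roughly this (simplified sketch; anonymous client, so no token handling):

from github import Github

client = Github()  # anonymous client; pass a token for a higher rate limit
gh_repo = client.get_repo("<owner>/<repo>")  # e.g. f"{app.owner}/{app.repo}"
print(gh_repo.stargazers_count)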

Now I want to update all database entries for the App model to get the github_stars.
Since the data changes frequently, I know I could use Celery to keep it up to date.

However, I don’t know how I would update all entries.
Can someone push me in the right direction?

Create a custom management command to iterate over all instances of App, retrieve the stars for each instance, and save it.

I don’t know how frequently you want to do this, but Celery may not necessarily be your best choice.

Thank you very much. I did not know about custom management commands yet.

For the sake of completeness, here is my custom command:

from django.core.management.base import BaseCommand
from hosted.models import App


class Command(BaseCommand):
    help = 'Update the GitHub data for all App entries'

    def handle(self, *args, **options):
        all_apps = App.objects.all()

        for app in all_apps:
            # Fetch the latest data from the GitHub API via the model method and persist it.
            app.get_github_repo_data()
            app.save()
            self.stdout.write(self.style.SUCCESS(
                f'Successfully updated GitHub data for app "{app.github_url}"'
            ))
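
In case it helps anyone else: Django names the command after its file under hosted/management/commands/, so with a file called e.g. update_github_data.py it runs as:

python manage.py update_github_data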

I understand I can run this via cron, systemd timers, etc., to update the data.
However, I thought that Celery would be a good candidate for this, but I may have misunderstood.

Regarding the frequency:
I am not sure yet. Daily might be a good interval, but that also depends on the GitHub API rate limit. I have to test this.
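
PyGithub can also report the current limits, so I will probably start by checking something like this (quick sketch; unauthenticated access is limited to 60 requests per hour, authenticated access to 5,000):

from github import Github

client = Github()  # anonymous; pass a token to get the higher authenticated limit
limits = client.get_rate_limit()
print(limits.core.remaining, "requests left, resets at", limits.core.reset)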

Celery wouldn’t be my first choice for this unless I’m already using Celery and beat for other purposes.

Celery alone isn’t going to do it; you would also need to run the beat process. So adding this to your system requires three new processes: the message broker (e.g. RabbitMQ or Redis), the Celery worker process, and the Celery beat process.

If you’re using Celery but not beat, then you’re still adding the beat process.
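
If you do end up going that route, the periodic part itself is just a beat schedule entry pointing at a task that wraps the same update logic. A rough sketch (the project name and task path are placeholders):

from celery import Celery
from celery.schedules import crontab

app = Celery("proj")  # the Celery application instance, not the Django App model

app.conf.beat_schedule = {
    "update-github-data": {
        "task": "hosted.tasks.update_github_data",  # hypothetical task wrapping the update loop
        "schedule": crontab(hour=3, minute=0),      # run once a day at 03:00
    },
}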

I didn’t know I had to run beat as well.
I currently have no need for Celery apart from this GitHub data script.
I see your point about the overhead.

I am going to keep it simple with cron for the first iteration.