How to update all database entries at intervals

Hey folks,

I have this Model:

from django.db import models


class App(models.Model):
    github_url = models.URLField(default="", blank=False, max_length=300)

    repo = models.CharField(blank=True, null=True, editable=False, max_length=300)
    owner = models.CharField(blank=True, null=True, editable=False, max_length=300)

    github_stars = models.PositiveIntegerField(blank=True, null=True)

Only github_url is populated when creating a new entry. repo and owner are automatically populated by splitting the github_url and will be saved in the database as well.
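
The splitting itself is just string handling on the URL path. A simplified sketch (split_github_url is only an illustrative name, and it assumes URLs of the form https://github.com/<owner>/<repo>):

from urllib.parse import urlparse


def split_github_url(github_url):
    # Illustrative helper: return (owner, repo) from a URL like
    # https://github.com/<owner>/<repo>; real parsing may be stricter.
    parts = urlparse(github_url).path.strip("/").split("/")
    return (parts[0], parts[1]) if len(parts) >= 2 else (None, None)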

I need the owner and repo for the PyGithub client to get data from the GitHub API.
I have tested this in the shell.
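
What I tested in the shell is roughly this (simplified sketch; anonymous client, so no token handling):

from github import Github

client = Github()  # anonymous client; pass a token for a higher rate limit
gh_repo = client.get_repo("<owner>/<repo>")  # e.g. f"{app.owner}/{app.repo}"
print(gh_repo.stargazers_count)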

Now I want to update all database entries for the App model to get the github_stars.
Since the data changes frequently, I know I could use Celery to keep it up to date.

However, I don’t know how I would update all entries.
Can someone push me in the right direction?

Create a custom management command to iterate over all instances of App, retrieve the stars for each instance, and save it.

I don’t know how frequently you want to do this, but Celery may not necessarily be your best choice.

Thank you very much. I did not know about custom management commands yet.

For the sake of completeness, here is my custom command:

from django.core.management.base import BaseCommand
from hosted.models import App


class Command(BaseCommand):
    help = 'Update the GitHub data for all App entries'

    def handle(self, *args, **options):
        all_apps = App.objects.all()

        for app in all_apps:
            # Fetch the latest data from the GitHub API via the model method and persist it.
            app.get_github_repo_data()
            app.save()
            self.stdout.write(self.style.SUCCESS(
                f'Successfully updated GitHub data for app "{app.github_url}"'
            ))
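
In case it helps anyone else: Django names the command after its file under hosted/management/commands/, so with a file called e.g. update_github_data.py it runs as:

python manage.py update_github_data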

I understand I can run this via cron, systemd timers, etc., to update the data.
However, I thought that Celery would be a good candidate for this, but I may have misunderstood.

Regarding the frequency:
I am not sure yet. Daily might be a good interval, but that also depends on the GitHub API rate limit. I have to test this.
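
PyGithub can also report the current limits, so I will probably start by checking something like this (quick sketch; unauthenticated access is limited to 60 requests per hour, authenticated access to 5,000):

from github import Github

client = Github()  # anonymous; pass a token to get the higher authenticated limit
limits = client.get_rate_limit()
print(limits.core.remaining, "requests left, resets at", limits.core.reset)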

Celery wouldn’t be my first choice for this unless I’m already using Celery and beat for other purposes.

Celery alone isn’t going to do it; you would also need to run the beat process. So adding this to your system requires three new processes: the message broker (e.g. RabbitMQ or Redis), the Celery worker process, and the Celery beat process.

If you’re using Celery but not beat, then you’re still adding the beat process.
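
If you do end up going that route, the periodic part itself is just a beat schedule entry pointing at a task that wraps the same update logic. A rough sketch (the project name and task path are placeholders):

from celery import Celery
from celery.schedules import crontab

app = Celery("proj")  # the Celery application instance, not the Django App model

app.conf.beat_schedule = {
    "update-github-data": {
        "task": "hosted.tasks.update_github_data",  # hypothetical task wrapping the update loop
        "schedule": crontab(hour=3, minute=0),      # run once a day at 03:00
    },
}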

I didn’t know I had to run beat as well.
I currently have no need for Celery apart from this GitHub data script.
I see your point about the overhead.

I am going to keep it simple with cron for the first iteration.