I have this Model:
class App(models.Model):
    github_url = models.URLField(default="", blank=False, max_length=300)
    repo = models.CharField(blank=True, null=True, editable=False, max_length=300)
    owner = models.CharField(blank=True, null=True, editable=False, max_length=300)
    github_stars = models.PositiveIntegerField(blank=True, null=True)
github_url is populated when creating a new entry. repo and owner are automatically populated by splitting the github_url and are saved in the database as well.
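To sketch what that splitting can look like, e.g. in an overridden save() (illustrative only, assuming URLs of the form https://github.com/<owner>/<repo>):

    def save(self, *args, **kwargs):
        # "https://github.com/<owner>/<repo>" -> owner, repo
        parts = self.github_url.rstrip("/").split("/")
        self.owner, self.repo = parts[-2], parts[-1]
        super().save(*args, **kwargs)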
I need the owner and repo for the PyGithub client to get data from the GitHub API.
I have tested this in the shell.
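For illustration, a quick shell check with PyGithub looks roughly like this (the repository name is just an example):

    from github import Github

    g = Github()  # anonymous client; pass a token for a higher rate limit
    repo = g.get_repo("django/django")  # full name is "<owner>/<repo>"
    print(repo.stargazers_count)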
Now I want to update all database entries for the App model to get the latest github_stars.
Since the data changes frequently, I know I can use Celery to keep it up to date.
However, I don’t know how I would update all entries.
Can someone push me in the right direction?
Create a custom management command to iterate over all instances of
App, retrieve the stars for each instance, and save it.
I don’t know how frequently you want to do this, but Celery may not necessarily be your best choice.
Thank you very much. I did not know about custom management commands yet.
For the sake of completeness, here is my custom command:
from django.core.management.base import BaseCommand, CommandError
from hosted.models import App
from github import Github

class Command(BaseCommand):
    help = 'Update GitHub data for all entries'

    def handle(self, *args, **options):
        g = Github()  # anonymous client; authenticate for a higher rate limit
        all_apps = App.objects.all()
        for app in all_apps:
            # fetch the repository via PyGithub and store the current star count
            repo = g.get_repo("%s/%s" % (app.owner, app.repo))
            app.github_stars = repo.stargazers_count
            app.save()
            self.stdout.write(self.style.SUCCESS('Successfully updated github data for app "%s"' % app.github_url))
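For anyone else trying this: Django only discovers the command if the file lives in the app's management/commands/ directory; the filename (here a made-up update_github_data.py) becomes the command name:

    hosted/
        management/
            __init__.py
            commands/
                __init__.py
                update_github_data.py  # run with: python manage.py update_github_data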
I understand I can run this via cron, systemd timers, etc. to update the data.
However, I thought that Celery would be a good candidate for this, but I may have misunderstood.
Regarding the frequency:
I am not sure yet. Daily might be a good interval, but that also depends on the rate limit of the GitHub API. I have to test this.
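A quick way to check is PyGithub's get_rate_limit(); unauthenticated clients get 60 requests/hour, authenticated ones 5,000/hour:

    from github import Github

    g = Github()  # or Github("<token>") for the 5,000/hour limit
    core = g.get_rate_limit().core
    print(core.remaining, "of", core.limit, "left; resets at", core.reset)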
Celery wouldn’t be my first choice for this unless I’m already using Celery beat for other purposes.
Celery alone isn’t going to do it; you would also need to run the beat process. So adding this to your system requires three new processes: the message broker (e.g. RabbitMQ or Redis), the Celery worker, and the Celery beat process.
If you’re using Celery but not beat, then you’re still adding the broker and the worker.
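To make that concrete, the Celery route would look roughly like this (module, broker URL, and task names are illustrative):

    from celery import Celery
    from django.core.management import call_command

    app = Celery("tasks", broker="redis://localhost:6379/0")  # broker = extra process

    @app.task
    def update_github_data():
        call_command("update_github_data")  # hypothetical command from above

    app.conf.beat_schedule = {
        "update-github-daily": {
            "task": "tasks.update_github_data",
            "schedule": 60 * 60 * 24,  # once a day, in seconds
        },
    }

    # plus two more processes:
    #   celery -A tasks worker
    #   celery -A tasks beat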
I didn’t know I had to run beat as well.
I currently have no need for Celery except for this GitHub data script, so I see your point about the overhead.
I am going to keep it simple with
cron for the first iteration.
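A daily crontab entry for this might look like the following (paths and the command name are assumptions about the setup):

    # run the update every day at 03:00
    0 3 * * * cd /path/to/project && /path/to/venv/bin/python manage.py update_github_data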