Hey folks,
I have this Model:
class App(models.Model):
    github_url = models.URLField(default="", blank=False, max_length=300)
    repo = models.CharField(blank=True, null=True, editable=False, max_length=300)
    owner = models.CharField(blank=True, null=True, editable=False, max_length=300)
    github_stars = models.PositiveIntegerField(blank=True, null=True)
Only github_url is populated when creating a new entry. repo and owner are automatically populated by splitting the github_url and are saved in the database as well. I need the owner and repo for the PyGithub client to get data from the GitHub API. I have tested this in the shell.
Now I want to update all database entries for the App model to fetch the github_stars.
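The owner/repo split described above could look something like this standalone sketch (the helper name is my assumption; in the actual model it happens on save):

```python
from urllib.parse import urlparse

def split_github_url(github_url):
    # "https://github.com/<owner>/<repo>" -> ("<owner>", "<repo>")
    path = urlparse(github_url).path.strip("/")
    owner, _, repo = path.partition("/")
    return owner, repo

print(split_github_url("https://github.com/django/django"))  # -> ('django', 'django')
```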
Since the data changes frequently, I know I can use Celery to update it on a schedule. However, I don’t know how I would update all entries. Can someone push me in the right direction?
Create a custom management command to iterate over all instances of App, retrieve the stars for each instance, and save it.
I don’t know how frequently you want to do this, but Celery may not necessarily be your best choice.
Thank you very much. I did not know about custom management commands yet.
For the sake of completeness, here is my custom command:
from django.core.management.base import BaseCommand, CommandError
from hosted.models import App
from github import Github


class Command(BaseCommand):
    help = 'Update GitHub data for all entries'

    def handle(self, *args, **options):
        all_apps = App.objects.all()
        for app in all_apps:
            app.get_github_repo_data()
            app.save()
            self.stdout.write(self.style.SUCCESS('Successfully updated GitHub data for app "%s"' % app.github_url))
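Assuming the file is saved as hosted/management/commands/update_github_data.py (in Django the file name becomes the command name; the name itself is my assumption), it would be invoked with:

```shell
python manage.py update_github_data
```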
I understand I can run this via cron, systemd-timers, etc. to update the data.
However, I thought that Celery would be a good candidate for this, but I may have misunderstood.
Regarding the frequency: I am not sure yet. Daily might be a good interval, but that also depends on the GitHub API rate limit. I will have to test this.
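On the rate limit: PyGithub exposes it directly, so it can be checked before settling on an interval — a sketch (the token is a placeholder; unauthenticated clients get a much lower limit):

```python
from github import Github

g = Github("YOUR_TOKEN")
core = g.get_rate_limit().core
print(core.remaining, "of", core.limit, "requests left; resets at", core.reset)
```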
Celery wouldn’t be my first choice for this unless I were already using Celery and beat for other purposes.
Celery alone isn’t going to do it; you would also need to run the beat process. So adding this to your system requires three new processes: the message broker (e.g. RabbitMQ or Redis), the Celery worker process, and the Celery beat process.
If you’re using Celery but not beat, then you’re still adding the beat process.
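If you did go the Celery route anyway, the beat schedule is just a settings entry — a minimal sketch, assuming a task named hosted.tasks.update_github_data exists (that task path is not from the thread):

```python
# settings.py — hypothetical beat schedule entry
CELERY_BEAT_SCHEDULE = {
    "update-github-data": {
        "task": "hosted.tasks.update_github_data",  # assumed task path
        "schedule": 60 * 60 * 24,  # interval in seconds: once a day
    },
}
```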
I didn’t know I had to run beat as well. I currently have no need for Celery except for this GitHub data script.
I see your point about the overhead. I am going to keep it simple with cron for the first iteration.
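For reference, a daily crontab entry for the management command could look like this (the project/virtualenv paths and command name are placeholders, not from the thread):

```shell
# m h dom mon dow  command — run daily at 03:00
0 3 * * * cd /path/to/project && /path/to/venv/bin/python manage.py update_github_data
```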