Best way to schedule long-running task

Hello everybody.

I’m making a Django-powered (specifically, using Django REST Framework) web application that’s meant for a school. Teachers can create tests made up of multiple-choice questions and students can take such tests.

There’s a view that allows teachers to end an exam, after which it stops accepting answers. What I need to do, once the exam ends, is generate a pdf file for each of the students that took the test, showing the questions and the answers given by the student. The teacher then needs to be able to download a zip file containing all the pdf files.

The issue is that up to 200 students might take the exam at once, so generating the PDFs will take a long time. What I’m trying to find is a non-blocking way to schedule the task, so that the teacher who “closes” the exam (the action that triggers the generation of the files) doesn’t have to wait minutes on end just to get an OK from the server for the exam closure.

A much better UX would be for the system to immediately send the OK and then show a message telling the user that PDF generation is in progress.

How would I go about this? I have never done async programming in Django besides with Channels, so I have no experience with async views or with tools like Celery.

I have a feeling Celery might be the way to go, but before I jump into the overhead of learning how to use it, I’d like some guidance on whether that could be the best way to go about this.

In case it’s of any relevance, here’s the project’s source code: GitHub - samul-1/js-exercise-platform

If you need any more information to better understand what I’m looking for, ask away!
Thank you to everyone who will take the time to help me.

Celery is probably the easiest way to do this. It’s really not that hard. Give it a shot, see what you come up with. You might want to start with a more trivial task to get used to working with it - rather than going through the full pdf creation process initially, I’d do something like having the celery task change some field in a table to show that the task was executed.


I ended up following your suggestion. Here’s what I do now:

when a client requests the report for the first time, the corresponding ExamReport object doesn’t exist yet, so I create it, immediately schedule the celery task, and send a 202 response to the client. At that point, the client starts polling the server every few seconds. The server sends a 206 response until the task is done. The nice thing about this approach is that I can count how many PDF reports have been generated so far and send that info to the client. Once the task completes, the server sends a FileResponse with the zip archive containing all the reports.
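The polling contract described above can be sketched as a tiny helper that maps report state to the HTTP status the view would return (status codes mirror the ones mentioned; field names are illustrative):

```python
def poll_response(report_exists, in_progress, generated, total):
    """Return (HTTP status, payload) for one polling request.

    202 -> report just created, celery task scheduled
    206 -> task still running, progress info included
    200 -> done; the real view would return a FileResponse here
    """
    if not report_exists:
        return 202, {"detail": "report generation scheduled"}
    if in_progress:
        return 206, {"generated": generated, "total": total}
    return 200, {"detail": "ready"}
```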

This is the view: js-exercise-platform/views.py at newexamprogress · samul-1/js-exercise-platform · GitHub
And this is the task: js-exercise-platform/celery.py at newexamprogress · samul-1/js-exercise-platform · GitHub

Works like a charm!


I probably spoke too soon… It does work like a charm, but only locally!

If I run it on dokku, in production, everything works fine up to the point where the zip archive is saved to a model’s FileField.

However, when I later try to access the file as model_obj.filefieldname, I get a FileNotFoundError. Mind you, this only happens in production.

What I think is happening is that celery is somehow using a different underlying storage system, so the file doesn’t get saved where the rest of the Django code later looks for it.
I tried generating a file and saving it to another FileField outside of celery, and I don’t get this error. I also didn’t get the error with the exact same procedure back when it was called from inside a view rather than from celery.

Is there any way I can verify this is the case, and fix it? If any of you has a clue, that’d be very helpful.

I always build my celery tasks as part of the primary project - that way the tasks are using the same settings and models as the main application. I’ve never run into anything as you’ve described it.

I don’t think I’d be able to offer any suggestions without seeing the appropriate sections of the settings, tasks and models involved.

celery.py (there’s only one task): js-exercise-platform/celery.py at main · samul-1/js-exercise-platform · GitHub

celery settings: js-exercise-platform/base.py at main · samul-1/js-exercise-platform · GitHub

model method that’s called: js-exercise-platform/models.py at main · samul-1/js-exercise-platform · GitHub the involved model is the one this method belongs to

I’m learning now that the issue is probably the need to mount a persistent storage point with dokku (which I’m no expert in). I’m trying to get that to work. If you have any input, it’s much appreciated!

Yes, if these are docker containers, you do need to ensure they’re using the same volume(s) for file storage. (I’ve never used dokku, I don’t really know anything specific about it other than what I can learn browsing their web pages.)

So from their docs: Dokku - Docs - Advanced Usage - Persistent Storage

USE CASES

Sharing storage across deploys

Dokku is powered by Docker containers, which recommends in their best practices that containers be treated as ephemeral. In order to manage persistent storage for web applications, like user uploads or large binary assets like images, a directory outside the container should be mounted.

Shared storage between containers

When scaling your app, you may require a common location to access shared assets between containers, a storage mount can be used in this situation.
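In practice that boils down to something like the following (a sketch; the app name and host path are placeholders):

```shell
# Create a host directory and mount it into the app's containers at /storage.
mkdir -p /var/lib/dokku/data/storage/myapp
dokku storage:mount myapp /var/lib/dokku/data/storage/myapp:/storage
# Mounts take effect on the next (re)start of the containers.
dokku ps:restart myapp
```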


Yeah, I was reading that part too… I guess that’ll solve my issue. I’ll post here once I’ve given it a try to say whether it works. Thank you!

So, what I did is follow the dokku docs and create a storage mount point for my app, then edit my MEDIA_ROOT setting to point there. Now I can see that the files saved to FileFields, when the method is called directly by django, are indeed written to the persistent storage outside the container.
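For reference, the settings side of that change is tiny (a sketch assuming the mount target is /storage, matching the paths shown later; not the project’s exact settings file):

```python
# Files saved through a FileField land under MEDIA_ROOT, which now
# points at the persistent mount instead of a path inside the container.
MEDIA_ROOT = "/storage"
MEDIA_URL = "/media/"
```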

However, if celery invokes a model method that saves the file to a FileField, the file still isn’t saved to that directory (it probably ends up somewhere inside the container) and isn’t accessible. Is there any celery-specific setting I need to take care of? This seems very weird.

While the celery container is running, open up a shell in the container to see if you can locate the file. (I don’t know if you can do that with dokku or not.) If you can find the file in the intended directory, then something isn’t mounted correctly. If you can find the file but it’s somewhere else, then check everything regarding where that file is being written. If you can’t find the file at all, then you need to try and diagnose why you’re not seeing a file.

If I try to access the FileField from the shell (I don’t know what gets stored in the db itself, but when you access a FileField it resolves to a django class whose name and path you can read), I get this:

>>> e.csv_report.path
'/storage/exam_reports/10/PALGO_-_III_Estivo_2021_12Luglio.csv'
>>> e.zip_report_archive.path
'/storage/exam_reports/10/PALGO_-_III_Estivo_2021_12Luglio.zip'

csv_report is the file that’s accessible (not generated by celery), whereas zip_report_archive is the problematic one. Weirdly enough, the claimed path is the same.

However, if I call readline() on the first file, I get a line of text from the file. For the second one, I get:

>>> e.zip_report_archive.readline()
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "/app/.heroku/python/lib/python3.8/site-packages/django/core/files/utils.py", line 44, in <lambda>
    readline = property(lambda self: self.file.readline)
  File "/app/.heroku/python/lib/python3.8/site-packages/django/db/models/fields/files.py", line 44, in _get_file
    self._file = self.storage.open(self.name, 'rb')
  File "/app/.heroku/python/lib/python3.8/site-packages/django/core/files/storage.py", line 38, in open
    return self._open(name, mode)
  File "/app/.heroku/python/lib/python3.8/site-packages/django/core/files/storage.py", line 238, in _open
    return File(open(self.path(name), mode))
FileNotFoundError: [Errno 2] No such file or directory: '/storage/exam_reports/10/PALGO_-_III_Estivo_2021_12Luglio.zip'

so apparently django thinks the file is saved at that location, but it doesn’t get saved there.

I manually called the method that generates the zip archive in the shell, and this time the file was located correctly and worked fine. So the problem is definitely with celery running the method.

Ok, that’s a good start. But I should have been more clear - I meant a bash shell, not a django shell. My suggestion is to try and locate that file anywhere in the file system.
(You probably also want to check the logs to ensure there’s not something silly like permissions getting you messed up.)
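That kind of check could look like this (a sketch; the app name and the “worker” process type are guesses that depend on the Procfile):

```shell
# Open a shell inside the running worker container.
dokku enter myapp worker
# Then, inside the container:
find / -name '*.zip' 2>/dev/null     # is the archive anywhere at all?
ls -la /storage/exam_reports/        # is it where django expects it?
mount | grep storage                 # is /storage actually mounted here?
```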

Hey,

So, yesterday I spent a whole bunch of time trying to get to the bottom of this with dokku’s creator. What I noticed is that sometimes the process actually works and finishes fine. When it does, the file is located no problem.

This issue is hard to reproduce; it almost seems random. Sometimes it works, sometimes it doesn’t. Something I noticed, though, is that when it doesn’t work, it doesn’t even look like the celery task gets scheduled.

What happens is that the model object that’s supposed to end up with the generated file in a field will have a nonexistent file, and none of the prints from the method that celery is supposed to invoke appear in the logs either.

I’ll give you an example.

As you can see, the first time the task runs without issues and at 0:08 it produces the file correctly. When I attempt it again at 0:18, it jumps straight to trying to download a nonexistent file. How did that file even get into the FileField? No celery task was scheduled.

Also, note that I added a post-delete signal that deletes the files in a FileField when the model object is deleted. So the new file name should be the same as the old one, but that’s not the case: you can see the little random sequence added at the end.

I honestly have no idea what’s going on. I know I’m not providing a whole bunch of info, but it’s hard to even try and reproduce the issue. If you have any clues, that’d be very helpful.

The view that’s trying to send a FileResponse with the nonexistent file is: js-exercise-platform/views.py at main · samul-1/js-exercise-platform · GitHub
So somehow, an ExamReport gets created, has a file in the zip_report_archive field, and isn’t in_progress. All of this without celery apparently doing anything…?

Update (yes, I’ve been at this all day long):

Something felt very off, since I also realized that the debug prints I added in the method the celery task was invoking weren’t showing up in the server console (to be precise, they only showed up sometimes, as if the celery worker were sometimes being run from a different process than the one I had access to).

So, I commented out everything in the celery task and left it looking like this:

import logging

@app.task(bind=True)
def generate_zip_archive(self, exam_id, user_id):
    from jsplatform.models import Exam, ExamReport

    logging.warning("IN AND OUT OF HERE")

After redeploying, wanna know what happened? Some of the time, trying to schedule the task would print IN AND OUT OF HERE and exit, BUT some of the time, the old task was being executed!

So I went ahead and commented out the whole method in the model itself, the one previously called by celery. Wanna know what happened then? Same as above. Nothing changed. Celery was calling and successfully running a method that effectively didn’t exist anymore. I know that because the ExamReport still gets created, and half of the time I’m actually able to download the zip. The other half I still get the FileNotFound exception, like yesterday.

This is by far the weirdest thing that’s ever happened to me with django. I honestly have no idea where to go from here.

My initial reaction to that is that it could be a docker related issue. I’d stop the container and make sure it’s stopped and gone and then rebuild it with your current code.
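Concretely, that cleanup might look like this (the app name is a placeholder; note that docker container prune removes every stopped container on the host, so use it with care):

```shell
docker ps -a                # list all containers, including exited ones
dokku ps:stop myapp
docker container prune -f   # remove all stopped containers
dokku ps:rebuild myapp      # rebuild and restart from the current release
```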


I tried doing as you suggested, and while I still haven’t fixed the issue, it seems we’re starting to figure something out.

Earlier, I tried commenting out the task altogether, and once I un-commented it, redeployed, etc., I realized that now scheduling the task worked about half the time, and the other half it failed with this message:

Task of kind 'core.celery.generate_zip_task' never registered, please make sure it's imported.

Moreover, while monitoring the logs of my celery process, I realized that the times the task was failing, there was no output to the console, despite there being several debug prints in the task code.

This feels to me like workers from the previous deploy somehow still hang around, and sometimes they get to run the task, which fails because the task wasn’t defined at the time of that deploy.

This might also explain why, when the zip archive was generated earlier, sometimes the FileNotFound error was raised: I had made some changes to the storage via dokku, but the “old” worker(s) wouldn’t see them.

In the django shell, if I run generate_zip_task.delay(exam_id=..., user_id=...), approximately one third of the attempts produce output in the celery console, along with a successfully run task. The other times, the task fails with no output to the console and the info message I pasted above.

Now the only two questions remaining are, how do I find those zombie celery workers? And how do I kill them?
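A few commands that may help answer exactly that (hedged; dokku cleanup prunes exited and dead containers):

```shell
docker ps --format '{{.ID}} {{.Image}} {{.Status}} {{.Names}}'  # anything left from an old deploy?
ps aux | grep '[c]elery'    # celery processes, on the host or inside a container
dokku cleanup               # ask dokku to prune exited containers
```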

EDIT:

After some more tries, and thanks to the help of the creator of dokku, I figured out the problem was caused by some old containers sticking around after the re-deploys. It now seems to be working fine. Thanks to everyone involved who tried to help me!


Just to make a note for future visitors to this thread, Django Background Tasks — django-background-tasks latest documentation may also be used. It is very simple to set up, and it does the job.
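A minimal sketch of its usage, assuming the package is installed, added to INSTALLED_APPS with its migrations applied, and a process_tasks worker is running (function and argument names are illustrative):

```python
from background_task import background

@background(schedule=5)  # run roughly 5 seconds after being queued
def generate_reports(exam_id):
    # the real PDF + zip generation would go here
    ...

# Calling the decorated function queues a Task row in the database;
# the `python manage.py process_tasks` worker then picks it up:
# generate_reports(exam_id=10)
```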
