Celery vs Rabbitmq for this particular use case

I’m finding myself having to decide between at least two options as to how to implement a specific use case for my Django app.

Essentially, it’s a webapp that allows users to submit code for JS exercises and C exercises. Those exercises have test cases associated with them, so the user-submitted code is evaluated against those to ultimately give the user feedback as to whether they passed the tests.

When it comes to actually executing the code, I’m going to use vm2 package for JS code and isolate for executables compiled from C.

What I’m debating is how I should actually get to run those from my Django code. Currently, what I’m doing (I have only implemented JS execution so far) is, when a method on my Submission model is called to evaluate the code, I’ll call subprocess.check_output passing in the node script and the stringified test cases as input. The script will output an execution results object that’ll be saved in a JSONField in the submission object. This approach is very simple and has been working well so far, but it does come with some drawbacks, namely the synchronous execution of code which inevitably slows down my webserver since it’s being executed in the same container as the Django app.
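For reference, the current synchronous approach might look roughly like the sketch below. It assumes the test cases are passed as JSON on stdin and the results come back as JSON on stdout; `run_script` and `ECHO_RUNNER` are hypothetical names, and a small Python one-liner stands in for the node script so the example runs on its own.

```python
import json
import subprocess
import sys


def run_script(command, test_cases, timeout_s=10):
    """Run `command`, feed it the test cases as JSON on stdin, parse JSON from stdout."""
    output = subprocess.check_output(
        command,
        input=json.dumps(test_cases),
        text=True,
        timeout=timeout_s,  # raises subprocess.TimeoutExpired if the script hangs
    )
    return json.loads(output)


# Stand-in for the node runner (assumption): reports every test case as passed.
ECHO_RUNNER = [
    sys.executable, "-c",
    "import json, sys; cases = json.load(sys.stdin); "
    "print(json.dumps([{'id': c['id'], 'passed': True} for c in cases]))",
]

results = run_script(ECHO_RUNNER, [{"id": 1}, {"id": 2}])
```

The call blocks until the child process exits, which is exactly the drawback described above when it runs inside a request handler.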

Moreover, C submissions come with a two-step process for execution: first, gcc has to be called passing in a file containing the user code; then, isolate can be run passing in the generated executable. There is no way this can work well if done synchronously.
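A minimal sketch of that two-step flow, written as a plain function that any background worker could call. The names `compile_and_run` and `sandbox_prefix` are hypothetical; the isolate invocation is deliberately left out because it depends on how the sandbox is set up, so as written the binary runs unsandboxed.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def compile_and_run(source_code, stdin_data="", sandbox_prefix=(), timeout_s=10):
    """Compile C source with gcc, run the binary, and return a result dict.

    sandbox_prefix is a hypothetical hook for the command that should wrap the
    binary (for example an isolate invocation). It is empty by default, which
    is NOT safe for untrusted code.
    """
    if shutil.which("gcc") is None:
        return {"stage": "compile", "ok": False, "stderr": "gcc not found"}

    # The directory, source file, and executable are all deleted on exit.
    with tempfile.TemporaryDirectory() as workdir:
        source = Path(workdir) / "main.c"
        binary = Path(workdir) / "main"
        source.write_text(source_code)

        compiled = subprocess.run(
            ["gcc", str(source), "-o", str(binary)],
            capture_output=True, text=True, timeout=timeout_s,
        )
        if compiled.returncode != 0:
            return {"stage": "compile", "ok": False, "stderr": compiled.stderr}

        ran = subprocess.run(
            [*sandbox_prefix, str(binary)],
            input=stdin_data, capture_output=True, text=True, timeout=timeout_s,
        )
        return {"stage": "run", "ok": ran.returncode == 0, "stdout": ran.stdout}
```

Using `subprocess.run` rather than `os.system` keeps the exit code, stdout, and stderr of each step available, and the temporary directory takes care of deleting the executable afterwards.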

So I’m finding myself having to pick one of two options:

  • use celery. A JS execution task will take a submission and the exercise test cases, and subprocess.check_output will be called as I’m doing now, but this time in a separate worker process (which can live in its own container) rather than in the Django process itself. For C stuff, I would have to run shell commands inside of the task code, for example via os.system. First, I would call gcc and then, once I get the executable, isolate. Task code would also have to handle deleting temporary files (we generally don’t need to keep executables around) and saving the results to the JSONField I mentioned above.

  • use rabbitmq. I would have to write a separate worker program, say in node, that indefinitely pops messages sent from Django off a shared rabbitmq queue. Django would simply enqueue those messages using a rabbitmq client such as pika. Those messages would have to be a little more complex, e.g. specify a “type” field (run_js, compile_c, run_executable) and the relevant payload. Django would periodically try to dequeue response messages sent from node and process them (i.e. save them to the JSONField). The advantage of this is that the worker is completely separated from Django, so it’s easier for me to create an ad-hoc dockerfile for it. However, it’s still not clear to me how Django would wait to dequeue messages from rabbitmq. It would either have to be blocking (defeating the purpose of this architecture altogether), or I would have to run a worker process to do it (which would essentially have me fall back to option 1).
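For the second option, the message envelope could look something like the sketch below. Only the three type names come from the description above; the `submission_id` field and the function names are assumptions for illustration. The encoded bytes are what the Django side would hand to its queue client, and the worker would apply the same validation in its own language.

```python
import json

MESSAGE_TYPES = {"run_js", "compile_c", "run_executable"}


def encode_message(msg_type, submission_id, payload):
    """Serialize a job request to the bytes that would be published to the queue."""
    if msg_type not in MESSAGE_TYPES:
        raise ValueError(f"unknown message type: {msg_type}")
    envelope = {"type": msg_type, "submission_id": submission_id, "payload": payload}
    return json.dumps(envelope).encode("utf-8")


def decode_message(body):
    """Parse a message body and reject anything without a known type."""
    envelope = json.loads(body)
    if envelope.get("type") not in MESSAGE_TYPES:
        raise ValueError(f"unknown message type: {envelope.get('type')}")
    return envelope


body = encode_message("run_js", 42, {"code": "1 + 1", "test_cases": []})
message = decode_message(body)
```

Carrying the submission id in every message lets the response be matched back to the right row when it is eventually dequeued.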

So yeah, I’m a little confused as to how to proceed. Maybe you have something different in mind altogether?

Thank you in advance for the input!

Use both? and supervisor or similar?

Celery actually makes use of rabbitmq (or redis or others) as its message broker. As far as I’m aware, Celery needs a message broker?

You could probably use rabbitmq directly without the overhead of Celery on top, though how easy would that be to implement? Just a thought.

Confirming the previous response - Celery does require a message queue of some type. See Backends and Brokers — Celery 5.2.3 documentation

So from your perspective, you’re making a choice between:

  • Use RabbitMQ directly, writing all the interface and task management code yourself.
  • Use Celery, allowing it to manage the RabbitMQ connections.

You actually have a couple different variations on this. For instance, you could go the Django Channels route, creating what Channels refers to as a Worker process that runs in the background. This gives you the benefit that you could also use this platform to create / manage / use a Websocket connection between the browser and your Django app, allowing the Channels layer to push notifications to the browser to (for example) update the status of the background process.

I had been considering using Channels for this, since I’m already going to use it to implement another feature on the same app.

Does the task of compiling/executing the submitted code itself have to be wrapped in a celery task that gets scheduled by my channels consumer, or is there a way I could run the task in the consumer itself, but without blocking it? I feel like channels + celery might end up being too many levels of indirection, so if I could get away with only channels at this point, that’d probably be better.

See Worker and Background tasks. They’re a separate process, just like you would have with Celery, but without using Celery.

Gotcha, so I should be able to do without celery. I’ll report back once I have a working solution. Thank you!

You should be aware that there are issues around this decision that go beyond just identifying what might be easier or faster.

Pay particular attention to the first note on the referenced page. There are very real differences between those two choices. You want to be sure you understand the implications of the difference between “At most once delivery” and “At least once delivery” and how that affects what you’re trying to do. (Asynchronous process management is not a trivial topic.)

To be honest, I would go with celery.
Be aware that using rabbitmq as its broker will limit you to some primitives only (for example, no chords, which is a shame… (only with an RPC backend, IIRC)).

I read the note you mentioned, and was thinking about whether I could somehow implement a reliability/retriability mechanism myself, even if it’s not provided by the protocol. Might not be a good idea though.

I’ll give you guys the full context so as to better understand the situation I’m working in and what the pros and cons of going with each approach would be for me.

I’m deploying my Django app using dokku. If you are familiar with heroku—it’s the same thing. You provide a Procfile (i.e. a list of commands to run each process, like ./manage.py runserver or runworker), a list of dependencies in the form of a Pipfile, and an appropriate docker container is created for your app.

In my case, I’m not specifying a dockerfile manually because dokku can automatically detect the environment. This works well, but if I add isolate, the executables’ sandbox, I will need some additional dependencies and therefore might need to switch to a manually created dockerfile. This would happen whether I use celery or channels, because they would all be separate processes in the same codebase, and dokku doesn’t let you use different dockerfiles for different processes inside the same app/codebase.

On the other hand, rabbitmq would have me create a completely separated app for my worker, in which I could write the ad-hoc dockerfile without having to touch anything on the Django app.

That’s basically what currently has me undecided between the two (3?) options.

I’m not familiar with that kind of environment and don’t work with anything like it. In the environments in which I work, I’m able to build and run “n” docker containers, where each container is a single process (e.g. uwsgi for Django, redis, celery worker, channels worker, celery beat - each one of those is a separate container)

You can, it’s doable. But in all cases, it’s not trivial. You need to answer the fundamental question of “at least one”, “at most one”, or “guaranteed one” background process being executed for any requested task - and just how strong you want that guarantee to be. (90%? 99%? 99.9%? How many 9s? And what has to happen for it to fail? And do you care about a missed task - or replicated task in that case?)

In the practical sense, the risks aren’t really all that great. For what you’ve described, the choice between them probably doesn’t matter enough to worry about it - other than deciding how you want to handle the occasional failure.

When it comes to my use case, the tasks are either compiling code or executing it. As far as the end user is concerned, the tasks are perfectly idempotent. Ideally, each task gets executed exactly once, but if it does get executed more than once, I just waste some resources, and it’s not a big deal (no data loss).

In no case do I want the user to not get the results of their code execution though. I don’t know how much that is in percentage, but let’s say I want my users to get their results back with a 99.99% rate of success. If a task does fail, it needs to be retried until it succeeds.
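With Celery, requirements like these (idempotent tasks, never lose a job) usually point towards at-least-once delivery. A hedged settings fragment is sketched below; it assumes the Celery app loads Django settings with the `CELERY` namespace, and it is a starting point rather than a guarantee of any particular success rate.

```python
# settings.py fragment. Assumes the Celery app is configured with
# app.config_from_object("django.conf:settings", namespace="CELERY").

# Acknowledge a message only after the task has finished, so a job whose
# worker dies mid-run is redelivered instead of being lost.
CELERY_TASK_ACKS_LATE = True

# Requeue the message when the worker process executing the task exits
# abruptly, instead of acknowledging it.
CELERY_TASK_REJECT_ON_WORKER_LOST = True
```

Both settings trade "at most once" for "at least once", which is only safe because the tasks are idempotent. A submission that reliably crashes its worker would be redelivered over and over, so some cap on attempts is still worth having.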

I liked the idea of using channels because it simplifies getting the results back. When a user sends code to be executed, their frontend would just open a ws connection and they’d be notified of the end of the execution. The other approaches would most likely require the frontend to do some polling to get the results back, wasting time on the user end (unless they poll every 10ms…).

So, for the time being, I decided to go with celery and use the following approach:

  • the code execution happens inside a celery task
  • the frontend app, before submitting the code to be run, opens a WS connection to a consumer on my Django app. I’m using a beautiful library called DjangoChannelsRestFw, which provides an abstract consumer that lets you subscribe to a model instance and automatically receive updates when the instance changes.

Therefore, when the code is submitted by the frontend, it subscribes to changes to an instance of a model that has a JSONField containing the execution details. Once the execution is complete and the celery task updates that field, my frontend app should receive a message… except it doesn’t happen.

My setup is correct because if I manually modify the model in any other way except through celery (including if I manually call the task method instead of delaying it), the message is correctly fired.

I have a feeling the reason is the following (quoting this article): “There are some limitations to this approach: DCRF achieves this by observing Django’s signal system. This means it does not detect bulk operations such as queryset.update() . Also it does not observe changes made to the database from outside of our Django codebase.”

I suspect there is something that prevents a call to a model instance’s save method done inside of a celery task from firing the appropriate signals. Is there a way around this?

That’s essentially correct. (Not precisely accurate - the celery task could be firing the appropriate signal - but Django signals only exist within the current process. Django signals are not what many people expect or want them to be. Your celery task is an external process, therefore the signal wouldn’t be available from within your primary Django process.)

Your means of communication from your Celery worker task back to Django is through the Celery Result backends.

Understood.

Just to be clear, I also tried using get_channel_layer() inside the task code and then group_send to a group I know there’s a consumer subscribed to. That message didn’t go through either. Is that to be expected as well? For the same reasons as the signal thing?

No, I would find that to be unexpected. It’s normal / common / appropriate for independent processes to communicate to each other through the channel_layer. I do it frequently.
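A rough sketch of sending through the channel layer from synchronous task code follows. In a real project the layer would come from `channels.layers.get_channel_layer()` and the call is normally wrapped with `asgiref.sync.async_to_sync`; here the layer is passed in as a parameter and `asyncio.run` is used so the example runs without Channels installed. The event type, field names, group name, and `RecordingLayer` stand-in are all hypothetical.

```python
import asyncio


def build_event(submission_id, results):
    """Build a channel-layer event. The "type" picks the consumer method:
    dots become underscores, so this one calls submission_evaluated()."""
    return {
        "type": "submission.evaluated",
        "submission_id": submission_id,
        "results": results,
    }


def notify_group(channel_layer, group_name, event):
    """Send an event to every consumer in the group from synchronous code."""
    asyncio.run(channel_layer.group_send(group_name, event))


class RecordingLayer:
    """Stand-in for a real channel layer so the sketch runs without Redis."""

    def __init__(self):
        self.sent = []

    async def group_send(self, group, message):
        self.sent.append((group, message))


layer = RecordingLayer()
notify_group(layer, "submission_42", build_event(42, {"passed": True}))
```

The consumer has to have joined the same group (typically with `group_add` in its connect handler) and define a method matching the event type, otherwise the message is dropped.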

I suspect that might be caused by channels using the InMemoryChannelLayer as its backend.

I’m trying to set up redis as its backend, but now I’m dealing with an independent issue. Every time I try to connect to a consumer, I get:

aioredis.errors.ConnectionClosedError: Reader at end of file

ever had to deal with that?

Yes, that would cause it. The InMemoryChannelLayer is for within a single process only. See the warning in the docs at InMemoryChannelLayer
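For comparison, a Redis-backed layer is configured along these lines in settings. This assumes the `channels_redis` package is installed; the host and port are placeholders for wherever Redis actually runs.

```python
# settings.py fragment: a channel layer shared across processes via Redis.
CHANNEL_LAYERS = {
    "default": {
        "BACKEND": "channels_redis.core.RedisChannelLayer",
        "CONFIG": {
            # Placeholder address (assumption): point this at your Redis instance.
            "hosts": [("127.0.0.1", 6379)],
        },
    },
}
```

Both the Django/Channels process and the Celery worker need to load the same setting, since the layer only connects processes that talk to the same Redis instance.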

I have never seen that issue before.

Apparently it was just a misconfigured setting. Channels receives messages from celery just fine now. I think I can do without the signal-based observer consumer shipped with DjangoChannelsRest and just manually send a message to my custom consumer from celery. Thank you for your help!