Steering Council vote on Background Tasks DEP 14

Jake Howard (@theorangeone) has put together a draft DEP to add Background Workers and Tasks to Django.

This would allow, for example, sending emails outside of the request-response cycle, something people have long had to reach for Celery and similar tools to do.

The PR for the DEP is here:

There’s been significant discussion, and it really needs to progress to the required Steering Council vote now in order to move forward.

Jake has gone well beyond the call of duty, and has a WIP reference implementation nearly ready in this repo as well:

The core interfaces for this should be complete in time for DjangoCon Europe, in the first week of June, three weeks from now. That puts us well into implementation territory, hence the need to push this to a vote now.

To phrase it as a Yes/No question for the vote:

Shall the Django project accept and begin implementation of the Background Tasks DEP 14?

Could I ask the @steering_council members to review the draft PR and vote? The rendered version of the RST file can be found here.
I’m absolutely sure that Jake would be happy to address any comments or questions that SC members may have.

Thanks.

Kind regards,
Carlton.

I am Jake, and I endorse this message!

There are some contact details at the top of the DEP PR, or I’m happy to accept comments here. I’ll also be at DjangoCon EU in a few weeks if you’d prefer a face-to-face chat.

I’m absolutely a fan. So far, @theorangeone’s code looks very similar to what can be achieved using django-q2.
I’ve used Celery and Redis in the past for high-volume background tasks, but more and more we see demand for small, sometimes long-running tasks.

Especially in projects with a small user base but demanding tasks, the administration and installation overhead of Celery and Redis is substantial. django-q2 works with the default ORM, which allowed us to implement these long-running tasks.

One thing in the discussion that should not be forgotten is ‘locking’ against double submission of the same task (for example, we have tasks that may run for hours or days with high core counts and memory usage). We currently implement this as part of our task code, but as far as I know it’s not a standard option in django-q2, Celery, or other background task frameworks.
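For illustration only, here is a minimal sketch of that kind of double-submission lock, using an atomic insert into a uniquely keyed table as the lock. The table name task_lock, the key format, and the helper names are hypothetical; it uses the standard-library sqlite3 module rather than any Django, django-q2, or Celery API, and it ignores stale locks left behind by a crashed worker.

```python
import sqlite3


def make_lock_store() -> sqlite3.Connection:
    # Hypothetical schema: one row per task key that is currently running.
    # The PRIMARY KEY constraint is what makes acquiring the lock atomic.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE task_lock (task_key TEXT PRIMARY KEY)")
    return conn


def try_acquire(conn: sqlite3.Connection, task_key: str) -> bool:
    """Return True if this caller now holds the lock for task_key."""
    try:
        with conn:  # commits on success, rolls back (and re-raises) on error
            conn.execute("INSERT INTO task_lock (task_key) VALUES (?)", (task_key,))
        return True
    except sqlite3.IntegrityError:
        # A previous submission with the same key still holds the lock.
        return False


def release(conn: sqlite3.Connection, task_key: str) -> None:
    with conn:
        conn.execute("DELETE FROM task_lock WHERE task_key = ?", (task_key,))


def submit_once(conn: sqlite3.Connection, task_key: str, run):
    """Run `run()` unless a task with the same key is already in progress."""
    if not try_acquire(conn, task_key):
        return None  # duplicate submission rejected
    try:
        return run()
    finally:
        release(conn, task_key)
```

The key would typically encode what makes two submissions "the same", e.g. a task name plus its arguments; a real implementation would also need an expiry so a crashed worker cannot hold a lock forever.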

Hi all - this looks great, and I really appreciate the work done here already.

One question I have is whether you have considered including task metadata as part of this? I understand the need to limit features for the initial release, but it would be good to know if this has been considered or is part of the roadmap.

The justification for including task metadata is that it widens the possibilities of this functionality by a huge margin, especially for long-running tasks. The key use case, which I have relied on many times, is that if a background worker can update the task’s metadata during processing, it can write status updates, such as which task is running or how far it has progressed. Another process can then read this metadata to display ongoing progress, for example as a progress bar.

A working example is the progress bar at https://djcheckup.com/, displayed while a site is being checked. This uses the Django-RQ library to handle background tasks, which itself uses RQ.

A basic implementation of this would require a meta field on the task that stores a dictionary, plus get_meta and save_meta methods.
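As a rough sketch of that idea (not the DEP’s API, and not Django-RQ’s), the get_meta and save_meta names below follow the post, while TaskMetaStore, check_site, and the meta keys are hypothetical. The store is backed by the standard-library sqlite3 module so the example is self-contained.

```python
import json
import sqlite3
from typing import Any, Dict, List


class TaskMetaStore:
    """Hypothetical per-task metadata store: one JSON-encoded dict per task id."""

    def __init__(self, path: str = ":memory:") -> None:
        # ":memory:" is private to this connection; a worker and a web process
        # would need a shared database file or server instead.
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS task_meta "
            "(task_id TEXT PRIMARY KEY, meta TEXT NOT NULL)"
        )

    def save_meta(self, task_id: str, meta: Dict[str, Any]) -> None:
        # INSERT OR REPLACE overwrites the previous snapshot for this task.
        with self.conn:
            self.conn.execute(
                "INSERT OR REPLACE INTO task_meta (task_id, meta) VALUES (?, ?)",
                (task_id, json.dumps(meta)),
            )

    def get_meta(self, task_id: str) -> Dict[str, Any]:
        row = self.conn.execute(
            "SELECT meta FROM task_meta WHERE task_id = ?", (task_id,)
        ).fetchone()
        return json.loads(row[0]) if row else {}


def check_site(store: TaskMetaStore, task_id: str, steps: List[str]) -> None:
    """Hypothetical long-running task that reports progress after each step."""
    for done, step in enumerate(steps, start=1):
        # ... the real work for `step` would happen here ...
        store.save_meta(task_id, {"step": step, "done": done, "total": len(steps)})
```

A progress bar would then poll get_meta for the task and render done / total; storing a whole dict per save keeps the reader simple, at the cost of the worker rewriting the full snapshot each time.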

Hope this all makes sense. Thanks again.

Locking sounds like a good idea, although I think it falls firmly in the “Nice to have at a later date” category.

The same is true for metadata. It’s possible to emulate that yourself with the current implementation by using a separate data store (e.g. an ORM model) to hold the metadata state. This could be added in future, although adding an extra interface like this for underlying libraries that don’t support it could be challenging. The current API is intentionally simple to aid adoption - I don’t have a great plan yet for how to handle larger and more complex features that are only implemented by some backends.

Thanks for bringing this to a formal vote Carlton. I will take the time to read everything and reply later with my vote.

Thanks Adam. Take the time you need! (I briefly panicked, thinking the vote only had a week, but I reviewed DEP 10: the voting period rolls over until a majority have voted and there’s a clear result, so there’s no rush.)

Having read through the DEP, I think it strikes a good balance between a standards boundary and flexibility. There are a few things that I think need clarifying (e.g. being explicit that priority is a number where higher means more important), but they’re all minor details that can be settled during the implementation and documentation phase. Thus I vote +1.
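To make the “higher number means more important” convention concrete, here is a small, hypothetical in-memory queue built on the standard-library heapq module. The class, task names, and default priority of 0 are illustrative assumptions, not details taken from the DEP.

```python
import heapq
import itertools
from typing import List, Tuple


class PriorityQueue:
    """Hypothetical queue where a higher priority number is dequeued first."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, str]] = []
        # Monotonic counter so tasks with equal priority come out in FIFO order.
        self._counter = itertools.count()

    def push(self, task_name: str, priority: int = 0) -> None:
        # heapq is a min-heap, so negate the priority to pop the highest first.
        heapq.heappush(self._heap, (-priority, next(self._counter), task_name))

    def pop(self) -> str:
        return heapq.heappop(self._heap)[2]


queue = PriorityQueue()
queue.push("cleanup", priority=-10)
queue.push("report")
queue.push("password-reset", priority=50)
queue.push("digest")
```

Popping four times from this queue yields password-reset, report, digest, cleanup: the highest number first, with ties broken by submission order.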

I vote +1.

This seems like a good pragmatic proposal. I like that it’s “opt-in” courtesy of the default backend not requiring any external infrastructure, and I think the interface is going to be familiar and convenient to most people who’ve set up task runners like Celery before.
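Part of why such a default needs no external infrastructure can be seen in a sketch of an “immediate” style backend that simply runs the task in-process when it is enqueued. TaskResult, ImmediateBackend, the status strings, and the enqueue signature below are hypothetical illustrations and may not match the DEP’s actual API.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class TaskResult:
    """Hypothetical result object returned by enqueue()."""
    status: str
    return_value: Any = None
    error: Optional[BaseException] = None


class ImmediateBackend:
    """Hypothetical backend that runs each task in-process, at enqueue time.

    No broker, worker process or extra service is involved, which is why a
    backend like this can serve as a zero-setup default.
    """

    def enqueue(self, func: Callable[..., Any], *args: Any, **kwargs: Any) -> TaskResult:
        try:
            value = func(*args, **kwargs)
        except Exception as exc:  # record the failure instead of raising it
            return TaskResult(status="FAILED", error=exc)
        return TaskResult(status="SUCCEEDED", return_value=value)


def send_welcome_email(address: str) -> str:
    # Stand-in for real work such as sending an email.
    return f"sent to {address}"
```

The design point is that calling code only ever talks to enqueue() and the result object, so swapping in a backend that hands the call to a real queue would, in principle, leave that code unchanged.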

Thanks for the response, and I appreciate the focus on simplicity, especially for the initial release. Keen to get this into core.

Finally found time to read through the DEP and the proposed default implementation. It was a well-formed and detailed read. Thanks to everyone involved in writing it down.

While I agree that an interface layer allowing task backends to plug in would be a great addition to the framework, and would fill a sorely lacking piece of Django’s functionality, I do worry about a few points.

First, this ought to be something that frameworks like Celery and RQ plug into, but I don’t see any involvement from the communities of these background task libraries in the PR comments. I could be missing something here, but wouldn’t it be worth getting some of their feedback as well? Maybe this can be deferred to implementation time.

Secondly, I see that the default implementation relies heavily on Python type annotations, which wouldn’t be a problem in itself if Django hadn’t decided a while ago not to use type annotations. I could see us reverting this decision now that the landscape has changed greatly, but that’s something that should not be ignored.

Lastly, I have some doubts about our ability to maintain a production-grade database backend for managing tasks, particularly around semantics like at-least-once/at-most-once delivery, contention, ETA, large results, signal handling, and task timeouts. For example, the database-backed cache backend comes with a few disclaimers, and it is arguably a simpler piece of software.

Having to maintain this task backend so that it works on all the supported database backends is also a non-negligible constraint. Building what would qualify as a production-grade queuing system even just for Postgres, which supports features such as SKIP LOCKED, is far from trivial. I can’t imagine what sort of hacks we’d have to resort to in order to get things working on MySQL and Oracle through the ORM, given that we were never able to achieve it with the cache backend over the past decade.
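To illustrate the contention problem, here is one portable alternative to SKIP LOCKED, named plainly as a swapped-in technique rather than anything the DEP proposes: an optimistic “compare-and-set” claim. A worker reads a candidate row, then updates it only if it is still unclaimed, treating a row count of zero as losing the race. The sketch uses the standard-library sqlite3 module; the task table and its columns are hypothetical.

```python
import sqlite3
from typing import Optional


def make_queue() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE task (id INTEGER PRIMARY KEY, name TEXT NOT NULL, "
        "status TEXT NOT NULL DEFAULT 'NEW', worker TEXT)"
    )
    return conn


def enqueue(conn: sqlite3.Connection, name: str) -> None:
    with conn:
        conn.execute("INSERT INTO task (name) VALUES (?)", (name,))


def claim_next(conn: sqlite3.Connection, worker: str) -> Optional[str]:
    """Claim the oldest NEW task for `worker`; return its name, or None."""
    while True:
        row = conn.execute(
            "SELECT id, name FROM task WHERE status = 'NEW' ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None  # queue is empty
        task_id, name = row
        with conn:
            # The status check in the WHERE clause is the "compare" step:
            # the update only succeeds if nobody claimed the row in between.
            cur = conn.execute(
                "UPDATE task SET status = 'RUNNING', worker = ? "
                "WHERE id = ? AND status = 'NEW'",
                (worker, task_id),
            )
        if cur.rowcount == 1:
            return name  # we won the race for this row
        # Another worker claimed it first; loop and try the next candidate.
```

This is correct without row locks, but every worker targets the same oldest row, so under load most attempts lose and retry. That wasted work is exactly the contention SKIP LOCKED is designed to avoid.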

With all that said, I think the proposal has merit, and even if we just end up exposing a pattern for declaring background tasks, scheduling them, and retrieving results, it would be a huge win for Django, so I’m also voting +1. I do think that there are quite a few major questions still awaiting answers during the implementation phase, though.

Great points, well made.

At the moment, this is sadly true, although I expect this is due more to a “lack of awareness” than a “lack of interest”. I would be very interested in their feedback, and I hope my upcoming DjangoCon talk will help raise awareness. Based on discussions in the DEP comments, we’ve tried to make something compatible with other libraries, but opinions on the integration points and feature set are always welcome. Ensuring there’s buy-in from existing library maintainers is IMO a requirement for this DEP’s success.

Types were included in the DEP for exactly one reason: to help explain what’s going on. This is also true for the default implementation; however, here we’re also ensuring that the API is type-safe and usable. I fully expect to remove all type signatures in the upstreamed version. Having a type-safe API makes supporting it easier for the likes of django-stubs, especially as I plan to keep the external implementation with types.

This is absolutely a risk - arguably the biggest risk of the entire DEP. What we have at the moment is a relatively simple API contract and two even simpler backends. A database implementation is going to add complexity; however, perhaps this complexity is worth it. I would expect, though, that the implementation would be gradual, starting out with something simple and then adding the more complex features as time goes on. With that should come more eyes, more chances to test, and likely more interest from the community. And even once this merges, I don’t plan on going anywhere.

Regarding SKIP LOCKED and the like, my plan would be to use the existing ORM primitives before delving too far into engine-specific implementations. APIs like select_for_update will likely get us a long way, and they build on the ORM’s existing, well-trodden path.