Steering Council vote on Background Tasks DEP 14

Jake Howard (@theorangeone) has put together a draft DEP to add Background Workers and Tasks to Django.

This would allow, for example, sending emails outside of the request-response cycle, something people have long had to reach for Celery and the like to do.

The PR for the DEP is here:

There’s been significant discussion, and it really needs to progress to the required Steering Council vote now in order to move forward.

Jake has gone well beyond the call of duty, and has a WIP reference implementation nearly ready in this repo as well:

The core interfaces for this should be complete in time for DjangoCon Europe, in the first week of June, three weeks from now. That puts us well into implementation territory, hence the need to push this to a vote now.

To phrase it as a Yes/No question for the vote:

Shall the Django project accept and begin implementation of the Background Tasks DEP 14?

Can I ask the @steering_council members to review the draft PR and vote accordingly? The rendered version of the RST file can be found here.
I’m absolutely sure that Jake would be happy to address any comments or questions that SC members may have.

Thanks.

Kind regards,
Carlton.

10 Likes

I am Jake, and I endorse this message!

There are some contact details at the top of the DEP PR, or I’m happy to accept comments here. I’ll also be at DjangoCon EU in a few weeks if you’d prefer a face-to-face chat.

5 Likes

I would be an absolute fan. So far @theorangeone’s code looks very similar to what can be achieved using django-q2.
I’ve used Celery and Redis in the past for high-volume background tasks, but more and more we see demand for small, sometimes long-running tasks.

Especially in projects with a small user base but demanding tasks, the administration and installation overhead of Celery and Redis is substantial. django-q2 works with the default ORM, which allowed us to implement these long-running tasks.

One thing in the discussion that should not be forgotten is ‘locking’ against double submission of the same task (for example, we have tasks that may run for hours or days with high core count and memory usage). We currently implement this as part of our task code, but it’s not a standard option in django-q2, Celery, or other background task frameworks as far as I know.
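Roughly the kind of guard we ended up writing ourselves (a sketch only: the enqueue_once helper and lock key scheme are made up, and cache.add() is only a best-effort lock on backends where add is atomic, such as Redis or Memcached):

```python
from django.core.cache import cache


def enqueue_once(task, lock_key, lock_timeout, **kwargs):
    """Enqueue `task` unless an identical submission is already in flight."""
    # cache.add() only succeeds if the key does not already exist, so it acts
    # as a best-effort lock. The task (or worker) deletes the key on completion.
    if not cache.add(lock_key, "locked", timeout=lock_timeout):
        return None  # the same task is already queued or still running
    return task.enqueue(**kwargs)


# Usage, with a hypothetical `rebuild_report` task:
# enqueue_once(rebuild_report, lock_key="rebuild-report:42", lock_timeout=4 * 3600, report_id=42)
```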

Hi all - this looks great, and I really appreciate the work done here already.

One question I have is whether you have considered including task metadata as part of this. I understand the need to limit features for the initial release, but it would be good to know if this has been considered or is part of the roadmap.

The justification for including task metadata is that it expands what this functionality can do by a huge margin, especially when dealing with long-running tasks. The key use case, which I have used many times, is that if a background worker can update the task’s metadata during processing, it can write status updates, including which task is running or the progress of the task. This metadata can then be read by another process to display ongoing status or a progress bar.

A working example can be seen at https://djcheckup.com/ with the progress bar that is displayed while a site is being checked. This uses the Django-RQ library to handle background tasks, which itself uses RQ.

A basic implementation of this would require a meta field on the task that stores a dictionary, plus get_meta and save_meta methods.

Hope this all makes sense. Thanks again.

Locking sounds like a good idea, although I think it falls firmly in the “Nice to have at a later date” category.

The same is true for metadata. It’s possible to emulate that yourself with the current implementation by using a separate data store (e.g. an ORM model) to store the metadata state. In future, this could be added, although adding an extra interface like this for underlying libraries which don’t support it could be challenging. The current API is intentionally simple to aid adoption - I don’t have a great plan yet for how to handle larger and more complex features which are only implemented by some backends.
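For illustration, the kind of workaround I mean: an ordinary model keyed by whatever identifier the task result exposes. The TaskProgress model and the task_id plumbing here are made up, not part of the DEP.

```python
from django.db import models


class TaskProgress(models.Model):
    task_id = models.CharField(max_length=64, unique=True)
    current = models.PositiveIntegerField(default=0)
    total = models.PositiveIntegerField(default=0)
    message = models.CharField(max_length=200, blank=True)


def report_progress(task_id, current, total, message=""):
    # Called from inside the running task; a view can poll TaskProgress to
    # render a progress bar, much like the django-rq example above.
    TaskProgress.objects.update_or_create(
        task_id=task_id,
        defaults={"current": current, "total": total, "message": message},
    )
```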

1 Like

Thanks for bringing this to a formal vote Carlton. I will take the time to read everything and reply later with my vote.

1 Like

Thanks Adam. Take the time you need! (I just panicked thinking the vote only had a week, but having reviewed DEP 10, the voting period rolls over until a majority have voted and there’s a clear result, so there’s no rush.)

Having read through the DEP I think it’s a good balance of a standards boundary and flexibility - while there are a few things that I think need clarifying (e.g. being explicit that priority is a number where higher means more important), they’re all minor details that can be established during the implementation and documentation phase - thus I vote +1.

3 Likes

I vote +1.

This seems like a good pragmatic proposal. I like that it’s “opt-in” courtesy of the default backend not requiring any external infrastructure, and I think the interface is going to be familiar and convenient to most people who’ve set up task runners like Celery before.

2 Likes

Thanks for the response. And appreciate the focus on simplicity, especially for the initial release. Keen to get this into core.

Finally found time to read through the DEP and the proposed default implementation. It was a well-formed and detailed read. Thanks to everyone involved in writing it down.

While I do agree that having an interface layer for task backends to plug into would be a great addition to the framework, and a dearly lacking piece of functionality in Django, I do worry about a few points.

First, this ought to be something that frameworks like Celery and RQ plug into, but I don’t see any involvement from members of these background task libraries’ communities in the PR comments. I could be missing something here, but wouldn’t it be worth getting some of their feedback as well? Maybe this can be deferred to implementation time.

Secondly, I see that the default implementation relies heavily on Python type annotations, which wouldn’t be a problem in itself if Django hadn’t decided a while ago not to make use of them. I could see us reverting this decision now that the landscape has greatly changed, but that’s something that should not be ignored.

Lastly, I have some doubts about our ability to maintain a production-grade database backend for managing tasks, particularly around semantics like at-least/at-most-once delivery, contention, ETA, large results, signal handling, and task timeouts. As an example, the database-powered cache backend comes with a few disclaimers, and it is arguably a simpler piece of software.

Having to maintain this task backend so it works on all the supported database backends is also a non-negligible constraint. Building what would qualify as a production-grade queuing system only for Postgres, which supports features such as SKIP LOCKED, is far from trivial. I can’t imagine what sort of hacks we’ll have to resort to in order to get things working on MySQL and Oracle through the ORM, given we were never able to achieve it with the cache backend over the past decade.

With all that said, I think the proposal has merit, and even if we just end up exposing a pattern for declaring background tasks, scheduling them, and retrieving results, it would be a huge win for Django, so I’m also voting +1. I do think there are quite a few major questions still awaiting an answer during the implementation phase, though.

2 Likes

Great points, well made.

At the moment, this is sadly true, although I expect this is more from a “lack of awareness” as opposed to a “lack of interest”. I would be very interested in their feedback, and my hope is that my upcoming DjangoCon talk will help with awareness. From discussions in the DEP comments, we’ve tried to make something compatible with other libraries, but opinions on the integration points and feature set are always welcome. Ensuring there’s buy-in from existing library maintainers is IMO a requirement for this DEP’s success.

Types were included in the DEP for exactly one reason: to help explain what’s going on. This is also true for the default implementation; however, here we’re also ensuring that the API is type-safe and usable. I fully expect to remove all type signatures in the upstreamed version. Having a type-safe API makes supporting it easier for the likes of django-stubs, especially as I plan to keep types in the external implementation.

This is absolutely a risk - arguably the biggest risk of the entire DEP. What we have at the moment is a relatively simple API contract, and two simpler-still backends. A database implementation is going to add complexity; however, perhaps this complexity is worth it. I would expect, though, that the implementation will be gradual - starting out with something simple, and then adding the more complex features as time goes on. With that should come more eyes, more chances to test, and likely more interest from the community. And even once this merges, I don’t plan on going anywhere.

Regarding SKIP LOCKED and the like - my plan would be to use the existing ORM primitives before needing to delve too far into engine-specific implementations. APIs like select_for_update will likely get us a long way, and build on the existing well-trodden path of the ORM.
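For illustration only, a rough sketch of claiming one queued row with those primitives. The DBTaskResult model and its fields are stand-ins, not the actual implementation, and skip_locked needs database support (e.g. PostgreSQL, MySQL 8+), which is part of the trade-off being discussed.

```python
from django.db import transaction

from myapp.models import DBTaskResult  # hypothetical model: status, priority, enqueued_at


def claim_next_task():
    with transaction.atomic():
        task_row = (
            DBTaskResult.objects.select_for_update(skip_locked=True)
            .filter(status="NEW")
            .order_by("-priority", "enqueued_at")
            .first()
        )
        if task_row is None:
            return None
        task_row.status = "RUNNING"
        task_row.save(update_fields=["status"])
    return task_row
```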

As I understand the DEP 10 voting procedure, a voting period would have ended yesterday, with a +3 score and only one vote outstanding, which wouldn’t affect the result. As such, I think that means that the DEP is Accepted and can move to the implementation phase.

Thanks all for your involvement.

:tada:

7 Likes

This is not actually correct, and the voting is still open – a score of 3 with votes outstanding can’t go final because a -1 vote would make the final score 2 (No Action). The voting period just keeps getting extended until we receive the last vote.

Ah, interesting. Since the SC currently only has four members, I’d understood a score of 1 to represent No Action, since otherwise it gives a single vote an effective veto, which doesn’t seem the intention.

All down to @adamchainz then.

DEP 10 largely assumed that there’d always be five actively-engaged members voting. When any part of that assumption is violated, some weird things can happen. And yes, at the moment with a four-member group effectively each member has a veto because a single -1 vote ensures the votes of the other three cannot reach the threshold for acceptance.

@ubernostrum Yes, it’s certainly written that way (assuming the 5). Given that doesn’t hold, we interpret what’s left. Two clear available readings: either a -1 becomes a veto, or the Accepted & co. thresholds get realigned. I think the latter is a more helpful reading, but hopefully that’s moot here. :crossed_fingers:

Could I perhaps ask you to invoke the “The Technical Board MAY fill a temporary or permanent vacancy on the Technical Board.” clause? There are lots of young, fresh members of the community who would do a great job in an SC role. :heart:

1 Like

The voting process specification in DEP 10 does not allow the thresholds to be changed – the thresholds are always the exact numeric values given in DEP 10.

So this DEP’s voting is still open and will be until all the votes are in.

Personally, I would love to see a group of fresh new faces take over, but the track record of suggesting that is not great and I do not personally have the energy to deal with the amount of being yelled at that would come with trying to take it up again. Someone else can if they really want to, or we can just try to coast along to 5.2 and the next mandatory election trigger.

1 Like

Hey James. Yes, I see how you want to interpret it. I don’t agree (equally clear) but let’s not go down that hole now. It’s not fruitful. Hopefully Adam can add his vote soon.

Thanks all for your patience. I wanted to review the DEP thoroughly, and have only had time to do so now at DjangoCon Europe. I’m now posting this during Jake’s talk.

I am eager for Django to gain a background task capability. I vote +1, so we can call the DEP properly “accepted” now. But I would like to see several changes to the DEP before we call it final.

1. Clarify the mapping between tasks and backends

Two parts of the DEP imply that tasks only run on a single backend:

  1. The Task.backend attribute: “The name of the backend the task will run on”.
  2. “The task will be validated against the backend’s validate_task during construction.”

Meanwhile, the Task.using() and Backend.enqueue() APIs allow running tasks on other backends.

This is a confusing mismatch. If tasks can be mapped to any backend, why validate them against one?

Maybe we can remove validate_task() and rely on runtime validation. If a synchronous-only backend is asked to enqueue a coroutine, it could raise an error at that point.

This behaviour would be analogous to how models work. Django’s database router feature allows arbitrary mapping of queries to database backends, even though that might fail, for example, because the table doesn’t exist on the destination database.

(Or maybe I am reading the DEP wrong, and validate_task() is already for runtime validation? If so, let’s clarify the wording.)
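To illustrate what I mean by runtime validation, something along these lines (names here are illustrative, not the DEP’s exact API):

```python
import inspect


class InvalidTaskError(Exception):
    """Illustrative only - the DEP may name this differently."""


class SynchronousOnlyBackend:
    def enqueue(self, task, args, kwargs):
        # Validate at enqueue time, against the backend actually being used,
        # rather than against Task.backend at construction time.
        if inspect.iscoroutinefunction(task.func):
            raise InvalidTaskError(
                f"{task.func.__qualname__} is a coroutine, but this backend "
                "only runs synchronous tasks."
            )
        ...  # hand the task off to the queue
```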

2. Remove Backend.enqueue() from the public API

The “Queueing tasks” section documents Backend.enqueue() as an alternative way to enqueue a task.
But then it says: “it’s best to call enqueue on the Task directly”.

Let’s avoid documenting Backend.enqueue(), making it private and leaving Task.enqueue() as the only public API.
Backend.enqueue() is not accurately type-hint-able, and it’s “another way to do it” when the first way works fine.
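That is, the preference would be that user code only ever reads like this (method spelling follows the DEP’s examples and may shift during implementation; process_order is a hypothetical task):

```python
# The task-level API covers both cases, so Backend.enqueue() adds nothing:
process_order.enqueue(order_id=123)                             # default backend
process_order.using(backend="priority").enqueue(order_id=123)   # explicit backend
```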

3. Remove the backgrounded SMTP backend

I think this idea is too risky to bundle in with the DEP. Let’s leave it as an experiment for third party packages.

Such an email backend would break the APIs of send_mail() and send_mass_mail() since they return success or failure information.

Also, whilst opening an SMTP connection is often slow, generating an email requires other backgroundable work, such as loading extra data and rendering body templates. It will be safer to guide users towards the tried-and-tested pattern of creating per-email background tasks.
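For comparison, the pattern I’d rather point users at: a per-email task, sketched here with the reference implementation’s decorator (the import path will differ once this is in core, and the task itself is hypothetical):

```python
from django.contrib.auth import get_user_model
from django.core.mail import send_mail
from django.template.loader import render_to_string

from django_tasks import task  # reference implementation; core path may differ


@task()
def send_welcome_email(user_id):
    # Data loading and template rendering happen in the background too,
    # not just the SMTP connection.
    user = get_user_model().objects.get(pk=user_id)
    body = render_to_string("emails/welcome.txt", {"user": user})
    send_mail("Welcome!", body, "noreply@example.com", [user.email])


# In the view: send_welcome_email.enqueue(user_id=user.pk)
```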

4. Remove the “deferred tasks” / run_after feature

This feature is a bit of a “footgun” on Celery (at least with RabbitMQ). It stores tasks in priority order and keeps the “run after” time as metadata. When a worker starts, it pulls messages from the front of the queue. Any tasks not yet due, per their “run after” time, are kept in memory in a separate list. If you have thousands of “run after” tasks, workers will hold them all in memory until they are due, even if that is months into the future. OOM can occur, stopping the task queue without any chance of automatic recovery.

Generally, I don’t think it’s easy to store and sort tasks by both their priorities and “run after” times. I think removing the feature and the risk is better than committing to supporting it along with the rest of the DEP.

Users who need deferred tasks can have a database table of “due later” tasks (or a “due after” time on an existing model) and a background scheduled job to enqueue tasks when ready.
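Something like this is usually enough. The model, field names, and the process_reminder task are all hypothetical:

```python
from django.db import models
from django.utils import timezone


class Reminder(models.Model):
    due_after = models.DateTimeField()
    enqueued = models.BooleanField(default=False)


def enqueue_due_reminders():
    # Run periodically (cron, a management command, etc.); only rows that are
    # actually due get turned into tasks, so nothing sits in worker memory.
    due_ids = list(
        Reminder.objects.filter(enqueued=False, due_after__lte=timezone.now())
        .values_list("pk", flat=True)
    )
    for pk in due_ids:
        process_reminder.enqueue(reminder_id=pk)  # hypothetical task
    Reminder.objects.filter(pk__in=due_ids).update(enqueued=True)
```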

5. Add Backend.check() method for system checks

Django’s other swappable backends (database, cache) are integrated with the system check framework through a check() method. Let’s add one to the queue Backend class, so configuration, such as queue names, can be validated and clear messages returned to users.
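Roughly this shape, following the system check framework (the constructor signature, QUEUES option, and check id are made up for illustration):

```python
from django.core import checks


class SomeTaskBackend:
    def __init__(self, alias, params):
        self.alias = alias
        self.queues = params.get("QUEUES", ["default"])

    def check(self, **kwargs):
        # Returned messages surface through `manage.py check` and at startup.
        errors = []
        if not self.queues:
            errors.append(
                checks.Error(
                    f"No queues configured for task backend {self.alias!r}.",
                    id="tasks.E001",
                )
            )
        return errors
```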

6. Redefine Backend.__init__()

The docstring of Backend.__init__() says “set up connections”. I don’t think it’s an appropriate place to set up connections, and we should not encourage this pattern. Rather, connections can be created when needed, such as for enqueue(), as already done in the database and cache backends.

Lazily creating connections will prevent them from being opened long before use and potentially timing out. Also, if we add a check() method, we will want to initialise all task backends when running system checks.

Suggested alternative __init__() docstring: “Store configuration. Connections should not be established here, as the backend may not be used immediately or in this process.”
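In code, the shape I’m suggesting is roughly this (the options dict and connect_to_broker() helper are placeholders):

```python
from functools import cached_property


class SomeQueueBackend:
    def __init__(self, options):
        # Store configuration only - no network I/O here.
        self.options = options

    @cached_property
    def connection(self):
        # Established lazily, on first use (e.g. the first enqueue() call).
        return connect_to_broker(self.options["URL"])  # placeholder helper

    def enqueue(self, task, args, kwargs):
        self.connection.send(...)  # connects here if not already connected
```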

7. Remove Backend.close()

This API would not be reliable, so I don’t think we should provide it. The sample implementation calls close() on all task backends at the end of requests in a request_finished signal receiver. But that still leaves other code paths, like management commands or custom scripts, responsible for calling close() when appropriate, which will probably be forgotten by users most of the time.

Also, the method is a no-op for the three backends in the proposal.

So, let’s remove close(). Backends that need to manage connections can handle it themselves, such as by adding a request_finished receiver or using a connection pool.
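For example, a backend that really does hold connections could register its own receiver (sketch only; the connection registry is a placeholder):

```python
from django.core.signals import request_finished
from django.dispatch import receiver

_open_connections = []  # placeholder registry populated by the backend


@receiver(request_finished)
def close_task_backend_connections(sender, **kwargs):
    # The backend owns its lifecycle, rather than Django promising to call
    # close() on every code path (management commands, scripts, ...).
    while _open_connections:
        _open_connections.pop().close()
```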

8. Cover transaction.on_commit

It’s necessary to use transaction.on_commit() to enqueue tasks, so they don’t run before the data they need is committed. Automatic use of on_commit(), or not, was raised several times in the PR discussion. Notably, Alistair Lynn’s comment, section “Task/transaction race condition”.

The DEP still doesn’t address those comments, or even mention on_commit(). We need some clear thinking here, because the DatabaseBackend generally won’t need on_commit(), since tasks are committed as part of the current transaction, whilst other backends will need on_commit().
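For reference, the pattern users need today with any out-of-process backend (the send_welcome_email task and create_user_from helper are hypothetical; transaction.on_commit() is existing Django API):

```python
from functools import partial

from django.db import transaction


def signup(request):
    with transaction.atomic():
        user = create_user_from(request)  # placeholder helper
        # Enqueueing immediately would let a fast worker look up the user
        # before this transaction commits; on_commit avoids the race.
        transaction.on_commit(partial(send_welcome_email.enqueue, user_id=user.pk))
```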


Also, I’d like to improve the grammar and formatting of the DEP a bit. I opened this PR already for some initial improvements: Tidy DEP 14 a bit by adamchainz · Pull Request #90 · django/deps · GitHub. I plan to open another shortly.

5 Likes