Thanks all for your patience. I wanted to review the DEP thoroughly, and have only had time to do so now at DjangoCon Europe. I’m now posting this during Jake’s talk.
I am eager for Django to gain a background task capability. I vote +1, so we can call the DEP properly “accepted” now. But I would like to see several changes to the DEP before we call it final.
1. Clarify the mapping between tasks and backends
Two parts of the DEP imply that tasks only run on a single backend:
- The
Task.backend
attribute: “The name of the backend the task will run on”.
- “The task will be validated against the backend’s
validate_task
during construction.”
Meanwhile, the Task.using()
and Backend.enqueue()
APIs allow running tasks on other backends.
This is a confusing mismatch. If tasks can be mapped to any backend, why validate them against one?
Maybe we can remove validate_task()
and rely on runtime validation. If a synchronous-only backend is asked to enqueue a coroutine, it could raise an error at that point.
This behaviour would be analogous to how models work. Django’s database router feature allows arbitrary mapping of queries to database backends, even though that might fail, for example, because the table doesn’t exist on the destination database.
(Or maybe I am reading the DEP wrong, and validate_task()
is already for runtime validation? If so, let’s clarify the wording.)
2. Remove Backend.enqueue()
from the public API
The “Queueing tasks” section documents Backend.enqueue()
as an alternative way to enqueue a task.
But then it says: “it’s best to call enqueue
on the Task
directly”.
Let’s avoid documenting Backend.enqueue()
, making it private and leaving Task.enqueue()
as the only public API.
Backend.enqueue()
is not accurately type-hint-able, and it’s “another way to do it” when the first way works fine.
3. Remove the backgrounded SMTP backend
I think this idea is too risky to bundle in with the DEP. Let’s leave it as an experiment for third party packages.
Such an email backend would break the APIs of send_mail()
and send_mass_mail()
since they return success or failure information.
Also, whilst opening an SMTP connection is often slow, generating an email requires other backgroundable work, such as loading extra data and rendering body templates. It will be safer to guide users towards the tried-and-tested pattern of creating per-email background tasks.
4. Remove the “deferred tasks” / run_after
feature
This feature is a bit of a “footgun” on Celery (at least with RabbitMQ). It stores tasks in priority order and keeps the “run after” time as metadata. When a worker starts, it pulls messages from the front of the queue. Any tasks due later, from their “run after” time, are kept in memory in a separate list. If you have thousands of “run after” tasks, workers will hold them all in memory until due, even if they are due months into the future. OOM can occur, stopping the task queue without any chance of automatic recovery.
Generally, I don’t think it’s easy to store and sort tasks by both their priorities and “run after” times. I think removing the feature and the risk is better than committing to supporting it along with the rest of the DEP.
Users who need deferred tasks can have a database table of “due later” tasks (or a “due after” time on an existing model) and a background scheduled job to enqueue tasks when ready.
5. Add Backend.check()
method for system checks
Django’s other swappable backends (database, cache) are integrated with the system check framework through a check()
method. Let’s add one to the queue Backend
class, so configuration, such as queue names, can be validated and clear messages returned to users.
6. Redefine Backend.__init__()
The docstring of Backend.__init__()`` says “set up connections”. I don’t think it’s an appropriate place to set up connections, and we should not encourage this pattern. Rather, connections can be created when needed, such as for
enqueue()`, as already done in database and cache backends.
Lazily creating connections will prevent them from being opened long before use and potentially timing out. Also, if we add a check()
method, we will want to initialise all task backends when running system checks.
Suggested alternative __init__()
docstring: “Store configuration. Connections should not be established here, as the backend may not be used immediately or in this process.”.
7. Remove Backend.close()
This API would not be reliable, so I don’t think we should provide it. The sample implementation calls close()
on all task backends at the end of requests in a request_finished
signal receiver. But that still leaves other code paths, like management commands or custom scripts, responsible for calling close()
when appropriate, which will probably be forgotten by users most of the time.
Also, the method is a no-op for the three backends in the proposal.
So, let’s remove close()
. Backends that need to manage connections can handle it themselves, such as by adding a request_finished
receiver or using a connection pool.
8. Cover transaction.on_commit
It’s necessary to use transaction.on_commit()
to enqueue tasks, so they don’t run before the data they need is committed. Automatic use of on_commit()
, or not, was raised several times in the PR discussion. Notably, Alistair Lynn’s comment, section “Task/transaction race condition”.
The DEP still doesn’t address those comments, or even mention on_commit()
. We need some clear thinking here, because the DatabaseBackend
generally won’t need on_commit
, so tasks are committed as part of the current transaction, whilst other backends will need on_commit
.
Also, I’d like to improve the grammar and formatting of the DEP a bit. I opened this PR already for some initial improvements: Tidy DEP 14 a bit by adamchainz · Pull Request #90 · django/deps · GitHub . I plan to open another shortly.