Hi there,
I hopped on the bandwagon pretty quickly after the django.tasks release, as a common interface would make my life as a 3rd-party maintainer simpler. I found out about it pretty late, but I was so grateful that someone (Jake) mustered the energy for such a massive proposal.
However, even though I am working on multiple commercial Django 6.0 projects, not a single one has adopted tasks in half a year. Why?
So, I did what any sane person does and tried building the tools I was missing. First was django-crontask. I was already maintaining its Dramatiq sister project for years—easy transformation, right?
No, that’s when I discovered that Django has introduced dataclasses. I love dataclasses, but since the decorator generates their special methods for you, inheritance is tricky. For a task framework meant to be extended by the community, this felt like an odd choice. The feeling was amplified by the fact that the dataclasses are frozen, and I have yet to uncover why. They are not immutable like a namedtuple, and freezing comes at a known performance disadvantage, as noted on the dataclasses page of the Python 3.14.4 documentation.
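To make the inheritance and "emulated immutability" points concrete, here is a minimal sketch. FrozenTask is a hypothetical stand-in I made up for illustration, not Django's actual Task class, and the field names are assumptions.

```python
import dataclasses


@dataclasses.dataclass(frozen=True)
class FrozenTask:
    # Hypothetical stand-in, not Django's actual Task class.
    priority: int = 0


def define_mutable_subclass():
    # A non-frozen dataclass cannot inherit from a frozen one,
    # so defining this class raises TypeError.
    @dataclasses.dataclass
    class MutableTask(FrozenTask):
        queue_name: str = "default"

    return MutableTask


try:
    define_mutable_subclass()
    subclass_error = None
except TypeError as exc:
    subclass_error = str(exc)

task = FrozenTask(priority=1)

try:
    task.priority = 2  # Normal assignment is blocked on a frozen instance.
    assignment_blocked = False
except dataclasses.FrozenInstanceError:
    assignment_blocked = True

# Frozen is only emulated: bypassing __setattr__ still mutates the instance.
object.__setattr__(task, "priority", 2)
```

So a third-party package that wants a mutable subclass has to either stay frozen as well or work around the guard.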
Shifting to TaskResult, it’s even stranger. It’s a frozen (emulated immutable) dataclass, but we treat it as mutable. When you call TaskResult.refresh, it updates the object in place (using object.__setattr__) instead of returning a new immutable object. There was a review comment about this, but to my knowledge it was sadly never addressed. It’s especially odd to me, since Python has a native dataclasses.replace function, which correctly returns a new dataclass instance and would have greatly simplified the Django implementation.
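A rough sketch of the two approaches side by side. Result, its fields, the status strings and both helper functions are hypothetical and much simpler than Django's real TaskResult; only object.__setattr__ and dataclasses.replace are the actual Python APIs.

```python
import dataclasses


@dataclasses.dataclass(frozen=True)
class Result:
    # Hypothetical, simplified stand-in for a task result.
    id: str
    status: str = "READY"


def refresh_in_place(result, new_status):
    # Mutates a "frozen" instance by bypassing its __setattr__ guard.
    object.__setattr__(result, "status", new_status)


def refreshed(result, new_status):
    # Returns a new instance and leaves the original untouched.
    return dataclasses.replace(result, status=new_status)


original = Result(id="abc")
updated = refreshed(original, "SUCCESSFUL")
```

With the second variant, anyone still holding the original object keeps seeing a consistent snapshot, which is what I would expect from something declared frozen.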
Now, let’s chat about efficiency for a minute. For Django’s tasks framework to be a good base, it must be versatile and enable performance. It should be able to handle a few tasks a day just as well as a couple million per second. Luckily Django 6.1 will have Picklable tasks to support multiprocessing, nice!
Still, I’d like to propose a few more changes to improve both versatility and performance:
- Unfreeze Task: Dataclasses are a good choice here. Tasks are instantiated during module loading as quasi singletons. People can better use inheritance or even do in-place updates with decorators. Attribute read performance could be improved by slotting them.
- Use typing.NamedTuple for TaskResult and TaskContext: You may hold millions of those objects in memory if you trigger map-reduce tasks (or other bulk operations). Dataclasses are objects (with a __dict__ or __slots__), whereas named tuples are C structs. They use less memory both at runtime and in transport while piping them to different processes. They are also actually immutable, so no more in-place updates.
- Lazily reference TaskResult.task (via import_string and a property): Currently every task result holds a reference to a task instance. This can lead to task instances (quasi singletons) being copied, which can create unwanted memory bandwidth and allocation overhead.
- Make task results comparable: Tasks have a priority (wonderful!), but don’t implement a comparison method they would need for Python’s native PriorityQueue. I would suggest a default priority LIFO order.
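To illustrate the last three points together, here is a minimal sketch of a named-tuple result with a lazily resolved task and a priority LIFO ordering. Everything in it is an assumption for illustration: Result, its field names, the sequence counter, the enqueue helper and the use of math.sqrt as a placeholder task path are all made up, and I assume that a higher priority should run first. The lazy lookup uses importlib so the snippet runs without Django; in Django you would reach for import_string instead.

```python
import importlib
import itertools
import queue
from typing import Any, NamedTuple

_sequence = itertools.count()


class Result(NamedTuple):
    # Hypothetical stand-in for TaskResult; the field names are assumptions.
    id: str
    task_path: str  # Dotted path instead of a reference to the task object.
    priority: int = 0
    sequence: int = 0

    @property
    def task(self) -> Any:
        # Resolve the task only when it is actually needed.
        module_path, _, name = self.task_path.rpartition(".")
        return getattr(importlib.import_module(module_path), name)

    def __lt__(self, other):
        # Higher priority first; among equals, the latest enqueued first (LIFO).
        return (-self.priority, -self.sequence) < (-other.priority, -other.sequence)


def enqueue(pending, result_id, task_path, priority=0):
    result = Result(result_id, task_path, priority, next(_sequence))
    pending.put(result)
    return result


pending = queue.PriorityQueue()
enqueue(pending, "a", "math.sqrt", priority=0)
enqueue(pending, "b", "math.sqrt", priority=10)
enqueue(pending, "c", "math.sqrt", priority=0)

# "b" has the highest priority; "c" was enqueued after "a", so it comes next.
order = [pending.get().id for _ in range(3)]
```

Since the result only carries a dotted path, pickling it for another process never drags the task object along, and attribute assignment on it fails outright because it is a real tuple.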
This is a difficult topic and a difficult read. I am genuinely impressed by the work that has been done. But being so complex, I believe it will take multiple iterations to reach a robust framework that best serves the community.
Best!
Joe