The short summary of this is that I’m asking for a vote to allow a backport of the fix from ticket 34063 into all currently-supported feature releases of Django: Django 4.1, Django 4.0, and Django 3.2.
Full context, my reasoning for wanting this, and the exact issue to be voted on, follow below.
Why does Django have a backport policy? It would, after all, be simpler from a maintenance perspective to just not fix bugs in older feature releases of Django, and instead require everyone to always upgrade to the latest feature release if they also want the latest bugfixes. Yet Django has a backport policy, and according to that policy certain types of fixes are backported into older feature releases. Doing this has many advantages, not least of which is that it promotes the image of Django as a stable, reliable framework that developers can use with confidence.
Yet that policy does not and cannot cover every conceivable case. Here we have a case that the policy did not foresee: a bug of such severity that nobody – as far as I am aware – disputes it would have been a release blocker if only it had been discovered early enough. It was not discovered in time, though, and now it narrowly fails several of the backport policy’s criteria on technicalities. For example, it is a major functionality bug, but it is no longer a bug in a new feature of the latest stable release; and it causes a crash, but a crash of your test suite rather than of Django itself.
When I asked on the django-developers list about making an exception to backport the fix for ticket 34063, though, I received a reply that cited the strict letter of the policy and seemed to admit no possibility of exceptions ever being granted.
Yet historically, exceptions have been granted from time to time. Not only have they been granted, they have even led to changes in the backport policy. For example, the policy once did not allow backporting fixes for regressions. That did not stop the fix for ticket 25548 from being backported; notably, in that case the backport policy was retroactively updated to cover backporting a regression fix.
I continue to believe that a similar exception ought to be granted in the case of ticket 34063, and that after it has been backported, the text of the backport policy should be revisited to ensure that a case like this one is not excluded in the future.
The actual bug appears to have been introduced in Django 3.1, at exactly the same time that async view support, async middleware support, and async test support were initially added to Django.
The issue specifically is in django.test.AsyncClient, and perhaps also in the associated django.test.AsyncRequestFactory (though I have not personally verified the bug in the latter on its own): if any async view or async middleware invoked as a result of a request from AsyncClient happens to access request.POST, it will raise an exception with the message:

"AssertionError: Cannot read more than the available bytes from the HTTP incoming data."
The fix was quite small – only a few actual lines of code changed in django/test/client.py. Most of the patch retroactively adds basic functional tests that ought to have been included with the original async support, and which would have caught this bug had they been included at the time.
- The bug exists in all currently-supported releases of Django: 3.2, 4.0, and 4.1.
- The bug is easy to trigger: if you use the built-in async test client as documented, and any async view or middleware invoked as a result accesses request.POST, you will get a crash with the above-mentioned error message.
- The bug is hard to avoid: because Django supports running sync and async code side by side, there are few reliable ways to ensure async code paths are exercised in a test. Using the async test client is the most obvious way to do this and, as far as I am aware, the only officially reliable way.
- The bug has a workaround: any view or middleware that wants to access request.POST in an async code path can first perform a throwaway access of request.body, after which reading request.POST will behave as expected.
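To make the failure mode and the workaround concrete, here is a toy model in plain Python. It does not use Django and does not reproduce the code in django/test/client.py. It assumes, based only on the wording of the error message, that the test client's request body object refuses any read larger than the bytes remaining, while the form parser asks for fixed-size chunks. StrictPayload, ClampedStream, parse_form_in_chunks and CHUNK_SIZE are all hypothetical names, and the a=1&b=2 format only keeps the sketch short. In this model, reading the whole body once up front (the analogue of touching request.body) sidesteps the assertion, and so does a wrapper that clamps each read to what remains, which is one plausible shape for a fix of a few lines.

```python
import io

# Hypothetical chunk size; real parsers choose their own.
CHUNK_SIZE = 64 * 1024

ERROR = "Cannot read more than the available bytes from the HTTP incoming data."


class StrictPayload:
    """Toy request body that refuses to read past the bytes that remain."""

    def __init__(self, data: bytes):
        self._buf = io.BytesIO(data)
        self._remaining = len(data)

    def read(self, size=None):
        if size is None:
            size = self._remaining  # "read everything" is always allowed
        assert self._remaining >= size, ERROR
        self._remaining -= size
        return self._buf.read(size)


class ClampedStream:
    """Toy wrapper that never asks the underlying payload for more than remains."""

    def __init__(self, stream, length: int):
        self._stream = stream
        self._remaining = length

    def read(self, size=None):
        if size is None or size > self._remaining:
            size = self._remaining
        self._remaining -= size
        return self._stream.read(size)


def parse_form_in_chunks(stream) -> dict:
    """Toy parser: pulls fixed-size chunks until EOF, then splits a=1&b=2 pairs."""
    chunks = []
    while True:
        chunk = stream.read(CHUNK_SIZE)
        if not chunk:
            break
        chunks.append(chunk)
    raw = b"".join(chunks).decode()
    return dict(pair.split("=", 1) for pair in raw.split("&") if pair)


def demo():
    body = b"name=ada&lang=python"

    # 1. Parsing straight from the strict payload trips the assertion,
    #    because the first chunk request is larger than the whole body.
    try:
        parse_form_in_chunks(StrictPayload(body))
        crash = None
    except AssertionError as exc:
        crash = str(exc)

    # 2. Workaround analogue: read the whole body once (like touching
    #    request.body), then parse an in-memory copy that allows short reads.
    cached = StrictPayload(body).read()
    via_workaround = parse_form_in_chunks(io.BytesIO(cached))

    # 3. Clamping each read to the remaining length also avoids the crash.
    via_clamp = parse_form_in_chunks(ClampedStream(StrictPayload(body), len(body)))

    return crash, via_workaround, via_clamp


if __name__ == "__main__":
    print(demo())
```

The model only captures why a short body plus large chunk requests can trip such an assertion; how Django's real parser and test client interact is not modeled here.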
But the workaround has several problems:
- It is difficult to discover. I only found it after about a day of off-and-on debugging and research into a failing test in some of my own code. Search engines were not particularly helpful, even when given the full, exact error message.
- It is burdensome. All it takes to trip the bug is one async code path reading request.POST without first touching request.body, which means a developer needs to hunt down every such path in their code.
- It is not completely effective. Real-world use of Django often involves views and middlewares from Django itself, from the django.contrib applications, and from third-party applications, so not every code path may be under the developer’s control and able to have the request.body workaround applied. Attempting to implement an “access request.body middleware” to apply the workaround universally is also unreliable, since both middlewares and some request attributes are sensitive to ordering; it is not too difficult to wind up in a situation where no ordering of middlewares and accesses avoids all potential problems.
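For reference, the “access request.body middleware” idea looks roughly like the sketch below. It is a hypothetical async-only, function-based middleware; read_body_first_middleware, FakeRequest and fake_view are illustrative names. In a real project the factory would normally be decorated with django.utils.decorators.async_only_middleware, which sets the same sync_capable and async_capable flags that are assigned by hand here so the sketch runs without Django installed. The fake request merely counts accesses to .body. Even in a real project, this only helps if it runs before anything else reads request.POST, which is the ordering problem that makes the approach unreliable.

```python
import asyncio


def read_body_first_middleware(get_response):
    """Hypothetical async-only middleware applying the request.body workaround."""

    async def middleware(request):
        # Throwaway access of request.body before passing the request on.
        # In Django this reads and caches the whole body; the value is discarded.
        request.body
        return await get_response(request)

    return middleware


# Set by hand so the sketch runs without Django;
# django.utils.decorators.async_only_middleware would set the same two flags.
read_body_first_middleware.sync_capable = False
read_body_first_middleware.async_capable = True


class FakeRequest:
    """Stand-in for an HttpRequest that counts how often .body is read."""

    def __init__(self):
        self.body_reads = 0

    @property
    def body(self):
        self.body_reads += 1
        return b""


async def fake_view(request):
    return "response"


def demo():
    request = FakeRequest()
    handler = read_body_first_middleware(fake_view)
    result = asyncio.run(handler(request))
    return result, request.body_reads
```

Calling demo() returns the view's result together with the number of body accesses, showing that the middleware touched request.body exactly once before the view ran.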
Furthermore, if no fix for the bug is to be backported due to a strict interpretation of the backport policy, then a similarly strict interpretation would forbid adding even a helpful note to the documentation for Django 3.2, 4.0, and 4.1, making it even harder for developers using Django to discover the cause of the bug or its workaround.
As noted above, Django’s backport policy plays an important role in promoting Django’s image as a stable, reliable framework that can be used with confidence. Developers who use Django know that the project is responsive to bug reports, and that each feature release has a documented support period in which it can receive fixes for major bugs without forcing the developer onto a fast “upgrade treadmill” of feature releases.
Django is also going through an important period in its development history: the addition of support for async Python. There are numerous competing frameworks which already heavily advertise their full-async stacks or support, and which are growing in popularity on a daily basis. Django must keep up if it is to continue thriving.
Yet at the moment, using Django’s async support as documented is significantly more difficult than it ought to be, since the primary documented mechanism for testing async code – the AsyncClient – has a bug of such severity that it would have been an obvious release blocker if found earlier, with a workaround that is both difficult to discover and burdensome (in some codebases, likely impossible) to apply effectively.
If the fix for AsyncClient is not backported into older feature releases, async-curious developers will face a conundrum: async Django view and middleware code, despite being officially supported by the framework for some two and a half years at this point, will be effectively untestable (which, in many projects, is synonymous with “unusable”) for at least another four months, since it will not be until Django 4.2, in April 2023, that the Django project finally issues a release that includes the fix.
The situation will be even worse for maintainers of third-party applications and libraries: many of them adopt a policy of supporting whatever Django versions continue to receive upstream support from the Django project itself. This would push the earliest date of reliable AsyncClient availability out into the second quarter of 2024, which is when all currently-supported Django feature releases will finally drop out of support.
It was suggested in the original django-developers thread that maintainers of third-party libraries and applications should pre-emptively drop support for Django versions prior to 4.2 if they want to provide async support. As a maintainer of several third-party Django applications, I would find that unacceptable – a major plus of the Django ecosystem is the way maintainers tend to sync up their support cycles with Django itself whenever possible. Moreover, I and everyone else who uses Django should be able to use and rely on documented APIs and features of Django in the releases in which they are documented.
It has also been suggested, both in the django-developers thread and on IRC, that async support in Django should be treated as more of a work in progress with lowered expectations of reliability, and/or that bugs in async support perhaps would not or should not be eligible for backporting even when a strict reading of the policy would support a backport, due to async support being in such a state of flux. This fails for multiple reasons:
- It runs against the idea that Django is and should be a reliable framework – an idea that the backport policy is supposed to support and promote!
- The documentation states that async support “for the ORM and other parts of Django” is still being worked on. But for the parts of async support that have already landed, the reader is given no hint that these things might have been shipped and then knowingly left in a badly broken state. If this is such unstable code that it cannot be trusted, and bugs (including some functionality being outright broken) are simply to be expected, then why are there no loud warnings about it in the documentation?
- The existing async support – APIs, tools, and so on – is documented in a way which clearly places it under the Django API stability and backporting policies. Either those policies apply as written, or they do not. If they do, then the framework owes the support and stability guaranteed by those policies. If they do not, then the policies cannot be used as strict letter-of-the-law rules which admit of no exceptions.
In fact, it seems to me that if policies on the stability and reliability of async in Django are to be relaxed, then that relaxation must also extend to the backport policy, allowing severe but late-discovered bugs like this one, when they crop up, to be backported to all supported releases containing the affected feature.
Finally: leaving a bug of this severity, in a part of Django that has been hyped for the past several releases, deliberately and knowingly unfixed in all the currently “supported” Django releases, would be a significant breach of the trust and goodwill Django enjoys from its community and ecosystem. I have repeatedly emphasized Django’s reputation for reliability and stability. Such reputations take years to build, but can be lost in an instant.
The maintenance burden of backporting the AsyncClient fix appears to me to be minimal; as noted above, only a handful of actual lines of code changed in django/test/client.py. The impact of not backporting it seems large: it acts as something like a “missing stair” for async Django, in that the bug is of high severity and easy to trip over once you begin writing and testing async code, but difficult to learn about before you trip over it (and can be difficult to find a solution for afterward). This will hinder and fragment adoption of async Python in the Django community and ecosystem in exchange for no clear gain, that I can see, in the maintainability of Django itself, and will also hurt the reputation Django currently enjoys as a reliable and stable framework with good support policies.
In light of this, I strongly urge you to override the objections raised in the django-developers thread, and I put before you the following:
Shall it be directed:
- That a backport of the fix for ticket 34063 SHALL be allowed into each of Django 3.2, Django 4.0, and Django 4.1,
- That in light of such permission, the Mergers MUST NOT use the backport policy as a reason to refuse merging of such a backport,
- That the Releasers SHALL issue new bugfix releases of Django 3.2, Django 4.0, and Django 4.1, containing the backported fix of ticket 34063,
- That after the backport is complete and merged into each of Django 3.2, Django 4.0, and Django 4.1, the backport policy SHALL be revisited, discussed, and revised as needed to cover cases such as the case of ticket 34063, in which backporting is worthwhile but in which backporting is not presently, by a strict reading of the policy, permitted.