Our current practice around crashing errors (I take this to mean unhandled/undocumented Python exceptions in a supported workflow) is that, if they’re regressions, we handle them under the regressions bucket of the backport policy, which subjects them to a how-recently-was-it-caused qualification, rather than under the “crashing errors” bucket, which doesn’t.
Crashing errors are always either regressions or bugs in new features. The point of listing them separately from those two buckets, I take it, would be to clarify that a clock doesn’t apply. But our practice has been to apply the clock anyway.
A recent example is #35596. I think this qualifies for a backport to the current stable/mainstream support version (5.1), since whether it was a regression in any particular Django version is orthogonal to whether a Python exception escapes.
Is that right?
PS: I’m more interested in the principle (and updating docs if necessary) and less in any particular ticket, including the above one.
I sympathize with your feeling here, and I’m not sure what the right answer is. We have to weigh the needs of the reviewers and fellows against what would be ideal for users. I think it might be helpful if we filled in some specific answers, to help us understand the costs associated with a policy that aligned better with your intuition.
How far back should reviewers or fellows be required to go testing before we just call a regression a bug fix, since it’s been around long enough anyway?
Is there a bright-line rule appropriate for that boundary, if such a boundary exists?
Or, if an enterprising contributor is willing to go spelunking through old code to find a point where it still works (or if it’s easy to figure out exactly where the bug was introduced), does it just not matter how old the regression is? Maybe the rule only applies to reviewers and fellows, as a default for when things are hard to determine?
It’s worth emphasising that the backport policy is for users, rather than reviewers and fellows. Folks have stable deployments, and every backport introduces the risk of regressions in those deployments. (This isn’t theoretical. In my time as fellow, we would have reports of breakages almost every time we backported anything.)
I think the policy is fine as it is. It’s stood the test of time well and strikes a good balance. There are regularly cases where one wants to backport, and it’s frustrating when one can’t, but we benefit from Django’s stability guarantees far more than that annoyance costs us, I would (and do) maintain.
It’s always possible to make the case for a backport in a particular instance. I glanced at the issue in question and wasn’t immediately sure. I’d usually defer to the fellows’ judgment in this kind of case.
Personally, I believe the change in question should not be backported, for the same reasons outlined by Carlton. The proposed change modifies production code to resolve an issue with test runs. I feel the potential risk of breaking production systems outweighs the benefit of addressing an exception in a test run when using pytest.
More generally, I think we need a clearer policy on what qualifies as a “crashing bug,” as well as the scope and limitations of such issues. This would help me, as a Fellow, make more informed decisions. Is this something that would fall under the purview of the @steering_council?
I’m happy to add discussing the policy to our todo list.
In the specific case of #36056, my question would be: can the exception cause a problem in a live system, or is it just a bit noisy? If the former, I think we should backport; otherwise I think we shouldn’t.
I always thought “crashing bug” was quite clear: something that causes an unhandled error when handling actual web requests. (Contrast with a bug in test code, which clearly doesn’t.) It’s also worth distinguishing a crash of a single request from a crash of the whole process. (Think of an error that forces you to restart the development server; you probably see those all the time. That would clearly need a backport if it made it to a release.)
“Data loss” (the other one) seems clear enough too.
Thanks @Lily-Foote. The answer is that #36056 does not crash a live system; it’s only a bit noisy when using pytest on a test suite that exercises a management command. So I agree that this does not qualify for a backport.
Thanks all for the input. In the end I agree it may not be worth the risk if the only effect is in tests. Theoretically there’s a path to affecting production systems too, since one can swap stdout/stderr when running a management command (sketched below), but we can reconsider the decision if anyone reports it.
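To make that theoretical path concrete, here’s a minimal sketch of my own (not code from the ticket). Passing `stdout`/`stderr` to `call_command()` and having them forwarded to the command is documented Django behaviour; the closed stream at the end is the hypothetical hazard being discussed:

```python
# Sketch of the hypothetical production path: call_command() accepts
# arbitrary file-like stdout/stderr objects and forwards them to the command.
import io

from django.core.management import call_command

buf = io.StringIO()
call_command("check", stdout=buf)  # capture a built-in command's output
report = buf.getvalue()
buf.close()
# If command machinery wrote to or flushed `buf` after this point, the
# resulting exception would escape in production code, not just under pytest.
```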
For those experiencing the issue on Django 5.1 or earlier, I wrote a blog post covering a way to silence the error.
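I haven’t reproduced the blog post here, but one workaround along the same lines (assuming the noise comes from the command touching pytest’s captured, later-closed `sys.stdout`/`sys.stderr`) is to give `call_command()` its own in-memory streams, so the command never writes to the captured ones:

```python
# Workaround sketch: pass explicit streams so the command never touches
# pytest's captured sys.stdout/sys.stderr. "my_command" is a placeholder
# for whatever command the test suite exercises.
import io

from django.core.management import call_command


def test_exercises_my_command():
    stdout, stderr = io.StringIO(), io.StringIO()
    call_command("my_command", stdout=stdout, stderr=stderr)
    assert "expected output" in stdout.getvalue()  # placeholder assertion
```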
> I always thought “crashing bug” was quite clear: something that causes an unhandled error when handling actual web requests. (Contrast with a bug in test code, which clearly doesn’t.)
But management command crashes are valid too, right? If a crash prevented users from running legitimate tests, we’d backport that, right? I think some case-by-case analysis is always needed.