Add an option for tests to always run against clones

Sorry, I’m doing this backwards because I already have a PR up, but it needs a thread here. It’s in some ways a small change that fixes a long-standing bug (ticket #25251), but it adds a command line option, so in that way it’s a very big change.

The problem I’m running into is that --keepdb doesn’t always work. I mean, it never deletes the database, so technically it always works. But it also keeps any changes made to the database during testing, so it’s a bit of a monkey’s paw. What could mutate a database while testing? TransactionTestCases, TestCases that delete (? still digging into that), SimpleTestCases given database access (really), integration tests, honest mistakes, bugs, and that’s just my last week.

I think the root issue is that ticket #25251 isn’t really a bug: there are two valid and useful interpretations of “keep”. The bug is that Django only implements one (unfortunately, I believe the less useful one). The definition keepdb is using is “to stay or continue in”. It keeps the database where it is. But there’s also “preserve or maintain”. To keep the database what it is. If you have large test databases and a very stable test suite, then you might be willing to trade some reliability for speed and would prefer to keep your test database where it is. In many other situations, it’s better to trade a small bit of speed for the reliability and consistency of keeping the test database what it is.

That’s what I do with the --use-clones option. It’s just two small changes when used: sequential runs also use the parallel code path, and clones are always overwritten (keepdb=False). That’s it. Simple, but a large change to the functionality and semantics, and not something that can currently be accomplished with the existing flags. And when anything does go wrong, it does more to actually keep the DB than --keepdb does.
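For a sense of the shape, here’s a rough sketch (not the actual patch; the function name is made up and the real change lives in the setup_databases machinery):

```python
from django.db import connections

def prepare_database(alias="default", verbosity=1, keepdb=False):
    connection = connections[alias]
    # --keepdb (or not) still governs only the migrated "source" test
    # database, exactly as today.
    connection.creation.create_test_db(verbosity=verbosity, keepdb=keepdb)
    # The clone the tests actually run against is rebuilt either way, so
    # every run starts from the freshly migrated state regardless of what
    # the previous run did to its working copy.
    connection.creation.clone_test_db(
        suffix="1", verbosity=verbosity, keepdb=False
    )
```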

Does it need to be in Django? I think so. My initial implementation was a custom test runner (which includes some details specific to my own problems), but it has to copy a chunk of logic out of test.utils.setup_databases, which is a good sign that it isn’t operating at the right layer of abstraction. And I really believe this is something from which a lot of people would benefit. If anything, I think it should be renamed --skip-copies and turned on by default: it has almost all of the performance gains of keepdb with none of the footguns. For beginners, it’s all upside, and for users advanced enough to experience the performance downside, it’s an easy change.
(I’m joking, that would be too large of a change in expected behavior alone.)
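For context, the custom-runner version is roughly this shape (a simplified sketch, not my actual runner; the class name is made up). The ugly part is the connection switching at the end, which duplicates what the parallel worker setup already does:

```python
from django.db import connections
from django.test.runner import DiscoverRunner

class CloneAlwaysRunner(DiscoverRunner):
    def setup_databases(self, **kwargs):
        old_config = super().setup_databases(**kwargs)
        for connection in connections.all():
            # Rebuild the clone from the freshly migrated source, even if
            # the source itself was kept via --keepdb.
            connection.creation.clone_test_db(
                suffix="1", verbosity=self.verbosity, keepdb=False
            )
            # Point this connection at the clone for the rest of the run;
            # this is the bit that copies logic out of the parallel path.
            connection.settings_dict.update(
                connection.creation.get_test_db_clone_settings("1")
            )
            connection.close()
        return old_config
```

Teardown, multiple aliases, and SQLite’s in-memory handling all need the same duplicated care, which is the jank I mean.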

Another option, given the functional overlap, would be to modify --keepdb to have profiles. So --keepdb and --keepdb fast would use the legacy behavior and --keepdb strict would use the --use-clones behavior. And that would enable options like --keepdb strict-clean, which could clean up the cloned databases after itself (maybe even only clones that had no test failures; not sure how accessible that information is from that layer🤔).
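Argument-wise that could be as simple as giving --keepdb an optional value. A hedged sketch (the profile names are just the ones from this idea, nothing that exists today):

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument(
    "--keepdb",
    nargs="?",
    const="fast",  # a bare --keepdb keeps today's behavior
    choices=["fast", "strict", "strict-clean"],
    default=None,
    help=(
        "Preserve the test database between runs; the optional profile "
        "controls how strictly it is kept pristine."
    ),
)

print(parser.parse_args(["--keepdb"]))            # Namespace(keepdb='fast')
print(parser.parse_args(["--keepdb", "strict"]))  # Namespace(keepdb='strict')
```

One wrinkle: with an optional value, --keepdb immediately followed by a test label would try to swallow the label (and then fail the choices check), so --keepdb=strict or putting the flag last is the safer spelling.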

Any thoughts or feedback?

I’ve been working with @florean on this, but haven’t been deep in the code. My understanding of what he’s dealing with is that when you run with --keepdb currently, it’s supposed to give you a fresh start each time, but it doesn’t, because of #25251 (Inconsistent availability of data from migrations in TestCases when using --keepdb) and similar issues.

What has happened in my project is that --keepdb no longer works. It does in fact keep the DB, but the DB isn’t fresh after the first run, so keeping it is sort of meaningless.

If I understand the solution that @florean is proposing correctly, he adds one more step to the --keepdb startup process: immediately clone the DB after it’s migrated, run the tests on those clone(s), and keep the originally-migrated DB as the fresh, honest-to-goodness database.
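For what it’s worth, my (possibly simplified) understanding of the existing machinery is that clones are just the source test database with a suffix appended to the name, so the pristine copy and the working copies stay easy to tell apart on the server:

```python
from django.db import connections

connection = connections["default"]
# During a test run the connection's NAME is the test database; the clone
# settings are the same thing with a suffix appended, e.g. "test_myproj_1".
clone_settings = connection.creation.get_test_db_clone_settings("1")
print(clone_settings["NAME"])
```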

My take is that I don’t love it because it is one more layer of abstraction on --keepdb, but I think it honors the intent of the feature, without really slowing things down much. It also fixes a bug from 2015 that seems to otherwise be intractable (or uninteresting?). That’s nothing to scoff at.

I’m also curious to hear other thoughts on this one.


Yeah, I’ve run into similar issues with --keepdb too: things behaving unexpectedly between runs because the DB isn’t truly clean. Using clones as the default testing target seems like a good compromise between speed and consistency. It feels more in line with what most devs expect from “keep.” Curious to see how others feel about layering it into --keepdb vs. having it as a separate flag.

Thanks for the feedback and support. As I’ve thought about it more, I think --keepdb profiles would be the best approach: no new command line option to learn, it’s very clearly about DBs (use-clones maybe leans that direction, but it definitely isn’t as clear), and I can see there being other database strategies worth adding later. I’ll draft a separate PR for this and link it, because I think it will add some complexity. But maybe it will simplify things, we’ll see.

I strongly believe this should live in test/utils.py, as that’s where all the cloning decisions and logic live. I can accomplish most of what I want to do in a custom runner, but as @mlissner rightly noted, it’s a little janky, and that jank accumulates quickly the more I want to do.

Having dealt with an unruly test suite over the last month and taken a deep dive into the test runner internals, here’s my ultimate, dream test runner:

  1. I never have to rebuild my test database other than for new migrations or to test DB creation. My first step when looking at a non-obvious test failure is no longer to rerun it without --keepdb (the Django testing equivalent of “have you tried turning it off and back on again?”)
  2. If all tests pass, the runner picks up after itself and my DB server only has my test db.
  3. If there is a failure, the DB isn’t deleted, but left for forensic analysis.
  4. In addition to the failure message, the runner tells me which DB the failed test ran against.
  5. I get a list of the other tests that ran on that DB.
  6. In chronological order.
  7. And while I’m dreaming, I can give the test runner a test to run multiple times. Every parallel runner runs it, preferably multiple times. Bisection for dependent test failures.

So, my PR deals with 1. I know 2, 3, and 4 are doable. I’m pretty sure 5 is. I have no idea on 6, but I’m not hopeful without more changes to the runner and TestCases than people would be comfortable with.
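For 2 and 3, something along these lines already works in a custom runner (a rough sketch; the keep-on-failure policy is the idea here, the hooks are the standard DiscoverRunner ones):

```python
from django.test.runner import DiscoverRunner

class ForensicRunner(DiscoverRunner):
    def run_suite(self, suite, **kwargs):
        result = super().run_suite(suite, **kwargs)
        # Remember whether the run was clean before teardown happens.
        self._run_was_clean = result.wasSuccessful()
        return result

    def teardown_databases(self, old_config, **kwargs):
        if not getattr(self, "_run_was_clean", False):
            # Leave the test databases in place for forensic analysis.
            return
        super().teardown_databases(old_config, **kwargs)
```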

As 4-7 are completely separate from my PR and original, narrower topic, should I move them to a new thread or discuss them here? Still new to this place, any help appreciated.

Thanks!

I haven’t looked at the merit of this particular solution, but I can weigh in that a solution to #25251 is worth it and certainly not “uninteresting”. Just look at this monstrosity to see how I’m forced to deal with it now: bugsink/bugsink/test_utils.py at 5bdfe89a6aeea7cdc6c04bba29521cf68269eec0 · bugsink/bugsink · GitHub