[GSOC 2020] Parallel Test Runner

p.s. I’ve just run the test suite on Django master on Windows 10 with Postgres 12.

======================================================================
FAIL: test_order_by_relational_field_through_model (m2m_through.tests.M2mThroughTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\Users\smith\PycharmProjects\django2\tests\m2m_through\tests.py", line 247, in test_order_by_relational_field_through_model
    [self.jim, self.bob]
AssertionError: Sequences differ: <QuerySet [<Person: Person object (1)>, <Person: Person object (2)>]> != [<Person: Person object (2)>, <Person: Person
 object (1)>]

First differing element 0:
<Person: Person object (1)>
<Person: Person object (2)>

- <QuerySet [<Person: Person object (1)>, <Person: Person object (2)>]>
? ----------                         ^                            ^   -

+ [<Person: Person object (2)>, <Person: Person object (1)>]
?                          ^                            ^

OK, ticket for the m2m test failure: https://code.djangoproject.com/ticket/31752

2 Likes

PR, as suggested by Mariusz: https://github.com/django/django/pull/13126

2 Likes

Don’t worry :smiley: I’m not upset over it nor do I feel like it’s a failing. In fact, I’m kinda happy that I do now think my original proposal’s timeline is inadequate; it means that I have a better understanding of the tools I’m working with. I take pride in that process of learning.

Will do! I want it to be as clear as possible so everyone can easily drop-in and see everything at a glance. I can show a draft later to get your opinion on it.

Thanks David!

Thanks Ahmad. Let me know when you have something, I’m happy to support.

On a separate topic, I’ve just written some comments on a ticket about adding pytest as an alternative / supplementary test runner. Now, whilst I think the answer to the next question should be “absolutely not” and “out of scope”, I thought I’d put it here “just in case”.

IF (and that is a big if) the project chooses to head towards pytest in the future, even if only as a supplementary test runner, does this work still give us benefits? I’m really unclear how something like pytest-xdist would or could help with this project.

I’m completely out of my depth on a technical level here, so as I say, just putting it out here “just in case”.

David

Well, with pytest, we wouldn’t have to implement our own parallel test runner, since we’d get that out of the box on all major platforms through pytest-xdist, rendering both our existing custom test runner and this entire project redundant.

A huge benefit of pytest is the ecosystem around it. The plugins are amazing. You can see a lot of tickets on the issue tracker that, after discussion, amount more or less to “implement a feature that already exists in pytest”. Of course, one could argue the reason Django doesn’t have a similar ecosystem is that we need to refactor our code to make it easier for others to modify our test runner.

If we assume that pytest is the definitive future of Python testing, then it’s simply a matter of time until we adopt it, and a matter of much more time until some of our users update their Django installations to adopt it as well. I don’t think we’ll find an adequate Django pytest solution in time for this project to be considered redundant. Knowing Django’s release cycles and deprecation policies, we’d still get a good stretch of time where people wouldn’t move toward pytest and would instead mostly depend on our own test runner. Having the option to run their tests in parallel would be a godsend for them.

Hello all, after talking to David about the next milestone, here is the general workflow we’ve developed:

Work structure for the next milestone

  • First week (July 4th-11th):

    • Work out existing MySQL failures
    • Implement a basic _clone_test_db method
  • Second week (July 11th-18th):

    • Patch remaining MySQL failures
    • Start testing Oracle with spawn
  • Third week (July 18th-25th):

    • Finish MySQL; code review for the MySQL & Oracle cloning methods
    • Patch Oracle failures
  • Fourth week (July 25th-27th):

    • Test Oracle under fork
    • Finalize all test runner-related failures
    • General code review for the project so far

I’m not sure if these code review dates are suitable for the two of you, @orf and @adamchainz. Let me know how you would like for me to restructure them.

This week at a glance

Current tasks for this week are as follows:

  • SQLite and PostgreSQL:
    • Ensure the test runner can reverse them
  • MySQL:
    • Get an idea of the patterns behind the failures since they’re mostly the same
  • Oracle:
    • Implement the _clone_test_db method

Current issue

Furthermore, here are my current observations/notes on the SQLite reverse failure.
The errors are mostly caused by database access and raise:

django.db.utils.NotSupportedError: SQLite schema editor cannot be used while foreign key constraint checks are enabled. Make sure to disable them before entering a transaction.atomic() context because SQLite does not support disabling them in the middle of a multi-statement transaction. 

This makes me think there’s a problem with database connection setup for SQLite.
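The constraint in that error message can be reproduced with the sqlite3 module alone: toggling foreign-key enforcement is silently ignored once a transaction is open, which is why it has to happen before entering transaction.atomic(). A minimal sketch:

```python
import sqlite3

# PRAGMA foreign_keys is a no-op while a transaction is open, which is
# why Django requires the checks to be disabled *before* entering
# transaction.atomic(). isolation_level=None puts the connection in
# autocommit mode so we control transactions manually.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("PRAGMA foreign_keys = ON")

conn.execute("BEGIN")
conn.execute("PRAGMA foreign_keys = OFF")  # silently ignored mid-transaction
inside = conn.execute("PRAGMA foreign_keys").fetchone()[0]
conn.execute("COMMIT")

conn.execute("PRAGMA foreign_keys = OFF")  # takes effect outside a transaction
outside = conn.execute("PRAGMA foreign_keys").fetchone()[0]
# inside == 1 (still on), outside == 0 (now off)
```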

There are also two other errors: HTTP 500 server errors raised in the servers test app, namely servers.tests.LiveServerDatabase and LiveServerThreadedTests.

I have no working theory yet as to why these errors appear only when reversing the test suite. I’ll update when I have more information.

I’ll also update my progress on the other tasks as I start to tackle them.

Python 3.6 compatibility

Since backup() isn’t supported in Python versions earlier than 3.7, this project will need to pivot towards using VACUUM INTO to maintain compatibility.

I’m having a slight hiccup making VACUUM INTO copy an on-disk database into an in-memory one. When executing this code snippet, VACUUM INTO creates an empty file called ‘file’ instead of copying the on-disk database to the specified in-memory URI filename:

sourcedb.execute('VACUUM INTO ?', (f'file:memorydb_{str(alias)}_{str(_worker_id)}?mode=memory&cache=shared',))

I’m going to read the documentation to debug this. It might just be a simple URI filename check that I need to turn on.
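For reference, VACUUM INTO behaves as expected when the target is an ordinary file path, which points at URI handling as the culprit. A quick sanity-check sketch (paths and table are illustrative, not project code; requires SQLite 3.27+ for VACUUM INTO):

```python
import os
import sqlite3
import tempfile

# Copy an on-disk database to a fresh file with VACUUM INTO.
# isolation_level=None keeps the connection in autocommit mode,
# since VACUUM cannot run inside an open transaction.
src_path = os.path.join(tempfile.mkdtemp(), "source.sqlite3")
dst_path = os.path.join(tempfile.mkdtemp(), "clone.sqlite3")

source = sqlite3.connect(src_path, isolation_level=None)
source.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT)")
source.execute("INSERT INTO person (name) VALUES ('jim'), ('bob')")

# A plain path works; a file: URI target is where the in-memory
# copy currently goes wrong.
source.execute("VACUUM INTO ?", (dst_path,))

clone = sqlite3.connect(dst_path)
names = [row[0] for row in clone.execute("SELECT name FROM person ORDER BY id")]
# names == ['jim', 'bob']
```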

That’s all for now. Looking forward to writing up the next update!
Ahmad

2 Likes

These dates look fine to me @Valz :+1:. Awesome work so far! If you push any outstanding changes I can help look into the reverse failures today.

I would make a general suggestion about the project: it seems we are running into a lot of hard-to-debug issues with the test suite. As such I would strongly suggest hard time-boxing the investigations and fixes as these have the danger of sucking up a lot of effort and delaying some of the future work.

If by the end of the project we have a mostly working solution and a list of caveats (MySQL runs slow, SQLite fails when running the test suite backwards, etc.) then I would still consider it a big success :partying_face:. Those caveats might also include using backup() instead of VACUUM INTO if you cannot resolve any of the related issues that it brings up.

Sorry for the late reply, just saw this now. Will push the latest changes when I get back home later tonight.

Tasks done:

  • Python 3.6 compatibility is done (woohoo); it did turn out to be a URI check
  • Implemented a basic Oracle _clone_test_db method. I’ll probably end up refactoring some of the codebase to make it more DRY when I start adding exception handling.

This week at a glance (in priority):

  • Finalize Oracle _clone_test_db method with exception handling
  • Track and fix MySQL failures
  • Fix PostgreSQL reverse failure (time-boxed to middle of the week)

Areas of uncertainty:

  • SQLite reverse (will re-visit after finishing Oracle and MySQL)

Reverse failures update

SQLite’s reverse failures are non-deterministic when I skip the two failing servers test cases. I initially thought the issue was an incorrect order of backup/connect calls: we connect to the database, add certain tables, and then backup overwrites those tables. That theory is a bust, though. I’m still not sure what the cause is, but like you said, I shouldn’t get stuck on the small details when we have more pressing matters.

PostgreSQL’s reverse failure only appears when running the tests with spawn; it passes normally with --parallel=1. It also fails when running the multiple_database test suite in isolation in reverse, so at least it’s consistent in that regard. I think I can figure this one out if I give it some time this week.

Oracle

I’ve implemented a basic _clone_test_db method that isn’t completely correct yet; I think I’m missing a couple of important schema changes. I’m still building an understanding of what needs to change between the main database and the cloned one. I’ll be finalizing those changes and adding exception handling this week.

MySQL

Frankly, I didn’t spend time on the MySQL failures this week because I took too much time working out the SQLite and PostgreSQL reverse failures (I would’ve saved time there if I had read Tom’s post). I’m going to spend more time on them this week, and I’ll post a mid-week update on my progress to see whether it’s something I’ll need help with or whether it’s under control.

3 Likes

Hello all, here are a couple of interesting updates:

Week at a glance:

  • MySQL failures finalized (by Thursday)
  • Parallelizing the cloning process (by Wednesday)
  • Better exception handling for Oracle & remaining failures

Oracle updates

The cloning process for Oracle is complete and functional, including creation and destruction. The exception handling needs to be cleaner and DRY-er (Tom’s comments on the PR solve this really well).

I’m still in the middle of a full Oracle test suite run with the cloned databases, so I have no errors/failures to report yet. I was stuck on a couple of non-parallel failures, which Mariusz pointed out were solved three weeks ago!

Reason for parallelizing the cloning process

There’s an issue with cloning speed on Oracle: it is very slow. To tackle that, I’m going to parallelize the cloning process for all database backends, which should yield a noticeable speed-up for Oracle and MySQL cloning (at least). I’m starting on this today and hope to finish the bulk of it by Wednesday, so I can spend the last days fine-tuning it.

MySQL

As for MySQL, the parallel failures were related to timezone errors. A quick Google search showed that this is a common problem related to timezone definitions, so I’ll be testing again after changing my configuration file.

1 Like

Just a note to say, I really enjoy reading updates on your progress. :+1:

1 Like

Hi @Valz — this is just a question. Interested to see what @orf, @adamchainz &co think too…

Would it be worth adding (optional) timing outputs to the test runner?

  • Started
  • Took X to build DB.
  • Took Y to clone
  • Took Z to run tests
  • Total time.

I’m looking at the work across different platforms and DBs, and I keep making the same manual notes, but then miss exactly when it moved along. Anyway: clear enough? Thoughts on the idea?

Cheers, C

Yes, very worth adding! I’ve hit the same issue when trying to benchmark the performance myself.

The idea is clear enough. It would be a nice tool to determine if a patch deteriorates the test suite’s performance or improves it.

Cool! I briefly mentioned it to Mariusz, and he was keen… fancy adding a --timings flag (or a better name, as you see fit) to the test runner?

I’ll get right on it. --timings seems good. It would be nice to optionally choose which timings to output; we might only be interested in how much time it takes to set up the database.
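For illustration, the kind of timing collector such an option could be built on is tiny; a sketch (names are hypothetical, not the actual patch):

```python
import time
from contextlib import contextmanager

@contextmanager
def stopwatch(label, timings):
    """Record how long the wrapped block takes, keyed by label."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[label] = time.perf_counter() - start

timings = {}
with stopwatch("clone databases", timings):
    sum(range(100_000))  # stand-in for the cloning step
# timings["clone databases"] now holds the elapsed seconds
```

Wrapping each phase (build DB, clone, run tests) in such a block would give the per-phase outputs Carlton listed.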

Ticket opened here

Super!

I suspect that’d be more complexity than it’s worth, but very happy to look at your thoughts!

Hello everyone! Here is the state of play at the moment:

Main work on ticket_31169

State of Oracle:

Running the full test suite in parallel gives two errors that I’m still investigating and one failure that seems to be an ordering issue. I’ll take care of these three remaining issues in the next milestone.

The test runs I’ve done with Oracle over the past week had the bulk of their time consumed by cloning. As such, I’ll be turning my attention to ticket 31804 in the next milestone to finalize Oracle properly. See the ticket update below for more details.

State of MySQL:

Same as Oracle, the bottleneck is cloning. I haven’t yet found a parallel-specific failure on MySQL; the failures I’ve caught were related to test logic. There are still a couple left, and I’ll post tickets and a summary of them over the next two days. The vast majority of the test run, however, should be green.

Other tickets:

Ticket 31811 (Optional timings):

As per Carlton’s suggestion, optional timings have been added to the test runner. It just needs to be merged into master, and we should be good to go. This patch has made benchmarking much easier.

I made another ticket to show the N slowest tests here.

Ticket 31804 (Parallelization of databases):

I’ve added a general clone_test_databases(verbosity, keepdb, parallel) method to give each backend the freedom to do backend-specific setup before parallel cloning and any necessary teardown afterwards. This was necessary for MySQL, since I wanted to create a single dump per alias instead of a dump per clone, and I imagine it’ll be necessary for PostgreSQL as well.
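The rough shape such a hook could take, heavily simplified (the class and method names other than clone_test_databases are my invention, and the real signature also takes verbosity and keepdb):

```python
from concurrent.futures import ThreadPoolExecutor

class BaseCloner:
    """Sketch of a per-backend cloning hook: a backend overrides
    pre_clone/post_clone for one-time work, such as a single mysqlpump
    dump per alias, while the clones themselves are created in parallel."""

    def pre_clone(self):
        pass  # e.g. dump the template database once

    def clone_test_db(self, suffix):
        raise NotImplementedError

    def post_clone(self):
        pass  # e.g. delete the dump file

    def clone_test_databases(self, parallel):
        self.pre_clone()
        try:
            with ThreadPoolExecutor(max_workers=parallel) as pool:
                # create clones _1.._N concurrently
                list(pool.map(self.clone_test_db, range(1, parallel + 1)))
        finally:
            self.post_clone()

class RecordingCloner(BaseCloner):
    """Toy backend that records which clone suffixes were requested."""
    def __init__(self):
        self.suffixes = []

    def clone_test_db(self, suffix):
        self.suffixes.append(suffix)  # list.append is thread-safe in CPython

cloner = RecordingCloner()
cloner.clone_test_databases(parallel=4)
```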

MySQL

This has been much trickier than I originally thought. Adding a process pool and running clone_test_db in parallel did not yield noticeable time improvements (tested on the MySQL backend using the timings patch), which led me to try out mysqlpump with the process pool.

After many failed attempts at switching to mysqlpump, I finally figured out a sort of hack that works. The short of it: we generate a dump via mysqlpump, rewrite the database name in the dump, then pass it as input to the load subprocess. The rewrite is necessary because mysqlpump’s logical backup qualifies each table with the source database’s name, which makes it difficult to use the dump to create duplicates.
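The name-rewriting step might look roughly like this (the function and the plain-substitution rule are my assumptions for illustration, not the actual patch):

```python
def rewrite_dump(dump: str, source_db: str, target_db: str) -> str:
    """Rewrite the database name in a mysqlpump logical dump so the
    same dump can be loaded into a differently named clone. mysqlpump
    qualifies every object as `source_db`.`table`, so replacing the
    backtick-quoted name is enough for a sketch."""
    return dump.replace(f"`{source_db}`", f"`{target_db}`")

statement = "CREATE TABLE `test_django`.`person` (`id` int);"
cloned = rewrite_dump(statement, "test_django", "test_django_1")
# cloned == "CREATE TABLE `test_django_1`.`person` (`id` int);"
```

A real implementation would stream this between the dump and load subprocesses rather than holding the whole dump in memory.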

There are, though, some issues with using mysqlpump: restoring foreign key constraints seems to be a problem for the utility. I’m still debugging to determine why and how to fix it. @adamchainz’s input here would be a huge help.

runtests.py --parallel=4 --timings --start-at=pagination

Comparing database cloning time, we have:

  • mysqldump: 2771.955s
  • mysqlpump: 1028.989s

This is a significant speed-up.

Parallelizing other databases

On PostgreSQL, I believe we can follow a similar approach, but we need to allow multiple connections to the template database by changing the datallowconn flag. Relevant documentation

On Oracle, an approach almost-identical to the MySQL one would work.

Parallelizing cloning on SQLite isn’t necessary: if we’re forking, the databases are automatically cloned, and if we’re spawning, we restore them during the test run itself, not before.

Finally, I’ll be posting an outline for the next three weeks and how I’m going to structure everything I’ve mentioned above. I’ll also be pushing the latest code after I tweak it a bit so everyone can have a play.

This has been an amazing month! Many thanks to @orf, @adamchainz, @smithdc1, and @carltongibson. We’re almost done with Oracle :smiley:

3 Likes

Hi @Valz. Super stuff thanks.

I want to get the timings in this week, and then use them to assess the main PR more objectively.

It’s time for us to start thinking about getting it in. :slightly_smiling_face:

Hello all, here is this milestone’s outline:

  • Week 1: 1-7

    • Finalizing ticket 31811 and merging it
    • Patching the three parallel Oracle failures
    • Patching the non-parallel MySQL failures
  • Week 2: 8-14

    • Ironing out mysqlpump-specific failures
    • Adding PostgreSQL to parallel cloning ticket
  • Week 3: 15-21

    • Code review for ticket 31804
    • Merging ticket 31804 (if all goes well)
    • Documentation for the parallel test runner on Windows and macOS
    • Finalizing ticket_31169’s outstanding issues to merge:
      • SQLite and PostgreSQL’s reverse failures
      • Refactoring parts of the code for more readability and cleanup
      • Splitting up Oracle cloning to its own PR
  • Week 4: 22-24

    • Final checks and changes

Excited to get all of these tickets merged in!

3 Likes

Your post is super. I came here to talk about exactly this. We are nearly there; let’s see if we can get this in. :rocket:

Just a few questions:

  • what do you need from others to deliver your plan?

  • Are there any sequencing/dependencies in your plan? I think you may have implicitly stated these in the ordering of your schedule, but it’s worth calling out anything critical. (Does timing need to get in first?)

  • Is some of this within the Minimum Viable Product (MVP) and some of it nice-to-have? If so, it’s worth calling out so we can be sure to focus on the right things.

Good luck with the last 3(ish) weeks!

p.s. I’m super excited about being able to run tests in parallel on Windows. Speed up based on my earlier test was significant!