Hello everyone! Here is the state of play at the moment:
Main work on ticket 31169
State of Oracle:
Running the full test suite in parallel gives two errors that I'm still investigating and one failure that appears to be a test-ordering issue. I'll take care of these three remaining issues in the next milestone.
The test runs I've done with Oracle over the past week spent the bulk of their time on cloning. I'll therefore turn my attention to ticket 31804 for the next milestone so Oracle can be finalized properly. See the ticket update below for more details.
State of MySQL:
As with Oracle, the bottleneck is cloning. I haven't yet found a failure unique to parallel runs on MySQL; the failures I've caught were related to test logic. A couple of them are still open, and I'll be posting tickets and a summary over the next two days. The vast majority of the test run, however, should be green.
Other tickets:
Ticket 31811 (Optional timings):
As per Carlton's suggestion, optional timings have been added to the test runner. It just needs to be merged into master, and we should be good to go. This patch has made benchmarking much easier.
I've made another ticket to show the N slowest tests here.
Ticket 31804 (Parallelization of databases):
I've added a general clone_test_databases(verbosity, keepdb, parallel) method that gives each backend the freedom to do backend-specific setup before parallel cloning and any necessary teardown afterwards. This was necessary for MySQL, since I wanted to create a single dump per alias instead of a dump per clone, and I imagine it'll be necessary for PostgreSQL as well.
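To make the shape of that hook concrete, here's a minimal sketch. The clone_test_databases(verbosity, keepdb, parallel) signature comes from the ticket; the class name and the _pre_clone_setup / _clone_test_db / _post_clone_teardown helpers are hypothetical illustrations, not the actual patch:

```python
# Sketch of a per-backend pre/post hook wrapped around parallel cloning.
# A thread pool stands in for the worker pool here since cloning is
# dominated by I/O (dump/load subprocesses); the real patch may differ.
from concurrent.futures import ThreadPoolExecutor


class BaseCreation:
    def clone_test_databases(self, verbosity, keepdb, parallel):
        # Backend-specific setup, e.g. MySQL creating one dump per alias.
        self._pre_clone_setup(verbosity, keepdb)
        try:
            with ThreadPoolExecutor(max_workers=parallel) as pool:
                # One clone per parallel worker: test_db_1 .. test_db_N.
                list(pool.map(self._clone_test_db, range(1, parallel + 1)))
        finally:
            # Backend-specific teardown, e.g. removing dump files.
            self._post_clone_teardown(verbosity, keepdb)

    # Backends override these three as needed; no-ops by default.
    def _pre_clone_setup(self, verbosity, keepdb):
        pass

    def _clone_test_db(self, suffix):
        pass

    def _post_clone_teardown(self, verbosity, keepdb):
        pass
```

The point of the wrapper is that the expensive shared work (the dump) happens exactly once per alias, outside the pool, while only the cheap per-clone load is parallelized.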
MySQL
This has been much trickier than I originally thought. Adding a process pool and running clone_test_db in parallel did not yield a noticeable time improvement (tested on the MySQL backend using the timings patch). This led me to try out mysqlpump with the process pool.
After many failed attempts at switching to mysqlpump, I finally figured out a hack of sorts that works. The short of it: generate a dump via mysqlpump, modify the database name in the dump, then pass it as input to the load subprocess. Modifying the database name is necessary because mysqlpump's logical backup inserts the source database's name before each table, which makes it difficult to use the dump to create duplicates.
There are, though, some issues with mysqlpump: restoring foreign key constraints seems to be a problem for the utility. I'm still debugging to determine why, and how to fix it. @adamchainz's input here would be a huge help.
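The dump → rename → load pipeline above can be sketched roughly like this. The CLI invocations and database names are illustrative assumptions, not the actual patch; rename_database() is the core of the hack, rewriting mysqlpump's backtick-qualified table references to point at the clone:

```python
# Sketch of the mysqlpump dump -> rename -> load hack described above.
# mysqlpump prefixes every table with the source database's name
# (`source_db`.`table`), so we rewrite those references before feeding
# the dump to the load subprocess.
import subprocess


def rename_database(dump: bytes, source: str, target: str) -> bytes:
    """Rewrite backtick-qualified references from source to target."""
    return dump.replace(b"`%s`." % source.encode(), b"`%s`." % target.encode())


def clone_db(source: str, target: str) -> None:
    # 1. Logical backup of the source test database (flags illustrative).
    dump = subprocess.run(
        ["mysqlpump", "--databases", source],
        check=True, capture_output=True,
    ).stdout
    # 2. Point the dump at the clone instead of the source.
    dump = rename_database(dump, source, target)
    # 3. Feed the modified dump to the load subprocess.
    subprocess.run(["mysql", target], input=dump, check=True)
```

A plain byte-level replace is enough here because the database name only ever appears backtick-quoted and dot-qualified in the dump, which keeps the rename step cheap compared to the load itself.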
Benchmarking with runtests.py --parallel=4 --timings --start-at=pagination and comparing database cloning time, we have:
mysqldump: 2771.955s
mysqlpump: 1028.989s
That's roughly a 2.7× speed-up.
Parallelizing other databases
On PostgreSQL, I believe we can follow a similar approach, but we need to allow multiple connections to the template database by changing the datallowconn flag. Relevant documentation
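For reference, the datallowconn flag lives in pg_database and can be flipped like this (the database name test_django is a placeholder, not the actual test alias):

```sql
-- Illustrative only: allow connections to the template database.
-- ALLOW_CONNECTIONS sets the datallowconn column in pg_database.
ALTER DATABASE test_django ALLOW_CONNECTIONS true;
```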
On Oracle, an approach almost identical to the MySQL one should work.
Parallelizing cloning on SQLite isn't necessary: if we're forking, the databases are cloned automatically, and if we're spawning, we restore them during the test run itself, not before.
Finally, I’ll be posting an outline for the next three weeks and how I’m going to structure everything I’ve mentioned above. I’ll also be pushing the latest code after I tweak it a bit so everyone can have a play.
This has been an amazing month! Many thanks to @orf, @adamchainz, @smithdc1, and @carltongibson. We're almost done with Oracle!