[GSoC 2020] Parallel Test Runner

Hello everyone!

Communication

I’ve set up a documentation blog to record my progress and thoughts about the project so far. I’ve made an initial post about the tickets I tackled during the community bonding period; you can find that post here. I’m mainly going to use the blog to give updates about my progress and post here for direct communication if I’m stuck on a particular problem. Of course, if my mentors prefer otherwise, I don’t mind posting my progress both here and on the blog, or using another communication channel.

Scheduling

The schedule I mentioned in my proposal is the one I’ll largely be sticking to. I have a long period of exams and assessments running from mid-May until the first week of July. This, however, won’t affect my proposed schedule; I’m going to compensate by beginning work on the first milestone this week to give myself more buffer time during June.

The schedule for documentation purposes is largely this:

  • First Milestone: fixing worker database connections and adapting SQLite’s database-cloning method to work with spawn.
  • Second Milestone: Adding Oracle backend support to the parallel test runner.
  • Third Milestone: General cleanup, documentation, and tackling other related tickets.

After the first milestone, the parallel test runner will be fully operational on Windows and macOS using spawn. I’ll also ensure the added logic doesn’t break running the parallel test runner with fork.

Current implementation issue

Running the entire test suite with the current patch leads to nondeterministic failures on these tests:

  • check_framework
  • servers
  • test_utils
  • queries
  • transactions
  • transaction_hooks
  • view_tests
  • utils_tests

I say nondeterministic because both the number of errors and failures and the specific tests that fail vary with every single test run.

The majority of the errors are operational errors caused by the queried tables not existing. The failures, too, stem from the same cause: the specified tables are missing.

Curiously enough, running these tests in isolation leads to no errors, although two failures remain in test_utils and utils_tests.

I’m not sure what exactly causes the tables to be removed (or never created) when the full test suite is run, versus running one set of tests in isolation.

Running the test suite with --start-at=test_utils also gives no errors, just the two failures from test_utils and utils_tests. After reading through the test runner options, I’m going to use --pair and --bisect to determine what causes the failures and post my results afterwards.

Here’s the link to the Jenkins build. It shows the exact errors and failures along with the test names.


Hey Ahmad! It’s fantastic to see your progress. I had a play with your branch tonight - it’s looking good!

Regarding the spurious failures, this seems to be related to the Django database teardown code. From my investigation it seems that:

  1. A spawned worker dies for whatever reason, which triggers Django to tear down the database (and thus remove the file)
  2. The multiprocessing Pool restarts it, which somehow results in an empty database
  3. Subsequent tests then fail due to the database being wiped clean.

I’m not entirely sure this is the exact chain of events, but running runtests.py --keepdb on my MacBook results in a lot fewer of these random failures, though quite a few remain relating to AttributeError: Can't pickle local object 'infix.<locals>.Operator', which appears to come from a TemplateDoesNotExist exception.

Multiprocessing and files can get tricky - I’m sorry if we’ve already covered/considered this, but have you thought about a two-phase approach where the master process writes a sqlite.db file with all the migrations applied, then each child loads it into memory?

import os
import sqlite3
from django.db import connection

# Point the worker's connection at an in-memory database, then copy the
# fully migrated on-disk database (written by the master process) into it.
connection.settings_dict['NAME'] = ":memory:"
dest = os.path.join(DIR, 'main.sqlite3')  # DIR: wherever the master wrote the file
old_conn = sqlite3.connect(dest)
new_conn = sqlite3.connect(":memory:")
old_conn.backup(new_conn)
new_conn.commit()
connection.connection = new_conn

That might avoid issues where a worker picks up an invalid file due to a previous crash by keeping the on-disk database untouched?

Hey Tom! Thanks for checking out my branch! I’ve pushed a new commit fixing all test failures except three, which are thankfully consistent:

  • test_main_module_is_resolved (utils_tests.test_autoreload.TestIterModulesAndFiles)
  • test (test_utils.test_transactiontestcase.TestSerializedRollbackInhibitsPostMigrate)
  • test_registered_check_did_run (check_framework.tests.ChecksRunDuringTests)

There are also 2 new failures from the Jenkins build.

After trying out different approaches this week, the current, somewhat stable version uses in-memory databases the way you suggested. Only one database is created per alias; I did this by ignoring any suffix number greater than 1. This is a slight optimization that makes a lot of sense in my opinion, because multiple processes can read the same database file concurrently.

Cloning is done during worker initialization by backing up the on-disk database into a unique in-memory database for each connection, like so:

import sqlite3

# connection, alias and _worker_id come from the surrounding worker initializer.
# Open the fully migrated on-disk database written by the main process...
sourcedb = sqlite3.connect('%s.sqlite3' % alias)
# ...point this worker's connection at its own shared-cache in-memory database...
settings_dict = connection.settings_dict
settings_dict['NAME'] = 'file:memorydb_{}_{}?mode=memory&cache=shared'.format(alias, _worker_id)
connection.settings_dict.update(settings_dict)
connection.connect()
# ...and copy the on-disk contents into it.
sourcedb.backup(connection.connection)

Adding in connection.connect() removed failures related to database setup, such as transaction behavior in atomic blocks and missing database functions. With spawn, connections start out uninitialized and workers do not see existing databases. For PostgreSQL and MySQL, we don’t connect during worker initialization; we just point the worker at the correct database name, and the worker connects appropriately during the first test run.
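
For reference, that pointing step looks roughly like this (a sketch modelled on Django’s existing worker initializer, not the exact patch; the helper name here is made up, and get_test_db_clone_settings is the creation-class hook that builds the suffixed test database name):

# Rough sketch of the non-SQLite path: each worker only rewrites the database
# NAME and lets the first test open the connection lazily.
from django.db import connections

def point_worker_at_clones(worker_id):
    for alias in connections:
        connection = connections[alias]
        # e.g. turns 'test_django' into 'test_django_1' for worker 1.
        settings_dict = connection.creation.get_test_db_clone_settings(str(worker_id))
        connection.settings_dict.update(settings_dict)
        connection.close()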

Initializing a connection for SQLite in-memory databases was necessary in order to back up an existing database into them.

Also, it’s important to connect using a unique URI filename to separate the database from other in-memory databases in the same process, as per the SQLite documentation.

Connecting it as in the first line below causes cache conflicts and DB connection conflicts with other database aliases in the same process.

settings_dict['NAME'] = ':memory:'  # Not unique

# Unique (it's possible to drop _worker_id from the name, but I kept it for
# consistency with test names).
settings_dict['NAME'] = 'file:memorydb_{}_{}?mode=memory&cache=shared'.format(alias, _worker_id)

The last major change I made was switching from VACUUM INTO to backup().
After testing with in-memory databases, there’s consistently a 20-40 second difference between the two methods. In any case, I’ll benchmark this again after the last three (five?) stubborn failures are dealt with, to put the matter to rest.
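
For anyone curious, here are the two cloning approaches side by side (just a sketch; the file and database names are illustrative):

import sqlite3

source = sqlite3.connect("test_default.sqlite3")

# 1. VACUUM INTO writes a compacted copy of the database to a new on-disk file.
source.execute("VACUUM INTO 'test_default_clone.sqlite3'")

# 2. backup() copies the database page by page into any open connection,
#    including a shared-cache in-memory one.
target = sqlite3.connect("file:memorydb_default_1?mode=memory&cache=shared", uri=True)
source.backup(target)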

@adamchainz might be interested to see how the benchmark turns out. I’m betting on backup! :slight_smile:

Side note: directory management wasn’t necessary, so I stripped it all away. It might be needed if we consider Django users, though; I’ll test that later, since another small side-goal I have for the parallel test runner in general is to make it more usable and extensible for Django users.

Nice! I’m glad to hear you’re making progress. I had a bit of a look today, and the autoreload failure is interesting - it’s down to this line: https://github.com/django/django/blob/master/django/utils/autoreload.py#L118. Multiprocessing does something special and sets __name__ to __mp_main__. I think it’s acceptable to change that condition to check for __mp_main__ as well as __main__.
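
If you want to see that behaviour in isolation, here’s a small standalone demo (plain Python, not Django code):

import multiprocessing
import sys

def show_main_name():
    # In a spawned child the main module is re-imported as '__mp_main__',
    # so a check for '__main__' alone no longer matches.
    print(sys.modules['__main__'].__name__)

if __name__ == '__main__':
    show_main_name()  # prints '__main__' in the parent
    ctx = multiprocessing.get_context('spawn')
    child = ctx.Process(target=show_main_name)  # prints '__mp_main__' in the child
    child.start()
    child.join()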

The second failure is related to this bit of code, which is indeed a slightly horrific process :sweat_smile:. I think you also need to copy _test_serialized_contents to your new connections somehow, else this condition is false when it should be true.

The last failure is because of this global variable. Checks are only run in the master process and not in the child processes, so after a worker spawns the variable is still False there.

I’m not really a fan of the test that uses that variable; maybe there is a better way of doing this? Or we could just explicitly trigger run_tests from within that test somehow.

All in all these are some very interesting issues! But it seems like we are nearly there with switching to spawn!

I hope this helps!

Hey Tom, thanks for the look and the help!

Thanks for the autoreload insight; I was thinking about that yesterday and couldn’t wrap my head around it.

The serialized-contents fix was rather simple to make, but it really is a horrific process.

I’m currently working on fixing the contenttypes_tests failures and writing tests for the test runner. I should be done with everything over the weekend, though I’d appreciate any input you have on dealing with them. I’ll be pushing my branch tonight so you can see the changes.

The current failures with contenttypes_tests are the last two tests in test_management, the ones modifying settings. Spawn messes something up there, but I’ve yet to determine what it is.

Sorry for the lapse in communication and documentation; I’ll also be publishing my blog articles over the weekend. I’ve been writing drafts but haven’t finalized them yet due to the exams.

The mycheck.did_run failure is honestly unavoidable. We could fix it by using a Pipe or a Queue from the multiprocessing library, but that would be a horrible hack in my opinion.

The two options I see for dealing with it are:

  • Ignore this test on parallel test runs that use spawn
  • Implement a Pipe or Queue to communicate between the worker process executing the test and the main process

The second option would become an established standard for how to pass information generated in a worker process back to the main process when using spawn, so we’d have to design it very well.
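
For illustration, a bare-bones version of what that channel could look like (the names here are made up; wiring it into the test runner properly would need more design work):

import multiprocessing

def run_worker(results):
    # ...run the tests in the spawned worker, then report anything the main
    # process needs to know about...
    results.put(("checks_did_run", True))

if __name__ == "__main__":
    ctx = multiprocessing.get_context("spawn")
    results = ctx.Queue()
    worker = ctx.Process(target=run_worker, args=(results,))
    worker.start()
    key, value = results.get()  # main process reads the worker's report
    worker.join()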

At the moment, I don’t see compelling use cases for such a standard other than this particular check failure. I do want to get your opinion and Adam’s though because I might be wrong.

Hi @Valz — how have you been getting on?

Can you bring me up to speed? I’ve just started looking at your draft PR, but where’s that at? The issues are getting the per-runner DBs right, yes?

And if you can make some notes, it also gives us something to point at for the first review.

Beyond that, what other help do you need? What can we do to support you?

I’m so looking forward to this working: if I had parallel tests on Windows it would be almost complete :grinning:

Kind Regards,

Carlton

Hey @carltongibson, welcome back!

The current draft PR has fixed the following:

  • Worker DB connections for MySQL and PostgreSQL
  • Worker DB connections for in-memory SQLite (we copy them to on-disk databases and then copy them back into per-worker in-memory databases)

We currently have three failures that I do not yet know how to tackle:

  • two contenttypes_tests.test_management failures
  • one check_framework failure that is discussed above

Worker DBs are correct at the moment.
What I need is to finalize a design decision on the check_framework test: whether we should tackle it by finding a good way to communicate between spawned processes and the main process, or whether we should ignore it, since it’s the only use case for such communication at the moment.

Another thing I need help on is figuring out where the contenttypes failures are coming from. I suspect they’re related to either the @modify_settings decorator or the remove_stale_contenttypes command.

As for the notes, how would you like me to prepare them? I’m mostly depending on the blog to provide clear documentation, but since I’ve fallen behind on that, I’m all ears if you’d like me to provide something more immediate.

I’m very excited to get this done as well :smiley:; we’ll finally have lightning-fast tests on Windows.


Super. Good update. Let me have a read-over.

There’s no urgency: the blog is perfect if that’s your plan. (End of the month is first assessment.)

I say let’s skip the check_framework failure: the way it’s structured now is simply incompatible with spawn-based multiprocessing, and rather than invest any time in complex syncing we can likely just rework the test. Let’s not get blocked on it now, though.

I’m on holiday until the 24th, so I can’t help you with the other failures until then. Maybe Adam can?


Progress update

On all databases, check_framework is failing.

  • SQLite
    • Works except when running the test runner in reverse
  • PostgreSQL
    • One failure in m2m_through (doesn’t happen in isolation)
  • MySQL
    • Extremely slow to set up, but working

For the check_framework failure, a hack I can think of that would work is creating an external file we can write a value to; saving the value to a file makes the test process-agnostic, and during the test we can open that file and read it back. If the file doesn’t exist, or contains an incorrect value, then the check didn’t run. We’d handle cleanup of said file during the test run or during teardown. I’m interested to hear your opinions on the matter, especially if there’s a cleaner way to rewrite the test.
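
Something along these lines is what I have in mind (only a sketch; the path and function names are placeholders):

import os
import tempfile

SENTINEL = os.path.join(tempfile.gettempdir(), "django_check_did_run")

def my_check(app_configs, **kwargs):
    # Record that the check ran somewhere every process can see.
    with open(SENTINEL, "w") as marker:
        marker.write("1")
    return []

def check_did_run():
    # Read back in the worker; a missing file (or wrong contents) means it didn't run.
    return os.path.exists(SENTINEL)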

I think the SQLite reverse failure is due to an unclosed file somewhere in the tests, which causes a failure whenever we open another file during the test run.

For PostgreSQL, I honestly don’t know yet why the test fails when run with the entire test suite. It has no relation to external apps, since it’s a pure QuerySet evaluation, so I’m still wrapping my head around it.

A serious issue I’m having with MySQL is that it takes at least an hour or so to finish creating the tables. Then it gets stuck running deferred SQL for a couple more hours. I left my computer on while I slept and woke up to find it still cloning databases. Is anyone else having performance issues running MySQL on a local development server?

If anyone could give the patch a code review soon, I can work through the feedback so we can hopefully get it merged soon and finally have parallel testing on Windows.


Thanks for the update. :smiley:

The project manager in me thinks a high-level plan with key milestones and timings may be helpful (if I remember correctly, your proposal had an outline plan). I think it would be a useful tool to help track progress and also to help you manage others, e.g. “I’ll need a code review on this topic during this period of time, as it unlocks this dependency”, etc.

I think your note above gives a nice summary of current risks/issues. Is there anything you see becoming a problem in the future, and if so, what are your thoughts on it?

I definitely agree. Adding a more detailed structure with key milestones and timings, to let everyone know when I’d need a code review or similar things, would make this process a lot easier. I’m largely still following the same outline I laid out initially in my proposal, but conflicts with my university exams ruined that timeline. I also think my proposal’s timeline wasn’t accurate or representative of the problems I ended up facing in implementation. It has been an awesome learning experience though to say the least :smiley:

The next problem I foresee coming up is MySQL; I’ve left the test runner running since my post earlier today, and it is still setting up the databases, though it has finally gotten to creating the ‘other’ database. I’m in the middle of setting up a Linux computer to test MySQL, since my current computer is taking an absurd amount of time just to set the databases up.

I’m planning on patching all of the MySQL failures over the next couple of days, and revisiting the last PostgreSQL failure right after.

An unrelated issue that could arise in the future is developers writing tests without knowing the limitations of the parallel test runner on Windows and getting unexpected errors. I’ll definitely need to add documentation for that specifically.

I do think there are issues that could occur with the next milestone, and I’m going to lay out a more comprehensive and detailed overview of it to avoid the problem of a lack of clear structure/organization.


The MySQL slowness is really quite weird. Have you tried turning on SQL logging to see what it’s actually waiting on? The test runner’s --debug-sql only works for tests, but you can hackily enable all logging by sticking:

import logging

logger = logging.getLogger('django.db.backends')
logger.setLevel(logging.DEBUG)
logger.addHandler(logging.StreamHandler())

at the top of runtests.py.

What specific failures are you seeing? If you give me a list I can spend some time helping you debug them :+1:

An unrelated issue that could arise in the future is developers writing tests without knowing the limitations of the parallel test runner on Windows and getting unexpected errors. I’ll definitely need to add documentation for that specifically.

Given that two major platforms (macOS and Windows) don’t work well with forking, would it be a bad idea to use spawn instead of fork on all platforms? These are not really limitations of the Windows parallel runner, but rather bugs that only surface when you don’t run with fork().

I’ll try SQL logging when I test MySQL again. Hopefully, it’ll clear up where the performance drop is. Thanks for the idea!

Here’s the exhaustive list of failures:

  • General failures:
    • check_framework: fixed on latest master (still going to push it later today)
  • SQLite:
    • Running the test suite in reverse (failures seem inconsistent; I haven’t had a failure since the latest change in the code base, but do test it on your machine; I’ll push it after this post)
  • PostgreSQL:
    • Reversing multiple_database gives an error
    • m2m_through: addressed below
  • MySQL (After 21 uninterrupted hours, it finally ran the entire test suite)
    • db_functions
    • timezones
    • schema
    • unittest.loader._FailedTest
    • proxy_model_inheritance
    • datetimes
    • admin_views

My thoughts so far on the failures:

The most important ones for me to tackle right now are the MySQL failures, since those are the ones I don’t have full liberty to test due to how slow MySQL is on my machine. I could also use another look at reversing the test suite for SQLite; it doesn’t cause any more failures on my local machine, but you can never be too sure. I’m working on patching the PostgreSQL reverse failure and the m2m_through failure right now.

It would be a major help if you could look at those, Tom! :slight_smile:

check_framework fix

I’m using a CheckModel instance that gets saved to the main process’s default database to record whether the check ran. During the test, I temporarily switch the worker to the main process’s database and check whether that instance exists; for SQLite, I had to clone databases to preserve a copy of the main process’s in-memory database. I haven’t tested on MySQL yet, but this should work out of the box.
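
Roughly, the idea looks like this (a sketch; the model, check, and database alias names here are made up rather than the exact ones in my patch):

from django.core import checks
from django.db import models

class CheckModel(models.Model):
    did_run = models.BooleanField(default=True)

@checks.register()
def my_check(app_configs, **kwargs):
    # Runs in the main process; leaves a row behind as evidence the check ran.
    CheckModel.objects.create()
    return []

# In the worker, the test then looks the row up on the main process's database:
#     self.assertTrue(CheckModel.objects.using("main_default").exists())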

Moving to spawn across all platforms

Switching to spawn on all platforms would make developing the parallel test runner much easier going forward, since we could safely assume spawn-only usage. I think this merits at least a full consensus on the mailing list, since it is a huge change.
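
For concreteness, forcing the start method comes down to requesting a spawn context when building the worker pool (a sketch, not the actual patch; the helper name is made up):

import multiprocessing

def make_worker_pool(processes, initializer, initargs):
    # Use spawn everywhere, even on Linux where fork is the default.
    ctx = multiprocessing.get_context("spawn")
    return ctx.Pool(processes=processes, initializer=initializer, initargs=initargs)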

After multiple look-throughs, the issue with the m2m_through failure boils down to the QuerySet API and how it handles order_by() calls.

from datetime import datetime

from django.db import models

# Person, Group and Membership are the other models in the same test app.
class CustomMembership(models.Model):
    person = models.ForeignKey(
        Person,
        models.CASCADE,
        db_column="custom_person_column",
        related_name="custom_person_related_name",
    )
    group = models.ForeignKey(Group, models.CASCADE)
    weird_fk = models.ForeignKey(Membership, models.SET_NULL, null=True)
    date_joined = models.DateTimeField(default=datetime.now)

    class Meta:
        db_table = "test_table"
        # ordering = ["date_joined"]  # commenting this out fixes the failures

test_order_by_relational_field_through_model checks whether the QuerySet call respects the ordering specified on the related model, as stated in the docs. Whether I run this test in parallel or non-parallel mode, it fails because the generated SQL prioritizes CustomMembership’s date_joined ordering over the related Person model’s own ordering by name. I’m not sure how this succeeds on Linux.
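
To illustrate what I mean (a hypothetical snippet, not the exact test body):

# Ordering by the relation is documented to fall back to the related model's
# Meta.ordering (Person's "name" here), but on my Windows runs the generated SQL
# orders by CustomMembership's own "date_joined" instead.
memberships = CustomMembership.objects.order_by("person")
print(memberships.query)  # inspect which ORDER BY clause is actually emitted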

Either the intended behavior is for CustomMembership’s ordering to override the related model’s and the docs need to be updated, or it’s the other way around and the QuerySet API needs to be fixed. I’m happy to open a ticket, but I want confirmation first. @carltongibson, I would love to hear your thoughts on this.

Hmmm. I need to give it a run. I can do that tomorrow, but that test has been there a while — the question is why’s it failing on the branch? :thinking:

I definitely think it warrants further investigation. I am getting the same failure on my main django master branch on Windows (this one doesn’t have any changes related to GSoC).

I’ll try to investigate further on my end to make more sense of the failure.

OK, I’ll have a play in the morning. If I can reproduce we’ll open a ticket for that failure. :+1:

Do make sure to re-run the test multiple times. It’s a sneaky test that sometimes passes or fails depending on two of the model instances having the same date_joined value. The SQL query then interchanges them during tests, since they have the exact same date.

All GSoC documentation is up here.

I also think my proposal’s timeline wasn’t accurate or representative of the problems I ended up facing in implementation.

This is to be expected; please don’t feel like this is a failing. The plan was an outline based on a (relatively) unknown topic, with only a ‘best guess’ of what issues you might face in the future. We’re also living in very uncertain and potentially upsetting times, and so our own well-being (both physical and mental) needs to be looked after more than ever.

I read your blog post; I’d encourage you to keep the ‘work breakdown structure’ at a level that can be consumed by the audience at a glance. Focus on key milestones/objectives and then flag any that are at risk of being missed, why, and what support you need. Finally, don’t forget to celebrate your successes!

It has been an awesome learning experience though to say the least

This is what really matters :+1:

p.s. all, please say if you disagree with my guidance!
