[GSOC 2020] Parallel Test Runner

I definitely agree. Adding a more detailed structure with key milestones and timings, so everyone knows when I would need a code review or similar things, would make this process a lot easier. I’m largely still following the same outline I laid out initially in my proposal, but conflicts with my university exams ruined that timeline. I also think my proposal’s timeline wasn’t accurate or representative of the problems I ended up facing in implementation. It has been an awesome learning experience though to say the least :smiley:

The next problem I foresee coming up is MySQL; I’ve left the test runner open since my post earlier today, and it is still setting up the databases, though it has finally created the ‘other’ database. I’m in the middle of setting up a Linux machine to test MySQL since my current computer is taking an absurd amount of time just to set the databases up.

I’m planning on patching all of the MySQL failures over the next couple of days, and revisiting the last PostgreSQL failure right after.

An unrelated issue that could arise in the future is developers writing tests without knowing the limitations of the parallel test runner on Windows and getting unexpected errors. I’ll definitely need to add documentation specifically for that.

I do think there are issues that could occur with the next milestone, and I’m going to try to lay out a more comprehensive and detailed overview of the next milestone to avoid the problem of a lack of clear structure/organization.

1 Like

The MySQL slowness is really quite weird. Have you tried turning on SQL logging to see what it’s actually waiting on? The test runner’s --debug-sql only works for tests, but you can hackily enable all query logging by sticking:

import logging

logger = logging.getLogger('django.db.backends')
logger.setLevel(logging.DEBUG)
logger.addHandler(logging.StreamHandler())

at the top of runtests.py.

What specific failures are you seeing? If you give me a list I can spend some time helping you debug them :+1:

An unrelated issue that could arise in the future is developers writing tests without knowing the limitations of the parallel test runner on Windows and getting unexpected errors. I’ll definitely need to add documentation specifically for that.

Given that two major platforms (macOS and Windows) don’t work well with forking, would it be a bad idea to use spawn instead of fork on all platforms? These are not really limitations of the Windows parallel runner, but rather bugs that only surface when you don’t run with fork().

I’ll try SQL logging when I test MySQL again. Hopefully, it’ll clear up where the performance drop is. Thanks for the idea!

Here’s the exhaustive list of failures:

  • General failures:
    • check_framework: fixed on latest master (still going to push it later today)
  • SQLite:
    • Running the test suite in reverse (failures seem inconsistent; I haven’t had a failure since the latest change to the code base, but do test it on your machine; I’ll push it after this post)
  • PostgreSQL:
    • Reversing multiple_database gives an error
    • m2m_through: addressed below
  • MySQL (after 21 uninterrupted hours, it finally ran the entire test suite):
    • db_functions
    • timezones
    • schema
    • unittest.loader._FailedTest
    • proxy_model_inheritance
    • datetimes
    • admin_views

My thoughts so far on the failures:

The most important ones for me to tackle right now are the MySQL failures, since those are the ones I don’t have complete liberty to test due to the relative slowness of MySQL on my machine. I could also use another look at reversing the test suite for SQLite; it doesn’t cause any more failures on my local machine, but you can never be too sure. I’m working on patching the PostgreSQL reverse failure and the m2m_through failure right now.

It would be a major help if you could look at those, Tom! :slight_smile:

check_framework fix

I’m using a CheckModel that gets saved on the main process’s default database to check whether the check actually ran. During tests, I temporarily switch the worker to the main process’s database and check whether that model exists; for SQLite, I had to clone databases to save a copy of the main process’s in-memory database. I haven’t tested on MySQL yet, but this should work out of the box.
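To make that a bit more concrete, here’s a rough sketch of the idea (the model and helper names are simplified placeholders, not the exact code I’ll be pushing):

from django.db import models


class CheckModel(models.Model):
    # Placeholder marker model: a row gets saved on the main process's
    # default database before the workers spawn, recording that the
    # system checks actually ran.
    name = models.CharField(max_length=50)


def checks_ran_in_main_process():
    # Sketch only: from a worker, query the main process's database alias
    # and verify the marker row exists there.
    return CheckModel.objects.using("default").filter(name="checks_ran").exists()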

Moving to spawn across all platforms

Switching to spawn on all platforms would make developing the parallel test runner much easier moving forward, since we could safely assume spawn-only usage. I think this merits at least a full consensus on the mailing list since it is a huge change.
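For anyone following along, the change itself is small at the multiprocessing level; a minimal sketch of what forcing spawn looks like (not the actual patch):

import multiprocessing

# Request the 'spawn' start method explicitly instead of relying on the
# platform default, so workers behave the same on Linux, macOS and Windows.
ctx = multiprocessing.get_context("spawn")
pool = ctx.Pool(processes=4)

The real cost isn’t the switch itself but that spawned workers re-import everything instead of inheriting state, which is exactly what this project has been dealing with on Windows.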

After multiple look-throughs, the issue with the m2m_through failure boils down to the QuerySet API and how it handles order_by() calls.

class CustomMembership(models.Model):
    person = models.ForeignKey(
        Person,
        models.CASCADE,
        db_column="custom_person_column",
        related_name="custom_person_related_name",
    )
    group = models.ForeignKey(Group, models.CASCADE)
    weird_fk = models.ForeignKey(Membership, models.SET_NULL, null=True)
    date_joined = models.DateTimeField(default=datetime.now)

    class Meta:
        db_table = "test_table"
    #    ordering = ["date_joined"] (commenting this out fixes the failures)

test_order_by_relational_field_through_model tests whether the QuerySet call respects the ordering specified in the related model, as stated in the docs. When running this test in parallel or non-parallel mode, it fails because the SQL itself prioritizes CustomMembership’s date_joined ordering over the related Person model’s own ordering by name. I’m not sure how this succeeds on Linux.
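To see which ordering actually wins, inspecting the compiled SQL is the quickest check; a rough sketch (the related-manager and field names here are assumptions for illustration, not necessarily the exact ones in the test):

# Print the SQL Django generates for the queryset under test and look at
# which column leads the ORDER BY clause: date_joined or the person's name.
qs = rock.custom_members.order_by("custom_person_related_name")
print(qs.query)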

Either the intended behavior is for CustomMembership’s ordering to override the related model’s and the docs need to be updated, or it’s the other way around and the QuerySet API needs to be fixed. Happy to open a ticket, but I’d like confirmation first. @carltongibson, I would love to hear your thoughts on this.

Hmmm. I need to give it a run. I can do that tomorrow, but that test has been there a while — the question is why’s it failing on the branch? :thinking:

I definitely think it warrants further investigation. I am getting the same failure on my main django master branch on Windows (this one doesn’t have any changes related to GSoC).

I’ll try to investigate further on my end to make more sense of the failure.

OK, I’ll have a play in the morning. If I can reproduce we’ll open a ticket for that failure. :+1:

Do make sure to re-run the test multiple times. It’s a sneaky test that sometimes passes or fails depending on two of the models ending up with the same date_joined value; the SQL query then returns them in either order since they have the exact same date.

All GSoC documentation is up here.

I also think my proposal’s timeline wasn’t accurate or representative of the problems I ended up facing in implementation.

This is to be expected; please don’t feel like this is a failing. The plan was an outline based on a (relatively) unknown topic, with only a ‘best guess’ of what issues you might face. We’re also living in very uncertain and potentially upsetting times, so our own well-being (both physical and mental) needs to be looked after more than ever.

I read your blog post. I’d encourage you to keep the ‘work breakdown structure’ at a level that can be consumed by the audience at a glance. Focus on key milestones / objectives, then flag any that are at risk of being missed, why, and what support you need. Finally, don’t forget to celebrate your successes!

It has been an awesome learning experience though to say the least

This is what really matters :+1:

p.s. all, please say if you disagree with my guidance!

1 Like

p.s. I’ve just run the test suite on Django master on Windows 10 with Postgres 12.

======================================================================
FAIL: test_order_by_relational_field_through_model (m2m_through.tests.M2mThroughTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\Users\smith\PycharmProjects\django2\tests\m2m_through\tests.py", line 247, in test_order_by_relational_field_through_model
    [self.jim, self.bob]
AssertionError: Sequences differ: <QuerySet [<Person: Person object (1)>, <Person: Person object (2)>]> != [<Person: Person object (2)>, <Person: Person object (1)>]

First differing element 0:
<Person: Person object (1)>
<Person: Person object (2)>

- <QuerySet [<Person: Person object (1)>, <Person: Person object (2)>]>
? ----------                         ^                            ^   -

+ [<Person: Person object (2)>, <Person: Person object (1)>]
?                          ^                            ^

OK, ticket for the m2m test failure https://code.djangoproject.com/ticket/31752

2 Likes

PR, as suggested by Mariusz https://github.com/django/django/pull/13126

2 Likes

Don’t worry :smiley: I’m not upset over it, nor do I feel like it’s a failing. In fact, I’m kind of glad I can now see that my original proposal’s timeline was inadequate; it means I have a better understanding of the tools I’m working with. I take pride in that process of learning.

Will do! I want it to be as clear as possible so everyone can easily drop in and see everything at a glance. I can show you a draft later to get your opinion on it.

Thanks David!

Thanks Ahmad. Let me know when you have something, I’m happy to support.

On a separate topic, I’ve just written some comments on a ticket about adding pytest as an alternative / supplementary test runner. Now, whilst I think the answer to the next question should be “absolutely not” and “out of scope”, I thought I’d put it here “just in case”.

IF (that is a big if) the project chooses to head towards pytest in the future, even if it is only as a supplementary test runner, does this work still give us benefits? I’m really unclear how something like pytest-xdist would/could help with this project.

I’m completely out of my depth on a technical level here, so as I say, just putting it out here “just in case”.

David

Well, with pytest, we wouldn’t have to implement our own parallel test runner, since we’d get that supported out of the box on all major platforms through pytest-xdist, rendering both our existing custom test runner and this entire project redundant.

A huge benefit of pytest is the ecosystem around it. The plugins are amazing. A lot of tickets on the issue tracker more or less amount, after discussion, to “implement a feature that already exists in pytest”. Of course, one could argue that the reason Django doesn’t have a similar ecosystem is that we need to refactor our code to make it easier for others to modify our test runner.

If we assume that pytest is the definitive future of Python testing, then it’s simply a matter of time until we adopt it, and a matter of much more time until some of our users update their Django installations to adopt it as well. I don’t think we’ll see an adequate Django pytest solution in time for this project to be considered redundant. Knowing Django’s release cycles and deprecation policies, we’d still have a good stretch where people wouldn’t move towards pytest and would instead mostly depend on our own test runner for their tests. Having the option to run those tests in parallel would be a godsend for them.

Hello all, after talking to David about the next milestone, here is the general workflow we’ve developed:

Work structure for the next milestone

  • First week (July 4th-11th):

    • Work out existing MySQL failures
    • Implementing a basic _clone_test_db method
  • Second week (July 11th-18th):

    • Patch remaining MySQL failures
    • Start testing Oracle with spawn
  • Third week (July 18th-25th):

    • Done with MySQL and need a code review for MySQL & Oracle cloning method
    • Patching Oracle failures
  • Fourth week (July 25th-27th):

    • Testing Oracle under fork
    • Finalizing all test runner-related failures
    • General code review for the project till now

I’m not sure if these code review dates are suitable for the two of you, @orf and @adamchainz. Let me know how you would like for me to restructure them.

This week at a glance

Current tasks for this week are as follows:

  • SQLite and PostgreSQL:
    • Ensure the test runner can reverse them
  • MySQL:
    • Get an idea of the patterns behind the failures since they’re mostly the same
  • Oracle:
    • Implement the _clone_test_db method

Current issue

Furthermore, these are the current observations/notes I have about the SQLite reverse failure:
Errors are mostly caused by database access and raise this error:

django.db.utils.NotSupportedError: SQLite schema editor cannot be used while foreign key constraint checks are enabled. Make sure to disable them before entering a transaction.atomic() context because SQLite does not support disabling them in the middle of a multi-statement transaction. 

This makes me think there’s a problem with database connection setup for SQLite.
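One quick way to sanity-check that theory would be to look at the foreign_keys pragma on the worker’s connection right before the failing schema operation (a minimal sketch, not something I’ve wired in yet):

from django.db import connection

# See whether SQLite foreign key checks are still enabled on this connection
# at the point where the schema editor is about to be used.
with connection.cursor() as cursor:
    cursor.execute("PRAGMA foreign_keys")
    print("foreign_keys enabled:", cursor.fetchone()[0])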

There are also two other errors, both HTTP 500 server errors raised in the servers test app, namely servers.tests.LiveServerDatabase and LiveServerThreadedTests.

I have no current working theory for why these errors only appear when reversing the test suite. I will update when I have more information.

I’ll also update my progress on the other tasks as I start to tackle them.

Python 3.6 compatibility

Since backup() isn’t supported in Python versions earlier than 3.7, this project will need to pivot towards using VACUUM INTO to maintain compatibility.
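For context, the backup() approach being replaced looks roughly like this (Python 3.7+ only; the file and database names are just illustrative):

import sqlite3

# Copy an on-disk source database into a shared in-memory database.
source = sqlite3.connect("test_db.sqlite3")
target = sqlite3.connect("file:memorydb_default_1?mode=memory&cache=shared", uri=True)
source.backup(target)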

I’m having a slight hiccup making VACUUM INTO work for copying an on-disk database into an in-memory one. When executing this code snippet, VACUUM INTO creates a 0-byte file literally named ‘file’ instead of copying the on-disk database into the in-memory database at the specified URI filename:

sourcedb.execute('VACUUM INTO ?', (f'file:memorydb_{str(alias)}_{str(_worker_id)}?mode=memory&cache=shared',))

I’m going to read the documentation to debug this. It might just be a simple URI filename check that I need to turn on.
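My current working guess, untested, is that the source connection needs URI handling enabled so the target isn’t treated as a literal file name; something along these lines:

import sqlite3

# Open the source connection with uri=True so filenames passed to statements
# like VACUUM INTO are interpreted as URIs rather than literal file names.
sourcedb = sqlite3.connect("file:test_db.sqlite3", uri=True)
sourcedb.execute(
    "VACUUM INTO ?",
    (f"file:memorydb_{alias}_{_worker_id}?mode=memory&cache=shared",),
)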

That’s all for now. Looking forward to writing up the next update!
Ahmad

2 Likes

These dates look fine to me @Valz :+1:. Awesome work so far! If you push any outstanding changes I can help look into the reverse failures today.

I would make a general suggestion about the project: it seems we are running into a lot of hard-to-debug issues with the test suite. As such I would strongly suggest hard time-boxing the investigations and fixes as these have the danger of sucking up a lot of effort and delaying some of the future work.

If by the end of the project we have a mostly working solution and a list of caveats (MySQL runs slow, SQLite fails when running the test suite backwards, etc etc) then I would still consider it a big success :partying_face:. Those caveats might also include using backup() instead of VACUUM INTO if you cannot resolve any of the related issues that it brings up.

Sorry for the late reply, just saw this now. Will push the latest changes when I get back home later tonight.

Tasks done:

  • Python 3.6 compatibility done (woohoo). It did turn out to be a URI check
  • Implemented a basic Oracle _clone_test_db method. I’ll probably end up refactoring some of the codebase to make it more DRY when I start adding exception handling.

This week at a glance (in priority):

  • Finalize Oracle _clone_test_db method with exception handling
  • Track and fix MySQL failures
  • Fix PostgreSQL reverse failure (time-boxed to middle of the week)

Areas of uncertainty:

  • SQLite reverse (will re-visit after finishing Oracle and MySQL)

Reverse failures update

SQLite’s reverse failures are non-deterministic when I skip the two failing servers test cases. I initially thought the issue was an incorrect order of backup/connect calls (we connect to the database, adding certain tables, then backup overrides those tables), but that theory is a bust. I’m still not sure what the cause is, but like you said, I shouldn’t get stuck on the small details when we have more pressing matters.

PostgreSQL’s reverse failure only shows up when running the tests with spawn; it passes normally with --parallel=1. It also fails when you run the multiple_database test suite in isolation in reverse, so at least it’s consistent in that regard. I think I can figure this one out if I give it some time this week.

Oracle

I’ve implemented a basic _clone_test_db method, but it isn’t completely correct yet; I think I’m missing a couple of important schema changes. I’m still building an understanding of what needs to change between the main_db and the cloned db. I’ll be finalizing those changes and adding exception handling this week.

MySQL

Frankly, I didn’t spend time on the MySQL failures this week because I spent too much time working out the SQLite and PostgreSQL reverse failures (I would’ve saved time there if I had read Tom’s post). I’m going to spend more time on them this week, and I’ll post an update mid-week on whether it’s something I’ll need help with or whether it’s under control.

3 Likes

Hello all, here are a couple of interesting updates:

Week at a glance:

  • MySQL failures finalized (by Thursday)
  • Parallelizing the cloning process (by Wednesday)
  • Better exception handling for Oracle & remaining failures

Oracle updates

The cloning process for Oracle is complete and functional at the moment, including creation and destruction. The exception handling needs to be cleaner and DRY-er (Tom’s comments on the PR solve this really well).

I’m still in the process of doing a full Oracle test suite run with the cloned databases, so I have no errors/failures to report yet. I was stuck on a couple of non-parallel failures, which Mariusz pointed out to me were solved three weeks ago!

Reason for parallelizing the cloning process

There’s an issue with the cloning speed on Oracle: it is very slow. To tackle that, I’m going to parallelize the cloning process for all database backends. This should lead to a noticeable speed-up for Oracle cloning and MySQL (at least). I’m beginning work on this today, and I hope to finish the bulk of it by Wednesday so I can spend the last days fine-tuning it.
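The rough shape I have in mind is to fan out the per-worker clone calls with a pool instead of looping over them serially; a hedged sketch of the idea (the real change will live in the test runner’s database setup, and threads sharing one connection may need more care than shown here):

from concurrent.futures import ThreadPoolExecutor

from django.db import connections


def clone_worker_databases(alias, num_workers, verbosity=1, keepdb=False):
    # Sketch: clone the test database for each worker concurrently instead of
    # one after another. Error handling and integration points are still TBD.
    creation = connections[alias].creation
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        futures = [
            pool.submit(creation.clone_test_db, suffix=i, verbosity=verbosity, keepdb=keepdb)
            for i in range(1, num_workers + 1)
        ]
        for future in futures:
            future.result()  # surface any cloning errors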

MySQL

As for MySQL, the parallel failures were related to timezone errors. A quick Google search showed that this is a common problem related to timezone definitions, so I’ll be testing again after changing my configuration file.

1 Like

Just a note to say, I really enjoy reading updates on your progress. :+1:

1 Like