GSoC Parallel Test Runner

Hello everyone,

My name is Ahmad. I’m a Computer Engineering and Science sophomore at the German University in Cairo.

I read through the GSoC ideas list, and the parallel test runner stood out to me as one of the more interesting project ideas. I’m going to start going through the testing framework tickets on Trac to familiarize myself with it, but I’d definitely love any pointers or advice anyone can throw at me.

P.S. I seriously love how friendly and open to discussion everyone is, whether on the mailing list or on the forums; I was a bit intimidated at first, but that faded away quickly.

Hi Ahmad.

Cool! I think this is a great topic.

Previously we’ve been able to run the test suite in parallel on Linux and macOS.

With Python 3.8, the default (multiprocessing) start method on macOS went from fork to spawn, because fork is considered unsafe on macOS and can lead to crashes of the child process. For now, you can work around it by setting it back to fork, but this means we’re down to just Linux for parallel test execution (on Python 3.8+).

It would be good to get it going with the spawn method, so we can run at full speed on macOS and Windows too.
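A minimal sketch of the stopgap described above, assuming the runner creates its own multiprocessing context instead of calling `multiprocessing.set_start_method()` globally (which raises `RuntimeError` if the start method has already been set, unless `force=True` is passed). `pick_context()` and `prefer_fork` are hypothetical names, not part of Django.

```python
import multiprocessing as mp


def pick_context(prefer_fork):
    """Return a multiprocessing context for test workers (hypothetical helper).

    "fork" only exists on POSIX platforms; "spawn" is available everywhere
    and is the default on Windows and, from Python 3.8, on macOS.
    """
    if prefer_fork and "fork" in mp.get_all_start_methods():
        # The workaround: opt back into fork where the platform offers it.
        return mp.get_context("fork")
    # The goal of the project: have the suite pass with spawn everywhere.
    return mp.get_context("spawn")


if __name__ == "__main__":
    print("default start method:", mp.get_start_method())
    print("chosen:", pick_context(prefer_fork=True).get_start_method())
```

Using a context object keeps the choice local to the test runner, so it does not change the start method for any other code in the same process.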

On macOS with Python 3.8+, or on Windows (with any supported Python), run the Django test suite, or a project’s tests with ./manage.py test, using the --parallel=4 (say) flag. You’ll see lots of errors (most are AppRegistry issues).

The trick will be to work out how to adjust django/test/runner.py::_init_worker() to set the right data.
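Because a spawned worker only receives what is explicitly handed to it, one option for an initializer like `_init_worker()` is to pass a picklable snapshot of parent-side state through the pool's `initargs`. This is a minimal sketch of that pattern, with a plain dict standing in for whatever data the real runner needs; `init_worker`, `read_state` and `run_pool` are hypothetical names.

```python
import multiprocessing as mp

# Per-process state, filled in by the pool initializer inside each worker.
_worker_state = {}


def init_worker(state):
    # Runs once in every worker before it handles any tasks. Under "spawn",
    # this pickled argument is the only route for the parent's data.
    _worker_state.update(state)


def read_state(key):
    # A task that reports what this worker was initialised with.
    return _worker_state.get(key)


def run_pool(method, state, tasks=4):
    ctx = mp.get_context(method)
    with ctx.Pool(processes=2, initializer=init_worker, initargs=(state,)) as pool:
        return pool.map(read_state, ["db_name"] * tasks)
```

The same call works with `"fork"` and `"spawn"`; with spawn it should be made from inside an `if __name__ == "__main__":` block of a script, since spawned children re-import the main module.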

It’s a nice project because you get to dig right into the multiprocessing module: what exactly is the difference between fork and spawn? (OK, “environment and file descriptors are not shared…” — but what does that mean? :slightly_smiling_face:) Then there’s working out how to get the right DB connections to each worker, and… And then there’s testing across platforms. And…
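To make the difference concrete: a forked child starts as a copy of the parent's memory, while a spawned child starts a fresh interpreter and re-imports the main module, so module-level state the parent changed after start-up is lost. A small sketch, with a hypothetical `SETUP_DONE` dict standing in for something like the app registry:

```python
import multiprocessing as mp

# Module-level state, standing in (hypothetically) for Django's app registry.
SETUP_DONE = {"ready": False}


def setup():
    # Parent-side set-up, loosely analogous to django.setup() in the main process.
    SETUP_DONE["ready"] = True


def is_ready():
    # Executed in the child: reports the child's own copy of the state.
    return SETUP_DONE["ready"]


def child_sees_setup(method):
    """Set up in the parent, then ask a child started with `method` what it sees."""
    setup()
    ctx = mp.get_context(method)
    with ctx.Pool(processes=1) as pool:
        return pool.apply(is_ready)
```

Run as a script, `child_sees_setup("fork")` returns `True` and `child_sees_setup("spawn")` returns `False`: the spawned child re-executes the module, so `SETUP_DONE` is back to its initial value. As with any spawn code, call it from under `if __name__ == "__main__":`.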

I’d first of all review the original ticket and discussions. Did this issue come up when it was first implemented? (Windows support would surely have been mentioned…)

Hopefully that gets you started. :slightly_smiling_face:

Kind Regards,

Carlton

1 Like

Hello, Carlton.
My name is Ichlasul Affan. I’m a first-year Computer Science master’s (magister) student at the University of Indonesia. I’ve been looking through the GSoC project ideas and am interested in fixing parallel testing on Windows, especially because Windows is my main OS when I work on Django projects.

Firstly, thank you very much for your initial advice. Just like Ahmad, I’m feeling a bit intimidated about how to get started contributing to Django. I’m currently working on one Trac issue, adding a warning for UniqueConstraint on MySQL (an easy-pickings ticket), to get familiar with how to write code for Django. Is it worth continuing with that issue? I didn’t find any new unreviewed issues for the test framework when I started picking issues on Trac.

Secondly, I have read some of the code in django/test/runner.py and django/core/management/__init__.py. It shows that Django first populates the app registry by calling django.setup(), and then runs the specific command (in this case, creating a DiscoverRunner). Is it okay to start by calling django.setup() in each worker process started with the spawn method? Please correct me if I’m wrong. :smile:

That’s all for now. Any further advice is very appreciated. Thanks.

Hi @ichlaffterlalu. Welcome.

setup() ultimately just sets some state, so maybe we can reuse that state rather than calling it again, but yes, we need the app registry populated in each worker.
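A minimal sketch of what a spawn-safe worker initializer might do first, assuming the worker is told which settings module to use. `init_spawned_worker` and `"myproject.settings"` are hypothetical; `django.setup()` is the real entry point that configures settings and logging and populates the app registry.

```python
import os


def init_spawned_worker(settings_module):
    """Hypothetical initializer for a test worker started with "spawn".

    A spawned worker begins with a fresh interpreter, so it has neither the
    parent's configured settings nor its populated app registry.
    """
    # Imported lazily so this module can be imported without Django present.
    import django

    # Fallback only: a spawned child usually inherits the parent's
    # environment variables, in which case this leaves the value unchanged.
    os.environ.setdefault("DJANGO_SETTINGS_MODULE", settings_module)
    # Configures settings and logging, then populates django.apps.apps.
    django.setup()
```

It would be wired in with something like `multiprocessing.get_context("spawn").Pool(4, initializer=init_spawned_worker, initargs=("myproject.settings",))`. Note that this only re-reads the settings module: anything the parent changed at runtime (for example, test database names) would still have to be passed in separately, which is where reusing the parent's state comes in.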

Equally we need to make sure each worker is using the correct database connection (there’s one cloned for each worker).

Part of the project is working out these issues, and how to test them…

@carltongibson

I’ve read through the documentation and the PR that implemented the parallel test runner itself. The difference between spawn and fork seems to be whether or not child processes inherit state from the parent, which in our case means that state has to be set up per process.

Wouldn’t each worker use the correct database connection because of get_test_db_clone_settings()? The only situation I can see right now where that fails is if the cloning process itself failed and get_test_db_clone_settings() returned incorrectly configured connection settings.
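One possible gap under spawn: with fork, a worker inherits connection settings the parent has already rewritten (such as the test database name), whereas a spawned worker would re-import the settings module and likely see the original values. Either way, each worker also needs its own ID before clone settings can be applied. This is a minimal, Django-free sketch of handing out IDs through a shared counter in the pool initializer. `clone_settings()` is a hypothetical stand-in for `get_test_db_clone_settings()`, and its `NAME_<id>` naming is only an illustration.

```python
import multiprocessing as mp

_worker_id = None


def init_worker(counter):
    # Give each worker a unique, 1-based ID. A synchronized Value passed via
    # initargs reaches the child under both "fork" and "spawn".
    global _worker_id
    with counter.get_lock():
        counter.value += 1
        _worker_id = counter.value


def clone_settings(base, worker_id):
    # Hypothetical: copy the settings and suffix the database name with the ID.
    settings = dict(base)
    settings["NAME"] = f"{base['NAME']}_{worker_id}"
    return settings


def worker_db_name(base):
    # Executed in a worker: which clone database would this worker use?
    return clone_settings(base, _worker_id)["NAME"]


def db_names(method, base, workers=2):
    ctx = mp.get_context(method)
    counter = ctx.Value("i", 0)
    with ctx.Pool(workers, initializer=init_worker, initargs=(counter,)) as pool:
        return set(pool.map(worker_db_name, [base] * 8))
```

The result is a subset of `{"test_db_1", "test_db_2"}` for two workers; which IDs appear depends on how the pool distributes tasks.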

Is there something I’m missing about the difference between spawn and fork affecting worker initialization?

Hi @Valz. Yeah… it’s just a case of initialising the workers correctly. (I didn’t look into the DB connection handling as yet…)

Alright, thank you for your reply.

I wanted to ask: should I submit my proposal on this forum or on the django-developers mailing list?

I was also considering expanding the scope of this proposal to bring Oracle into the parallel test runner. This would of course mean a lot more testing, but I believe Oracle users would see a big speed-up when running tests. I’ve already read through the original PR’s discussion about implementing Oracle cloning and how much of a headache that was compared to other databases, but it seems like a feature with a lot of support. I can of course ask on the django-developers mailing list to confirm this.

It needs to go to the GSoC portal to be submitted.

I would offer to read a draft, but realistically, with the current COVID-19 situation, I really don’t have the capacity.

It’s okay. Hopefully the situation will get better.

I’ll submit it on the GSoC portal later then. Is it in line with community guidelines to ask for someone to review my draft on the mailing list or would that be off-topic?

There’s no problem in asking. :slightly_smiling_face:

Alright great :slight_smile:

I wanted an explicit blessing because I understand Django has a lot of structure.

Thanks again for your replies!

1 Like