[GSOC 2020] Parallel Test Runner

Regarding dependencies in the plan, the only parts of the project that have to be done sequentially are timings and then database cloning, in that order. They’re nice to have, but they’re not part of the MVP.

The MVP, of course, is just the original GSoC project.

What I need from others this month is any final review comments on the original GSoC project: anything that’s missing or that could be improved.

I’m very excited about this too! We’re almost there indeed!

Hi all!

Last week’s tasks

  • Ticket 31811 pending merge/review
  • Oracle failures patched
  • MySQL failures not yet patched

I had an issue with my ISP over the last week (it seems to be a country-wide problem at the moment; my friends are having the same issues), but it’s back to normal now. I’ve fixed the last three Oracle failures and made changes to the PR for ticket 31811, so fingers crossed it gets merged this week. I’ll be taking the opportunity this week to finish MySQL properly.

Week at a glance:

  • Patch MySQL failures
  • Parallelize PostgreSQL cloning
  • Iron out mysqlpump failures

I might finish these tasks earlier than scheduled; if so, I’ll start tackling next week’s tasks. I want to finish ahead of schedule if possible, to leave more room for final edits from reviews.


Hi @Valz! The --timing PR is nearly there. There’s just a question over the setup_databases() signature change, and matching docs tweak. See Mariusz’ last review. Are you good to address those points?

Yup, will get right on it! I hadn’t thought about the implications of the signature change.
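For context on why a signature change matters: below is a minimal sketch, assuming the timing hook is passed in as an optional keyword-only argument with a no-op default, so existing callers of the function keep working unchanged. TimeKeeper, NullTimeKeeper and this setup_databases() are simplified, hypothetical stand-ins written for illustration, not the code from the PR or from Django.

```python
import time
from contextlib import contextmanager


class NullTimeKeeper:
    """Hypothetical no-op keeper used when timing is not requested."""

    @contextmanager
    def timed(self, name):
        yield

    def results(self):
        return {}


class TimeKeeper:
    """Hypothetical keeper that records wall-clock durations per named step."""

    def __init__(self):
        self.records = {}

    @contextmanager
    def timed(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            self.records.setdefault(name, []).append(time.perf_counter() - start)

    def results(self):
        return dict(self.records)


def setup_databases(aliases, *, time_keeper=None):
    """Hypothetical setup step. The new parameter is keyword-only and
    defaults to a no-op keeper, so old call sites don't break."""
    if time_keeper is None:
        time_keeper = NullTimeKeeper()
    created = []
    for alias in aliases:
        with time_keeper.timed(f"Creating '{alias}'"):
            created.append(f"test_{alias}")  # stand-in for real DB creation
    return created
```

Calling setup_databases(["default"]) behaves exactly as before, while setup_databases(["default", "other"], time_keeper=TimeKeeper()) additionally records one duration per alias.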

Hey Ahmad,
Sorry that I’ve been silent for the last couple of weeks (life :frowning: ). I’m catching up on the discussion here, and it seems that you’ve made a lot of progress :tada:. I also really like the addition of --timing :smiley:, great work.

Are you able to push an update to your GSOC branch? It’s been a while since the last push and I’d like to continue to review it as changes are made, which should make it easier to merge later on.

Also what’s the deal with #31804? The PR was closed but I’m not clear why. Is it not required anymore?

Hi Tom!

I’ve pushed the latest edits. Feel free to take a look! One thing I’m planning to change is the block here, to make it cleaner. I’m considering a pattern such as connection.setup_worker_connection(), where all of the spawn logic can live, instead of the current messy if connection.vendor == 'backend' checks.
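A rough sketch of the idea, assuming each backend overrides a setup_worker_connection(worker_id) method so the worker initializer makes one polymorphic call instead of branching on connection.vendor. The classes, the method body and the naming scheme for clones are hypothetical, for illustration only.

```python
class BaseConnection:
    """Hypothetical stand-in for a backend connection wrapper."""

    vendor = "base"

    def __init__(self, settings_dict):
        self.settings_dict = dict(settings_dict)

    def setup_worker_connection(self, worker_id):
        # Default behaviour: point the worker at its own numbered clone.
        self.settings_dict["NAME"] = f"{self.settings_dict['NAME']}_{worker_id}"


class SQLiteConnection(BaseConnection):
    vendor = "sqlite"

    def setup_worker_connection(self, worker_id):
        # Hypothetical: assumes each worker's clone was copied to its own file.
        self.settings_dict["NAME"] = f"{self.settings_dict['NAME']}_{worker_id}.sqlite3"


def init_worker(connections, worker_id):
    # One call per connection replaces `if connection.vendor == ...` branches;
    # backend-specific spawn logic lives in each backend's override.
    for connection in connections:
        connection.setup_worker_connection(worker_id)
```

Adding a new backend then means adding an override rather than another branch in the worker initializer.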

I closed #31804 temporarily because I messed up the git commit history when I tried to rebase/squash. I’m still a newbie at squashing, so I deleted that branch and recreated it from scratch. I’ll reopen it once I finish the PostgreSQL edits tonight.


Hello all!
I’m posting this week’s update a bit late because I wanted to double-check the current state of play, as well as this week’s tasks and their deadlines.

We’re nearly at the end of the GSoC program. Next week’s post will be the last official GSoC post, but expect more posts from me on regular tickets and pull requests :smiley:

Last week’s tasks

  • MySQL failures have been patched in ticket 31888
  • mysqlpump failures have not been ironed out yet.
  • PostgreSQL has been tricky to parallelize

Week at a glance:

  • Splitting up Oracle cloning to its own PR (Monday)
  • Using pg_dump to clone PostgreSQL databases in 31804 (Monday)
  • Adding documentation (Tuesday)
  • Implementing any code review comments on ticket_31169 and finalizing outstanding issues (Friday)

State of ticket 31804 (Parallelizing cloning)

PostgreSQL

After tinkering with PostgreSQL and re-reading the PostgreSQL documentation I linked above, I found that template databases are not meant to support concurrent cloning. What we can do instead is follow a pattern similar to MySQL’s parallel cloning and use the pg_dump utility: generate a single dump and restore it in parallel into multiple databases. Whether this approach is faster or better than the current one is yet to be benchmarked. We can discuss this further on the pull request, which will have been reopened by the time this is posted.
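The dump-once, restore-in-parallel idea might look roughly like this. It's a sketch under assumptions: connection details (host, user, password) come from the environment (e.g. PGHOST/PGUSER), the dump uses pg_dump’s custom archive format so pg_restore can read it, and clone_databases() plus the injectable run callable are hypothetical names chosen so the command flow can be checked without a live server. No claim is made about performance.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def dump_command(source_db, dump_path):
    # Custom-format archive, readable by pg_restore.
    return ["pg_dump", "--format=custom", f"--file={dump_path}", source_db]


def restore_commands(dump_path, target_db):
    # Create the empty target, then load the archive into it.
    return [
        ["createdb", target_db],
        ["pg_restore", "--no-owner", f"--dbname={target_db}", dump_path],
    ]


def clone_databases(source_db, targets, dump_path, run=None, max_workers=4):
    """Hypothetical helper: dump source_db once, then restore the same
    archive into every target concurrently."""
    if run is None:
        def run(cmd):
            subprocess.run(cmd, check=True)

    run(dump_command(source_db, dump_path))  # single, sequential dump

    def restore(target):
        for cmd in restore_commands(dump_path, target):
            run(cmd)
        return target

    # Each target is restored in its own thread; results keep input order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(restore, targets))
```

Only the source database is read once; the parallel part is the restores, which is what would need benchmarking against template cloning.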

MySQL

Since mysqlpump generates errors that don’t occur with mysqldump, I’m going to temporarily switch to mysqldump for ticket_31804 until I can find a way to fix the foreign key constraint issue.
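A minimal sketch of cloning with mysqldump, assuming connection options are supplied via a MySQL option file (e.g. ~/.my.cnf), the target databases already exist, and the dump is small enough to hold in memory. --routines and --events are real mysqldump options; clone_with_mysqldump() and its run parameter are hypothetical names for illustration.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def mysqldump_command(source_db):
    # Include stored routines and scheduled events in the dump.
    return ["mysqldump", "--routines", "--events", source_db]


def mysql_load_command(target_db):
    return ["mysql", target_db]


def clone_with_mysqldump(source_db, targets, run=None, max_workers=4):
    """Hypothetical helper: dump once with mysqldump, then replay the
    same SQL into each (already created) target database in parallel."""
    if run is None:
        def run(cmd, data=None):
            # Feed `data` on stdin and return whatever the command printed.
            return subprocess.run(
                cmd, input=data, stdout=subprocess.PIPE, check=True
            ).stdout

    dump = run(mysqldump_command(source_db))  # dump held in memory

    def load(target):
        run(mysql_load_command(target), data=dump)
        return target

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(load, targets))
```

Keeping the dump in memory avoids temporary files; for large databases, writing it to a file and streaming it to each mysql process would be the safer choice.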

Oracle

I’ll be adding Oracle to ticket_31804 once Oracle’s own cloning ticket is merged.
