Importing and transforming a MySQL database

I’m working on a Django app to replace a legacy app built in PHP and MySQL (MariaDB, to be more precise). One of my biggest concerns is how to import the data from the old app. Among them:

  • The new app will use Postgres, not MySQL
  • The new app’s database schema and models are completely different from the legacy app’s schema, so I will need to transform the data as I import
  • The new app’s database might not live on the same server as the legacy app’s database

I currently only have visibility into the legacy database via phpMyAdmin. I can export from there, but I don’t have direct access to the server it lives on. At least, not yet. The available export formats are MediaWiki table, Texy, JSON, OpenDocument, CSV, Word, CodeGen, YAML, XML, LaTeX, PDF, PHP array and SQL.

So I need to export the database somehow and import it into the new app, transforming the data along the way to fit the new schema. What are my options? Are there any tools available that will assist with this?

Right now I’ve written an admin tool to upload exported JSON data from a few of the tables, but this seems like a cumbersome way of handling the imports, and it won’t scale.

Have you seen the Django inspectdb command? It can generate model code from an existing database, at least as a first pass. This could be an alternative technique.

https://docs.djangoproject.com/en/4.0/howto/legacy-databases/

You might be able to export your MySQL DB as CSVs, import them into the new server, and use inspectdb to build models to match. From there you can gradually refactor your database going forward to match the end schema that you want.
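To make that concrete, here is a rough sketch of what the inspectdb step might look like - this assumes the legacy data has already been loaded into a connection Django can reach under a hypothetical `legacy` alias, with a hypothetical `legacy_import` app to hold the generated models (the field names are made up too):

```python
# Hypothetical sketch: generate starting-point models from the imported
# legacy tables. The "legacy" alias and "legacy_import" app are placeholders.
#
#   python manage.py inspectdb --database=legacy > legacy_import/models.py
#
# inspectdb output typically looks something like this, with managed = False
# so Django never tries to create or migrate these tables itself:

from django.db import models


class LegacyCustomer(models.Model):
    customer_id = models.AutoField(primary_key=True)
    name = models.CharField(max_length=255)
    email = models.CharField(max_length=255, blank=True, null=True)
    created = models.DateTimeField(blank=True, null=True)

    class Meta:
        managed = False
        db_table = 'customer'
```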

Interesting - so what I could do, then, is import the data via CSV directly into the database, create models for the legacy data via inspectdb, and then write some kind of task to transform the data into the newer models.

As long as I avoid the access log table (which I don’t want anyway), none of the tables should be excessively large. So this could be feasible.

EDIT: Looking at the legacy tables (ignoring accesslogs), there is one table at ~170 megabytes, one at 72, one at 16, and every other table is 10 megabytes or less. So that’s not too bad.
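If the CSV route works out, one way to load the exported files without hand-writing SQL is a small loader that bulk-inserts rows through the inspectdb-generated models. A minimal sketch, assuming the hypothetical `LegacyCustomer` model and `legacy` alias from above, a `customer.csv` export with a header row, and that the legacy tables themselves already exist (e.g. created from a SQL schema export):

```python
# Hypothetical loader for one phpMyAdmin CSV export. Run it inside the Django
# environment (e.g. from a management command or `manage.py shell`).
import csv

from legacy_import.models import LegacyCustomer  # placeholder inspectdb model


def load_customers(path):
    """Bulk-load a CSV export of the legacy customer table."""
    with open(path, newline='', encoding='utf-8') as fh:
        reader = csv.DictReader(fh)
        rows = [
            LegacyCustomer(
                customer_id=int(row['customer_id']),
                name=row['name'],
                email=row['email'] or None,
            )
            for row in reader
        ]
    # batch_size keeps the individual INSERT statements manageable
    # for the larger tables
    LegacyCustomer.objects.using('legacy').bulk_create(rows, batch_size=1000)
```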

What we do is export the source data as SQL.

I believe there’s an option in phpMyAdmin allowing you to specify whether you want it to include the CREATE statements for the schema - that makes recreating the source database easier.
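Once the source database has been recreated somewhere the new app can reach, it can be registered as a second connection next to the new Postgres database. A sketch of what that might look like in settings.py - every name and credential here is a placeholder:

```python
# settings.py (sketch). Two connections: the new app's Postgres database as
# the default, and the restored legacy MariaDB/MySQL copy under a separate
# "legacy" alias. All names and credentials are placeholders.
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql',
        'NAME': 'newapp',
        'USER': 'newapp',
        'PASSWORD': 'change-me',
        'HOST': 'localhost',
        'PORT': '5432',
    },
    'legacy': {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': 'legacy_db',
        'USER': 'legacy_ro',  # ideally a read-only account
        'PASSWORD': 'change-me',
        'HOST': 'localhost',
        'PORT': '3306',
    },
}

# Router sketched below; the dotted path is a placeholder.
DATABASE_ROUTERS = ['legacy_import.routers.LegacyRouter']
```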

Once that’s done, we use inspectdb to create the models - but in a different app. We then use a custom database router to direct all reads of those models to the legacy database. (This helps ensure that nothing gets accidentally changed in the original data.)
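A minimal version of that kind of router might look like this - the `legacy_import` app label and `legacy` alias are assumptions carried over from the sketches above, and raising on writes is just one way of keeping the data read-only:

```python
# legacy_import/routers.py (sketch)
class LegacyRouter:
    """Send every read of the import app's models to the legacy connection,
    and refuse writes and migrations so the source data stays untouched."""

    app_label = 'legacy_import'  # placeholder app holding the inspectdb models

    def db_for_read(self, model, **hints):
        if model._meta.app_label == self.app_label:
            return 'legacy'
        return None  # no opinion for everything else

    def db_for_write(self, model, **hints):
        if model._meta.app_label == self.app_label:
            raise RuntimeError('The legacy data is read-only')
        return None

    def allow_migrate(self, db, app_label, model_name=None, **hints):
        # Never let migrate create or alter anything in the legacy database,
        # and never create the legacy tables in the default database.
        if db == 'legacy' or app_label == self.app_label:
            return False
        return None
```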

We then create a set of custom management commands to perform the data migrations. (We typically create a separate command for each destination table - it makes it easier to test and validate the individual steps.)
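To illustrate the pattern, here is a stripped-down sketch of one such command. The `LegacyCustomer` source model (inspectdb output) and the `Customer` destination model (with a `legacy_id` field) are both hypothetical:

```python
# newapp/management/commands/import_customers.py (sketch)
from django.core.management.base import BaseCommand
from django.db import transaction

from legacy_import.models import LegacyCustomer  # placeholder inspectdb model
from customers.models import Customer            # placeholder new-schema model


class Command(BaseCommand):
    help = 'Copy and transform legacy customer rows into the new Customer table.'

    @transaction.atomic
    def handle(self, *args, **options):
        created = 0
        # iterator() keeps memory flat on the larger legacy tables
        for old in LegacyCustomer.objects.using('legacy').iterator():
            # update_or_create makes the command safe to re-run while testing
            _, was_created = Customer.objects.update_or_create(
                legacy_id=old.customer_id,
                defaults={
                    'name': old.name.strip(),
                    'email': (old.email or '').lower() or None,
                },
            )
            created += int(was_created)
        self.stdout.write(self.style.SUCCESS(f'Imported {created} new customers'))
```

Keeping each command scoped to one destination table makes it easy to run, inspect the results, fix the transformation logic, and run it again.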