Importing and transforming a MySQL database

I’m working on a Django app to replace a legacy app built in PHP and MySQL (MariaDB, to be more precise). One of my biggest concerns is how to import the data from the old app. Specifically:

  • The new app will use Postgres, not MySQL
  • The new app’s database schema and models are completely different from the legacy app’s schema, so I will need to transform the data as I import
  • The new app’s database might not live on the same server as the legacy app’s database

I currently only have visibility into the legacy database via phpMyAdmin. I can export from there, but I don’t have direct access to the server it lives on. At least, not yet. Available export formats are MediaWiki table, Texy, JSON, OpenDocument, CSV, Word, CodeGen, YAML, XML, LaTeX, PDF, PHP array and SQL.

So I need to export the database somehow, transform the data to fit the new schema, and import it into the new app. What are my options? Are there any tools that can assist with this?

Right now I’ve written an admin tool to upload exported JSON data from a few of the tables, but this seems like a cumbersome, unscalable way of handling the imports.
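For what it’s worth, parsing the JSON export in a script rather than an admin tool can be sketched with just the stdlib. The structure assumed here (a list of objects with "type"/"name"/"data" keys) is my understanding of phpMyAdmin’s JSON export, so check your actual file and adjust:

```python
import json

def iter_rows(export_text, table_name):
    """Yield row dicts for one table from a phpMyAdmin JSON export.

    Assumes the export is a JSON list in which each table appears as
    {"type": "table", "name": ..., "data": [...]} -- verify against
    your real export before relying on this.
    """
    for entry in json.loads(export_text):
        if entry.get("type") == "table" and entry.get("name") == table_name:
            yield from entry.get("data", [])

# Example with a tiny in-memory export standing in for the real file:
sample = json.dumps([
    {"type": "header", "version": "5.x"},
    {"type": "table", "name": "clients", "data": [{"id": 1, "name": "Acme"}]},
])
rows = list(iter_rows(sample, "clients"))
```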

Have you seen the Django inspectdb command? It can generate model code for an existing database, at least as a first pass. This could be an alternative approach.

https://docs.djangoproject.com/en/4.0/howto/legacy-databases/

You might be able to export your MySQL DB as CSV’s, import them into the new server, and use inspectdb to build models to match. From there you can gradually refactor your database going forwards to match the end schema that you want.

Interesting - so what I could do then, is import the data via CSV directly into the database, create models for the legacy data via inspectdb, and then write some kind of task to transform the data into the newer models.

As long as I avoid the access log table (which I don’t want anyway), none of the tables should be excessively large. So this could be feasible.

EDIT: Looking at the legacy tables (ignoring accesslogs), there is one table at ~170 megabytes, one at 72, one at 16, and every other table is 10 megabytes or less. So that’s not too bad.

What we do is export the source data as SQL.

I believe there’s an option in phpMyAdmin that lets you specify whether to include the CREATE statements for the schema - that makes recreating the source database easier.

Once that’s done, we use inspectdb to create the models - but in a different app. We then use a custom database router to direct all reads of those models to the legacy database. (This helps ensure that nothing gets accidentally changed in the original data.)
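The router idea can be sketched roughly like this - the app label "legacy_import" and the database alias "legacy" are placeholder names, and a production setup might additionally use a read-only database user rather than relying on the router alone:

```python
class LegacyReadOnlyRouter:
    """Send all reads for the inspectdb-generated app to the legacy
    database, and refuse writes so the source data can't be changed."""

    app_label = "legacy_import"   # assumed name of the import app
    db_alias = "legacy"           # assumed DATABASES alias

    def db_for_read(self, model, **hints):
        if model._meta.app_label == self.app_label:
            return self.db_alias
        return None  # fall through to the default router

    def db_for_write(self, model, **hints):
        if model._meta.app_label == self.app_label:
            # Fail loudly on any attempted write to the legacy data.
            raise RuntimeError("legacy data is read-only")
        return None

    def allow_migrate(self, db, app_label, model_name=None, **hints):
        # Keep Django from trying to create the legacy tables anywhere.
        if app_label == self.app_label:
            return False
        return None
```

This would then be listed in settings.py under DATABASE_ROUTERS.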

We then create a set of custom management commands to perform the data migrations. (We typically create a separate task for each destination table - it makes it easier to test and validate the individual steps.)
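One such per-destination-table step might look like this in outline. In practice the logic would live in a management command’s handle() method; here it is a plain function over row dicts so the transformation itself is easy to test in isolation. All field names (fname, lname, email, and their new-model counterparts) are invented examples:

```python
def transform_client(legacy_row):
    """Map one row from a hypothetical legacy `clients` table onto
    the field names of a hypothetical new Client model."""
    return {
        "first_name": legacy_row["fname"].strip(),
        "last_name": legacy_row["lname"].strip(),
        # Suppose the legacy app stored emails in mixed case; normalize here.
        "email": legacy_row["email"].strip().lower(),
    }

def migrate_clients(legacy_rows, create):
    """Run the transform over every legacy row and hand each result
    to `create` (e.g. NewClient.objects.create). Returns the count,
    which makes validating the step against the source row count easy."""
    count = 0
    for row in legacy_rows:
        create(**transform_client(row))
        count += 1
    return count
```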

@KenWhitesell this is useful because I’m trying to do the same thing. And if I should delete this and put it in a new topic, let me know. This still seemed appropriate for this open thread, and Google led me to this topic too, so it may help someone in the future.

Here’s what I’ve done so far:

  1. Already have an existing Django project, “newproject”, on my local machine
  2. Created another separate Django project, “oldtonew”, on my local machine
  3. From the old PHP/MySQL application, I downloaded a full backup of the MySQL database to use to bring some data into “newproject”
  4. Filled out the “Databases” section of settings.py in “oldtonew” to point to this backed-up MySQL database:
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': 'old_mysql_db_name', 
        'USER': '<username here>',
        'PASSWORD': '<password here>',
        'HOST': 'localhost',
        'PORT': '3306',
    }
}

And added this to the top of settings.py in “oldtonew”:

import pymysql
pymysql.install_as_MySQLdb()
  5. As a first test, ran the following command in Terminal (in my new venv for the “oldtonew” project), because I only wanted to try importing the “clients” table instead of all the others:
    python manage.py inspectdb clients > models.py

Now that that’s done, what’s next?
Do I run python manage.py makemigrations in Terminal for the “oldtonew” project, and then migrate the database?
I tried that and it said “No changes detected”.

Or another way to ask, how do I then get the data from the local mysql backup to the “newproject” project?

I see there is a Django dumpdata command, but that seems like it is used on a Django project, not an “external” MySQL database.

The dumpdata command will work with any database defined to Django.

If you have multiple databases defined, the --database parameter allows you to specify which database to dump.
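For example, with both databases defined in settings.py (the alias names and connection details here are illustrative, not required values):

```python
# Sketch of a DATABASES setting with the new and legacy databases
# defined side by side under separate aliases.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "new_db",           # example name
    },
    "legacy": {
        "ENGINE": "django.db.backends.mysql",
        "NAME": "old_mysql_db_name",
        "HOST": "localhost",
        "PORT": "3306",
    },
}
```

You could then run something like python manage.py dumpdata --database=legacy > legacy.json to dump from the old database.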

This is only going to be useful if you’re using exactly the same schema in your new database as the old.

If they’re not, and the differences between the two schemas are minor, you may be able to leverage the django-import-export package to handle the migration. However, if there’s a lot of data transformation that needs to occur, you’ll probably be better off writing custom scripts to perform those conversions.

The makemigrations command creates a specification of changes to be made to a database when the models are changed. (For example, when you add a field to a model, the database table needs to have the corresponding column added to that table.) It has nothing to do with migrating data from one source to another.
