dumpdata and loaddata - issues and alternatives

Hi everyone,

I have a lot of trouble using dumpdata and loaddata, and I just keep banging my shins when I try to work around it.

Use cases are:

  • populating the database with realistic data for tests
  • data migrations, moving some data from one system to another
  • moving data fixtures from a branch into main
  • backup and restore (pg_dump is great for this)

I’d also like something like an import wizard that can handle renaming models and fields, setting default or derived values etc. Django-import-export is good (very good) in situations where we get the same spreadsheet every month and need to upload it, but it isn’t for json.
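A rough sketch of the kind of remapping meant here, assuming the input is the usual dumpdata JSON layout (a list of records with "model", "pk" and "fields"). The model labels, field names and the remap_fixture helper are all made up for illustration:

```python
import json

# Hypothetical mappings: old model label -> new label, and per-model field renames.
MODEL_RENAMES = {"oldapp.customer": "crm.client"}
FIELD_RENAMES = {"crm.client": {"fullname": "name"}}
# Hypothetical defaults for fields that the source data does not carry.
FIELD_DEFAULTS = {"crm.client": {"is_active": True}}


def remap_fixture(records):
    """Return new fixture records with models/fields renamed and defaults applied."""
    out = []
    for rec in records:
        model = MODEL_RENAMES.get(rec["model"], rec["model"])
        renames = FIELD_RENAMES.get(model, {})
        # Rename fields where a mapping exists, keep the rest untouched.
        fields = {renames.get(key, key): value for key, value in rec["fields"].items()}
        # Fill in defaults only for fields that are still missing.
        for name, default in FIELD_DEFAULTS.get(model, {}).items():
            fields.setdefault(name, default)
        new_rec = {"model": model, "fields": fields}
        if "pk" in rec:
            new_rec["pk"] = rec["pk"]
        out.append(new_rec)
    return out


if __name__ == "__main__":
    source = [{"model": "oldapp.customer", "pk": 1, "fields": {"fullname": "Ada"}}]
    print(json.dumps(remap_fixture(source), indent=2))
```

Derived values would fit in the same loop, as a function of the record instead of a constant default.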

I’m trying to lay this out analytically, but it’s been a bit of a chaotic search: trying things with loaddata, trying to implement natural keys, trying to write my own scripts and so on. The problems so far:

  1. contenttypes and permissions need to be referenced by natural keys; the pks assigned in a new blank database won’t necessarily match the ones in the dump.

  2. difficulties with data from models with concrete inheritance. If model B inherits from model A, dumpdata outputs instances of B as separate objects from A. They are united in the data model by sharing the same pk, but they can’t just be created independently: model B instances need their associated model A data.

  3. difficulties with natural keys for objects that are very relational. For example, we have at least one model that is unique by its relationships through a ManyToMany field and exists as a shortcut to that unique set. (If there are 12 feasible combinations of factors [(a, b), (1, 2, 3), (X, Y)], this table has 12 rows and each row relates to a unique set of factors (1: a1X, 2: b1X etc.). Defining the natural_key function for all the models was OK, but for some of these types the queries needed to get_or_create() them had some very extended foreign key chains.)
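For points 1 and 2, a pre-load pass over the dumped JSON is one possible workaround. This is a minimal sketch under several assumptions: the file has the standard dumpdata layout, child rows in concrete inheritance share the parent's pk, and the remaining records reference content types by natural key (dumpdata --natural-foreign). The function names and the app.a / app.b labels are hypothetical:

```python
# Model labels whose rows Django recreates itself when migrations run,
# so their pks in a fresh database need not match the dump.
AUTO_CREATED = {"contenttypes.contenttype", "auth.permission"}


def drop_auto_created(records, skip=AUTO_CREATED):
    """Return the fixture records whose model label is not in `skip`."""
    return [rec for rec in records if rec["model"].lower() not in skip]


def missing_parents(records, child_label, parent_label):
    """Return the pks of child records that have no parent record with the same pk."""
    parent_pks = {rec["pk"] for rec in records if rec["model"] == parent_label}
    return sorted(
        rec["pk"]
        for rec in records
        if rec["model"] == child_label and rec["pk"] not in parent_pks
    )


if __name__ == "__main__":
    fixture = [
        {"model": "contenttypes.contenttype", "pk": 7, "fields": {}},
        {"model": "app.a", "pk": 1, "fields": {"title": "parent"}},
        {"model": "app.b", "pk": 1, "fields": {"extra": "child of 1"}},
        {"model": "app.b", "pk": 2, "fields": {"extra": "orphan"}},
    ]
    print(len(drop_auto_created(fixture)))
    print(missing_parents(fixture, "app.b", "app.a"))
```

The same filtering can be done at dump time with dumpdata's --exclude option. The parent check only reports the problem; it does not fix it.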

I’ve tried the django-extensions DumpScript and RunScript (a while ago) without success, and when I tried again just now to refresh my memory, it seems they don’t support Django 4.0 yet (‘can’t import ‘smart_text’ from django.utils.encoding’). I’m not sure whether this was the reason for my earlier problems, but it also seems to export only app by app, which means I could still get the cross-app references wrong.

What are society’s conventions on this? Is there a really good tool I should know about?

This doesn’t feel 100% clear to me, but I wrote some code for imports with django fixtures so perhaps that can help.

My data came as xlsx file imports from an old db system no one uses much anymore (dbase). What I did was write myself a parser in Python that:

  1. Takes in the xlsx and parses it into something intelligible for Python (I used Pandas, which naturally lends itself to that sort of task). Pandas is pretty useful for managing the things that always suck when migrating data, like date formats, booleans, etc.
  2. Then I created a few Python classes that build the structure of a fixture, which really isn’t that complex - you just need to specify the model and the field keys. The list of fields and associated values is then dumped as JSON into the fields dictionary.
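The second step can be sketched roughly as below. This is not the actual parser; it assumes the rows have already been parsed into plain dicts (for example with Pandas' DataFrame.to_dict("records")), and the FixtureBuilder class, the crm.client label and the column names are hypothetical:

```python
import json
from datetime import date


class FixtureBuilder:
    """Collects rows and serialises them in Django's JSON fixture layout."""

    def __init__(self, model_label):
        self.model_label = model_label
        self.records = []

    def add(self, pk, **fields):
        # Write dates as ISO strings so the output is valid JSON.
        clean = {
            key: value.isoformat() if isinstance(value, date) else value
            for key, value in fields.items()
        }
        self.records.append({"model": self.model_label, "pk": pk, "fields": clean})

    def dumps(self):
        return json.dumps(self.records, indent=2)


if __name__ == "__main__":
    # Stand-in for rows parsed out of the spreadsheet.
    rows = [{"id": 1, "name": "Ada", "joined": date(2020, 1, 2), "active": True}]
    builder = FixtureBuilder("crm.client")
    for row in rows:
        row = dict(row)  # copy so the source rows stay unchanged
        builder.add(row.pop("id"), **row)
    print(builder.dumps())
```

The output can be saved as a .json file and passed to loaddata.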

I’m not sure what you mean by contenttypes and permissions. If you use a permission system, you’re not likely to be able to manage it JUST with your fixtures. I think you’d need to write yourself a small parser that takes whatever format the user rights have in your source data and “translates” it into whatever you use in Django to manage that.

For keys, if you specify a “pk” key and value, django will use that as the key, e.g.:

{"model": …, "pk": 99999, "fields": {…}}

For related models, that’s a bit of a pain, and I generally prefer (especially if uploading to a remote server) to use the database’s own tools (e.g. pg_dump for Postgres).

However, you can set db_constraint=False on the ForeignKey fields of the relevant models and run makemigrations/migrate, upload your data without those constraints being checked, then turn the constraints back on by setting db_constraint=True and running makemigrations/migrate again.

=========

All that being said, if you do have complex operations for the mapping of your source data to your django app, write yourself a parser for it. That’s what I did, it’s a bit of a pain but once done it’s really extensible and powerful. If your needs are too specific you likely won’t find a tool that fits 100%.