dumpdata and loaddata - issues and alternatives

Hi everyone,

I have a lot of trouble using dumpdata and loaddata, and I just keep banging my shins when I try to work around it.

Use cases are:

  • populating the database with realistic data for tests
  • data migrations, moving some data from one system to another
  • moving data fixtures from a branch into main
  • backup and restore (pg_dump is great for this)

I’d also like something like an import wizard that can handle renaming models and fields, setting default or derived values, etc. Django-import-export is good (very good) in situations where we get the same spreadsheet every month and need to upload it, but it isn’t for JSON.

I’m trying to lay this out analytically, but it’s been a bit of a chaotic search, with me trying things with loaddata, trying to implement natural keys, trying to write my own scripts, and so on:

  1. contenttypes and permissions need to use natural keys; the pks set in a new, blank database won’t necessarily align with the ones in the fixture.

  2. difficulties with data from models with concrete inheritance. If model B inherits from model A, dumpdata will output instances of B as separate objects from A. They are united in the data model by having the same pk, but they can’t just be created independently - model B instances need their associated model A data.

  3. difficulties with natural keys for some objects where they are very relational - e.g. we have at least one model which is unique only by its relationships through a ManyToMany field and exists as a shortcut to that unique set. (If there are 12 feasible combinations of factors [(a, b), (1, 2, 3), (X, Y)], this table has 12 rows and each row relates to a unique set of factors: 1: a1X, 2: b1X, etc.) Defining the natural_key() function for all the models was OK, but for some of these types the queries to get_or_create() them had some very extended foreign key chains. (There’s a rough sketch of the basic natural-key pattern just after this list.)
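
For context, the natural-key plumbing I mean is roughly this (a minimal sketch with a made-up Category model; our real models are far more tangled):

    from django.db import models

    class CategoryManager(models.Manager):
        def get_by_natural_key(self, slug):
            # loaddata uses this to resolve a serialized natural key back to a row
            return self.get(slug=slug)

    class Category(models.Model):
        slug = models.SlugField(unique=True)
        name = models.CharField(max_length=100)

        objects = CategoryManager()

        def natural_key(self):
            # dumpdata --natural-primary / --natural-foreign serializes this
            # tuple instead of the pk
            return (self.slug,)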

I’ve tried the django-extensions DumpScript and RunScript (but it was a while ago) without success, and when I tried just now to refresh my memory it seems they don’t support Django 4.0 yet (“cannot import ‘smart_text’ from django.utils.encoding”). I’m not sure if that was the reason it failed for me before, but it does seem to export only app by app, which feels like something I could still mess up.

What are the community’s conventions on this? Is there a really good tool I should know about?

This doesn’t feel 100% clear to me, but I wrote some code for imports with Django fixtures, so perhaps that can help.

My data was xlsx file imports from some old db system no one uses much anymore (dBase). What I did was write myself a parser in Python that:

  1. Takes in the xlsx and parses it into something intelligible for Python (I used pandas, which naturally lends itself to that sort of task). pandas is pretty useful for managing the things that always suck when migrating data, like date formats, booleans, etc.
  2. Then I created a few Python classes that build the structure of a fixture, which really isn’t that complex - you just need to specify the model and fields keys. Then the list of fields and associated values is dumped as JSON into the fields dictionary. (There’s a cut-down sketch of this just below.)
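
Roughly what that looks like (a simplified sketch - the file name, the library.book model and the column names are made up for the example):

    import json
    import pandas as pd

    df = pd.read_excel("books.xlsx")   # pandas handles dates/booleans sanely

    fixture = []
    for i, row in enumerate(df.to_dict(orient="records"), start=1):
        fixture.append({
            "model": "library.book",   # app_label.model_name
            "pk": i,
            "fields": {
                "title": row["Title"],
                "published": bool(row["Published"]),
            },
        })

    with open("books.json", "w") as f:
        json.dump(fixture, f, indent=2, default=str)

After that it’s just python manage.py loaddata books.json.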

I’m not sure what you mean by contenttypes & permissions. If you use a permission system, you’re not likely going to be able to manage it JUST with your fixtures. I think you’d need to write yourself a small parser that takes whatever format your source data uses for user rights and “translates” it into whatever you use in Django to manage that.

For keys, if you specify a “pk” key and value, Django will use that as the primary key, e.g.:

{"model": "…", "pk": 99999, "fields": {…}}

For related models, that’s a bit of a pain, and I generally prefer (especially if uploading to a remote server) using the database’s own tools (e.g. pg_dump with Postgres, etc.).

However, you can set db_constraint=False on the ForeignKey fields of the relevant models and makemigrations/migrate, upload your data without having those constraints checked, then turn them back on by setting it back to True and running makemigrations/migrate again.
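
For what it’s worth, the toggle looks something like this (Order and Customer are made-up model names):

    from django.db import models

    class Order(models.Model):
        # db_constraint=False keeps the ORM-level relation but drops the
        # FOREIGN KEY constraint at the database level, so rows can be loaded
        # in any order. Flip it back to True and makemigrations/migrate once
        # the data is in.
        customer = models.ForeignKey(
            "Customer",
            on_delete=models.CASCADE,
            db_constraint=False,
        )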

=========

All that being said, if you do have complex operations for mapping your source data to your Django app, write yourself a parser for it. That’s what I did; it’s a bit of a pain, but once done it’s really extensible and powerful. If your needs are too specific, you likely won’t find a tool that fits 100%.


Thanks for your response. For tabular data we use django-import-export, but if it’s one table, it’s often very simple to tailor a script. For testing and data porting from system to system, we want a solution that can deal with the convoluted data.

Contenttypes and the auth app both create data in their tables after their first migration is applied - the model names for each app and the permissions on each model. That data (in our dataset) is referenced by a lot of other data, and the result is that if you naively ‘dumpdata’ from a developer database to a fixture and then try to load it into a ‘blank’ database, you find that the row numbers (pks) are inconsistent between the blank database and the fixture, so all the data that refers to these objects by primary key or id is invalid. The database rejects the dataset.
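
For reference, the dump side of the usual workaround is roughly this (the call_command equivalent of the --natural-foreign / --natural-primary / --exclude flags):

    from django.core.management import call_command

    call_command(
        "dumpdata",
        natural_foreign=True,    # serialize FKs to contenttypes etc. by natural key
        natural_primary=True,    # drop pks where the model defines natural_key()
        exclude=["contenttypes", "auth.permission"],
        output="devdata.json",
    )

That takes care of the contenttypes and permission rows themselves, but it only helps where natural keys are easy to define, which is the next problem.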

The natural-keys solution works for simple data where the right kind of uniqueness is common, but it isn’t easy to write a ‘natural_key()’ function for models that are mostly about relationships, where you’re reaching across too many foreign-key connections. I’m guilty of creating a database in which a lot of data is only uniquely identified by its pk.
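
To make the relational case concrete, the natural key ends up being assembled from related objects, something like this (Combination and Factor are simplified stand-ins for our real models):

    from django.db import models

    class CombinationManager(models.Manager):
        def get_by_natural_key(self, *factor_slugs):
            # Assumes each unique set of factor slugs identifies exactly one row.
            qs = self.get_queryset()
            for slug in factor_slugs:
                qs = qs.filter(factors__slug=slug)
            return qs.get()

    class Combination(models.Model):
        factors = models.ManyToManyField("Factor")

        objects = CombinationManager()

        def natural_key(self):
            return tuple(self.factors.order_by("slug").values_list("slug", flat=True))

        # Make sure Factor rows are serialized/loaded before Combination rows.
        natural_key.dependencies = ["myapp.factor"]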

I believe it should be feasible, at least for our fairly small dataset, to load the JSON, sort it according to the model topology, and load it into the database with get_or_create(), remembering which new object was created for each entry. But I seem to be really bad at decomposing that into a sensible, readable set of functions.
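
The rough shape I have in mind is below - very much a sketch, assuming a plain pk-based dumpdata JSON file, no circular foreign key dependencies, and that every referenced object is present in the file (ManyToMany fields are skipped):

    import json
    from collections import defaultdict
    from graphlib import TopologicalSorter  # Python 3.9+

    from django.apps import apps

    def load_fixture(path):
        with open(path) as f:
            objects = json.load(f)

        by_model = defaultdict(list)
        for obj in objects:
            by_model[obj["model"]].append(obj)

        # Dependency graph: each model points at the FK targets it needs first.
        graph = {}
        for label in by_model:
            model = apps.get_model(label)
            deps = {
                f.related_model._meta.label_lower
                for f in model._meta.get_fields()
                if f.many_to_one and f.related_model is not None
            }
            graph[label] = deps & set(by_model)

        pk_map = {}  # (model label, old pk) -> newly created/found instance
        for label in TopologicalSorter(graph).static_order():
            model = apps.get_model(label)
            for obj in by_model[label]:
                fields = {}
                for name, value in obj["fields"].items():
                    field = model._meta.get_field(name)
                    if field.many_to_many:
                        continue  # needs a second pass, omitted here
                    if field.many_to_one and value is not None:
                        # Re-point the FK at whatever we created for the old pk.
                        value = pk_map[(field.related_model._meta.label_lower, value)]
                    fields[name] = value
                instance, _ = model.objects.get_or_create(**fields)
                pk_map[(label, obj["pk"])] = instance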