Providing initial data: fixtures vs. data migrations

A colleague of mine went through the work of providing some initial data for our app.

What would be the right way to integrate this into our source code?
I was able to dump the data via dumpdata and load it via loaddata. I also researched fixtures.
Previously I provided all data via data migrations - should I use the fixture and load it into the app, or should I use data migrations?

In my opinion, fixtures are somewhat more fragile and they overwrite all data that is already in the app, while data migrations seem more robust - but I would have to write everything (270 records) by hand again.

My “short answer” to your question is that it really depends. We have situations where we’ll include some data as text files to be loaded via the psql command. (Not often, but it happens.)

We take the approach that there are two kinds of data in our systems - "code tables" (a.k.a. "reference tables", "system tables", or "key items") and "user data". The former is typically data not seen (or seen but not editable) by the user; the latter is what the user wants to see. (For example, we would call the ContentTypes table a system table, while the table containing blog posts is "user data".)

I make the distinction because we use two different mechanisms for populating these tables. We use fixtures for all our code tables (just to avoid confusion, we do not load/migrate the ContentTypes table - that was just an example), and multiple methods for loading / migrating user data.

One facility that really helps us prevent our fixtures from becoming fragile is that we define natural keys for any table being loaded via fixtures. This prevents overwriting existing data and makes it easier to manually edit those fixtures when necessary.

If you find it appropriate to do so, you could also have your pg_dump (or whatever the analogous tool would be for the database you’re using) emit the output as SQL INSERT statements rather than as load table commands. You could then copy/paste those into whatever migration process you wish to use.
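For example, with Postgres that might look something like this (database, table, and file names here are just placeholders):

pg_dump --data-only --inserts --table=myapp_firm mydatabase > firm_data.sql

The --inserts flag makes pg_dump emit one INSERT statement per row; --column-inserts does the same but spells out the column names as well, which makes the output easier to edit by hand before pasting it into a migration.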

Bottom line - I spend a lot of time moving data around, and I’ve never found it practical to settle on just one method. There are always additional factors to consider that can make this a non-trivial decision.

Ken


Thanks for your insights, Ken. I guess fixtures with natural keys are the way to go, since they allow for an export/import function in the future.

So I have to create a manager for the models and define the unique conditions for them, which I hadn't done before. Then I can dump the data via dumpdata --natural-foreign --natural-primary and load it via loaddata.
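Something along these lines, if I understand correctly (the app label and file name are placeholders):

python manage.py dumpdata myapp --natural-foreign --natural-primary --indent 2 > initial_data.json
python manage.py loaddata initial_data.json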

Yep. Here’s a greatly trimmed down version of a “Firm” model we use:

from django.db import models
from django.urls import reverse

class FirmManager(models.Manager):
    def get_by_natural_key(self, firm_code):
        # Resolve a serialized natural key back to a row via the unique code.
        return self.get(code=firm_code)

class Firm(StandardData, models.Model):  # StandardData is one of our abstract base classes (not shown)
    name = models.CharField('Firm Name', max_length=200, unique=True)
    code = models.CharField('Firm Code', max_length=20, unique=True)

    objects = FirmManager()

    def get_absolute_url(self):
        return reverse('edit_firm', kwargs={'pk': self.pk})

    def __str__(self):
        return self.name

    def natural_key(self):
        # Must return a tuple - note the trailing comma.
        return (self.code,)
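With that in place, an entry in a fixture dumped using --natural-foreign --natural-primary would look roughly like this (assuming an app label of myapp - note that the pk is omitted, since the natural key identifies the row):

[
  {
    "model": "myapp.firm",
    "fields": {
      "name": "Example Firm",
      "code": "EX1"
    }
  }
]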

Is there a way to debug this? I get TypeError: get_by_natural_key() takes 2 positional arguments but 18 were given

So I know it's somewhere in one of the managers, but I don't have a clue what went wrong, other than that too many arguments are being handed over.



Best first guess in the absence of seeing any code is that you’re passing a string at a point where a tuple is expected.
(Some context would be helpful here. What command are you running? What’s the data? The portion of the traceback above that particular error message is going to be useful in identifying the source of the problem.)


Your first guess was right! I had forgotten the comma, and Black optimized return (self.name) to return self.name - so instead of handing over a tuple, natural_key was handing over a string. Thanks!
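For anyone who hits the same error later: the deserializer calls get_by_natural_key(*natural_key), so when natural_key() returns a string instead of a tuple, the string gets unpacked character by character - a 17-character name would produce exactly the 18 arguments from the error message. A minimal before/after sketch:

# Buggy version - the parentheses are just grouping, so this returns a string
# (and Black simplifies it to: return self.name):
def natural_key(self):
    return (self.name)

# Fixed version - the trailing comma makes it a one-element tuple,
# which is what the serializer expects:
def natural_key(self):
    return (self.name,)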