We have a lot of unit tests in our Django application, and we recently added a custom data migration to initially populate our database tables. Ever since doing so, unit tests that construct objects fail with duplicate primary key errors. I know that as part of Django’s normal test setup, the test database is created and migrated; in our case, that includes the data migration, which pre-populates data into the test database. Most of our models’ primary key fields are auto-generated by Django. Now, with the preloaded test data in place, creating an object within a unit test raises an integrity (constraint) error because the primary key is duplicated. For example, one model has a single row preloaded by the data migration. A unit test that attempts to create an object in that table fails because Django tries to insert it with the same primary key as the row loaded by the data migration. How should we handle creating test objects in tables where our data migration has introduced data into the test database?
It seems perfectly normal to initially populate a database using a Django migration. I’m just not sure why Django, when creating test objects, appears to start with a sequence value that causes inserts to conflict with the primary keys of the existing data, or how we can handle this so that we can insert objects in our test cases without worrying about primary key conflicts with the preloaded data.
This isn’t, strictly speaking, a Django issue.
In PostgreSQL, auto-id assignment is handled by a sequence object. That object is incremented whenever a row is inserted with the autoincrement field left null.
If you insert a row with that field explicitly assigned, the sequence object is not incremented. Your next insert without the id assigned will try to assign the id based on the sequence - and may duplicate the id of a row that was inserted with an explicit ID.
(Note, you can demonstrate this using SQL statements in psql - showing that this is not specifically a Django / test case issue.)
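A minimal psql session showing the effect might look like this (the table name `demo` is just a scratch example):

```sql
-- Scratch table; the sequence behind the id column starts at 1.
CREATE TABLE demo (id serial PRIMARY KEY, name text);

-- Explicit id: the row is inserted, but the sequence is NOT advanced.
INSERT INTO demo (id, name) VALUES (1, 'preloaded');

-- No id given: the sequence hands out 1, which already exists.
INSERT INTO demo (name) VALUES ('new row');
-- ERROR:  duplicate key value violates unique constraint "demo_pkey"
```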
You’ve got a couple ways around this.
- You can define natural keys for the objects being “preloaded” such that you don’t include the id field in that data. (That’s what we do.) This allows Django to perform the inserts without assigning the id field - allowing PostgreSQL to keep the sequence object in sync.
- You can predefine a range for this preloaded data and initialize the sequence past that range. (e.g. Set the initial sequence value to 1000 - or whatever you need it to be)
- You can get the highest value from your data after it is loaded and alter the sequence.
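One way to implement that last option inside the data migration itself is to emit Django’s own sequence-reset SQL after the rows are loaded. This is only a sketch, not a drop-in migration - the app and model names (`myapp`, `SomeModel`) are placeholders:

```python
from django.core.management.color import no_style
from django.db import connection, migrations


def load_data(apps, schema_editor):
    SomeModel = apps.get_model("myapp", "SomeModel")
    SomeModel.objects.create(id=1, field1="Hello World")

    # Reset the sequence to MAX(id) so later inserts don't collide.
    # sequence_reset_sql returns an empty list on backends without sequences.
    with connection.cursor() as cursor:
        for sql in connection.ops.sequence_reset_sql(no_style(), [SomeModel]):
            cursor.execute(sql)


class Migration(migrations.Migration):
    dependencies = [("myapp", "0001_initial")]
    operations = [migrations.RunPython(load_data, migrations.RunPython.noop)]
```

This is the same SQL that `manage.py sqlsequencereset myapp` prints, just executed automatically as part of the migration.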
Thanks for the info Ken,
I know the difference between natural and surrogate keys, but I wasn’t sure what you were suggesting for using natural keys in the preloaded data.
By specifying natural keys, do you mean that we need to change our models to not use an AutoField, or do you mean that for the pre-loaded data, we specify the id values in the objects we insert via the migration script?
So for example, given this contrived model that relies on Django to create the primary key field using an AutoField:
class SomeModel(models.Model):
    field1 = models.CharField(max_length=100)
Are you suggesting that we create the objects inserted during the migration script with the id field set, as in,
SomeModel(id=1, field1="Hello World")
instead of,
SomeModel(field1="Hello World")
Actually, I was referring to natural keys in the sense that Django uses the term in the context of the serialization process for the loaddata / dumpdata commands. Our models define natural_key and get_by_natural_key methods for those with data being preloaded via fixtures.
See Natural Keys
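For completeness, here is a minimal sketch of what those two hooks look like on a model (the field name field1 matches the contrived example above; the Django serialization docs cover the details):

```python
from django.db import models


class SomeModelManager(models.Manager):
    # Used by loaddata to look rows up by their natural key
    # instead of by primary key.
    def get_by_natural_key(self, field1):
        return self.get(field1=field1)


class SomeModel(models.Model):
    field1 = models.CharField(max_length=100, unique=True)

    objects = SomeModelManager()

    # Used by dumpdata to serialize this row without its id,
    # so loading the fixture later doesn't pin a primary key value.
    def natural_key(self):
        return (self.field1,)
```

Dumping with `manage.py dumpdata --natural-primary` then omits the id from the fixture, so loading it lets PostgreSQL assign ids and keep the sequence in sync.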