Add unique key for django migration tracking model

Django has atomic migration support, with applying the migration and recording it in the same transaction. Howewer, if by accident the same migration is run paralell, it will be applied twice. We could prevent that by adding an unique constraint on (app, name) on Migration model.

Not all backends do support atomic migrations (MySQL, and Oracle don’t) and the migration record are attempted to be inserted once all operations are performed so adding such unique constraint wouldn’t constitute a good way to prevent concurrent migrate from racing to corrupting to your database.

It’s also not clear to me how changes could effectively be applied twice because even with atomic DDL support most of the operation performed by the framework (except for data migration ones) are conflictual. In other words, even if you have two transactions that are isolated from each other only one of them is going to succeed in calling CREATE TABLE foo and thus only one of them will reach the point where the migration record is attempted to be inserted.

I believe you should avoid to run the migrate command in parallel in the first place.

You got the point, data migrations. Imagine a finance application where a data migration runs accidentally twice. Now that could happen. By placing a unique key, and without backends supporting it, at least an error will be thrown when recording. And, with backends supporting it, it will just improve consistency. I don’t know a use case when this constraint would break anything. We expect consistency from the db, that’s why we use transactions. In case of a data migration, I expect the same, I would rely on the database to make the migration happen once only. Also, with data migrations, more backends would support this.

Could you please explain your use case a little bit more, so we can understand why (data) migrations would (potentially) be run in parallel?

In my experience, when running migrations for a database, even in systems that have lots of nodes, only one of the nodes is chosen/selected as “leader” and only that node is responsible for running the migrate command, when necessary.

What’s your setup and why would more than one process/node/etc run the migrate command?

Alternatively, perhaps your data migration code could, somehow, detect whether another (relevant) data migration is happening at the same time? or perhaps, is there a way to build your models with enough constraints that data migrations that are run more than once would generate a db state that would make any of those constraints fail?

Perhaps I cannot tell a use case here, that does not mean it cannot happen. Even human errors can happen, in the world of home office, two operators may run the migration at the same time. Of course, coordination is needed, we should avoid such situations as well. But, from the other side: do you ever want to see migrations applied multiple times? I suspect the answer is no. If the intention of Django is to apply migrations once and exactly once, then, we should have an unique constraint there. I think django should ensure at its best that migrations are not run multiple times.

So, for data migrations that would assure consistency. And, for DDL migrations on backends which dont support that, DDL operations will fail too. But, I suspect in no case will the unique constraint do any harm.

I understand that critical data migrations can use similar sentinel records, but hey, that is why django records them, not?

Also, can it do any harm to have such a unique constraint?

In my understanding, it would just improve Django’s quality and reliability.

Earlier a django ticket was opened: #30490 (migrations unique_index on (app, name).) – Django. That was closed with reasons not being valid with current Django. One reason was that the migration and the corresponding record was not handled in the same transaction. The relevant code now indeed records the migration in the same transaction. And also, a possible use case is mentioned there. Please consider the idea to have a unique key on the applied migrations table.

Sure, but as @charettes wrote:

…plus mgirations can be marked as non-atomic even on PostgreSQL etc.

I agree with the general consensus here that a unique constraint won’t stop concurrent migration runs. Given it’s done at the end of the process, it’s also the wrong time to perform any locking, as concurrent attempts would first do all the work and then one would need to roll it back, which is not great for database performance.

If you cannot structure your deploy process such that only one server runs migrate at once, consider trying some kind of inter-process locking between them in a custom migrate command. It’s worth experimenting in your own projects or a third-party package before suggesting anything to Django.

Some arguments against the unique key sounds for me as if we should avoid using unique keys in an relational database at all, and use some out-of database solution for ensuring uniqueness. Django itself checks for already applied migrations, thus it seems it does not want to apply a migration twice. Then, why would a unique key hurt? Do you ever want to see the same records in the migration histor table?

Again: I understand that we should take care of migrations not run parallel, but if we could ensure it at a lower level, why not?

BTW, if some would agree on this, can use the python module GitHub - rkojedzinszky/django-atomic-migrations: Atomic migration steps for Django.

Some arguments against the unique key sounds for me as if we should avoid using unique keys in an relational database at all, and use some out-of database solution for ensuring uniqueness. Django itself checks for already applied migrations, thus it seems it does not want to apply a migration twice. Then, why would a unique key hurt?

I haven’t seen anyone here argue against the correctness of defining a unique constraint on django_migrations(app, name), I think that mostly everyone would agree that it would be preferable over the current schema if we could start over.

Rolling out a schema change to django_migrations at this point is not trivial though as it’s the table that tracks applied migrations itself, it’s backed by an unmanaged model, and the lack of unique constraint might have resulted in duplicate rows slipping through overt the years. Assuming we want to not only support such a unique key for new installs we’d have to provide a script to add the constraint like you did in your proposed package but also provide some tools to deal with duplicates.

Given moving forward is not trivial and will incur a maintenance burden we’re asking for a use case that outweighs the cost of merging and supporting this change. From what I can see in this thread so far I believe that everyone agrees that preventing migrations from being applied concurrently does not meet this criteria.

Thanks for the sound explanation. I am happy now that the point we see the issue is the same. And now I understand that we need to handle existing setups where unfortunately duplicates may be present. That unfortunately can mean data migrations run twice. Depending on the migration step, there may be wrong data. (e.g. adding a bonus to accounts, etc.). Howewer, Django is a great project, I really like its ORM, and I still suspect it would improve its stability, its reputation to be more correct with migrations.

So, there is an option that we add the unique key to just new installations, and provide a command to add it to existing ones. Then, it is up to the operator to run that command or not. If they dont even get noticed aboit the feature, they will use Django as if nothing happened. Other users may see the benefits of having the key.

Or, with a major release, this could be a “breaking” change, e.g. running django migrate could handle this missing unique key explicitly, before any migrations being run. It could add the unique key, or at least check for its existence, and also check for duplicated records in history table. As there are other checks in Django ecosystem, this also could be an additional one. That could warn the user that migration has been applied multiple times, which I suspect should be handled somehow.

I’m going to throw a cat amongst the pigeons here…

… but it looks as though if we add a unique constraint to the Migration model then it won’t affect existing installations, only new ones. I’d just tested it out on a new project + existing one by making some updates & migrating and the existing project didn’t make any updates to the django_migrations schema.

Just taking another look through the code that creates the table; it manually creates the table using a schema editor only if the table doesn’t already exist.

Right you are @shangxiao .

I think that still doesn’t address the problem where a non-atomic migration runs twice concurrently. Both processes would start and process the data, one would insert into django_migratinos successfully, and the second would fail on the constraint—with no ability to roll back its duplicated changes.

I’d rather have duplicate records in django_migrations in such cases, as that could at least aid after-the-fact debugging.

I’d rather have duplicate records in django_migrations in such cases, as that could at least aid after-the-fact debugging.

Yep you have a point there.

So, there can be data migrations, which I suspect can be transaction safe, so can be atomic, and thus, the unique key would prevent them running twice. The non-atomic migrations are usually DDL migrations with backends not supporting it, howewer as an earlier response, DDL operations usually cannot run twice (e.g. add a column twice, or rename one, etc.). So I still think it would be beneficial.

And, in the case we have no unique key there and with non-atomic migrations:
it can still happen that migrations run twice, and after the migration the sql backend goes away, thus django has no chance to record the migration. Then, we will end up with no records at all with the migration run twice. So, leaving this as-is still dont help anything, I suspect. Nor the unique key does change its behavior.

Ok, so I need guarantee that a migration is never run twice, and seems that Django itself does not make a guarantee of it. Then, shall I use a dedicated table with the perfect unique key, probably be a mirror of django_migrations? I dont see the point of mirroring it if it would contain the very same records…

Ok, I see I cannot push this more. I hope, sometimes it will get into Django. As it happened to recording and the migration itself run in the same transaction (where atomic=True). In a few years ago, I proposed that too, but then the idea was rejected too (#30490 (migrations unique_index on (app, name).) – Django). Then I noticed that for some other reasons it just got modified to work the way I proposed. One day Django will be truly reliable, I hope! :slight_smile: