A migrations clean slate

A long time ago in a galaxy far, far away, I started a nice clean greenfield project. Alas, it is 6 years later, and the fields, once green, are now brown. My project now has 459 project-level migrations, which I'm sure some of you would still consider small-time, but it's at the point where the long migration trees are hard for me to wrangle, and running the migrations from scratch eats materially into CI/testing time. We are overdue for a cleanup.

I recall that this discussion has popped up at least a couple of times on django-developers from the perspective of how to fix the UX for cleaning up migrations, which some have called "less than ideal" (and I am inclined to agree). I do not recall these discussions getting anywhere. However, now that I am having these issues first-hand as a Django user, I am treating it more as a support query. My apologies if this has been brought up elsewhere here; I assumed it had and did some digging beforehand, but it's of course a bit hard to search for.

Some additional pieces of info:

  • Our project-level application migration dependency trees are highly intermingled (i.e. there are a lot of cross-app table relationships).
  • Our development team is only two people. The solution does not have to be smooth enough to just work with a manage.py migrate; the other developer and I will happily put in some manual work to get things working in our respective development environments.
  • It is acceptable to blow away all development databases, though I don’t see why that’d be necessary as replacement migrations could be faked. It’s obviously not acceptable to blow away the prod database.
  • There are no data migrations that need to survive (i.e. they are essentially noops when run on a clean database).

I am aware of migration squashing, though I do not have much experience with it. I know it has shortcomings and would probably still involve a fair bit of work in my situation, so if anything I am looking for assurance that it is (or is not) the right thing to do.

I naively thought that I could just delete all of my migrations and run manage.py makemigrations to have Django rebuild them from scratch based on the current state of my model definitions. The last time I tried that (not recently, but I doubt anything has changed, since this is a fundamental issue), circular dependencies in my model relationships made it impossible. Selectively commenting out relationship fields in my model definitions to make intermediary migrations would cause system checks to fail, which would block makemigrations, and so on. All solvable problems, but all requiring effort, so I don't want to go down this path if someone has a better or easier idea.

If you were in my position, what would you do?

I’d start by looking at the fundamentals

The circular dependencies in your model relationships jump out to me as an issue that should be addressed. My reaction to situations like this is that it's very much a bad "code smell". Whatever the cause, I'd move that to the top of the priority list.

How it's resolved will depend on where these circular references exist: at the database level, where they can be resolved by normalizing the data, or at the Python model/module level, where some refactoring can resolve them.

The dependencies exist at the database level. I haven't attempted this in a while, so I can't recall the specifics, though at the time I remember feeling that the data modelling was justified. I could very well feel differently now. I'll have another go at interrogating the problematic relationships. Thanks.

If you’re not using --keepdb for local test runs (--reuse-db for pytest-django), that can help immediately.

I would try rebuilding the migrations from scratch, even if it's difficult. Perhaps --skip-checks could help get makemigrations past the failing system checks?

I'm also on board with Ken's suggestion to try to remove your circular dependencies; they probably don't make things easy. But refactoring them can be hard.

I have an idea I'd like to try: store a "baseline" of your database alongside your migrations, as a SQL dump taken after running migrate that includes the django_migrations table. Test runs could then load this dump first and only run migrations added since, which would be very fast. Old migrations would then no longer be a concern. I have seen something similar done once, on a custom migrations framework, and it worked well. If you have a try at this idea, I'd love to see the results.
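To make the baseline idea concrete, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for the real database (for PostgreSQL you would presumably use pg_dump and psql instead). It dumps a migrated database, including its django_migrations rows, restores that dump into a fresh test database, and then works out which migrations still need to run. The app name "shop", the ALL_MIGRATIONS list, the simplified django_migrations schema, and the helper functions are all hypothetical, for illustration only.

```python
import sqlite3

# Hypothetical: every migration the project defines, in dependency order.
ALL_MIGRATIONS = [
    ("shop", "0001_initial"),
    ("shop", "0002_product_price"),
    ("shop", "0003_product_sku"),
]


def make_baseline(conn: sqlite3.Connection) -> str:
    """Dump the whole database, django_migrations included, as SQL text."""
    return "\n".join(conn.iterdump())


def load_baseline(sql: str) -> sqlite3.Connection:
    """Create a fresh test database and restore the baseline dump into it."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(sql)
    return conn


def pending_migrations(conn: sqlite3.Connection) -> list[tuple[str, str]]:
    """Return the migrations not yet recorded in django_migrations."""
    applied = set(conn.execute("SELECT app, name FROM django_migrations"))
    return [m for m in ALL_MIGRATIONS if m not in applied]


# Simulate a database on which the first two migrations have been applied.
# The django_migrations schema here is simplified compared to Django's own.
source = sqlite3.connect(":memory:")
source.executescript("""
    CREATE TABLE django_migrations (
        id INTEGER PRIMARY KEY,
        app VARCHAR(255) NOT NULL,
        name VARCHAR(255) NOT NULL,
        applied DATETIME NOT NULL
    );
    CREATE TABLE shop_product (id INTEGER PRIMARY KEY, price INTEGER);
    INSERT INTO django_migrations (app, name, applied) VALUES
        ('shop', '0001_initial', '2024-01-01'),
        ('shop', '0002_product_price', '2024-01-02');
""")

baseline = make_baseline(source)
test_db = load_baseline(baseline)
print(pending_migrations(test_db))  # [('shop', '0003_product_sku')]
```

Only the migrations missing from the restored django_migrations table would then need to run, so the cost of the old migrations is paid once, when the baseline is regenerated, rather than on every test run.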

Sorry to hijack the topic; I can start a separate one if this is unrelated.

What other ways are there to structure models so they don't have circular dependencies, while still being able to query the data along the relationships where those cycles typically come about?

I currently have a similar issue: I have to comment out fields, migrate, uncomment them, migrate again, provide values for required fields, and migrate once more. Linking models in the Admin panel before continuing the migrations sometimes gets tricky.

As an example, I have a Product model with a ForeignKey to User (the product's creator), and a Profile model with a ManyToManyField to Product (products purchased, but not created, by the user) and a OneToOneField to User.

If I migrate the Product models first, I have to comment out the creator ForeignKey on Product; if I migrate the User models first, I have to comment out the ManyToManyField from Profile to Product. Is there another way?
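The ordering problem above can be sketched as a dependency graph, using the standard library's graphlib. This assumes (hypothetically) that User and Profile live in a "users" app and Product in a "shop" app; the migration names are made up. With one migration per app the graph has a cycle, so no valid apply order exists. Moving the ManyToManyField into a second "users" migration, which is the same split the comment-out/uncomment workflow produces by hand, makes the graph acyclic.

```python
from graphlib import CycleError, TopologicalSorter

# Each key is a (hypothetical) migration; its value is the set of
# migrations that must be applied before it.
one_migration_per_app = {
    "users.0001": {"shop.0001"},  # Profile's M2M to Product needs shop first
    "shop.0001": {"users.0001"},  # Product's creator FK needs users first
}

split_users_app = {
    "users.0001": set(),                        # User and Profile, no M2M yet
    "shop.0001": {"users.0001"},                # Product with its creator FK
    "users.0002": {"users.0001", "shop.0001"},  # adds Profile's M2M to Product
}


def migration_order(graph):
    """Return a valid apply order, or None if the graph contains a cycle."""
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError:
        return None


print(migration_order(one_migration_per_app))  # None
print(migration_order(split_users_app))  # ['users.0001', 'shop.0001', 'users.0002']
```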

IMO, this is probably worth a separate topic of its own.
