Very slow migrations with large numbers of tables

I’m working on a project that has a very large number of tables (thousands). Migrations take a long time to run, even if they only have a few dozen operations. I added some logging in the Django code that prints out each migration operation and how long it takes, and I’m finding that individual operations take 3-10 seconds each. This is with local SQLite on a reasonably fast, modern developer laptop.

I’m now facing a situation where I need to change common fields on almost all tables, and it would take many hours for this to run. I’m looking at some janky solutions to work around the migration system, like using SQL to make the changes directly and then manipulating the django_migrations table to fake-apply the migration. This does not feel right.
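
(For what it’s worth, the supported spelling of that workaround is the migrate command’s --fake flag, which records a migration as applied in django_migrations without running its operations; the app and migration names below are placeholders.)

python manage.py migrate myapp 0042_change_common_fields --fake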

Is there anything I can do to fix the slowness of migrations? I’m willing to locally patch the Django code if need be, as I recognize my situation is probably not the common one.


Hello @JoelFeiner!

It’s kind of hard to provide support without more details about where the slowness is, but I suspect you are running into problems because of how the migration framework still requires materializing model states into actual Python models.Model subclasses to perform changes.

This problem is even worse on SQLite, as it has limited support for table alterations, which requires the framework to emulate some changes by rendering another model, creating a new table, backfilling it (using INSERT INTO new_table SELECT * FROM old_table), dropping the old table, and renaming the new one.

In order to assist you, we’d need to know:

  1. The version of Django and SQLite you are using
  2. The kind of schema changes you are attempting to perform

Certainly. I’m using Django 4.2.7. The system sqlite3 is 3.42. I’m not sure what version Python (and therefore Django) uses, as there’s no pip dependency for it, but I assume it is similar.
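
(The version the Python interpreter, and therefore Django, actually links against can be checked from the stdlib sqlite3 module:)

import sqlite3

# SQLite ships with the interpreter rather than via pip, which is why there
# is no pip dependency for it; this is the library version Django uses:
print(sqlite3.sqlite_version)  # e.g. "3.42.0"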

The schema changes in this case are removing three fields per table, two of which are foreign keys, and adding a new field for each table, which is also a foreign key.

The removals are taking 3-7 seconds each and the additions are taking about the same, maybe a little faster.

Additional info: I used sqlmigrate to generate the SQL and then ran that manually. In one app where these types of changes are being applied, it took 3 seconds to add all the fields and 28 seconds to remove all the fields. There are 9 tables, so 9 adds and 27 removes. By comparison, the migrations took 43 seconds to add all the fields and 3m40s to remove all the fields, almost 10 times slower.
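
(For anyone reproducing the comparison, a sketch of the workflow, with placeholder app and migration names:)

# Dump a migration's SQL and time it directly against SQLite:
python manage.py sqlmigrate myapp 0042_change_fields > 0042.sql
time sqlite3 db.sqlite3 < 0042.sql

# ...versus timing the same migration through the framework:
time python manage.py migrate myapp 0042_change_fields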


Joel’s example was only a few minutes, but we have enough tables that we have around 24 hours of migrations that need to be generated. Being able to cut that to 1/10th of the time would be a massive quality-of-life improvement for us.

@charettes

I did some more investigating. I discovered that ProjectState.reload_model() gets a list of related models, and this list is generated recursively. In the database I’m using, there is (1) a large number of tables, and (2) a situation where all tables are related to each other recursively. The latter happens because each table has a deleted_by foreign key linked to a user table. This means that any table T relates to the user table, which in turn relates, via reverse foreign key relationships, to every other table that has this deleted_by field, which is the vast majority of tables.
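
(A sketch of the schema shape being described, with hypothetical model names; with thousands of models like these, the user model’s reverse relationships connect every table to every other one:)

from django.conf import settings
from django.db import models

class Invoice(models.Model):
    # Every model carries this FK, so the user model gains a reverse
    # relation to each of them.
    deleted_by = models.ForeignKey(
        settings.AUTH_USER_MODEL, null=True, on_delete=models.SET_NULL
    )

class Order(models.Model):
    deleted_by = models.ForeignKey(
        settings.AUTH_USER_MODEL, null=True, on_delete=models.SET_NULL
    )

# Invoice -> User -> (reverse FKs) -> Order, Invoice, ...: a recursive walk
# starting from any one model reaches virtually the whole project.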

Adding some timing statements shows that almost all of the time spent in applying an operation is reloading state for all of these tables after every operation. I made a slapdash modification to get_related_models_recursive() in state.py, and a few other necessary places, to only generate first-degree relationships. As expected, this considerably speeds up the operations (0.2-0.9 seconds per operation instead of 4-8 seconds per operation). It does not seem to break migrations for the subset of operations I tested, although I do not necessarily expect this to be generally true.

Why does the ORM recursively discover relationships, instead of just stopping at the first degree of relationships? Would that not ensure that the models that matter (the one just modified and ones that might depend on it) are updated without updating the world?

Is there a way to improve this without patching the Django code? If patching is acceptable, would removing the recursion be acceptable for a PR? Should this become a ticket?

Without getting into all the details: model classes (ModelBase instances, AKA Model subclasses) cache forward and backward relationships in their _meta object (look at all the cached properties in there) in order to make introspection and ORM query resolution faster. This is a cost worth paying at project setup time, as it happens once and the result remains immutable for the remainder of the Python process, but it performs very poorly when model classes have to be generated over and over again as they are mutated between operations.
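
(Illustratively, many of those _meta attributes are cached properties on Django’s Options class, computed once per rendered model class:)

from django.contrib.auth.models import User

# Each of these is computed lazily, then cached on the class's Options
# instance; re-rendering a fake model class throws all of it away:
User._meta.fields
User._meta.related_objects
User._meta.get_fields()  # backed by an internal per-Options cache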

There have been years of investment in trying to build smarter invalidation logic for dynamically built fake model classes, to avoid doing unnecessary work while ensuring that the graph doesn’t become corrupt (e.g. a reverse relationship still pointing at an old model reference). This problem perfectly embodies why cache invalidation is said to be hard: it’s not only about making things faster but also about making sure the resulting changes do not corrupt relationships in non-trivial ways.

Both Markus Holtermann and I spent a lot of time fiddling in this area and came to the conclusion that the likely best way forward is to adapt the schema editor to operate on ModelState objects (which are lean collections of field options) instead of rendered model classes that require constant re-rendering. That would require a long deprecation phase and careful planning to avoid leaving third-party backends and other migration-related third-party apps in the dust.

Assuming we were able to achieve this goal, model classes would only need to be rendered for RunPython operations, which could be done on demand when retrieving objects from apps.get_model.
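
(For reference, a minimal sketch of what a ModelState carries; the app and model names are hypothetical:)

from django.db import models
from django.db.migrations.state import ModelState

# A ModelState is a flat description -- name, (field name, field) pairs,
# options, bases, managers -- that never renders an actual class:
book = ModelState(
    app_label="library",
    name="Book",
    fields=[
        ("id", models.AutoField(primary_key=True)),
        ("title", models.CharField(max_length=100)),
    ],
)
# ModelState.from_model(SomeModel) builds one from a real model class.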

I’m not saying this to discourage you from trying to make things faster; I know @shangxiao ventured into this area not too long ago following another “migrations are slow” thread. I just want to warn you that a lot of cycles have already been spent here, ending in promising speedups followed by regressions in complex migration graphs of large projects in the wild, which required reverts. Over the years this has reinforced my belief that moving away from creating disposable Python classes for the sole purpose of introspection is the saner approach.


@charettes Thank you for the thoughtful, detailed response. I had run across at least one of those before, but had hoped there had been more recent traction. I understand it is a difficult problem.

In the short term, I have implemented the following change in my local copy of Django, which allows the migrations to run much more quickly and does not seem to cause any problems. I’m curious whether you see any glaring issues with this patch.


diff --git a/django/db/migrations/migration.py b/django/db/migrations/migration.py
index 3c7713c5ea..e0bc7cc619 100644
--- a/django/db/migrations/migration.py
+++ b/django/db/migrations/migration.py
@@ -101,6 +101,10 @@ class Migration:
         Migrations.
         """
         for operation in self.operations:
+            # NOTE: RunPython must be imported at the top of this module,
+            # e.g. from django.db.migrations.operations import RunPython
+            if isinstance(operation, RunPython):
+                project_state.reload_models(models=None)
             # If this operation cannot be represented as SQL, place a comment
             # there instead
             if collect_sql:
diff --git a/django/db/migrations/state.py b/django/db/migrations/state.py
index ae55967383..780096dca0 100644
--- a/django/db/migrations/state.py
+++ b/django/db/migrations/state.py
@@ -396,16 +396,21 @@ class ProjectState:
 
     def reload_model(self, app_label, model_name, delay=False):
         if "apps" in self.__dict__:  # hasattr would cache the property
-            related_models = self._find_reload_model(app_label, model_name, delay)
-            self._reload(related_models)
+            # related_models = self._find_reload_model(app_label, model_name, delay)
+            # self._reload(related_models)
+            # Keys in self.models are lowercased; `delay` is ignored here.
+            self._reload({(app_label, model_name.lower())})
 
     def reload_models(self, models, delay=True):
         if "apps" in self.__dict__:  # hasattr would cache the property
-            related_models = set()
-            for app_label, model_name in models:
-                related_models.update(
-                    self._find_reload_model(app_label, model_name, delay)
-                )
+            if models is None:
+                related_models = set(self.models.keys())
+            else:
+                related_models = set()
+                for app_label, model_name in models:
+                    related_models.update(
+                        self._find_reload_model(app_label, model_name, delay)
+                    )
             self._reload(related_models)
 
     def _reload(self, related_models):

I would eventually like to help solve this problem for Django, but I need to immerse myself in past efforts first. I will see about plugging in on one of the existing PRs, unless you think it would be better to start fresh, or if you would like me not to do any further work on this for now.

It’s worth giving the test suite a run with your proposed changes. My first impression is that the reload_model change will break any schema editor method that uses the backward-relations cache to do things like rename objects or re-create foreign key constraints. For example, say that you have:

  1. A series of migrations being applied where __fake__.Author and __fake__.Book are rendered. Both objects in memory have pointers to each other.
  2. Now you proceed with a migration that alters Author into Author' and results in calling reload_model for it. At this point Book still points at Author in memory, not Author', so if a follow-up migration introspects Book for reverse relationships it won’t point at Author' like it should (see the sketch below).
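
(A minimal sketch of that staleness, assuming the single-model reload patch above and a hypothetical library app with Author and Book models:)

old_author = project_state.apps.get_model("library", "author")
project_state.reload_model("library", "author")  # only Author is re-rendered
new_author = project_state.apps.get_model("library", "author")
book = project_state.apps.get_model("library", "book")

assert old_author is not new_author
# Book was never re-rendered, so its FK still targets the stale class:
assert book._meta.get_field("author").related_model is old_author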

The only real way to tell is to throw the test suite at it, though, which should be pretty straightforward. If the suite passes it’s a good sign for sure, but we have been proven wrong plenty of times in the past, and migration graph corruption bugs are very hard to diagnose as they basically require reporters to provide us access to a subset of their project to reproduce.
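
(From a Django source checkout, the migrations tests alone can be run with the standard suite runner:)

python tests/runtests.py migrations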

I believe it might be better to start fresh, but given that efforts have pretty much stalled on this front and slow migrations remain a source of frustration for many Django users, it’s definitely worth exploring alternative avenues.