Migrating backwards takes orders of magnitude longer than migrating forwards

It seems that migrating backwards seems to be extremely slow compared to migrating forwards and I haven’t been able to find a ticket or previous discussion that might explain why.

The actual changes performed on the database complete quickly. I’m able to CTRL+C after a relative short period of time and the migration is properly undone. I believe the problem is around checking the model state post-migration.

First, the versions.

Python 3.10.6
Django 4.1.8

The project I’m working on has many dozens of apps, high hundreds of models, and thousands (~6000) of migrations.

The specific migration is the creation of a new model, in a new app, that has no dependencies on any other models or apps in either direction. The table has 4 records in it when attempting to migrate backwards. Migrating forwards takes approximately 20 seconds. Migrating backwards takes 10 minutes! A similar amount of time that takes to run the entire migration suite on a freshly created database.

Here is a traceback I captured after a KeyboardInterrupt - noting that the migration within the database has already completed, the table has been removed, and the django_migrations table appropriately updated.

^CTraceback (most recent call last):
  File "/Users/josh/dev/kraken/src/./manage.py", line 16, in <module>
    management.execute_from_command_line(sys.argv)
  File "/Users/josh/.pyenv/versions/kraken/lib/python3.10/site-packages/django/core/management/__init__.py", line 446, in execute_from_command_line
    utility.execute()
  File "/Users/josh/.pyenv/versions/kraken/lib/python3.10/site-packages/django/core/management/__init__.py", line 440, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/Users/josh/.pyenv/versions/kraken/lib/python3.10/site-packages/django/core/management/base.py", line 402, in run_from_argv
    self.execute(*args, **cmd_options)
  File "/Users/josh/dev/kraken/src/octoenergy/interfaces/common/management_commands.py", line 83, in execute
    return super().execute(*args, **options)
  File "/Users/josh/.pyenv/versions/kraken/lib/python3.10/site-packages/django/core/management/base.py", line 448, in execute
    output = self.handle(*args, **options)
  File "/Users/josh/dev/kraken/src/octoenergy/interfaces/systemjobs/management/commands/migrate.py", line 31, in handle
    super().handle(*args, **options)
  File "/Users/josh/.pyenv/versions/kraken/lib/python3.10/site-packages/django/core/management/base.py", line 96, in wrapped
    res = handle_func(*args, **kwargs)
  File "/Users/josh/.pyenv/versions/kraken/lib/python3.10/site-packages/django/core/management/commands/migrate.py", line 349, in handle
    post_migrate_state = executor.migrate(
  File "/Users/josh/.pyenv/versions/kraken/lib/python3.10/site-packages/django/db/migrations/executor.py", line 141, in migrate
    state = self._migrate_all_backwards(plan, full_plan, fake=fake)
  File "/Users/josh/.pyenv/versions/kraken/lib/python3.10/site-packages/django/db/migrations/executor.py", line 231, in _migrate_all_backwards
    migration.mutate_state(state, preserve=False)
  File "/Users/josh/.pyenv/versions/kraken/lib/python3.10/site-packages/django/db/migrations/migration.py", line 89, in mutate_state
    operation.state_forwards(self.app_label, new_state)
  File "/Users/josh/.pyenv/versions/kraken/lib/python3.10/site-packages/django/db/migrations/operations/fields.py", line 219, in state_forwards
    state.alter_field(
  File "/Users/josh/.pyenv/versions/kraken/lib/python3.10/site-packages/django/db/migrations/state.py", line 289, in alter_field
    self.reload_model(*model_key, delay=delay)
  File "/Users/josh/.pyenv/versions/kraken/lib/python3.10/site-packages/django/db/migrations/state.py", line 400, in reload_model
    self._reload(related_models)
  File "/Users/josh/.pyenv/versions/kraken/lib/python3.10/site-packages/django/db/migrations/state.py", line 435, in _reload
    self.apps.render_multiple(states_to_be_rendered)
  File "/Users/josh/.pyenv/versions/kraken/lib/python3.10/site-packages/django/db/migrations/state.py", line 659, in render_multiple
    with self.bulk_update():
  File "/Users/josh/.pyenv/versions/3.10.6/lib/python3.10/contextlib.py", line 142, in __exit__
    next(self.gen)
  File "/Users/josh/.pyenv/versions/kraken/lib/python3.10/site-packages/django/db/migrations/state.py", line 649, in bulk_update
    self.clear_cache()
  File "/Users/josh/.pyenv/versions/kraken/lib/python3.10/site-packages/django/apps/registry.py", line 383, in clear_cache
    for model in app_config.get_models(include_auto_created=True):
  File "/Users/josh/.pyenv/versions/kraken/lib/python3.10/site-packages/django/apps/config.py", line 258, in get_models
    if model._meta.swapped and not include_swapped:
  File "/Users/josh/.pyenv/versions/kraken/lib/python3.10/site-packages/django/db/models/options.py", line 412, in swapped
    if self.swappable:
KeyboardInterrupt

What jumps out at me here is the clear_cache and the re-rendering of states. I haven’t (yet) dug much more into this to see if the migrator is reloading unaffected models but, given the run time, I highly suspect it.

Is this a problem other folks are seeing on large projects? Any insights as to if this is a known issue, if anything can be done about it, or where I might go about digging in?

1 Like

:wave: Josh, nice yo see you around!

The usual cause of slowness when applying migration is model rendering. Nowadays most Operation.state_forwards implementations avoid model rendering entirely by operating solely from ModelState instance which has the side effect of making the creation of project state used to apply migration forward relatively fast as until changes need to be effectively applied no dynamic model creation is performed.

Since migrations operation don’t implement a state_backwards method a lot more state alterations have to take place to emulate them and this encouraged us to force the rendering of models prior to attempting any change. This has the unfortunate side effect of making most of the state_forwards extremely costly when dealing with a large model graphs as state mutation forces the reloading of tons of model clases. I wouldn’t be surprised if in your case removing the forced model rendering made things faster.

In order to make migrations truly faster we know that we must make schema editors treat with model state instead of rendered models and implementing state_backwards would certainly help to a certain extent. Until then we’ll have to keep finding ways to defer model rendering as much as possible while making sure our invalidation logic on state alterations accounts for all the layers of caching that rendered models have.

3 Likes

Simon, hey it’s been a while!

Thanks for that comprehensive answer you’ve saved me at least a few hours there, really appreciate it. It’ll take me some time to digest it all as the migration framework internals are new to me.

The slow down I’m observing occurs post-database change but I think the rendering you’ve pointed out occurs prior to the migration being un-applied. I think the post migration handling is where the slow down occurs. That may be a consequence of the prior rendering that you pointed out, but I’ll need to debug and see exactly what’s happening. I’ll try to find some time for that over the weekend.

Thanks Simon :bowing_man:

Hi Josh, I have nothing to contribute to this particular concern but I want to point you in the direction of a talk about how migrations work under the hood. If you’re diving into migrations this weekend, it might be helpful to watch this first for some high level guidance on how things work.

Hi Josh,

Yeah I noticed slow reverse migrations as well though it’s never really bothered me. I have a CI job that migrates then reverse migrations to make sure any new migrations are reversible.

fwiw: Currently at ~100 models, just recently squashed so only a handful of migrations, migrate runs in 1m41s while reverse migrate is 3m39s. :man_shrugging:

Sounds like it could be worth making a ticket for.


David

Simon,

I’ve investigated this further and found this block to take most of processing time:

This block runs after the DDL queries are applied and appears to be used to forward onto things like signals?

For my small project unapplying the most recent migration on the main app: total command time ~20s, profiling this block alone showed it took ~16s.

This seems to align with Josh’s stack trace from above.

What I find a mystery is that before the DDL queries are applied project state is updated in a similar manner but is lightning quick compared to this. I just can’t seem to spot what the difference is and why so much slower? :thinking:

I was hoping you could shed some light on the issue because blame shows you as the author :sweat_smile:

David

1 Like

Also relevant, there’s another person reporting the same block to cause CPU spikes to which Simon replied back in 2017: Fixed #24100 -- Made the migration signals dispatch its plan and apps. · django/django@f937c9e · GitHub