GSoC 2021 Migration Project

Thank you @carltongibson for this much-needed resource. I watched @MarkusH’s talk from DjangoCon EU (https://www.youtube.com/watch?v=u6cVvbuUzlk), and it was an awesome overview of Django's migrations framework. After watching the talk, I dug into the code more closely and noted the following key points.

  1. The main migrate phase starts in the executor's migrate() function, which takes the pre-migration project state as an argument and returns the post-migration state after applying all the migrations.
  2. migrate() checks whether the migrations are to be applied in the forward or the backward direction and acts accordingly.
  3. If all the migrations go forwards, it calls _migrate_all_forwards(), which applies them all in the forward direction.
  4. _migrate_all_forwards() calls apply_migration() for every migration in full_plan that is also part of plan.
  5. apply_migration() calls apply() on the migration, passing in the state and a SchemaEditor instance.
  6. In Migration.apply(), the old state of the project is stored, and the new state after the migration is produced by state_forwards(). In this new state, state.apps contains all the fake models, which the SchemaEditor then uses in database_forwards() to apply the migration to the database.
  7. The main problem lies in step 6, in database_forwards(): Django uses the SchemaEditor to apply the migration to the database, and the SchemaEditor works with __fake__ models. To provide them, state_forwards() has to call reload_model(), which internally calls render_multiple() and render(), and that slows the process down.
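
The call chain in points 1 to 7 can be sketched as a toy model. This is a simplified pure-Python imitation, not Django's code: every class prefixed with `Toy`, the `render_count` counter and the `schema_log` list are hypothetical, and only the method names (`migrate`, `_migrate_all_forwards`, `apply_migration`, `apply`, `state_forwards`, `database_forwards`, `reload_model`) mirror the ones mentioned above.

```python
import copy


class ToyState:
    """Stand-in for the project state: model descriptions plus a render counter."""

    def __init__(self):
        self.models = {}        # model name -> list of field names
        self.render_count = 0   # how many times a model had to be reloaded

    def reload_model(self, name):
        # In Django this re-renders the model classes; here we only count
        # the call so the repeated cost becomes visible.
        self.render_count += 1


class ToyAddField:
    """Stand-in for a single migration operation."""

    def __init__(self, model, field):
        self.model, self.field = model, field

    def state_forwards(self, state):
        state.models.setdefault(self.model, []).append(self.field)
        state.reload_model(self.model)

    def database_forwards(self, schema_log, from_state, to_state):
        # A real operation would pass rendered models to the SchemaEditor.
        schema_log.append(f"ALTER {self.model} ADD {self.field}")


class ToyMigration:
    def __init__(self, operations):
        self.operations = operations

    def apply(self, state, schema_log):
        for operation in self.operations:
            old_state = copy.deepcopy(state)   # keep the state before the change
            operation.state_forwards(state)    # mutate the in-memory state
            operation.database_forwards(schema_log, old_state, state)
        return state


class ToyExecutor:
    def __init__(self):
        self.schema_log = []

    def migrate(self, plan, state):
        # Only the forwards direction is modelled in this sketch.
        return self._migrate_all_forwards(state, plan)

    def _migrate_all_forwards(self, state, plan):
        for migration in plan:
            state = self.apply_migration(state, migration)
        return state

    def apply_migration(self, state, migration):
        return migration.apply(state, self.schema_log)


plan = [ToyMigration([ToyAddField("Book", "title"), ToyAddField("Book", "author")])]
executor = ToyExecutor()
final_state = executor.migrate(plan, ToyState())
print(final_state.models, final_state.render_count, executor.schema_log)
```

Even in this toy version, every operation triggers one reload, so the number of reloads grows with the number of operations rather than with the number of migrations.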

Key points I noted from https://www.youtube.com/watch?v=u6cVvbuUzlk :

  1. The executor is the brain of applying and unapplying migrations.
  2. The most crucial part of the migration phase is rendering models.
  3. After rendering, the executor calls the apply method, which mutates the state operation by operation and the database operation by operation as well (as mentioned in point 6 above).
  4. ProjectState knows all ModelStates at a given time.
  5. Because the Schema Editor only works with model classes, the model states need to be converted into them, and that's called model rendering.
  6. Why the Schema Editor doesn't work with model states:
    1. The Schema Editor is part of the database backend and doesn't know about the internal migrations framework. Making it work with model states would mean opening up some internal APIs, namely ProjectState and ModelState. That would not necessarily be an issue, because the benefits almost certainly outweigh the costs in this case.
    2. The change also needs to happen in a backward-compatible way.
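
What "rendering" means in point 5 can be illustrated with plain Python. This is a sketch under assumptions: `ToyModelState` and its `render` method are hypothetical stand-ins written for this post, and real rendering also has to deal with bases, managers and relations through an app registry, which this example leaves out.

```python
class ToyModelState:
    """Plain-data description of a model: a name plus field name -> type name."""

    def __init__(self, name, fields):
        self.name = name
        self.fields = dict(fields)

    def render(self):
        # Build an actual Python class from the description with type().
        # Each call creates a brand-new class object, so rendering again
        # after every operation adds up on a large project.
        return type(self.name, (object,), dict(self.fields))


book_state = ToyModelState("Book", {"title": "CharField", "pages": "IntegerField"})
Book = book_state.render()
BookAgain = book_state.render()

print(Book.__name__, Book.title)
print(Book is BookAgain)
```

The state itself is cheap to copy and compare because it is only data; the cost appears when it has to be turned into classes.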

After digging further into @MarkusH's patch, I found that it is not that difficult to use ModelState, but then I realised that ModelState cannot resolve relational fields, which makes it tricky for the Schema Editor to use ModelState instead of models.
I also read the proposal by @aryan9600, which proposed introducing a new data structure (RelatedFieldTuple). And also @MarkusH suggested

So this is all I have dug into so far. After all this, I would conclude that there are several ways we might solve the problem, such as:

  1. As suggested by @MarkusH, a central registry to store all dependencies.
  2. Is there a way for ModelStates to store relations? If yes, it would probably be a win-win for both ModelStates and the SchemaEditor.
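
One way to picture the first idea is a registry built straight from model states, mapping each model to the models that point at it. The sketch below is purely illustrative: the `build_relation_registry` helper, the dictionary layout, and the way a foreign key is written as `("ForeignKey", "app.Model")` are all assumptions made for this example, not Django APIs and not the design from either proposal.

```python
from collections import defaultdict


def build_relation_registry(model_states):
    """Map each target model to the (source model, field name) pairs that reference it.

    model_states: dict of "app.Model" -> dict of field name -> field spec,
    where a relational field spec is the tuple ("ForeignKey", "app.Model").
    """
    registry = defaultdict(list)
    for source, fields in model_states.items():
        for field_name, spec in fields.items():
            if isinstance(spec, tuple) and spec[0] == "ForeignKey":
                registry[spec[1]].append((source, field_name))
    return dict(registry)


states = {
    "library.Author": {"name": "CharField"},
    "library.Book": {"title": "CharField", "author": ("ForeignKey", "library.Author")},
    "library.Review": {"book": ("ForeignKey", "library.Book")},
}

registry = build_relation_registry(states)
# Changing library.Author only requires looking at the models listed for it.
print(registry.get("library.Author", []))
```

With something like this, relations could be looked up from plain data without rendering any model classes first.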

For the last 2 days I have been digging into all this and doing a lot of brainstorming. Personally, I find the migrations framework damn interesting, and I would love to optimise it. I just need a few more points that I can add to my proposed solution to make it more concrete.
Any help, suggestions or feedback would be highly appreciated from @carltongibson, @MarkusH, @charettes, @felixxm or anyone who has thoughts and ideas on this.

Regards
Manav Agarwal
