GSoC 2021 Migration Project

Hello everyone,
My name is Manav. I’m a Computer Science and Engineering junior at Dr. A.P.J. Abdul Kalam Technical University in India.
I have solved many issues on Trac.

I read through the GSoC Idea List, and the Migration topic stood out for me. I found the idea of adapting schema editors to operate on model states instead of fake rendered models really interesting.

I have been going through the code of the Migrations framework for the last 2 weeks. I started from django.core.management.commands.makemigrations and django.core.management.commands.migrate, and then read the source code of all the functions and classes used in both modules. In the meantime, I was reading the closed migration tickets and also solved one migrations issue related to the naming of migration files.
When I researched the problem further, I learned that a fake model is rendered by the render() function. I am still trying to figure out how we can use ModelState on a practical level. I also read the comment by @MarkusH, which gave me the idea of digging into the field level, and learned that, since the API is documented, everything has to keep working as-is.
In that comment, Markus suggested a good idea, which I am currently working through.

Also, I would like to thank @MarkusH for his initial patch (Commits · MarkusH/django · GitHub), which helped me a lot while thinking about an optimized solution. I feel that such efforts shouldn’t be wasted.

To be honest, I have been contributing to the Django project for the last 6 to 7 months, and I have found the community friendly and helpful. I would appreciate any suggestions or thoughts on how I can propose the best solution.


Hi @manav014 — there was some discussion of this last year. (Search the history here, on the developers mailing list, GitHub PRs, and the ticket — you’ll find it.)

Getting an overview of that would be a good start.

This is definitely an addressable issue!

Thanks for your input!

Thank you, @carltongibson, for your suggestion. Following it, I read the discussion on #22608 and noted the following key points:

  1. When this ticket was raised (7 years ago), Django used the traditional approach to performing migrations and applying their operations.
  2. Following the suggestions by @timgraham, memoization was adopted, and 2 commits were merged to optimize the migration process:
    1. Optimized migration optimizer and migrate by caching calls to str.lower()
    2. Optimized migration optimizer. Moved list constants instantiation into optimizer’s __init__.
  3. Finally, @charettes concluded:

The main slowdown during the migrate phase is the heavy model rendering required to pass fake models to the schema editor. Markus Holtermann have a long standing branch to completely stop rendering models to during the migration phase. That includes significant changes to the schema editor to be able to operate on model states instead of rendered model classes but it’s a good step forward IMO. I just filed a ticket to track this optimization in #29898.

  4. This is where the patch by @MarkusH comes in.
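The memoization idea from those commits can be illustrated with a small, self-contained sketch. `ToyOperation` and `references_model()` are hypothetical names, not the code from the commits above; the pattern shown is caching a lowercased name once with `functools.cached_property` instead of calling `str.lower()` on every comparison (Django’s model operations expose a similar `name_lower` attribute).

```python
from functools import cached_property


class ToyOperation:
    """Hypothetical stand-in for a migration operation that refers to a model by name."""

    def __init__(self, name):
        self.name = name
        self.lower_calls = 0  # how many times our own name was actually lowered

    @cached_property
    def name_lower(self):
        # Computed on first access, then stored on the instance and reused.
        self.lower_calls += 1
        return self.name.lower()

    def references_model(self, other_name):
        # Only the other name is lowered per call; our own name comes from the cache.
        return self.name_lower == other_name.lower()


op = ToyOperation("Author")
matches = [op.references_model(name) for name in ["author", "Book", "AUTHOR"]]
print(matches)         # [True, False, True]
print(op.lower_calls)  # 1
```

However many comparisons run, `self.name.lower()` is evaluated only once per operation instance.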

After reading this discussion, I started analyzing @MarkusH’s patch and recorded all my research in the form of flowcharts.
The red boxes with filenames are the starting points. Please start reading from models.py or fields.py for a better understanding.

Currently, I am reading the discussion on https://groups.google.com/g/django-developers/c/_ohBzsuomqw/m/RxCZ2MKyAwAJ regarding a proposal by @aryan9600.
I will post an update once I have finished.
Are these observations correct, and am I on the right track? Any suggestions, feedback, or improvements are most welcome.


Hey @manav014, looks good. Keep going! :) The mailing list thread from last year made some good progress, IIRC.

Check out @MarkusH’s talk from DjangoCon EU last year too, “DjangoCon 2020 | A Pony On The Move: How Migrations Work In Django” by Markus Holtermann on YouTube. It’s not directly addressing the ticket, but it’s a super overview of the migrations system.


Thank you, @carltongibson, for this much-needed resource. I watched @MarkusH’s talk from DjangoCon EU (https://www.youtube.com/watch?v=u6cVvbuUzlk), and it was an awesome overview of Django’s migrations framework. After watching it, I dug into the code in more detail and noted the following key points:

  1. The main migrate phase starts in the migrate() method of the MigrationExecutor object, which receives (a clone of) pre_migrate_state as its state argument and returns post_migrate_state after applying all the migrations.
  2. In migrate(), it checks whether the migrations are to be applied in the forward or the backward direction and acts accordingly.
  3. If all the migrations in the plan are forwards, it calls _migrate_all_forwards(), which migrates them all in the forward direction.
  4. In _migrate_all_forwards(), it calls apply_migration() on each migration in full_plan that is also in plan.
  5. In apply_migration(), migration.apply() is called with the current state and a SchemaEditor instance.
  6. In Migration.apply(), for each operation the project’s old_state is kept, and the new state after the operation is generated with state_forwards(). In this new state, state.apps contains all the fake models, which the SchemaEditor then uses in database_forwards() to apply the operation to the database.
  7. The main problem is in step 6, in database_forwards(): Django uses the SchemaEditor to apply the migration to the database, and the SchemaEditor works with __fake__ models. To provide them, state_forwards() has to call reload_model(), which internally calls render_multiple() and render(), and this slows the process down.
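To check my understanding of steps 1 to 6, here is a toy model of the call chain in plain Python. The method names mirror the ones above, but `AddFieldOp`, `ToyMigration` and `ToyExecutor` are simplified stand-ins made up for illustration (no Django imports, a dict as the project state, a list as the “schema editor”), not Django’s actual implementation. The rendering cost from step 7 is only marked by a comment.

```python
class AddFieldOp:
    """Hypothetical operation: adds a field in the state and in the 'database'."""

    def __init__(self, model, field):
        self.model, self.field = model, field

    def state_forwards(self, state):
        # Mutate the in-memory project state. In Django, this is where
        # reload_model() would trigger (expensive) model re-rendering.
        state.setdefault(self.model, []).append(self.field)

    def database_forwards(self, schema_editor, from_state, to_state):
        # Hand the change to the "schema editor"; here it just records a statement.
        schema_editor.append(f"ALTER TABLE {self.model} ADD COLUMN {self.field}")


class ToyMigration:
    def __init__(self, name, operations):
        self.name, self.operations = name, operations

    def apply(self, state, schema_editor):
        # Step 6: keep the old state, move the state forwards, then apply the
        # operation to the database using both states.
        for operation in self.operations:
            old_state = {model: list(fields) for model, fields in state.items()}
            operation.state_forwards(state)
            operation.database_forwards(schema_editor, old_state, state)
        return state


class ToyExecutor:
    def __init__(self):
        self.schema_editor = []  # stand-in for a SchemaEditor: collects statements

    def migrate(self, plan, state):
        # Steps 1 to 3: plan is a list of (migration, backwards) pairs.
        directions = {backwards for _, backwards in plan}
        if directions == {True, False}:
            raise ValueError("mixed forwards/backwards plans are not supported")
        if directions == {False}:
            state = self._migrate_all_forwards(plan, state)
        # The all-backwards case is left out of this sketch.
        return state

    def _migrate_all_forwards(self, plan, state):
        # Step 4: apply each migration in order.
        for migration, _ in plan:
            state = self.apply_migration(state, migration)
        return state

    def apply_migration(self, state, migration):
        # Step 5: call Migration.apply() with the state and the schema editor.
        return migration.apply(state, self.schema_editor)


plan = [
    (ToyMigration("0001_initial", [AddFieldOp("author", "name")]), False),
    (ToyMigration("0002_author_age", [AddFieldOp("author", "age")]), False),
]
executor = ToyExecutor()
post_migrate_state = executor.migrate(plan, state={})
print(post_migrate_state)  # {'author': ['name', 'age']}
print(executor.schema_editor)
```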

Key points I noted from https://www.youtube.com/watch?v=u6cVvbuUzlk:

  1. The executor is the brain of applying and unapplying migrations.
  2. The most crucial part of the migration phase is rendering models.
  3. After rendering, the executor calls the apply() method, which mutates the state operation by operation, and the database as well, operation by operation (as mentioned in point 6 above).
  4. ProjectState knows all the ModelStates at a given point in time.
  5. Because the schema editor only works with model classes, the model states need to be converted into model classes; this is called model rendering.
  6. Why the schema editor doesn’t work with model states:
    1. The schema editor is part of the database backend and doesn’t know about the internal migrations framework. Making it work with model states would mean opening up some internal APIs, namely ProjectState and ModelState. That would not necessarily be an issue, because in this case the benefits almost certainly outweigh the cost.
    2. That change needs to happen in a backwards-compatible way.
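The ProjectState / ModelState / rendering relationship can be sketched like this. `ToyModelState` and `ToyProjectState` are hypothetical simplifications: the `type()` call stands in for what rendering does (turning a plain description into a class), while Django’s real `ModelState.render()` also builds fields, Meta options and bases inside an `Apps` registry. The `__fake__` module name matches the fake models mentioned above.

```python
from dataclasses import dataclass, field


@dataclass
class ToyModelState:
    """Hypothetical, heavily simplified stand-in for a ModelState."""
    app_label: str
    name: str
    fields: dict = field(default_factory=dict)  # field name -> column type

    def render(self):
        # "Rendering": turn the plain description into an actual class object.
        attrs = {"__module__": "__fake__", "_fields": dict(self.fields)}
        return type(self.name, (object,), attrs)


@dataclass
class ToyProjectState:
    """Knows every model state at one point in time, keyed by (app_label, model_name)."""
    models: dict = field(default_factory=dict)

    def add_model(self, model_state):
        self.models[model_state.app_label, model_state.name.lower()] = model_state

    def render_all(self):
        # Every class is rebuilt from its description, so the cost grows with
        # the number of models each time this runs.
        return {key: state.render() for key, state in self.models.items()}


project = ToyProjectState()
project.add_model(ToyModelState("library", "Author", {"id": "integer", "name": "varchar(100)"}))
rendered = project.render_all()
Author = rendered["library", "author"]
print(Author.__name__, Author.__module__)  # Author __fake__
```

A schema editor that could read `project.models` directly would not need `render_all()` at all, which is the whole idea of the project.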

After digging more into @MarkusH’s patch, I found that using ModelState is not that difficult, but then I realised that ModelState fails to resolve relational fields, which makes it tricky for the schema editor to use ModelState instead of models.
I also read the proposal by @aryan9600, which introduced a new data structure (RelatedFieldTuple). @MarkusH also suggested

So this is what I have dug into so far. From all this, I would conclude that there are several ways we may solve the problem, such as:

  1. As suggested by @MarkusH, a central registry to store all dependencies.
  2. Is there a way for ModelStates to store relations? If so, it would probably be a win-win situation for ModelState and the SchemaEditor.
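For option 1, a central registry could look roughly like the following. This is only my sketch of the idea, assuming the registry is keyed by (app_label, model_name) and updated incrementally when model states are added or removed; `RelationRegistry` and its methods are hypothetical names, not taken from @MarkusH’s patch or from Django. The point is that the models affected by a change can be looked up without rendering anything.

```python
from collections import defaultdict


class RelationRegistry:
    """Hypothetical central registry of which models point at which."""

    def __init__(self):
        # target (app_label, model_name) -> set of models with relational fields to it
        self.related_to = defaultdict(set)

    def add_model(self, model_key, relations):
        # relations: iterable of target model keys this model has relational fields to
        for target in relations:
            self.related_to[target].add(model_key)

    def remove_model(self, model_key):
        # Drop the model's own entry and its appearances as a referrer elsewhere.
        self.related_to.pop(model_key, None)
        for referrers in self.related_to.values():
            referrers.discard(model_key)

    def affected_by(self, model_key):
        # Models that would need attention if model_key changes (no rendering needed).
        return set(self.related_to.get(model_key, ()))


registry = RelationRegistry()
registry.add_model(("library", "author"), [])
registry.add_model(("library", "book"), [("library", "author")])
registry.add_model(("shop", "sale"), [("library", "book")])
before = registry.affected_by(("library", "author"))
print(sorted(before))  # [('library', 'book')]
registry.remove_model(("library", "book"))
print(registry.affected_by(("library", "author")))  # set()
```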

For the last 2 days I have been digging into all this and doing a lot of brainstorming. Personally, I find the migration framework damn interesting, and I would love to optimise it. I just need some more points to add to my proposed solution to make it more concrete.
Any help, suggestions, or feedback from @carltongibson, @MarkusH, @charettes, @felixxm, or anyone else with thoughts and ideas on this would be highly appreciated.

Regards
Manav Agarwal


I have a question, and I would be glad if anyone could help.
In one of his points, @charettes wrote:

If project state maintains a map of (app_label, model_name): [(from_fields, app_label, model_name, to_fields), …] you’ll be able to resolve related db types easily by doing project_state.models[to_app_label, to_model_name].fields[to_field].db_type(connection).
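To make sure I understand the quoted idea, here is a toy version of that map and of the `db_type()` lookup. Everything here is a hypothetical stand-in (`ToyField`, `ToyModelState`, `ToyProjectState`, `related_db_types()`); `fields` is a dict, as the quoted `fields[to_field]` lookup implies, and treating a `None` entry in `to_fields` as “the target’s primary key” is purely an assumption of this sketch.

```python
class ToyField:
    """Hypothetical field stand-in exposing db_type(connection)."""

    def __init__(self, column_type, primary_key=False):
        self.column_type, self.primary_key = column_type, primary_key

    def db_type(self, connection):
        # A real field would depend on the connection; here the type is fixed.
        return self.column_type


class ToyModelState:
    def __init__(self, fields):
        self.fields = fields  # field name -> ToyField

    def pk_name(self):
        return next(name for name, f in self.fields.items() if f.primary_key)


class ToyProjectState:
    def __init__(self):
        self.models = {}     # (app_label, model_name) -> ToyModelState
        self.relations = {}  # (app_label, model_name) -> [(from_fields, app_label, model_name, to_fields)]

    def related_db_types(self, app_label, model_name, connection=None):
        # Resolve each relational column's type from the target's model state.
        types = []
        for from_fields, to_app, to_model, to_fields in self.relations.get((app_label, model_name), []):
            target = self.models[to_app, to_model]
            for from_field, to_field in zip(from_fields, to_fields):
                # Assumption of this sketch: None means "the target's primary key".
                to_field = to_field or target.pk_name()
                types.append((from_field, target.fields[to_field].db_type(connection)))
        return types


state = ToyProjectState()
state.models["library", "author"] = ToyModelState({"id": ToyField("bigint", primary_key=True)})
state.models["library", "book"] = ToyModelState(
    {"id": ToyField("bigint", primary_key=True), "author": ToyField("bigint")}
)
state.relations["library", "book"] = [(["author"], "library", "author", [None])]
print(state.related_db_types("library", "book"))  # [('author', 'bigint')]
```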

If we populate the registry on every state change, as suggested by @charettes, where will we get the to_fields to store in the registry from?
Likewise, if we populate the registry in django.apps.registry.Apps.populate, as suggested by @MarkusH, the same issue will arise.

I have proposed a solution to the problem. Please review it and share your feedback or thoughts.