Hello everyone,
My name is Manav. I’m a Computer Science and Engineering junior at Dr. A.P.J. Abdul Kalam Technical University in India.
I have solved many issues on Trac.
I read through the GSoC idea list, and the migrations topic stood out to me. I found the idea of adapting the schema editors to operate on model states instead of fake rendered models really interesting.
I have been going through the code of the migrations framework for the last 2 weeks. I started from django.core.management.commands.makemigrations and django.core.management.commands.migrate, then read the source of all the functions and classes used in both modules. In the meantime I read through closed migration tickets and also solved one migrations issue, about the naming of migration files.
When I researched the problem further, I learned that the fake models are rendered by the render() function. I am still trying to figure out how we can use ModelState at a practical level. I also read the comment by @MarkusH, which gave me the idea of digging into the field level, and learned that, because the API is documented, everything has to keep working as-is.
In that comment, Markus suggested a good idea, which I am currently working through.
Also, I would like to thank @MarkusH for his initial patch (Commits · MarkusH/django · GitHub), which helped me a lot while thinking about an optimized solution. I feel that such efforts shouldn’t be wasted.
To be honest, I have been contributing to the Django project for the last 6 to 7 months, and I have found the community friendly and helpful. I would appreciate any suggestions or thoughts on how I can propose the best solution.
Hi @manav014 — there was some discussion of this last year. (Search the history here, on the developers mailing list, GitHub PRs, and the ticket — you’ll find it.)
Getting an overview of that would be a good start.
Thank you @carltongibson for your suggestion. Following it, I read the discussion on #22608 and noted the following key points:
When this ticket was raised (7 years ago), Django used traditional methods for running migrations and applying their operations.
Following suggestions by @timgraham, the idea of memoization was adopted, and 2 commits were merged to optimize the migration process:
1. Optimized migration optimizer and migrate by caching calls to str.lower()
2. Optimized migration optimizer: moved list constants instantiation into optimizer’s init
The main slowdown during the migrate phase is the heavy model rendering required to pass fake models to the schema editor. Markus Holtermann has a long-standing branch to completely stop rendering models during the migration phase. That includes significant changes to the schema editor to be able to operate on model states instead of rendered model classes but it’s a good step forward IMO. I just filed a ticket to track this optimization in #29898.
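The idea of caching calls to str.lower(), mentioned in the key points above, can be illustrated with a minimal sketch. The `Operation` class and its `references()` method are hypothetical stand-ins made up for this example and are not taken from the Django commits; only `functools.cached_property` is a real standard-library API.

```python
from functools import cached_property


class Operation:
    """Hypothetical stand-in for an object that is compared by lowercased name."""

    def __init__(self, name):
        self.name = name

    @cached_property
    def name_lower(self):
        # Computed on first access, then stored on the instance.
        return self.name.lower()

    def references(self, other_name):
        # Repeated comparisons reuse the cached lowercase value of self.name.
        return self.name_lower == other_name.lower()


op = Operation("Author")
assert op.references("AUTHOR")
assert "name_lower" in op.__dict__  # the cached value now lives on the instance
```

The gain comes from comparisons being repeated many times over the same objects, so the lowercase form is worth computing only once per object.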
After reading this discussion, I started analyzing @MarkusH’s code in the patch and recorded all my research in the form of flowcharts.
The red boxes with filenames are the starting points. Please start reading from models.py or fields.py for better understanding.
Thank you @carltongibson for this much-needed resource. I watched @MarkusH’s talk from DjangoConEU https://www.youtube.com/watch?v=u6cVvbuUzlk , and it was an awesome overview of Django’s migrations framework. After watching the talk, I dug into the code more closely and noted the following key points.
1. The main migrate phase starts from the migrate() function (called on an executor object), which takes pre_migration_state as an argument and returns post_migration_state after applying all the migrations.
2. migrate() checks whether the migrations are to be applied in the forward or the backward direction and works accordingly.
3. If all the migrations are in the forward direction, it calls _migrate_all_forwards(), which applies them all in the forward direction.
4. _migrate_all_forwards() calls apply_migration() on each migration in plan that also belongs to full_plan.
5. apply_migration() calls apply() on the migration, passing the state and a SchemaEditor instance.
6. Migration.apply() stores the old state of the project and generates the latest state by calling state_forwards. In this new state, state.apps contains all the fake models, which SchemaEditor then uses in database_forwards() to apply the migration to the database.
The main problem is in the 6th step, in database_forwards(), where Django uses SchemaEditor to apply the migration to the database, and SchemaEditor works with __fake__ models. To provide them, reload_model() has to be called in state_forwards, which internally calls render_multiple() and render(), and this slows the process down.
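The call chain described in the steps above can be sketched as a toy model in plain Python. This is not Django's code: every class here is a simplified, hypothetical stand-in (for example, the real operation methods also take an app_label, and the real state holds far more than field names), meant only to show how each operation first mutates the state and then the database.

```python
class ProjectState:
    """Toy project state: a dict of model name -> list of field names."""

    def __init__(self, models=None):
        self.models = dict(models or {})

    def clone(self):
        return ProjectState({name: list(fields) for name, fields in self.models.items()})


class RecordingSchemaEditor:
    """Toy schema editor that records statements instead of running them."""

    def __init__(self):
        self.statements = []

    def create_model(self, name, fields):
        self.statements.append(f"CREATE TABLE {name} ({', '.join(fields)})")


class CreateModel:
    """Toy operation with the two halves named in the steps above."""

    def __init__(self, name, fields):
        self.name, self.fields = name, fields

    def state_forwards(self, state):
        # First half: mutate the in-memory state.
        state.models[self.name] = list(self.fields)

    def database_forwards(self, schema_editor, from_state, to_state):
        # Second half: change the database to match the new state.
        schema_editor.create_model(self.name, to_state.models[self.name])


class Migration:
    def __init__(self, operations):
        self.operations = operations

    def apply(self, state, schema_editor):
        for operation in self.operations:
            old_state = state.clone()  # keep the state from before this operation
            operation.state_forwards(state)
            operation.database_forwards(schema_editor, old_state, state)
        return state


def migrate(plan, pre_migration_state):
    """Apply every migration in the plan and return the resulting state."""
    state = pre_migration_state
    editor = RecordingSchemaEditor()
    for migration in plan:
        state = migration.apply(state, editor)
    return state, editor.statements


plan = [Migration([CreateModel("author", ["id", "name"])])]
post_migration_state, statements = migrate(plan, ProjectState())
print(statements)  # ['CREATE TABLE author (id, name)']
```

In this toy version the schema editor reads directly from the state. In the flow described above it receives rendered model classes instead, which is where the rendering cost comes from.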
The executor is the brain of applying and unapplying migrations.
The most crucial part of the migration phase is rendering models.
After rendering, the executor calls the apply method, which mutates the state operation by operation and the database operation by operation as well (as mentioned in point 6 above).
ProjectState knows all ModelStates at a given time
Because the Schema Editor only works with model classes, the model states need to be converted into them; this is called model rendering.
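To make the notion of rendering concrete, here is a minimal sketch of turning a state description into a class at runtime. The `ModelState` below is a hypothetical toy, not Django's class: it stores plain strings instead of field objects and builds an ordinary Python class with the built-in `type()`.

```python
class ModelState:
    """Toy model state: a name plus a mapping of field name -> column type."""

    def __init__(self, name, fields):
        self.name = name
        self.fields = dict(fields)

    def render(self):
        # Build a brand-new class from the stored description on every call.
        attrs = dict(self.fields)
        attrs["__module__"] = "__fake__"
        return type(self.name, (object,), attrs)


state = ModelState("Author", {"id": "integer", "name": "varchar(100)"})
Author = state.render()
print(Author.__name__, Author.__module__)  # Author __fake__
print(state.render() is Author)  # False: each render creates a new class
```

Even in this toy form, every render produces a fresh class. With real models, and with related models needing to be re-rendered too, that work adds up over a long migration history.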
Why the schema editor doesn’t work with model states:
1. The Schema Editor is part of the database backend and doesn’t know about the internal migrations framework. Making it work with model states would mean opening up some of the internal APIs, namely ProjectState and ModelState. That would not necessarily be an issue, because the benefits almost certainly outweigh the cost in this case.
2. The change needs to happen in a backward-compatible way.
After digging further into @MarkusH’s patch, I found that using ModelState is not that difficult, but then I realised that ModelState fails to resolve relational fields, which makes it a bit tricky for the Schema Editor to use ModelState instead of models.
I also read the proposal by @aryan9600, which proposed introducing a new data structure (RelatedFieldTuple). And also @MarkusH suggested
So this is all I have dug into so far. After all this, I would like to conclude that there are several ways in which we might solve the problem, such as:
As suggested by @MarkusH, a central registry to store all dependencies.
Is there a way for ModelStates to store relations? If yes, then it would probably be a win-win for ModelStates and SchemaEditor.
For the last 2 days I have been digging into all this and brainstorming a lot. Personally, I find the migrations framework really interesting, and I would love to optimise it. I just need some more points to add to my proposed solution to make it more concrete.
Any help, suggestions, or feedback would be highly appreciated from @carltongibson, @MarkusH, @charettes, @felixxm, or anyone who has thoughts and ideas on this.
I have a question and would be glad if anyone could help.
From one of the points by @charettes
If project state maintains a map of (app_label, model_name): [(from_fields, app_label, model_name, to_fields), …] you’ll be able to resolve related db types easily by doing project_state.models[to_app_label, to_model_name].fields[to_field].db_type(connection).
If we populate the registry on every state change, as suggested by @charettes, where will we get the to_fields to store in the registry?
And if we populate the registry in django.apps.registry.Apps.populate, as suggested by @MarkusH, the same issue will arise.
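For reference, the map described in the quote above can be sketched with plain dictionaries. Everything here is an assumption made for illustration: the `models` and `relations` structures, the `related_db_types()` helper, and the sample app and field names are hypothetical, and the field types are plain strings rather than the result of db_type(connection). The to_fields are hard-coded in the sample data, since where they would come from is exactly the open question.

```python
from collections import defaultdict

# Toy project state: (app_label, model_name) -> {field_name: column type}.
# The relation column's own type is unknown until it is resolved.
models = {
    ("library", "author"): {"id": "bigint", "name": "varchar(100)"},
    ("library", "book"): {"id": "bigint", "title": "varchar(200)", "author": None},
}

# (app_label, model_name) -> [(from_fields, to_app_label, to_model_name, to_fields), ...]
relations = defaultdict(list)
relations["library", "book"].append((("author",), "library", "author", ("id",)))


def related_db_types(app_label, model_name):
    """Resolve each relation's column type by looking up its target field."""
    resolved = {}
    for from_fields, to_app, to_model, to_fields in relations[app_label, model_name]:
        for from_field, to_field in zip(from_fields, to_fields):
            resolved[from_field] = models[to_app, to_model][to_field]
    return resolved


print(related_db_types("library", "book"))  # {'author': 'bigint'}
```

The lookup itself needs no rendered model classes, only the state and the map, which is the appeal of the approach.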