[GSoC 2021] Adapt Schema Editors to operate from model states instead of fake rendered models

Hello everyone!
This discussion is for my GSoC project.

Communication

I have setup a documentation blog here. I will document all my GSoC-related work in this blog. I will also add my plans and their progress in the same blog.
For communication with my mentor and other Django Fellows, I will prefer using either Django Forum or django-developers mailing list.

Schedule

I would like to stick to a plan and before starting with anything I would like to plan things up. Also, I would like to continue with the plan mentioned in my project proposal.

Major Milestones:
  1. Creating and Populating Central Registry
  2. Adapting Central Registry and ModelState
  3. Working on tests and documentation
Task for the upcoming 1.5 weeks ( May 22 - June 1)
  • Creating the proxy for all Operation subclasses’ state_forwards and state_backwards methods in ProjectState.
  • Initialization of the Central Registry in ProjectState(like ProjectState.add_field() etc.).
  • Code the logic to populate the registry with ProjectState.add_model() and newly introduced ProjectState.add_field() .
  • Fixing and Writing tests along with Documentation. (if required)

I am starting with proxying all Operation subclasses state_forwards and state_backwards methods in ProjectState(Of course, if my mentor prefers).

Any suggestions, views, thoughts or feedback is most welcome. Also, I would like to know the community’s expectations from me.

Working this summer, optimizing the Migrations Framework is going to be way more interesting. It’s more of like once in a lifetime experience.

3 Likes

I have created a PR to create a Proxy of all state_forwards method in ProjectState class.
PR

1 Like

While writing the code for the central registry in ProjectState. I came across a problem which is mentioned below. I am trying to figure out the solution for the same. I would be glad if anyone may help me out with this.

Problem

To initialize the central registry in django.db.migrations.state.ProjectState.add_model I need the following parameters

  • from_app_label
  • from model_name
  • from field
  • to model_name
  • to app_label
  • to_field

Out of all these required values I figured out how to get the first 5 values as follows:

  • from_app_label - model_State.app_label
  • from model_name - model_state.name_lower
  • from field - model_state.fields.items()
  • to model_name - model_state.fields.items().field.remote_field.model
  • to app_label - model_state.fields.items().field.remote_field.model.split(".")

But I am unable to figure out how can I get to_field in case the to_field is not explicitly defined by the developer.

Suggestions and feedback would be highly appreciated.

Regard,
Manav

I have solve the above mentioned problem But I am stuck with how will we store manytomany relations in the registry? Like for Foreign Key relations we may use the following method:

project_state.related_fields_registry[app_label, model_name]:[(from_field,app_label,model_name,to_field),...]

But as in case of manytomany field we have through model so how can we manage that?
I would like to ping @felixxm @MarkusH @charettes for the solution.

Suggestions and feedback would be highly appreciated.

Would it not be possible for you to create the through models (replicating what the actual models do?) with more or less the same code? It would probably be easy to factor out at least the through model/field names to share that between this and the real code. I imagine the big thing you want is those names, right?

So a ManyToMany declaration ends up also creating the model in your add_model/add_field code. If I’m not mistaken that’s basically what happens in real Django code.

1 Like

Looks you’ve resolved this issue but when not specified it must default to the primary key of the referenced model.

@rtpg could work and might even make adapting the schema editor a bit easier as the latter wouldn’t have to care about whether or not such models were auto-created by the framework.

As for the data structure itself it will likely need to be adapted to store both through and through_fields. It might not be necessary in the first place if you populate the registry with through models as @rtpg suggested though as you might be able to solely rely on the implicitly created foreign keys to deduce relationships.

2 Likes

I think it will take some time to digest the suggestions by @rtpg @charettes. I also need to dig more into the code in order to have something concrete in mind for the same.

Thank You @rtpg and @charettes for your suggestions. I have some code in my mind for add_model in django.db.migrations.state.ProjectState which is as follows.

        for name, field in model_state.fields.items():
            if field.is_relation:
                from_app_label = model_state.app_label
                from_model = model_state.name_lower
                from_model = name
                to_app_label = field.remote_field.model.split(".")[0]
                to_model_name = field.remote_field.model.split(".")[1]
                # to_field would be the through model name in case of m2m and to_fields in case of foreign keys
                to_field = field.remote_field.through or\
                           "_".join((model_state.name_lower, name.lower()))\
                            if field.many_to_many else field.to_fields
                try:
                    self.related_fields[(model_state.app_label, model_state.name_lower)].\
                        append((from_model, to_app_label, to_model_name, to_field))
                except KeyError:
                    self.related_fields[(from_app_label, from_model)] = \
                        [(from_model, to_app_label, to_model_name, to_field)]

This code is to populate the central registry by add_model. If that seems fine I may continue with populating the registry with other functions as well.
Any improvements or suggestions for a better implementation would be appreciated.

As the first evaluations are drawing near, I would like to know my mentor’s and the community’s expectations from me for the first evaluation. So that I may complete all the tasks on time. :slight_smile:

While working on the central repository, I realized that using project_state.related_fields_registry[app_label, model_name]:[(from_field,app_label,model_name,to_field),...] kind of structure would make the central mapping or registry more bulky.

So I just came up with a solution. I would like to propose a new class for the same, in order to make it less bulky to store relations.
it would be something like this:

    class RelationValue:
        def __init__(self, from_field, to_app_label, to_model_name, to_field):
            self.from_field = from_field
            self.to_app_label = to_app_label
            self.to_model_name = to_model_name
            self.to_field = to_field
        #Methods
        def get_all_relations_by_model_name(self):
            # Something
            pass

The above mentioned class definition can be used in the central mapping or registry as follows:
project_state.related_fields_registry[app_label, model_name]:[(RelationValue instance 1), (RelationValue instance 2),...]

I am not aware of the consequences of the same. Would it be fine to do so? But yes, definitely it will make it easy to maintain the central registry or mapping.

I would ping @charettes, @felixxm, @MarkusH, and @carltongibson for suggestions.

Hello @manav014.

I’m not sure how this will make the registry less bulky but if you believe it makes it easier to reason about the relationships I’d say go for it. You might want to look into subclassing NamedTuple instead to reduce the boilerplate though.

Hey @charettes . Thanks for your suggestions. I would like to continue with your suggestion of using NamedTuple. I will create a PR soon with the changes.

IMO we’re on the right track :+1:. Proxy methods are already merged :star:. It would be great to submit PR (even draft) with the central registry and start a discussion about it :face_with_monocle:, but if you need few more days that’s fine.

1 Like

That sounds great!
I have submitted a PR with the central Registry. Please review the same so that I may start working on tests.

Would it be feasible to change the properties of fields in ModelStates? Like populating the remote_field and other properties of ModelState fields and making them work similar to that of __fake__ model fields. This will perhaps even maintain the backward compatibility.

I am not aware of the consequences of the same. Please let me know if this will cause something inappropriate? If this will have some wrong consequences then we will continue with central mapping.

As the deadline for final submission is approaching, I would like to know about the specific milestones along with their deadlines.

Apologies for not updating the blog post. I am currently working on adapting.

BaseDatabaseSchemaEditor class to work with ModelStates and I have even successfully converter functions like table_sql, column_sql and CreateModel.database_forwards to work with ModelStates in my local copy. I will push the changes once the PR for relations registry gets merged.

I am working on logic for many_to_many fields and as soon as I will complete the same and create a PR, I will update my Blog Post.

I would like to know about my mentor’s and community’s expectations from me.

Kind Regards,
Manav

Thanks :+1: I left comments on your PR, please take a look. We should be able to merge it when I come back after the holidays (August, 11th to 18th :mountain:). We can start working on changes in the schema editor afterward, hopefully before the final GSoC evaluation but don’t worry if not.

1 Like

@felixxm Thank You for the feedback. Hope you have enjoyed your holidays :mountain: .

I have added the desired changes to the PR. Please review the same so that I may start working on the final changes in Schema Editor. I will try to create a PR as soon as #14587 gets merged and hopefully, we will complete the project soon.

I would also like to mention that I have updated my blog. I would be glad, if you may check it out once as I would be submitting the same for the final evaluation. I would also like to learn more about what changes I have to make in the blog to make it perfect for final submission.

We don’t have much time to August, 30th :spiral_calendar: . I’d like to merge PR14781 and PR14587 until that day. If you can open a subsequent PR with schema editor changes it will great, if not, it’s okay too :+1: . You can do this later as a non-GSoC contribution. I hope you will stay with us as a regular contributor :medal_sports:

1 Like

As we are done with relations registry population in ProjectState. It’s time to start working on adding changes to SchemaEditor so that it may work with ModelStates.
I have made a few observations which are as follows:

A. In django.db.backends.schema.BaseDatabaseSchemaEditor

  1. In table_sql() we need to manually find the db_params if model is an instance of ModelState. The code may look like this:
 if isinstance(model, ModelState) and field.is_relation:
                related_model = field.related_model
                if related_model == 'self':
                    related_model = model.name_lower
                if "." in related_model:
                    to_model_key = tuple(related_model.lower().split('.'))
                else:
                    to_model_key = tuple([model.app_label, related_model.lower()])
                to_model_state = self.project_state.models[to_model_key]
                try:
                    to_field = to_model_state.get_field(field.to_fields[0])
                except KeyError:
                    to_field = to_model_state._meta.pk
                db_params = {"type": to_field.rel_db_type(connection=connection), "check": []}
  1. we need to find to_column also using the similar method as follows:
       if isinstance(model, ModelState):
                    related_model = field.related_model
                    if related_model == 'self':
                        related_model = model.name_lower
                    if "." in related_model:
                        to_model_key = tuple(related_model.lower().split('.'))
                    else:
                        to_model_key = tuple([model.app_label, related_model.lower()])
                    to_model = self.project_state.models[to_model_key]
                    to_table = to_model._meta.db_table
                    try:
                        to_column = to_model.get_field(field.remote_field.field_name).column
                    except KeyError:
                        to_column = to_model._meta.pk.column

In order to use the above-mentioned column property of the field, we need to initialize column to every field in ModelState.__init__() .

  1. In create_model() the making of m2m models can be replaced with the below-mentioned code if model is an instance of ModelState.
 if isinstance(model, ModelState):
            for field in model._meta.local_many_to_many:
                if field.remote_field.through == None:
                    model_state = self.project_state.models[(
                        model.app_label, "".join([model.name_lower, "_", field.name.lower()]))]
                else:
                    model_state = self.project_state.models[tuple(field.remote_field.through.split("."))]
                if model_state.auto_created:
                    self.create_model(model_state)

B. In django.db.migrations.state

  1. We need to add a variable with name auto_created in ModelState to deal with auto_created m2m models.

  2. We need to create a _meta function in ModelState class which will return the instance of ModelStateOptions which is defined in the next point

  3. We need to create a ModelStateOptions class which will have property functions like pk, app_label, swappable, db_table etc.

  4. We need to add functions similar to the below mentioned ones in the ModelState class.


    def get_field_by_name(self, name):
        return self.get_field(name)

    def get_local_fields(self):
        fields = [field for field in self.fields.values() if not field.many_to_many]
        return fields

    def get_local_many_to_many(self):
        fields = [field for field in self.fields.values() if field.many_to_many]
        return fields

C. In django.db.migrations.operations.base

  1. We need to create a context manager to pass ProjectState instance to the schema editor.
@contextmanager
def patch_project_state(schema_editor, project_state):
    schema_editor.project_state = project_state
    try:
        yield
    finally:
        del schema_editor.project_state

D. In django.db.migrations.operations.models

  1. We need to automatically create ModelStates of auto created m2m through models in CreateModel.state_forwards. The logic of which would look something like this:
for name, field in self.fields:
            if field.many_to_many and field.remote_field.through == None:
                this_model_name = "".join([app_label, ".", self.name])
                to_model = field.related_model
                if this_model_name == to_model:
                    to_field_name = "".join(["to", "_", field.related_model])
                    from_field_name = "".join(["from", "_", field.related_model])
                    to_field = models.ForeignKey(this_model_name, on_delete=models.CASCADE, auto_created=True)
                    from_field = models.ForeignKey(this_model_name, on_delete=models.CASCADE, auto_created=True)
                else:
                    to_field_name = "".join([to_model.split(".")[1], "_", "id"])
                    from_field_name = "".join([to_model.split(".")[1], "_", "id"])
                    to_field = models.ForeignKey(to_model, on_delete=models.CASCADE, auto_created=True)
                    from_field = models.ForeignKey(this_model_name, on_delete=models.CASCADE, auto_created=True)
                fields = [("id", models.AutoField(primary_key=True)),
                          (to_field_name, to_field),
                          (from_field_name, from_field)]
                through_model_name = "".join([self.name.lower(), "_", name])
                through_model_state = ModelState(
                    app_label,
                    through_model_name,
                    list(fields),
                    options={"db_table":field.db_table},
                    auto_created=True,
                )
                state.add_model(through_model_state)

This is all I had in my mind since I started working on schema_editor. I think this is a big change and will need proper planning and discussion before implementation. I would like to have suggestions from fellow developers.