GSOC 2020 Migration Project

Hello! I’m Peter, a third year undergrad student from New York studying Computer Science.

I found the migrations project from the ideas list interesting and I would love to give it a shot. From what I understand, we want to change the SchemaEditor to use model states rather than rendered models, ultimately speeding up migrations. It looks like Markus Holtermann did some great initial work on this, and I would hate for it to go to waste!

I would appreciate any extra details, thoughts on how to approach this challenge, or advice on how best to propose this project.

Thank you!
Peter

Hi Peter. Welcome.

Super, yes! Migrations are interesting and hard. Taking this on would make you an expert in this area by the end of it. I’d suggest looking at the open migrations tickets and digging into a few. I think just hanging out in that code is a sensible policy to begin.

From there, this project is probably one of the better defined ones, so the proposal will amount to your battle-plan.

I will ping @MarkusH so that he’s seen your post. He knows the most here. :slightly_smiling_face:

Hi,

Sorry to bother you, but where is the “ideas list” for GSoC 2020?

Cheers
Naveen Arora

Every organization participating in GSoC has an ideas list, or something very similar.

For Django, it’s here https://code.djangoproject.com/wiki/SummerOfCode2020

Thanks @SanketDG. 100% correct. :slightly_smiling_face:

Hi @jc4883,

Great that you’re interested in this :+1:. Where would you like me to start elaborating on the idea?

Some things that will probably be necessary:

To really know about all dependencies between all models, I’d have a central registry, probably on django.apps.registry.Apps. You need some kind of structure that allows quick look-ups of the form “which models point to a given model”. More precisely, though, you probably need that on a field level: what are all the fields (from which you can derive the corresponding models) that point to a given field or a given model? You need this to be able to look up which related models need to be touched when, e.g., another model is deleted.
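
A minimal sketch of what such a field-level reverse index might look like, written in plain Python so it runs without Django. RelationRegistry and its methods are hypothetical names, not an existing Django API, and keying models by (app_label, model_name) tuples with a lowercased model name is an assumption made for the example.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

ModelKey = Tuple[str, str]        # (app_label, model_name), model_name lowercased
FieldRef = Tuple[ModelKey, str]   # (model that holds the field, field name)


class RelationRegistry:
    """Hypothetical reverse-relation index; not an existing Django API."""

    def __init__(self) -> None:
        # target model -> {referencing field: target field name (None = primary key)}
        self._incoming: Dict[ModelKey, Dict[FieldRef, Optional[str]]] = defaultdict(dict)

    def add_relation(self, from_model: ModelKey, field_name: str,
                     to_model: ModelKey, to_field: Optional[str] = None) -> None:
        """Record that from_model.field_name points at to_model (or its to_field)."""
        self._incoming[to_model][(from_model, field_name)] = to_field

    def remove_model(self, model_key: ModelKey) -> None:
        """Drop the index entry for model_key and every relation it declares."""
        self._incoming.pop(model_key, None)
        for refs in self._incoming.values():
            for ref in [r for r in refs if r[0] == model_key]:
                del refs[ref]

    def incoming_fields(self, to_model: ModelKey) -> List[FieldRef]:
        """All fields that point to to_model: one dict lookup, then sorting."""
        return sorted(self._incoming.get(to_model, {}))

    def fields_pointing_to(self, to_model: ModelKey,
                           to_field: Optional[str] = None) -> List[FieldRef]:
        """Only the fields that target to_model.to_field (None = primary key)."""
        refs = self._incoming.get(to_model, {})
        return sorted(ref for ref, target in refs.items() if target == to_field)

    def referencing_models(self, to_model: ModelKey) -> List[ModelKey]:
        """Models that need touching when to_model is altered or deleted."""
        return sorted({model for model, _ in self._incoming.get(to_model, {})})


registry = RelationRegistry()
registry.add_relation(("shop", "order"), "customer", ("shop", "customer"))
registry.add_relation(("shop", "invoice"), "customer", ("shop", "customer"), to_field="email")
registry.add_relation(("shop", "invoice"), "order", ("shop", "order"))
print(registry.referencing_models(("shop", "customer")))
# [('shop', 'invoice'), ('shop', 'order')]
print(registry.fields_pointing_to(("shop", "customer"), "email"))
# [(('shop', 'invoice'), 'customer')]
```

Look-ups here are a single dict access plus sorting the result; removing a model still walks the whole index, which seems an acceptable trade-off if deletions are much rarer than look-ups.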

You will also need a mapping that gives you information about the data types of related columns. Currently, a ForeignKey retrieves its db_type from self.target_field, which depends on the resolved related model class. This is necessary because the SchemaEditor’s column_sql() depends on db_type().
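
A sketch of such a mapping, assuming a hypothetical ColumnInfo record per (app_label, model_name, field_name). Splitting db_type from rel_db_type loosely mirrors Django’s Field.rel_db_type(), which is what lets an auto-incrementing primary key be referenced by a plain integer column; the type strings below are illustrative only.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

FieldKey = Tuple[str, str, str]  # (app_label, model_name, field_name)


@dataclass(frozen=True)
class ColumnInfo:
    """Hypothetical per-field record; the type strings used below are illustrative."""
    db_type: Optional[str] = None      # type of this column (unused for relations)
    rel_db_type: Optional[str] = None  # type a column pointing here must use
    target: Optional[FieldKey] = None  # referenced field, set only for relations


def resolve_db_type(columns: Dict[FieldKey, ColumnInfo], key: FieldKey) -> Optional[str]:
    """Column type for `key`, computed from the mapping alone (no model classes)."""
    info = columns[key]
    if info.target is None:
        return info.db_type
    seen = {key}
    # Follow relation chains (e.g. a FK to a one-to-one primary key) to the end.
    while info.target is not None:
        key = info.target
        if key in seen:
            raise ValueError(f"relation cycle at {key}")
        seen.add(key)
        info = columns[key]
    return info.rel_db_type if info.rel_db_type is not None else info.db_type


columns = {
    ("shop", "customer", "id"): ColumnInfo(db_type="serial", rel_db_type="integer"),
    ("shop", "customer", "email"): ColumnInfo(db_type="varchar(254)"),
    ("shop", "order", "customer"): ColumnInfo(target=("shop", "customer", "id")),
    ("shop", "invoice", "customer"): ColumnInfo(target=("shop", "customer", "email")),
}
print(resolve_db_type(columns, ("shop", "order", "customer")))    # integer
print(resolve_db_type(columns, ("shop", "invoice", "customer")))  # varchar(254)
```

The point of the design is that column_sql()-style code could ask this mapping for a ForeignKey’s type instead of walking self.target_field on a rendered model.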

I suspect these one or two things will be the first major milestones. I didn’t take these steps in my proof of concept, and that eventually became one of the core problems I ran into.

The second major issue I faced was the backwards-compatibility requirements on the SchemaEditor. Anything you change has to keep working as-is, including with model classes, because the API is documented. You can deprecate parts, but their removal will only happen in the next major release series (simply put). I think making the ModelStates from django.db.migrations.state behave identically to model classes could help there, but I haven’t investigated any further what that would entail.
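
One way to read “behave identically to model classes” is duck typing: schema code that only reads model._meta.db_table and model._meta.local_fields cannot tell a state from a class. This is a sketch under that assumption; create_table_sql, FieldState and SimpleModelState are hypothetical, and unlike Django’s fields (where db_type() is a method taking a connection), db_type here is a plain attribute.

```python
from dataclasses import dataclass, field
from types import SimpleNamespace
from typing import List


def create_table_sql(model) -> str:
    """Simplified stand-in for schema-editor code: it only reads model._meta."""
    opts = model._meta
    columns = ", ".join(f"{f.column} {f.db_type}" for f in opts.local_fields)
    return f"CREATE TABLE {opts.db_table} ({columns})"


@dataclass
class FieldState:
    """Hypothetical field state, not one of Django's field classes."""
    name: str
    db_type: str

    @property
    def column(self) -> str:
        return self.name


@dataclass
class SimpleModelState:
    """Hypothetical, loosely modelled on a migration ModelState."""
    app_label: str
    name: str
    fields: List[FieldState] = field(default_factory=list)

    @property
    def _meta(self) -> SimpleNamespace:
        # Expose only the slice of the _meta API that create_table_sql reads,
        # using Django's default table name "<app_label>_<lowercased model name>".
        return SimpleNamespace(
            db_table=f"{self.app_label}_{self.name.lower()}",
            local_fields=self.fields,
        )


state = SimpleModelState(
    "shop", "Customer", [FieldState("id", "integer"), FieldState("email", "varchar(254)")]
)
print(create_table_sql(state))  # CREATE TABLE shop_customer (id integer, email varchar(254))
```

The same function still accepts any class exposing an equivalent _meta, which is the property existing callers passing model classes would rely on.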

I hope that helps :slight_smile:

Hey everyone, I too would like to give this idea a shot. I have been going through DB and migration issues for the past 2 weeks and have also opened some PRs for them. I am looking for some advice: should I browse through more tickets, or start going through all the code related to this idea and @MarkusH’s patch? I would love to get some guidance from you guys :slight_smile:

Hi @aryan9600. I think the advice in this thread applies equally. Familiarity with the migrations framework is the starting point; then, yes, Markus’ more detailed points.

In the limit, I’m sure there’s space for more than one person to work on the migrations framework. The project idea was one concrete suggestion, but other ideas are welcome too.

Hello @MarkusH,

Thank you for the response! I’ve been doing some digging this past week to digest your reply, and there are a few more areas I hope you could explain:

  • When/where should the quick lookup for models/db_types be populated? Perhaps as models are being registered?
  • Could you elaborate on why the way we get the db_type of a model instance’s field is different for (or impossible on) a model state’s field?

Thanks!

My semester midterms were going on last week, so I didn’t get much time to work on my proposal 😬. I was hoping to get some advice.

  1. What exactly does it mean when it’s said that migrations are currently done by rendering fake models? Where exactly does that happen?
  2. How would one approach swapping out the existing logic for the proposed logic of using model states?

Absolutely. I’m happy to.

My first choice would probably be the AppRegistry, yes.

At this point, relational fields, such as ForeignKeys, have a reference to the instance of the field they refer to, available under remote_field. That’s usually the primary key field, but it can be any field, as defined through the to_field argument to e.g. ForeignKey. That field, again, knows about the model it’s on. When you want to know the data type for the ForeignKey column, Django looks at the related field and takes its data type.
When you’re working with ModelStates, you don’t have any resolved references. A ForeignKey only refers to app_label.Model (and optionally the to_field). Now, in order to get all incoming ForeignKeys, one needs to iterate over all ModelStates and over each state’s fields to check if they’re related. That’s very inefficient, but doable.
The idea of the change is to have an O(1) lookup (if possible) to get all incoming ForeignKeys.
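
To make the trade-off concrete, here is a sketch in plain Python comparing the full scan over every state’s fields with an index built once. The States layout (relational fields stored as lazy “app_label.ModelName” strings, everything else as None) is a simplified assumption, not ModelState’s real structure.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

ModelKey = Tuple[str, str]  # (app_label, lowercased model name)
# Hypothetical, simplified state: field name -> lazy "app_label.ModelName"
# reference for relational fields, or None for ordinary fields.
States = Dict[ModelKey, Dict[str, Optional[str]]]


def _key(reference: str) -> ModelKey:
    app_label, model_name = reference.split(".")
    return app_label, model_name.lower()


def incoming_fks_scan(states: States, target: ModelKey) -> List[Tuple[ModelKey, str]]:
    """Inspects every field of every model state on each call."""
    found = []
    for model_key, fields in states.items():
        for field_name, reference in fields.items():
            if reference is not None and _key(reference) == target:
                found.append((model_key, field_name))
    return sorted(found)


def build_incoming_index(states: States) -> Dict[ModelKey, List[Tuple[ModelKey, str]]]:
    """Pays for the full scan once; later look-ups are a dict access."""
    index = defaultdict(list)
    for model_key, fields in states.items():
        for field_name, reference in fields.items():
            if reference is not None:
                index[_key(reference)].append((model_key, field_name))
    return {key: sorted(refs) for key, refs in index.items()}


states = {
    ("shop", "customer"): {"id": None, "email": None},
    ("shop", "order"): {"id": None, "customer": "shop.Customer"},
    ("billing", "invoice"): {"id": None, "order": "shop.Order", "customer": "shop.Customer"},
}
index = build_incoming_index(states)
print(index.get(("shop", "customer"), []))
# [(('billing', 'invoice'), 'customer'), (('shop', 'order'), 'customer')]
```

The scan costs time proportional to the number of fields in the project on every call, while the index answers with one dict access; the hard part is keeping such an index in sync as migration operations add, alter and remove fields.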

Cheers.

Django’s migrations use ModelStates that, for the most part, look like a Model but actually aren’t. They have a significantly reduced API to make them more efficient. The downside is that some things, like resolving relational fields as pointed out above, aren’t possible. But since these relations are necessary to construct correct SQL, Django takes these ModelStates and turns them into real Model classes.
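
A toy analogy in plain Python, not Django’s implementation, for what “rendering” means: states keep relations as lazy strings, and rendering builds real classes and then resolves those strings into class objects, after which a ForeignKey can reach its target field. ToyField and render_all are hypothetical. In Django itself this happens in django/db/migrations/state.py, in ModelState.render() and the StateApps registry behind ProjectState.apps.

```python
import copy
from typing import Dict, Optional, Tuple

ModelKey = Tuple[str, str]  # (app_label, lowercased model name)


class ToyField:
    """Hypothetical field: `to` stays a lazy 'app_label.ModelName' string in the state."""

    def __init__(self, name: str, db_type: Optional[str] = None, to: Optional[str] = None):
        self.name = name
        self.db_type = db_type
        self.to = to
        self.related_model: Optional[type] = None  # only set on rendered copies


def render_all(states: Dict[ModelKey, Dict[str, ToyField]]) -> Dict[ModelKey, type]:
    classes = {}
    # Pass 1: build one real class per state, on copies of the state's fields.
    for key, fields in states.items():
        cloned = {name: copy.copy(f) for name, f in fields.items()}
        classes[key] = type(key[1].capitalize(), (), {"fields": cloned})
    # Pass 2: resolve each lazy reference to the class object it names.
    for cls in classes.values():
        for f in cls.fields.values():
            if f.to is not None:
                app_label, model_name = f.to.split(".")
                f.related_model = classes[(app_label, model_name.lower())]
    return classes


states = {
    ("shop", "customer"): {"id": ToyField("id", "integer")},
    ("shop", "order"): {
        "id": ToyField("id", "integer"),
        "customer": ToyField("customer", to="shop.Customer"),
    },
}
classes = render_all(states)
fk = classes[("shop", "order")].fields["customer"]
print(fk.related_model.__name__, fk.related_model.fields["id"].db_type)  # Customer integer
```

Building and wiring up classes like this for every related model, and doing it again whenever the state changes, is roughly the cost the project hopes to avoid.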

That’s one of the tasks of the project, and the approach is up to you to propose.

Hey everyone, after going through the codebase and @MarkusH’s patch, I got started on my proposal, but I’m a bit unclear about what it should consist of. I went through the proposal on the Django GSoC website and the proposal by Parth Patil last year. Since both of these proposals involved adding new features that a Django developer would use, rather than fixes in the framework itself, I am a bit confused about what exactly my proposal should entail. Should it contain what the functions would actually look like? Or should it be more about the proposed new design of the framework, i.e. ModelState and SchemaEditor?
Some advice would be highly appreciated. :smile:

Thanks!

@carltongibson @MarkusH could you kindly reply? :grinning:

Hi @aryan9600. Sorry for the lack of reply… this week has been somewhat interesting shall we say, as you can no doubt imagine. :microbe:

I think for this project, some low-level thoughts on how you would address the problem are appropriate: some discussion that demonstrates what the problem is, how it manifests, and how you’d be able to address it, or such. That kind of thing would hopefully give confidence that you’d be able to address the issue.

I hope that makes sense.

Kind Regards,

Carlton

Here is my initial draft. Feedback and criticism are highly appreciated.

GSoC Proposal.md

@carltongibson @MarkusH do you have any changes in mind that could improve this proposal? :smiley:

Thanks!

Hi @aryan9600. With the current covid-19 situation, I don’t have the capacity to review this now. Sorry about that. (I can’t speak for Markus but it’s disrupting life everywhere.)

Do submit when you are happy. Thanks for your effort!

I understand that, and I hope you and everyone else are safe. :heart: Sorry to bother you! I will try to improve my proposal wherever I can, and submit it. Thanks for guiding me! :smile: I hope I get to work with you people in the summer.

Best wishes!

Hi @aryan9600. I got a chance to quickly glance at the proposal.

I wasn’t able to dig into the exact details of the methods you mention but it looks good at a high level.

Good luck with the submission!

Carlton