Hello! I’m Peter, a third-year undergraduate student from New York studying Computer Science.
I found the migrations project from the ideas list interesting and I would love to give it a shot. From what I understand, we want to change the SchemaEditor to use model states rather than rendered models, ultimately speeding up migrations. It looks like Markus Holtermann did some great initial work on this and I would hate for it to go to waste!
I would appreciate any extra details, thoughts on how to approach this challenge, or advice on how best to propose this project.
Hi Peter. Welcome.
Super, yes! Migrations are interesting and hard. Taking this on would make you an expert in this area by the end of it. I’d suggest looking at the open migrations tickets and digging into a few. I think just hanging out in that code is a sensible way to begin.
From there, this project is probably one of the better defined ones, so the proposal will amount to your battle-plan.
I will ping @MarkusH so that he’s seen your post. He knows the most here.
Sorry to bother you, but where is the “ideas list” for GSoC 2020?
Every organization participating in GSoC has an ideas list, or something very similar.
For Django, it’s here: https://code.djangoproject.com/wiki/SummerOfCode2020
Thanks @SanketDG. 100% correct.
Great that you’re interested in this. Where would you like me to start elaborating on the idea?
Some things that will probably be necessary are these:
- To really know about all dependencies between all models, I’d have a central registry, probably on django.apps.registry.Apps. You need some kind of structure that allows quick look-ups of the form “which models point to a given model?”. More precisely, you probably need that on a field level: what are all the fields (from which you can derive the corresponding models) that point to a given field or a given model? You need this to be able to look up which related models need to be touched when, e.g., another model is deleted.
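To make the registry idea a bit more concrete, here is a minimal sketch of a field-level reverse index in plain Python, not written against Django’s actual Apps class. The class name RelationIndex, its methods and the (app_label, model_name) key format are hypothetical, and the sketch assumes a target’s primary key is named "id" whenever to_field is omitted.

```python
from collections import defaultdict

# Hypothetical key types: models are identified the way migration states
# identify them, as (app_label, model_name) with a lowercase model name.
ModelKey = tuple[str, str]
FieldRef = tuple[str, str, str]  # (app_label, model_name, field_name)


class RelationIndex:
    """Reverse index: which fields point at a given model or field?"""

    def __init__(self) -> None:
        self._by_model: dict[ModelKey, set[FieldRef]] = defaultdict(set)
        self._by_field: dict[tuple[ModelKey, str], set[FieldRef]] = defaultdict(set)

    def add_relation(self, source: FieldRef, target: ModelKey, to_field: str = "id") -> None:
        # Assumption: an omitted to_field means the target's primary key,
        # which this sketch assumes is called "id".
        self._by_model[target].add(source)
        self._by_field[(target, to_field)].add(source)

    def incoming_to_model(self, target: ModelKey) -> set[FieldRef]:
        # .get() avoids creating empty entries in the defaultdict on lookup.
        return set(self._by_model.get(target, ()))

    def incoming_to_field(self, target: ModelKey, field_name: str) -> set[FieldRef]:
        return set(self._by_field.get((target, field_name), ()))

    def models_referencing(self, target: ModelKey) -> set[ModelKey]:
        # Derive the related models from the referencing fields.
        return {(app, model) for app, model, _ in self.incoming_to_model(target)}
```

Deleting a model could then start with models_referencing(("auth", "user")) to find the tables that need altering, without scanning every state. The hard part, which the sketch leaves out, is keeping the index current as each migration operation adds, removes or alters fields.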
- You will also need a mapping that gives you information about the data types of related columns. Currently a ForeignKey retrieves its self.target_field, which depends on the model instance. This is necessary as the column_sql() depends on
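One possible shape for such a type mapping is sketched below under stated assumptions: FieldSpec and column_type are hypothetical names, states are plain dicts keyed by (app_label, model_name), references are fully qualified "app_label.ModelName" strings, and the type strings in any example are illustrative only. The cache dict plays the role of the mapping. The rel_db_type attribute echoes the idea behind Django’s Field.rel_db_type(), whereby a column pointing at an auto-incrementing key gets a plain integer type.

```python
from dataclasses import dataclass
from typing import Optional

ModelKey = tuple[str, str]  # (app_label, lowercase model_name)


@dataclass(frozen=True)
class FieldSpec:
    """Hypothetical, trimmed-down stand-in for a field on a model state."""

    db_type: Optional[str] = None      # column type of a plain field
    rel_db_type: Optional[str] = None  # type for columns pointing here, if different
    primary_key: bool = False
    to: Optional[str] = None           # lazy reference such as "auth.User"
    to_field: Optional[str] = None     # None means the target's primary key


def _model_key(reference: str) -> ModelKey:
    # Assumes fully qualified "app_label.ModelName" references.
    app_label, model_name = reference.split(".")
    return app_label, model_name.lower()


def column_type(
    states: dict[ModelKey, dict[str, FieldSpec]],
    model_key: ModelKey,
    field_name: str,
    cache: dict[tuple[ModelKey, str], Optional[str]],
) -> Optional[str]:
    """Resolve a field's column type from states alone, memoizing results."""
    key = (model_key, field_name)
    if key in cache:
        return cache[key]
    spec = states[model_key][field_name]
    if spec.to is None:
        result = spec.db_type
    else:
        target_key = _model_key(spec.to)
        target_fields = states[target_key]
        target_name = spec.to_field or next(
            name for name, f in target_fields.items() if f.primary_key
        )
        target = target_fields[target_name]
        if target.to is None:
            result = target.rel_db_type or target.db_type
        else:
            # The target is itself relational (e.g. a OneToOne primary key),
            # so follow the chain one more step.
            result = column_type(states, target_key, target_name, cache)
    cache[key] = result
    return result
```

In a real implementation the cached entries would also need invalidating whenever an AlterField changes the type of a field that others point at.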
I expect these one or two things to be the first major milestones. I didn’t take these steps in my proof of concept, and eventually that became one of the core problems I ran into.
The second major issue I was facing was the backwards-compatibility requirements on the SchemaEditor. Anything you do will have to keep working as-is, even with model classes, because the API is documented. You can deprecate parts, but their removal will only happen in the next major release series (simply put). I think making the states in django.db.migrations.state behave identically to model classes could help there, but I haven’t investigated any further what that would entail.
I hope that helps.
Hey everyone, I too would like to give this idea a shot. I have been going through DB and migration issues for the past 2 weeks and have also opened some PRs for them. I am looking for some advice: should I browse through more tickets, or start going through all the code related to this idea and @MarkusH’s patch? I would love some guidance from you all.
Hi @aryan9600. I think the advice in this thread applies equally. Familiarity with the migrations framework is the starting point. Then, yes, Markus’ more detailed points.
In the limit, I’m sure there’s space for more than one person to work on the migrations framework. The project idea was one concrete suggestion, but other ideas are welcome too.
Thank you for the response! I’ve been doing some digging this past week to digest your reply, and there are a few more areas I’d like you to explain:
- When/where should the quick lookup for models/db_types be populated? Perhaps as models are being registered?
- Could you elaborate on why the way we get the db_type of a model instance’s field is different, or impossible, for a model state’s field?
My semester midterms were going on last week, so I didn’t get much time to work on my proposal😬. I was hoping to get some advice.
- What exactly does it mean when it’s said that migrations are currently done by rendering fake models? Where exactly does that happen?
- How would one approach swapping out the existing logic with the proposed logic of using model states?
Absolutely. I’m happy to.
My first choice would probably be the AppRegistry, yes.
At this point, relational fields, such as ForeignKeys, have a reference to the instance of the field they refer to, available under remote_field. That’s usually the primary key field, but it can be any field, as defined through the to_field argument to, e.g., ForeignKey. That field, again, knows about the model it’s on. When you now want to know the data type for the ForeignKey column, Django will look at the related field and take its data type.
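The resolved-reference behaviour described above can be pictured with a simplified sketch. These Field and ForeignKey classes are toys, not Django’s; Django’s real ForeignKey.db_type(connection) delegates to target_field.rel_db_type(connection) in a similar way, but the methods here drop the connection argument and the example type strings are illustrative.

```python
from typing import Optional


class Field:
    """Simplified field that knows its own column type."""

    def __init__(self, db_type: str, rel_db_type: Optional[str] = None) -> None:
        self._db_type = db_type
        # Type used by columns that point at this field (e.g. an
        # auto-incrementing primary key is referenced by a plain integer).
        self._rel_db_type = rel_db_type or db_type

    def db_type(self) -> str:
        return self._db_type

    def rel_db_type(self) -> str:
        return self._rel_db_type


class ForeignKey:
    """Holds a *resolved* reference to the target field object."""

    def __init__(self, target_field: "Field | ForeignKey") -> None:
        self.target_field = target_field

    def db_type(self) -> str:
        # No type of its own: ask the related field, following any chain.
        return self.target_field.rel_db_type()

    def rel_db_type(self) -> str:
        return self.db_type()
```

This only works because target_field is an actual object; with a model state, all that is available is a string such as "auth.User".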
When you’re working with ModelStates, you don’t have any resolved references. A ForeignKey only refers to app_label.Model (and optionally the to_field). Now, in order to get all incoming ForeignKeys, one needs to iterate over all ModelStates and over each state’s fields to check whether they’re related. That’s very inefficient, but doable.
The idea of the change is to have an O(1) lookup (if possible) to get all incoming ForeignKeys.
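The difference between scanning states and having a lookup can be sketched as follows. The dict layout for states (each field name mapped to a lazy "app_label.ModelName" string, or None for a non-relational field) is an assumption made for illustration, not the real ModelState structure, and incoming_relations and build_index are hypothetical names.

```python
from collections import defaultdict
from typing import Optional

ModelKey = tuple[str, str]
FieldRef = tuple[str, str, str]
# Hypothetical state shape: {model_key: {field_name: lazy "app.Model" ref or None}}
States = dict[ModelKey, dict[str, Optional[str]]]


def _model_key(reference: str) -> ModelKey:
    app_label, model_name = reference.split(".")
    return app_label, model_name.lower()


def incoming_relations(states: States, target: ModelKey) -> list[FieldRef]:
    """Naive lookup: visits every field of every state on each call."""
    found = []
    for (app_label, model_name), fields in states.items():
        for field_name, reference in fields.items():
            if reference is not None and _model_key(reference) == target:
                found.append((app_label, model_name, field_name))
    return found


def build_index(states: States) -> dict[ModelKey, list[FieldRef]]:
    """Same information collected in one pass; later lookups are average O(1)."""
    index: dict[ModelKey, list[FieldRef]] = defaultdict(list)
    for (app_label, model_name), fields in states.items():
        for field_name, reference in fields.items():
            if reference is not None:
                index[_model_key(reference)].append((app_label, model_name, field_name))
    return dict(index)
```

Building the index costs as much as one full scan, so it only pays off if it is kept up to date incrementally as states change, rather than rebuilt for every lookup.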
Django’s migrations use ModelStates that, for the most part, look like a Model, but actually aren’t. They have a significantly reduced API to make them more efficient. The downside of that is that some things, like resolving relational fields as pointed out above, aren’t possible. But since these relations are necessary to construct correct SQL, Django takes these ModelStates and turns them into real models.
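As a rough analogy for what rendering does, the toy below turns state dicts into classes with type() and then replaces string references with the class objects themselves. It is not Django’s ModelState.render(); the render name and the state layout are hypothetical, and real rendering also builds fields, managers and an app registry, which is presumably part of why it is expensive.

```python
from typing import Optional

ModelKey = tuple[str, str]


def render(states: dict[ModelKey, dict[str, Optional[str]]]) -> dict[ModelKey, type]:
    """Toy 'rendering': build one class per state, then resolve lazy references."""
    classes: dict[ModelKey, type] = {}
    # Pass 1: create every class first, so any reference can be resolved later.
    for app_label, model_name in states:
        classes[(app_label, model_name)] = type(
            model_name.title(), (), {"app_label": app_label, "relations": {}}
        )
    # Pass 2: swap "app_label.ModelName" strings for the class objects.
    for key, fields in states.items():
        for field_name, reference in fields.items():
            if reference is not None:
                ref_app, ref_model = reference.split(".")
                classes[key].relations[field_name] = classes[(ref_app, ref_model.lower())]
    return classes
```

After rendering, a relation can be followed by plain attribute access, which is what the model-based code paths rely on.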
That’s one of the tasks of the project, and it’s up to you to propose an approach.
Hey everyone, after going through the codebase and @MarkusH’s patch, I got started on my proposal, but I feel a bit unclear about what it should consist of. I went through the proposal on the Django GSoC website and the proposal by Parth Patil from last year. Since both of these proposals involved adding new features that a Django developer would use, rather than fixes in the framework itself, I am a bit confused about what exactly my proposal should entail. Should it describe what the functions would actually look like? Or should it be more like the new proposed of the framework, i.e.
Some advice would be highly appreciated.
@carltongibson @MarkusH could you kindly reply?
Hi @aryan9600. Sorry for the lack of reply… this week has been somewhat interesting, shall we say, as you can no doubt imagine.
I think for this project, some low-level thoughts on how you would address the problem are appropriate: some discussion that demonstrates what the problem is, how it manifests and how you’d be able to address it, or such. That kind of thing would hopefully give confidence that you’d be able to address the issue.
I hope that makes sense.
Here is my initial draft. Feedback and criticism are highly appreciated.
@carltongibson @MarkusH do you have any changes in mind that could improve this proposal?
Hi @aryan9600. With the current covid-19 situation, I don’t have the capacity to review this now. Sorry about that. (I can’t speak for Markus but it’s disrupting life everywhere.)
Do submit when you are happy. Thanks for your effort!
I understand that, and I hope you and everyone else are safe. Sorry to bother you! I will try to improve my proposal wherever I can, and submit it. Thanks for guiding me! Hope I get to work with you people in the summer.
Hi @aryan9600. I got a chance to quickly glance at the proposal.
I wasn’t able to dig into the exact details of the methods you mention but it looks good at a high level.
Good luck with the submission!