The order in which Django migrations are run is determined when running migrate as any of a number of legal orderings of a DAG. I understand why that this is a useful feature… however: it does generate a source of non-determinism that may come back to bite you when you least expect it (e.g. when rolling out to production).
In many cases I would be fine to specify a migration to have exactly the dependencies that are observable as the exiting per-app last migrations at the moment of creating the migration. It would be nice if there was an option to makemigrations that just dumped all existing such info into the dependencies field.
Can you explain your experience with non-determinism specifically? (otherwise we may be playing guessing games as to what problems you’re experiencing - eg I’m assuming you may be referring to custom migrations but “In the face of ambiguity, refuse the temptation to guess.” and all that )
I’ve experienced this many times but I didn’t take notes
Yes, this is most likely when doing RunPython.
It could be any dependency outside the application that you’ve written the migration for. Such dependencies need not immediately be obvious, they could simply be the consequence of resolving an attribute (which happens to be a ForeignKey pointing out of your app).
I suppose the reason this may lead to “surprising results” is that a developer will naturally write and run the migrations in some logical order without thinking about it… because even when they don’t explicitly think about dependencies, they’ll act on attributes in the order that they are available on their local system. However, because dependencies outside of your app are not automatically added to a migration when creating an empty migration, the environment in which the migrations are run next does not necessarily run the migrations in the same order as the original developer.
One more way of saying the same thing is: in scenarios where migrations are actually developed (and as part of the development process at least informally tested) in a linear way there is no reason to un-linearize this process (as Django does automatically) and such un-linearization can be the source of surprising errors. The suggested solution, of simply dumping all latest-versions into the dependencies tuple, is a solution for guaranteeing the same order of execution of the migrations across target environments (other developers or when deploying)
Whilst completely accurate, this solution will not scale well. Large projects regularly have 50+ installed apps, and most of them are not dependencies for a given migration. The largest I heard of had 500 apps: https://www.youtube.com/watch?v=NsHo-kThlqI .
Also putting a lot of dependencies in place could prevent squashmigrations from working, as each cross-app dependency prevents a squash over it.
I think a better solution would be to have some kind of defence for RunPython operations, that the function only uses apps, models, or fields referred to in the migration’s dependencies. This could be enforced by Django or maybe a linter.
By the way, my package django-linear- might help. It enforces linear order on a per-app basis, at least, which constrains the possible migration orders somewhat.
Run python complaining loudly when operating on models that are part of apps that are not explicitly defined as dependencies seems like an excellent suggestion indeed.
If you want to look into implementing this, it would need to at least go through a deprecation period. I think it would also help to provide an escape hatch for old migrations, so projects don’t have to arduously determine which migrations they depend on.
Also, maybe there could also be a new makemigrations option to add dependencies to the latest migrations in named apps, like makemigrations --depend users sales. That would reduce the difficulty of adding the cross-app dependencies. Could be a bit of a wild idea though.
I think a better solution would be to have some kind of defence for RunPython operations, that the function only uses apps, models, or fields referred to in the migration’s dependencies. This could be enforced by Django or maybe a linter.
Yup I think that’s the problem that would need solving here.
Coincidentally we can almost do this with a simple change to RunPython.
With this simple test we can connect a model signal to raise an exception if the sender is not a “fake” model (fake models are what migrations works with).
This just uses the pre_init signal to prevent Foo.objects.create() situations. If we had a pre_query signal then we could also prevent querying.
from django.db import migrations
from django.db.models.signals import pre_init
class RunPythonAvoidRealModels(migrations.RunPython):
def database_forwards(self, *args, **kwargs):
def no(sender, **kwargs):
if sender.__module__ != "__fake__":
raise NotImplementedError("No")
pre_init.connect(no)
super().database_forwards(*args, **kwargs)
pre_init.disconnect(no)
I don’t think escaping from the context of fake models is the problem here? Or I may be misunderstanding. I’d say the problem (RunPython-time) to signal is touching any model that is not explicitly declared as a dependency of the migration… AFAIK these 2 things are not the same (but I may be wrong, not an expert on the internals of migrations)
The migration state only has models declared in previously in the migration graph.
If your migration in app_a has not declared a dependency on app_b then doing apps.get_model("app_b", "ModelB") raises an exception that the app does not exist - I did this same mistake only a couple of hours ago because I forgot a dependency.
This snippet above forces people to use get_model() instead of importing the real model.
Isn’t this a solution? Or are we on different pages?
Correction: Migrations may or may not raise an exception about unknown app/model depending on the order of migrations. In my case local dev was ok but CI failed, because locally I had already applied app_b.
Yes, that’s the non-determinism. Changing get_model() to prevent accessing non-depended-on models is a likely easy path forward, though it won’t help with depending on particular fields.
Personally I would vote to have this behaviour rectified & I think we should get opinions from other folks. I’d like to ping @charettes because Simon added this behaviour in this commit:
Simon: thoughts on whether RunPython should not see models for already-applied migrations if RunPython doesn’t have a dependency on said migration?
That would be backward incompatible and require bi-directionaly coupling operations with migrations as the executor would have to know to pass a specialized kind of Apps instance to RunPython operations because the schema editor uses apps to retrieve models that are not explicitly depended on (e.g. to re-create foreign keys for example).
It might appear trivial to prevent RunPython from doing direct lookups against app models that are not explicitly specified in the .dependencies but what about transitive dependencies through relations that might not be frozen in time? The problem is not only to have the model exist in a certain form in the graph but that it has the current definition.
Manipulating models from app foo in app bar is always going to be problematic as the whole migration framework revolves around the idea of apps.
Take for example the following models locations.Place and people.Person.
If you create a data migration in locations that depends on people.0042 at its creation time because it uses people.Person, lets call it locations.0033, and you enforce the RunPython logic you are suggesting it won’t prevent people.0041 from altering people.Person in a way that breaks locations.0033 (e.g. removing a first that is being queried against). In order to truly prevents drifts of this nature we would not only have to generate dependencies for the lower bound but also run_before with a syntax that allows to reference the future next migration of the all models involved directly and indirectly.
Cross-app migration dependencies require developer interventions and I think that special casing RunPython to prevent direct lookups would only provide a false sense of security in this regard.
I’m not strongly against a makemigrations --dependency flag given it does saves the developer from looking up what the latest migration is for each involved app but it might be hard / impossible to implement in cases where <app_label> is not provided. That’s because when app_label is not specified migrations for multiple apps can be generated that might need to be broken down in migrations referencing each other in very specific manners and injecting dependencies in there might break things apart.
As pointed out above in the people and locations example neither makemigrations <app_label> --dependency or making RunPython(apps.get_model) special case lookups againt missing entries in Migration.depdencies would solve the problem entirely. I also agree with @vanschelven that the problem of using non-fake models in migrations is a completely different one that the topic he initially brought up.
The thing is nothing guarantees that a particular shuffling around between foo and bar is going to be the last one between the two. I think it’s realistic to assume that if there is a need to create a data migration between the two it might be necessary to create others in the future.
In a certain sense this problem is very similar to dependency management in Python projects. Migration.dependencies can be seen as minimum package versions, .run_before maximum package versions, and related models transitive package dependencies.
In the case of the migration framework though there is no way to lock dependencies for a migration to particular representation of the project mainly due to the lack of support for an upper bound specifier (an entry in run_before) that can point to a currently non-existent migration entry.
For example, in the case of package dependencies one would specify Django>=4.2,<5 but AFAIK it’s not possible to do Migration.dependencies = [("foo", "0031")]; .run_before = [("foo", "0031__next__") if foo/migrations/0031__next__.py doesn’t exist yet. We’d need a specialized way of doing that as the 0031 + 1 naming is very much a convention and there could be multiple foo migrations with the 0032 prefix (in non-linear migrations).
In this sense --dependency in it’s currently proposed form would only make sure to add minimum version entries in the dependency graph (e.g. dependencies = [("foo", "0031")] or Django>=4.2) but over time nothing guarantees that the piece of logic won’t be run against a future incompatible migrations that satisfies this minimum version dependency.
From my personal experience the lower bound problem (missing entries in dependencies) is as much of a problem as the upper bound problem (missing entries in run_before) so I’d much rather see a --locked-dependency <app_label> option (better name welcome) that takes care of both even if that pre-emptively requires adding support for run_before to reference next but possibly non-existent dependencies.
people.0041 → people.0043, as ti would be a later migration that changes the model, right?
“removing a first” → “removing a field”
I totally agree here. I had the perspective that warning against un-depended cross-app model access would at least guard against thoughtless use of models. It could indeed provide a false sense of security though.
I think it would be fine to limit to when <app_label> is provided.
Why would this be “pre-emptively’?
Also, for syntax, one idea would be to use a special wrapper object like:
from django.db.migrations import Next
class Migration:
...
run_before = [("people", Next("0031"))]
When loading the graph, that could be replaced with concrete references, or nothing at all, depending on whether next migrations exist. (There can be multiple in the case of forked history.)
It could be done in the same changeset. I used pre-emptively in the sense that the migration framework today doesn’t allow references to a migration that might not exist yet in run_before so we’d need to add a feature for that before adding support for the --dependency feature.
Also, for syntax, one idea would be to use a special wrapper object like:
I like the idea of a sentinel object to denote that over trying to be clever with the migration name. I think we should have --dependency include the full name of the runner migraton included in dependencies though as there might also be multiple migrations matching 0031 so what comes next could be ambiguous.
To avoid the effect of xkcd: Wisdom of the Ancients
To scratch my own itch, I stuck with my original suggestion of “specify[ing] a migration to have exactly the dependencies that are observable as the exiting per-app last migrations at the moment of creating the migration. It would be nice if there was an option to makemigrations that just dumped all existing such info into the dependencies field.”
I did this by clobbering the Django in my virtualenv like so (based on Django 4.2 code):
# If they want to make an empty migration, make one for each app
if self.empty:
if not app_labels:
raise CommandError(
"You must supply at least one app label when using --empty."
)
# Make a fake changes() result we can pass to arrange_for_graph
changes = {app: [Migration("custom", app)] for app in app_labels}
changes = autodetector.arrange_for_graph(
changes=changes,
graph=loader.graph,
migration_name=self.migration_name,
)
dependencies = [(an, mn) for (an, mn) in loader.graph.leaf_nodes() if an in [
# hardcoded list of apps
]]
for generated_migrations in changes.values():
for migration in generated_migrations:
migration.dependencies = dependencies
self.write_migration_files(changes)
return
For the record, this will work at migration creation time until it breaks again when you alter one of the x-app models used in the RunPython operation @vanschelven as dependencies only provides a lower bound for requirements.
It will also be nightmare to squash when the time comes to do so as pointed out by Adam so while it might temporarily address problems of visitors from the future in case this is never added to core (we still need someone to champion this) it might make their future interactions with the migration framework miserable.