Manipulating data during migration (RunPython)

Hi, I’m working on a PR for pythondotorg. While my solution works, I’m not sure if I’m following best practices. I couldn’t find many resources on this topic, so I would appreciate any advice.

Original model:

class Membership(models.Model):
    BASIC = 0
    SUPPORTING = 1
    SPONSOR = 2
    MANAGING = 3
    CONTRIBUTING = 4
    FELLOW = 5

    MEMBERSHIP_CHOICES = (
        (BASIC, 'Basic Member'),
        (SUPPORTING, 'Supporting Member'),
        (SPONSOR, 'Sponsor Member'),
        (MANAGING, 'Managing Member'),
        (CONTRIBUTING, 'Contributing Member'),
        (FELLOW, 'Fellow'),
    )

    membership_type = models.IntegerField(default=BASIC, choices=MEMBERSHIP_CHOICES)
...

New model:

class Membership(models.Model):
    BASIC = 0
    SUPPORTING = 1
    CONTRIBUTING = 2
    FELLOW = 3

    MEMBERSHIP_CHOICES = (
        (BASIC, 'Basic Member'),
        (SUPPORTING, 'Supporting Member'),
        (CONTRIBUTING, 'Contributing Member'),
        (FELLOW, 'Fellow'),
    )

    membership_type = models.IntegerField(default=BASIC, choices=MEMBERSHIP_CHOICES)
...

I’m trying to automate updating the records whose membership level is scheduled for deletion:

# Generated by Django 4.2.17 on 2025-01-12 04:08

from django.db import migrations, models


def update_membership_levels(apps, schema_editor):
    SPONSOR = 2
    MANAGING = 3
    CONTRIBUTING = 4

    Membership = apps.get_model('users', 'Membership')
    Membership.objects.filter(
        membership_type__in=[SPONSOR, MANAGING]
    ).update(membership_type=CONTRIBUTING)


class Migration(migrations.Migration):

    dependencies = [
        ('users', '0015_alter_user_first_name'),
    ]

    operations = [
        migrations.RunPython(update_membership_levels),
        migrations.AlterField(
            model_name='membership',
            name='membership_type',
            field=models.IntegerField(choices=[(0, 'Basic Member'), (1, 'Supporting Member'), (2, 'Contributing Member'), (3, 'Fellow')], default=0),
        ),
    ]

My understanding is that this approach uses the legacy model; therefore, membership_type value 4 is still available. Any records that match SPONSOR or MANAGING are updated to CONTRIBUTING. Then, when AlterField is applied, CONTRIBUTING (4) is changed to CONTRIBUTING (2). Is this correct?

After a quick search on GitHub, I wasn’t able to find any instances of this approach being used, so I wonder if it’s uncommon or considered a non-standard practice. Any advice or clarification on the underlying behavior would be greatly appreciated.

The choices on the field are only about validation. AlterField doesn’t change the data in the rows, so you’re right that you’ll need a data migration. “CONTRIBUTING” doesn’t exist in the database, only “4”, so your migration has to work with the stored values rather than their human-readable names. That means the entries you need to update are slightly different from what’s in your data migration.

Therefore, you need to migrate:

  • 0 → 0
  • 1 → 1
  • 2 → 2
  • 3 → 2
  • 4 → 2
  • 5 → 3

The first 3 are no-ops, so only 3 values actually change, and they fall into 2 cases: 3 and 4 become 2, and 5 becomes 3. Since I suspect this table is large on python.org, this might be best done as a single UPDATE with a CASE expression, rather than 3 individual queries.
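As a rough illustration of the single-statement idea, here is a minimal sketch that runs the remapping as one UPDATE with a CASE expression against an in-memory SQLite table. The table name `users_membership` is an assumption (Django's default naming for app `users`, model `Membership`), and the sample rows are made up. Inside a real migration the same statement could presumably be issued through `migrations.RunSQL`, or expressed with the ORM's `Case`/`When` expressions inside `RunPython`.

```python
import sqlite3

# Old stored value -> new stored value, only for rows that actually change.
OLD_TO_NEW = {3: 2, 4: 2, 5: 3}

conn = sqlite3.connect(":memory:")
# Assumed table name: Django's default for app "users", model "Membership".
conn.execute(
    "CREATE TABLE users_membership ("
    "id INTEGER PRIMARY KEY, membership_type INTEGER NOT NULL)"
)
# One sample row per old value 0..5.
conn.executemany(
    "INSERT INTO users_membership (membership_type) VALUES (?)",
    [(value,) for value in range(6)],
)

# Every branch is evaluated against the row's original value, so a row
# moved from 5 to 3 is not then caught by the 3 -> 2 branch.
branches = " ".join("WHEN ? THEN ?" for _ in OLD_TO_NEW)
in_list = ", ".join("?" for _ in OLD_TO_NEW)
sql = (
    "UPDATE users_membership "
    f"SET membership_type = CASE membership_type {branches} END "
    f"WHERE membership_type IN ({in_list})"
)
# Parameters: (old, new) pairs for the CASE, then the old values for IN.
params = [v for pair in OLD_TO_NEW.items() for v in pair] + list(OLD_TO_NEW)
conn.execute(sql, params)

remapped = [
    row[0]
    for row in conn.execute(
        "SELECT membership_type FROM users_membership ORDER BY id"
    )
]
print(remapped)
```

Besides touching the table only once, a single CASE sidesteps an ordering trap: with separate queries, running 5 → 3 before 3 → 2 would turn every Fellow into a Contributing member.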

AlterField won’t change any of the row values, only the constraints around them, which means you need to make sure the row values are correct before changing the constraints.

Personally, I tend to handle these types of fields as CharField. It’s larger, but it means the migrations are much simpler when adding or removing entries (since you don’t need to shuffle values that don’t change).

You’ll need to be careful about what happens to the system when this migration is part way through, but that’s one more for the team reviewing your PR.

Another point to consider is: Do the membership type values have to be consecutive, or can you leave “holes” in there? E.g., why do you want to change the value for FELLOW from 5 to 3? Can’t you keep it at 5? If not, why not? Because then you’d only have to reassign the membership types which have been removed.

That being said, following @theorangeone’s advice and making this a CharField while at it would probably be clearer; the database values would then make sense even without having the code around, and there’s always value in that. The exception might be if you need to order by membership type or something, but you could still use something like 10_basic, 20_supporting etc., where the values sort correctly as strings while still being clearer than an integer.
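To make the prefixed-string idea concrete, here is a small sketch in plain Python. Only `10_basic` and `20_supporting` come from the post above; `30_contributing` and `40_fellow` are hypothetical names invented to complete the pattern, and the old-to-new mapping is just one way the integers could be translated.

```python
# Hypothetical string values: the "10_basic"/"20_supporting" pattern comes
# from the post above; the last two names are made up to complete it.
OLD_INT_TO_NEW_STR = {
    0: "10_basic",
    1: "20_supporting",
    2: "30_contributing",  # old SPONSOR
    3: "30_contributing",  # old MANAGING
    4: "30_contributing",  # old CONTRIBUTING
    5: "40_fellow",
}

# Translate a few old integer values, then sort the resulting strings.
stored = [OLD_INT_TO_NEW_STR[old] for old in (5, 0, 4, 1, 2)]
ordered = sorted(stored)
print(ordered)
```

One caveat with this scheme: string ordering only matches the intended ranking while the numeric prefixes have the same width, so a later `100_...` value would sort before `20_supporting`.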

Thanks! This ended up being an interesting problem to think through since I haven’t really dived too deep into migrations yet. So I appreciate the added context and advice, @matthiask and @theorangeone!

So maybe the following could work:

  • Add a data migration with the associated SQL CASE. Maintain Django’s transactional state. Implement reverse logic? Might be tough unless I were to save the legacy values.
  • Add a migration specifically for the schema update.
  • Add a migration unit test.
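The difficulty with reverse logic in the first bullet can be shown with the mapping itself. The sketch below inverts the forward mapping from earlier in the thread and lists which new values have more than one possible old value; the variable names are illustrative only.

```python
from collections import defaultdict

# Forward mapping discussed earlier in the thread: old value -> new value.
FORWARD = {0: 0, 1: 1, 2: 2, 3: 2, 4: 2, 5: 3}

# Group the old values by the new value they end up as.
preimages = defaultdict(list)
for old, new in FORWARD.items():
    preimages[new].append(old)

# A new value with several possible old values cannot be reversed from the
# new data alone; the legacy values would have to be saved somewhere.
ambiguous = {new: olds for new, olds in preimages.items() if len(olds) > 1}
print(ambiguous)
```

Since new value 2 could have been 2, 3 or 4, a faithful reverse needs the saved legacy values. If reversal only has to run without restoring the data, `migrations.RunPython` accepts a `reverse_code` argument, and `migrations.RunPython.noop` can be passed there.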

It’ll probably be a while before they can give me feedback, and I’m not sure whether they prefer IntegerField here. For now, I’ll stick with Int but I’ll mention it and update it later if they’d rather switch.

I don’t generally support reverse logic in my migrations. I do it if it’s easy and/or if there’s a chance I’m going to need it.

You could (and maybe should anyway?) keep a backup of the data somewhere. This data could also be used to view the old values if you need them.
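One possible shape for such a backup, sketched with the standard library only: dump each row's id and old `membership_type` to CSV before the data migration runs. The table name `users_membership`, the helper `backup_membership_types`, and the sample rows are all assumptions for illustration; in a Django project the `dumpdata` management command is probably the more usual route.

```python
import csv
import io
import sqlite3


def backup_membership_types(conn, out):
    """Write (id, membership_type) for every row to `out` as CSV.

    Returns the number of data rows written. Hypothetical helper; the
    table name is an assumption based on Django's default naming.
    """
    writer = csv.writer(out)
    writer.writerow(["id", "membership_type"])
    count = 0
    query = "SELECT id, membership_type FROM users_membership ORDER BY id"
    for row in conn.execute(query):
        writer.writerow(row)
        count += 1
    return count


# Made-up sample data standing in for the real table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users_membership ("
    "id INTEGER PRIMARY KEY, membership_type INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO users_membership (membership_type) VALUES (?)",
    [(2,), (3,), (5,)],
)

buffer = io.StringIO()  # stand-in for a real file opened with newline=""
saved = backup_membership_types(conn, buffer)
print(saved)
```

A file like this would also be enough to drive a reverse migration later, since it records the legacy value for each primary key.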
