Manipulating data during migration (RunPython)

Hi, I’m working on a PR for pythondotorg. While my solution works, I’m not sure if I’m following best practices. I couldn’t find many resources on this topic, so I would appreciate any advice.

Original model:

class Membership(models.Model):
    BASIC = 0
    SUPPORTING = 1
    SPONSOR = 2
    MANAGING = 3
    CONTRIBUTING = 4
    FELLOW = 5

    MEMBERSHIP_CHOICES = (
        (BASIC, 'Basic Member'),
        (SUPPORTING, 'Supporting Member'),
        (SPONSOR, 'Sponsor Member'),
        (MANAGING, 'Managing Member'),
        (CONTRIBUTING, 'Contributing Member'),
        (FELLOW, 'Fellow'),
    )

    membership_type = models.IntegerField(default=BASIC, choices=MEMBERSHIP_CHOICES)
...

New model:

class Membership(models.Model):
    BASIC = 0
    SUPPORTING = 1
    CONTRIBUTING = 2
    FELLOW = 3

    MEMBERSHIP_CHOICES = (
        (BASIC, 'Basic Member'),
        (SUPPORTING, 'Supporting Member'),
        (CONTRIBUTING, 'Contributing Member'),
        (FELLOW, 'Fellow'),
    )

    membership_type = models.IntegerField(default=BASIC, choices=MEMBERSHIP_CHOICES)
...

I’m trying to automate updating the records whose membership level is scheduled for deletion:

# Generated by Django 4.2.17 on 2025-01-12 04:08

from django.db import migrations, models


def update_membership_levels(apps, schema_editor):
    SPONSOR = 2
    MANAGING = 3
    CONTRIBUTING = 4

    Membership = apps.get_model('users', 'Membership')
    Membership.objects.filter(
        membership_type__in=[SPONSOR, MANAGING]
    ).update(membership_type=CONTRIBUTING)


class Migration(migrations.Migration):

    dependencies = [
        ('users', '0015_alter_user_first_name'),
    ]

    operations = [
        migrations.RunPython(update_membership_levels),
        migrations.AlterField(
            model_name='membership',
            name='membership_type',
            field=models.IntegerField(choices=[(0, 'Basic Member'), (1, 'Supporting Member'), (2, 'Contributing Member'), (3, 'Fellow')], default=0),
        ),
    ]

My understanding is that this approach uses the legacy model; therefore, membership_type 4 is still available. Any records that match SPONSOR or MANAGING are updated to CONTRIBUTING. Then, when AlterField is applied, CONTRIBUTING (4) is changed to CONTRIBUTING (2). Is this correct?

After a quick search on GitHub, I wasn’t able to find any instances of this approach being used, so I wonder if it’s uncommon or considered a non-standard practice. Any advice or clarification on the underlying behavior would be greatly appreciated.

The choices on the field are only used for validation. AlterField doesn’t change the data in the rows, so you’re right that you’ll need a data migration. “CONTRIBUTING” doesn’t exist in the database, only “4” does, so your migration has to work in terms of the stored values rather than their human-readable names. That means the entries you need to update are slightly different from what’s in your data migration.

Therefore, you need to migrate:

  • 0 → 0
  • 1 → 1
  • 2 → 2
  • 3 → 2
  • 4 → 2
  • 5 → 3

The first 3 are no-ops, meaning there are only 3 cases you need to be careful of. Since I suspect this table is large on python.org, this might be best updated using a single CASE, rather than 3 individual queries.
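To make the CASE idea concrete, here is a minimal sketch using only the standard library's sqlite3 module, so it runs without a Django project. The table name `membership` and the helper `remap_membership_types` are placeholders I made up for illustration; they are not taken from pythondotorg. In a real Django migration you would more likely build the same statement with `Case` and `When` from `django.db.models` inside the `RunPython` function.

```python
import sqlite3

# Old stored value -> new stored value, from the mapping listed above.
# 0, 1 and 2 keep their value, so they are left out.
REMAP = {3: 2, 4: 2, 5: 3}


def remap_membership_types(conn):
    """Rewrite every affected row with one UPDATE ... CASE statement."""
    whens = " ".join("WHEN ? THEN ?" for _ in REMAP)
    placeholders = ", ".join("?" for _ in REMAP)
    sql = (
        "UPDATE membership "
        f"SET membership_type = CASE membership_type {whens} END "
        f"WHERE membership_type IN ({placeholders})"
    )
    # Parameters: (old, new) pairs for the CASE, then the old values for IN.
    case_params = [value for pair in REMAP.items() for value in pair]
    conn.execute(sql, case_params + list(REMAP))


# Hypothetical stand-in table with one row per old membership type.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE membership (id INTEGER PRIMARY KEY, membership_type INTEGER)"
)
conn.executemany(
    "INSERT INTO membership (id, membership_type) VALUES (?, ?)",
    [(old + 1, old) for old in range(6)],
)

remap_membership_types(conn)
rows = conn.execute(
    "SELECT membership_type FROM membership ORDER BY id"
).fetchall()
print([row[0] for row in rows])  # [0, 1, 2, 2, 2, 3]
```

One reason to prefer the single statement: every row is matched against the value it had before the update. With separate queries the order matters, because running 5 → 3 before 3 → 2 would also move the former FELLOW rows to 2.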

AlterField won’t change any of the rows’ values, just the constraints around them, which means you need to make sure the stored values are correct before changing the constraints.

Personally, I tend to handle these types of fields as CharField. It’s larger, but it means the migrations are much simpler when adding or removing entries (since you don’t need to shuffle values that don’t change).

You’ll need to be careful about what happens to the system when this migration is part way through, but that’s one more for the team reviewing your PR.

Another point to consider: do the membership type values have to be consecutive, or can you leave “holes” in there? E.g., why do you want to change the value for FELLOW from 5 to 3? Can’t you keep it at 5? If not, why not? If you can keep it, you’d only have to reassign the membership types which have been removed.
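As a rough comparison of the two options, the sketch below writes both remappings as plain dictionaries and lists which stored values would actually have to be rewritten. The names `RENUMBER`, `KEEP_HOLES` and `changed` are made up for this illustration.

```python
# Option A: renumber so the remaining values are consecutive (0..3).
RENUMBER = {0: 0, 1: 1, 2: 2, 3: 2, 4: 2, 5: 3}

# Option B: leave holes, so CONTRIBUTING stays 4 and FELLOW stays 5.
KEEP_HOLES = {0: 0, 1: 1, 2: 4, 3: 4, 4: 4, 5: 5}


def changed(mapping):
    """Return only the old -> new pairs where the stored value changes."""
    return {old: new for old, new in mapping.items() if old != new}


print(changed(RENUMBER))    # {3: 2, 4: 2, 5: 3}
print(changed(KEEP_HOLES))  # {2: 4, 3: 4}
```

With option B only the removed types (SPONSOR and MANAGING) are touched, which is what the data migration in the question already does, as long as the model keeps CONTRIBUTING = 4 and FELLOW = 5.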

That being said, following @theorangeone’s advice and making this a CharField while you’re at it would probably be clearer; the database values would then make sense even without having the code around, and there’s always value in that. The exception might be if you need to order by membership type or something, but you could still use something like 10_basic, 20_supporting etc., where the values sort correctly as strings while still being clearer than an integer.
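A quick sketch of how the prefixed values sort, in plain Python. Only 10_basic and 20_supporting come from the suggestion above; the other values are hypothetical continuations of the same pattern.

```python
# "30_contributing" and "40_fellow" are hypothetical continuations of the
# "10_basic, 20_supporting etc." pattern.
MEMBERSHIP_VALUES = ["10_basic", "20_supporting", "30_contributing", "40_fellow"]

shuffled = ["40_fellow", "10_basic", "30_contributing", "20_supporting"]
in_order = sorted(shuffled)
print(in_order == MEMBERSHIP_VALUES)  # True

# Caveat: string comparison goes character by character, so this only
# matches the numeric order while all prefixes have the same width.
# "100_patron" is a made-up value to show the failure mode.
mixed_width = sorted(["20_supporting", "100_patron"])
print(mixed_width)  # ['100_patron', '20_supporting']
```

So if you go this way, it is probably worth leaving gaps between the prefixes (10, 20, 30) and keeping them all the same number of digits.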