Migrate to a different schema without loss of data

I am in a situation where I need to make some changes to my models, and I’d like to find the best way to apply them without losing any data in a database that is already populated.

This is my current schema:

class Main(models.Model):
    oto_a = models.OneToOneField("A", on_delete=models.CASCADE)
    oto_b = models.OneToOneField("B", on_delete=models.CASCADE)

class A(models.Model):
    field_1 = ...
    field_2 = ...

class B(models.Model):
    field_3 = ...
    field_4 = ...

class A_child(models.Model):
    father = models.ForeignKey(A, on_delete=models.CASCADE)
    field_c1 = ...
    field_c2 = ...

class B_child(models.Model):
    father = models.ForeignKey(B, on_delete=models.CASCADE)
    field_c3 = ...
    field_c4 = ...

These models form a sort of tree: for every instance of Main, there’s exactly one A and one B. Moreover, for each A_child there’s a matching B_child: they are like siblings, with a de facto one-to-one relationship between them as well.

What I’d like to do now is migrate to the following schema:

class Main(models.Model):
    field_1 = ...
    field_2 = ...
    field_3 = ...
    field_4 = ...

class MainChild(models.Model):
    father = models.ForeignKey(Main, on_delete=models.CASCADE)
    field_c1 = ... 
    field_c2 = ...
    field_c3 = ...
    field_c4 = ...

In other words, I want to collapse the one-to-one relationships into a single model, remove the level of indirection introduced by the models A and B, and have the “merged” child class reference the model Main directly.

I want to accomplish this without losing any data from the existing models. In other words, for every Main instance I want to copy the data over from its related A and B; then, for each of those two objects, walk their children, create a MainChild for each sibling pair, and copy the values of the children’s fields into it.

Is there a best way to do this?

I have an idea that I’m not 100% confident of, which is the following:

  • I add all the new fields I need to Main, but don’t remove the one-to-one fields yet. Then I create the MainChild model.
  • I iterate over all the instances of Main, access their A and B, and copy their values into the newly created fields on Main. Then I iterate through their children, create a MainChild for each sibling pair, and copy the combined fields of the two children into it.
  • I delete the one-to-one fields from Main.
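The copy step in the middle bullet could be sketched roughly as follows. This is only a sketch under two assumptions: the field names are the ones shown above, and the sibling A_child/B_child pairs can be matched by ordering (replace `order_by("pk")` with whatever actually links the siblings). The model class is passed in as a parameter so the same function can also be used from a data migration via `apps.get_model()`:

```python
def merge_into_main(mains, main_child_model):
    """Copy the A/B fields onto each Main, then flatten the children.

    `mains` is an iterable of Main instances that still carry the old
    oto_a / oto_b one-to-one fields; `main_child_model` is the new
    MainChild model.
    """
    for main in mains:
        a, b = main.oto_a, main.oto_b

        # Copy the scalar fields from A and B onto Main itself.
        main.field_1 = a.field_1
        main.field_2 = a.field_2
        main.field_3 = b.field_3
        main.field_4 = b.field_4
        main.save()

        # Pair up the children. This assumes the de facto one-to-one
        # between A_child and B_child can be recovered by ordering;
        # use the real pairing key here if there is one.
        a_children = a.a_child_set.order_by("pk")
        b_children = b.b_child_set.order_by("pk")
        for a_child, b_child in zip(a_children, b_children):
            main_child_model.objects.create(
                father=main,
                field_c1=a_child.field_c1,
                field_c2=a_child.field_c2,
                field_c3=b_child.field_c3,
                field_c4=b_child.field_c4,
            )
```

Running this inside a `RunPython` data migration, rather than a one-off script, has the advantage that every environment (dev, staging, production) gets the copy applied exactly once, in order, as part of `manage.py migrate`.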

Do any better ways come to mind? Thank you.

Since I try to avoid any “non-reversible” change for something like this, I think you’re really close to what I would do.

My first inclination for how I would do this would be:

  • Backup the existing data
  • Verify the backup is complete and usable
  • Create all new models with the new structure
  • Build the data for the new models from the existing models
  • Verify the new models are correct
  • Migrate the rest of your code as necessary
  • Delete the original models at your leisure.
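The “verify the new models are correct” step can be a simple field-by-field comparison run over every Main while the old one-to-one fields are still in place, before anything is deleted. A minimal sketch, assuming the field names from the question:

```python
def find_mismatches(main):
    """Compare a migrated Main against its still-attached A and B.

    Returns a list of (field_name, old_value, new_value) tuples; an
    empty list means the copy was faithful for this instance. Run it
    over Main.objects.all() before dropping the old models.
    """
    copied_from = [
        (main.oto_a, ("field_1", "field_2")),
        (main.oto_b, ("field_3", "field_4")),
    ]
    problems = []
    for source, names in copied_from:
        for name in names:
            old, new = getattr(source, name), getattr(main, name)
            if old != new:
                problems.append((name, old, new))
    return problems
```

A similar check on the child side (counts of A_child/B_child pairs versus MainChild rows per Main) would cover the rest of the migration.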

That’s great news!

I’m having a problem with this step, to be honest.

I dumped all the data I needed with manage.py dumpdata, but I’m having problems loading it back.

The problem is occurring with a specific model. The model in question uses a hashid field as its primary key. When I load the dumped data back with manage.py loaddata, this error occurs:

django.core.serializers.base.DeserializationError: Problem installing fixture "path/to/fixture" ["'4z3wXvD' value must be a positive integer or a valid Hashids string."]: (courses.event:pk=4z3wXvD) field_value was 'None'

(courses.Event being the model in question)

I had no luck searching the package’s documentation and GitHub issues. I’m assuming the supplied value is in fact a valid Hashids string since, well, it’s being used as the model’s primary key and was generated by the package itself. Do you have any clue as to how this might be solved?

EDIT:
Well, I managed to fix it on my own faster than I was expecting. For anyone who bumps into this in the future: the issue was caused by my dev environment having a different HASHID_SALT setting than the production environment from which the data was dumped. Since the hashid is derived from the integer primary key and the salt, a fixture encoded under one salt can’t be decoded under another.
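To illustrate why the salt matters: the hashid isn’t stored anywhere, it’s recomputed from the integer primary key and the salt, so decoding only works under the salt used for encoding. The following is a toy illustration only (this is NOT the real Hashids algorithm, just an arbitrary salt-dependent base-36 encoding, which has the same failure mode):

```python
import hashlib

ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789"

def shuffled(salt):
    # Deterministically permute the alphabet based on the salt.
    return sorted(ALPHABET, key=lambda c: hashlib.sha256((salt + c).encode()).digest())

def toy_encode(pk, salt):
    # Express pk in base 36 using the salt-shuffled alphabet.
    alphabet = shuffled(salt)
    digits = []
    while True:
        digits.append(alphabet[pk % 36])
        pk //= 36
        if pk == 0:
            break
    return "".join(reversed(digits))

def toy_decode(token, salt):
    alphabet = shuffled(salt)
    pk = 0
    for ch in token:
        pk = pk * 36 + alphabet.index(ch)
    return pk

# Round-trips fine under the salt it was encoded with...
token = toy_encode(42, salt="prod-salt")
assert toy_decode(token, salt="prod-salt") == 42
# ...but under any other salt it decodes to a different number (or, in
# the real package's case, fails validation) -- which is exactly what
# loaddata ran into with a fixture dumped under the production salt.
```

Keeping the salt setting identical across every environment that exchanges fixtures (e.g. sourcing it from the same environment variable) avoids the problem entirely.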