Migration of a large amount of data from one app to another

Hi, everyone! I'm working on the following problem: I need to copy all the records from one model to another. There are about 2 million records with about 20 columns. Here's my code:

def forward(frm, to, model, apps, schema_editor):

    old_model = apps.get_model(frm, model)
    new_model = apps.get_model(to, model)
    list_fields = [i.get_attname() for i in old_model._meta.fields]
    qs = old_model.objects.all().order_by('created')
    new_model.objects.all().delete()
    i = 0
    j = 10000
    q = 0
    no = new_model()
    with transaction.atomic():
        while qs.last() != q:
            qs_part = qs[i:j]
            for item in qs_part:
                for field in list_fields:
                    val = getattr(item, field)
                    setattr(no, field, val)
                no.save()
            print("{} in {} forwarded".format(qs_part.count(), model))
            q = qs_part[qs_part.count()-1]
            i += j
            j += j
        print("{} is ok at {}".format(model, datetime.datetime.now()))

Every time I get this error message:

return self.cursor.execute(sql, params)
psycopg2.OperationalError: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.

There are a number of ways to speed this up.

First, the fastest way would be to use raw SQL with an INSERT INTO ... SELECT ... statement. You might still need to chunk it, but do so by PK - see below.
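A rough sketch of what that could look like inside the migration function - the table names "app1_mymodel" and "app2_mymodel" are placeholders, and it assumes both tables have identical columns in the same order and an integer primary key:

def forward(apps, schema_editor):
    with schema_editor.connection.cursor() as cursor:
        # Find the highest PK so we know how far to chunk.
        cursor.execute("SELECT COALESCE(MAX(id), 0) FROM app1_mymodel")
        max_pk = cursor.fetchone()[0]
        chunk_size = 10000
        for start in range(0, max_pk + 1, chunk_size):
            # Copy one PK range per statement to keep each query small.
            cursor.execute(
                """
                INSERT INTO app2_mymodel
                SELECT * FROM app1_mymodel
                WHERE id >= %s AND id < %s
                """,
                [start, start + chunk_size],
            )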

If you want to stay in ORM land, there are several things you can do. You shouldn't wrap the whole thing in a single atomic(); a large transaction consumes a lot of resources on the database server, so use one transaction per chunk instead. You can also avoid the [i:j] slice pattern, and instead use filter(pk__gte=n, pk__lt=n + chunk_size), iterating from 0 up to the highest PK value. I implemented a version of such iteration in Django-MySQL, though I haven't translated it to PostgreSQL yet. You can use bulk_create() to greatly speed up the inserts, and you can avoid the repeated qs_part.count() queries. You could also increase the server's query timeout, although it'd be best to get an estimate of how long the whole process will take before you try that. A rough sketch combining these ideas follows below.
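Here is a minimal sketch of the chunked ORM version, keeping your original function signature, and again assuming an auto-incrementing integer PK and that both models share the same field names:

from django.db import transaction


def forward(frm, to, model, apps, schema_editor):
    old_model = apps.get_model(frm, model)
    new_model = apps.get_model(to, model)
    field_names = [f.get_attname() for f in old_model._meta.fields]

    new_model.objects.all().delete()

    # Highest PK tells us how many ranges to walk.
    max_pk = old_model.objects.order_by('-pk').values_list('pk', flat=True).first() or 0
    chunk_size = 10000

    for start in range(0, max_pk + 1, chunk_size):
        chunk = old_model.objects.filter(pk__gte=start, pk__lt=start + chunk_size)
        new_objects = [
            new_model(**{name: getattr(obj, name) for name in field_names})
            for obj in chunk
        ]
        # One transaction and one bulk insert per chunk instead of per row.
        with transaction.atomic():
            new_model.objects.bulk_create(new_objects)
        print("{}: forwarded PKs in [{}, {})".format(model, start, start + chunk_size))

Per-chunk transactions keep the server's resource usage bounded, and bulk_create() turns thousands of single-row INSERTs into a handful of multi-row ones.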