Migration of a large amount of data from one app to another

Hi, everyone! I'm working on the following problem: I need to copy all the records from one model to another. There are about 2 million records with about 20 columns. Here's my code:

def forward(frm, to, model, apps, schema_editor):

    old_model = apps.get_model(frm, model)
    new_model = apps.get_model(to, model)
    list_fields = [i.get_attname() for i in old_model._meta.fields]
    qs = old_model.objects.all().order_by('created')
    new_model.objects.all().delete()
    i = 0
    j = 10000
    q = 0
    no = new_model()
    with transaction.atomic():
        while qs.last() != q:
            qs_part = qs[i:j]
            for item in qs_part:
                for field in list_fields:
                    val = getattr(item, field)
                    setattr(no, field, val)
                no.save()
            print("{} in {} forwarded".format(qs_part.count(), model))
            q = qs_part[qs_part.count()-1]
            i += j
            j += j
        print("{} is ok at {}".format(model, datetime.datetime.now()))

Every time I get this error message:

return self.cursor.execute(sql, params)
psycopg2.OperationalError: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.

There are a number of ways to speed this up.

First, the fastest way would be to use raw SQL with an INSERT INTO ... SELECT ... statement. You might still need to chunk it, but do so by PK - see below.
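A rough sketch of what that could look like inside the migration function - the table names "app1_mymodel" and "app2_mymodel" are placeholders, and it assumes both tables have identical columns in the same order and an integer primary key:

def forward(apps, schema_editor):
    with schema_editor.connection.cursor() as cursor:
        # Find the highest PK so we know how far to chunk.
        cursor.execute("SELECT COALESCE(MAX(id), 0) FROM app1_mymodel")
        max_pk = cursor.fetchone()[0]
        chunk_size = 10000
        for start in range(0, max_pk + 1, chunk_size):
            # Copy one PK range per statement to keep each query small.
            cursor.execute(
                """
                INSERT INTO app2_mymodel
                SELECT * FROM app1_mymodel
                WHERE id >= %s AND id < %s
                """,
                [start, start + chunk_size],
            )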

If you want to stay in ORM land, there are several things you can do. You shouldn't wrap the whole thing in a single atomic(); a large transaction consumes a lot of resources on the database server, so use one transaction per chunk instead. You can also avoid the [i:j] slice pattern, and instead use filter(pk__gte=n, pk__lt=n + chunk_size), iterating from 0 up to the highest PK value. I implemented a version of such iteration in Django-MySQL, though I haven't translated it to PostgreSQL yet. You can use bulk_create() to greatly speed up the inserts, and you can avoid the repeated qs_part.count() queries. You could also increase the server's query timeout, although it'd be best to get an estimate of how long the whole process will take before you try that. A rough sketch combining these ideas follows below.
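Here is a minimal sketch of the chunked ORM version, keeping your original function signature, and again assuming an auto-incrementing integer PK and that both models share the same field names:

from django.db import transaction


def forward(frm, to, model, apps, schema_editor):
    old_model = apps.get_model(frm, model)
    new_model = apps.get_model(to, model)
    field_names = [f.get_attname() for f in old_model._meta.fields]

    new_model.objects.all().delete()

    # Highest PK tells us how many ranges to walk.
    max_pk = old_model.objects.order_by('-pk').values_list('pk', flat=True).first() or 0
    chunk_size = 10000

    for start in range(0, max_pk + 1, chunk_size):
        chunk = old_model.objects.filter(pk__gte=start, pk__lt=start + chunk_size)
        new_objects = [
            new_model(**{name: getattr(obj, name) for name in field_names})
            for obj in chunk
        ]
        # One transaction and one bulk insert per chunk instead of per row.
        with transaction.atomic():
            new_model.objects.bulk_create(new_objects)
        print("{}: forwarded PKs in [{}, {})".format(model, start, start + chunk_size))

Per-chunk transactions keep the server's resource usage bounded, and bulk_create() turns thousands of single-row INSERTs into a handful of multi-row ones.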