Hello Community!
I have a feature request / enhancement that I’d like to discuss here. It comes from a situation I encountered when I took over maintenance of a Wagtail CMS package called wagtail-ab-testing. To give a quick overview of what it does: it supports CMS editors in testing two versions of the same Wagtail page, keeping track of analytics like views and conversions.
I noticed a raw SQL query is used to insert or update (UPSERT) a record atomically (link to relevant source code, if you want to have a look yourself)
This raw query initially confused me, why is it there? It turns out it is there because Django does not support atomic upserts without taking a lock on the row, which hurts performance. This is important because it is quite a hot code path.
The raw query (for PostgreSQL) looks somewhat like this:
INSERT INTO %s (ab_test_id, version, date, hour, participants, conversions)
VALUES (%ss, %ss, %%s, %%s, %%s, %%s)
ON CONFLICT (ab_test_id, version, date, hour)
DO UPDATE SET participants = %s.participants + %%s, conversions = %s.conversions + %%s;
There are a few things to note here:
- There is an unique constraint on combination of
ab_test_id
,version
,date
, andhour
columns that prevents duplicate objects from being created. - The
participants
andconversions
columns are updated atomically; That is: these columns are incremented by the given values, not set directly. This is important for atomicity. We don’t want multiple concurrent database calls to overwrite each other.
A common way to ‘create or update’ in Django would be to use the update_or_create
method, but this internally takes a lock on the row in case of updates, which is not acceptable because, as mentioned before, my example is a hot code path with a lot of concurrent requests. We cannot afford to take a lock on the row. For reference, here’s the relevant Django source code that takes the lock
The other alternative I researched is to use bulk_create
with update_conflicts=True
but this won’t deal with F
expressions. Which are necessary for the atomicity guarantee.
Why I would like Django to support this feature
I feel like Django’s ORM capabilities would be strenghtened if it would allow developers to make efficient queries out of the box. No need for raw SQL or utility packages (as an example for PostgreSQL, there is django-pg-upsert)
Considerations
Database vendor specific SQL. not all databases use the same syntax and some pose limitations.
Quoting from what I wrote in #35793
PostgreSQL and SQLite support this syntax since PostgreSQL 9.5 and SQLite 3.24.0 respectively. MySQL and MariaDB do not support this natively, but it can be emulated using INSERT … ON DUPLICATE KEY UPDATE syntax and creating a unique index on the columns.
Oracle apparently supports something similar using the vastly different MERGE statement syntax. That might be a bit of a challenge.
How this would fit in the ORM. I’m not feeling at all qualified enough to make suggestions on how this would fit in the ORM. My, perhaps naive, take on it would be that update_or_create
already gets very close now that the create_defaults
parameter was added in Django 5.0. Perhaps we could add an argument to instruct Django to perform an actual UPSERT query instead of emulating it by taking a lock?
Hope to hear thoughts on this. Is there appetite to add support for this in Django?