Race condition in auto-incrementing value

I have the following model which represents an instance of a model that is logically a child of another model, with the ordering of the children being significant.

class OrderedChild(models.Model):
    parent = models.ForeignKey(Parent)
    ordering = models.PositiveIntegerField()

     class Meta:
        ordering = ["parent_id", "ordering"]
        constraints = [
            models.UniqueConstraint(
                fields=["parent_id, "ordering"],
                name="same_parent_unique_ordering",
            ),
        ]

In order to get the correct value of the ordering, I do the following:

# inside OrderedChild class

def save(self, *args, **kwargs):
    if self.pk is None:
        self.ordering = self.get_ordering_position()


def get_ordering_position(self):
        siblings = self.get_siblings()
        max_ordering = siblings.aggregate(max_ordering=Max("ordering"))["max_ordering"]
        return max_ordering + 1 if max_ordering is not None else 0

 def get_siblings(self):
        return OrderedChild.objects.filter(parent=self.parent)

This works fine most of the time, but I’ve seen a few IntegrityErrors here and there, where the unique constraint was violated.

My understanding is that the constraint might be violated if two very close requests come to create instances of OrderedChild with the same parent. During creation, when neither of the two has been created yet, get_ordering_position might give the same result, causing the integrity error.

Wrapping this inside of a transaction wouldn’t help this specific issue: I need some sort of lock. I know there is a select_for_update method, but I’m not sure it’s the right answer in my case.

How would you approach the issue?

It would be helpful here to understand how and where these objects get ordered.

Are they always ordered in the order in which they are inserted?

Or is their position in the list specified when they are being inserted? (e.g. “Insert this new one in position 3.”)

Or can the entire list be arbitrarily reordered?

The requirements for the first are substantially easier to satisfy than the second and third.

Also, what are your requirements for retrieval and use?

  • Do you only care about retrieving them in order, or do you need to identify individual entities by ordinal position?
  • Does that position value have implications involving the processing of data, or is it just an attribute of it being displayed?

Are these insertions regular and frequent, or are they occasional? (Yes, you can do full table locks, but if this is a frequent occurrence, it will have a noticeable effect on your application.)

These are all factors that can affect the solution selected.

The items are ordered by the order in which they’re inserted (i.e. the value of the ordering field), but they can be reordered (think of a drag&drop user interface). At the end of the post I’ll go more in depth about this.

They are always retrieved together as a collection-

It’s mostly about how they’re displayed. There is a routine in the application which iterates over a collection of children, and the ordering is important here, that is, the list iterated needs to be ordered by that attribute.

Not as frequent as retrieval is, but I wouldn’t say they’re occasional.

About reodering: swapping is done in the frontend by dragging the element to a new position. This sends a PATCH request with the new ordering value for that model instance. The save method for Child takes care of reodering in a pretty complicated way, which I’m looking to make a bit simpler. But at the moment, it looks like this:

    def save(self, force_no_swap=False, *args, **kwargs):
        if (
            self.pk is not None
            and hasattr(self, "_old__ordering")
            and self._old__ordering != self._ordering
            and not force_no_swap
        ):
            # object's ordering has changed: re-arrange ordering of siblings
            # until target ordering is reached for this object
            target_ordering, self._ordering = self._ordering, self._old__ordering

            while target_ordering != self._ordering:
                to_be_swapped = (
                    self.get_next()
                    if target_ordering > self._ordering
                    else self.get_previous()
                )
                if to_be_swapped is not None:
                    if (
                        to_be_swapped._ordering < target_ordering < self._ordering
                        or to_be_swapped._ordering > target_ordering > self._ordering
                    ):
                        # object in the target position doesn't exist, and performing
                        # a swap would take this object past that position; directly
                        # assign the target position to prevent looping
                        self._ordering = target_ordering
                        self.save(force_no_swap=True)
                    else:
                        self.swap_ordering_with(to_be_swapped)
                else:
                    if (
                        target_ordering == 0
                        or target_ordering == self.get_siblings().count() - 1
                    ):
                        # edge case: trying to move the object either to position 0
                        # or to the last position, but the element to be swapped with
                        # has been deleted
                        self._ordering = target_ordering
                        self.save(force_no_swap=True)
        else:
            if self.pk is None:
                # automatically populate the _ordering field based on
                # the ordering of siblings when creating new instance
                self._ordering = self.get_ordering_position()
            super().save(*args, **kwargs)


    def get_adjacent(self, step):
        delta = step
        siblings = self.get_siblings()
        for _ in range(0, siblings.count()):
            try:
                return siblings.get(_ordering=self._ordering + delta)
            except Child.DoesNotExist:
                delta += step

        return None

    def get_next(self):
        return self.get_adjacent(1)

    def get_previous(self):
        return self.get_adjacent(-1)

    def swap_ordering_with(self, other):
        with transaction.atomic():
            self._ordering, other._ordering = other._ordering, self._ordering
            other.save(force_no_swap=True)
            self.save(force_no_swap=True)

(the field _ordering mentioned in this code is what I called ordering so far; it was just to make things a bit simpler)

As you can see, there’s even some additional code handling cases where some items have been deleted. Overall, this is a bit messy, there’s definitely room for improvement & simplification.

So there are at least two alternative ways of handling this that may provide a different performance profile depending upon some other factors.

The key to either of them is to avoid having to save the child objects simply due to reordering their position.

The objective is to minimize the number of object needing to be read/saved when this reordering occurs.

For example, the traditional relational model might have you create a second table consisting of a primary key for itself, a foreign key to the target object, and some type of “positioning” value.

This positioning value could be the PK of the next value in the list (a linked list) or a floating-point value representing its relative position in the list - where that value is calculated as being the average of its predecessor and successor objects.

The first solution requires more “bookkeeping” - you’ll be changing 3 rows for almost every position change.

The second requires reading those other two rows - but possibly causing a cascading adjustment if so many updates occur such that you end up in a position where there’s no open value between the two adjacent values.

A third option, made available by PostgreSQL’s support for array and json fields would be to store the ordered list as a single row in a reference table. Updates and reordering would only affect that one row, allowing that row to be updated as an atomic operation and only needing that one row to be locked. (You still would need to address the sematics of how those changes are going to be propagated to other people attempting to update this sequence at the same time.)

This third option would be limited by the size constraints of those data types, but since you’re talking about this in the context of someone taking manual steps to organize these, I find it unlikely that there would be such a large number of objects in these lists to create a problem.