prevent double delivery of unhandled database rows

Hi there,

i have a Django page with a table in Database.
This table contains rows which have to be handled by coworkers.

To mark a row as in processing and finally as handled i have a column for this in it.

Many (up to 100) Coworkers work on a special app-Page and handle this rows.
Every Coworker handles a row, and after being ready with it, coworker gets next unhandled row.

How can i prevent multiple coworkers receive the same row?

i thought about something like a Queue for Row-Ids, has someone a about how to solve this?

i hope someone has an idea to solve this problem :slight_smile:
thanks in advance

I’ve handled this in two different ways. I’m interested in learning how others have solved this.

One is to issue an filtered update and if that says it updated one record, then it means it has acquired the lock on the record. When it’s done, it “unlocks” the record.

if Record.objects.filter(id=SOME_ID, locked=False).update(locked=True):
    # This employee can work on this record now
    # Someone is already working on this record.

Note if you’ve adjusted transaction isolation level, this approach may be impacted.

Another option I’ve used PostgreSQL’s advisory lock via django-pglocks. However, these are harder to find zombie locks because they aren’t as easily accessible as they are within the db only. The the flag on the model could be exposed via the Django Admin. Even better, you could make it a datetime if you’re worried about that sort of thing.

that’s the way i would use on small count of parallel queries, or if only a single coworker is handling this.

But what happens if 2 CoWorker requests on the same time the next unlocked row?
Receive both the same row?
Or handles Django both DB Queries in a queue?
So Coworker which is 1/1000s faster will be handled first and gets DB Query handling first done and after this the other Coworker will be handled?

For this Python has Queue Modul but i wonder if Django has not such method native to handle this similar, to prevent unwanted parallel accessing.

The above database method can work if you’re using a “read commited” transaction isolation level (django’s default) in PostgreSQL and the default django transaction management.

Relevant part of the docs:

UPDATE , DELETE , SELECT FOR UPDATE , and SELECT FOR SHARE commands behave the same as SELECT in terms of searching for target rows: they will only find target rows that were committed as of the command start time. However, such a target row might have already been updated (or deleted or locked) by another concurrent transaction by the time it is found. In this case, the would-be updater will wait for the first updating transaction to commit or roll back (if it is still in progress). If the first updater rolls back, then its effects are negated and the second updater can proceed with updating the originally found row. If the first updater commits, the second updater will ignore the row if the first updater deleted it, otherwise it will attempt to apply its operation to the updated version of the row. The search condition of the command (the WHERE clause) is re-evaluated to see if the updated version of the row still matches the search condition. If so, the second updater proceeds with its operation using the updated version of the row. In the case of SELECT FOR UPDATE and SELECT FOR SHARE , this means it is the updated version of the row that is locked and returned to the client.

I believe it’s the same for MySQL, but I’m not 100% sure on that.

I will admit things get a little hairy if there are many people competing for the same record. In that case, you may want to look into varying how the next task gets selected rather than doing it all sequentially.

This is really a case for using database locks.

The select_for_update function may be all that’s needed. You can use the nowait=True parameter when trying to acquire the lock - if the exception is raised, that means that someone else has already requested that row, and change the request to retrieve a different row. Since this is a fundamental function of the database, you don’t need to worry about concurrent requests from multiple sources.

If you’re looking for more information, see Explicit Table Locking with PostgreSQL and Django | Caktus Group for a Django-style implementation of locking, and PostgreSQL: Documentation: 13: 13.3. Explicit Locking for the details on the PostgreSQL implementation.

1 Like

select_for_update seems to be the solution.
I’ll try it. Thanks !