Using in_bulk()

Hello. While using queryset.in_bulk(), I noticed something odd.
If you call this method after order_by("field_1", "field_2").distinct("field_1") and pass field_name="field_1", the ordering given to order_by() is respected when the result is built. Code example:

locations = (
    UserLocation.objects.order_by("user_id", "-date_create")
    .distinct("user_id")
    .in_bulk(field_name="user_id")
)
print(locations)

>>>{2693: <UserLocation: 2023-03-01 15:17>}

However, if you also pass an id_list argument to in_bulk(), the ordering specified earlier on the queryset is dropped, and as a result a different model instance is returned. Code example:

locations = (
    UserLocation.objects.order_by("user_id", "-date_create")
    .distinct("user_id")
    .in_bulk([2693], field_name="user_id")
)
print(locations)

>>>{2693: <UserLocation: 2023-03-01 14:27>}

Why is the previously specified ordering cleared when id_list is passed (in_bulk() calls .order_by() without arguments), and is there any way to override it?


What is the actual value of date_create for UserLocation with id = 2693? (I would guess that it’s “2023-03-01 14:27”.)

In your second example, you’re specifically referring to a row by its primary key. It’s not that it’s ignoring the sort, it’s that the sort is irrelevant.

Following the git blame for django/db/models/query.py leads to a change from 11 years ago, made for ticket #15116 (Don't ORDER BY when using .in_bulk()). I’m not sure your use case was considered.

You could write your own version of this.

locations = {
    location.user_id: location
    for location in UserLocation.objects.order_by("user_id", "-date_create").distinct("user_id")
}
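
If only some users are needed, the same comprehension can be restricted with filter() instead of passing id_list to in_bulk(), so the ordering is never cleared. A minimal sketch, assuming the UserLocation model and field names from the posts above; the helper name latest_location_by_user is made up for illustration, and distinct("user_id") only works on PostgreSQL:

```python
def latest_location_by_user(queryset, user_ids):
    """Map user_id -> most recent location for the given users.

    `queryset` is assumed to be something like UserLocation.objects.all().
    Hypothetical helper, not part of Django.
    """
    rows = (
        queryset.filter(user_id__in=list(user_ids))
        # The ordering decides which row DISTINCT ON keeps for each user.
        .order_by("user_id", "-date_create")
        .distinct("user_id")
    )
    return {row.user_id: row for row in rows}
```

It would be called as latest_location_by_user(UserLocation.objects.all(), [2693]).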

A user with id 2693 has several locations that differ in date_create. The most recent one has a time of 15:17. When ordering the queryset by "-date_create" I want to get the most recent record, so I was upset that the ordering is reset when the id_list parameter is passed.

Thanks for the tip. I usually use something like your code. Today I found out about the in_bulk() method. At first I was happy, but then I was a little upset that it doesn’t work quite as I expected. Judging by the change you linked, the intent was for this method to reset the ordering in every case. That’s a pity if so. It would be great if the ordering were instead preserved for all use cases.

Thanks for the clarification. Admittedly, I think I’ve only ever used “in_bulk” once or twice, and that was with picking up code someone else had written. I don’t think I’ve ever seen it used with something other than a unique key.

I think the ordering was cleared before it became possible to pass a different field_name to in_bulk(). Apparently nobody took into account that resetting the ordering changes the result when DISTINCT is used.
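
To illustrate why the ordering matters here, below is a plain-Python model of "keep the first row per user_id", which is roughly how DISTINCT ON picks a row. It does not use Django; the rows are toy data based on the two timestamps printed earlier in the thread, and the "storage order" used for the unordered case is an assumption, since a database gives no guarantee without ORDER BY:

```python
from datetime import datetime

# Toy rows standing in for UserLocation records: (user_id, date_create).
rows = [
    (2693, datetime(2023, 3, 1, 14, 27)),
    (2693, datetime(2023, 3, 1, 15, 17)),
]


def distinct_on_user(rows):
    """Keep the first row seen for each user_id, like DISTINCT ON (user_id)."""
    first_seen = {}
    for user_id, date_create in rows:
        first_seen.setdefault(user_id, date_create)
    return first_seen


# Emulate ORDER BY user_id, date_create DESC with two stable sorts.
ordered = sorted(rows, key=lambda r: r[1], reverse=True)
ordered.sort(key=lambda r: r[0])
with_ordering = distinct_on_user(ordered)

# Without ORDER BY we assume storage order, where the older row comes first.
without_ordering = distinct_on_user(rows)

print(with_ordering[2693])     # 2023-03-01 15:17:00
print(without_ordering[2693])  # 2023-03-01 14:27:00
```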

Yeah, I think this might be a bug. @felixxm what do you think about in_bulk() calling .order_by()?

Agreed, we shouldn’t clear ordering anymore. Twelve years ago it was useless as dictionaries didn’t preserve ordering anyway.
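
For context, Python 3.7 and later guarantee that a dict keeps insertion order, so a dict built from an ordered sequence now reflects that order. A small plain-Python illustration with made-up keys:

```python
# Keys inserted in a deliberate, non-sorted order.
ordered_ids = [30, 10, 20]
by_id = {pk: f"row-{pk}" for pk in ordered_ids}

# Iteration follows insertion order rather than key order.
print(list(by_id))  # [30, 10, 20]
```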


Thanks Mariusz! @Lajilit you should create a ticket. Here are the docs for getting started on reporting an issue. It’s probably worthwhile to link to this thread in the ticket’s body as well.


Thanks. I have never created a ticket before. Did I do it right?

I only dabble in the issue tracker, but it appears that you’re generally doing things well. In general, don’t change the status; leave that up to the Django Fellows.

Thank you for tackling this bug and ticket!