Django ORM `queryset.distinct()` gets duplicate values due to `.ordered_by()` on the column

To make the long story short, we have a QuerySet of transactions of the same product_id, ordered by the primary key id.

qs_transactions = Transaction.active_objects.filter(product_id=fixed_value).order_by('id')
...

def function(qs_transactions)
  qs_product_id = qs_transactions.values('product_id').distinct()
  ...

However, when retrieving the distinct product_id, the query set returns duplications. It seems that the .ordered_by(id) forced the query set to include column id, so the distinct query got the same product_id with different id’s.

SELECT DISTINCT
    `app_transaction`.`product_id`
  , `app_transaction`.`id` 
FROM `app_transaction` 
WHERE 
  ... conditions ...
ORDER BY `app_transaction`.`id` ASC

We tried with SQL, and if remove app_transaction.id and leave everything else the same, the query returns the distinct product_id as expected. We don’t know how to do this with ORM, though.

SELECT DISTINCT
    `app_transaction`.`product_id`
FROM `app_transaction` 
WHERE 
  ... conditions ...
ORDER BY `app_transaction`.`id` ASC

Our Question:

The Python program appended the .ordered_by() clause to the query set, and then passed it as an argument to a function call. And we are trying to get the distinct value in the function.

We wonder how to get the expected distinct value.

Hello there!

I assume you are using MySQL here due to the backtip quoting of your identifiers?

I’m not sure which version of MySQL you are running but when I try running your adjusted query on MySQL 8 I get this very explicit message

CREATE TABLE app_transaction (
  id int NOT NULL AUTO_INCREMENT,
  product_id int NOT NULL,
  PRIMARY KEY (id)
);
SELECT DISTINCT product_id FROM app_transaction ORDER BY id;
Expression #1 of ORDER BY clause is not in SELECT list, references column 'app_transaction.id' which is not in SELECT list; this is incompatible with DISTINCT

The long story short is that most SQL backends don’t allow ORDER BY fields that are not selected when using DISTINCT (good article on the topic) so Django implicitly add ordering columns when this is the case.

This has been a known footgun for a while and has been extensively discussed in the past but we haven’t proceeded with a solution yet.

In an ideal world we’d first start by erroring out explicitly when trying to do such thing and eventually add support for the ORM to do the subquery pushdown automatically

SELECT DISTINCT product_id
FROM (
    SELECT product_id, id
    FROM app_transaction
    ORDER BY id
) subquery

Hope that helps!

1 Like

Hi, @charettes ,

Thank you for your input.

The database is 10.5.21-MariaDB, and we just tried again, SELECT DISTINCT product_id FROM app_transaction ORDER BY id; works in this version.

For the others, yes, we agree with you. And thank you again for your help.