ORM -- ManyToMany, annotate, distinct: is this a bug ?

Hi,

I’m not sure that this is a bug, and I would like to have confirmation before I open a ticket. Suppose I have the following models:

class Auteur(models.Model):
    nom=models.CharField(max_length=20)

class Genre(models.Model):
    nom=models.CharField(max_length=20)

class Book(models.Model):
    auteurs=models.ManyToManyField(Auteur)
    genre=models.ManyToManyField(Genre)

So, Book model has two ManyToMany fields. Let’s create some data:

paul=Auteur(nom="Paul")
paul.save()
robert=Auteur(nom="Robert")
robert.save()
vincent=Auteur(nom="Vincent")
vincent.save()

roman=Genre(nom="Roman")
roman.save()
nouvelle=Genre(nom="Nouvelle")
nouvelle.save()
action=Genre(nom="Action")
action.save()

b1=Book()
b1.save()
b1.auteurs.set([paul, vincent])
b1.genre.set([roman, action])

b2=Book()
b2.save()
b2.auteurs.set([paul, robert])
b2.genre.set([nouvelle, action])

Now, if I want a Book queryset with annotation of the number of authors, I can do:

books=Book.objects.annotate(NumAuteurs=Count('auteurs'))

It works fine:

>>> books[0].NumAuteurs
2

But suppose I have to filter Book queryset like this:

books=Book.objects.filter(genre__in=[action, roman])

Of course, the first book appears twice:

>>> books
<QuerySet [<Book: Book object (1)>, <Book: Book object (1)>, <Book: Book object (2)>]>

So, it’s convenient to use distinct:

books=Book.objects.filter(genre__in=[action, roman])

Now, I have the two books, one time each:

>>> books
<QuerySet [<Book: Book object (1)>, <Book: Book object (2)>]>

Now, I use this queryset to get the number of authors, like before:

books=books.annotate(NumAuteurs=Count('auteurs'))

I would expect a correct number of authors. But:

>>> books[0].NumAuteurs
4

And more surprisingly, result is not correct even if one annotates before filtering:

books=Book.objects.annotate(NumAuteurs=Count('auteurs'))
>>> books[0].NumAuteurs
2
>>> books=books.filter(genre__in=[action, roman]).distinct()
>>> books[0].NumAuteurs
4

So, is this the expecting behavior ?

When in doubt, check the SQL being generated.
e.g. print(books.query)

You’ll probably find that you are getting a cartesian join between the two many-to-many relationships and that this would be the expected behavior in that case.

I checked it. One get

SELECT DISTINCT "base_book"."id", COUNT("base_book_auteurs"."auteur_id") AS "NumAuteurs"
FROM "base_book" LEFT OUTER JOIN "base_book_auteurs" ON ("base_book"."id" = "base_book_auteurs"."book_id")
			INNER JOIN "base_book_genre" ON ("base_book"."id" = "base_book_genre"."book_id")
			WHERE "base_book_genre"."genre_id" IN (3, 1) GROUP BY "base_book"."id";

But I was expecting something like:

SELECT "b_id", COUNT("base_book_auteurs"."auteur_id") AS N
FROM (
	SELECT DISTINCT "base_book"."id" AS "b_id"
	FROM "base_book" INNER JOIN "base_book_genre" ON ("b_id" = "base_book_genre"."book_id")
	WHERE "base_book_genre"."genre_id" IN (3,1)
	)
LEFT OUTER JOIN "base_book_auteurs" ON ("b_id" = "base_book_auteurs"."book_id")
GROUP BY "b_id";