About Lookups that span relationships

Hi,Experts,
I’m new to Django and I like it very much. There is something make me confused about Loopups that span relationships.

class Blog(models.Model):
name = models.CharField(max_length=100)
tagline = models.TextField()

def __str__(self):
    return self.name

class Author(models.Model):
name = models.CharField(max_length=20)
email = models.EmailField()

def __str__(self):
    return self.name

class Entry(models.Model):
blog = models.ForeignKey(Blog, on_delete=models.CASCADE)
headline = models.CharField(max_length=255)
body_text = models.TextField()
pub_date = models.DateField()
mod_date = models.DateField()
authors = models.ManyToManyField(Author)
number_of_comments = models.IntegerField()
number_of_pingbacks = models.IntegerField()
rating = models.IntegerField()

def __str__(self):
    return self.headline

I think the following should return the same results.

blogs=Blog.objects.filter(entry__id=1)
blogs=Blog.objects.filter(entry__authors__entry__id=1) # just for learning

But, it did not return the same results.

Blog.objects.filter(entry__id=1)
<QuerySet [<Blog: Less whether kid much shake present.
Team expert realize above.>]>

Blog.objects.filter(entry__authors__entry__id=1)
<QuerySet [<Blog: Less whether kid much shake present.
Team expert realize above.>, <Blog: Less whether kid much shake present.
Team expert realize above.>, <Blog: Less whether kid much shake present.
Team expert realize above.>]>

Blog.objects.filter(entry__authors__entry__id=1) return 3 same Blog objects .

Is there are something wrong or Please just tell me what happeded !
Thanks !

Just guessing, but I’d say the most likely situation here is that you have multiple entries relating the same author to a blog.

Look at the contents of your tables to see if you have replicated data. If you’ve got multiple copies of the relationships, you’d see these kinds of results.

Whenever you have a question about what’s happening within the ORM, it can be very instructive to look at the queries being generated.

print(Blog.objects.filter(entry__id=1).query)

and

print(Blog.objects.filter(entry__authors__entry__id=1).query)

You can see that the second query creates the multiple levels of joins, which would return multiple rows in the case of duplicate data.

1 Like

Hi, I tried creating a minimal Django project that only implements a blog app with the models you described. I then used makemigrations+migrate and jumped into the shell to test things out:

from blog.models import Blog, Author, Entry
from datetime import date

Author.objects.create(name='Jane')
Author.objects.create(name='Shantay')
Author.objects.create(name='Jean')

Blog.objects.create(name='Progruni', tagline='Where programmers unite')

e1 = Entry.objects.create(blog=Blog.objects.get(name='Progruni'), 
                          headline='Foo', 
                          body_text='Fizzbuzz', 
                          pub_date=date(year=2020, month=2, day=5), 
                          mod_date=date(year=2020, month=2, day=6), 
                          number_of_comments=0, 
                          number_of_pingbacks=0, 
                          rating=6)

e2 = Entry.objects.create(blog=Blog.objects.get(name='Progruni'), 
                          headline='Bar', 
                          body_text='Fazzbazz', 
                          pub_date=date(year=2020, month=3, day=5), 
                          mod_date=date(year=2020, month=3, day=6), 
                          number_of_comments=0, 
                          number_of_pingbacks=0, 
                          rating=6)

e1.authors.set([Author.objects.get(name='Shantay')])
e2.authors.set([Author.objects.get(name='Shantay')]) 

Blog.objects.filter(entry__id=1)
# <QuerySet [<Blog: Progruni>]>
Blog.objects.filter(entry__authors__entry__id=1)
# <QuerySet [<Blog: Progruni>, <Blog: Progruni>]>
Blog.objects.filter(entry__authors__entry__id=1).distinct()
# <QuerySet [<Blog: Progruni>]>

Looking at the SQL queries produced, as @KenWhitesell suggested:

print(Blog.objects.filter(entry__id=1).query)
# SELECT "blog_blog"."id", "blog_blog"."name", "blog_blog"."tagline" 
# FROM "blog_blog" INNER JOIN "blog_entry" 
# ON ("blog_blog"."id" = "blog_entry"."blog_id") 
# WHERE "blog_entry"."id" = 1
print(Blog.objects.filter(entry__authors__entry__id=1).query)
# SELECT "blog_blog"."id", "blog_blog"."name", "blog_blog"."tagline" 
# FROM "blog_blog" INNER JOIN "blog_entry" 
# ON ("blog_blog"."id" = "blog_entry"."blog_id") INNER JOIN "blog_entry_authors" 
# ON ("blog_entry"."id" = "blog_entry_authors"."entry_id") INNER JOIN "blog_author" 
# ON ("blog_entry_authors"."author_id" = "blog_author"."id") INNER JOIN "blog_entry_authors" T5 
# ON ("blog_author"."id" = T5."author_id") 
# WHERE T5."entry_id" = 1
print(Blog.objects.filter(entry__authors__entry__id=1).distinct().query)
# SELECT DISTINCT "blog_blog"."id", "blog_blog"."name", "blog_blog"."tagline" 
# FROM "blog_blog" INNER JOIN "blog_entry" 
# ON ("blog_blog"."id" = "blog_entry"."blog_id") INNER JOIN "blog_entry_authors" 
# ON ("blog_entry"."id" = "blog_entry_authors"."entry_id") INNER JOIN "blog_author" 
# ON ("blog_entry_authors"."author_id" = "blog_author"."id") INNER JOIN "blog_entry_authors" T5 
# ON ("blog_author"."id" = T5."author_id") 
# WHERE T5."entry_id" = 1

As you can see:

  1. Blog.objects.filter(entry__authors__entry__id=1) “duplicates” the blog name in the produced queryset. This is because two authors have ‘Shantay’ as the author, who in turn is the author of the entry with ID 1.
  2. Blog.objects.filter(entry__authors__entry__id=1).distinct() ensures that no blog objects will be repeated in the produced queryset, simply by adding DISTINCT directly after SELECT in the SQL statement used to fetch data from the database.

If it’s a good idea to use .distinct()/DISTINCT for what you want to do, I don’t know. I’m guessing that if there is a ‘cleaner’ method that does what you want while avoiding ‘duplicates’ being produced to begin with, that might be preferable.

I hope this helps :slight_smile:

1 Like

Thank you very much and I think it’s a good solution.

I still think they should have the same result according to logical analysis.
BTW, Mysql is the database server. The different results are all back from MYSQL. It does not matter with Django ORM.
Could anyone try it on other DATABASE, such as ORACLE or else ?

Thank you !
There is no duplicate data. I think it 's somethign matter with MYSQL.

Your analysis is wrong - the first query will always only return 1 row. The second query will return multiple rows when multiple entries exist in the Entry to Author many-to-many relationship table. (It would be the table named <app>_Entry_Author.)

I have tried it, on all of postgres, mysql, and sqlite3, and it works properly in all cases when there’s no duplicated rows.

Thank you, data are as following. Please help analysis :slight_smile:

Blog.objects.filter(entry__authors__entry__id=1)
<QuerySet [<Blog: bd>, <Blog: bd>, <Blog: bd>]>
Blog.objects.filter(entry__id=1)
<QuerySet [<Blog: bd>]>

Printing the query for Blog.objects.filter(entry__authors__entry__id=1) yields:

SELECT "bt_blog"."id", "bt_blog"."name", "bt_blog"."tagline" FROM "bt_blog" 
INNER JOIN "bt_entry" ON ("bt_blog"."id" = "bt_entry"."blog_id") 
INNER JOIN "bt_entry_authors" ON ("bt_entry"."id" = "bt_entry_authors"."entry_id") 
INNER JOIN "bt_author" ON ("bt_entry_authors"."author_id" = "bt_author"."id") 
INNER JOIN "bt_entry_authors" T5 ON ("bt_author"."id" = T5."author_id") WHERE T5."entry_id" = 1

Using the data as displayed in the images above, yields the final result set table:

bt_blog bt_entry bt_entry_authors bt_author bt_entry_authors(T5) WHERE
ID Name tagline id blog_id headline body_text id entry_id author_id id name id entry_id author_id
1 ba ta 2 1 b B 2 2 4 4 dd 2 2 4
2 bb tb
3 bc tc 5 3 e E
3 bc tc 10 3 j J
4 bd td 1 4 a A 1 1 2 2 bb 1 1 2 *
4 bd td 1 4 a A 1 1 2 2 bb 6 6 2
4 bd td 1 4 a A 3 1 3 3 cc 3 1 3 *
4 bd td 6 4 f F 6 6 2 2 bb 1 1 2 *
4 bd td 6 4 f F 6 6 2 2 bb 6 6 2
5 be te 3 5 c C 5 3 5 5 ee 5 3 5
5 be te 3 5 c C 5 3 5 5 ee 4 4 5
5 be te 4 5 d D 4 4 5 5 ee 5 3 5
5 be te 4 5 d D 4 4 5 5 ee 4 4 5
6 bf tf
7 bg tg
8 bh th
9 bi ti 7 9 g G
9 bi ti 9 9 i I
10 bj tj 8 10 h H

As you can see, after applying the “WHERE” clause (far right column), this query properly returns 3 rows.

Thank you very much!
Seems I get it!