Having some trouble understanding how prefetch_related works

MSoup · January 25, 2022, 2:56am

I’m having trouble translating specific joins to Django (original code in c#). The culprit is simply that I really lack DB knowledge and what’s going on under the code. I will spend some time today trying to understand joins better, but I’d really like to get over this rut.

For example I see this kind of code:

db.someTable1.Join(
  db.someTable2,
  l => new { l.machineNumber, l.clientId },
  r => new { r.machineNumber, r.clientId },
  (l, r) => new 
  {
    historyId = l.id,
    l.machineNumber,
    l.clientId,
    ...
  }
)

The DB tables do not contain any foreign keys and I am not able to modify the database. I think this means that select_related is out of the question.

My attempt is

modelForTable1.objects.prefetch_related('machinenumber__machinenumberids_set', 'clientid__machinenumberids_set')

My 3 questions are:

Am I on the right track?
If I am doing joins with more than one table, should I be chaining them together? Or should I put putting everything as parameters into the method? For example modelForTable1.objects.prefetch_related('machinenumber__machinenumberids_set', 'clientid__machinenumberids_set', 'clientid__anothertable_set')
Is there a better way of doing joins that do not modify the existing DB?

Thanks in advance!

KenWhitesell · January 25, 2022, 3:03am

The select_related and prefetch_related functions have nothing do to with being able to join tables. They are only a performance-enhancement facility. (Well, not strictly true, but close enough for this discussion.)

You don’t specify joins. The ORM will join tables as necessary based on ForeignKey fields.

Now, if you don’t have any foreign keys, then you can’t join tables together directly. Depending upon what the precise needs are, you might be able to augment results using aggregation or annotation. But as often as not, you’ll still end up needed to do extra queries for this.

MSoup · January 25, 2022, 5:38am

Hm I’m convinced that the above does have to do with joins! Django Docs say that select_related works by creating an SQL join and including the fields of the related object in the SELECT statement, but maybe I misinterpreted your statement.

It sounds like the best thing I can do is aggregation or annotation. Maybe it’s fine to take a performance hit.

Actually, after a bit more searching, I came across this post where the top answer suggests that: The Django ForeignKey is different from SQL ForeignKey. Django ForeignKey just represent a relation, it can specify whether to use database constraints.

Does that mean I can actually just change my models to have foreignkey fields and it won’t affect accessing the DB and whatnot?

KenWhitesell · January 25, 2022, 12:15pm

I think that may be the case here.

[Emphasis changed]

While select_related will cause the ORM to use joins in cases where it otherwise wouldn’t, it isn’t the only clause causing the ORM to use a join, and prefetch_related does something completely different.

Yet another case where SO may be doing more harm than good.

No, that interpretation is incorrect. A Django ForeignKey field is a database foreign key.

No, this is not correct. Your models are a Python representation of a database table, with the added possibility of adding functionality at the Python layer. Every field defined in your model must correspond to a column in the table.

That all really depends upon the specific query and the quantity of data involved. Whether or not there’s really going to be a performance hit is going to depend upon many factors. The PostgreSQL query planner really is quite amazing.

alxvallejo · January 2, 2025, 4:09pm

That is completely false. Are you serious?

KenWhitesell · January 2, 2025, 4:46pm

Please explain, how is this false? (Please include your references.) Also please ensure that you have read the complete thread, and are addressing this in the full context of everything I have written above.

There is no physical need to ever use select_related or prefetch_related aside from their performance enhancements.

See the docs for select_related and prefetch_related.

alxvallejo · January 2, 2025, 5:37pm

From the documentation:

select_related works by creating an SQL join and including the fields of the related object in the SELECT statement.

It literally creates a SQL join. Saying is has nothing to do with joining tables misleading to say the least.

KenWhitesell · January 2, 2025, 5:42pm

Unfortunately, you are misquoting me. I never wrote what you’re claiming I wrote.

From my original reply:

And the re-emphasized version below it:

That key phrase is even in the fragment of the reply you quoted originally.

And, the key part of this conversation in context is the remainder of the statement:

and

philgyford · January 3, 2025, 8:55am

I must admit, @KenWhitesell, that I don’t know what you mean. This:

(whichever part you emphasise) reads to me like “select_related and prefetch_related do not create queries that use joins”. Which is, I think, also how @alxvallejo is reading it.

But you’re saying that’s not what you mean. I believe you, but I don’t understand what you do actually mean. Could you expand on it a little so that we understand?

KenWhitesell · January 3, 2025, 1:22pm

Let’s see if I can do better here.

Warning: This is going to be a long answer, because a full explanation requires both some background and context.

First, it should always be understood that every answer I write here is targeted specifically to the question being asked, and in some cases, to the questioner.

Reading an answer without first reading the question (or the complete thread to that point) can be misleading. Likewise, looking at part of an answer, without considering the response as a whole, is also likely to be misleading.

Now, to establish the context behind my answer in this specific situation, I’m going to quote (and extract) what I think are the most relevent parts of the original post.

…

MSoup:

The DB tables do not contain any foreign keys and I am not able to modify the database. I think this means that select_related is out of the question.

My attempt is
modelForTable1.objects.prefetch_related('machinenumber__machinenumberids_set', 'clientid__machinenumberids_set')

My interpretation of all this is that the OP is thinking that they can create SQL joins between unrelated tables (no foreign keys) using either or both prefetch_related and select_related.

The implications of that, lead me to believe that the OP may be of the belief that it’s those two functions that create the ability to join two tables. It’s that belief that I was addressing.

Side Note: This is not an unusual interpretation or line of thinking. There have been about a half dozen other questions that I have answered with this same misconception. Questions where people want to join tables not having any relationship connection between them, and thinking they can use one of those two functions to force the ORM to join them.

To the extent of my knowledge, this is not an accurate belief. (I am quite willing to be corrected on this if my knowledge is out-of-date - my knowledge of the ORM does not go that deep.)

It is not the usage of select_related or prefetch_related that creates the ability to join tables. In the general case, for the ORM to create any join at the SQL layer, there must be an FK somewhere in the mix.

Also take note of the caveat in the last sentence at the end of this oft-quoted paragraph:

This is also to emphasize that I’m not making a universal statement here - this is intended to apply to this question.

Finally, the first sentence of the last paragraph wraps up the original answer:

Again - addressing the original question. You can’t add either select_related or prefetch_related to an ORM statement to join tables that are not related by foreign keys. Neither of those functions create an ability to perform a join.

Then, the other part of this is in my response to the first person asking about my original answer:

This is to emphasize the point that select_related does not create the ability for the ORM to use a join. It is the foreign key that creates that ability. The select_related function instructs the ORM to use that ability to join the tables. (And, that there are many other ways in which a join can be created automatically, such as in filter, annotation, or values functions. )

Hope this helps.

philgyford · January 3, 2025, 4:52pm

Thanks Ken. Yes, I had read every word of every comment here, but I still took your posts to mean something that you didn’t intend.

This, in context:

Still sounds like “these functions have nothing to do with joining tables”. i.e., one aspect of which would be that they don’t use joins.

Whereas I now realise you meant “these functions do not create the ability to join tables that do not already have an FK relationship”.

I think that’s what you meant?

KenWhitesell · January 3, 2025, 5:19pm

Close, but my statement is more broad than that. If you wanted the briefest possible phrasing, it would be:

There’s no need to qualify that statement with the “do not already have …”, the short version stands on its own. Those functions do not create the ability to join tables.

Topic		Replies	Views
How do I join related querysets before passing to a template? Using Django	1	578	April 6, 2020
Prefetching spatially related data Using the ORM	6	61	October 2, 2024
Django joins through table Using the ORM	10	4559	September 6, 2023
Does model.related_name.all() perform a new query to DB? Using the ORM	7	462	September 12, 2023
Help accessing data from connected models in a single query Using the ORM	9	98	October 10, 2024

Having some trouble understanding how prefetch_related works

Related topics