using union() with querysets that have a custom Prefetch() object

Hi,

I’m trying to do something like this (simplified for the example):

n1 = Network.objects.filter(code="net1").prefetch_related(Prefetch("station_set", queryset=Station.objects.filter(code="sta1")))
n2 = Network.objects.filter(code="net2").prefetch_related(Prefetch("station_set", queryset=Station.objects.filter(code="sta2")))
networks = n1.union(n2)

It partially works. But the problem is the following :

networks[0].station_set.all()

returns <QuerySet [<Station: sta1>]> as expected

but

networks[1].station_set.all()

returns <QuerySet []> (I would have expected it to return <QuerySet [<Station: sta2>]>)

It seems that union() method doesn’t include custom prefetches. Regular prefetches (Network.objects.filter(code="net2").prefetch_related("station_set")) seems to be returned correctly.
Is it an expected behavior or not ?

To solve this issue, I tried :

  • using n1 | n2 but the query takes more than 15 seconds to process
  • using list(chain(n1, n2)) which works but doesn’t distinct results when necessary

Does anyone encountered the same problem ? I’m wondering how to solve my issue.

Thanks in advance

Side note: Do you have a typo here or two?

You have:

but then you wrote:

Why would you expect it to return that value with the filter you have defined? Is networks[0] a reference to the same object as networks[1]?

Without seeing the model definitions, and a sufficient representation of the data in those models to demonstrate this issue, and possibly even the queries being generated, it’s going to be pretty much impossible for us to provide any kind of diagnostic assistance.

prefetch_related is applied on the resulting set once the union of results has been retrieved so if you need to prefetch a set of result that depends on the outcome of the union you must craft a queryset accordingly.

Something like the following should do (I’m making assumptions here as you didn’t provide your model definitions)

Network.objects.filter(code="net1").union(
    Network.objects.filter(code="net2")
).prefetch_related(
    Prefetch(
        "station_set",
        queryset=Station.objects.filter(
            Q(network__code="net1", code="stat1")
            | Q(network__code="net2", code="stat25")
        )
    )
)

If the ORM supported further filtering union of querysets it could potentially automatically do some of the above but it’s not the case yet.

Hi Ken,
Sorry for that. I fixed the typo in the first post

A simplified models.py would look like :

class Network(models.Model):
    code = models.CharField(max_length=8)


class Station(models.Model):
    code = models.CharField(max_length=8)
    network = models.ForeignKey(Network, on_delete=models.PROTECT)

    class Meta:
        constraints = [
            models.UniqueConstraint(
                fields=["code", "network"], name="station_code_network_id_unique"
            )
        ]

So in my example I have 2 networks (net1 and net2) and 2 stations (sta1 related to net1 and sta2 related to net2)

Well it looks like calling prefetch_related after union is disallowed which I’m unsure why but if you bundle it up in your first queryset everything should work as expected.

Have a look here.

It looks like we should lift the restriction on using prefetch_related after union as long as the queryset is dealing with model instances (as long as no values / values_list mask has been used).

1 Like

I’m trying to to something a bit different :
responding one shot to a request that would ask
"Give me all networks and their related stations that follow the following constraints :

  • station_code = sta25 related to network_code = net2
  • station_code = sta1 related to network_code = net1
  • etc…"

So, with the following corrected code, (Dry ORM)

from django.db import models
from django.db.models import Prefetch, Q

class Network(models.Model):
    code = models.CharField(max_length=8)
    
    def __str__(self) -> str:
        return self.code


class Station(models.Model):
    code = models.CharField(max_length=8)
    network = models.ForeignKey(Network, on_delete=models.PROTECT)
    
    class Meta:
        constraints = [
            models.UniqueConstraint(
                fields=["code", "network"], name="station_code_network_id_unique"
            )
        ]

    def __str__(self) -> str:
        return self.code
    
def run():
    net1 = Network.objects.create(code="net1")
    net2 = Network.objects.create(code="net2")
    stat1 = Station.objects.create(
        network=net1, code="stat1",
    )
    stat25 = Station.objects.create(
        network=net2, code="stat25",
    )
    networks = list(
        Network.objects.filter(code="net1").prefetch_related(
            Prefetch(
                "station_set",
                queryset=Station.objects.filter(code="stat1")
            )
        ).union(
            Network.objects.filter(code="net2").prefetch_related(
                Prefetch(
                    "station_set",
                    queryset=Station.objects.filter(code="stat25")
                )
            )
        )
    )
    print("with custom prefetch")
    print(networks[0].station_set.all())
    print(networks[1].station_set.all())
    networks = list(
        Network.objects.filter(code="net1").prefetch_related("station_set").union(
        Network.objects.filter(code="net2").prefetch_related("station_set")
    ))
    print("with regular prefetch")
    print(networks[0].station_set.all())
    print(networks[1].station_set.all())

I get the following result :

with custom prefetch
<QuerySet [<Station: stat1>]>
<QuerySet []>
with regular prefetch
<QuerySet [<Station: stat1>]>
<QuerySet [<Station: stat25>]>

I would have expected both results to be the same here…

In my example, union is called after prefetch_related (no exception is thrown)

PS : thanks for the ultra quick replies !

I would have expected both results to be the same here…

It won’t as the prefetch_related call of the queryset passed to union is entirely ignored.

In other words, these three are equivalent

one = Network.objects.filter(code="net1").prefetch_related(
    Prefetch(
        "station_set",
        queryset=Station.objects.filter(code="stat1")
    )
).union(
    Network.objects.filter(code="net2").prefetch_related(
        # XXX: This is all silently ignored.
        Prefetch(
            "station_set",
            queryset=Station.objects.filter(code="stat25")
        )
    )
)
#####
two = Network.objects.filter(code="net1").prefetch_related(
    Prefetch(
        "station_set",
        queryset=Station.objects.filter(code="stat1")
    )
).union(
    Network.objects.filter(code="net2")
)
#####
# Assuming `prefetch_related` was allowed after `union`
three = Network.objects.filter(code="net1").union(
    Network.objects.filter(code="net2")
).prefetch_related(
    Prefetch(
        "station_set",
        queryset=Station.objects.filter(code="stat1")
    )
)

You should take a look at the generated SQL tab and you’ll notice that the only way to achieve what you’re after is to pass a custom prefetch_related to the queryset prior to calling union. You’re only getting the right results in your regular prefetch example because you don’t have any station that would normally be filtered out.

1 Like

Ok, I get it now.

It will be difficult for me to use your solution as in my real use case, I have several levels of prefetched objects and as many unions as requested.
But for now, I can live with the list(chain(querysets)) option.

Do you think opening an issue on django bug tracker to support prefetch_related() querysets would be a good idea ?
Anyway, thanks for the answers !

Do you think opening an issue on django bug tracker to support prefetch_related() querysets would be a good idea ?

Allowing prefetch_related to be called after union (as long as values and values_list are not used) should be acceptable (we demonstrated here that the check can be bypassed anyway) but implementing an automatic merge logic for union of prefetches is unlikely to be given the complexity involved.

Code wise this means allowing

Network.objects.filter(code="net1").union(
    Network.objects.filter(code="net2")
).prefetch_related(...)

is likely a good candidate but getting

Network.objects.filter(code="net1").prefetch_related(
    Prefetch(
        "station_set",
        queryset=Station.objects.filter(code="stat1")
    )
).union(
    Network.objects.filter(code="net2").prefetch_related(
        Prefetch(
            "station_set",
            queryset=Station.objects.filter(code="stat25")
        )
    )
)

to automatically get translated to

Network.objects.filter(code="net1").union(
    Network.objects.filter(code="net2")
).prefetch_related(
    Prefetch(
        "station_set",
        queryset=Station.objects.filter(
            Q(network__code="net1", code="stat1")
            | Q(network__code="net2", code="stat25")
        )
    )
)

likely isn’t given the complexity involved. In all cases ticket that requests clarifications on queryset combination (union, intersection, difference) interactions with prefetch_related would certainly be valuable.

Allowing prefetch_related to be called after union (as long as values and values_list are not used) should be acceptable.

There’s already an open ticket proposing support for applying base queryset methods to combined querysets like union(). That likely covers support for calling prefetch_related() after union() as well.

1 Like

In all cases ticket that requests clarifications on queryset combination (union, intersection, difference) interactions with prefetch_related would certainly be valuable.

Opened #36507 (Document that only the base queryset’s prefetch_related() is preserved when using union()) – Django.

1 Like