How to prefetch a queryset that uses distinct?

Hello there! I have the following data structure:

class ProductType(Model):
    name = ...

class Feature(Model):
    product_type = ManyToManyField(ProductType, related_name="features")

class Tag(Model):
    feature = ForeignKey(Feature, related_name="tags")

class Product(Model):
    tags = ManyToManyField(Tag, related_name="products")
    product_type = ForeignKey(ProductType, related_name="products")

So a product has a product type and some tags. A tag belongs to a feature, and a feature can be on many product types. It’s not ideal but that’s what I must work with.

>>> simple_tags_qs = Tag.objects.all()
>>> simple_tags_qs.count()
254
>>> simple_product_types_qs = ProductType.objects.all()
>>> simple_product_types_qs.first().features.first().tags.count()
11

If I want a queryset of Tags with distinct ProductType, filtered by if there are products for that Tag / ProductType, I can do the following:

>>> has_products = (
    Product.objects.filter(
        tags=OuterRef("pk"),
        product_type=OuterRef("feature__product_type"),
   )
)
>>> distinct_product_tags_qs = (
    Tag.objects
    .order_by()
    .distinct("pk", "feature__product_type")
    .filter(Exists(has_products))
)
>>> distinct_product_tags_qs.count()
1005

But now I want a queryset of ProductTypes with prefetched distinct Tags keeping the above filter.

>>> product_types_with_disctinct_tags_qs = (
    ProductType.objects
    .prefetch_related(
        Prefetch(
            "features__tags",
            queryset=distinct_product_tags_qs,
        )
    )
)

However, doing this, I get duplicates of each Tag for each Feature for each ProductType.

>>> product_types_with_disctinct_tags_qs.first().features.first().tags.count()
24  // should be at most initial tag count, 11, or less if there are no products

So I guess my question is: Using Prefetch, how can I join the filtered Tag queryset with the ProductType queryset so that each distinct Tag lands in the correct ProductType tags?

It is not recommended to use related names as duplicate values.
There are systemic problems, and more than anything, it’s confusing.

class Product(Model):
    tags = ManyToManyField(Tag, related_name="pt")
    product_type = ForeignKey(ProductType, related_name="ppt")

filtering m2m.

a = Project.objects.get({some value})
b = a.tags.filter({some value}).count()

Hi, this a good point, thank you.

But does it impact the prefetching at all?

i think no.
maybe filtering prefetched queryset.

I am already filtering the prefetched queryset in distinct_product_tags_qs.

Or do you mean somehow filtering that prefetched queryset for “parent” feature__product_type? How can I do that? That would be ideal but I can’t use an OuterRef here:

>>> distinct_product_tags_qs = (
    Tag.objects
    .order_by()
    .distinct("pk", "feature__product_type")
    .filter(Exists(has_products))
    .filter(OuterRef("feature__product_type"))
)
>>> product_types_with_disctinct_tags_qs = (
    ProductType.objects
    .prefetch_related(
        Prefetch(
            "features__tags",
            queryset=distinct_product_tags_qs,
        )
    )
)
>>> product_types_with_disctinct_tags_qs.first().features.first().tags.count()
"ValueError: This queryset contains a reference to an outer query and may only be used in a subquery."

If prefetching is done, obj.manytomanyfield.all() is expected to retrieve the prefetching data.
But since I haven’t tried it, you’ll have to check it out yourself.

And you forgot Subquery.

distinct_product_tags_qs = (
    Tag.objects
    .order_by()
    .distinct("pk", "feature__product_type")
    .filter(Exists(has_products))
    .filter(OuterRef("feature__product_type"))
)
product_types_with_disctinct_tags_qs = (
    ProductType.objects
    .prefetch_related(
        Prefetch(
            "features__tags",
          **queryset=Subquery(distinct_product_tags_qs),**
        )
    )
)

With Subquery and without the omission of the outer reference filter parameter I’d introduced mistakenly:

>>> distinct_product_tags_qs = (
    Tag.objects
    .order_by()
    .distinct("pk", "feature__product_type")
    .filter(Exists(has_products))
    .filter(feature__product_type=OuterRef("feature__product_type"))
)
>>> product_types_with_disctinct_tags_qs = (
    ProductType.objects
    .prefetch_related(
        Prefetch(
            "features__tags",
          queryset=Subquery(distinct_product_tags_qs),
        )
    )
)
"AttributeError: 'Subquery' object has no attribute '_add_hints'"

This case is challenging my understanding of the ORM and its tools. I understand now that Prefetch does not join the tables in the database through a single query, but rather the ORM itself does it, querying the DB as many times as there are tables implicated. So in a simplified manner, it means that I have two (?) tables which I need the ORM to understand how I want it to join them. Or should I just compose my own SQL query for this, which I have managed to never have to do so far?

I’m using a translator, so it’s hard to understand what you’re saying.

if you want my help, could you explain them to me in an easy and detailed way??

I would like to use a queryset of ProductTypes with prefetched tags. But for each product type in that queryset, there should be distinct tags, filtered by if there exist products for that tag in that product type, and not overall.

You can use annotations and subqueries to calculate the distinct tags count for each product type. Then, prefetch the distinct tags for each feature of the product type, filtered by the specific product type and tag. Finally, you can access the distinct tags count and the distinct tags for a specific product type. Not tested but something like this could work?

from django.db.models import Count, Exists, OuterRef, Subquery, Prefetch

# Annotate the distinct tags count for each product type
distinct_tags_count_subquery = (
    Tag.objects
    .filter(
        feature__product_type=OuterRef("pk"),
        product__product_type=OuterRef("pk")
    )
    .values("feature__product_type")
    .annotate(distinct_tags_count=Count("pk", distinct=True))
    .values("distinct_tags_count")
    .order_by()
    .values("distinct_tags_count")
    .distinct()
)

# Retrieve the ProductTypes queryset with prefetched distinct Tags count
product_types_with_distinct_tags_qs = (
    ProductType.objects
    .annotate(distinct_tags_count=Subquery(distinct_tags_count_subquery))
    .prefetch_related(
        Prefetch(
            "features__tags",
            queryset=Tag.objects.filter(
                feature__product_type=OuterRef("pk"),
                product__product_type=OuterRef("pk")
            ).distinct(),
            to_attr="distinct_tags"
        )
    )
)

# Access the distinct Tags count for a specific ProductType
product_type = product_types_with_distinct_tags_qs.first()
distinct_tags_count = product_type.distinct_tags_count
distinct_tags = product_type.features.first().distinct_tags

Hi, actually I’m not interested in the count of distinct tags filtered by existing products in each product type (the first part of your solution), but in having those tags effectively prefetched:

So ideally I’d have in the end:

>>> for product_type in product_types_with_disctinct_tags_qs:
>>>    for feature in product_type.features.all():
>>>        for tag in feature.distinct_tags.all():
>>>            assert tag.products.filter(product_type=product_type).exists()
               # the tag / product_type without products would have been filtered out in prefetch

Same thing with the second part of your solution as above, I can’t use OuterRef outside a Subquery and I can’t use Subquery as a queryset in Prefetch.

So, Instead of trying to prefetch the distinct tags directly, you can create a through model for the ManyToMany relationship between ProductType and Tag and then prefetch that:

class ProductTypeTag(models.Model):
    product_type = models.ForeignKey(ProductType, on_delete=models.CASCADE)
    tag = models.ForeignKey(Tag, on_delete=models.CASCADE)

    class Meta:
        unique_together = ('product_type', 'tag')

class ProductType(Model):
    name = ...
    tags = models.ManyToManyField(Tag, through=ProductTypeTag, related_name='product_types')

class Feature(Model):
    product_type = models.ManyToManyField(ProductType, related_name="features")

class Tag(Model):
    feature = models.ForeignKey(Feature, on_delete=models.CASCADE, related_name="tags")

class Product(Model):
    tags = models.ManyToManyField(Tag, related_name="products")
    product_type = models.ForeignKey(ProductType, on_delete=models.CASCADE, related_name="products")

Now, you can create a queryset of ProductTypeTag with distinct ProductType and Tag , filtered by if there are products for that Tag / ProductType :

has_products = (
    Product.objects.filter(
        tags=OuterRef("tag"),
        product_type=OuterRef("product_type"),
    )
)
distinct_product_tags_qs = (
    ProductTypeTag.objects
    .order_by()
    .distinct("tag", "product_type")
    .filter(Exists(has_products))
)

You can prefetch the ProductTypeTag queryset when querying ProductType :

product_types_with_disctinct_tags_qs = (
    ProductType.objects
    .prefetch_related(
        Prefetch(
            "producttypetag_set",
            queryset=distinct_product_tags_qs,
        )
    )
)

Now, you can access the distinct tags for each product type like this:

for product_type in product_types_with_disctinct_tags_qs:
    tags = [ptt.tag for ptt in product_type.producttypetag_set.all()]

With this explanation, I can’t tell you exactly what you want.

I see, while this technically works, I’ve got two remarks:

First, I do need to have the tags prefetched and filtered and through the product types’ features relations. The goal is to have a tree-like structure in the end.

ProductType_1 |--> Feature_1 |--> Tag_1
              |              |--> Tag_2
              |
              |--> Feature_2 |--> Tag_3
                             |--> Tag_5

ProductType_2 |--> Feature_2 |--> Tag_4
                             |--> Tag_5
...

I’m using graphene and the goal is to populate a sort of dynamic menu using a single query.

Second, it looks like it introduces a sort of database normalization violation, since now a tag’s product types can be either tag.feature.product_types.all() or tag.product_types.all() (and reciprocally).

Ultimately, I think the data structure I have to work with is not well thought for that purpose.

I am not familiar with graphene and to me at least is still unclear what exactly you want to achieve here…

Thank you for your patience. I’m using graphene and I should also register and post on their forum. Whatever solution graphene might come up with, it should be doable with the ORM, as I assume this must be recurrent use case.

Anyway, let me illustrate:
Product types are car, airplane, motorcycle, bicycle, train, ship …
Features are engine, wings, wheels, steering wheel, radar, chassis, body, …
Tags are four cylinder engine, rotary engine, single cylinder engine, V engine, wheels and tires, split wheels, hard rubber wheels, railway wheels, …
Products are ALCO RS-1 locomotive, EMD SD70 locomotive, Jeep Cherokee, Toyota RAV4, …

Tags like rotary engine, single cylinder engine, V engine belong to the feature engine.
Features like engine belong to the product types car, airplane, motorcycle, train, ship.
Products like Toyota RAV4 have a product type, here car, and tags, here four cylinder engine, wheels and tires, …

My super big company sells all sorts of those cars, airplanes, bicycles, all with different but expected features (planes have wings and engines, etc., cars have steering wheels and engines, etc.), and within these features a selection of tags.

I’m on my company catalog page, where I see all the products. I have a tree-like filter menu on the side. The first level of that filter is product type (am I looking for a car, a ship?), the second level, in each product type, is feature (not all product types have the same features, like no wings on a car), the third level, in each feature, is tag (what kind of wheels am I looking for on a car?).

If I wanted to populate that tree-like filter menu with one queryset, with prefetched features and tags, I would do this:
ProductType.objects.prefetch_related("features", "features__tags").all()
And I would have for each product type the features that belong to it, and for each feature, the tags that belong to it.

Now the problem here is that while wheels is a feature that exists on cars and planes, for example, my company understandably doesn’t sell any cars with railway wheels. So I want to remove that tag from the filter menu in the feature wheels of the product type car, but not in the feature wheels of the product type train. I still want one queryset and prefetched features and tags. This is what I am asking.

Paste your models and queries again to continue from here.

Ok this is a reminder of the abridged version of my models, it has (I think) all the necessary elements:

class ProductType(Model):
    name = CharField(max_length=255)

class Feature(Model):
    name = CharField(max_length=255)
    product_type = ManyToManyField(ProductType, related_name="features")

class Tag(Model):
    name = CharField(max_length=255)
    feature = ForeignKey(Feature, related_name="tags")

class Product(Model):
    name = CharField(max_length=255)
    tags = ManyToManyField(Tag, related_name="products")
    product_type = ForeignKey(ProductType, related_name="products")

Again, if I do this:
ProductType.objects.prefetch_related("features", "features__tags").all()
I would have for each product type the features that belong to it, and for each feature, all the tags that belong to it.

However I want, for each feature of each product type independently, to filter the tags according to the following condition: are there any products with that tag that has the same product type than the tag.
To illustrate, I’m starting with a queryset of product types:
The first product type in my queryset being car, in the car feature called wheels I want to see the car wheel feature tags but only if there are products of type “car” with those tags. I want to have “wheels and tires” (313 products of type car and that tag) but not “railway wheels” (0 products with type car and that tag).
The second one being train, in the train feature called wheels I want to see the train wheel feature tags but only if there are products of type “train” with those tags. I want to have “railway wheels” (31 products of type train and that tag) but not “wheels and tires” (0 products with type train and that tag).
Etc.

If I understand correctly, you want something like this:

from django.db.models import OuterRef, Subquery, Exists

# Subquery to filter tags that have products with the same product type
tags_with_same_product_type = Tag.objects.filter(
    feature=OuterRef('feature'),
    products__product_type=OuterRef('product_type')
).distinct()

# Annotate the Feature model with the filtered tags
features_with_filtered_tags = Feature.objects.annotate(
    filtered_tags=Subquery(tags_with_same_product_type.values('id'))
)

# Prefetch the features with the filtered tags for each product type
product_types_with_features_and_filtered_tags = ProductType.objects.prefetch_related(
    Prefetch('features', queryset=features_with_filtered_tags)
).all()

Yes, this looks like what I’m looking for, however:

Here the features are annotated with the outcome of a subquery which does not result in an aggregation, hence the following error:
django.db.utils.ProgrammingError: more than one row returned by a subquery used as an expression

My understanding is that an annotation must be a value and not an expression. To be noted, I’m using Django 4.2, maybe it is possible in Django 5?