Using annotate, then union from a QuerySet method

I’m not sure if I’ve found a bug or if I’m doing something wrong.

I have a QuerySet with a method that unions two QuerySets together:

class FooQuerySet(QuerySet):
    def for_user(self, user):
        return self.filter(a__b=user).union(self.filter(c__d=user)

This works fine, and gives me a union as I’d expect. However, if I try to use it with an annotated QuerySet, it doesn’t work:

Foo.objects.annotate(str_pk=Cast("pk", output_field=CharField())).for_user(user)

On inspecting the queryset, it seems like the left hand side of the union has the annotation, but it’s missing on the right hand side, leading to errors.

The query works properly if I don’t use the QuerySet method but is quite verbose and doesn’t allow me to use the more re-usable QuerySet method:

(
    Foo.objects.annotate(str_pk=Cast("pk", output_field=CharField()))
    .filter(a__b=user)
    .union(
        Foo.objects.annotate(str_pk=Cast("pk", output_field=CharField()))
        .filter(c__d=user)
    )
)

Am I imissing something here?

The .union creates a second query, you would need to add the annotation method after the filter in the union clause.
e.g., return self.filter(a__b=user).union(self.filter(c__d=user).annotate(...))
(I know that’s not what you want.)

What about if, instead of a union, you write this as an or between two Q objects?
e.g., Foo.objects.filter( Q(a__b=user) | Q(c__d=user) ).annotate(...).distinct()

Thanks Ken.

Yes, this is possible and it’s (roughly) what I’m doing now. Unfortunately on Postgres or queries involving joins can be quite slow (Avoid OR for better PostgreSQL query performance - CYBERTEC) and the tables involved can get quite large. For now it’s fast enough to use, but not sure for how much longer.

Perhaps I can find some sneaky way to copy any annotations, if they exist, to the second query :thinking:

Note that the issue they are addressing in that article is a case where the OR clause is comparing values from two different tables - that’s not the case here. I don’t think you would have the same issues when it’s two different filters on the same table.

A quick look at the explain on a (limited size) test database shows that PostgreSQL expects the OR version to be significantly faster than the UNION.

But if you really want to be sure, you may want to look at the explain (or even run an explain analyze) to see what the query planner has to say about your situation.

It wouldn’t even need to be sneaky. You could add it as a keyword parameter to the function and add it to the function call if necessary.

Is it not comparing values from two tables here? I am confused. Here a and c are going off to different tables. In the real code, there are actually a few more joins than this.

I did find a sneakier way, from some initial testing, something like this may work:

annotations = self.query.annotations
return self.filter(a__b=user).union(self.filter(c__d=user).annotate(**annotations))

If that’s a good idea or not is perhaps another question.

I see - I guess I made a bad assumption here.

In your class, are a and b foreign keys to other tables, or are they the related named for foreign keys in those other tables? If the latter, then yes - the article is on point. If the former (the assumption I had made), then it’s a different issue.

In either case, you would still want to run an explain analyze on the two different query formations to identify which one will be faster on average for your databases, configurations, and versions of PostgreSQL. (2018 was PostgreSQL 11, and there are always changes and improvements over time.)

That’s cool.

Now that’s an interesting question, because the relations are actually more like this:

....filter(foo__bars__user=user)
....filter(spam_hams_egg_wibbles__user=user)

Where the singulars are FKs and the plurals are going through the related names. I am not sure how that changes anything :thinking:

My local test data isn’t really enough to get a good explain out, but I will try to throw in enough rows that I can and see what happens, that’s indeed probably the best idea.

If the schemas are the same (same indexes, etc) and you have a “reasonable” amount of data, then the explains are likely to give you a good baseline. But yes, the analyze isn’t going to be of much value until you’re at scale.

<conjecture>

As soon as you have a reverse FK relationship in there, then yes, you’re in the category similar to what the article is talking about. If every link in that chain is a forward reference, then it is a reference to a single field - you’re not examining multiple rows to identify a candidate row. It seems to me that the evaluation of multiple rows for a single row in the base table that would cause the combinatorial complexity for the search.

If the filter were egg__chicken__coop__farmer=user, then I can follow this chain to identify the farmer by referencing multiple tables - but a single row in each table. I’m not dealing with a set of rows at any point in this chain. On the other hand, if my filter is coop__chickens__eggs=double_yolk, then I’m having to evaluate multiple rows to identify which coops I got my eggs with two yolks from - a much different situation.

</conjecture>

I got a decent explain out and indeed it’s much better with the union.

However…

This doesn’t seem to work for me:

class FooQuerySet(QuerySet):
    def for_account(self, user):
        return self.filter(
            thing__things__other_thing__more_things__user=user
        ).union(
            self.annotate(pk_str=Cast("pk", output_field=models.CharField())).filter(
                something__some_things__user=user
            )
        )

Leaves me with:

django.db.utils.NotSupportedError: Calling QuerySet.annotate() after union() is not supported.

When I was testing it earlier and had it working I was testing it with Foo.objects.(...).union(Foo.objects.(...)), which does seem to work.

So I’m again stuck :frowning:

I think you would need to change your for_account method to apply the annotation at both points.

e.g.,

class FooQuerySet(QuerySet):
    def for_account(self, user, annotation=dict()):
        return self.annotate(**annotation).filter(
            thing__things__other_thing__more_things__user=user
        ).union(
            self.annotate(**annotation).filter(
                something__some_things__user=user
            )
        )

Unfortunately it’s just the same. I wonder (more) if there is a bug (or missing feature) here.

How are you calling for_account?

Stupidly, it turns out :slight_smile:

It does indeed work, after all, annotation copying and all.

Thanks for all the help.