How to get mutiple columns from a subquery?

These are my models:

class Speed(models.Model):

    id = models.UUIDField(primary_key=True, default=uuid4)
    name = models.CharField(_('name'), max_length=128)
    is_public = models.BooleanField(default=True)

    user = models.ForeignKey(settings.AUTH_USER_MODEL, null=True, on_delete=models.CASCADE)

    feedback = models.ManyToManyField(settings.AUTH_USER_MODEL, through='SpeedFeedback', related_name='+')


class SpeedFeedback(models.Model):

    class Meta:
        constraints = [
            models.UniqueConstraint(fields=('user', 'speed'), name="fb_unique_user_speed")
            ]

    # aka direction
    vote = models.IntegerField(_('vote'), choices=Vote.choices, default=Vote.DEFAULT_STATE)

    user = models.ForeignKey(settings.AUTH_USER_MODEL, on_delete=models.CASCADE)
    speed = models.ForeignKey(Speed, on_delete=models.CASCADE)

it’s an upvote/downvote system where each User may vote for some Speed object and each Speed object can have as many votes as there are users.

What I want to achieve is a query that returns all Speed objects and each Speed object should have additional fields from SpeedFeedback- vote and id, but only from an authenticated user (or each Speed object should have additional SpeedFeedback object as a field so I can get id and vote)

Here is the only way I was able to achieve something like that:

Speed.objects.filter(
    Q(is_public=True) | Q(user=self.request.user)
).prefetch_related(
    Prefetch('speedfeedback_set', queryset=SpeedFeedback.objects.filter(
            user=self.request.user
        ), to_attr='speed_feedback'
    )
)  

and then by using serializer.MethodField get these fields:

def get_speed_feedback(self, obj):
    if len(obj.speed_feedback) == 1:
        feedback = obj.speed_feedback[-1]
        return {
            'id': feedback.id, 
            'vote': feedback.vote
            }
    return None 

previously I tried to do the same with a subquery however I was able to retrieve only one column vote:

Speed.objects.filter(
     Q(is_public=True) | Q(user=user)
).annotate(
    speed_feedback=Subquery(
            SpeedFeedback.objects.filter(
               speed=OuterRef('pk'), user=user
            ).values('vote')
        )
)  

I feel like there is some simple way to achieve this with a subquery, especially that plural values method bothers me, however if I add values('vote', 'id') it wants an output_field and it seems like all of them are used for a single field.

In the documentation it says

There are times when a single column must be returned from a Subquery,

as if it returns multiple columns by default, maybe I’m reading it wrong.

I have also found possibly some solution here however it requires Postgres/Psycopg being set up, and I would rather have something that works with SQLite for now, it also seems too complex for my case.

So is there some better way or is my query that uses prefetch_related good enough?

Doing this with a subquery in pure sql would require a join between the Speed table and the subquery.
If using pure sql, you could even directly join between the Speed table and the speedFeedback table and still obtain a single result row for each Speed because used filters in combination with the defined unique constraint allow that.

But, you can’t do such “explicit join” with the “Django query language”. So, the prefetch_related approach is a good choice.

If you still want to do one single query, you could use a raw query to add the additional annotated attributes to returned Speed objects (see Performing raw SQL queries | Django documentation | Django). E.g.:

Speed.objects.raw("SELECT a.*, b.id as spf_id, b.vote as spf_vote FROM your_app_speed a LEFT OUTER JOIN your_app_speedfeedback b ON b.speed_id=a.id AND b.user_id=%(user_id)s WHERE a.is_public=1 OR a.user_id=%(user_id)s ", {"user_id": request.user.pk})

Please, note that such raw query may require some adaptation depending on the database backend (in particular on the a.is_public boolean filter). I didn’t directly tested it and it may also require the use of the translation parameter of raw method (see Performing raw SQL queries | Django documentation | Django) to map the a.* returned columns to Speed attributes (this may be already done by Django as it could not affect attributes with a . in their name on Speed objects).

The raw solution is less “flexible” (because dependent on database engine) and is also not guaranted to give significant performance improvements, so if you are not proficient with raw sql or if you want to be able to work with different database engines, you should consider the prefetch_related solution as good enough.

1 Like

Thank you very much for the response, I will stick to the prefetech_related option then and move on with the project.

Side note: If you’re using PostgreSQL, see the ArraySubquery docs.

Is there some meaningful performance gain with a subquery as I might consider this method?

It depends, mostly upon the size of the tables involved.

I’d guess almost certainly not for tables less than about 1000 rows, and almost definitely yes for tables > 1,000,000 rows.

But where it becomes “meaningful” also depends upon your defintion of “meaningful”.

Benchmarking with actual data is the only way to be sure.

Then I will try this method since it doesn’t cost me anything and I’ll use Postgres anyway, thanks!

I’ve just tried it out, it works like a charm. Thanks again!