Converting Postgres SQL to Django ORM query

saichander17 · October 27, 2020, 3:35pm

I want to convert the following query to Django

select file.id, array_agg(fileregion.defect_ids) from  file 
left outer join (select file_id, jsonb_object_keys(defects) as defect_ids
from ai_fileregion ) as fileregion
on fileregion.file_id=file.id group by file.id

Following are my Django models

class File(Base):
    class Meta:
        db_table = "file"


class FileRegion(Base):
    file = models.ForeignKey(File, on_delete=models.PROTECT, related_name='file_regions')
    # defects is of the form {'<defectid1>': {}, <defectid2>: {}}
    defects = JSONField(default=dict)
    class Meta:
        db_table = "ai_fileregion"

Aim is to basically get a list of records in which one field is the file_id and another field has an array of aggregated keys from defects json field for that file_id.

Note: Each file may have more than one entry in the ai_fileregion table.

Note: There will be more chaining added to this queryset. So, getting results in a different format to server and modifying it at the application level is not an option.

CodenameTim · October 27, 2020, 3:50pm

There is a Django expression for array_agg. You should be able to create your own expression for jsonb_object_keys following the docs here or you might be able to use a Func.

Once you have those created, you might be able to use them in annotate() calls like other aggregates. I’m not sure if it’ll all work as I haven’t done something like this and I’m not sure how ArrayAgg works in conjunction with a Subquery instance. That said, I would expect it to look like the following:

from django.db.models import Func, OuterRef, Subquery
from django.db.postgres.aggregates import ArrayAgg

defect_ids = FileRegion.objects.filter(
    file=OuterRef('id')
).annotate(
    defect_ids=Func(F('defects'), function='jsonb_object_keys')
).values('defect_ids')

files = File.objects.annotate(defect_ids=Subquery(defect_ids[:1])).annotate(
    defect_array=ArrayAgg('defect_ids')
)

Another possibility would be:

file_regions = FileRegion.objects.select_related('file').annotate(
    defect_ids=Func(F('defects'), function='jsonb_object_keys')
).annotate(
    defect_array=ArrayAgg('defect_ids')
)

The select_related('file') would pull the file so you could access it as file_regions.first().file

saichander17 · October 27, 2020, 4:46pm

@CodenameTim, Thank you so much for giving a detailed answer. I still have a few queries in the approach you suggested.

I know for sure that the second approach doesn’t work because postgres doesn’t support set returning function as argument to aggregate functions. It will throw an error saying aggregate function calls cannot contain set-returning function calls

As for the first approach, The problem is that each file can have multiple file regions. Also, when we use jsonb_object_keys it will split all keys as different records. So, when we use defect_ids[:1] in the subquery, will it not loose the information about the other records from defect_ids?

CodenameTim · October 27, 2020, 5:47pm

There’s definitely a chance of that, I can’t say for certain it will work. You can inspect the query with print(qs.query) to verify it’s generating a query that will work for you.

saichander17 · October 31, 2020, 2:08am

@CodenameTim, Yes, I verified it and it’s happening as I said earlier. Any other suggestions on how I could approach this?

Topic		Replies	Views
Converting jsonb_each postgresql query to Django ORM Using Django	0	783	April 8, 2021
Django - MongoDB - Group by query on JSON field ORM	0	1283	June 10, 2022
Trying to map a Postgres SQL Query to an ORM queryset Using the ORM	2	152	May 25, 2024
Using jsonb_array_elements with querysets Using Django	0	1047	June 1, 2021
Advanced PostgreSQL functions like json_to_recordset(json) / jsonb_to_recordset(jsonb) ORM	5	168	December 4, 2024

Converting Postgres SQL to Django ORM query

Related topics