Hi everyone,
I use django to build a lot of data-first platforms. These platforms typically involve generating a lot of dynamic aggregate reports from the data in a database connected to the django app.
In order to retrieve and aggregate the data from the database, I make use of the ORM. However, writing many of the queries required are complicated. Though they can be written elegantly in SQL, they are awkward (or sometimes impossible) to replicate with the ORM alone, meaning a lot of the processing needs to occur in Python on the web server rather than in the database. However, the ORM is beneficial in that it helps us generate dynamic data filters programmatically. Of course, the ORM also helps to clean parameters from user input and ensure that injection attacks do not occur.
In the context of a reporting application, there’s not usually a whole lot of writes to the database and instead more reads. User input is usually limited to selecting filters to narrow down the scope of a given report. Ideally, I’d like to:
-
Collect the filters from a request input (GET or POST parameters)
-
use the
.filter()
model method to parse the filters and generate a QuerySet object (not evaluated, just generate a SQL query that includes all fields of a selected table and the relevant filters. -
take the SQL that would be executed by the QuerySet object if the data were requested and wrap it in a subquery/CTE that handles the actual report logic.
I am aware of the .query
attribute of a query set object. However, this SQL query is not executable. For example, if you added a filter like .filter(some_attribute='some_value')
, the resulting WHERE
clause would look like: WHERE some_attribute = some_value
- notice no quotes around some_value
, so this is not valid SQL.
Is there any way I can generate usable SQL from a queryset? so I can do something like this:
filtered_queryset = MyModel.filter(a='some_val', b__in=[1, 2, 3], c__lte=20)
reporting_query = f"""
with filtered_table as ({filtered_queryset.usable_query}) -- wrap the filtered base table in a CTE
select
a
,b
,c
from
filtered_table
"""
report_data = MyModel.objects.raw(reporting_table)
Allowing the ORM to serve as a ‘filter factory’ that converts filters to SQL and then using a raw query to do the actual report logic.
I don’t think there is any opportunity for SQL injection here, since the entirety of the user input is handled by the ORM. I just want to use it to generate the ‘base data set’ and then use familiar tools (to me and the people who work with me) to do the rest.
My current workaround (since I am using postgres) is to use the cursor’s mogrify
method:
from django.db import connections
# Whatever the key is in your settings.DATABASES for the reporting db
WAREHOUSE_CONNECTION_NAME = 'default'
# Get the Query object and separate it into the query and params
filtered_table_query = MyModel.objects.filter(**filters).query
raw_query, params = filtered_table_query.sql_with_params()
# Create a cursor from the relevant connection
cursor = connections[WAREHOUSE_CONNECTION_NAME].cursor()
# Call .mogrify() on the query/params to get an executable query string
usable_sql = cursor.mogrify(raw_query, params)
cursor.execute(usable_sql) # This works
cursor.fetchall() # This works
# Have not tried this yet
MyModel.objects.raw(usable_sql)
# Or this
wrapper_query = f"""
with base_table as ({usable_sql})
select
*
from
base_table
"""
cursor.execute(wrapper_query)
# or
MyModel.objects.raw(wrapper_query)
I’ve also heard of people creating an unmanaged model on top of a SQL view that is pre-calculated with the complex query, but I’m not sure how I could e.g. filter that already-aggregated view if the fields we want to filter on are not explicitly included, which might not make sense.
So
- Why does Django not allow you to export a usable query from a
QuerySet
?
and - What other methods besides what I have presented are preferred when the ORM is insufficient for a given query?