How to extract usable SQL from a QuerySet?

Hi everyone,

I use django to build a lot of data-first platforms. These platforms typically involve generating a lot of dynamic aggregate reports from the data in a database connected to the django app.

In order to retrieve and aggregate the data from the database, I make use of the ORM. However, writing many of the queries required are complicated. Though they can be written elegantly in SQL, they are awkward (or sometimes impossible) to replicate with the ORM alone, meaning a lot of the processing needs to occur in Python on the web server rather than in the database. However, the ORM is beneficial in that it helps us generate dynamic data filters programmatically. Of course, the ORM also helps to clean parameters from user input and ensure that injection attacks do not occur.

In the context of a reporting application, there’s not usually a whole lot of writes to the database and instead more reads. User input is usually limited to selecting filters to narrow down the scope of a given report. Ideally, I’d like to:

  • Collect the filters from a request input (GET or POST parameters)

  • use the .filter() model method to parse the filters and generate a QuerySet object (not evaluated, just generate a SQL query that includes all fields of a selected table and the relevant filters.

  • take the SQL that would be executed by the QuerySet object if the data were requested and wrap it in a subquery/CTE that handles the actual report logic.

I am aware of the .query attribute of a query set object. However, this SQL query is not executable. For example, if you added a filter like .filter(some_attribute='some_value'), the resulting WHERE clause would look like: WHERE some_attribute = some_value - notice no quotes around some_value, so this is not valid SQL.

Is there any way I can generate usable SQL from a queryset? so I can do something like this:

filtered_queryset = MyModel.filter(a='some_val', b__in=[1, 2, 3], c__lte=20)

reporting_query = f"""
with filtered_table as ({filtered_queryset.usable_query})  -- wrap the filtered base table in a CTE
select
    a
    ,b
    ,c
from
    filtered_table
"""

report_data = MyModel.objects.raw(reporting_table)

Allowing the ORM to serve as a ‘filter factory’ that converts filters to SQL and then using a raw query to do the actual report logic.

I don’t think there is any opportunity for SQL injection here, since the entirety of the user input is handled by the ORM. I just want to use it to generate the ‘base data set’ and then use familiar tools (to me and the people who work with me) to do the rest.

My current workaround (since I am using postgres) is to use the cursor’s mogrify method:

from django.db import connections

# Whatever the key is in your settings.DATABASES for the reporting db
WAREHOUSE_CONNECTION_NAME = 'default'

# Get the Query object and separate it into the query and params
filtered_table_query = MyModel.objects.filter(**filters).query
raw_query, params = filtered_table_query.sql_with_params()

# Create a cursor from the relevant connection
cursor = connections[WAREHOUSE_CONNECTION_NAME].cursor()

# Call .mogrify() on the query/params to get an executable query string
usable_sql = cursor.mogrify(raw_query, params)

cursor.execute(usable_sql)  # This works
cursor.fetchall()  # This works

# Have not tried this yet
MyModel.objects.raw(usable_sql)

# Or this
wrapper_query = f"""
with base_table as ({usable_sql})
select
    *
from
    base_table
"""

cursor.execute(wrapper_query)
# or
MyModel.objects.raw(wrapper_query)

I’ve also heard of people creating an unmanaged model on top of a SQL view that is pre-calculated with the complex query, but I’m not sure how I could e.g. filter that already-aggregated view if the fields we want to filter on are not explicitly included, which might not make sense.

So

  1. Why does Django not allow you to export a usable query from a QuerySet?
    and
  2. What other methods besides what I have presented are preferred when the ORM is insufficient for a given query?

I’m going to go waaayyy out on a limb here and guess that it’s because it doesn’t actually generate the SQL that gets issued to the database until it’s actually going to issue the request to the database.
There are differences that may exist between database engines, causing the individual engines to “compile” the queryset into SQL differently depending upon the backend database.
Since a queryset can be created before the using clause is applied, I believe it’s likely that the queryset is stored internally in an abstract form, and is not converted until Django knows which database is going to be the target of the query.

There is, in fact, a function named as_sql which looks like it may perform what you need it to do, if you provide the compiler and connection parameters it requires.

I did some research earlier and that’s actually true - django never creates a useable SQL query (well, not reliably). Instead it passes the parameterized query and the parameter list to the database engine, which in turn injects the parameters as it executes the query. The as_sql method (if I recall correctly) also does not inject the parameters, it only generates the parameterized query and parameter list.

I’m guessing then that the mogrify method I mentioned would be the “right” way to achieve this. I may see if other engines besides psycopg2 have the feature and if so, will submit a PR.