I was wondering if anyone had any ideas on Instrumenting the ORM at the point where the Queryset is converted to an SQL statement to be issued to the database.
(Tracking, timing, logging, categorizing)
I write a lot of management commands to perform data transformations and migrations. I’m looking for a way where I can gather information about my processes over the lifespan of the command.
Ideally, I’d like to gather information at the point in time where the Queryset is converted to SQL to be issued, so that I still have the original Queryset available to me to aid in the categorization process. (I really don’t want to parse SQL)
I’m thinking I might be able to write a custom subclass for django.db.models.sql.compiler.SQLCompiler with my desired hooks and create a custom engine subclassed from the standard PostgreSQL engine to use it, but I’m just starting to read through the code and I don’t quite understand how the pieces all fit together yet.
It kind of sounds like you want to set up application tracing.
Paid services like datadog have libraries that you can pip install as well as handling the daemon processes to push the data to datadog. By default if you hook it up to your Django application, it will “just work” and instrument your middleware, sql, cache, and maybe some other things I’m forgetting.
The “free” tracing will only be so good and ultimately you’ll have to go in and find the best points in your code to wrap for tracing.
Thanks for the suggestions, but no, that’s not quite the direction I’m going. APM tools don’t provide the specificity that I’m trying to obtain.
The typical monitoring / profiling tools will show me where the time is spent in different functions, which is fine as far as it goes. But we do a lot with dynamic queryset construction and selection, so what I’m looking to do to is count and categorize the querysets being used. That’s why I need to find (or create) a hook that lets me analyze querysets at the point at which they’re used to generate a real SQL statement.
Yes a custom compiler class sounds like the kind of thing you would want. Definitely treading into undocumented/murky waters.
What I can remember is: the SQLCompiler object receives the Query, which is the internal state of a QuerySet. From this you can read some details like the model, and many other attributes (source). The names are a bit confusing because they bridge between the ORM and SQL, so it might be a little hard to categorize querysets as you want.
The Query object is exactly what I’m looking for - I’ve already futzed enough with them to know it has enough of what I want for this purpose. My sticking point has been finding the right place (and time) to grab it.
I’m going to dive into this rabbit hole to see how far it goes - and I will report back once I get some results (either way).
So it turns out to be easier than I thought. I’ve “cargo-culted” a bunch of stuff, but this fits into the category of “it works for me”, so I’m not terribly inclined to worry about it too much.
At the risk of boring you to tears…
Created a new app named “apm” and added it to the settings.
Changed the database engine to apm.backends
Created directory apm/backends
Created the following files in apm/backends
from django.db.backends.postgresql import base
from apm.backends.operations import DatabaseOperations
ops_class = DatabaseOperations
The execute_sql method is where the magic happens. For this demo, I’m just printing some information. In reality, this will be writing to a redis queue for out-of-band processing and statistics gathering.
The addition of the other compiler classes is me whacking this with a brick to make it all work. No idea whether or not there’s a cleaner way to do this, but quite honestly, very little concern about it.
from django.db.models.sql.compiler import SQLCompiler as SQLC
from django.db.models.sql.compiler import SQLInsertCompiler as SQLIC
from django.db.models.sql.compiler import SQLDeleteCompiler as SQLDC
from django.db.models.sql.compiler import SQLUpdateCompiler as SQLUC
from django.db.models.sql.compiler import SQLAggregateCompiler as SQLAC
from django.db.models.sql.constants import (
def execute_sql(self, result_type=MULTI, chunked_fetch=False, chunk_size=GET_ITERATOR_CHUNK_SIZE):
retval = super().execute_sql(result_type=result_type, chunked_fetch=chunked_fetch, chunk_size=chunk_size)
print("===== Executing query =====")
from django.db.backends.postgresql.operations import DatabaseOperations as DO
compiler_module = "apm.backends.compiler"