Database instrumentation at a higher layer

I was wondering if anyone had any ideas on Instrumenting the ORM at the point where the Queryset is converted to an SQL statement to be issued to the database.
(Tracking, timing, logging, categorizing)

I write a lot of management commands to perform data transformations and migrations. I’m looking for a way where I can gather information about my processes over the lifespan of the command.

Ideally, I’d like to gather information at the point in time where the Queryset is converted to SQL to be issued, so that I still have the original Queryset available to me to aid in the categorization process. (I really don’t want to parse SQL)

I have read both Database Instrumentation and How to Make Always-Installed Django Database Instrumentation (Thanks @adamchainz!) and I think I understand what they’re doing - but they’re doing it at a lower layer than what I’m looking for.
(I’d also prefer to not have to go into every command and make changes to every command file. I’d really like to find / do something at the system level.)

I’m thinking I might be able to write a custom subclass for django.db.models.sql.compiler.SQLCompiler with my desired hooks and create a custom engine subclassed from the standard PostgreSQL engine to use it, but I’m just starting to read through the code and I don’t quite understand how the pieces all fit together yet.

I’ve checked djangopackages.org and pypi.org and haven’t found any candidates.

Anyone have any thoughts on this?

1 Like

It kind of sounds like you want to set up application tracing.

Paid services like datadog have libraries that you can pip install as well as handling the daemon processes to push the data to datadog. By default if you hook it up to your Django application, it will “just work” and instrument your middleware, sql, cache, and maybe some other things I’m forgetting.

The “free” tracing will only be so good and ultimately you’ll have to go in and find the best points in your code to wrap for tracing.

There’s also open source alternatives such as Open Tracing and Open Telemetry

I could be way off with this suggestion.

Hope it helps!

Thanks for the suggestions, but no, that’s not quite the direction I’m going. APM tools don’t provide the specificity that I’m trying to obtain.

The typical monitoring / profiling tools will show me where the time is spent in different functions, which is fine as far as it goes. But we do a lot with dynamic queryset construction and selection, so what I’m looking to do to is count and categorize the querysets being used. That’s why I need to find (or create) a hook that lets me analyze querysets at the point at which they’re used to generate a real SQL statement.

Yes a custom compiler class sounds like the kind of thing you would want. Definitely treading into undocumented/murky waters.

What I can remember is: the SQLCompiler object receives the Query, which is the internal state of a QuerySet. From this you can read some details like the model, and many other attributes (source). The names are a bit confusing because they bridge between the ORM and SQL, so it might be a little hard to categorize querysets as you want.

Good luck and let us know how you get on!

1 Like

Thanks for the affirmation.

The Query object is exactly what I’m looking for - I’ve already futzed enough with them to know it has enough of what I want for this purpose. My sticking point has been finding the right place (and time) to grab it.

I’m going to dive into this rabbit hole to see how far it goes - and I will report back once I get some results (either way).

Thanks again!

So it turns out to be easier than I thought. I’ve “cargo-culted” a bunch of stuff, but this fits into the category of “it works for me”, so I’m not terribly inclined to worry about it too much.

At the risk of boring you to tears…

  • Created a new app named “apm” and added it to the settings.
  • Changed the database engine to apm.backends
  • Created directory apm/backends
  • Created the following files in apm/backends

base.py

from django.db.backends.postgresql import base
from apm.backends.operations import DatabaseOperations

class DatabaseWrapper(base.DatabaseWrapper):
    ops_class = DatabaseOperations

compiler.py
The execute_sql method is where the magic happens. For this demo, I’m just printing some information. In reality, this will be writing to a redis queue for out-of-band processing and statistics gathering.
The addition of the other compiler classes is me whacking this with a brick to make it all work. No idea whether or not there’s a cleaner way to do this, but quite honestly, very little concern about it.

from django.db.models.sql.compiler import SQLCompiler as SQLC
from django.db.models.sql.compiler import SQLInsertCompiler as SQLIC
from django.db.models.sql.compiler import SQLDeleteCompiler as SQLDC
from django.db.models.sql.compiler import SQLUpdateCompiler as SQLUC
from django.db.models.sql.compiler import SQLAggregateCompiler as SQLAC
from django.db.models.sql.constants import (
    GET_ITERATOR_CHUNK_SIZE, MULTI,
)

class SQLCompiler(SQLC):
    pass

    def execute_sql(self, result_type=MULTI, chunked_fetch=False, chunk_size=GET_ITERATOR_CHUNK_SIZE):
        retval = super().execute_sql(result_type=result_type, chunked_fetch=chunked_fetch, chunk_size=chunk_size)
        print("===== Executing query =====")
        print(self.query.base_table)
        print(self.query.table_map)
        print(self.query)
        return retval

class SQLInsertCompiler(SQLIC):
    pass

class SQLDeleteCompiler(SQLDC):
    pass

class SQLUpdateCompiler(SQLUC):
    pass

class SQLAggregateCompiler(SQLAC):
    pass

operations.py

from django.db.backends.postgresql.operations import DatabaseOperations as DO

class DatabaseOperations(DO):
    compiler_module = "apm.backends.compiler"

Ken

2 Likes

Oh great. This post is probably the best documentation on the topic of creating a custom DB backend now :smile:

You can slightly simplify this since you don’t need your own subclasses… The only thing to do is silence flake8’s warning about unused imports with # noqa: F401:

from django.db.models.sql.compiler import SQLCompiler as SQLC
from django.db.models.sql.compiler import (  # noqa: F401
    SQLInsertCompiler,
    SQLDeleteCompiler,
    SQLUpdateCompiler,
    SQLAggregateCompiler,
)
from django.db.models.sql.constants import (
    GET_ITERATOR_CHUNK_SIZE, MULTI,
)

class SQLCompiler(SQLC):
    def execute_sql(self, result_type=MULTI, chunked_fetch=False, chunk_size=GET_ITERATOR_CHUNK_SIZE):
        retval = super().execute_sql(result_type=result_type, chunked_fetch=chunked_fetch, chunk_size=chunk_size)
        print("===== Executing query =====")
        print(self.query.base_table)
        print(self.query.table_map)
        print(self.query)
        return retval
2 Likes