How to improve hard to read table name aliases like "U0", "W1", etc.?

Hi folks,

I’m working on optimizing some non-trivial queries generated by Django ORM and noticed that some aliases are rather hard to understand for humans. I suspect that this happens when subqueries are involved. The names look like “V0”, “U0”, “W1”, etc.

When reading the generated queries or especially when passing them to EXPLAIN and reading that output, it’s hard to know what table “W1” refers to.

I understand that these aliases are probably used as a straightforward way to having unambiguous table/column references but is there a rationale for them being so terse?

I was thinking of creating a patch or “extension” that expands these “original_table_name_V0” or something like that. Would something like that be accepted into the ORM? Is this easily doable at all? If yes, where should I look first (I’ve never really looked into the ORM internals before, any pointers would be appreciated.)

I’m using Django 4.2 with PostgreSQL.

IMO short table aliases are preferable, even one letter. When writing SQL I would use e.g auth_user AS U, auth_group AS G. I’m not convinced that using the full table name in the alias would be worth it.

It would be a minor QoL improvement if Django used clearer aliases, perhaps one to several letters long. But handling conflicts in such a scheme could be quite complicated, especially when the SQL is read so infrequently.

Regardless if anything gets merged, it would be a good way to dive into the ORM. The SQLCompiler class is what turns a query into SQL and parameters. I think the aliasing is in there…

Hello @salomvary, independently of whether this is a good idea or not I suggest you have a look django.db.models.sql.Query.bump_prefix if you want to play around with this idea. This method is called everytime a sql.Query instance, originating from a Queryset, is used a subquery (AKA subquery pushdown).

That’s where re-aliasing and prefix conflict prevention logic takes place during the resolving phase of expressions which happens before compilation (SQLCompiler)

As long as you maintain a prefix counter (that’s what alias_prefix is in base26 after all) you should be able to preserve the existing alias.

Note that re-aliasing also takes place when the same table is JOINed twice in the same query and that takes place in sql.Query.table_alias.

Good luck!

IMO short table aliases are preferable, even one letter.

That’s a personal preference and falls apart when a large number of tables are involved, or the same table is joined several times. Furthermore Django won’t generate “u” or “a” for “auth_user”, it will use something like “W1”.

especially when the SQL is read so infrequently

SQL is read rather frequently where performance tuning is a thing. Practically all real-life projects that use queries beyond simple “selects by primary key”.

Thanks for the implementation pointers, will have a look some time.

Yes fair. There is a minor cost to increasing the length of SQL though, as it all requires transmitting to the database and parsing there.

Yes, it’s very useful when doing that. What I meant was: when optimizing a query that runs one million times a day, you’ll likely read a handful of those queries. Any overhead from a better aliasing algorithm would be wasted for the majority of executions.

Anyway, I don’t mean to discourage you too much. If Simon’s guidance can help you find a reasonable change, I’d like to see that!

Another idea: a “SQL tidier“ that re-aliases a given query, for post-hoc debugging.

I’m actually trying to find a way to reduce the query size, AWS’s RDS Proxy has a query-size limit that if exceeded pins the db session. I have many large queries, and noticed that Django uses the full table name when specifying all the SELECT … columns to retrieve. I’m trying to find a way for it to use a small alias name instead so I can get the proxy to perform better.