How to improve hard to read table name aliases like "U0", "W1", etc.?

salomvary · August 18, 2023, 9:11am

Hi folks,

I’m working on optimizing some non-trivial queries generated by Django ORM and noticed that some aliases are rather hard to understand for humans. I suspect that this happens when subqueries are involved. The names look like “V0”, “U0”, “W1”, etc.

When reading the generated queries or especially when passing them to EXPLAIN and reading that output, it’s hard to know what table “W1” refers to.

I understand that these aliases are probably a straightforward way to get unambiguous table/column references, but is there a rationale for them being so terse?

I was thinking of creating a patch or “extension” that expands these to “original_table_name_V0” or something like that. Would something like that be accepted into the ORM? Is this easily doable at all? If yes, where should I look first? (I’ve never really looked into the ORM internals before, so any pointers would be appreciated.)

I’m using Django 4.2 with PostgreSQL.

adamchainz · August 18, 2023, 8:56pm

IMO short table aliases are preferable, even one letter. When writing SQL I would use e.g. auth_user AS U, auth_group AS G. I’m not convinced that using the full table name in the alias would be worth it.

It would be a minor QoL improvement if Django used clearer aliases, perhaps one to several letters long. But handling conflicts in such a scheme could be quite complicated, especially when the SQL is read so infrequently.

Regardless of whether anything gets merged, it would be a good way to dive into the ORM. The SQLCompiler class is what turns a query into SQL and parameters. I think the aliasing is in there…

charettes · August 19, 2023, 2:23pm

Hello @salomvary, independently of whether this is a good idea or not, I suggest you have a look at django.db.models.sql.Query.bump_prefix if you want to play around with this idea. This method is called every time a sql.Query instance, originating from a QuerySet, is used as a subquery (AKA subquery pushdown).

That’s where the re-aliasing and prefix conflict prevention logic takes place, during the resolving phase of expressions, which happens before compilation (SQLCompiler).

As long as you maintain a prefix counter (that’s what alias_prefix is in base26 after all) you should be able to preserve the existing alias.
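For readers following along, here is a rough, simplified model of the prefix bumping described above. It is not Django's code: `prefix_sequence`, `bump`, and the `readable` flag are hypothetical names made up for illustration, and the `readable` variant only mimics the “original_table_name_V0” idea from the first post.

```python
from itertools import count, product
from string import ascii_uppercase


def prefix_sequence(current="T"):
    """Yield prefixes after `current`: U..Z, then AA, AB, ... (simplified)."""
    start = ascii_uppercase.index(current) + 1
    yield from ascii_uppercase[start:]
    for width in count(2):
        for letters in product(ascii_uppercase, repeat=width):
            yield "".join(letters)


def bump(aliases, used_prefixes, current="T", readable=False):
    """Map each alias to <prefix><position>, using the first unused prefix."""
    prefix = next(p for p in prefix_sequence(current) if p not in used_prefixes)
    return {
        # Hypothetical readable form keeps the original name in front.
        alias: (f"{alias}_{prefix}{pos}" if readable else f"{prefix}{pos}")
        for pos, alias in enumerate(aliases)
    }


print(bump(["auth_user", "auth_group"], used_prefixes={"T"}))
print(bump(["auth_user"], used_prefixes={"T", "U"}, readable=True))
```

The point of the sketch is that the prefix (the counter) is what guarantees uniqueness across nested subqueries, so a longer alias can be built around it without losing that guarantee.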

Note that re-aliasing also takes place when the same table is JOINed twice in the same query; that happens in sql.Query.table_alias.
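The repeated-JOIN case can be modelled in a few lines as well. This is a sketch under assumptions, not the real method: the function below is a hypothetical stand-in that takes a plain dict instead of a Query, and it assumes the first reference to a table keeps the table name while later references get the prefix plus a counter.

```python
def table_alias(alias_map, table_name, prefix="T"):
    """Return an alias for table_name and record it in alias_map.

    alias_map maps alias -> table name (a simplified stand-in for the
    per-query bookkeeping; the real structure is richer).
    """
    already_joined = table_name in alias_map.values()
    if already_joined:
        # Assumed scheme: prefix plus the position of the new entry.
        alias = f"{prefix}{len(alias_map) + 1}"
    else:
        # First reference: the table name doubles as its own alias.
        alias = table_name
    alias_map[alias] = table_name
    return alias


aliases = {}
print(table_alias(aliases, "auth_user"))
print(table_alias(aliases, "auth_group"))
print(table_alias(aliases, "auth_user"))  # second join of the same table
```

A more readable scheme would only need to change the branch that builds the alias for the second and later joins.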

Good luck!

salomvary · August 26, 2023, 4:34am

IMO short table aliases are preferable, even one letter.

That’s a personal preference, and it falls apart when a large number of tables are involved or the same table is joined several times. Furthermore, Django won’t generate “u” or “a” for “auth_user”; it will use something like “W1”.

especially when the SQL is read so infrequently

SQL is read rather frequently wherever performance tuning is a thing. That covers practically all real-life projects that use queries beyond simple “selects by primary key”.

salomvary · August 26, 2023, 4:36am

Thanks for the implementation pointers, will have a look some time.

adamchainz · August 26, 2023, 6:10am

Yes, fair. There is a minor cost to increasing the length of SQL though, as it all has to be transmitted to the database and parsed there.

Yes, it’s very useful when doing that. What I meant was: when optimizing a query that runs one million times a day, you’ll likely read a handful of those queries. Any overhead from a better aliasing algorithm would be wasted for the majority of executions.

Anyway, I don’t mean to discourage you too much. If Simon’s guidance can help you find a reasonable change, I’d like to see that!

Another idea: a “SQL tidier“ that re-aliases a given query, for post-hoc debugging.
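A minimal sketch of what such a tidier could look like, assuming the generated SQL declares aliases as `"table_name" U0` (optionally with AS) and references them as `U0."column"`. The name `tidy_aliases` is hypothetical, and the approach is purely regex-based: it does not parse SQL, so an alias-like token inside a string literal would be rewritten too.

```python
import re

# Alias declarations such as: "auth_user" U0   or   "auth_user" AS U0
ALIAS_DECL = re.compile(r'"(?P<table>\w+)"\s+(?:AS\s+)?(?P<alias>[A-Z]\d+)\b')


def tidy_aliases(sql: str) -> str:
    """Rewrite terse aliases (U0, W1, ...) as <table>_<alias> for reading."""
    mapping = {
        m["alias"]: f'{m["table"]}_{m["alias"]}'
        for m in ALIAS_DECL.finditer(sql)
    }
    if not mapping:
        return sql
    # Replace whole-word occurrences of each alias, declarations included.
    alias_ref = re.compile(r"\b(" + "|".join(map(re.escape, mapping)) + r")\b")
    return alias_ref.sub(lambda m: mapping[m.group(1)], sql)


query = (
    'SELECT "auth_user"."id" FROM "auth_user" WHERE "auth_user"."id" IN '
    '(SELECT U0."user_id" FROM "auth_user_groups" U0 WHERE U0."group_id" = 1)'
)
print(tidy_aliases(query))
```

Because it runs only on the query being debugged, this adds nothing to the SQL actually sent to the database.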

josh1 · February 21, 2024, 6:27pm

I’m actually trying to find a way to reduce the query size: AWS’s RDS Proxy has a query-size limit that, if exceeded, pins the DB session. I have many large queries, and noticed that Django uses the full table name when specifying all the SELECT … columns to retrieve. I’m trying to find a way for it to use a short alias instead so I can get the proxy to perform better.

salomvary · November 26, 2024, 4:33pm

So I’ve made some attempts at making generated aliases more readable: Comparing main...human-readable-aliases · salomvary/django · GitHub

Besides changing the T0 table alias to original_table_name_T0, it also improves generated column aliases, from col1 to original_column_name_col1.

The Django test suite seems to pass locally (although I am getting random failures even on a clean copy). I wonder whether it is worth the effort of attempting a pull request.

For those who want to give this a go, here is how to patch Django locally:

Change to the directory that contains django, e.g. cd .venv/lib/python3.12/site-packages
Apply the patch: curl https://github.com/salomvary/django/compare/main...human-readable-aliases.patch | patch
