Missing support for CREATE INDEX CONCURRENTLY IF NOT EXISTS

Django doesn’t support the IF NOT EXISTS conditional when creating indexes concurrently in postgres.
Has anyone else come across this?

There is a ticket (just created) for that.
Good to check if more people are interested in this:

https://code.djangoproject.com/ticket/35383

I agree with @adamchainz’s assessment and the previous reasoning for closing the tickets as wont-fix.

The existing usage of IF EXISTS in the current Postgres backend schema alteration SQL templates is mainly to avoid internal lookups against information_schema and not in order to provide operation level support for IF (NOT) EXISTS so I don’t see any inconsistencies here.

It’s easy enough to use RunSQL and I don’t see why AddIndexConcurrently should be specialized to support IF NOT EXISTS when AddIndex isn’t.

@charettes And do you know why AddIndexConcurrently does not support Oracle? Is there a technical reason or it’s just nobody did the work so far?

I was also thinking if the way to add index concurrently by manually crafting migrations to using special operation is the ideal workflow. Our project consist of large legacy codebase in custom framework and a new part in Django, both running on Oracle. The custom framework adds all indexes concurrently (ONLINE in Oracle terminology) as it is a sane default for project with very large tables - you don’t want developers to ruin production by forgetting to add the concurrent option.

We solved this in Django by modifying our DB backend (we need it for other hacks anyway) like this

    sql_create_index = schema.DatabaseSchemaEditor.sql_create_index + " ONLINE"
    sql_create_unique_index = schema.DatabaseSchemaEditor.sql_create_unique_index + " ONLINE"

I’m not proposing this should be merged to Django :slight_smile: but I was thinking how to make it easier for developers - maybe add settings or makemigrations switch to use AddIndexConcurrenty instead of AddIndex?

Ah, that’s a good reason. I guess we could summarize the situation as “Django assumes its migrations are the only source of your schema. It uses IF NOT EXISTS only when it helps with its own migration patterns. If you need something custom, provide the SQL and assume the associated risks.”

CONCURRENTLY is PostgreSQL-specific syntax, that’s why it lives in django.contrib.postgres. The lack of CreateIndexOnline (or whatever we’d call it) is due to a general lack of enthusiasm for Oracle. I would suggest you start by creating an operation and releasing it in a third-party package, like my Django-MySQL one. That would be better than any new setting or switch.

I understand why it is generally better to have extra functionality in third-party app but I’m not sure this the right example for such approach. The operation is the same (build index in background) and supported by both DBs, just with a different name. I have always seen one of the main ORM advantages being a db agnostic layer. In fact, we plan to move from Oracle to Postgres in a long-term and having as few differences is a key factor here. I’m not sure why there needs to be AddIndexOnline and AddIndexConcurrently when it is the same operation.

Appart from that, my mention of the switch or settings was meant mainly for existing (PG only) support for concurrent indexes. Do you find it optimal workflow for Django to generate AddIndex and then manually modify it by developer to AddIndexConcurrently? I’d rather see Django to use the concurrent version when generating the migration.

I get your point about adding backend agnostic features.

It remains that even if we were to deprecate AddIndexConcurrently to replace it with a new AddIndex(concurency:bool = False) kwarg that uses the proper syntax on Postgres and Oracle and error out on other backends it is a separate can of worms than the if_not_exists: bool feature which is the origin of the discussion here.

The if_not_exists option doesn’t make sense in the context of the migration framework where it tries very hard to keep a coherent state between the defined migration files and the database.

1 Like

@charettes

The existing usage of IF EXISTS in the current Postgres backend schema alteration SQL templates is mainly to avoid internal lookups against information_schema

Is this for performance reasons? I tried to check if there was something written on the commit history of django about this, and the closest I could find was this commit: Refs #27860 -- Simplified deleting indexes on PostgreSQL using "IF EX… · django/django@6b47431 · GitHub

But it doesn’t say much, and the related ticket seems to be about something unrelated, so it seems like some conversation about it happened elsewhere?

Any more insight about why this is important (avoid the lookup on information_schema) would be of great help, thanks.

Hey @marcelofern.

The commit you linked is a perfect example of what I mentioned. Instead of looking up existing constraints from information_schema, which is what _constraint_names does, we simply ensure that it doesn’t exist by name by relying on IF EXISTS.

If you look at the PR associated with the commit you linked you’ll see that it’s referenced by another PR that ought to simplify index and constraint removal.

Hi @charettes thanks, I had seen that PR, but it still did’tt clarify why this is an optimisation. Maybe I am lacking something.

Essentially, I understand why it fixes the bug, but I don’t know why this is considered an optimisation.

I.e., this PR [Refs #27860 -- Optimized deleting indexes on PostgreSQL by using "IF EXISTS" clause. by felixxm · Pull Request #8136 · django/django · GitHub] refers to this being an optimisation.

Again, apologies if this is a dumb question, I am just trying to understand the core reason why adding IF EXISTS is an optimisation, and why checking the information_schema beforehand would be that bad.

Not a dumb question.

While it’s normally not a problem on Postgres, querying information_schema for introspection purposes can be slow and cumbersome. Have a look at what it does currently to get an idea of the complexity of getting the right information.

In this sense it is faster to let Postgres figure out if the index exists by itself instead of performing multiple potentially slow queries and round trips to the database.

On the complexity front the diff speaks for itself. It’s much easier to ask for forgiveness about the possible lack of existence of an index, which is ultimately what Remove(Constraint|Index) represent, than to figure out which require removals.

So I’d say the optimization is two fold; better performance and reduction of complexity.