Poor performance of NCLOB on Oracle

washeck · February 19, 2020, 8:29am

Hi,

we ran into issue with performance implications of using NCLOB (django_extensions.db.fields.json.JSONField in particular) on Oracle. Just adding such field to a model causes a huge performance penalty (in our setup it was about 10s for queryset with 100 items) because NCLOB cause additional db roundtrips (see https://sourceforge.net/p/cx-oracle/mailman/message/33641087/).

I find this behavior especially tricky because it’s not easily spotted with regular performance debugging tools such as Django Debug Toolbar. We finally resolved it using .defer() but the debugging took us significant amount of time.

Would it be possible to change all *LOB fields on Oracle to defer by default? Maybe I’m missing something but I don’t see much value in fetching the data on queryset evaluation when it involves separate db roundtrips anyway.

andrewgodwin · February 19, 2020, 7:46pm

Is this something that Django itself can fix? The extension you mention isn’t “official”.

That said, if you just want to defer by default, you can set up a default manager to do that. I’m not sure we’d want to change the core behaviour of “fetch everything” without doing it everywhere.

washeck · February 19, 2020, 8:09pm

Well, I’m not familiar enough with ORM internals to say if/how this could be fixed in Django. It is definitely not limited to 3rd party apps as Django’s own TextField suffers the same issue (it is also implemented by NCLOB on Oracle).

I am aware that we can fix it by changing the default manager (or moving the NCLOB fields to a separate model linked by OneToOneField- the path we’re probably taking) but my main motivation is to investigate if Django can behave more efficiently by default.

andrewgodwin · February 19, 2020, 8:32pm

Ah, OK, if it’s more of an overall “should django not return some fields by default” question, that definitely is an internals thing.

My personal argument to this is - if we knew what fields were big and unused, then yes, we could not return them - but that is something only you know, as the schema designer. I’ve seen plenty of TEXT and BLOB fields used for critical data on a model where deferring it would only decrease performance due to the extra SELECTs.

Thus leads the logic to “change the default manager” - in theory, it is a one-line change, but of course we know in practice it is not quite that easy.

Do you think it would be better to have a Meta option that allowed you to list default-deferred/excluded fields? A per-field option like db_index? I’m curious what you think would work for your case, given that it’s unlikely Django can ship code which “magically” knows if you need to defer a potentially-large field or not.

washeck · February 19, 2020, 8:45pm

There is already a ticket for the Meta/field option: https://code.djangoproject.com/ticket/23816 (this would be helpful for us if implemented)

I was more thinking about the “magical” decision not fetch BLOBs as because there is already decreased performance when what looks like single SELECT to Django results in fact in n-times fetching the individual BLOBs one by one (the JSON fields in our case are not large, they have some 100-200 characters yet the time for SELECT of about 100 rows is 10s).

andrewgodwin · February 24, 2020, 10:22am

Ah, right. I’m not sure Django has the relevant query-planner logic to optimise automatically when what it thinks are values in columns are actually single SELECTs; most databases (and BLOB fields in them) don’t suffer that performance penalty.

That means we’re basically still back to either implementing that, which is tough considering how tight the query/join planner is already, or offering the Meta/field option, which I think is my preferred (as then, especially, the custom field for NCLOB could set that option by default).

washeck · February 24, 2020, 10:45am

I also find the field option preferable solution.

andrewgodwin · February 24, 2020, 11:06am

I’m going to try to get this in 3.1 for you, either via mentorship or myself; no promises, but I agree it’s sub-optimal how it is now.

Gert · June 29, 2020, 7:19am

A different approach is taken by others (such as sqlalchemy) in that they instruct cxOracle to fetch CLOB/NCLOB fields as part of the other fields instead of fetching them later.

See https://github.com/zzzeek/sqlalchemy/blob/2a1a9f5f5a9723f757439657d2bdf224baed8748/lib/sqlalchemy/dialects/oracle/cx_oracle.py#L1010 and https://github.com/oracle/python-cx_Oracle/issues/63

We have been using this approach (and a few other tricks that sqlalchemy has) for years and it would be nice if Django’s Oracle support could improve to the same level of maturity as its other DB backends.

tomerp · November 22, 2020, 9:25pm

@andrewgodwin Is there a fix in Django 3.1?

Topic		Replies	Views
Improve the use of cx-Oracle in Django Using the ORM	5	2532	June 16, 2022
Performance Issue with Database Write Operations in Django Using the ORM	4	195	July 29, 2024
Possible performance improvements of psycopg3 support in Django 4.2 ORM	14	9712	March 14, 2023
ORM Dynamic Select & Prefetch Related for serializers Show & Tell	0	694	November 7, 2020
Proposal: Querying objects in batches for large collection ORM	11	3588	October 26, 2024

Poor performance of NCLOB on Oracle

Related topics