Poor performance of NCLOB on Oracle

Hi,

we ran into issue with performance implications of using NCLOB (django_extensions.db.fields.json.JSONField in particular) on Oracle. Just adding such field to a model causes a huge performance penalty (in our setup it was about 10s for queryset with 100 items) because NCLOB cause additional db roundtrips (see https://sourceforge.net/p/cx-oracle/mailman/message/33641087/).

I find this behavior especially tricky because it’s not easily spotted with regular performance debugging tools such as Django Debug Toolbar. We finally resolved it using .defer() but the debugging took us significant amount of time.

Would it be possible to change all *LOB fields on Oracle to defer by default? Maybe I’m missing something but I don’t see much value in fetching the data on queryset evaluation when it involves separate db roundtrips anyway.

Is this something that Django itself can fix? The extension you mention isn’t “official”.

That said, if you just want to defer by default, you can set up a default manager to do that. I’m not sure we’d want to change the core behaviour of “fetch everything” without doing it everywhere.

Well, I’m not familiar enough with ORM internals to say if/how this could be fixed in Django. It is definitely not limited to 3rd party apps as Django’s own TextField suffers the same issue (it is also implemented by NCLOB on Oracle).

I am aware that we can fix it by changing the default manager (or moving the NCLOB fields to a separate model linked by OneToOneField- the path we’re probably taking) but my main motivation is to investigate if Django can behave more efficiently by default.

Ah, OK, if it’s more of an overall “should django not return some fields by default” question, that definitely is an internals thing.

My personal argument to this is - if we knew what fields were big and unused, then yes, we could not return them - but that is something only you know, as the schema designer. I’ve seen plenty of TEXT and BLOB fields used for critical data on a model where deferring it would only decrease performance due to the extra SELECTs.

Thus leads the logic to “change the default manager” - in theory, it is a one-line change, but of course we know in practice it is not quite that easy.

Do you think it would be better to have a Meta option that allowed you to list default-deferred/excluded fields? A per-field option like db_index? I’m curious what you think would work for your case, given that it’s unlikely Django can ship code which “magically” knows if you need to defer a potentially-large field or not.

There is already a ticket for the Meta/field option: https://code.djangoproject.com/ticket/23816 (this would be helpful for us if implemented)

I was more thinking about the “magical” decision not fetch BLOBs as because there is already decreased performance when what looks like single SELECT to Django results in fact in n-times fetching the individual BLOBs one by one (the JSON fields in our case are not large, they have some 100-200 characters yet the time for SELECT of about 100 rows is 10s).

Ah, right. I’m not sure Django has the relevant query-planner logic to optimise automatically when what it thinks are values in columns are actually single SELECTs; most databases (and BLOB fields in them) don’t suffer that performance penalty.

That means we’re basically still back to either implementing that, which is tough considering how tight the query/join planner is already, or offering the Meta/field option, which I think is my preferred (as then, especially, the custom field for NCLOB could set that option by default).

I also find the field option preferable solution.

I’m going to try to get this in 3.1 for you, either via mentorship or myself; no promises, but I agree it’s sub-optimal how it is now.

1 Like

A different approach is taken by others (such as sqlalchemy) in that they instruct cxOracle to fetch CLOB/NCLOB fields as part of the other fields instead of fetching them later.

See https://github.com/zzzeek/sqlalchemy/blob/2a1a9f5f5a9723f757439657d2bdf224baed8748/lib/sqlalchemy/dialects/oracle/cx_oracle.py#L1010 and https://github.com/oracle/python-cx_Oracle/issues/63

We have been using this approach (and a few other tricks that sqlalchemy has) for years and it would be nice if Django’s Oracle support could improve to the same level of maturity as its other DB backends.

@andrewgodwin Is there a fix in Django 3.1?