Profile JSON field -> Python

avremel · September 17, 2020, 4:33pm

I have a query which returns up to 200k rows with a Postgres JSON field. The query logic itself has been optimized, but I suspect that performance can be further improved. I am looking to measure how much time is spent parsing the JSON into Python. How would I go about profiling that?

KenWhitesell · September 17, 2020, 4:42pm

The easiest way would be to use the shell to manually load the data, and then use the Python timeit module to run some experiments.
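As a rough sketch of that kind of experiment, the snippet below times only the JSON decoding step with timeit. It assumes you already have the raw JSON strings in hand; SAMPLE_ROW and ROWS are hypothetical stand-ins for your real column data, sized to match the ~200k rows mentioned above.

```python
import json
import timeit

# Hypothetical stand-in for the JSON column. In a real experiment you would
# load the raw strings from the database first (e.g. by casting to text).
SAMPLE_ROW = json.dumps({"id": 1, "tags": ["a", "b", "c"], "meta": {"score": 0.5, "active": True}})
ROWS = [SAMPLE_ROW] * 200_000  # roughly the size of the result set in question


def parse_all(rows):
    """Decode every JSON string, mimicking the per-row work done on fetch."""
    return [json.loads(row) for row in rows]


if __name__ == "__main__":
    # number=1 -> one full pass per run; repeat=3 shows run-to-run noise.
    runs = timeit.repeat(lambda: parse_all(ROWS), number=1, repeat=3)
    print(f"best of 3: {min(runs):.3f}s to decode {len(ROWS):,} rows")
```

Taking the minimum of several runs is the usual timeit advice, since the slower runs mostly reflect interference from other processes.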

Ken

avremel · September 17, 2020, 4:45pm

Is there a way to drill down into how much time is spent querying the database vs. the time spent parsing JSON? Would I need to subclass the field and measure its from_db_value method?
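If you do go the subclassing route, one generic way to measure a per-row hook is to wrap it with a timer that accumulates total time and call count across all rows. Django calls a field's from_db_value() on each loaded value when the field defines one, so that is the hook you would wrap. The sketch below is framework-free and demonstrates the wrapper on json.loads only; the timed helper is hypothetical, and wiring it into a field subclass is left out because it depends on your Django version and field class.

```python
import functools
import json
import time


def timed(func):
    """Wrap func so every call adds its wall-clock duration to wrapper.stats."""
    stats = {"calls": 0, "seconds": 0.0}

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Counted even if func raises, so failed parses are included.
            stats["calls"] += 1
            stats["seconds"] += time.perf_counter() - start

    wrapper.stats = stats
    return wrapper


# Demonstration on json.loads; a field's conversion hook could be wrapped the
# same way in a (hypothetical) subclass.
timed_loads = timed(json.loads)

if __name__ == "__main__":
    for _ in range(10_000):
        timed_loads('{"key": [1, 2, 3]}')
    s = timed_loads.stats
    print(f"{s['calls']} calls, {s['seconds']:.4f}s total")
```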

KenWhitesell · September 17, 2020, 4:47pm

I’d take a first stab from a different direction: use values() to return the rows as plain dictionaries rather than as model instances, so the cost of building those objects is factored out of the overall timings.

avremel · September 17, 2020, 4:47pm

Cool, thanks for the idea.

KenWhitesell · September 17, 2020, 4:49pm

You could also go the raw-query route, timing the SQL itself without any of the ORM layers in between.

Either way, once you retrieve the data, you can then run separate timings to see how quickly they’re deserialized into object instances.
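A minimal sketch of that second timing, assuming the rows have already been fetched as plain tuples. Record is a hypothetical dataclass standing in for a model instance; real Django model construction does more work per row, so treat this as a lower bound on the technique rather than a measurement of the ORM.

```python
import time
from dataclasses import dataclass


@dataclass
class Record:
    """Hypothetical stand-in for a model instance built from one row."""
    id: int
    name: str
    payload: dict


def build_instances(rows):
    """Turn raw (id, name, payload) tuples into Record objects."""
    return [Record(*row) for row in rows]


if __name__ == "__main__":
    # Pretend these tuples were already fetched, e.g. via a raw query.
    rows = [(i, f"row-{i}", {"n": i}) for i in range(200_000)]

    start = time.perf_counter()
    objs = build_instances(rows)
    elapsed = time.perf_counter() - start
    print(f"built {len(objs):,} instances in {elapsed:.3f}s")
```

Because the fetch happens before the timer starts, the printed figure covers only the object-construction step.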

adamchainz · September 17, 2020, 5:25pm

For production data timing, you can also install an APM solution (Scout, New Relic, etc.) that constantly tracks performance of queries across your site, so you always have data.

avremel · September 17, 2020, 5:38pm

We have that in place: the slow query logs are surfaced from Aurora to Datadog and AWS QuerySite.

avremel · September 17, 2020, 5:42pm

@KenWhitesell Don’t values() and values_list() both still parse the JSON into Python objects, just without building model instances? I don’t have to call json.loads on the results of values_list().

KenWhitesell · September 17, 2020, 5:57pm

Could be. We’ve now left my area of knowledge, so everything beyond this is conjecture on my part. I seem to remember that it’s psycopg2, rather than anything in the ORM, that does the raw conversion.
(From the psycopg2 docs on JSON: "Reading from the database, json and jsonb values will be automatically converted to Python objects.")
It goes on to say that you can bypass this conversion by casting the column to text.

So if you need a more fine-grained analysis of the timing, you might want to run the raw queries with and without casting the column. (You could then also run the raw query through psql to factor out the rest of the overhead introduced by psycopg2 and Python.)

If the total amount of data is large enough to warrant concern, you might even want to evaluate the network overhead involved if you’re not running this on the same system as the database. (And if you are on the same system, you might want to compare TCP over the loopback address against a Unix-domain socket, if you’re not already using one.)

Ken

sww314 · September 17, 2020, 8:32pm

I would first try to determine whether the main bottleneck is in the database or not.

In the past, I have found these articles helpful to try and figure that out:

In some recent work, using queryset.filter().values('field1') resulted in big improvements for me, because it selects less data from the database.

avremel · September 17, 2020, 8:50pm

@KenWhitesell I now see the docs are explicit too that the conversion happens at a lower level than Django.

I executed the query directly with the psycopg2 client, once with the json column as-is (the default) and once with it cast to text (json_field::text). Without the JSON -> Python conversion, the query is ~3x faster than with it.

Thanks for your help.

I’m going to look into different serialization options that might have better performance.
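One way to compare decoding options is a small harness that runs each candidate over the same payloads. This is a sketch, not a recommendation: orjson is a third-party package and is only included if installed. If a faster decoder pans out, psycopg2's register_default_json() and register_default_jsonb() accept a loads argument, which is the documented hook for plugging a different decoder into the driver.

```python
import json
import timeit

# Candidate decoders. orjson is an optional third-party package; it is only
# added if it happens to be installed.
DECODERS = {"json": json.loads}
try:
    import orjson
except ImportError:
    orjson = None
else:
    DECODERS["orjson"] = orjson.loads


def compare(decoders, payloads, repeat=3):
    """Return the best-of-`repeat` wall time for each decoder over all payloads."""
    results = {}
    for name, loads in decoders.items():
        # Bind loads as a default argument so each run uses this decoder.
        def run(loads=loads):
            return [loads(p) for p in payloads]

        results[name] = min(timeit.repeat(run, number=1, repeat=repeat))
    return results


if __name__ == "__main__":
    payloads = [json.dumps({"n": i, "tags": ["x", "y"], "ok": True}) for i in range(100_000)]
    for name, seconds in sorted(compare(DECODERS, payloads).items(), key=lambda kv: kv[1]):
        print(f"{name:>8}: {seconds:.3f}s")
```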

avremel · September 17, 2020, 8:54pm

Yep, I'm selecting only what I need with values_list(), and I'm also selecting specific keys within the JSON field (field__key).

Thanks for the links.
