How good work JSON field queries with Django for searching?


my team (full stack web application development) want to use Django ORM to query a PostgreSQL database table using a JSON field containing arbitrary tag-value pairs (like search tags) with the django ‘contained by’ operation on JSON fields. I hope it is understandable what we are trying to do. It is basically a reversed search engine. We want to select all rows of which the JSON field key-value-pairs are a subset of the requested key-value-pairs. Example: Search for tags Nation:Austria, Gender:Male, Religion:Muslim returns all table entries with any tag (or tag combination) of these tags. THis should work streight forward using the contained-by operator from the django documentation but would this also work on millions of rows within milliseconds and for millions of users (like a search engine)? Is this doable within django using postgreSQL or do you have doubts from your experience? The alternative would be to try ElasticSearch-like databases. My team and I would very much appreciate any suggestion from you.

Not a chance, but this isn’t a Django issue.

You’re not going to be able to do that with PostgreSQL alone - never mind the overhead that Django adds to the process. (At least not without massively scaling out your environment.)

How many millions of dollars are you looking to invest in your infrastructure?

Either way, if that’s the real scale you’re needing to engineer for now, and not the pipe dream of someone who thinks you can become the next Google on a shoestring, you’re completely in the wrong territory here.

Your biggest constraint actually isn’t the “millions of rows” - a couple hundred GB of memory should allow the entire database to be memory resident. That might get you close to “millisecond response” - but that’s not addressing the “millions of users”.

I suggest you build out a Proof of Concept architecture and then perform some primitive benchmarking to prove just how realistic those requirements are for any but the most massive of environments. Or at least get an idea of how far up you can scale this before needing to go through your next round of upgrades.

But keep in mind that “massive scalability” requires a lot of infrastructure work throughout the entire stack. It’s not an issue of “find and fix the bottleneck”. It’s a case of "find the current bottleneck, fix it, and then move on to the next).

You might also find it helpful to read the blog posts at Latest stories published on Instagram Engineering to see their experiences with scaling a public web service to a scale beyond what most organizations need to deal with.

Thank you for replying. As you point out “work on millions of rows within milliseconds and for millions of users” is exaggerated of course. What I really tried to learn is if it is recommended to start with the setup I described (Django ORM, PostgreSQL, JSON queries) for developing an application or if this attempt does not scale at all and we should skip SQL for this task and start off in ElasticSearch right away.But maybe this is the wrong place to ask.