Query a JSON formset using ORM

I have a ReportTemplate table with a JSONField that contains a formset. As usual, the formset looks something like this:

{
'settings-0-a': 10021,
'settings-0-b': 'Locale',
'settings-0-c': '...',
'settings-TOTAL_FORMS': 1,
'settings-INITIAL_FORMS': 1,
'settings-MAX_NUM_FORMS': 1000,
'settings-MIN_NUM_FORMS': 0
}

I’d like to query this in the database. To get to the raw data, I worked up the following SQL query:

SELECT id,
    CASE (split_part(key, '-', 3)) WHEN '' THEN 'management' ELSE split_part(key, '-', 2) END AS form,
    CASE (split_part(key, '-', 3)) WHEN '' THEN split_part(key, '-', 2) ELSE split_part(key, '-', 3) END AS field,
    value from (
        SELECT rt.id, (jsonb_each(rt.settings)).* from app_reporttemplate as rt
    ) AS x(id,key,value)

which I can use like this:

$ ./manage.py dbshell
foo=# SELECT id, ...as above... where id=10000011;
  id   |    form    |     field     |              value              
-------+------------+---------------+---------------------------------
 10001 | 0          | id            | 10021
 10001 | 0          | name          | "Locale"
 10001 | 0          | type          | "paiyroll.report.SettingString"
 10001 | 0          | regex         | ""
 10001 | 0          | value         | "en_GB"
 10001 | management | TOTAL_FORMS   | 1
 10001 | 0          | max_length    | 80
 10001 | 0          | description   | "Locale, e.g. en_US, de_GE"
 10001 | management | INITIAL_FORMS | 1
 10001 | management | MAX_NUM_FORMS | 1000
 10001 | management | MIN_NUM_FORMS | 0
(11 rows)

I tried to wrap this in a Raw SQL query:

$ ./manage.py shell
In [27]: for row in models.ReportTemplate.objects.raw("SELECT id,...query as above..."):
    ...:     print(row.id, row.form, row.field, row.value)

This produces the expected output from the annotations for form, field and value. But of course, the output of a raw query cannot be used to .filter() or anything much else.

So, my question is how else might one approach this problem and end up with something that is more centered in the ORM?

I think the first thing I would do would be to rework the models being used, to more properly reflect the data being submitted and eliminate using a model as an attribute-key-value store. Then you’d be able to use forms and formsets as designed and intended.

Otherwise, you can create a model for this table and use the ORM. (You’d be adding a column for a primary key field, and may need to rename what you’re calling the “id” column, but once you’ve done that, you’d be able to work wiith this model the same way you’d work with any other model.)

I very much agree with @KenWhitesell. It seems that your lack of proper schema definition is the source of the problem.

You’ve optimized for ease of writes by stashing browser submitted data without paying the price of organizing it offhand. You’re now faced with trying to make sense of it at querying time and that’s going to be painful because the ORM only shines when it deals with data associated with a relational schema.

In other words, the queryset interface is optimized for querying organized data and not for organizing it at querying time.

@KenWhitesell @charettes Thanks for the replies. I understand the sentiment, however the design I have is appropriate for the vast majority of my use cases, though it is obviously inconvenient in this case.

Perhaps if I narrowed down what I am after, rather than the wide-open question as currently worded?

The actual case in point requires only that I return the ReportTemplate’s primary key where settings-N-a and settings-N-b have given values. In SQL, the query given above is used as a CTE like this:

WITH formset AS (...query much as above...)
SELECT f1.id
FROM formset as f1
JOIN formset as f2
     ON ....various matching criteria...;

This returns the one column I need. I think I need to wrap single columns in a Subquery, but I am unclear how to get from the SQL to the ORM Subquery. Any pointers appreciated.

It looks like django-cte confirms that CTEs are not directly supported by the ORM and, for me, it may be a way forward, possibly using its raw-cte-sql facility.

It occurred to me that a different approach might be possible, by slightly recasting the problem and using an Expression. Specifically, starting with the “raw” query:

$ ./manage.py dbshell
foo=# SELECT id, ...as above... where id=10000011;
  id   |    form    |     field     |              value              
-------+------------+---------------+---------------------------------
 10001 | 0          | id            | 10021
 10001 | 0          | name          | "Locale"
...

I can use row_to_jsonb() to wrap the above like this:

foo=# select row_to_json(o.*) from (...previous query...) AS o;
                                             row_to_json                                              
------------------------------------------------------------------------------------------------------
 {"id":10001,"form":"0","field":"id","value":10021}
 {"id":10001,"form":"0","field":"name","value":"Locale"}

IIUC, I ought to be able to do this using a django.db.models.Expression to wrap the source JSONField but there isn’t much in the documentation on how this is could work. Any pointers would be much appreciated.

(I did also look at django.db.function.Func but could not see how the template would be formulated as a function call).