Advanced PostgreSQL functions like json_to_recordset(json) / jsonb_to_recordset(jsonb)

Hi, are there any plans to support more advanced PostgreSQL function like json_to_recordset(json) / jsonb_to_recordset(jsonb)? If not, what approach would you suggest, to add such functions? Example dummy usage for such function would be to retrieve values from elements of specified type from following list [{“value”: 3, “type”: “first”}, {“value”: 5, “type”: “second”}, …] stored undef json field.

Hello @Adiorz, these two functions can be implemented today using Func in main

from django.db.models import Func, JSONField

class JSONToRecordset(Func):
    function = "json_to_recordset"
    set_returning = True

The problem you’ll likely face though is that the ORM has very limited support for set-returning functions at this point.

For example, they can be used in the SELECT clause using annotate and in the WHERE clause using filter but the ORM doesn’t provide a way to use them in the FROM clause yet.

You’d need to provide more details about what you are hoping to use them for to determine if the ORM can support your needs.

Hi, thanks for the reply. The use case would be to check if there are any parameters exceeded for setups related to given project (I just made it up). The filtering criteria would be to take into account only adjustable=True and energy_intensive=False.

models:

from django.db import models
from django.contrib.postgres.fields import JSONField

class Project(models.Model):
    name = models.CharField(max_length=100)

class Mode(models.Model):
    name = models.CharField(max_length=50, unique=True)

class MachineSetup(models.Model):
    project = models.ForeignKey(Project, on_delete=models.CASCADE, related_name='setups')
    mode = models.ForeignKey(Mode, on_delete=models.PROTECT, related_name='setups')
    config = JSONField()

    class Meta:
        unique_together = ['project', 'mode']

    def __str__(self):
        return f"{self.project.name} - {self.operation_mode.name} - {self.machine_name}"

example config data:

[
    {
        "parameter": "OperationSpeed",
        "value": 80,
        "unit": "%",
        "exceeded": True,
        "details": {
            "adjustable": True,
            "requires_cooldown": False,
            "energy_intensive": False,
            "precision_level": "low"
        }
    },
    {
        "parameter": "IdleModePowerConsumption",
        "value": 0.5,
        "unit": "kW",
        "exceeded": False,
        "details": {
            "adjustable": False,
            "requires_cooldown": False,
            "energy_intensive": False,
            "precision_level": "medium"
        }
    },
    ...
]

Hello @Adiorz, I don’t think you need to use json_to_recordset for that.

The use case would be to check if there are any parameters exceeded for setups related to given project (I just made it up). The filtering criteria would be to take into account only adjustable=True and energy_intensive=False.

This can be expressed with the following JSON query (you can test it up here)

$[?(@.exceeded==true && @.details.adjustable==true && @.details.energy_intensive=false)]

And you can use the @? operator to determine if there are any matches for that (assuming you are on Postgres) otherwise the same can be achieved using JSON_EXISTS on other backends (and Postgres 17+).

For this particular case you could create a custom lookup

from django.db.models import JSONField, Lookup

@JSONField.register_lookup
class JSONExists(Lookup):
    lookup_name = "jexists"

    def as_sql(self, compiler, connection):
        lhs, lhs_params = self.process_lhs(compiler, connection)
        rhs, rhs_params = self.process_rhs(compiler, connection)
        params = (*lhs_params, *rhs_params)
        return "%s @? %s" % (lhs, rhs), params

And then from there you could simply do

query = (
    "$[?("
    "@.exceeded==true && "
    "@.details.adjustable==true && "
    "@.details.energy_intensive=false"
    ")]"
)

problematic_setups = MachineSetup.objects.filter(
    config__jexists=query
)

Project.objects.filter(
    Exists(
        problematic_setups.filter(project_id=OuterRef("pk")),
    )
)

Thanks @charettes, this example was very helpful. I did find that I needed to alter this part of the snippet:

params = lhs_params + rhs_params

to

params.extend(rhs_params)

to avoid

     20 lhs, lhs_params = self.process_lhs(compiler, connection)
     21 rhs, rhs_params = self.process_rhs(compiler, connection)
---> 22 params = lhs_params + rhs_params

TypeError: can only concatenate tuple (not "list") to tuple

The reason being that Lookup.get_db_prep_lookup() returns ("%s", [value]).

Grepping through django/db/models/lookups.py, extend or append seems to be the norm, except for Regex, which does the lhs_params + rhs_params, but maybe that works because there’s a prepare_rhs = False attribute? Haven’t looked into it.

Do you think there’s a best pattern for dealing with the params, or do you think there’s an edge here we can soften?

Thanks for letting me know @jacobtylerwalls, I’ll adjust the example accordingly.

In my opinion as_sql and their underlying parts (such as process_(lhs|rhs)) should always return tuple[str, tuple] and not tuple[str, list] for the following reasons

  1. The parameters will be of heterogeneous types, have a significant ordering and length, and are tied to the sql returned.
  2. It avoids passing mutable collections between calls and possibly corrupting some expressions used in following queries (tuple is immutable)
  3. Most of the compilation layer already standardized on tuple[str, tuple]

The resolving / expression layer part is a mixed bad that often cause these kind interoperability problems so I wish we would simply standardize on one over the other and I have a slight preference for tuple[str, tuple] for the reasons I explained above. It’s kind of hard to enforce without a proper typing system though and that’s another can of worms.

1 Like