How to query for dynamic attributes through Django EAV?

A legacy system’s report table is slow because it contains customized columns with EAV (entity attribute value). The current system behaviour is to:

  1. With a set of transactions referred to as the entity, we query the eav_value and eav_attribute tables for the relevant EAV data.
  2. When initializing the Python serializer, the program loops through the EAV QuerySet and builds a dictionary for random access with the entity_id as the key.
  3. Overwrite the serializer’s to_representation() function to change its default behaviour. The input parameter instance is one single transaction, also called one entity, and the transaction’s ID is the entity_id of EAV. Therefore, we look up the entity’s attributes and values in the above dictionary.

Our system currently uses the original django-eav, not Django EAV 2, although we have fixed compatibility issues and upgraded our forked repository to Python 3.

Despite the arguments about the Pro and Con of EAV, we still hope to improve the system performance within the EAV framework for stability if doable.

We suspect storing many EAV records in a randomly accessed dictionary is inefficient. Usually, the database can more efficiently handle records than an in-memory dictionary. So, we want to improve it by querying EAV data in two steps:

  1. The first database query reads in the metadata: the report lists transactions of the same product, so we retrieve the attributes related to this product and the data types.
  2. Then, make an adaptive QuerySet corresponding to the metadata, query the database for the entity data and append the related EAV data by left outer join.

As a POC (proof of concept), we made a hand-written SQL query with recursive CTEs and tested it on MariaDB v11.5. Please find the sample source code in the Technical Details section with comments as explanations, and let us know for any questions.

The POC has the same time complexity as the old solution. Given page size M and N attributes, the task takes O(M x N). For space complexity, I’m not sure about the dictionary’s implementation space-wise, including hashing, pseudorandom, loading factor, etc., while the SQL query should still be O(M x N).

  • We hope the database handles records more efficiently than an in-memory dictionary so that when the page size M gets to a big number, e.g. tens of thousands, the program still runs reasonably fast with the help of database indexing. In contrast, the dictionary can hit a memory bottleneck, so the task takes longer.
  • We are also considering global searching for the customized columns, so EAV may provide more convenience than wrapping up everything into a text field of JSON.
  • We also need a generic solution within Django’s ecosystem, not the raw SQL query for demonstration.

Our Question:

Besides the raw SQL query above, is it possible to implement the same by Django’s existing functions?

We are open to Dango-EAV2 and other alternative solutions, and we highly appreciate any hints and suggestions.

Technical Details:

1. The relevant part of the Python serializer:

...
class ReportSerializer(serializers.ModelSerializer):
    eav_set = None
    ...

    def __init__(self, *args, **kwargs):
       ...
        value_qs = Value.objects.filter(entity_ct=content_type,
                                        entity_id__in=trans_ids).select_related('attribute')
        eav_set = {}
        for value in value_qs:
            eav_set_value = eav_set.get(value.entity_id)
            if eav_set_value:
                eav_set[value.entity_id].append(value)
            else:
                eav_set[value.entity_id] = [value]
        self.eav_set = eav_set

    def to_representation(self, instance):
       ...
        if self.eav_set:
            eav_tran_set = self.eav_set.get(instance.id)
            if eav_tran_set:
                for value in eav_tran_set:
                    ret_val[value.attribute.slug] = value.value
        else:
            val_dict = instance.eav.get_values_dict()
            for key, val in six.iteritems(val_dict):
                ret_val[key] = val
        return ret_val

...

2. Proof of Concept with a hand-written SQL query:

We referred to Claire’s online article Flattening an EAV model in SQL, the DRY way aiming at a specific scenario. Our query is only a shallow example to demonstrate the thoughts, though. And we can make it more efficient if you have questions, e.g.:

  • whether we can do it without CTE,
  • what CTE’s system behaviour is
  • what is the query execution plan
  • etc. So, for anything, just let us know.
with recursive

-- This CTE simulates a partial set of transactions with pagination,
-- with page size five and offset ten, we are querying for the 3rd page.
-- 
cte_transaction as (
select
    `app_transaction`.`id`
  , `app_transaction`.`create_ts`
from
  `app_transaction`
inner join `app_product` on
  (`app_transaction`.`product_id` = `app_product`.`id`)
where
  (not `app_transaction`.`rec_del`
      and `app_transaction`.`product_id` = __PRODUCT_ID__)
order by
  `app_transaction`.`id` desc
limit 5 OFFSET 10
)



, cte_meta_attribute as (
select
    ''
  , eava.slug 
  , eava.datatype 
from
  app_product ep 
  inner join app_producttemplate ept 
    on ep.product_template_id = ept.id 
  inner join app_transactionattribute eta
    on ept.id = eta.product_template_id 
  inner join eav_attribute eava 
    on eta.attribute_id = eava.id 
where 
  ep.id = __PRODUCT_ID__
)

-- 
-- |slug           |datatype |
-- +---------------+---------+
-- |birthdate      |date     |
-- |phone          |text     |
-- |firstname      |text     |
-- |lastname       |text     |
-- |passtype       |text     |
-- |postalcode     |text     |
-- |reuse_photos   |text     |
-- |clubname       |text     |
-- |firstnamewaiver|text     |
-- |lastnamewaiver |text     |
-- |countrycode    |text     |
-- |waiver         |text     |
--   


, cte_eav_set as (
select
    cte_transaction.id as transaction_id
  , eav_attribute.datatype 
  , eav_attribute.slug 
  , eav_value.*
from 
  cte_transaction
  inner join eav_value
    on cte_transaction.id = eav_value.entity_id 
  inner join eav_attribute
    on eav_value.attribute_id = eav_attribute.id 
)

, cte_eav_set_postalcode as (
select
    cte_eav_set.transaction_id
  , cte_eav_set.value_text
from
  cte_eav_set
where
  cte_eav_set.slug = 'postalcode'
)

, cte_eav_set_birthdate as (
select
    cte_eav_set.transaction_id
  , cte_eav_set.value_date
from
  cte_eav_set
where
  cte_eav_set.slug = 'birthdate'
)

-- 
-- There are more EAV attributes and CTEs, and we can use the 
-- metadata, cte_meta_attribute, to generate the SQL queries automatically, e.g.:
-- 
-- cte_eav_set_birthdate      
-- cte_eav_set_phone          
-- cte_eav_set_firstname      
-- cte_eav_set_lastname       
-- cte_eav_set_passtype       
-- cte_eav_set_postalcode     
-- cte_eav_set_reuse_photos   
-- cte_eav_set_clubname       
-- cte_eav_set_firstnamewaiver
-- cte_eav_set_lastnamewaiver 
-- cte_eav_set_countrycode    
-- cte_eav_set_waiver         
-- 

select 
    cte_transaction.*
  , cte_eav_set_postalcode.value_text as 'postalcode'
  , cte_eav_set_birthdate.value_date as 'birthdate'
from 
  cte_transaction
  left outer join cte_eav_set_postalcode
    on cte_transaction.id = cte_eav_set_postalcode.transaction_id
  left outer join cte_eav_set_birthdate
    on cte_transaction.id = cte_eav_set_birthdate.transaction_id
-- 
-- ... more left outer join's for the other attributes, e.g. 
-- left outer join cte_eav_set_phone ...
-- ...

;

-- Final Results:
-- id    |create_ts              |postalcode|birthdate ||Others... |
-- ------+-----------------------+----------+----------++----------+
-- xxxxxx|xxxx-xx-xx xx:xx:xx.xxx|XXXXXX    |xxxx-xx-xx||... ...   |
-- xxxxxx|xxxx-xx-xx xx:xx:xx.xxx|XXXXXX    |xxxx-xx-xx||... ...   |
-- xxxxxx|xxxx-xx-xx xx:xx:xx.xxx|XXXXXX    |xxxx-xx-xx||... ...   |
-- xxxxxx|xxxx-xx-xx xx:xx:xx.xxx|XXXXXX    |xxxx-xx-xx||... ...   |
-- xxxxxx|xxxx-xx-xx xx:xx:xx.xxx|XXXXXX    |xxxx-xx-xx||... ...   |
-- 

We have a tentative prototype working now, similar to the SQL snippet below. Hope to bring up people’s interest in EAV in our Django community.

We are still climbing the learning curve, but for any questions, just let me know.

select 
    t.id, t.create_ts
    -- ... a few other native columns from the `transaction` table
  , eav_postalcode.value_text as postalcode
  , eav_phone.value_text as phone
  -- ... to assemble more EAV attributes, 
  -- and according to `datatype` retrieve the `value_text`, `value_date`, etc.
from
  (
    select * from transactions
    where product_id = __PRODUCT_ID__
  ) as t
  left outer join eav_value as eav_postalcode
    on t.id = eav_postalcode.entity_id and eav_phone.attribute_id = 122
  left outer join eav_value as eav_phone
    on t.id = eav_phone.entity_id and eav_phone.attribute_id = 123
  -- ... to assemble more EAV attributes
;