Get all children of self-referencing Django model in nested hierarchy

introduction
We’re currently working on a Django REST Framework project. It connects to a Postgres database that holds some hierarchical (tree structure) data, that goes a number of levels deep. We should offer an endpoint for GET requests that returns the entire nested tree structure (parent, children, grandchildren etc.) when no parameter is offered.

Sample data
The table below shows the sample data of regions, where each region can have a parent, indicating the hierarchy of regions. In this example, the hierarchy is three levels deep (world>continent>country). But in reality, the tree could go much deeper, having an unknown number of levels (world>continent>country>province>city>neighborhood>etc.).

id region parent_region_id
1 world NULL
2 europe 1
3 asia 1
4 africa 1
5 belgium 2
6 germany 2
7 spain 2
8 japan 3
9 indonesia 3
10 vietnam 3
11 tanzania 4
12 egypt 4
13 senegal 4

Our goal
The JSON output shown below is what we try to achieve. It’s the goal for the response body of the GET request for the /region resource.

{
   "id":1,
   "region":"world",
   "children":[
      {
         "id":2,
         "region":"europe",
         "children":[
            {
               "id":5,
               "region":"belgium"
            },
            {
               "id":6,
               "region":"germany"
            },
            {
               "id":7,
               "region":"spain"
            }
         ]
      },
      {
         "id":3,
         "region":"asia",
         "children":[
            {
               "id":8,
               "region":"japan"
            },
            {
               "id":9,
               "region":"indonesia"
            },
            {
               "id":10,
               "region":"vietnam"
            }
         ]
      },
      {
         "id":4,
         "region":"africa",
         "children":[
            {
               "id":11,
               "region":"tanzania"
            },
            {
               "id":12,
               "region":"egypt"
            },
            {
               "id":13,
               "region":"senegal"
            }
         ]
      }
   ]
}

What we’ve tried and achieved so far
Here’s how we tried to achieve our goal. See code below for models, serializers and views:

Models.py
________
class HierarchyData:
                region = models.CharField(max_length=100, null=False, default=None)
                parent = models.ForeignKey("self", models.DO_NOTHING, null=True, blank=True, db_column='parent', related_name="children")
 
 
Serializers.py
__________
class HeirarchyDataSerializer(serialisers.ModelSerializer):
                class Meta:
                                model = HierarchyData
                                fields = [“id”,”region”, “children”]
               
Views.py
__________
Class ListHierarchyData(generics.ListAPIView):
                queryset = HierarchyData.objects.all()
                serializer_class = HeirarchyDataSerializer
                permission_classes = [isAuthenticated]

When I call the end point for given scenario, I get the JSON response in the following format:

       {
                                “id”: 1,
                                “region”: “world”,
                                “children”: [ 2,3,4]
                }

Related Stack Overflow questions that didn’t seem to answer my issue

  1. How to recursively query in django efficiently?
  2. Django - Models - Recursively retrieve parents of a leaf node
  3. Django self-recursive foreignkey filter query for all childs

Above mentioned question partially solves my problem but I’m still unable to get the desired result. See details below:

1: I can’t touch database directly, I have to interact with database with ORM only.

2: Recursive time out and can’t serialize, saying object of type “Model” is not serializable.

3: This one partially worked for me: Based on this post, I tried to add the following in the model:

def get_children(self):
          children = list()
          children.append(self)
          for child in self.children.all():
              children.extend(children.get_children())
          return children

I then get all nested children, but all nested values are on the same level. For example world has children [2,3,4] and those have (grand)children themselves. Then it lists those on the same line, e.g children = [2,3,4,5,6,7,8,9, 10, 11,12,13]. This doesn’t represent the levels in the sample data.
Then I tried the following solution for the model:

def get_all_children(self, include_self=True):
    r = []
    if include_self:
        r.append(self)
    for c in Person.objects.filter(parent=self):
        _r = c.get_all_children(include_self=True)
        if 0 < len(_r):
            r.extend(_r)
    return r

That one works; it finds the nested children but it creates two issues: a. It gives me serializer errors when I use the code as it is, but if I add ‘get_all_children’ in serializer and add a different serializer for that attribute, then it serializes the objects, which I’m ok with. b. It is unable to append them in a nested fashion, it just nests a list inside another list without having children. It shows the data like this (limited to Europe, to not have a huge example shown here):

{
   "id":1,
   "region":"world",
   "get_all_children":[
      [
         {
            "id":2,
            "region":"europe"
         }
      ],
      [
         [
            {
               "id":5,
               "region":"belgium"
            }
         ],
         [
            {
               "id":6,
               "region":"germany"
            }
         ],
         [
            {
               "id":7,
               "region":"spain"
            }
         ]
      ]
   ]
}
 

Now the data is fine except that after Europe it doesn’t start to nest the children inside the same array, it just starts a new array for the children and append them with outer list. It basically adds a nested structure, without nesting it inside the parent.

Our question
How can we return the output mentioned in ‘our goal’ for this region data, that holds a tree structure that goes an unknown amount of levels deep? of course it’s finite depth.

The only constraint I have to follow is that I can’t edit the views part!

This is a rather interesting set of constraints that you say you’re working under.

Yes, it’s probably possible to greatly improve over what you have now, but I’m not sure if it’s just putting a bandage on a deeper problem concerning the construction of your database.

The problem with any true recursive implementation is that runtimes get progressively worse the deeper your tree becomes.

There are so many better ways of representing a tree structure in a relational database that if you’re looking at any serious implementation of something like this, you really should be pushing for a better implementation.

Leaving all that aside for the moment, the simplest way of building the recursive data structure that you are looking for would be a model method like this:

class HierarchyData(...):
    ...
    def get_tree(self):
        self.descendents = list(self.children.all())
        for child in self.descendents:
            child.get_tree()

Calling this function from an instance then populates a model attribute with the complete tree from that point.
example:

root = HierarchyData.objects.get(parent=None)
root.get_tree()

Also from Serializer fields - Django REST framework

See: djangorestframework-recursive

Can you use django-cte? We have a nested tree structure in our data model using a recursive CTE (common table expression) worked for a while. We eventually denormalized the relationship into a materialized view to avoid the recursion.

I tried the solution but unfortunately it doesn’t work. It get the children in one array, not in nested manner. May be either to edit the view to write custom query or change my model or create additional models to hold that information, on save it will update the other table with foreign keys or something like that.

Thank you for help.

I looked up the django-cte but I didn’t get it, like how can I change something in serializer or model function and django-cte will work with that.

@shermari Are you solved this problem? I have a similar problem.

If you are looking for assistance with an issue, I would suggest you start a new topic, and post the specific information for your situation. (Notice how the original poster showed the sample data, models, views and serializers.)

1 Like

Hi yes, I found the solution.
One can you to_representation in the serializer in following manner,

def to_representation(self, instance):
        self.fields['children'] = SerializerClass(many=True, read_only=True)
        return super(SerializerClass, self).to_representation(instance)

This gives the exact representation which I was seeking, may be not best performance wise or something but it gives the desired output.

2 Likes

Hello @KenWhitesell, i fell upon this discussion and though the last message is the solution for me as well, I truly am interested in your opinion about the best implementation this could have performance-wise, i thought the solution implementation was nice, but thinking about it, it’s true that it’ll get worse with deep trees. What would your suggestion be, database-architecture-wise and code-wise, to replicate infinite nesting without self indexing the table ?

There are (in general) three common patterns for representing a hierarchical structure within a database. In general, all three maintain “structural data” beyond just tracking foreign keys, and each have their advantages and disadvantages depending upon the relative frequency of common operations on those structures.

See my post at Some modelling advice - #12 by KenWhitesell for more details and a couple of useful links. (And yes, we still rely upon treebeard.)

This was the EXACT piece I was missing. I wish I had found this thread before I read a thousand other (mostly out of date, not working with DRF 4) suggested solutions.