django-filter & pagination

Hi, another question regarding queries (due to being new and size of table (~300M rows), I am quickly finding myself out of depth with the ORM!)

I am trying to create a page that displays the raw sensor data for a specific device (modem). I am using django-filter and pagination which is…sort of working.

I have a very similar page from the summary data model (~30k rows) which is using the same pattern to the one below, and it is working ok. I can filter the displayed data on each device using the filter form (which is loaded from a device model which gives me a list of devices), and move through pages using pagination. Although I have a suspicion there is a better way than the way I am doing it with a getElementById script?

However on the sensor data page, due to the size of the table, my laptop is running out of memory (64Gb) before displaying the page.

On the initial load, it seems to be loading the entire table (I notice the filter form defaults to empty value, therefore all data).

Is there a way to stop it doing this? All I want is 100 rows at a time even on initial load.

models.py:

class Sensordata(models.Model):
row_id = models.AutoField(primary_key=True)
device = models.ForeignKey(Devices, models.DO_NOTHING)
api_param = models.ForeignKey(Apiparams, models.DO_NOTHING)
timestamp = models.DateTimeField(db_column='TimeStamp')
value = models.CharField(db_column='Value', max_length=255, blank=True, null=True)

class Meta:
    managed = False
    db_table = 'sensordata'
    verbose_name = 'Sensor Data'
    verbose_name_plural = 'Sensor Data'

def __str__(self):
    return self.device

filters.py:

class SensordataDevicesFilter(django_filters.FilterSet):

device = django_filters.ModelChoiceFilter(queryset=Devices.objects.order_by('serialno').all())

class Meta:
    model = Sensordata
    fields = ['device', ]

views.py:

def sensordata_list(request):

queryset = Sensordata.objects.select_related("device__machine","api_param__param").order_by('timestamp')
filtered_qs = SensordataDevicesFilter(request.GET, queryset)
filtered_qs_form = filtered_qs.form

filtered_qs = filtered_qs.qs
paginator = Paginator(filtered_qs, 100)

page = request.GET.get('page')
try:
    response = paginator.page(page)
except PageNotAnInteger:
    response = paginator.page(1)
except EmptyPage:
    response = paginator.page(paginator.num_pages)

return render(request, 'modem_api/sensordata_list.html', {'filter': response, 'filtered_qs_form':filtered_qs_form})

sensordata_list.html:

{% extends "_base.html" %}
{% load crispy_forms_tags %}

{% block title %}Sensor Data{% endblock title %}

{% block content %}

  <h1>Sensor Data List</h1>
  <div class="bg-dark text-white rounded py-3 px-5">

  <div class="pagination">
  <span class="step-links">
      {% if filter.has_previous %}
          <a href="?device={{ selectItem.value }}&page={{ filter.previous_page_number }}">previous</a>
      {% endif %}

      <span class="current">
          Page {{ filter.number }} of {{ filter.paginator.num_pages }}.
      </span>

      {% if filter.has_next %}
          <a href="?device={{ selectItem.value }}&page={{ filter.next_page_number }}">next</a>
      {% endif %}
  </span>
</div>

<div class="row">
  <div class="col-md-3">
    <form method="GET">
      {{ filtered_qs_form|crispy }}
      <button type="submit">Search</button>
    </form>
      <h2>{{ filter.0.device.serialno }}</h2>
      <h2>{{ filter.0.device.machine.machineserialno }}</h2>
  </div>

  <div class="col-md-9">
    <table class="table table-striped table-dark">
        <tr>
          <th>API Param</th>
          <th>Value</th>
          <th>Timestamp</th>
        </tr>
            {% for obj in filter %}
            <tr>
                <td>{{ obj.api_param.param.parameter }}</td>
                <td>{{ obj.value }}</td> 
                <td>{{ obj.timestamp }}</td>
            </tr>
            {% empty %}
                <li>No devices yet.</li>
            {% endfor %}
    </table class="table">
  </div>
</div>
<script>
    var e = document.getElementById("id_device");
    var device_num = e.options[e.selectedIndex].value;
  </script>
{% endblock content %}

In your case, try to iterate on

{% for obj in filter.object_list %}

Let me know if that works for you.

Hi Tim, unfortunately not, seems to be same result.

Strange, the pagination should be working. Are you sure it loads all the objects?

Try to add {{ filter.object_list|length }} and {{ filter|length }} to the template and see what numbers it shows. If 100 then pagination works.

Hi Tim, really appreciate your help on this!

To be honest, I am not sure what it is doing, I only have the outcome of the memory creeping upwards.

I have the two tables, raw data (~300M rows) and summary data (~43k rows) and I use the same pattern to display both. I can do everything with the summary data so it is a good sandbox and comparison. Even initial load does not take long at all.

I added your length code to the summary data first, and both came back as 100 when pagination was needed, and the correct row count when filtered per device. I can also click through the pages after the initial load (no device filtering), so all good (As a note I found an error in the filters.py file where the model was set incorrectly, it is now devices).

I also added the code to the raw sensor data and same issue as before. On initial load, the page looks like it is loading and laptop memory creeps steadily upwards until I have to restart.

If I then go straight to a filtered url with a device with a small amount of data, I get the page to load. It takes a few seconds, but even a device with a small amount of data is still 444 pages. Both lengths are 100.

However interestingly when I try and click next to go to page 2 of 444, the same memory issue happens?!

When looking at task manager it appears that mysqld.exe is using the most memory, CPU and disk. Only 541MB so nowhere near the 64Gb that it eventually climbs to, but maybe this is pointing to a mysql thing instead of something loading in Django?

How many devices you have in the DB?

We have about 800 devices at the moment

I’m concerned with this part, not sure what api_param__param is, but it looks like you’re doing a JOIN on a pretty large table which might lead to a huge memory leak. Try to experiment with select_related, remove it at all, then try to add device__machine and api_param__param one by one and see what happens.

Thanks, I will have a look and experiment in that area. Since there are multiple APIs (different suppliers), each with slightly different names for certain parameters (e.g. API_param: engine RPM or engine speed), the __param at the end is an attempt to have “standard” names that the user sees. So everything will be a standard Engine RPM. But you are right that it is another join. And thinking about it, one that is not necessary on the raw data. These screens are more for data engineers rather than end users.

I will go right back to basics and try and strip out any joins. I might also try to experiment with values. I have watched a few videos from Djangocon on this area also just to try and understand what these queries are actually doing. I moved away from values() as this was stripping out the get_absolute_url() from the object (when I was trying to use list/detail pattern). However I am no longer using this with the current pattern, so maybe not a problem…

Slow going when I have to restart my computer every time!