App consumes too much memory - probably bad query?

perczok · December 22, 2023, 1:40pm

Hi,

So I wanted to make some kinda of interface for db in Mssql. I wanted to try something like Django and pls dont hate me im still new to this. I have an issue with memory that pod is using. So my data is about 4mil and I filter it down depending on variables. Model has 67 fields and I have to display them all. So I use something like this:

#this is for displaying each id with latest date and output from this is correct
subquery=model.objects.filter(id=OuterRef(‘id’)).values(‘id’).values(‘id’).annotate(max_date=Max(‘date’)).values(‘max_date’)

#some filtering values that i put there as well
filtered_values.append((‘model_field_date’,Subquery(subquery)
filtered_values.append((‘some_field__in’, some_values))

and heres filtering form in the view (which is built like def home_view(request)):

if request.method==‘GET’:
filter_form=Filter_form(request.get,data)
if filter_form.is_valid():
#code for appending new filtered_values if filtered_field

then after checking all values that should be included in query i do this:

data=model.objects.filter(**dict(filtered_values))

and then i do the part with paginator:

paginator=Paginator(data,items_per_page)
try:
data=paginator(page)
except PageNotAnInteger:
data=paginator.page(1)
except EmptyPage:
data=paginator.page(paginator.num_pages)

then i use django-tables2 to display table in template like this table =Model_table(data)
and put it into context:

‘table’:table,
‘data’:data,
‘filter_form’:filter_form,

And the issue is that on sysdig i see that memory used by pod is super high (like 400Mi with one user logged in and i was crushing when memory limit was 2Gi) even when number of rows in queryset is like 30k (can be max of 300k) and if few people try to use it at the same time im just runnig out of memory.

And heres the question to you, cause i changed setup a litle bit in lower env. I removed paginator part and django-tables2 i letf only queryset part with filter_form and i had some difficulties displaying data directly from queryset so i did this:

final_data=
for record in data:
record_data={}
for field in record._meta.get_fields():
record_data[field.name]=getattr(record, field.name)
final_data.append(record_data)

and on the db query i added .iterator() but still memory on 4k row queryset is like 152Mi and i dont know how to optimize it since paginator does not work with iterator.

Im noob i know i even dont know how to paste here test with code format

Can you help?

KenWhitesell · December 22, 2023, 3:52pm

First, when posting code here, enclose the code between lines of three backtick - ` characters. This means you’ll have a line of ```, then your code, then another line of ```. This forces the forum software to keep your code properly formatted.

Please post the complete view as it is written. Seeing it chopped up in parts like this makes it difficult to read and understand what’s going on.

When you first joined, you should have gotten a message or a link from “discobot” as a tutorial for the site. Following that tutorial would give you a lot of useful information about how to best use this forum.

perczok · December 22, 2023, 5:27pm

Ok, sorry for that. If you would need anything more let me know. Here is full view before i tried iterator.

@login_required(login_url='/login')
def home_view(request):
    items_per_page=25
    filtered_values=[]
    raw_data=None
    filtered_id_value=None
    codes=request.session.get('codes',[])
    subquery=model.objects.filter(id=OuterRef('id')).values('id').annotate(max_date=Max('row_date')).values('max_date')

    filtered_values.append(('row_date',Subquery(subquery)))

    if request.method=='GET':
        filter_form=FilterForm(request.GET,codes=codes)
        if filter_form.is_valid():
            filtered_id_value=filter_form.cleaned_data['id']
            if filtered_id_value:
                filtered_values.append(('id',filtered_id_value))            
            raw_data=model.objects.filter(**dict(filtered_values))
            request.session['filtered_values']=filtered_values
    else:
        filter_form=FilterForm(codes=codes)

    if raw_data==None:
        raw_data=model.objects.filter(code__in=codes,row_date=Subquery(subquery))


    paginator=Paginator(raw_data,items_per_page)
    page=request.GET.get('page')

    try:   
        data=paginator.page(page)
    except PageNotAnInteger:
        data=paginator.page(1)
    except EmptyPage:
        data=paginator.page(paginator.num_pages)  

    table = Model_Table(data)  

    context={
        'table':table,
        'data':data,
        'filter_form': filter_form,

    }
    RequestConfig(request).configure(table)
    return render(request, 'app/home.html', context)

perczok · December 26, 2023, 7:17pm

Hi,

Could anyone help with this problem?

Thanks

perczok · January 5, 2024, 3:51pm

Hi,

Could you take a look now with full code of view? I really don’t know what could be the reason.

KenWhitesell · January 5, 2024, 11:42pm

There’s too much going on here to diagnose this from the information available. This could be related to the query, it could be related to some of the other objects being created in this view - or even in other views in your system. It could be related to how you’re running your Django instance.

Unfortunately, I don’t know of any easy way to resolve this without a lot of experimentation and knowlege of the entire environment.

perczok · January 7, 2024, 1:51pm

Other than this I have login, logout and edit views.

In login and logout there is not much going on. In the edit I just call one row and I update few fields in the record. That’s why I posted main view cause there is much more to think about.

So this view is too complicated? I mean is it standard to make views much more simple? Cause I thought it’s simple view to show paginated data with option to filter. How should I maybe redesign this view?

KenWhitesell · January 7, 2024, 2:32pm

No, the view is not too complicated.

However, there’s a lot more involved here regarding memory utilization than just the view.

For one example, in your view you have the line:

What is “Model_Table”? What memory is being allocated when this instance is created?

And to repeat what I posted before:

There are other settings and environmental factors that can be affecting how memory is being used. You can’t just look at this one view and say that this is causing your memory consumption. You need to look at everything that is going on to get a clear picture.

So, a proper analysis would involve identifying where that memory is being used and what has allocated it. And that’s what takes the time and effort to establish.

See Debugging and Profiling — Python 3.12.1 documentation, and particularly the pages for gc — Garbage Collector interface — Python 3.12.1 documentation and tracemalloc — Trace memory allocations — Python 3.12.1 documentation to see the tooling available for identifying how and where memory is being allocated and used.

Getting familiar with pdb and using it with your application can also be helpful with this.

perczok · January 7, 2024, 2:45pm

Ok, thanks for info. I’ll investigate and will try come back with more.

perczok · January 11, 2024, 8:40pm

Hi,

So i did some work on it.

I run it now locally with recreated data in mssql 300k rows 67 columns and debugg set to False, so its powershell with

python manage.py runserver

To summarize:

gc.collect() did not help after del every unnecessary variable,
tracemalloc shows top 10 stats and differences but those values are so small compared to memory consumption:

[ Top 10 differences ]
...\venv\lib\site-packages\ldap3\protocol\rfc4512.py:94: size=584 B (-5151 KiB), count=1 (-69524), average=584 B
...\venv\lib\site-packages\ldap3\protocol\formatters\formatters.py:39: size=0 B (-2357 KiB), count=0 (-5607)
...\venv\lib\site-packages\ldap3\utils\asn1.py:149: size=34 B (-2270 KiB), count=1 (-5616), average=34 B
...\venv\lib\site-packages\ldap3\utils\ciDict.py:52: size=0 B (-1581 KiB), count=0 (-21952)
...\venv\lib\site-packages\ldap3\protocol\rfc4512.py:396: size=0 B (-1242 KiB), count=0 (-10965)
...\venv\lib\site-packages\ldap3\utils\ciDict.py:159: size=0 B (-480 KiB), count=0 (-5487)
...\venv\lib\site-packages\ldap3\protocol\rfc4512.py:86: size=440 B (-471 KiB), count=1 (-10968), average=440 B
...\venv\lib\site-packages\ldap3\protocol\rfc4512.py:467: size=0 B (-409 KiB), count=0 (-5484)
...\venv\lib\site-packages\ldap3\protocol\rfc4512.py:84: size=0 B (-381 KiB), count=0 (-5481)
...\venv\lib\site-packages\ldap3\protocol\rfc4512.py:538: size=0 B (-317 KiB), count=0 (-4188)



[ Top 10 ]
frozen importlib._bootstrap_external: 647: size=1825 KiB, count=19860, average=94 B
...devtools\Python3\lib\pathlib.py:198: size=1156 KiB, count=19621, average=60 B
...devtools\Python3\lib\pathlib.py:1232: size=273 KiB, count=2351, average=119 B
...devtools\Python3\lib\tracemalloc.py:505: size=222 KiB, count=4048, average=56 B
...devtools\Python3\lib\pathlib.py:728: size=201 KiB, count=1566, average=131 B
...devtools\Python3\lib\tracemalloc.py:498: size=190 KiB, count=4047, average=48 B
...devtools\Python3\lib\tracemalloc.py:193: size=190 KiB, count=4047, average=48 B
...devtools\Python3\lib\pathlib.py:79: size=147 KiB, count=1280, average=118 B
...devtools\Python3\lib\pathlib.py:706: size=142 KiB, count=1512, average=96 B
...devtools\Python3\lib\tracemalloc.py:67: size=110 KiB, count=1764, average=64 B

I thought maybe the view itself is badly designed and this is the issue (probably it is to be honest ) so i changed it to class based:

class HomeView(LoginRequiredMixin,ListView):
    model=model
    template_name='home.html'
    context_object_name='data'
    paginate_by= 25 # i usually run it with 100 or 500/page and  memory usage is bigger then   

    def get_queryset(self):
        codes = self.request.session.get('codes',[])
        subquery=model.objects.filter(id=OuterRef('id')).values('id').annotate(max_date=Max('date')).values('max_date')
        filtered_values.append(('code__in',codes))
        filtered_values.append(('date',Subquery(subquery)))    

        filter_form=FilterForm(self.request.GET,codes=codes)
        if filter_form.is_valid():
            filtered_id_value=filter_form.cleaned_data['id']
            if filtered_id_value:
                filtered_values.append(('id',filtered_id_value))
            
            filtered_code_value=filter_form.cleaned_data['codes_choices']
            if filtered_code_value:
                filtered_values.append(('code',filtered_code_value))
            queryset=model.objects.filter(**dict(filtered_values))            
            self.request.session['filtered_values']=filtered_values
        else:
            filter_form=FilterForm(codes=codes)

        return queryset.order_by('id')

   
    def get_context_data(self, **kwargs):
        column_names=[]
        fields=model._meta.get_fields()
        for field in fields:
            column_names.append(field.name)
        context['column_names']=column_names
        
        codes = self.request.session.get('codes',[])
        context['filter_form']=FilterForm(self.request.GET,codes=codes)

        return context

I was also trying to observe how the memory acts while walking thru the pages. And even when i go back and forth memory keep increasing. So i thought maybe i should load the data with JSON instead of django queryset and it didnt help. So maybe its the issue with my HTML?


    <form method="get" action="{% url 'home' %}">
      {{filter_form.as_p}}
      <input type="submit" value="Filter">
    </form>
    <div id="table-data" class="table-data">
      <table id="content-table" class="scrollable-table">
        <thead>
          <tr>
            <td><strong class="uppercase-text">Action</td>
            {% for column_name in column_names %}
            <td><strong class="uppercase-text">{{ column_name }}</strong></td>
            {% endfor %}
          </tr>
        </thead>
        <tbody>
            {% for item in data %}
              <tr>
                <td>
                <a href="{% url 'edit' item.id item.snapshot %}">Edit</a>                
                </td>
                {% for column in column_names %} <!-- i dont do it this way in my file but there are so many columns that i put it here in the nested loop so you can see what im doing hopefully  -->
                    <td> {{ item.column }} </td>
                {% endfor %}
              </tr>
            {% endfor %}
        </tbody>
        </table>

Every loaded page on it adds 5-10 Mib with 25items/page for 100/page 25-35Mib for 500/page its 150-200 Mib in every loaded page. So html is also super simple and i dont know how to improve it.

Maybe im missing some config settings? But i don’t know which ones

KenWhitesell · January 11, 2024, 9:20pm

Lots of references in this list to ldap3.

Do the models you’re referencing have any fields or methods that are associated with an ldap database?

I think we need to see the models being rendered here, along with all of its methods and any custom manager being used.

perczok · January 11, 2024, 9:54pm

ldap3 is being used only in login view to authenticate. Data model has nothing from ldap3. I’ll put how data model looks like in my models.py tomorrow.

Not sure I understand what it means. What do you mean rendering models?

perczok · January 12, 2024, 10:43am

So this is my data model since there are a lot of fields i will just ignore most of it (cause every type of field like CharField or IntegerField are all the same in the model):

class model(models.Model):
    id = models.CharField(max_length=50,primary_key=True)
    some_char_fields = models.CharField(max_length=50)
    one_char_choice_field= models.CharField(max_length=50, choices=INDICATOR_CHOICES,blank=True,null=True) #3 choices here
    one_date_field = models.DateField(null=True,blank=True)
    some_datetime_fields = models.DateTimeField(db_column='_date' ,null=True,blank=True)
    some_int_fields = models.IntegerField(null=True,blank=True)
    some_boolean_fields = models.BooleanField(null=True,blank=True)

    class Meta:
        db_table='[schema].[model]'

I was experimenting even with max_length of char fields cause those are most common in the table and i was putting there max values of 4000 and result was the same as with 50 chars.

And this is it no additional methods or anything. Also I dont have any custom menagers.

perczok · January 16, 2024, 7:59am

Also I just noticed with django-debug-toolbar that paginator query for num_pages takes avg 11 sec to execute and return number of pages for data. Could this take this memory?

KenWhitesell · January 16, 2024, 1:28pm

Absolutely - if your Paginator instance is causing a full query to be executed, that most definitely could consume significant amounts of memory.

perczok · January 19, 2024, 9:12am

Hmm ok then, so what is the alternative? cause I need pagination

perczok · January 19, 2024, 9:23am

Cause I see that for example if I use Django rest framework I still need to use paginator.num_pages which will call call function and same query will be executed

KenWhitesell · January 19, 2024, 12:36pm

I’m going to guess that something in your query - perhaps the Subquery - needs to be evaluated fully before the count queryset method used by the Paginator can return a value. (See the green “Note” box in the docs at Pagination | Django documentation | Django)

I’d try to chase this down using the Django shell - see how it acts interactively based upon some simple test cases of what might be entered. I’d both observe the timing that the statements take and what - if any - queries are executed by looking at connection.queries. (Note: You don’t need to create an instance of Paginator. Instead of that, just run raw_data.count(). If that always runs quickly, then try creating the Paginator instance from it.)

If the subquery is the issue - and I’m suspecting it is because of the values clause, then resolving this is adequately may require a different approach for your requirements here.

perczok · January 22, 2024, 1:16pm

Ok thanks. I’ll try to investigate more and will come back with results.

rafalp · January 25, 2024, 6:04pm

Remember that other approaches to pagination exist, like cursor pagination, which skips the count of all existing items altogether.

Topic		Replies	Views
django-filter & pagination Using Django	8	9884	March 26, 2020
listing items with filtering/sorting do not work with pagination Forms & APIs	11	3150	June 12, 2023
retrieving 1000s of rows performace Forms & APIs	32	4364	February 4, 2022
Query performance question Using Django	15	3247	September 15, 2021
Reloading the same page drives memory usage up until the workers die ORM	3	2590	January 27, 2022

App consumes too much memory - probably bad query?

Related topics