While reading the docs, I saw:
Entry.objects.filter(pub_date__year=2006)
With the default manager class, it is the same as:
Entry.objects.all().filter(pub_date__year=2006)
Does that mean Django first fetches all the data from the DB into the queryset (cache or memory) and then filters the queryset locally?
So if I have 10 million rows in the DB and I only need 10 of them, do I still have to fetch all 10 million rows into Django and then filter out the 10 rows locally?
No - you’re on the right page, but just haven’t quite read far enough. See https://docs.djangoproject.com/en/3.0/topics/db/queries/#querysets-are-lazy
The whole linked page is worth reading, as is the "When QuerySets are evaluated" section of the QuerySet API reference.
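To make the laziness concrete, here is a toy sketch in plain Python on top of the standard-library sqlite3 module. It is not how Django implements QuerySet; the LazyQuery class, the entry table, and its columns are made up for illustration. The point it tries to show is that chaining all() and filter() only records conditions, and a single SQL statement with a WHERE clause is sent when the result is iterated, so the database does the filtering.

```python
import sqlite3


class LazyQuery:
    """Toy stand-in for a lazy queryset (hypothetical, not Django code)."""

    def __init__(self, conn, table, conditions=()):
        self.conn = conn
        self.table = table
        self.conditions = tuple(conditions)

    def all(self):
        # Returns a copy; nothing is sent to the database.
        return LazyQuery(self.conn, self.table, self.conditions)

    def filter(self, **kwargs):
        # Records extra WHERE conditions on a new object; still no database access.
        return LazyQuery(self.conn, self.table, self.conditions + tuple(kwargs.items()))

    def __iter__(self):
        # The single SQL statement is built and executed only here.
        # Column names are interpolated directly, which is acceptable only in a toy.
        sql = f"SELECT id, year FROM {self.table}"
        if self.conditions:
            sql += " WHERE " + " AND ".join(f"{col} = ?" for col, _ in self.conditions)
        params = [value for _, value in self.conditions]
        return iter(self.conn.execute(sql, params).fetchall())


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entry (id INTEGER PRIMARY KEY, year INTEGER)")
conn.executemany("INSERT INTO entry (year) VALUES (?)", [(2005,), (2006,), (2006,), (2007,)])
conn.commit()

executed = []
conn.set_trace_callback(executed.append)  # record every SQL statement that runs

query = LazyQuery(conn, "entry").all().filter(year=2006)
statements_before_iteration = len(executed)  # nothing has run yet
rows = list(query)                           # the one filtered SELECT runs here
```

In Django itself you can check the same thing by looking at the SQL that is actually sent, for instance with str(queryset.query), or with django.db.connection.queries when DEBUG is on.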
Here's the problem I ran into with the Django ORM, and what I've done so far to come up with a solution that stays faithful to the traditional Django ORM while reducing DB hits through caching under the hood. I'd love to figure out how to bring this to the Django community, to see whether it would be worth working into Django itself (contrib or otherwise) and what the appropriate approach would be.
If you have a use case where you need to apply multiple filters to the same model in the same view, then with the current approach every filter hits the DB. Take, for example, a calendar page that displays last month, this month, and next month: you run three filter queries and hit the DB three times.
m1 = Calendar.objects.filter(month='Nov')
m2 = Calendar.objects.filter(month='Dec')
m3 = Calendar.objects.filter(month='Jan')
# in template
{% with cal=m1 %}{% include "calendar/display.html" %}{% endwith %} {# DB hit #}
{% with cal=m2 %}{% include "calendar/display.html" %}{% endwith %} {# DB hit #}
{% with cal=m3 %}{% include "calendar/display.html" %}{% endwith %} {# DB hit #}
A more optimal solution, which I've coded to a proof-of-concept level:
ms = Calendar.objects.filter(year=2009).run()  # Or some filter that gets all the
                                               # months you need.
                                               # .run() causes the DB hit.
m1 = ms.filter(month='Nov')  # These three filter calls detect the presence
m2 = ms.filter(month='Dec')  # of _result_cache after .run() and filter the
m3 = ms.filter(month='Jan')  # cached rows in plain Python rather than going
                             # back to the DB. This keeps the usual ORM
                             # filtering syntax without hitting the DB
                             # multiple times.
# in template
{% with cal=m1 %}{% include "calendar/display.html" %}{% endwith %} {# No DB hit #}
{% with cal=m2 %}{% include "calendar/display.html" %}{% endwith %} {# No DB hit #}
{% with cal=m3 %}{% include "calendar/display.html" %}{% endwith %} {# No DB hit #}
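For comparison, something close to this is possible today without any new API: evaluate one queryset, then split the rows in plain Python. A minimal sketch, assuming a Calendar model with year, month, and title fields. The CalendarRow dataclass and the sample rows are hypothetical stand-ins so the snippet runs without a database.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class CalendarRow:
    """Hypothetical stand-in for a Calendar model instance."""
    year: int
    month: str
    title: str


# In a real view this would be list(Calendar.objects.filter(year=2009)),
# which is a single query. The rows below are made-up sample data.
rows = [
    CalendarRow(2009, "Nov", "Release"),
    CalendarRow(2009, "Dec", "Holiday"),
    CalendarRow(2009, "Nov", "Sprint"),
    CalendarRow(2009, "Jan", "Kickoff"),
]

# Partition the already-fetched rows by month, in memory.
by_month = defaultdict(list)
for row in rows:
    by_month[row.month].append(row)

m1, m2, m3 = by_month["Nov"], by_month["Dec"], by_month["Jan"]
```

The trade-off is that m1, m2, and m3 are plain lists here, so they can be looped over in the template but cannot be filtered further with ORM syntax, which is the gap the proposal is trying to close.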
If you want the original behavior, simply leave out the initial .run() call:
ms = Calendar.objects.filter(year=2009)  # Or some filter that gets all the
                                         # months you need
m1 = ms.filter(month='Nov')  # In the absence of _result_cache,
m2 = ms.filter(month='Dec')  # these three calls fall back to the
m3 = ms.filter(month='Jan')  # original Django filter and hit the DB
# in template
{% with cal=m1 %}{% include "calendar/display.html" %}{% endwith %} {# DB hit #}
{% with cal=m2 %}{% include "calendar/display.html" %}{% endwith %} {# DB hit #}
{% with cal=m3 %}{% include "calendar/display.html" %}{% endwith %} {# DB hit #}
Additionally, a dont_cache parameter could be supplied to filter() to force the original Django behavior.
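To pin down the intended semantics, here is a toy model in plain Python. It is only a sketch of the idea, not Django code and not the actual proof of concept: run, dont_cache, and _result_cache are the names used in this post, while CachedFilterSet, fetch, TABLE, and hits are hypothetical names invented for the example.

```python
class CachedFilterSet:
    """Toy model of the proposed behavior (hypothetical, not Django code)."""

    def __init__(self, fetch, conditions=None, cache=None):
        self._fetch = fetch                       # callable(conditions) -> list of dicts
        self._conditions = dict(conditions or {})
        self._result_cache = cache                # None means "not evaluated yet"

    def run(self):
        # Evaluate now (one trip to the data source) and keep the rows.
        if self._result_cache is None:
            self._result_cache = self._fetch(self._conditions)
        return self

    def filter(self, dont_cache=False, **kwargs):
        conditions = {**self._conditions, **kwargs}
        if dont_cache or self._result_cache is None:
            # Original behavior: a new lazy object that will query when iterated.
            return CachedFilterSet(self._fetch, conditions)
        # Cached rows exist: narrow them down in Python instead of querying again.
        matching = [row for row in self._result_cache
                    if all(row.get(key) == value for key, value in kwargs.items())]
        return CachedFilterSet(self._fetch, conditions, cache=matching)

    def __iter__(self):
        return iter(self.run()._result_cache)


# Made-up data source that counts how often it is queried.
TABLE = [
    {"year": 2009, "month": "Nov"},
    {"year": 2009, "month": "Dec"},
    {"year": 2009, "month": "Jan"},
    {"year": 2008, "month": "Nov"},
]
hits = []


def fetch(conditions):
    hits.append(dict(conditions))
    return [row for row in TABLE
            if all(row[key] == value for key, value in conditions.items())]


ms = CachedFilterSet(fetch, {"year": 2009}).run()    # first and only hit so far
m1 = list(ms.filter(month="Nov"))                    # served from the cached rows
m2 = list(ms.filter(month="Dec"))                    # served from the cached rows
hits_with_cache = len(hits)
m3 = list(ms.filter(month="Jan", dont_cache=True))   # forces a second hit
```

The toy only handles exact-match conditions. A real version would have to reproduce lookups such as __gte or __icontains, ordering, and joins in Python, which is probably where most of the work would be.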
This would handle caching within the context of one view and one queryset, so it shouldn't cross-contaminate with what other web clients are doing, the way a global cache could.
The intent is to retain the original look and feel of the Django ORM: the programmer can keep programming consistently in the Django ORM paradigm, without doing additional DB optimizations by hand, while still getting the benefit of fewer DB hits.
It would be awesome if someone could point me in the right direction with this idea, to see whether it is something that would fit into the Django paradigm.