Hey,
Just curious what you guys think.
Say I’m fetching data related to another class’s object in a view. Something like that, in this example to fetch data to populate Datatables objects with ajax calls (though could be whatever too):
def ajax_comm_details(request):
id = request.GET.get("id")
try:
objs = Entcomm.objects.get(id=id).details.all()
data = {"data": [{"prod":o.prod, "desc":o.desc, "qte":o.qte, "prix":o.prix} for o in objs]}
except:
data = {"data": [{"prod":None, "desc":None, "qte":None, "prix":None}]}
return JsonResponse(data)
Here I need a try-except, since if the template passes me an inexisting ID for whatever reason the .get() is going to fail.
However if instead I do:
def ajax_comm_details(request):
id = request.GET.get("id")
objs = Detcomm.objects.filter(fk_ent__exact=id)
data = {"data": [{"prod":o.prod, "desc":o.desc, "qte":o.qte, "prix":o.prix} for o in objs]}
return JsonResponse(data)
Here class Detcomm has a ForeignKey (fk_ent) to Entcomm, and also has a related_name=“details” on that foreign key to make the syntaxes above possible.
In that specific case (using Datatables), the table rows will display “No data available” (since it got an empty array). Which seems acceptable to me, if the requested data doesn’t exist. And the syntax is a little lighter than wrapping with the try-except.
Any reason to favor one over the other? Purely preferences? Is there likely to be a performance difference?
The latter is likely to be better than the first. Looking at the first:
Entcomm.objects.get(id=id).details.all()
The first part of that:
Entcomm.objects.get(id=id)
returns an object, not a query set - this is query #1. Then, the .details.all()
is query #2 to retrieve the collection of details.
Your second formulation is one query to retrieve the details.
(Note: The improvement is not just with cutting the number of queries in half - it’s also the work that Django needs to do to build the Entcom object from the query result and then build the second query from that.)
Very good point. I was curious I tested it out, just a quick test and your observation seems to bear out.
TL;DR: fetching the object, constructing it & then building the queryset (option #1 above) is roughly 10times slower than using filter on the related table directly (option #2).
I used timeit, something like this, just for a quick test. See timeit lib for details & adapt to your db. My Entfac table has ~10k rows and my Detfac ~35k rows, 37811 being just some random data point, I tried it with a few and the results remain pretty consistent:
python manage.py shell
import timeit
setup_code = """from facturation.models import Entfac, Detfac"""
code1 = """x=Detfac.objects.filter(fk_ent__id=37811)"""
code2 = """x=Entfac.objects.get(id=37811).dets.all()"""
To test & get the results:
print(timeit.repeat(setup=setup_code, stmt=code1, number=10000, repeat=3))
[1.6238743800204247, 1.5671210519503802, 1.569026280019898]
print(timeit.repeat(setup=setup_code, stmt=code2, number=10000, repeat=3))
[16.89584722399013, 16.991071766999085, 15.614984249987174]
Interestingly, if you preftech the object & build it first (and then just run related queries on that one object), this is still slightly slower than .filter():
setup_code = “”“from facturation.models import Entfac, Detfac
… ent=Entfac.objects.get(id=37811)”“”
code2 = “”“x=ent.dets.all()”“”
print(timeit.repeat(setup=setup_code, stmt=code2, number=10000, repeat=3))
[2.156987576978281, 2.123454877990298, 2.1082980600185692]
I do not find that surprising. There is some overhead in the construction of the implicit related field manager. You may find the times to be more comparable if you were to change ent=Entfac.objects.get(id=37811)
to ent=Entfac.objects.prefetch_related('dets').get(id=37811)
Ah - in fact, it appears to be ~3x faster than the .filter(), at least in my setup:
>>> setup_code = """from facturation.models import Entfac, Detfac
... ent=Entfac.objects.prefetch_related('dets').get(id=37811)"""
>>> code2 = """x=ent.dets.all()"""
>>> print(timeit.repeat(setup=setup_code, stmt=code2, number=10000, repeat=3))
[0.05418583896243945, 0.053476020984817296, 0.05059994902694598]
Whereas filter():
[1.6541078349691816, 1.6345764319994487, 1.614007561991457]
Well that’s a bit of a cheat - you shouldn’t factor out that first request in this situation - an apples-to-apples comparison would be to time that pair of lines as a unit. (You’ve changed it from 1 query in each statement to 2 queries in the first statement and 0 in the second.)
Argh yes you’re right - and not it doesn’t help, turns out just as bad as the 10x slower penality from earlier, even somewhat worst.
So yes:
- Reducing the # of queries help (even if the query just returns 1 object)
- Building objects can be slow (in my case, the objects have ~20 columns, plus all the django-specific stuff, I’m sure that’s significant)
- Be careful timing stuff, easy to cheat it out when you’re just banging on the keybaord…
1 Like