I’m new to these forums, but I’ve been using Django professionally for the past few years. Here and there, I’ve found paper cuts that make the Django ORM difficult, and sometimes dangerous, to use on our fast-growing team. I want to highlight one of those areas today and gauge the community’s feelings on it.
We recently had an incident at work that was partially due to our use of the QuerySet.first() method in scenarios where we want the first result of a query or, if none exists, just None. The problem is that we don’t actually care about the ordering of the results, but .first() automatically adds .order_by("pk") if the query is not already ordered. That unnecessary ordering caused Postgres to use the primary key index on our table rather than the index relevant to the .filter(…) in our query, which led to massively degraded performance in a hot path.
So far as I can tell, there are no methods available in QuerySet that are equivalent to .first(), but unordered, and so to achieve similar functionality, you generally have to write several lines of code instead. This creates a natural incentive to use .first() instead of those alternatives (due to the ergonomics), even though it may improperly apply ordering to a query which does not otherwise need or want it.
So my proposal is to take inspiration from the ActiveRecord ORM used by Ruby on Rails, and add a .take() method, documented here: ActiveRecord::Associations::CollectionProxy
Thanks for taking the time to read! Curious to hear others’ thoughts on this.
So far as I can tell, there are no methods available in QuerySet that are equivalent to .first(), but unordered
Would next(iter(qs[:1]), None) address your use case? It would be nice to see the multi-line recipe you’re using now.
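For anyone who wants to try that one-liner, here is a minimal sketch of it wrapped in a helper. The name take is hypothetical (it is not a Django API), and the sketch assumes only that the argument supports slicing, so it runs against a plain list as well as a QuerySet. On an unevaluated, unordered QuerySet the [:1] slice should translate to a LIMIT 1 with no ORDER BY added.

```python
from typing import Optional, Sequence, TypeVar

T = TypeVar("T")


def take(rows: Sequence[T]) -> Optional[T]:
    """Return one row in whatever order the source yields, or None if empty.

    Hypothetical helper, not part of Django.
    """
    # Slice first so that a lazy queryset can limit the query to one row,
    # then pull that row out, falling back to None when nothing comes back.
    return next(iter(rows[:1]), None)


# Plain lists stand in for a queryset here so the sketch runs without Django.
print(take(["a", "b", "c"]))
print(take([]))
```

With a real model this would be called as take(SomeModel.objects.filter(...)), where SomeModel is whatever model you are querying.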
I did have this same use case when I needed to access prefetched relations, and I ended up with the verbose:
(
    my_instance.long_related_model_set.all()[0]
    if my_instance.long_related_model_set.all()
    else None
)
if the query is not already ordered, and this additional unnecessary ordering caused postgres to incorrectly use the primary key index on our table (due to the order by), rather than using the index relevant to the .filter(…) in our query, which led to massively degraded performance in a hotpath.
Model managers are nice for encapsulating query logic such that you know the intended index will be used. That’s where I would implement a take() method on your models that need them.
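A rough sketch of what that could look like, assuming a custom QuerySet subclass exposed through as_manager(). TakeMixin, ArticleQuerySet, Article and the slug field are all made-up names for illustration; only the mixin is executed below, with a list subclass standing in for a queryset so it runs without Django.

```python
class TakeMixin:
    """Hypothetical mixin that adds an unordered take() to a sliceable collection."""

    def take(self):
        # Same idea as next(iter(qs[:1]), None): limit to one row and
        # return it, or None when there are no rows. No ordering is added.
        return next(iter(self[:1]), None)


# In a Django project this might be wired up roughly like so (not run here):
#
#     from django.db import models
#
#     class ArticleQuerySet(TakeMixin, models.QuerySet):
#         pass
#
#     class Article(models.Model):
#         slug = models.SlugField()
#         objects = ArticleQuerySet.as_manager()
#
#     Article.objects.filter(slug="hello").take()


class FakeRows(TakeMixin, list):
    """Stand-in for a queryset so the sketch is runnable on its own."""


print(FakeRows(["first", "second"]).take())
print(FakeRows().take())
```

Because the method lives on the QuerySet subclass, it stays available after .filter() and other chained calls, which is the main reason to prefer this over a method defined only on a Manager.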
Would next(iter(qs[:1]), None) address your use case? It would be nice to see the multi-line recipe you’re using now.
That is better than what we ended up implementing in the rush of incident management (we used .all()[0] wrapped in a try/except that handles IndexError), but it is still more verbose and less discoverable than a built-in QuerySet method. In other words, it still has poor ergonomics.
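For reference, the incident-time workaround had roughly this shape. This is a sketch rather than our exact code, and the function name first_or_none is made up for illustration. It relies on indexing an empty result raising IndexError, which holds for a plain list and, as far as I know, for an integer index on a QuerySet as well.

```python
def first_or_none(rows):
    """Return rows[0], or None if rows is empty.

    Hypothetical helper mirroring the .all()[0] plus try/except workaround.
    """
    try:
        # On an unevaluated queryset, indexing with [0] fetches a single row
        # and does not add an ordering the way .first() does.
        return rows[0]
    except IndexError:
        return None


# Plain lists stand in for SomeModel.objects.filter(...).all() here.
print(first_or_none([10, 20]))
print(first_or_none([]))
```

It works, but it takes four lines and an exception handler to express what .first() does in one call, which is the ergonomics gap I am describing.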
I suppose I just feel like this is a very generic feature, fit for use across really any QuerySet. Most other ORMs have an ergonomic method for getting the first row out of a query without auto-ordering it. It actually feels rather divergent that Django’s ORM does not provide a way to do this, IMO.
Some examples other than ActiveRecord (sorry some aren’t hyperlinks, I’m limited in how many links I can include in a post as a new user):
None of these include auto-ordering that I could see, which already makes Django’s ORM somewhat surprising here (it was to me when I first found out about it!), unless you are coming from a Rails background. IMO, providing a method that supports this kind of query is a very natural API for any SQL ORM to support.
I’m not sure I share your concern here. SQL without ORDER BY is already non-deterministic in the results it returns, and developers at large are generally aware that if they need a deterministic ordering for query results, they need to ask for it explicitly. And of course, .first() will continue to exist, and I think novice developers who are not aware of such concerns are much more likely to find .first() than .take() anyway.
I’m starting to agree with you – I ended up editing my response to show what I resorted to, once I remembered that I had to implement something similar to take the first element out of a prefetched relation.
I’m glad to hear you are finding this at least directionally compelling
Is there anything I could do to help contribute something like this upstream? I would be happy to write up a more detailed specification/feature request somewhere and/or put together a pull request which implements this method to enable further discussion. I am open to whatever you would suggest!
P.S. I just saw that you have recently joined the Django Fellowship team. Congrats on the new role!
Thanks! Let’s wait to hear from more voices. If the response is at least mildly positive in a couple weeks, I would open a ticket at django/new-features and be sure to show the current workarounds.
Although clearly documented, the pk auto-ordering of first() / last() always felt like a surprising side effect to me, for pretty much the same reason @andy-monroe points out above: I simply was not used to this from other ORMs (mainly SQLAlchemy and SQLObject, if anyone still knows it…). If I am not wrong, it is also the only place where the ORM does this, so it is somewhat unexpected even when you know the ORM’s other mechanics.
Instead of introducing yet another API method, wouldn’t it be better to revisit the reasons for the auto-ordering and, if possible, lift that behavior on first() / last()?
@jerch while I am personally in favor of that, this would result in a significant/breaking change for current behavior, which seems well worth avoiding.
For now, we could definitely update the docstring for QuerySet.first() to make the existing behavior much more readily discoverable for newcomers.