How do you avoid N+1 queries when not using the ORM, using sorl for photo thumbnails?

Hi folks,

I have a problem with N+1 querys when not using the Django ORM, where the solution isn’t obvious to me, and I figured asking here might help.

I have a sort of “shared address book” application in Django, with contains a Profile model, which has a photo attribute like so:


# abridged for readability
class Profile(models.Model):
    user = models.OneToOneField(settings.AUTH_USER_MODEL, on_delete=models.CASCADE)
    website = models.URLField(_("website"), max_length=200, blank=True, null=True)
    organisation = models.CharField(
        _("organisation"), max_length=254, blank=True, null=True
    )

    bio = models.TextField(_("bio"), blank=True, null=True)
    visible = models.BooleanField(_("visible"), default=False)
    location = models.CharField(_("location"), max_length=254, blank=True, null=True)
    photo = models.ImageField(
        _("photo"), blank=True, null=True, max_length=200, upload_to="photos"
    )

    # let users tag themselves, to show up in searches to find folks with similar interests / skills
    tags = TaggableManager(blank=True)

    # short_id is a unique identifier for a profile, used in the URL
    short_id = ShortUUIDField(length=8, unique=True, blank=True, null=True)

    # for tracking where this profile was imported from
    import_id = models.CharField(
        _("import_code"), max_length=254, blank=True, null=True
    )
    # we have these as cache columns because when we use object storage to 
    # store photos end up making loads of requests to AWS 
    # just to return an url
    _photo_url = models.URLField(blank=True, null=True)
    _photo_thumbnail_url = models.URLField(blank=True, null=True)
    _photo_detail_url = models.URLField(blank=True, null=True)

This model has two property methods like so, for convenience:


    @property
    def thumbnail_photo(self):
        if not self.photo:
            return None

        if self._photo_thumbnail_url:
            return self._photo_thumbnail_url

        return get_thumbnail(self.photo, "100x100", crop="center", quality=99).url

    @property
    def detail_photo(self):
        """
        A photo, designed for showing on a page, when viewing a profile,
        with a user's details

        """
        if not self.photo:
            return None

        if self._photo_detail_url:
            return self._photo_detail_url

        return get_thumbnail(self.photo, "250x250", crop="center", quality=99).url

In this case here, get_thumbnail is the method imported like this from the sorl library:

from sorl.thumbnail import get_thumbnail

The get_thumbnail method is documented on read the docs.

With me so far? Now for the view - I have a function based view which populates the view context with a bunch of profiles, using a method called fetch_profile_list.

The code looks like this:

def fetch_profile_list(request: HttpRequest, ctx: dict):
    """
    Fetch the list of profiles for the given set of params, and
    populate the provided context dictionary
    """
    filtered_profiles = ProfileFilter(
        request.GET, queryset=Profile.objects.all().prefetch_related("tags")
    )

    ctx["profile_filter"] = filtered_profiles

    pager = paginator.Paginator(filtered_profiles.qs, NO_PROFILES_PER_PAGE)
    page = request.GET.get("page", 1)

    try:
        ctx["paginated_profiles"] = pager.page(page)
    except paginator.PageNotAnInteger:
        ctx["paginated_profiles"] = pager.page(1)
    except paginator.EmptyPage:
        ctx["paginated_profiles"] = pager.page(paginator.num_pages)

    return ctx

@login_required
def homepage(request):
    current_site = get_current_site(request)
    logger.info(f"{current_site=}")

    ctx = {"is_authenticated": request.user.is_authenticated, "site": current_site}

    ctx = fetch_profile_list(request, ctx)

    if request.htmx:
        template_name = "pages/_home_partial.html"
    else:
        template_name = "pages/home.html"

    return render(request, template_name, ctx)

Finally, in the template hierarchy, we end up calling this template where we iterate through a list of profiles on a page:


{% for profile in paginated_profiles %}
  <div>
    <div>
      <div class="profile-thumb">
        {% if profile.photo %}
          <img class="profile-thumb" src="{{ profile.thumbnail_photo }}" />
        {% else %}
          {% gravatar profile.user.email 150 %}
        {% endif %}
      </div>
      <div>
          <p>
          {% for tag in profile.tags.all %}
            <span class="badge">{{ tag.name }}</span>
          {% endfor %}
        </p>
      </div>
    </div>
  </div>
{% endfor %}

How do I get rid of the N+1 query when fetching the photos?

I think this line with is the problem, causing a N+1 query:

<img class="profile-thumb" src="{{ profile.thumbnail_photo }}" />` 

Sentry at least is flagging this up as such, and I see the same tell-tale signs of the N+1 behaviour in django debug too.

The problem is, I’m not sure of the idiomatic way to get rid of N+1 queries in this scenario.

The select_related() and prefetch_related() methods don’t seem to be designed for this (they work with the ORM), and while I know it’s probably better to fetch this in one go in the view, I’m a little unsure of the approach to take.

Should I be trying to force the query to be executed in the view ahead of time for all the images, using some call to values() on a queryset, and relying on it being cached somehow?


# maybe something like this?
filtered_profiles = ProfileFilter(
        request.GET, queryset=Profile.objects.all().prefetch_related("tags").values(["fields, "I", "expect_to_use"])
)

pager = paginator.Paginator(filtered_profiles.qs)

Or is there another approach that’s more idiomatic?

Thanks in advance.

For those curious the link to the unabridged project code

The project is called constellate - it’s designed to support networking and skill discovery within communities which are not entirely private, but not entirely public either, like https://climateAction.tec.

The main use case it covers right now is creating a “shared address book” for people who are in the same slack channel, allowing them to sign into the app using their slack credentials, and see other extended profiles, of members who choose to make themselves visible.

See the Profile model code

See the home view code

See the template code calling the photo urls

Hey there!
It would be great to have the actual query that is being duplicated. You can figure that out using django-debug-toolbar and looking up on the SQL window.
With that, then you can bottle down to find which part of the code is emmiting that query, and if you have any way of addressing it.
Looking forward to help you on it, cheers.

Sure:

I think the repeated queries in the paginated profile are below:


SELECT thumbnail_kvstore.key, thumbnail_kvstore.value
FROM thumbnail_kvstore
WHERE thumbnail_kvstore.key = %s
LIMIT 21

and here:


SELECT users_user.id, users_user.password, users_user.last_login, users_user.is_superuser,
  users_user.username, users_user.first_name, users_user.last_name, users_user.email,
  users_user.is_staff, users_user.is_active, users_user.date_joined, users_user.name
FROM users_user
WHERE users_user.id = %s
LIMIT 21

Screenshot below:

Offscreen, you can see somethign like 124 hidden spans, which show the similar image thumbnail lookup queries

Right.
I had to dig a bit into the source code of the sorl library to understand it, as i haven’t used it yet.
But it looks like you’re using the CachedDbKvStore, this makes a query everytime a thumbnail is not present on the cache. And there’s no relation between this table and your user/profile model. So it won’t be easy to reduce the number of queries here.
My suggestion is that you try to setup a redis kv store, it will be probably faster and won’t do any additional queries. There’s aint much more documentation though, hope this helps anyway.

1 Like

Thanks Leandro,

I appreciate the time you took to look into this - I won’t lie, I was really hoping to introduce another component to the stack here, even if Redis woudl be a really sound choice for providng a cache.

Your response reminded me to sanity check whether I’m calling this code here to make sure that profiles have the cached versions of files available when creating Profile objects in the first place:

def update_thumbnail_urls(self):
        """Generate the thumbnails for a profile"""
        if self.photo:
            self._photo_url = self.photo.url
            self._photo_thumbnail_url = get_thumbnail(
                self.photo, "100x100", crop="center", quality=99
            ).url
            self._photo_detail_url = get_thumbnail(
                self.photo, "250x250", crop="center", quality=99
            ).url

            self.save(
                update_fields=[
                    "_photo_thumbnail_url",
                    "_photo_detail_url",
                    "_photo_url",
                ]
            )

This is a method on the profile model, and it’s something I’d ideally call whenever a profile is saved as well as when they’re imported for the first time.

It turns out I hadn’t made sure this was run when importing users for the first time, so even though we had this code here:

def thumbnail_photo(self):
    if not self.photo:
        return None

    if self._photo_thumbnail_url:
        return self._photo_thumbnail_url

    return get_thumbnail(self.photo, "100x100", crop="center", quality=99).url

Because self._photo_thumbnail_url was never set, we’d always need to do the expensive call to get_thumbnail() each time.

This was the source of all the n+1 queries, and storing the value on the model as _photo_thumbnail_url removed the need to hit any cache at all.

I’d likely need to make sure update_thumbnail_urls is called when saving the profile to ensure that the cached photos are always up to date, but broadly speaking, I think my problem is solved.

While I was there, I realised another source of the N+1 queries was me only prefetching one model here, can fetching info from the user model that was related to the Profile. So, even though I had this here - I was only pre_fetching the tag model:

filtered_profiles = ProfileFilter(
        request.GET, queryset=Profile.objects.all().prefetch_related("tags")
    )

Updating it to use this removed that other source of N+1 queries:

Profile.objects.all().prefetch_related("tags", "users")

We’ve gone from more than 120 queries, to about 13. Much better. Thanks for replying - I don’t think I would have fixed as quickly this without your response :+1:

Great to know that you found the root issue!

About this comment:

This is something that happens a lot and i think everyone have at least once shot itself on the foot with this. A good way to solve this is:

  • using the signals framework that comes built-in with django (i tend to avoid as much as i can); or
  • having this on your clean method, and making sure you always call full_clean before the save call. This comes from some tips from the HackSoftware styleguide that is really useful.

Cheers.