Two almost same queries, but one takes 50x more time

Hello,

I have added new endpoint to my API which returns list of articles and list of videos. The data model for both is basically the same - title, link, timestamps and FK to website model (for articles) or YouTube channel (for videos). I am making sure to prefetch these to avoid N+1.

While checking with the Django Debug Toolbar, I initially noticed that the query to fetch articles takes like 3ms but the videos one took 170ms. The only difference I was able to identify was index on is_visible on articles, which I did not have on videos for similar property. After adding it, the videos query dropped to 45ms, which is big improvement but still slow compared to few ms for the articles. See the image.

Any tips where to look for issues? I am caching this so it has very little real world impact but I am curious what is going on.

The videos have couple more filters related to the avatar_url but I tried temporarily removing it and I did not help things.

I also have separate list endpoints for articles and videos which have basically the same queries and the discrepancy exists there also. The articles one is like 5ms while the videos one is 130ms.

Below is my quite MVP code :smiley:

@staticmethod
    def fetch_data(parsed_languages):
        videos_queryset = Video.objects.valid().select_related('feed').exclude(feed__avatar_url__isnull=True).exclude(
            feed__avatar_url__exact="")
        filtered_videos = None

        if parsed_languages:
            filtered_videos = videos_queryset.filter(feed__language_code__in=parsed_languages)

        if not filtered_videos:
            filtered_videos = videos_queryset.filter(feed__language_code__iexact="en")

        news_queryset = NewsItem.objects.filter(is_visible=True, website__is_enabled=True).exclude(
            social_preview_url__exact="").select_related('website')
        filtered_news = None

        if parsed_languages:
            filtered_news = news_queryset.filter(website__language_code__in=parsed_languages)

        if not filtered_news:
            filtered_news = news_queryset.filter(website__language_code__iexact="en")

        return {
            'news': NewsItemSerializer(filtered_news[:30], many=True).data,
            'video': VideoSerializer(filtered_videos[:30], many=True).data
        }

What does an EXPLAIN show for these two queries? That’s generally the best first place to check when trying to identify a performance problem with a query.

Without seeing an explain, I’m going to guess that it’s the AND NOT (... IS NULL) and the AND NOT ... (... AND ... IS NOT NULL) clauses being the “expensive” operations here.

Knowing the cardinality of these models would help as well. If there’s an order-of-magnitude difference in sizes of these modules, that would explain some of this, too.

Or, you may see a benefit by adding indexes on the YouTubeFeed.is_visible and Video.visible fields.

Do you mean this explain from the Debug Toolbar or something else?

This is for the videos, the explain for the news doesn’t have the third step on bottom, but this is first time I am checking this explain.

I’ve never used that, I’ve always used the EXPLAIN (and EXPLAIN ANALYZE) command in psql. I’m guessing it is probably able to get the same information.

I think I managed to find it. I noticed that the created field for articles had DB index while the published property on videos did not. After adding the index, the queries are now idential speed-wise. :exploding_head:

Seems like I have some studying to do around indexes. I know about N+1 and similar but would never guess that ordering can cause such difference.

1 Like

Depending upon the cardinality of the tables involved, indexing is almost always the first thing to consider to avoid full-table scans in queries.

1 Like