Counting how many authors have at least one book

I’m looking to get both the following statistics by country code: number of authors and the number of authors that have at least one related book.

I’ve got something working but I’m curious if there’s a simpler or nicer way to do it.

Models

class Author(models.Model):
    name = fields.CharField(max_length=100)
    country_code_choices = [
        ('GB', 'United Kingdom'),
        ('US', 'United States'),
        # ...
    ]
    country_code = fields.CharField(max_length=2, choices=country_code_choices)
    
    
class Book(models.Model):
    author = fields.ForeignKey(Author, on_delete=models.CASCADE, related_name='books')
    title = fields.CharField(max_length=300)

Desired output

+--------------+-------+----------------+
| Country code | Count | Count has book |
+--------------+-------+----------------+
| GB           |    12 |              4 |
| US           |    50 |             30 |
+--------------+-------+----------------+

My attempt

author_book_subquery = Book.objects.filter(author_id=OuterRef('id'))
Author.objects.annotate(
    has_book=Exists(author_book_subquery)
).annotate(
    has_book_as_int=Case(
        When(has_book=True, then=Value(1)),
        default=Value(0),
        output_field=IntegerField(),
    )
).values(
    'country_code'
).annotate(
    num_authors=Count('*'),
    num_authors_with_book=Sum(has_book_as_int),
)

Any alternative ideas would be greatly appreciated.

You can use a conditional aggregation, e.g.

author_book_subquery = Book.objects.filter(author_id=OuterRef('id'))
Author.objects.values(
    'country_code'
).annotate(
    num_authors=Count('pk'),
    num_authors_with_book=Count('pk', filter=Exists(author_book_subquery)),
)
1 Like

Your suggestion looks a lot nicer! Unfortunately I’m getting the following error when trying to use it:

Traceback (most recent call last):
  File "/home/andrew/Code/derp/python/djangobooks/venv/lib/python3.8/site-packages/django/core/handlers/exception.py", line 55, in inner
    response = get_response(request)
  File "/home/andrew/Code/derp/python/djangobooks/venv/lib/python3.8/site-packages/django/core/handlers/base.py", line 197, in _get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
  File "/home/andrew/Code/derp/python/djangobooks/library/views.py", line 24, in authors_overview
    for row in author_stats:
  File "/home/andrew/Code/derp/python/djangobooks/venv/lib/python3.8/site-packages/django/db/models/query.py", line 320, in __iter__
    self._fetch_all()
  File "/home/andrew/Code/derp/python/djangobooks/venv/lib/python3.8/site-packages/django/db/models/query.py", line 1507, in _fetch_all
    self._result_cache = list(self._iterable_class(self))
  File "/home/andrew/Code/derp/python/djangobooks/venv/lib/python3.8/site-packages/django/db/models/query.py", line 130, in __iter__
    for row in compiler.results_iter(
  File "/home/andrew/Code/derp/python/djangobooks/venv/lib/python3.8/site-packages/django/db/models/sql/compiler.py", line 1312, in results_iter
    results = self.execute_sql(
  File "/home/andrew/Code/derp/python/djangobooks/venv/lib/python3.8/site-packages/django/db/models/sql/compiler.py", line 1348, in execute_sql
    sql, params = self.as_sql()
  File "/home/andrew/Code/derp/python/djangobooks/venv/lib/python3.8/site-packages/django/db/models/sql/compiler.py", line 573, in as_sql
    extra_select, order_by, group_by = self.pre_sql_setup()
  File "/home/andrew/Code/derp/python/djangobooks/venv/lib/python3.8/site-packages/django/db/models/sql/compiler.py", line 64, in pre_sql_setup
    self.setup_query()
  File "/home/andrew/Code/derp/python/djangobooks/venv/lib/python3.8/site-packages/django/db/models/sql/compiler.py", line 55, in setup_query
    self.select, self.klass_info, self.annotation_col_map = self.get_select()
  File "/home/andrew/Code/derp/python/djangobooks/venv/lib/python3.8/site-packages/django/db/models/sql/compiler.py", line 284, in get_select
    sql, params = self.compile(col)
  File "/home/andrew/Code/derp/python/djangobooks/venv/lib/python3.8/site-packages/django/db/models/sql/compiler.py", line 494, in compile
    sql, params = vendor_impl(self, self.connection)
  File "/home/andrew/Code/derp/python/djangobooks/venv/lib/python3.8/site-packages/django/db/models/expressions.py", line 25, in as_sqlite
    sql, params = self.as_sql(compiler, connection, **extra_context)
  File "/home/andrew/Code/derp/python/djangobooks/venv/lib/python3.8/site-packages/django/db/models/aggregates.py", line 118, in as_sql
    return sql, params + filter_params
TypeError: can only concatenate list (not "tuple") to list

Any ideas what’s causing that?

Notice the section in your code that is flagged by that error. We would need to see that section of the code in your view.

Here’s the view code that causes the error:


def authors_overview(request):
    author_book_subquery = Book.objects.filter(author_id=OuterRef('id'))
    author_stats = Author.objects.values(
        'country_code'
    ).annotate(
        num_authors=Count('pk'),
        num_authors_with_book=Count('pk', filter=Exists(author_book_subquery)),
    )

    output = ''
    for row in author_stats:
        output += f'{row["country_code"]}: {row["num_authors"]}, {row["num_authors_with_book"]}\n'
    return HttpResponse(output, content_type='text/plain')

This is a bug in Django, fixed in (Django 4.1+):

What if you wrap it in Q()?

        num_authors_with_book=Count('pk', filter=Q(Exists(author_book_subquery))),
1 Like

Would changing the expression to an isnull test help?

e.g.

num_authors_with_book=Count('pk', filter=Q(book__isnull=False))),

or would that generate the same query?

@felixxm Wrapping with a Q seems to work!

@KenWhitesell I tried this method before going with the Exists approach. If an Author has more than one book then it causes that Author to be counted more than once. Adding a distinct doesn’t seem to help either.

author_stats = Author.objects.values(
    'country_code'
).annotate(
    num_authors=Count('id'),
    num_authors_with_book=Count('pk', filter=Q(books__isnull=False)),
).distinct()

The Q(Exists(subquery)) approach worked in my test project on 4.0.3 but unfortunately errors my Django 2.2.27 project. Here’s the error I get in 2.2.27:

Traceback (most recent call last):
  File "/home/andrew/Code/derp/python/djangobooks22/venv/lib/python3.8/site-packages/django/core/handlers/exception.py", line 34, in inner
    response = get_response(request)
  File "/home/andrew/Code/derp/python/djangobooks22/venv/lib/python3.8/site-packages/django/core/handlers/base.py", line 115, in _get_response
    response = self.process_exception_by_middleware(e, request)
  File "/home/andrew/Code/derp/python/djangobooks22/venv/lib/python3.8/site-packages/django/core/handlers/base.py", line 113, in _get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
  File "/home/andrew/Code/derp/python/djangobooks22/library/views.py", line 10, in authors_overview
    author_stats = Author.objects.values(
  File "/home/andrew/Code/derp/python/djangobooks22/venv/lib/python3.8/site-packages/django/db/models/query.py", line 1056, in annotate
    clone.query.add_annotation(annotation, alias, is_summary=False)
  File "/home/andrew/Code/derp/python/djangobooks22/venv/lib/python3.8/site-packages/django/db/models/sql/query.py", line 999, in add_annotation
    annotation = annotation.resolve_expression(self, allow_joins=True, reuse=None,
  File "/home/andrew/Code/derp/python/djangobooks22/venv/lib/python3.8/site-packages/django/db/models/aggregates.py", line 48, in resolve_expression
    c.filter = c.filter and c.filter.resolve_expression(query, allow_joins, reuse, summarize)
  File "/home/andrew/Code/derp/python/djangobooks22/venv/lib/python3.8/site-packages/django/db/models/query_utils.py", line 93, in resolve_expression
    clause, joins = query._add_q(self, reuse, allow_joins=allow_joins, split_subq=False)
  File "/home/andrew/Code/derp/python/djangobooks22/venv/lib/python3.8/site-packages/django/db/models/sql/query.py", line 1315, in _add_q
    child_clause, needed_inner = self.build_filter(
  File "/home/andrew/Code/derp/python/djangobooks22/venv/lib/python3.8/site-packages/django/db/models/sql/query.py", line 1187, in build_filter
    arg, value = filter_expr
TypeError: cannot unpack non-iterable Exists object

Is there another approach that will work in 2.2.27?

I don’t think so, it needs an extra annotation in Django < 3.0, because support for using boolean expressions was added in Django 3.0:

https://docs.djangoproject.com/en/stable/releases/3.0/#filter-expressions

You should also take into account that support for Django 2.2 ends in April, 2022.

Okay. I think I’ll go with this approach as a stop gap then:

author_book_subquery = Book.objects.filter(author_id=OuterRef('id'))
author_stats = Author.objects.annotate(
    has_book=Exists(author_book_subquery),
).values(
    'country_code'
).annotate(
    num_authors=Count('id'),
    num_authors_with_book=Count('id', filter=Q(has_book=True)),
)

Thanks so much @felixxm and @KenWhitesell for your help and all your contributions to Django!

P.S. I’ll get upgraded off 2.2 soon, thanks for the heads up :slight_smile: