Annotate duration/integer -> MySql error

Hello all,

models.py:

class User(models.Model):
    email = models.EmailField(...)
    created_datetime = models.DatetimeField(...)

class UserLog(models.Model):
    created_datetime = models.DatetimeField(...)
    user = models.ForeignKey('user.User' ...)
    login = models.BooleanField('Log in' ..)

And the following query:

users = User.objects.filter(
    Q(...)
).annotate(
    login_count=SubqueryCount(
        UserLog.objects.filter(
            user__pk=OuterRef('pk'),
            login=True,
        ),
       output_field=models.IntegerField(),
    ),
    login_frequency=ExpressionWrapper(
        (Now() - F('created_datetime')) / F('login_count'),
        output_field=models.DurationField()
    ),
)

SubqueryCount being this:

class SubqueryCount(Subquery):
    template = "(SELECT COUNT(*) FROM (%(subquery)s) _count)"

The error is:

MySQLdb._exceptions.ProgrammingError: (1064, "You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '/ (SELECT COUNT(*)

The problematic generated SQL seems to be this:

(
  INTERVAL TIMESTAMPDIFF(
    MICROSECOND,
    `user_user`.`created_datetime`,
    CURRENT_TIMESTAMP
  ) MICROSECOND / (
    SELECT
      COUNT(*)
    FROM
      (
        SELECT
          U0.`id`
        FROM
          `user_userlog` U0
        WHERE
          (
            U0.`user_id` = `user_user`.`id`
            AND U0.`login`
          )
      ) _count
  )
) AS `login_frequency`,

Django 3.2.12, Python 3.9.3, Mysql 8.0.28

Unless there’s more to this issue than what you’re showing, you’re making this a lot harder than it needs to be.

You should be able to do something like:

users = User.objects.filter(...).annotate(login_count=Count('userlog', filter=Q(userlog__login=True))).annotate(login_frequency=...)

See the docs and examples at Aggregation | Django documentation | Django

mm using Count does not fix the issue (the counts themselves are ok in both cases):

(
  INTERVAL TIMESTAMPDIFF(
    MICROSECOND,
    `user_user`.`created_datetime`,
    CURRENT_TIMESTAMP
  ) MICROSECOND / COUNT(
    CASE WHEN `user_userlog`.`login` THEN `user_userlog`.`id` ELSE NULL END
  )
) AS `login_frequency`,

Same error: Mysql seems not to like the line ) MICROSECOND / COUNT(

Not that it does you any good, but this works fine on PostgreSQL.

To try and narrow it down some, I’d try experiment a bit with that expression. (For example, removing the division so that you’re just calculating the difference. Or maybe replacing one or more of the field references with other variables - anything to try and change what’s being generated.)

Quick update:

I manually replaced INTERVAL TIMESTAMPDIFF( with TIMESTAMPDIFF( and it works.

So, could it be django generating something problematic?

Possibly?

From my cursory reading of the MySQL docs, the formation of the query may more likely be this:

It’s possible that you may want to try adding an extra set of parens around that expression.

e.g.

By the way, that expression appears to be coming from django.db.backends.mysql.operations, the format_for_duration_arithmetic method within the DatabaseOperations class.

hehe was looking at that def right now as well :slight_smile:

trying the () wrapping… (edit) and no, doesn’t work.

Not that you necessarily want to do this, but the only other thing I can think of would be to divide this into two expressions.
e.g.

users = User.objects.filter(
    Q(...)
).annotate(
    login_count=SubqueryCount(
        UserLog.objects.filter(
            user__pk=OuterRef('pk'),
            login=True,
        ),
       output_field=models.IntegerField(),
    ),
    login_duration=ExpressionWrapper(
        (Now() - F('created_datetime')),
        output_field=models.DurationField()
    ),
).annotate(
    login_frequency=ExpressionWrapper(
        F('login_duration') / F('login_count'),
        output_field=models.DurationField()
    )
)

(Sorry I can’t be any real help here - this gets into areas well outside my realm of knowledge.)

alas does not work. Nice idea though.

Another thing that generally bothers me is … when an annotation is created, for example:

login_count=Count('userlog', filter=Q(userlog__login=True))
COUNT(
    CASE WHEN `user_userlog`.`login` THEN `user_userlog`.`id` ELSE NULL END
  ) AS `login_count`,

why does django not re-use the SQL variable login_count in subsequent annotations? Instead it re-runs the complete expression COUNT().

I am sure there is a good reason for this, but I would love to understand it… Or I have been using annotations wrongly :slight_smile:

That is an interesting question. I could make some guesses, but that’s all it would be.

(Idle thought - I wonder if that’s a “Django-level” situation or specific to MySQL. I’ll have to see if I can try this on either / both PostgreSQL and Sqlite.)

or a WHERE versus HAVING thing?

1 Like

interesting. Running a similar query on SQLlite yields:

class Person(models.Model):
    created_datetime = models.DateTimeField('Created at', null=False, blank=False, default=timezone.now)
    name = models.CharField('Name', max_length=128, null=True, blank=True)
    parent = models.ForeignKey('self', null=True, blank=True, on_delete=models.CASCADE)
    
    def __str__(self):
        return self.name

class PersonLog(models.Model):
    person = models.ForeignKey('person.Person', null=True, blank=True, on_delete=models.CASCADE)
    created_datetime = models.DateTimeField('Created at', null=False, blank=False, default=timezone.now)
    login = models.BooleanField('Login', null=True, blank=True)
    
    def __str__(self):
        return self.person.name

the query:

persons = Person.objects.all().annotate(
    log_login_count=Count(
        'personlog',
        filter=Q(personlog__login=True)
    ),
    log_login_latest_datetime=Subquery(
        PersonLog.objects.filter(
            person__pk=OuterRef('pk'),
            login=True
        ).order_by(
            '-created_datetime'
        ).values(
            'created_datetime'
        )[:1],
        output_field=models.DateTimeField(),
    ),
    # log_login_duration_over=F('log_login_latest_datetime') - F('created_datetime'),
    log_login_duration_over=Now() - F('created_datetime'),
    log_login_frequency=ExpressionWrapper(
        F('log_login_duration_over') / F('log_login_count'),
        output_field=models.DurationField(),
    ),
).values()

… and the resulting SQL:

SELECT
  "person_person"."id",
  "person_person"."created_datetime",
  "person_person"."name",
  "person_person"."parent_id",
  COUNT("person_personlog"."id") FILTER (
    WHERE
      "person_personlog"."login"
  ) AS "log_login_count",
  (
    SELECT
      U0."created_datetime"
    FROM
      "person_personlog" U0
    WHERE
      (
        U0."login"
        AND U0."person_id" = ("person_person"."id")
      )
    ORDER BY
      U0."created_datetime" DESC
    LIMIT
      1
  ) AS "log_login_latest_datetime",
  django_timestamp_diff(
    CURRENT_TIMESTAMP,
    "person_person"."created_datetime"
  ) AS "log_login_duration_over",
  (
    django_format_dtdelta(
      '/',
      django_timestamp_diff(
        CURRENT_TIMESTAMP,
        "person_person"."created_datetime"
      ),
      COUNT("person_personlog"."id") FILTER (
        WHERE
          "person_personlog"."login"
      )
    )
  ) AS "log_login_frequency"
FROM
  "person_person"
  LEFT OUTER JOIN "person_personlog" ON (
    "person_person"."id" = "person_personlog"."person_id"
  )
GROUP BY
  "person_person"."id",
  "person_person"."created_datetime",
  "person_person"."name",
  "person_person"."parent_id",
  django_timestamp_diff(
    CURRENT_TIMESTAMP,
    "person_person"."created_datetime"
  )

Apparently variables are not re-used, and re-computed each time.

Oh and it works., frequency get output as expected…

I really fail to understand what I am doing wrong. Is there a way to raise this to the django team so they can chime in?

managed to make it work, but I do believe this is ugly…? :slight_smile:

login_frequency=Cast(
    ExpressionWrapper(
        Cast(F('login_duration_over'), output_field=models.BigIntegerField()) / F('login_count'),
        output_field=models.BigIntegerField()
    ),
    output_field=models.DurationField()
),

This forces the db-side division to use bigints for the arithmetic operation. Then it all gets wrapped in a DurationField to get a timedelta client-side.

There has to be a better way! :stuck_out_tongue:

the resulting SQL gets output:

CAST(
  (
    CAST(
      TIMESTAMPDIFF(
        MICROSECOND,
        `user_user`.`created_datetime`,
        (
          SELECT
            U0.`created_datetime`
          FROM
            `user_userlog` U0
          WHERE
            (
              U0.`user_id` = `user_user`.`id`
              AND U0.`login`
            )
          ORDER BY
            U0.`created_datetime` DESC
          LIMIT
            1
        )
      ) AS signed integer
    ) / COUNT(
      CASE WHEN `user_userlog`.`login` THEN `user_userlog`.`id` ELSE NULL END
    )
  ) AS signed integer
) AS `login_frequency`,

You could open up a ticket - that’s the most sure way that someone would take a look at it.