using annotate to return number of users between two dates

Hi,

I’m having trouble trying to calculate the number of users between today and yesterday and then i will present this is a table along with the project name, current count and change daily.

I’m thinking that annotate is the correct way to do this so that the processing is executed on the SQL?

The model:

class Usergrowth(models.Model):
    project = models.ForeignKey(Project, on_delete=models.CASCADE)
    capture_date = models.DateField(null=False)
    user_count = models.IntegerField(null=False)

    def __str__(self):
        return str(self.project)

The function:

    yesterday = Usergrowth.objects.filter(
        project=F('project'), 
        capture_date=(current_datetime) - timedelta(days=1)
        ).values('user_count')

    user_count = Usergrowth.objects.annotate(
        user_count_yesterday = yesterday,
        difference = ('user_count' - F('user_count_yesterday')),output_field=models.IntegerField()
    )

But this is throwing:
TypeError: QuerySet.annotate() received non-expression(s): <django.db.models.fields.IntegerField>.

Where am i going wrong? Probably in loads of places :frowning:

Thanks

Tom.

Unfortunately, you’re right.

Let’s start with the first issue.

capture_date is a DateField. That means that it’s not timezone aware. However, a “current_datetime” is likely going to be timezone aware. That means you may not be looking at the proper rows to get these counts.

Then

F(‘project’) is going to retrieve the value of the project field of the model. It’s then going to compare that value to the project field of that same model.
They’re always going to be the same. That filter isn’t going to filter out any rows.

In the line:

Your reference to 'user_count' is incorrect here. This is a field within the model that you’re trying to access, and so you need to use the F function to identify it as such.

On the other hand, you’re annotating the user_count query with a value for user_count_yesterday, and then using that field for the annotation for difference. You could simply use your value for yesterday in the calculation for difference.

Finally, I think you’re trying to use the query defined for yesterday as a subquery - it needs to be written as such if you want this to be done as a single query. (Otherwise, what you have here are two separate queries - the first to get yesterday’s user_count and the second to calculate the difference.) It depends upon whether you’re trying to run this as one query for multiple projects, or if this is one query for one project. The intent is not clear here.

Thanks for the reply, Ken.

All I’m really after is to have the difference between yesterday’s and today’s users available within a single queryset so that I can present the data in a table showing all projects (project name, current count (value as of today)), and the difference. Basically just trying to show the increase or decrease of users day on day.

Does it make sense to do this all in a single query? The table for this particular view is going to show all projects, not just 1, but it might be that later on that it could be for a single project view.

Thanks

Tom.

Hi, Ken,

Am i right in thinking that i would use this as the subquery (once fixed) within annotate

something like Project.objects.annotate(...=Subquery(yesterday...)) ?

I don’t know - it’s not clear from what you’ve posted so far what your ultimate objective is for this query.

What do you want your final results to be? (What do you want the output to be?)

Ignore that last post, I am going over the docs, trying different things and getting myself in a mess.

This is my current approach which I feel is getting closer to what I need, which is the Sum of user_count minus yesterdays user_count. I’d like this as a parameter delta within the queryset for every project.

I’ve taken this approach to setting the time zone to be consistent.

def get_london_time():
    tz = pytz.timezone('Europe/London')
    return datetime.datetime.now(tz)

which i call using
current_date = get_london_time().date()
i then get yesterdays date using
yesterday = current_date - timedelta(days=1)

Using this i can get the delta parameter added:

    whatdoesthisreturn = Usergrowth.objects.filter(
        capture_date=yesterday
    ).values('user_count').annotate(delta=Sum('user_count')).values()
    
    print(whatdoesthisreturn)

but what I can’t do is minus yesterday’s user_count from today’s user_count within the single annotate.

For the purpose of my testing, I was just using Sum('user_account) to see what was being returned. But what I need is annotate(delta=Sum(user_account - #how do i get yesterday count here#)

Not sure if that is any clearer.

Thanks

Tom

Sorry, it’s still not really any clearer to me what your objective is.

Let’s start from an example. Let’s say that Usergrowth has the following data in it

project  |	 capture_date  |  user_count
1        |    2023-01-01   |        1
1        |    2023-01-02   |        2
1        |    2023-01-03   |        4
1        |    2023-01-04   |        8
2        |    2023-01-01   |        2
2        |    2023-01-02   |        3
2        |    2023-01-03   |        7
2        |    2023-01-04   |       19

What should your output be from your query if “today” is 2023-01-03?

What should your output be from your query if “today” is 2023-01-04?

Ok sorry, i wasn’t clear.
So what you have here is how my data looks in the database.

Now within my template, i have a datatable that shows:

Project Name | Users | **want Delta here**
Project 1    | 8     | 4
Project 2    |19     |12

Delta would show for project 1 the most recent count 8 minus yesterdays 4 to give me a difference of 4?

Great! So, the situation is that there’s no aggregation involved. There’s no need for the Sum function anywhere in this.

For the purposes of this discussion, I’m going to assume you’ve already figured out today and yesterday. So, for each Project, I want today’s user_count and today’s user_count - yesterday’s user_count.

This is going to be a query based on Usergrowth.

Usergrowth.objects.filter(capture_date=today)
will give us today’s count (among the rest of the data).

Usergrowth.objects.filter(capture_date=yesterday)
will give yesterday’s count. This is what we’re going to (basically) use as the subquery. Additionally, we need to match this to the project in the “outer” query, which is where the OuterRef function comes into play. So the subquery is going to look like:
Usergrowth.objects.filter(project=OuterRef('project'), capture_date=yesterday)
We then only want the user_count field, so we’ll use the values function to specify the data element to return.

Put it all together and…

Usergrowth.objects.filter(capture_date=today).annotate(
    yesterday_count=Subquery(
        Usergrowth.objects.filter(
            project=OuterRef('project'), capture_date=yesterday
        ).values('user_count')
    ),
    difference=F('user_count') - F('yesterday_count')
)
1 Like

Amazing. Thanks, Ken.

1 Like