How to compute the average score for each user, by time interval?

All we need is an easy explanation of the problem, so here it is.

I have a table User that only has an id and a name. A user has many Games, which is a table that contains an id, a score, inserted_at and a user_id referencing the User table. I would like that my query returns the following result:

label user_id data
Jacques 1 {null, 1, 4, 3, null, 6, null, 8, 9, null, 2, 5}
Marine 2 {null, null, null, 4, 6, 2, 8, 6, 10, 10, 3, 8}

where label is the name of the user and data is the average score of each games, per user, per unit of time within a time interval (here it is per month for the last 12 months) from this query:

SELECT DATE_TRUNC('month', generate_series(LOCALTIMESTAMP - INTERVAL '11 month', LOCALTIMESTAMP, '1 month')) as time_unit)

which return this result:

"2021-07-01 00:00:00"
"2021-08-01 00:00:00"
"2021-09-01 00:00:00"
"2021-10-01 00:00:00"
"2021-11-01 00:00:00"
"2021-12-01 00:00:00"
"2022-01-01 00:00:00"
"2022-02-01 00:00:00"
"2022-03-01 00:00:00"
"2022-04-01 00:00:00"
"2022-05-01 00:00:00"
"2022-06-01 00:00:00"

The unit of time can be changed, in the above query, it is every months from the last 12 months, but it could be every months from the last 6 months, or every day for the last 2 weeks.

The NULL are important, because this data is then to be fed to Chart.js, so I need to know when a user hasn’t played a game and scored anything.

So far I have two attempts:

Attempt 1:

(SELECT time_unit, avg, u.name as user_name, u.id as user_id
FROM users u
CROSS JOIN (
    SELECT time_unit, AVG(ROUND(g.score, 1))
    FROM (SELECT DATE_TRUNC('month', generate_series(LOCALTIMESTAMP - INTERVAL '3 year', LOCALTIMESTAMP, '1 month')) as time_unit) as time_unit_query
    LEFT JOIN games g
    ON DATE_TRUNC('month', g.inserted_at) = time_unit
    LEFT JOIN users u
    ON u.id = g.user_id
    GROUP BY time_unit, u.name, u.id
) as q1
ORDER By 1);

Which returns the roughly this result:

  time_unit | score | user_name | user_id
  ...
  "2021-05-01 00:00:00",  null,                 "Marine",     1
  "2021-05-01 00:00:00",  null,                 "Jacques",  2
  "2021-05-01 00:00:00",  null,                 "Jean",     3
  "2021-06-01 00:00:00",    6.0000000000000000, "Marine",     1
  "2021-06-01 00:00:00",    6.0000000000000000, "Jacques",  2
  "2021-06-01 00:00:00",    6.0000000000000000, "Jean",     3
  "2021-06-01 00:00:00",    7.8333333333333333, "Marine",     1
  "2021-06-01 00:00:00",    7.8333333333333333, "Jacques",  2
  "2021-06-01 00:00:00",    7.8333333333333333, "Jean",     3
  ...

As you can see, the results are wrong because all users have the same score, which is not the case in the database.

Attempt 2:

SELECT u.name AS label,
    u.id,
    ARRAY(
        SELECT AVG(ROUND(g.score))
        FROM (SELECT DATE_TRUNC('month', generate_series(LOCALTIMESTAMP - INTERVAL '1 year', LOCALTIMESTAMP, '1 month')) as time_unit) as time_unit_query
        LEFT JOIN games g
        ON DATE_TRUNC('month', g.inserted_at) = time_unit 
        GROUP BY time_unit, user_id
    )
FROM games g
JOIN users u
ON u.id = g.user_id
GROUP BY u.id

returns this:

user_name | user_id | data
"Jean"  3   {6,5.222222222222222,2.625,7.833333333333333,4.6,5.111111111111111,NULL,6,6.5,6.888888888888889,6.545454545454546,4.125,5.666666666666667,6.666666666666667,4,7,5,5.75,4.428571428571429,5.545454545454546,5.111111111111111,5.25,4.375,6.181818181818182,5.083333333333333,6,5.666666666666667,5.857142857142857,5.8,6.625,6.875,5.125}
"Jacques"   2   {6,5.222222222222222,2.625,7.833333333333333,4.6,5.111111111111111,NULL,6,6.5,6.888888888888889,6.545454545454546,4.125,5.666666666666667,6.666666666666667,4,7,5,5.75,4.428571428571429,5.545454545454546,5.111111111111111,5.25,4.375,6.181818181818182,5.083333333333333,6,5.666666666666667,5.857142857142857,5.8,6.625,6.875,5.125}
"Marine"    1   {6,5.222222222222222,2.625,7.833333333333333,4.6,5.111111111111111,NULL,6,6.5,6.888888888888889,6.545454545454546,4.125,5.666666666666667,6.666666666666667,4,7,5,5.75,4.428571428571429,5.545454545454546,5.111111111111111,5.25,4.375,6.181818181818182,5.083333333333333,6,5.666666666666667,5.857142857142857,5.8,6.625,6.875,5.125}

As you can see here too, all users have the same score …

Attempt 3:

WITH span AS (
  SELECT
    1 AS user_id,
    NULL AS avg,
    DATE(series) AS date
  FROM GENERATE_SERIES(DATE_TRUNC('month', NOW() - INTERVAL '11' MONTH), NOW(), '1 month') AS series
  UNION
  SELECT
    games.user_id, ROUND(AVG(games.score), 1) AS avg, DATE_TRUNC('month', games.inserted_at) AS date
  FROM games
  WHERE games.user_id = 1
  AND games.inserted_at > NOW()-INTERVAL '11' MONTH
  GROUP BY games.user_id, date
  ORDER BY date)
SELECT
  users.name AS name,
  ARRAY_AGG(s.avg) AS avg
FROM
  (SELECT user_id, ROUND(AVG(avg), 1) AS avg, date FROM span GROUP BY user_id, date ORDER BY date) AS s JOIN users ON s.user_id = users.id
GROUP BY
  name;

returns this:

user | data
"Marine"    {6.8,6.6,4.4,5.3,7.0,5.7,5.8,6.0,5.0,4.0,6.5,NULL}

but only works for one user.

How can I get the correct score for each user and how can I aggregate the result into an array so that I get the data in the proper format?

How to solve :

I know you bored from this bug, So we are here to help you! Take a deep breath and look at the explanation of your problem. We have many solutions to this problem, But we recommend you to use the first method because it is tested & true method that will 100% work for you.

Method 1

You can check a DB fiddle of the solution here (without a function just raw SQL): https://www.db-fiddle.com/f/fR2JtQKVcQzrnuZ9SfpcNS/0

For the solution, I am providing some dummy data to show some real output as I don’t have your data:

create table if not exists users (
    user_id serial primary key,
    name text
);

create table if not exists games (
    game_id serial primary key,
    user_id int references users(user_id),
    score numeric,
    inserted_at timestamptz
);

insert into users (name)
values ('5w3n'),('CCoX'),('FHWbb8f7'),('NhsX1XcP'),('OPP6A3j3'),('t4N'),('scdHS'),('Adve'),('yqy'),
       ('fALF'),('J77T'),('TAXQ5B3V'),('GQGtBXV'),('U2yx6o'),('4in74FhL'),('ZsoZs'),('9dn6'),
       ('ewhi3u6q'),('q18');

insert into games(user_id, score, inserted_at)
values  (5,589.53,'2022-01-01'),(9, 48.70,'2022-01-11'),(6, 27.05,'2022-02-01'),
        (1,952.81,'2022-02-23'),(11,23.80,'2022-02-15'),(5,732.60,'2022-03-02'),
        (7,460.05,'2022-03-09'),(4,402.98,'2022-03-29'),(7,375.06,'2022-03-30'),
        (3,325.03,'2022-04-12'),(2,363.32,'2022-04-15'),(12,34.60,'2022-04-19'),
        (6,370.70,'2022-04-22'),(6,462.08,'2022-05-01'),(2,575.39,'2022-05-02'),
        (13,14.56,'2022-05-03');

The function:

create or replace function get_user_averages(
    p_start_date timestamptz,
    p_end_date timestamptz default now(),
    p_interval interval default '1 month',
    p_truncate text default 'month'
)
  returns table(
    user_id int,
    name text,
    data numeric[]
  )
  language plpgsql
as
$$
begin
    return query
    select x.user_id, x.name, array_agg(x.average order by x.gen)
    from (
        select
            gen,
            u.user_id,
            u.name,
            round(avg(g.score) filter(where gen = date_trunc(p_truncate, g.inserted_at)) ,2) as average
        from generate_series(p_start_date, p_end_date, p_interval) gen
            cross join games g
            join users u on u.user_id = g.user_id
        group by gen, u.user_id, u.name
    ) x
    group by x.user_id, x.name
    order by x.user_id;
end;
$$;

This function allows you to input your own custom start and end date as well as choosing your own interval (it can be 1 month, 1 day etc.). The output of the array itself (data column) is intentionally ordered (so the array itself is ordered) by the generated dates in ascending order. It might not be a useful feature, however, considering you don’t have the dates available in the final output you won’t be able to know which average belongs to which generated date.

For this reason I have sorted the array so that your application code may be able to at least do something with the dates if it needs to. I would recommend trying to use a jsonb output instead of array where you would have a {month1:average1, month2:average2 ...} kind of output which may be more useful.

You have to match the p_interval and p_truncate parameters to get the correct output. As one is for date generation, the other specifies the truncation of the timestamptz of the real table’s inserted_at (so if you’re sending interval ‘1 month’, you should send month in p_truncate parameter)

Calling the function:

select * from get_user_averages('2022-01-01');

The result:

+-------+--------+-----------------------------------+
|user_id|name    |data                               |
+-------+--------+-----------------------------------+
|1      |5w3n    |{null,952.81,null,null,null,null}  |
|2      |CCoX    |{null,null,null,363.32,575.39,null}|
|3      |FHWbb8f7|{null,null,null,325.03,null,null}  |
|4      |NhsX1XcP|{null,null,402.98,null,null,null}  |
|5      |OPP6A3j3|{589.53,null,732.6,null,null,null} |
|6      |t4N     |{null,27.05,null,370.7,462.08,null}|
|7      |scdHS   |{null,null,417.56,null,null,null}  |
|9      |yqy     |{48.7,null,null,null,null,null}    |
|11     |J77T    |{null,23.8,null,null,null,null}    |
|12     |TAXQ5B3V|{null,null,null,34.6,null,null}    |
+-------+--------+-----------------------------------+

Note: Use and implement method 1 because this method fully tested our system.
Thank you 🙂

All methods was sourced from stackoverflow.com or stackexchange.com, is licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0

Leave a Reply