# How to compute the average score for each user, by time interval?

## All we need is an easy explanation of the problem, so here it is.

I have a table `User` that only has an `id` and a `name`. A user has many `Game`s, which is a table that contains an `id`, a `score`, `inserted_at` and a `user_id` referencing the `User` table. I would like that my query returns the following result:

label user_id data
Jacques 1 {null, 1, 4, 3, null, 6, null, 8, 9, null, 2, 5}
Marine 2 {null, null, null, 4, 6, 2, 8, 6, 10, 10, 3, 8}

where `label` is the name of the user and `data` is the average score of each games, per user, per unit of time within a time interval (here it is per month for the last 12 months) from this query:

``````SELECT DATE_TRUNC('month', generate_series(LOCALTIMESTAMP - INTERVAL '11 month', LOCALTIMESTAMP, '1 month')) as time_unit)
``````

which return this result:

``````"2021-07-01 00:00:00"
"2021-08-01 00:00:00"
"2021-09-01 00:00:00"
"2021-10-01 00:00:00"
"2021-11-01 00:00:00"
"2021-12-01 00:00:00"
"2022-01-01 00:00:00"
"2022-02-01 00:00:00"
"2022-03-01 00:00:00"
"2022-04-01 00:00:00"
"2022-05-01 00:00:00"
"2022-06-01 00:00:00"
``````

The unit of time can be changed, in the above query, it is every months from the last 12 months, but it could be every months from the last 6 months, or every day for the last 2 weeks.

The `NULL` are important, because this data is then to be fed to Chart.js, so I need to know when a user hasn’t played a game and scored anything.

So far I have two attempts:

Attempt 1:

``````(SELECT time_unit, avg, u.name as user_name, u.id as user_id
FROM users u
CROSS JOIN (
SELECT time_unit, AVG(ROUND(g.score, 1))
FROM (SELECT DATE_TRUNC('month', generate_series(LOCALTIMESTAMP - INTERVAL '3 year', LOCALTIMESTAMP, '1 month')) as time_unit) as time_unit_query
LEFT JOIN games g
ON DATE_TRUNC('month', g.inserted_at) = time_unit
ON u.id = g.user_id
GROUP BY time_unit, u.name, u.id
) as q1
ORDER By 1);
``````

Which returns the roughly this result:

``````  time_unit | score | user_name | user_id
...
"2021-05-01 00:00:00",  null,                 "Marine",     1
"2021-05-01 00:00:00",  null,                 "Jacques",  2
"2021-05-01 00:00:00",  null,                 "Jean",     3
"2021-06-01 00:00:00",    6.0000000000000000, "Marine",     1
"2021-06-01 00:00:00",    6.0000000000000000, "Jacques",  2
"2021-06-01 00:00:00",    6.0000000000000000, "Jean",     3
"2021-06-01 00:00:00",    7.8333333333333333, "Marine",     1
"2021-06-01 00:00:00",    7.8333333333333333, "Jacques",  2
"2021-06-01 00:00:00",    7.8333333333333333, "Jean",     3
...
``````

As you can see, the results are wrong because all users have the same score, which is not the case in the database.

Attempt 2:

``````SELECT u.name AS label,
u.id,
ARRAY(
SELECT AVG(ROUND(g.score))
FROM (SELECT DATE_TRUNC('month', generate_series(LOCALTIMESTAMP - INTERVAL '1 year', LOCALTIMESTAMP, '1 month')) as time_unit) as time_unit_query
LEFT JOIN games g
ON DATE_TRUNC('month', g.inserted_at) = time_unit
GROUP BY time_unit, user_id
)
FROM games g
ON u.id = g.user_id
GROUP BY u.id
``````

returns this:

``````user_name | user_id | data
"Jean"  3   {6,5.222222222222222,2.625,7.833333333333333,4.6,5.111111111111111,NULL,6,6.5,6.888888888888889,6.545454545454546,4.125,5.666666666666667,6.666666666666667,4,7,5,5.75,4.428571428571429,5.545454545454546,5.111111111111111,5.25,4.375,6.181818181818182,5.083333333333333,6,5.666666666666667,5.857142857142857,5.8,6.625,6.875,5.125}
"Jacques"   2   {6,5.222222222222222,2.625,7.833333333333333,4.6,5.111111111111111,NULL,6,6.5,6.888888888888889,6.545454545454546,4.125,5.666666666666667,6.666666666666667,4,7,5,5.75,4.428571428571429,5.545454545454546,5.111111111111111,5.25,4.375,6.181818181818182,5.083333333333333,6,5.666666666666667,5.857142857142857,5.8,6.625,6.875,5.125}
"Marine"    1   {6,5.222222222222222,2.625,7.833333333333333,4.6,5.111111111111111,NULL,6,6.5,6.888888888888889,6.545454545454546,4.125,5.666666666666667,6.666666666666667,4,7,5,5.75,4.428571428571429,5.545454545454546,5.111111111111111,5.25,4.375,6.181818181818182,5.083333333333333,6,5.666666666666667,5.857142857142857,5.8,6.625,6.875,5.125}
``````

As you can see here too, all users have the same score …

Attempt 3:

``````WITH span AS (
SELECT
1 AS user_id,
NULL AS avg,
DATE(series) AS date
FROM GENERATE_SERIES(DATE_TRUNC('month', NOW() - INTERVAL '11' MONTH), NOW(), '1 month') AS series
UNION
SELECT
games.user_id, ROUND(AVG(games.score), 1) AS avg, DATE_TRUNC('month', games.inserted_at) AS date
FROM games
WHERE games.user_id = 1
AND games.inserted_at > NOW()-INTERVAL '11' MONTH
GROUP BY games.user_id, date
ORDER BY date)
SELECT
users.name AS name,
ARRAY_AGG(s.avg) AS avg
FROM
(SELECT user_id, ROUND(AVG(avg), 1) AS avg, date FROM span GROUP BY user_id, date ORDER BY date) AS s JOIN users ON s.user_id = users.id
GROUP BY
name;
``````

returns this:

``````user | data
"Marine"    {6.8,6.6,4.4,5.3,7.0,5.7,5.8,6.0,5.0,4.0,6.5,NULL}
``````

but only works for one user.

How can I get the correct score for each user and how can I aggregate the result into an `array` so that I get the data in the proper format?

## How to solve :

I know you bored from this bug, So we are here to help you! Take a deep breath and look at the explanation of your problem. We have many solutions to this problem, But we recommend you to use the first method because it is tested & true method that will 100% work for you.

### Method 1

You can check a DB fiddle of the solution here (without a function just raw SQL): https://www.db-fiddle.com/f/fR2JtQKVcQzrnuZ9SfpcNS/0

For the solution, I am providing some dummy data to show some real output as I don’t have your data:

``````create table if not exists users (
user_id serial primary key,
name text
);

create table if not exists games (
game_id serial primary key,
user_id int references users(user_id),
score numeric,
inserted_at timestamptz
);

insert into users (name)
('fALF'),('J77T'),('TAXQ5B3V'),('GQGtBXV'),('U2yx6o'),('4in74FhL'),('ZsoZs'),('9dn6'),
('ewhi3u6q'),('q18');

insert into games(user_id, score, inserted_at)
values  (5,589.53,'2022-01-01'),(9, 48.70,'2022-01-11'),(6, 27.05,'2022-02-01'),
(1,952.81,'2022-02-23'),(11,23.80,'2022-02-15'),(5,732.60,'2022-03-02'),
(7,460.05,'2022-03-09'),(4,402.98,'2022-03-29'),(7,375.06,'2022-03-30'),
(3,325.03,'2022-04-12'),(2,363.32,'2022-04-15'),(12,34.60,'2022-04-19'),
(6,370.70,'2022-04-22'),(6,462.08,'2022-05-01'),(2,575.39,'2022-05-02'),
(13,14.56,'2022-05-03');
``````

The function:

``````create or replace function get_user_averages(
p_start_date timestamptz,
p_end_date timestamptz default now(),
p_interval interval default '1 month',
p_truncate text default 'month'
)
returns table(
user_id int,
name text,
data numeric[]
)
language plpgsql
as
\$\$
begin
return query
select x.user_id, x.name, array_agg(x.average order by x.gen)
from (
select
gen,
u.user_id,
u.name,
round(avg(g.score) filter(where gen = date_trunc(p_truncate, g.inserted_at)) ,2) as average
from generate_series(p_start_date, p_end_date, p_interval) gen
cross join games g
group by gen, u.user_id, u.name
) x
group by x.user_id, x.name
order by x.user_id;
end;
\$\$;
``````

This function allows you to input your own custom start and end date as well as choosing your own interval (it can be 1 month, 1 day etc.). The output of the array itself (`data` column) is intentionally ordered (so the array itself is ordered) by the generated dates in ascending order. It might not be a useful feature, however, considering you don’t have the dates available in the final output you won’t be able to know which average belongs to which generated date.

For this reason I have sorted the array so that your application code may be able to at least do something with the dates if it needs to. I would recommend trying to use a jsonb output instead of array where you would have a `{month1:average1, month2:average2 ...}` kind of output which may be more useful.

You have to match the `p_interval` and `p_truncate` parameters to get the correct output. As one is for date generation, the other specifies the truncation of the timestamptz of the real table’s `inserted_at` (so if you’re sending interval ‘1 month’, you should send `month` in `p_truncate` parameter)

Calling the function:

``````select * from get_user_averages('2022-01-01');
``````

The result:

``````+-------+--------+-----------------------------------+
|user_id|name    |data                               |
+-------+--------+-----------------------------------+
|1      |5w3n    |{null,952.81,null,null,null,null}  |
|2      |CCoX    |{null,null,null,363.32,575.39,null}|
|3      |FHWbb8f7|{null,null,null,325.03,null,null}  |
|4      |NhsX1XcP|{null,null,402.98,null,null,null}  |
|5      |OPP6A3j3|{589.53,null,732.6,null,null,null} |
|6      |t4N     |{null,27.05,null,370.7,462.08,null}|
|7      |scdHS   |{null,null,417.56,null,null,null}  |
|9      |yqy     |{48.7,null,null,null,null,null}    |
|11     |J77T    |{null,23.8,null,null,null,null}    |
|12     |TAXQ5B3V|{null,null,null,34.6,null,null}    |
+-------+--------+-----------------------------------+
``````

Note: Use and implement method 1 because this method fully tested our system.
Thank you 🙂