I have a table User that only has an id and a name. A user has many Games, which is a table that contains an id, a score, inserted_at and a user_id referencing the User table. I would like my query to return the following result:
label | user_id | data |
---|---|---|
Jacques | 1 | {null, 1, 4, 3, null, 6, null, 8, 9, null, 2, 5} |
Marine | 2 | {null, null, null, 4, 6, 2, 8, 6, 10, 10, 3, 8} |
where label is the name of the user and data is the average score of that user's games per unit of time within a time interval (here, per month for the last 12 months). The time units come from this query:
SELECT DATE_TRUNC('month', generate_series(LOCALTIMESTAMP - INTERVAL '11 month', LOCALTIMESTAMP, '1 month')) as time_unit;
which returns this result:
"2021-07-01 00:00:00"
"2021-08-01 00:00:00"
"2021-09-01 00:00:00"
"2021-10-01 00:00:00"
"2021-11-01 00:00:00"
"2021-12-01 00:00:00"
"2022-01-01 00:00:00"
"2022-02-01 00:00:00"
"2022-03-01 00:00:00"
"2022-04-01 00:00:00"
"2022-05-01 00:00:00"
"2022-06-01 00:00:00"
The unit of time can be changed: in the above query it is every month for the last 12 months, but it could be every month for the last 6 months, or every day for the last 2 weeks.
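For example, the other two granularities would look roughly like this. This is only a sketch: truncating the bounds before generating the series is my own choice here, to keep every row aligned to the start of its day or month.

```sql
-- Every day for the last 2 weeks: 14 rows, today included.
SELECT d AS time_unit
FROM generate_series(
    DATE_TRUNC('day', LOCALTIMESTAMP) - INTERVAL '13 days',
    DATE_TRUNC('day', LOCALTIMESTAMP),
    INTERVAL '1 day'
) AS d;

-- Every month for the last 6 months: 6 rows, current month included.
SELECT m AS time_unit
FROM generate_series(
    DATE_TRUNC('month', LOCALTIMESTAMP) - INTERVAL '5 months',
    DATE_TRUNC('month', LOCALTIMESTAMP),
    INTERVAL '1 month'
) AS m;
```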
The NULLs are important, because this data is then fed to Chart.js, so I need to know when a user hasn't played a game and scored anything.
So far I have three attempts:
Attempt 1:
(SELECT time_unit, avg, u.name as user_name, u.id as user_id
FROM users u
CROSS JOIN (
SELECT time_unit, AVG(ROUND(g.score, 1))
FROM (SELECT DATE_TRUNC('month', generate_series(LOCALTIMESTAMP - INTERVAL '3 year', LOCALTIMESTAMP, '1 month')) as time_unit) as time_unit_query
LEFT JOIN games g
ON DATE_TRUNC('month', g.inserted_at) = time_unit
LEFT JOIN users u
ON u.id = g.user_id
GROUP BY time_unit, u.name, u.id
) as q1
ORDER By 1);
which returns roughly this result:
time_unit | score | user_name | user_id
...
"2021-05-01 00:00:00", null, "Marine", 1
"2021-05-01 00:00:00", null, "Jacques", 2
"2021-05-01 00:00:00", null, "Jean", 3
"2021-06-01 00:00:00", 6.0000000000000000, "Marine", 1
"2021-06-01 00:00:00", 6.0000000000000000, "Jacques", 2
"2021-06-01 00:00:00", 6.0000000000000000, "Jean", 3
"2021-06-01 00:00:00", 7.8333333333333333, "Marine", 1
"2021-06-01 00:00:00", 7.8333333333333333, "Jacques", 2
"2021-06-01 00:00:00", 7.8333333333333333, "Jean", 3
...
As you can see, the results are wrong because all users have the same score, which is not the case in the database.
Attempt 2:
SELECT u.name AS label,
u.id,
ARRAY(
SELECT AVG(ROUND(g.score))
FROM (SELECT DATE_TRUNC('month', generate_series(LOCALTIMESTAMP - INTERVAL '1 year', LOCALTIMESTAMP, '1 month')) as time_unit) as time_unit_query
LEFT JOIN games g
ON DATE_TRUNC('month', g.inserted_at) = time_unit
GROUP BY time_unit, user_id
)
FROM games g
JOIN users u
ON u.id = g.user_id
GROUP BY u.id
returns this:
user_name | user_id | data
"Jean" 3 {6,5.222222222222222,2.625,7.833333333333333,4.6,5.111111111111111,NULL,6,6.5,6.888888888888889,6.545454545454546,4.125,5.666666666666667,6.666666666666667,4,7,5,5.75,4.428571428571429,5.545454545454546,5.111111111111111,5.25,4.375,6.181818181818182,5.083333333333333,6,5.666666666666667,5.857142857142857,5.8,6.625,6.875,5.125}
"Jacques" 2 {6,5.222222222222222,2.625,7.833333333333333,4.6,5.111111111111111,NULL,6,6.5,6.888888888888889,6.545454545454546,4.125,5.666666666666667,6.666666666666667,4,7,5,5.75,4.428571428571429,5.545454545454546,5.111111111111111,5.25,4.375,6.181818181818182,5.083333333333333,6,5.666666666666667,5.857142857142857,5.8,6.625,6.875,5.125}
"Marine" 1 {6,5.222222222222222,2.625,7.833333333333333,4.6,5.111111111111111,NULL,6,6.5,6.888888888888889,6.545454545454546,4.125,5.666666666666667,6.666666666666667,4,7,5,5.75,4.428571428571429,5.545454545454546,5.111111111111111,5.25,4.375,6.181818181818182,5.083333333333333,6,5.666666666666667,5.857142857142857,5.8,6.625,6.875,5.125}
As you can see here too, all users have the same score …
Attempt 3:
WITH span AS (
SELECT
1 AS user_id,
NULL AS avg,
DATE(series) AS date
FROM GENERATE_SERIES(DATE_TRUNC('month', NOW() - INTERVAL '11' MONTH), NOW(), '1 month') AS series
UNION
SELECT
games.user_id, ROUND(AVG(games.score), 1) AS avg, DATE_TRUNC('month', games.inserted_at) AS date
FROM games
WHERE games.user_id = 1
AND games.inserted_at > NOW()-INTERVAL '11' MONTH
GROUP BY games.user_id, date
ORDER BY date)
SELECT
users.name AS name,
ARRAY_AGG(s.avg) AS avg
FROM
(SELECT user_id, ROUND(AVG(avg), 1) AS avg, date FROM span GROUP BY user_id, date ORDER BY date) AS s JOIN users ON s.user_id = users.id
GROUP BY
name;
returns this:
user | data
"Marine" {6.8,6.6,4.4,5.3,7.0,5.7,5.8,6.0,5.0,4.0,6.5,NULL}
but only works for one user.
How can I get the correct score for each user, and how can I aggregate the result into an array so that I get the data in the proper format?
How to solve:
Method 1
You can check a DB fiddle of the solution here (without a function just raw SQL): https://www.db-fiddle.com/f/fR2JtQKVcQzrnuZ9SfpcNS/0
For the solution, I am providing some dummy data to show some real output as I don’t have your data:
create table if not exists users (
user_id serial primary key,
name text
);
create table if not exists games (
game_id serial primary key,
user_id int references users(user_id),
score numeric,
inserted_at timestamptz
);
insert into users (name)
values ('5w3n'),('CCoX'),('FHWbb8f7'),('NhsX1XcP'),('OPP6A3j3'),('t4N'),('scdHS'),('Adve'),('yqy'),
('fALF'),('J77T'),('TAXQ5B3V'),('GQGtBXV'),('U2yx6o'),('4in74FhL'),('ZsoZs'),('9dn6'),
('ewhi3u6q'),('q18');
insert into games(user_id, score, inserted_at)
values (5,589.53,'2022-01-01'),(9, 48.70,'2022-01-11'),(6, 27.05,'2022-02-01'),
(1,952.81,'2022-02-23'),(11,23.80,'2022-02-15'),(5,732.60,'2022-03-02'),
(7,460.05,'2022-03-09'),(4,402.98,'2022-03-29'),(7,375.06,'2022-03-30'),
(3,325.03,'2022-04-12'),(2,363.32,'2022-04-15'),(12,34.60,'2022-04-19'),
(6,370.70,'2022-04-22'),(6,462.08,'2022-05-01'),(2,575.39,'2022-05-02'),
(13,14.56,'2022-05-03');
The function:
create or replace function get_user_averages(
p_start_date timestamptz,
p_end_date timestamptz default now(),
p_interval interval default '1 month',
p_truncate text default 'month'
)
returns table(
user_id int,
name text,
data numeric[]
)
language plpgsql
as
$$
begin
return query
select x.user_id, x.name, array_agg(x.average order by x.gen)
from (
select
gen,
u.user_id,
u.name,
round(avg(g.score) filter(where gen = date_trunc(p_truncate, g.inserted_at)) ,2) as average
from generate_series(p_start_date, p_end_date, p_interval) gen
cross join games g
join users u on u.user_id = g.user_id
group by gen, u.user_id, u.name
) x
group by x.user_id, x.name
order by x.user_id;
end;
$$;
This function lets you pass your own start and end dates and choose your own interval (it can be 1 month, 1 day, etc.). The array in the data column is intentionally ordered by the generated dates in ascending order. Keep in mind that the dates themselves are not part of the final output, so the position in the array is the only thing that tells you which average belongs to which generated date.
For this reason I have sorted the array, so that your application code can at least map positions back to dates if it needs to. I would recommend considering a jsonb output instead of an array, of the form {month1: average1, month2: average2, ...}, which may be more useful.
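As a rough sketch of the jsonb idea, written as raw SQL against the dummy tables above with a fixed monthly range starting at 2022-01-01. The 'YYYY-MM' key format is an assumption made for illustration and is not something the function above produces.

```sql
-- Sketch: one jsonb object per user, keyed by month ('YYYY-MM').
-- Months without a game for that user map to a JSON null.
SELECT x.user_id,
       x.name,
       jsonb_object_agg(to_char(x.gen, 'YYYY-MM'), x.average) AS data
FROM (
    SELECT gen,
           u.user_id,
           u.name,
           round(avg(g.score) FILTER (WHERE gen = date_trunc('month', g.inserted_at)), 2) AS average
    FROM generate_series(timestamptz '2022-01-01', now(), interval '1 month') AS gen
    CROSS JOIN games g
    JOIN users u ON u.user_id = g.user_id
    GROUP BY gen, u.user_id, u.name
) x
GROUP BY x.user_id, x.name
ORDER BY x.user_id;
```

jsonb does not preserve the insertion order of keys, but since each key carries its own date the application no longer depends on array position.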
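You have to match the p_interval and p_truncate parameters to get the correct output: the first drives the date generation, the second specifies how the inserted_at timestamptz of the real table is truncated (so if you send the interval '1 month', you should send month in the p_truncate parameter). The start date must also fall on a truncation boundary, for example the first day of a month, because the filter compares each generated value with the truncated timestamp for equality.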
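A few matching pairs, sketched against the dummy data above. The date ranges are picked only for illustration.

```sql
-- Daily averages: interval '1 day' is paired with truncation unit 'day'.
select * from get_user_averages('2022-05-01', '2022-05-14', '1 day', 'day');

-- Weekly averages: date_trunc('week', ...) truncates to Monday,
-- so the start date (2022-01-03) is a Monday.
select * from get_user_averages('2022-01-03', now(), '1 week', 'week');

-- The same daily call using named notation, which makes the pairing explicit.
select *
from get_user_averages(
    p_start_date => '2022-05-01',
    p_end_date   => '2022-05-14',
    p_interval   => '1 day',
    p_truncate   => 'day'
);
```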
Calling the function:
select * from get_user_averages('2022-01-01');
The result:
+-------+--------+-----------------------------------+
|user_id|name |data |
+-------+--------+-----------------------------------+
|1 |5w3n |{null,952.81,null,null,null,null} |
|2 |CCoX |{null,null,null,363.32,575.39,null}|
|3 |FHWbb8f7|{null,null,null,325.03,null,null} |
|4 |NhsX1XcP|{null,null,402.98,null,null,null} |
|5 |OPP6A3j3|{589.53,null,732.6,null,null,null} |
|6 |t4N |{null,27.05,null,370.7,462.08,null}|
|7 |scdHS |{null,null,417.56,null,null,null} |
|9 |yqy |{48.7,null,null,null,null,null} |
|11 |J77T |{null,23.8,null,null,null,null} |
|12 |TAXQ5B3V|{null,null,null,34.6,null,null} |
+-------+--------+-----------------------------------+
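Note that the function joins users to games with an inner join, so users who have no games at all do not appear in the output. If those users should show up with an all-null array, one possible variant is sketched below as raw SQL against the dummy tables. It starts from users, cross joins the generated months, and only then left joins the games; the last-12-months range is an assumption taken from the question.

```sql
-- Sketch: every user gets one array slot per month, NULL where no game exists.
SELECT s.user_id,
       s.name,
       array_agg(s.average ORDER BY s.gen) AS data
FROM (
    SELECT u.user_id,
           u.name,
           gen,
           round(avg(g.score), 2) AS average   -- NULL when no game matches
    FROM users u
    CROSS JOIN generate_series(
        date_trunc('month', now()) - interval '11 months',
        date_trunc('month', now()),
        interval '1 month'
    ) AS gen
    LEFT JOIN games g
           ON g.user_id = u.user_id
          AND date_trunc('month', g.inserted_at) = gen
    GROUP BY u.user_id, u.name, gen
) s
GROUP BY s.user_id, s.name
ORDER BY s.user_id;
```

Because the join condition ties each game to both its user and its month, every user gets their own averages, which is what the cross joins in the question's attempts were missing.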
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.