All we need is an easy explanation of the problem, so here it is.
I have a Timescale DB / PostgreSQL table (DataTable) which looks like this
I’m trying to find a query which would return different columns for different subsystem/metrics combination like this
Furthermore I would like to reduce the amount of requested data points using the Timescale DB function time_bucket_gapfill() to keep the response time low for big time ranges. I would also like to use different aggregation functions for different subsystem/metric combinations.
I use the queried data to plot the curves in Grafana.
What is the most time efficient way to achieve that?
At the moment I use separate queries for each metric which quickly gets very slow. For the example the queries could look like that
SELECT
time_bucket_gapfill('30s',time,start=>'2021-07-19T09:06:26.605Z',finish=>'2021-07-19T20:11:12.340Z') AS "time",
max(value) AS "Latitude"
FROM DataTable
WHERE
time BETWEEN '2021-07-19T09:06:26.605Z' AND '2021-07-19T20:11:12.340Z' AND
subsystem = 'position' AND
metric = 'lat'
GROUP BY 1,metric,subsystem
ORDER BY time
SELECT
time_bucket_gapfill('30s',time,start=>'2021-07-19T09:06:26.605Z',finish=>'2021-07-19T20:11:12.340Z') AS "time",
min(value) AS "Longitude"
FROM DataTable
WHERE
time BETWEEN '2021-07-19T09:06:26.605Z' AND '2021-07-19T20:11:12.340Z' AND
subsystem = 'position' AND
metric = 'lon'
GROUP BY 1,metric,subsystem
ORDER BY time
SELECT
time_bucket_gapfill('30s',time,start=>'2021-07-19T09:06:26.605Z',finish=>'2021-07-19T20:11:12.340Z') AS "time",
avg(value) AS "Temperature"
FROM DataTable
WHERE
time BETWEEN '2021-07-19T09:06:26.605Z' AND '2021-07-19T20:11:12.340Z' AND
subsystem = 'health' AND
metric = 'temperature'
GROUP BY 1,metric,subsystem
ORDER BY time
I would much appreciate if somebody could point me in the right direction.
Update:
Using the FILTER clause as suggested by @Charlieface doesn’t work for me, because it doesn’t remove the other rows but instead places NULLs there. Here an example with some real data (the example before was using simplified data and names).
I also noticed that the timestamps of the metrics belonging together have tiny differences which would prevent different metrics being on the same row, but that is easy to changes in the unit which feeds the data into the DB.
Update 2:
The timestamps issue was fixed and the WHERE statement was added again as suggested by @Charlieface, but there are still NULL in the data which makes Grafana plot the data points without lines in between.
I would like the result to look like this instead.
Update 3:
I missed that @Charlieface answer only groups by time. If I do that, then I get the result I wanted.
How to solve :
I know you bored from this bug, So we are here to help you! Take a deep breath and look at the explanation of your problem. We have many solutions to this problem, But we recommend you to use the first method because it is tested & true method that will 100% work for you.
Method 1
You can use conditional aggregation for this.
In PostgreSQL you can use the FILTER
clause.
In other DBMSs you can use a condition inside the aggregation
avg(case when ... then value end)
SELECT
time_bucket_gapfill('30s', time, start => '2021-07-19T09:06:26.605Z', finish => '2021-07-19T20:11:12.340Z') AS "time",
max(value) FILTER (WHERE subsystem = 'position' AND metric = 'lat') AS "Latitude",
min(value) FILTER (WHERE subsystem = 'position' AND metric = 'lon') AS "Longitude",
avg(value) FILTER (WHERE subsystem = 'health' AND metric = 'temperature') AS "Temperature"
FROM DataTable
WHERE
time BETWEEN '2021-07-19T09:06:26.605Z' AND '2021-07-19T20:11:12.340Z'
AND (subsystem = 'position' AND metric IN ('lat', 'lon') OR
subsystem = 'health' AND metric = 'temperature')
GROUP BY 1
ORDER BY time;
Note: Use and implement method 1 because this method fully tested our system.
Thank you 🙂
All methods was sourced from stackoverflow.com or stackexchange.com, is licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0