PostgreSQL query with max and min date plus associated id per row

All we need is an easy explanation of the problem, so here it is.

I have the following table:

CREATE TABLE trans (
    id SERIAL PRIMARY KEY,
    trans_date date,
    trans_time time        
);

I want to have the following view

CREATE OR REPLACE VIEW daily_trans AS
SELECT trans_date,
    max(trans_time) as first, 
    min(trans_time) as last,
    calculate_status(min(trans_time), max(trans_time)) as status 
GROUP BY trans_date 

with columns that specify the ids of the max and min trans_time.

How do I do that?

How to solve :

I know you bored from this bug, So we are here to help you! Take a deep breath and look at the explanation of your problem. We have many solutions to this problem, But we recommend you to use the first method because it is tested & true method that will 100% work for you.

Method 1

You can calculate this in one step with window functions:

CREATE OR REPLACE VIEW daily_trans AS
SELECT DISTINCT
       trans_date
     , first_value(trans_time) OVER w AS first_time
     , first_value(id)         OVER w AS first_id
     , last_value(trans_time)  OVER w AS last_time
     , last_value(id)          OVER w AS last_id
     , calculate_status(min(trans_time) OVER w
                      , max(trans_time) OVER w) AS status 
FROM   trans
WINDOW w AS (PARTITION BY trans_date ORDER BY trans_time, id
             ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
ORDER  BY 1;

I added id as secondary ORDER column in the window clause, to make the sort order stable in case of identical times per day.

If you are not familiar with window functions, be sure to read this chapter of the manual.

ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING is a not so commonly used clause to adjust the window frame, because we want last_value() to always return the last value of the whole partition (the date), not the last value up to the current (including peers) according to the sort order.

How you to combine DISTINCT with window functions:

Method 2

Roughly, you want something like

select min_trans_time, min_trans.id as min_trans_time_id,
       max_trans_time, max_trans.id as max_trans_time_id
from (
    select   trans_date,
             max(trans_time) as max_trans_time, 
             min(trans_time) as min_trans_time,
    from     trans        
    group by trans_date) min_max_trans,
       trans as min_trans,
       trans as max_trans
where min_trans.trans_time = min_max_trans.min_trans_time
and   max_trans.trans_time = min_max_trans.max_trans_time

Which might not be stable if several transes share the same trans_time (that is, running the query on the same set of data might yield different results. One easy way of solving this, if it’s a concern, is to pick the max or min id, for instance. Of course, this might bias the results :).

You can also use Window Functions if you are using PostgreSQL 8.4 or later; they will provide a clearer query (clear if you are familiar with window functions :), and they let you do stuff which is pretty hard to do with the above strategy (for instance, getting the 2nd-higher value instead of the maximum). However, in my short experience, they performed worse than the self-join approach I propose here. Erwin’s answer contains a version of the query using window functions. It would be interesting to note which query executes better and/or can be optimized better (by adding indexes, etc.).

Note: Use and implement method 1 because this method fully tested our system.
Thank you 🙂

All methods was sourced from stackoverflow.com or stackexchange.com, is licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0

Leave a Reply