Table messages:
conv_id | user_id | content | sent_time |
---|---|---|---|
1 | 001 | 1st_msg | 01-01-1990 00:00:00 |
2 | 002 | 2nd_msg | 02-01-1990 00:00:00 |
How do we select the first message and the first reply sent in a conversation (conv_id) every day?
Notes:
- There can be many users.
- A single user can send multiple messages.
- The dataset covers many people, but each conversation is between only two of them.
- Throughout the day, multiple messages get exchanged.
- The first message is the one with the minimum sent_time from the first user within a given day.
- The first reply is the one with the minimum sent_time from the second user within the same day.
How to solve:
Method 1
There is a truly simple solution with DISTINCT ON:
SELECT DISTINCT ON (date_trunc('day', sent_time), conv_id, user_id)
*
FROM messages
ORDER BY date_trunc('day', sent_time), conv_id, user_id, sent_time;
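To see what this returns, here is a minimal sketch on made-up data. The rows are hypothetical, the table is named messages as in the question, and the msg_day and kind names are illustrative only, not part of the original answer. The second query is one possible way to tell the two rows apart: it wraps the DISTINCT ON result and uses row_number() to tag the earlier row per day and conversation as the first message and the later one as the first reply, assuming exactly two participants per conversation.

```sql
-- Hypothetical sample data; column names follow the question.
CREATE TEMP TABLE messages (
    conv_id   int,
    user_id   text,
    content   text,
    sent_time timestamp
);

INSERT INTO messages VALUES
    (1, '001', 'hi',        '1990-01-01 09:00:00'),
    (1, '002', 'hello',     '1990-01-01 09:05:00'),
    (1, '001', 'how are u', '1990-01-01 09:10:00'),
    (1, '002', 'morning',   '1990-01-02 08:00:00'),
    (1, '001', 'morning!',  '1990-01-02 08:30:00');

-- Earliest row per (day, conversation, user).
SELECT DISTINCT ON (date_trunc('day', sent_time), conv_id, user_id)
       *
FROM   messages
ORDER  BY date_trunc('day', sent_time), conv_id, user_id, sent_time;

-- Same rows, labeled by their order within each day and conversation.
SELECT conv_id, user_id, content, sent_time,
       CASE row_number() OVER (PARTITION BY msg_day, conv_id
                               ORDER BY sent_time)
            WHEN 1 THEN 'first message'
            ELSE 'first reply'
       END AS kind
FROM  (
   SELECT DISTINCT ON (date_trunc('day', sent_time), conv_id, user_id)
          date_trunc('day', sent_time) AS msg_day, *
   FROM   messages
   ORDER  BY date_trunc('day', sent_time), conv_id, user_id, sent_time
   ) sub
ORDER  BY msg_day, conv_id, sent_time;
```

With these rows, both queries return four rows: 'hi' and 'hello' for 1990-01-01, and 'morning' and 'morning!' for 1990-01-02. The third message of the first day ('how are u') is dropped because user 001 already has an earlier row that day. On the second day user 002 writes first, so 'morning' is tagged as the first message.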
An emulated index-skip may be faster. See:
- Optimize GROUP BY query to retrieve latest row per user
- Best performance to get distinct values for a given key from a big table
In case of timestamptz, "days" are defined by the timezone setting of your current session, unless defined explicitly.
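One way to define the day explicitly is to convert the value with AT TIME ZONE before truncating. The sketch below assumes sent_time is timestamptz; the table name messages_tz, the sample rows, and the zone 'Europe/Vienna' are placeholders chosen for illustration.

```sql
-- Hypothetical table with a timestamptz column.
CREATE TEMP TABLE messages_tz (
    conv_id   int,
    user_id   text,
    content   text,
    sent_time timestamptz
);

INSERT INTO messages_tz VALUES
    (1, '001', 'late',  '1990-01-01 23:30:00+00'),
    (1, '001', 'early', '1990-01-02 00:30:00+00');

-- AT TIME ZONE turns the timestamptz into local time in the named zone,
-- so the day boundary no longer depends on the session's timezone setting.
SELECT DISTINCT ON (date_trunc('day', sent_time AT TIME ZONE 'Europe/Vienna'),
                    conv_id, user_id)
       *
FROM   messages_tz
ORDER  BY date_trunc('day', sent_time AT TIME ZONE 'Europe/Vienna'),
          conv_id, user_id, sent_time;
```

In Vienna local time (UTC+1 in January) both sample rows fall on 1990-01-02, so only 'late' is returned. With 'UTC' as the zone they fall on different days and both rows come back. The same expression has to appear in both DISTINCT ON and ORDER BY.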
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.