# Detect multiple valid date periods for same id, interleaved with non-valid periods


My table structure and some sample data (test table), as well as my desired output (results) are available here.

When I do

``````SELECT distinct colony, min(date_check), max(date_check)
from test
where protection ='Y'
group by colony
``````

I only detect one event per colony.

Should I use a function, or is there another way?

P.S. If you can think of a better title to help other users, don’t hesitate to edit it; I struggled.

## How to solve :


### Method 1

To solve your issue, I did the following (a fiddle for all code below is available here):

A couple of points:

• This solution makes use of the LAG() window function.

• Window functions are very powerful and will repay any effort spent learning them many times over.

• And finally, PostgreSQL 9.5 is no longer supported – you might be better off moving to a supported version.
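
As a minimal sketch of what `LAG()` returns, the query below runs against a small inline row set instead of the real table. The name `demo_rows` and the `'N'` row dated 2019-10-20 are assumptions made up for illustration; they are not taken from the asker's data.

```sql
-- Hypothetical inline rows; demo_rows is not the asker's table.
WITH demo_rows (colony, protection, date_check) AS (
    VALUES
        (7, 'Y', DATE '2019-10-12'),
        (7, 'Y', DATE '2019-10-13'),
        (7, 'N', DATE '2019-10-20'),  -- assumed row, for illustration only
        (7, 'Y', DATE '2019-10-23')
)
SELECT
    colony,
    date_check,
    protection,
    -- value of protection on the previous check of the same colony
    LAG(protection) OVER (PARTITION BY colony ORDER BY date_check) AS prev_protection
FROM demo_rows
ORDER BY colony, date_check;
```

The first row of each colony has no predecessor, so `prev_protection` is NULL there. A comparison with NULL is not true, so a `CASE` expression built on that comparison falls through to its `ELSE` branch.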

#### First step:

We flag the rows at which `protection` differs from the previous check of the same colony.

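``````SELECT
    colony,
    protection,
    date_check,
    -- 1 when protection differs from the previous check of the same colony;
    -- the first row of a colony has no predecessor (LAG is NULL), so it gets 0
    CASE
        WHEN (LAG(protection, 1) OVER (PARTITION BY colony
                                       ORDER BY date_check) != protection) THEN 1
        ELSE 0
    END AS change
FROM test;
``````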

Result (snipped for brevity):

``````colony  protection    date_check   change
2           N    2019-10-26        0
2           N    2019-10-27        0
2           Y    2019-11-01        1
2           Y    2019-11-03        0
7           Y    2019-10-12        0
7           Y    2019-10-13        0
...
...
``````

#### Step 2:

We take a running sum of the change flags over the rows where protection = ‘Y’, so that every contiguous run of ‘Y’ rows within a colony shares one value of `sc`.

``````SELECT
    colony,
    date_check,
    SUM(change) OVER (PARTITION BY colony ORDER BY date_check) AS sc
FROM
(
    SELECT
        colony,
        protection,
        date_check,
        CASE
            WHEN (LAG(protection, 1) OVER (PARTITION BY colony
                                           ORDER BY date_check) != protection) THEN 1
            ELSE 0
        END AS change
    FROM test
) AS tab
WHERE protection = 'Y'  -- test what happens when we comment out this line...
ORDER BY colony, date_check;
``````

Result:

``````colony  date_check  sc
2       2019-11-01  1
2       2019-11-03  1
7       2019-10-12  0
7       2019-10-13  0
7       2019-10-14  0
7       2019-10-15  0
7       2019-10-16  0
7       2019-10-17  0
7       2019-10-23  1
7       2019-10-24  1
7       2019-10-25  1
7       2019-10-26  1
7       2019-11-01  2
7       2019-11-04  2
``````
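
The comment on the `WHERE` line invites an experiment. As a rough sketch of it, the query below computes the running sum without the filter, on a small inline row set. The name `demo_rows`, the column alias `is_change` and the `'N'` row dated 2019-10-20 are assumptions for illustration only.

```sql
-- Hypothetical inline rows; demo_rows is not the asker's table.
WITH demo_rows (colony, protection, date_check) AS (
    VALUES
        (7, 'Y', DATE '2019-10-12'),
        (7, 'Y', DATE '2019-10-13'),
        (7, 'N', DATE '2019-10-20'),  -- assumed row, for illustration only
        (7, 'Y', DATE '2019-10-23')
),
flagged AS (
    SELECT
        colony,
        protection,
        date_check,
        CASE
            WHEN LAG(protection) OVER (PARTITION BY colony
                                       ORDER BY date_check) != protection THEN 1
            ELSE 0
        END AS is_change
    FROM demo_rows
)
SELECT
    colony,
    protection,
    date_check,
    -- no WHERE protection = 'Y' here: every run, 'Y' or 'N', gets its own number
    SUM(is_change) OVER (PARTITION BY colony ORDER BY date_check) AS sc
FROM flagged
ORDER BY colony, date_check;
```

On these four rows the flags are 0, 0, 1, 1, so `sc` comes out as 0, 0, 1, 2. The ‘N’ run receives a number of its own, and the ‘Y’ runs still end up with distinct values, which is what the grouping in the next step relies on.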

#### Step 3:

Finally, we get the MIN() and MAX() of `date_check` for each (`colony`, `sc`) group, i.e. for each run that starts at a change from ‘N’ to ‘Y’, taking only the rows where `protection` = ‘Y’.

``````SELECT
    colony,
    MIN(date_check) AS "Date in",
    MAX(date_check) AS "Date out"
    , sc  -- not strictly necessary here in the SELECT - illustrative!
FROM
(
    SELECT
        colony,
        date_check,
        SUM(change) OVER (PARTITION BY colony ORDER BY date_check) AS sc
    FROM
    (
        SELECT
            colony,
            protection,
            date_check,
            CASE
                WHEN (LAG(protection, 1) OVER (PARTITION BY colony
                                               ORDER BY date_check) != protection) THEN 1
                ELSE 0
            END AS change
        FROM test
    ) AS tab1
    WHERE protection = 'Y'
    ORDER BY colony, date_check
) AS tab2
GROUP BY colony, sc   -- sc not necessary in the SELECT but
ORDER BY colony, sc;  -- it is required in the GROUP BY - test!!!
``````

Result:

``````colony  Date in     Date out    sc
2       2019-11-01  2019-11-03  1
7       2019-10-12  2019-10-17  0
7       2019-10-23  2019-10-26  1
7       2019-11-01  2019-11-04  2
``````

Q.E.D.
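
For readers without access to the fiddle, here is a sketch of the same three steps written as CTEs over inline rows, so it can be pasted into a session as is. The CTE names (`demo_rows`, `flagged`, `numbered`), the alias `is_change` and the `'N'` row dated 2019-10-20 are assumptions for illustration; the remaining rows are a subset of those shown in the results above.

```sql
-- Hypothetical inline rows; demo_rows is not the asker's table.
WITH demo_rows (colony, protection, date_check) AS (
    VALUES
        (2, 'N', DATE '2019-10-26'),
        (2, 'N', DATE '2019-10-27'),
        (2, 'Y', DATE '2019-11-01'),
        (2, 'Y', DATE '2019-11-03'),
        (7, 'Y', DATE '2019-10-12'),
        (7, 'Y', DATE '2019-10-13'),
        (7, 'N', DATE '2019-10-20'),  -- assumed row, for illustration only
        (7, 'Y', DATE '2019-10-23'),
        (7, 'Y', DATE '2019-10-24')
),
-- Step 1: flag rows whose protection differs from the previous check
flagged AS (
    SELECT
        colony,
        protection,
        date_check,
        CASE
            WHEN LAG(protection) OVER (PARTITION BY colony
                                       ORDER BY date_check) != protection THEN 1
            ELSE 0
        END AS is_change
    FROM demo_rows
),
-- Step 2: running sum of the flags over the 'Y' rows only
numbered AS (
    SELECT
        colony,
        date_check,
        SUM(is_change) OVER (PARTITION BY colony ORDER BY date_check) AS sc
    FROM flagged
    WHERE protection = 'Y'
)
-- Step 3: one row per (colony, sc) group
SELECT
    colony,
    MIN(date_check) AS date_in,
    MAX(date_check) AS date_out
FROM numbered
GROUP BY colony, sc
ORDER BY colony, date_in;
```

On these rows the query should report one period for colony 2 (2019-11-01 to 2019-11-03) and two for colony 7 (2019-10-12 to 2019-10-13 and 2019-10-23 to 2019-10-24).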

The window function solution appears to be the most performant (see here): run the fiddle several times and vary the order of the queries. I’ve tried to warm up the cache before running `EXPLAIN (ANALYZE...)`, but test with your own table(s) and hardware.
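
To repeat that kind of timing locally, something along these lines might serve as a starting point. The scratch table `test_demo`, its rows and the alias `is_change` are hypothetical; on a real system you would run the `EXPLAIN` against your own table and data volume.

```sql
-- Hypothetical scratch table and rows, for illustration only.
CREATE TEMP TABLE test_demo (
    colony     integer,
    protection text,
    date_check date
);

INSERT INTO test_demo (colony, protection, date_check) VALUES
    (2, 'N', '2019-10-26'),
    (2, 'Y', '2019-11-01'),
    (2, 'Y', '2019-11-03'),
    (7, 'Y', '2019-10-12'),
    (7, 'Y', '2019-10-13'),
    (7, 'N', '2019-10-20'),  -- assumed row
    (7, 'Y', '2019-10-23');

-- Refresh planner statistics so the plan reflects the loaded rows.
ANALYZE test_demo;

-- ANALYZE executes the query and reports actual timings;
-- BUFFERS adds buffer hit/read counts for each plan node.
EXPLAIN (ANALYZE, BUFFERS)
SELECT colony, MIN(date_check) AS date_in, MAX(date_check) AS date_out
FROM (
    SELECT colony, date_check,
           SUM(is_change) OVER (PARTITION BY colony ORDER BY date_check) AS sc
    FROM (
        SELECT colony, protection, date_check,
               CASE
                   WHEN LAG(protection) OVER (PARTITION BY colony
                                              ORDER BY date_check) != protection THEN 1
                   ELSE 0
               END AS is_change
        FROM test_demo
    ) AS flagged
    WHERE protection = 'Y'
) AS numbered
GROUP BY colony, sc;
```

Running the `EXPLAIN` statement several times and comparing the later runs is one simple way to reduce the influence of a cold cache. Timings on a handful of rows say little about behaviour on a large table.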

### Method 2

Without CTEs and window functions:

``````SELECT colony, MIN(date_in) date_in, date_out
FROM ( SELECT t1.colony, t1.date_check date_in, MAX(t2.date_check) date_out
       FROM test t1
       JOIN test t2 ON t1.colony = t2.colony
                   AND t1.date_check <= t2.date_check  -- <= so that a one-row period is kept too
                   AND t1.protection = 'Y'
                   AND t2.protection = 'Y'
                   -- no non-'Y' check may lie between the two rows
                   AND NOT EXISTS ( SELECT NULL
                                    FROM test t3
                                    WHERE t1.colony = t3.colony
                                      AND t1.date_check < t3.date_check
                                      AND t3.date_check < t2.date_check
                                      AND t3.protection != 'Y' )
       GROUP BY t1.colony, t1.date_check ) subquery
GROUP BY colony, date_out
ORDER BY colony, date_in;
``````

With CTE and window functions:

``````WITH
cte1 AS (SELECT *,
                -- row_in = 1 on the first 'Y' row of each period
                CASE WHEN protection = 'N' OR LAG(protection) OVER (PARTITION BY colony ORDER BY date_check) = 'Y'
                     THEN 0
                     ELSE 1 END AS row_in,
                -- row_out is computed here but not used by the final query
                CASE WHEN protection = 'Y' OR LEAD(protection) OVER (PARTITION BY colony ORDER BY date_check) = 'N'
                     THEN 0
                     ELSE 1 END AS row_out
         FROM test),
-- running sum of row_in numbers the periods within each colony
cte2 AS (SELECT *, SUM(row_in) OVER (PARTITION BY colony ORDER BY date_check) group_no
         FROM cte1)
SELECT colony, MIN(date_check) date_in, MAX(date_check) date_out
FROM cte2
WHERE protection = 'Y'
GROUP BY colony, group_no
ORDER BY colony, date_in;
``````

https://dbfiddle.uk/?rdbms=postgres_9.5&fiddle=bf769987d7211b73e89c0564bc6902a5
