# Sum date between two date and group by month

## All we need is an easy explanation of the problem, so here it is.

I have a table as below:

``````Name  Start Date  End Date
Joe   20/04/2021  20/05/2021
John  01/05/2021  28/05/2021
``````

I am using a SQL table-valued function to return a table that has 2 columns: Month and Count total Date of this month.

Example:

• Joe: 10 days in Apr, 20 days in May
• John: 28 days in May

Finally I will return a new table

Month Count
4 10
5 48

I tried to use datediff and datepart to group by month, but don’t know how to sum after group. Is there any way to do this?

Besides, I want to add filter from date and to date.

## How to solve :

I know you bored from this bug, So we are here to help you! Take a deep breath and look at the explanation of your problem. We have many solutions to this problem, But we recommend you to use the first method because it is tested & true method that will 100% work for you.

### Method 1

``````WITH
-- Generate months list
cte1 AS ( SELECT MIN(MONTH([Start Date])) [Month],
MAX(MONTH([End Date])) LastMonth,
MIN(YEAR([Start Date])) [Year]
FROM test
UNION ALL
SELECT [Month] + 1,
LastMonth,
[Year]
FROM cte1
WHERE [Month] < LastMonth ),
-- Convert months list to start-end list
cte2 AS ( SELECT DATEFROMPARTS([Year], [Month], 1) MonthStart,
EOMONTH(DATEFROMPARTS([Year], [Month], 1)) MonthEnd,
[Month]
FROM cte1 )
-- Get needed data
SELECT cte2.[Month],
SUM(1 + DATEDIFF(day, CASE WHEN test.[Start Date] < cte2.MonthStart
THEN cte2.MonthStart
ELSE test.[Start Date] END,
CASE WHEN test.[End Date] > cte2.MonthEnd
THEN cte2.MonthEnd
ELSE test.[End Date] END )) [Count]
FROM test
-- join only overlapped periods
JOIN cte2 ON test.[Start Date] <= cte2.MonthEnd
AND cte2.MonthStart <= test.[End Date]
GROUP BY cte2.[Month];
``````

https://dbfiddle.uk/?rdbms=sqlserver_2014&fiddle=c2d6d92223f6516d78550a021cd5c3ce

The query assumes that all periods belongs the same year. Of course it can be simplified. The period length from ‘2021-04-20’ to ‘2021-04-30’ (inclusive) is 11 days, not 10.

I want to add filter from date and to date

This affects only `cte1` (generate the calendar based not on table data but on needed period dates) and, if you want to filter some month partially (not from first day till the last day of month) on `cte2`. Main query will be filtered automatically due to `ON` clause expression.

### Method 2

Another approach:

``````CREATE TABLE dbo.DateRanges
(
[Name] varchar(20) NOT NULL,
[Start Date] date NOT NULL,
[End Date] date NOT NULL,

PRIMARY KEY ([Name], [Start Date]),
CHECK ([End Date] >= [Start Date])
);

INSERT dbo.DateRanges
([Name], [Start Date], [End Date])
VALUES
(CONVERT(varchar(20), 'Joe'), CONVERT(date, '20/04/2021', 103), CONVERT(date, '20/05/2021', 103)),
(CONVERT(varchar(20), 'John'), CONVERT(date, '01/05/2021', 103), CONVERT(date, '28/05/2021', 103));
``````

Solution:

``````SELECT
[Month] = Months.m,
[Count] =
SUM
(
-- Number of days in the current month
1 + DATEDIFF
(
DAY,
-- Latest of [Start Date] and current month start date
IIF(DR.[Start Date] <= MonthRange.StartDate, MonthRange.StartDate, DR.[Start Date]),
-- Earliest of [End Date] and current month end date
IIF(DR.[End Date] >= MonthRange.EndDate, MonthRange.EndDate, DR.[End Date])
)
)
FROM dbo.DateRanges AS DR
JOIN
(
-- Month numbers
VALUES
(01), (02), (03), (04), (05), (06),
(07), (08), (09), (10), (11), (12)
) AS Months (m)
-- Months that overlap the [Start Date], [End Date] range
ON Months.m BETWEEN MONTH(DR.[Start Date]) AND MONTH(DR.[End Date])
OR Months.m BETWEEN MONTH(DR.[End Date]) AND MONTH(DR.[Start Date])
CROSS APPLY
(
-- Start and end day of each month
VALUES
(
DATEFROMPARTS(YEAR(DR.[Start Date]), Months.m, 1),
EOMONTH(DATEFROMPARTS(YEAR(DR.[Start Date]), Months.m, 1))
)
) AS MonthRange (StartDate, EndDate)
GROUP BY
Months.m
ORDER BY
Months.m;
``````

Output:

Month Count
4 11
5 48

Online demo:

db<>fiddle demo

### Method 3

#### TL;DR

I added some sample data over a previous year (see fiddle) and included a period which straddled New Year’s Eve. The approach I’ve used assumes the existence of a calendar table – ways of doing this using recursive CTE’s are discussed below. By adding the `WHERE` clauses supplied (commented out), one can either restrict the date range and/or whether there was any stay in a given month.

``````SELECT
DATENAME(MONTH, ad.tdate) AS "Name",  -- Not strictly necessary
SUM
(
CASE
WHEN s.start_date IS NOT NULL THEN 1
ELSE 0
END
) AS "Stays"
LEFT JOIN stay s
-- AND ad.tdate <= '2021-06-30'       -- Restrict dates of stays
-- WHERE s.start_date IS NOT NULL     -- Show only months with at least 1 stay
OPTION (MAXRECURSION 0);
``````

Result (see extra data in fiddle – this result appears to be consistent):

``````Month        Name   YEAR    Stays
1     January   2020        0
2    February   2020        0
3       March   2020        0
4       April   2020        0
5         May   2020       12
6        June   2020       20
7        July   2020       12
8      August   2020       20
9   September   2020        0
10     October   2020        6
11    November   2020       11
12    December   2020       17
1     January   2021        2
2    February   2021        2
3       March   2021        2
4       April   2021       11
5         May   2021       48
6        June   2021        0
7        July   2021        0
8      August   2021        0
9   September   2021        2
10     October   2021        0
11    November   2021       11
12    December   2021        5
``````

#### More detailed discussion:

There are two or three alternative methods to the accepted answer which might work. This is a classic case of where a calendar table is required. There are two ways of doing this – the first is by using "classic" SQL – using a function – and also by using the more modern `CTE` (Common Table Expression) approach, in particular a recursive CTE (see links in the reference above).

The first gem I came across was this article by Jon Tavernier on how to create a (materialised) calendar table in SQL Server – I pared it back to the minimum required by this problem, although this sort of table has so many potential uses, you may wish to consider doing it for period 10-20 years (only ~ 3.5K – 7K records) and with far more fields (e.g. 1st day of month, end of accounting period…).

Just one small point – your table definition seems to be simplified, but I would suggest that you make it something more like this – increasing the amount of information, the server has makes it easier for it to come up with a good plan:

``````CREATE TABLE stay
(
name VARCHAR(255) NOT NULL,
start_date DATE NOT NULL,
end_date DATE   NOT NULL,

CONSTRAINT n_sd_ed_pk PRIMARY KEY (name, start_date, end_date),

CONSTRAINT n_sd_uq UNIQUE (name, start_date),

CONSTRAINT n_ed_uq UNIQUE (name, end_date),

CONSTRAINT sd_lt_ed_ck CHECK (start_date < end_date),

);
``````

You might also want to ensure that there is no overlap between a given `name` and their `start_date` and `end_date` – but I don’t know how to do this in SQL Server short of using a trigger – my UNIQUE constraints are an attempt at that through SQL – but they won’t prevent nested intervals…

Here’s an article on how to do it using a trigger.

#### First approach (calendar table – Tavernier):

The code for approach 1 is available on the fiddle here.

``````DECLARE @start_dt AS DATE = '2021-01-01';       -- Date from which the calendar table will be created (inclusive).
DECLARE @end_dt   AS DATE = '2022/01/01';       -- Calendar table will be created up to this date (not inclusive).

CREATE TABLE all_dates
(
tdate DATE PRIMARY KEY
);

WHILE @start_dt < @end_dt
BEGIN
INSERT INTO all_dates
(
tdate   -- can generate many more useful values - see article by Jon Tavernier
)
VALUES
(
@start_dt
)
SET @start_dt = DATEADD(DAY, 1, @start_dt)
END;
``````

and then I check by:

``````SELECT * FROM all_dates;
``````

Result (snipped for brevity):

``````  tdate
2021-01-01
2021-01-02
2021-01-03
2021-01-04
2021-01-05
``````

Next, we set up our test data as per the OP’s question (I called the table `stay`):

``````name    start_date  end_date
Joe     2021-04-20  2021-05-20
John    2021-05-01  2021-05-28
``````

Then, I ran this query:

``````SELECT
FROM stay s
``````

Result:

``````Month     Name  Count
4    April     11
5      May     48
``````

The month name wasn’t included in the spec, and can easily be dropped – just thought it was a nice to have.

I then `turned on Statistics` as follows:

``````SET STATISTICS PROFILE ON;
SET STATISTICS TIME ON;
SET STATISTICS IO ON;
``````

and reran the query – results discussed below.

#### Second approach (calendar year table using pre-defined dates via a CTE):

The code for this approach is available on the fiddle here.

Sometimes consultants aren’t allowed to create physical tables, and I wanted the natural approach for this is to use a CTE approach and I stumbled across this next gem (`Creating a date dimension or calendar table in SQL Server`) by Aaron Bertand (of this parish).

He begins his piece with these sage words:

One of the biggest objections I hear to calendar tables is that people
don’t want to create a table. I can’t stress enough how cheap a table
can be in terms of size and memory usage, especially as underlying
storage continues to be larger and faster, compared to using all kinds
of functions to determine date-related information in every single
query. Twenty or thirty years of dates stored in a table takes a few
MBs at most, even less with compression, and if you use them often
enough, they’ll always be in memory.

He goes on to say that:

This is a one-time population, so I’m not worried about speed, even
though this specific CTE approach is no slouch. I like to materialize
all of the columns to disk, rather than rely on computed columns,
since the table becomes read-only after initial population. So I’m
going to do a lot of those calculations during the initial series of
CTEs. To start, I’ll show the output of each CTE one at a time.

So, even the function in the first approach might be suitable – but, `CTE`‘s (Common Table Expressions – also, see the link to RECURSIVE CTE’s within) were, as best as I can ascertain, introduced into SQL Server after Tavernier’s article above.

I adapted the code from Bertrand’s second CTE as follows:

``````WITH seq (n) AS
(
SELECT 0
UNION ALL
SELECT n + 1 FROM seq
WHERE n < DATEDIFF(DAY, '2021-01-01', '2021-12-31')
),
all_dates (tdate) AS
(
SELECT DATEADD(DAY, n, '2021-01-01')  FROM seq
)
SELECT tdate FROM all_dates
OPTION (MAXRECURSION 0);  -- Important!!! Fails after 100 recursions if not present
``````

Result (snipped for brevity):

``````         tdate
2021-01-01 00:00:00.000
2021-01-02 00:00:00.000
2021-01-03 00:00:00.000
``````

So, we’re golden as far as our calendar table is concerned…

Then, I just put that over my original query from approach 1 as follows:

``````WITH seq (n) AS
(
SELECT 0
UNION ALL
SELECT n + 1 FROM seq
WHERE n < DATEDIFF(DAY, '2021-01-01', '2021-12-31')
),
all_dates (tdate) AS
(
SELECT DATEADD(DAY, n, '2021-01-01') FROM seq
)
SELECT
FROM stay s
OPTION (MAXRECURSION 0);
``````

Result (same):

``````Month    Name   Count
4   April      11
5     May      48
``````

So, I did the same as for the first approach and turned on Statisics – discussed below.

#### 3rd approach (restricted dates):

The fiddle is available here.

Here, in the interests of "efficiency", I restricted the dates in the calendar table to the `MIN()` of start_time and the `MAX()` of `end_time` in the `stay` table as follows:

``````WITH stays (min_sd, max_ed, no_days) AS
(
SELECT
MIN(start_date) AS min_sd,
MAX(end_date) AS max_ed,
DATEDIFF(DAY, MIN(start_date), MAX(end_date)) AS no_days
FROM stay
),
seq(n, tdate) AS
(
SELECT 0, (SELECT min_sd FROM stays)
UNION ALL
SELECT n + 1, DATEADD(DAY, n + 1, (SELECT min_sd FROM stays))
FROM seq
WHERE n < (SELECT no_days FROM stays)
)
SELECT * FROM seq;
``````

Result (snipped for brevity):

``````n   tdate
0   2021-04-20
1   2021-04-21
2   2021-04-22
(39 rows...)
``````

So, now we have every date between the earliest `start_date` and the latest `end_date`.

And using these dates as our calendar table, we formulate the query:

``````WITH stays (min_sd, max_ed, no_days) AS
(
SELECT
MIN(start_date) AS min_sd,
MAX(end_date) AS max_ed,
DATEDIFF(DAY, MIN(start_date), MAX(end_date)) AS no_days
FROM stay
),
all_dates(n, tdate) AS
(
SELECT 0, (SELECT min_sd FROM stays)
UNION ALL
SELECT n + 1, DATEADD(DAY, n + 1, (SELECT min_sd FROM stays))
FROM all_dates
WHERE n < (SELECT no_days FROM stays)
)
SELECT
FROM stay s
OPTION (MAXRECURSION 0);
``````

Result (same):

``````Month    Name   Count
4   April      11
5     May      48
``````

I have some performance stats at the end of the fiddles. It is, of course, very difficult to draw many conclusions from performance analyses of (relatively) tiny amounts of data on a server over which one has no control – I would urge you to test with your own data and under your own load…

Note: Use and implement method 1 because this method fully tested our system.
Thank you 🙂