Add missing month in a table of values

I have a table with the following entries:

customer_ID lastDate Sum
1 2012-04-30 5
1 2012-06-30 4
1 2013-07-31 25
2 2012-04-30 7
2 2012-05-31 4
2 2012-06-30 1
2 2012-07-31 6

I need to add the missing months for each customer. Each added row should carry the last date of that month, with the Sum (frequency) value set to 0.

The expected output is:

customer_ID lastDate Sum
1 2012-04-30 5
1 2012-05-31 0
1 2012-06-30 4
1 2012-07-31 0
1 2012-08-31 0
1 2012-09-30 0
1 2012-10-31 0
1 2012-11-30 0
1 2012-12-31 0
1 2013-01-31 0
1 2013-02-28 0
1 2013-03-31 0
1 2013-04-30 0
1 2013-05-31 0
1 2013-06-30 0
1 2013-07-31 25
2 2012-04-30 7
2 2012-05-31 4
2 2012-06-30 1
2 2012-07-31 6

This can be achieved with a recursive CTE, but can a similar result be obtained using joins?

Solution using a recursive CTE:

with cte as (
      select id, date, frequency,
             lead(date) over (partition by id order by date) as next_date
      from t
      union all
      select id, eomonth(date, 1), 0, next_date
      from cte
      where eomonth(date, 1) < dateadd(day, -1, next_date)
     )
select id, date, frequency
from cte
order by id, date;

I am building a data pipeline, and the editor I run this query in does not support recursive CTEs, so I need an alternative.

How to solve:

Method 1

If you have a table of numbers, or can access the built-in table in master:

SELECT
    T1.id,
    Expanded.[date],
    Expanded.frequency
FROM 
(
    -- Add next date for the current id
    SELECT
        T.*,
        next_date = 
            LEAD(T.[date], 1) OVER (
                PARTITION BY T.id
                ORDER BY T.[date])
    FROM dbo.t AS T
) AS T1
CROSS APPLY
(
    -- All month ends >= the current date and < next date
    SELECT
        [date] = EOMONTH(T1.[date], SV.number),
        frequency = IIF(SV.number = 0, T1.frequency, 0),
        SV.number
    FROM master.dbo.spt_values AS SV
    WHERE
        SV.[type] = N'P'
        AND SV.number < ISNULL(DATEDIFF(MONTH, T1.[date], T1.next_date), 1)
) AS Expanded
ORDER BY
    T1.id,
    T1.[date],
    Expanded.number;

The output matches that of the recursive CTE you provided (see demo link below):

id date frequency
1 2012-04-30 5
1 2012-05-31 0
1 2012-06-30 4
1 2012-07-31 0
1 2012-08-31 0
1 2012-09-30 0
1 2012-10-31 0
1 2012-11-30 0
1 2012-12-31 0
1 2013-01-31 0
1 2013-02-28 0
1 2013-03-31 0
1 2013-04-30 0
1 2013-05-31 0
1 2013-06-30 0
1 2013-07-31 25
2 2012-04-30 7
2 2012-05-31 4
2 2012-06-30 1
2 2012-07-31 6

Explanation

Logically, each row from T1 is expanded by the APPLY into the set of month-ends that ought to be present. Let’s look at one particular row in the source table:

id date frequency
1 2012-04-30 5

Adding the LEAD in the table expression aliased to T1 gives:

id date frequency next_date
1 2012-04-30 5 2012-06-30

This row is passed to the APPLY:

SELECT
    [date] = EOMONTH(T1.[date], SV.number),
    frequency = IIF(SV.number = 0, T1.frequency, 0),
    SV.number
FROM master.dbo.spt_values AS SV
WHERE
    SV.[type] = N'P'
    AND SV.number < ISNULL(DATEDIFF(MONTH, T1.[date], T1.next_date), 1)

The references to columns in T1 are outer references. Substituting the values from the current T1 row for [date], next_date, and frequency gives:

SELECT
    [date] = EOMONTH('2012-04-30', SV.number),
    frequency = IIF(SV.number = 0, 5, 0),
    SV.number
FROM master.dbo.spt_values AS SV
WHERE
    SV.[type] = N'P'
    AND SV.number < ISNULL(DATEDIFF(MONTH, '2012-04-30', '2012-06-30'), 1)

The ISNULL expression determines how many rows are returned from our table of numbers. In the current iteration the ISNULL expression evaluates to 2, so the rows returned from the numbers table have values 0 and 1.

The APPLY table expression as a whole returns:

date frequency number
2012-04-30 5 0
2012-05-31 0 1

Notice:

  • When number = 0, we are dealing with a row that exists in T1, so we just use the frequency given.
  • When number > 0, the row does not exist in T1 so we need to return a zero for frequency.

That gives us the results needed for the single row we chose from T1. We can do the same for the next row from T1 (substituting values as before), and so on until the result is complete. For more on how APPLY works logically, see my article Understanding and Using APPLY.

The only special handling is for the last row per id, where the LEAD returns NULL. In that case, we only need one row from the numbers table (value zero) to process the current row from T1.
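The row counts used in the walkthrough above can be checked directly. This is a minimal sketch using the literal values from the sample rows; the column aliases are only illustrative and do not appear in the original query:

```sql
SELECT
    -- Month boundaries crossed between the two dates: 2, so numbers 0 and 1 qualify
    months_to_next = DATEDIFF(MONTH, '2012-04-30', '2012-06-30'),
    -- On the last row per id LEAD returns NULL, DATEDIFF returns NULL,
    -- and ISNULL falls back to 1, so only number 0 qualifies
    last_row_count = ISNULL(DATEDIFF(MONTH, '2013-07-31', CAST(NULL AS date)), 1),
    -- Month-end one month after the current date: 2012-05-31
    next_month_end = EOMONTH('2012-04-30', 1);
```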


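The built-in table holds the numbers 0 to 2047 for type P, so it can fill a gap of up to 2,048 months between consecutive rows. That is sufficient for a range of dates from 2000-01-01 to 2170-08-01, for example.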
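If master.dbo.spt_values is not accessible either, the numbers can be generated inline without recursion, and the APPLY can be written as a plain join. The following is only a sketch under that assumption: the aliases D1, D2, D3 and N are made up for illustration, and three cross-joined digit sets give 0 to 999, which caps each gap at 1,000 months.

```sql
-- Hypothetical variant of Method 1 with an inline numbers source (names are illustrative)
SELECT
    T1.id,
    [date] = EOMONTH(T1.[date], N.number),
    frequency = IIF(N.number = 0, T1.frequency, 0)
FROM
(
    -- Add next date for the current id
    SELECT
        T.*,
        next_date = LEAD(T.[date], 1) OVER (
            PARTITION BY T.id
            ORDER BY T.[date])
    FROM dbo.t AS T
) AS T1
JOIN
(
    -- Numbers 0..999 built from three sets of digits
    SELECT number = D1.d + 10 * D2.d + 100 * D3.d
    FROM (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) AS D1 (d)
    CROSS JOIN (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) AS D2 (d)
    CROSS JOIN (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) AS D3 (d)
) AS N
    -- Same row-count rule as before: one row per month up to, but excluding, next_date
    ON N.number < ISNULL(DATEDIFF(MONTH, T1.[date], T1.next_date), 1)
ORDER BY
    T1.id,
    T1.[date],
    N.number;
```

Every source row is compared against all 1,000 numbers, so a permanent numbers table is likely the better choice for large inputs.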

Try the db<>fiddle demo

Method 2

Using a set-based approach

First, you need a calendar table or a numbers table.

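-- One row per month, starting at 1950-01-01
create table CalendarDate (Dates datetime primary key);

insert into CalendarDate with (tablock) (Dates)
select top (20000)
    dateadd(month, row_number() over (order by (select null)) - 1, '1950-01-01') as DT
from sys.objects a cross join sys.objects b;

-- Add the month-end date for each row.
-- If CalendarDate already exists, run the ALTER in its own batch before the UPDATE.
alter table CalendarDate add EOM datetime;

update CalendarDate
set EOM = eomonth(Dates);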

Populate it in whatever way you want.

Sample data and query:

declare @t table (customer_ID int, lastDate datetime, Sums int);

insert into @t values
 (1, '2012-04-30', 5)
,(1, '2012-06-30', 4)
,(1, '2013-07-31', 25)
,(2, '2012-04-30', 7)
,(2, '2012-05-31', 4)
,(2, '2012-06-30', 1)
,(2, '2012-07-31', 6);

-- CTE:  first and last date per customer
-- CTE1: every month-end between those two dates (neither CTE is recursive)
with CTE as
(
    select customer_ID, min(lastDate) as minDate, max(lastDate) as maxDate
    from @t
    group by customer_ID
)
, CTE1 as
(
    select t.customer_ID, CD.EOM
    from CTE t
    join CalendarDate CD
        on CD.EOM >= t.minDate and CD.EOM <= t.maxDate
)
-- Months with no matching source row get a sum of 0
select C.customer_ID, C.EOM as lastDate, isnull(t.Sums, 0) as Sums
from CTE1 C
left join @t t
    on C.customer_ID = t.customer_ID and C.EOM = t.lastDate
order by C.customer_ID, C.EOM;

It is worth creating one general-purpose calendar table, since it will help with other requirements too. To generate the months for a specific period only, you can do this:

select * from
(
select top (20000)
DATEADD(month, ROW_NUMBER()over(order by (select null))-1,'2000-01-01') DT
from sys.objects a ,sys.objects b
) tbl where dt<='2009-08-01'

All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
