Calculating multiple medians


I have defined a function with a table-valued parameter that returns the median of the values passed to it. The function is defined as:

    CREATE FUNCTION [dbo].[fn_GetMedian](@List TypeMedian READONLY)
    RETURNS INT
    AS
    BEGIN
    <function body>
    END

The table type TypeMedian is defined as:

CREATE TYPE [dbo].[TypeMedian] AS TABLE(
    [VALUE] [int] NULL
)

Now I have a table LISTING that is populated with values, and a table RESULT that should be filled from LISTING.

The table structures are:

LISTING(ListingCol1,ListingCol2,ListingCol3,ListingCol4,ListingCol5)
RESULT(Col1,Col2,Col3,Col4,Col5)

The LISTING table has more than 1000 rows of data.

All columns in both tables are of type int.
I want to fill the columns of the RESULT table, which are calculated as follows:

Col1 = SUM(ListingCol1)
Col2 = SUM(ListingCol2)
Col3 = dbo.fn_GetMedian(ListingCol3)
Col4 = dbo.fn_GetMedian(ListingCol4)
Col5 = dbo.fn_GetMedian(ListingCol5)

And I’m doing so as:

INSERT INTO RESULT(Col1)
SELECT SUM(ListingCol1)
FROM Listing

UPDATE RESULT
SET Col2 = (SELECT SUM(ListingCol2) FROM Listing)

DECLARE @tbl_Median TypeMedian

INSERT INTO @tbl_Median
SELECT ListingCol3
FROM Listing

UPDATE RESULT
SET Col3 = dbo.fn_GetMedian(@tbl_Median)

-- For next column
DELETE FROM @tbl_Median

INSERT INTO @tbl_Median
SELECT ListingCol4
FROM Listing

UPDATE RESULT
SET Col4 = dbo.fn_GetMedian(@tbl_Median); 

-- I repeat this update query for each of the remaining columns.

How can I do that in a single query?

How to solve :


Method 1

For two sums and three medians, on a single table, I honestly can’t see the benefit of using a complicated dynamic or function-based solution.

It is quite easy to construct a single query using Peter Larsson’s median method that I showed you before:

CREATE TABLE dbo.Listing
(
    ListingCol1 integer NULL,
    ListingCol2 integer NULL,
    ListingCol3 integer NULL,
    ListingCol4 integer NULL,
    ListingCol5 integer NULL
);

CREATE TABLE dbo.Result
(
    Col1 integer NULL,
    Col2 integer NULL,
    Col3 integer NULL,
    Col4 integer NULL,
    Col5 integer NULL
);

-- Just to show indexes are helpful for the median calculations
CREATE INDEX i ON dbo.Listing (ListingCol3)
CREATE INDEX j ON dbo.Listing (ListingCol4)
CREATE INDEX k ON dbo.Listing (ListingCol5)

Solution

INSERT dbo.Result
(
    Col1,
    Col2,
    Col3,
    Col4,
    Col5
)
SELECT
    SC.SumCol1,
    SC.SumCol2,
    SQ3.MedianCol3,
    SQ4.MedianCol4,
    SQ5.MedianCol5
FROM 
(
    -- Sums + counts needed for the median calculations
    SELECT 
        SumCol1 = SUM(L.ListingCol1),
        SumCol2 = SUM(L.ListingCol2),
        CountCol3 = COUNT_BIG(L.ListingCol3),
        CountCol4 = COUNT_BIG(L.ListingCol4),
        CountCol5 = COUNT_BIG(L.ListingCol5)
    FROM dbo.Listing AS L
) AS SC
CROSS APPLY 
(
    -- Median for column 3
    SELECT
        MedianCol3 = AVG(1.0 * SQ.ListingCol3)
    FROM
    (
        SELECT 
            L3.ListingCol3
        FROM dbo.Listing AS L3
        WHERE 
            L3.ListingCol3 IS NOT NULL
        ORDER BY 
            L3.ListingCol3 ASC
        OFFSET (SC.CountCol3 - 1) / 2 ROWS
        FETCH NEXT 1 + (1 - SC.CountCol3 % 2) ROWS ONLY
    ) AS SQ
) AS SQ3
CROSS APPLY 
(
    -- Median for column 4
    SELECT
        MedianCol4 = AVG(1.0 * SQ.ListingCol4)
    FROM
    (
        SELECT 
            L4.ListingCol4
        FROM dbo.Listing AS L4
        WHERE 
            L4.ListingCol4 IS NOT NULL
        ORDER BY 
            L4.ListingCol4 ASC
        OFFSET (SC.CountCol4 - 1) / 2 ROWS
        FETCH NEXT 1 + (1 - SC.CountCol4 % 2) ROWS ONLY
    ) AS SQ
) AS SQ4
CROSS APPLY 
(
    -- Median for column 5
    SELECT
        MedianCol5 = AVG(1.0 * SQ.ListingCol5)
    FROM
    (
        SELECT 
            L5.ListingCol5
        FROM dbo.Listing AS L5
        WHERE 
            L5.ListingCol5 IS NOT NULL
        ORDER BY 
            L5.ListingCol5 ASC
        OFFSET (SC.CountCol5 - 1) / 2 ROWS
        FETCH NEXT 1 + (1 - SC.CountCol5 % 2) ROWS ONLY
    ) AS SQ
) AS SQ5;

Expected execution plan shape:

[Image: execution plan]
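
The OFFSET and FETCH arithmetic used in each CROSS APPLY can be checked on a small data set. The following is a minimal sketch, not part of the original answer: the table variable @T, its column v, and the variable @c are hypothetical names chosen for illustration, and it assumes SQL Server 2012 or later, where OFFSET/FETCH is available. With four non-NULL values, the query skips one row and fetches two, so the median is the average of the two middle values.

```sql
-- Hypothetical sample data; the NULL row is excluded from the median
DECLARE @T table (v integer NULL);
INSERT @T (v) VALUES (1), (3), (5), (7), (NULL);

-- COUNT_BIG(column) counts only non-NULL values: 4 here
DECLARE @c bigint = (SELECT COUNT_BIG(T.v) FROM @T AS T);

SELECT
    Median = AVG(1.0 * SQ.v)   -- averages 3 and 5, giving 4
FROM
(
    SELECT
        T.v
    FROM @T AS T
    WHERE
        T.v IS NOT NULL
    ORDER BY
        T.v ASC
    OFFSET (@c - 1) / 2 ROWS                -- (4 - 1) / 2 = 1, skip one row
    FETCH NEXT 1 + (1 - @c % 2) ROWS ONLY   -- 1 + (1 - 0) = 2, take two rows
) AS SQ;
```

With an odd count, say five non-NULL values, the same expressions evaluate to OFFSET 2 and FETCH NEXT 1, so only the single middle row is returned and the average is that value itself.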

Method 2

You asked: "And I repeat this update query for the remaining columns. How could I do that in a single query?"

Performance implications aside for a moment, the easiest way to accomplish this in a single query** is by creating a User-Defined Aggregate (UDA) via SQLCLR. This is what I described in my answer to your related question:

How can I pass column to function in sql?

If you had such a function, you could do the following:

INSERT INTO RESULT(Col1, Col2, Col3, Col4, Col5)
  SELECT SUM(ListingCol1) AS [Col1],
         SUM(ListingCol2) AS [Col2],
         dbo.AggMedian(ListingCol3) AS [Col3],
         dbo.AggMedian(ListingCol4) AS [Col4],
         dbo.AggMedian(ListingCol5) AS [Col5]
  FROM   LISTING;
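
Such an aggregate has to be compiled into a .NET assembly and registered before the query above will run. As a rough sketch, registration usually looks something like the following. The assembly name MedianAggregates, the file path, and the CLR class name AggMedian are placeholders assumed for illustration; the implementation itself is the one described in the linked answer and is not shown here.

```sql
-- Assumption: CLR integration is not yet enabled on this instance
EXEC sys.sp_configure N'clr enabled', 1;
RECONFIGURE;
GO

-- Hypothetical assembly name and path; point this at your own compiled DLL
CREATE ASSEMBLY MedianAggregates
FROM N'C:\SQLCLR\MedianAggregates.dll'
WITH PERMISSION_SET = SAFE;
GO

-- Bind the T-SQL name dbo.AggMedian to the (hypothetical) CLR class AggMedian
CREATE AGGREGATE dbo.AggMedian (@Value int)
RETURNS int
EXTERNAL NAME MedianAggregates.AggMedian;
GO
```

GO is a batch separator understood by client tools such as SSMS and sqlcmd, not a T-SQL statement. On SQL Server 2017 and later, the "clr strict security" setting may also require the assembly to be signed or explicitly trusted before CREATE ASSEMBLY succeeds.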

With that said, performance still needs to be considered. The approach shown above does not work in all situations. If the LISTING table has millions of rows (or at least millions of rows per grouping), then this might not work. But if each grouping has 5,000 rows, or maybe even 10,000 or something along those lines, and the process doesn’t run multiple times per minute, then you should be fine. Of course, as with anything, it should be tested against your actual data to determine whether there is a performance issue.

** By “easiest way in a single query”, I am assuming that the desired method is one that can be easily applied in multiple situations, especially ad hoc queries.


All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.
