I have defined a function with a table-valued parameter that returns the median of the values passed to it. The function is defined as:
CREATE FUNCTION [dbo].[fn_GetMedian](@List TypeMedian READONLY)
RETURNS INT
AS
BEGIN
<function body>
END
And the table type TypeMedian is defined as:
CREATE TYPE [dbo].[TypeMedian] AS TABLE(
[VALUE] [int] NULL
)
Now I have a table Listing that is already filled with values, and a table RESULT that is to be filled from Listing. The table structures are:
LISTING(ListingCol1,ListingCol2,ListingCol3,ListingCol4,ListingCol5)
RESULT(Col1,Col2,Col3,Col4,Col5)
The Listing table has more than 1000 rows of data. All columns in both tables are of type int.
Now I want to fill the columns of the RESULT table, which can be calculated as:
Col1 = SUM(ListingCol1)
Col2 = SUM(ListingCol2)
Col3 = dbo.fn_GetMedian(ListingCol3)
Col4 = dbo.fn_GetMedian(ListingCol4)
Col5 = dbo.fn_GetMedian(ListingCol5)
And I’m doing so as:
INSERT INTO RESULT(Col1)
SELECT SUM(ListingCol1)
FROM Listing
UPDATE RESULT
SET Col2 = (SELECT SUM(ListingCol2) FROM Listing)
DECLARE @tbl_Median TypeMedian
INSERT INTO @tbl_Median
SELECT ListingCol3
FROM Listing
UPDATE RESULT
SET Col3 = dbo.fn_GetMedian(@tbl_Median)
-- For next column
DELETE FROM @tbl_Median
INSERT INTO @tbl_Median
SELECT ListingCol4
FROM Listing
UPDATE RESULT
SET Col4 = dbo.fn_GetMedian(@tbl_Median);
-- And this update query is repeated for the remaining columns.
How could I do that in a single query?
Method 1
For two sums and three medians, on a single table, I honestly can’t see the benefit of using a complicated dynamic or function-based solution.
It is quite easy to construct a single query using Peter Larsson’s median method that I showed you before:
CREATE TABLE dbo.Listing
(
ListingCol1 integer NULL,
ListingCol2 integer NULL,
ListingCol3 integer NULL,
ListingCol4 integer NULL,
ListingCol5 integer NULL
);
CREATE TABLE dbo.Result
(
Col1 integer NULL,
Col2 integer NULL,
Col3 integer NULL,
Col4 integer NULL,
Col5 integer NULL
);
-- Just to show indexes are helpful for the median calculations
CREATE INDEX i ON dbo.Listing (ListingCol3)
CREATE INDEX j ON dbo.Listing (ListingCol4)
CREATE INDEX k ON dbo.Listing (ListingCol5)
Solution
INSERT dbo.Result
(
Col1,
Col2,
Col3,
Col4,
Col5
)
SELECT
SC.SumCol1,
SC.SumCol2,
SQ3.MedianCol3,
SQ4.MedianCol4,
SQ5.MedianCol5
FROM
(
-- Sums + counts needed for the median calculations
SELECT
SumCol1 = SUM(L.ListingCol1),
SumCol2 = SUM(L.ListingCol2),
CountCol3 = COUNT_BIG(L.ListingCol3),
CountCol4 = COUNT_BIG(L.ListingCol4),
CountCol5 = COUNT_BIG(L.ListingCol5)
FROM dbo.Listing AS L
) AS SC
CROSS APPLY
(
-- Median for column 3
SELECT
MedianCol3 = AVG(1.0 * SQ.ListingCol3)
FROM
(
SELECT
L3.ListingCol3
FROM dbo.Listing AS L3
WHERE
L3.ListingCol3 IS NOT NULL
ORDER BY
L3.ListingCol3 ASC
OFFSET (SC.CountCol3 - 1) / 2 ROWS
FETCH NEXT 1 + (1 - SC.CountCol3 % 2) ROWS ONLY
) AS SQ
) AS SQ3
CROSS APPLY
(
-- Median for column 4
SELECT
MedianCol4 = AVG(1.0 * SQ.ListingCol4)
FROM
(
SELECT
L4.ListingCol4
FROM dbo.Listing AS L4
WHERE
L4.ListingCol4 IS NOT NULL
ORDER BY
L4.ListingCol4 ASC
OFFSET (SC.CountCol4 - 1) / 2 ROWS
FETCH NEXT 1 + (1 - SC.CountCol4 % 2) ROWS ONLY
) AS SQ
) AS SQ4
CROSS APPLY
(
-- Median for column 5
SELECT
MedianCol5 = AVG(1.0 * SQ.ListingCol5)
FROM
(
SELECT
L5.ListingCol5
FROM dbo.Listing AS L5
WHERE
L5.ListingCol5 IS NOT NULL
ORDER BY
L5.ListingCol5 ASC
OFFSET (SC.CountCol5 - 1) / 2 ROWS
FETCH NEXT 1 + (1 - SC.CountCol5 % 2) ROWS ONLY
) AS SQ
) AS SQ5;
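To see why the OFFSET/FETCH arithmetic lands on the middle row(s), here is a small sketch on a throwaway table variable. The name @Demo and its values are made up for illustration and are not part of the original answer. With n non-NULL values, OFFSET (n - 1) / 2 skips to the lower middle row, and FETCH NEXT 1 + (1 - n % 2) takes one row when n is odd and two rows when n is even; AVG then averages whatever was fetched.

-- Hypothetical demo data: six non-NULL values plus one NULL that must be ignored
DECLARE @Demo AS table (Val integer NULL);

INSERT @Demo (Val)
VALUES (10), (20), (30), (40), (50), (60), (NULL);

-- COUNT_BIG(column) counts only non-NULL values, so @n = 6 here
DECLARE @n bigint = (SELECT COUNT_BIG(D.Val) FROM @Demo AS D);

DECLARE @Median decimal(18, 6);

SELECT
    @Median = AVG(1.0 * SQ.Val)
FROM
(
    SELECT
        D.Val
    FROM @Demo AS D
    WHERE
        D.Val IS NOT NULL
    ORDER BY
        D.Val ASC
    OFFSET (@n - 1) / 2 ROWS               -- (6 - 1) / 2 = 2 rows skipped
    FETCH NEXT 1 + (1 - @n % 2) ROWS ONLY  -- 1 + (1 - 0) = 2 rows taken: 30 and 40
) AS SQ;

SELECT Median = @Median;  -- average of 30 and 40

One thing to keep in mind: AVG(1.0 * ...) produces a decimal, so when the median of an even-sized set falls between two integers (say 35.5), storing it in an integer column such as Result.Col3 drops the fractional part. If that matters, the Result columns for the medians would need a decimal type.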
[Figure: expected execution plan shape (image not included)]
Method 2
"And this update query is repeated for the remaining columns. How could I do that in a single query?"
Performance implications aside for a moment, the easiest way to accomplish this in a single query** is by creating a User-Defined Aggregate (UDA) via SQLCLR. This is what I described in my answer to your related question:
How can I pass column to function in sql?
If you had such a function, you could do the following:
INSERT INTO RESULT(Col1, Col2, Col3, Col4, Col5)
SELECT SUM(ListingCol1) AS [Col1],
SUM(ListingCol2) AS [Col2],
dbo.AggMedian(ListingCol3) AS [Col3],
dbo.AggMedian(ListingCol4) AS [Col4],
dbo.AggMedian(ListingCol5) AS [Col5]
FROM LISTING;
With that said, performance is still something that needs to be considered. The approach shown above does not work in all situations. If the LISTING table has millions of rows (at least millions per each grouping), then this might not work. But if each grouping has 5000 rows, or maybe even 10,000 or something along those lines, and the process doesn’t run multiple times per minute, then you should be fine. Of course, as with anything, it should be tested against your actual data to determine if there is a performance issue or not.
** By “easiest way in a single query”, I am assuming that the desired method is one that can be easily applied in multiple situations, especially ad hoc queries.
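If SQLCLR is not available, another way to keep everything in one statement is the built-in PERCENTILE_CONT analytic function (SQL Server 2012 and later). This is a different technique from the UDA described above, not something the answer itself proposes, and it may well be slower on large tables, so treat it as a sketch to test against your own data. The table variable @Listing below is a made-up stand-in for the real LISTING table, trimmed to three columns.

-- Hypothetical stand-in for the LISTING table (three columns for brevity)
DECLARE @Listing AS table
(
    ListingCol1 integer NULL,
    ListingCol2 integer NULL,
    ListingCol3 integer NULL
);

INSERT @Listing (ListingCol1, ListingCol2, ListingCol3)
VALUES (1, 10, 5), (2, 20, 1), (3, 30, 9), (4, 40, 7);

-- PERCENTILE_CONT is an analytic (window) function, not an aggregate,
-- so every row carries the same computed values; TOP (1) keeps just one row.
SELECT TOP (1)
    Col1 = SUM(L.ListingCol1) OVER (),
    Col2 = SUM(L.ListingCol2) OVER (),
    Col3 = PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY L.ListingCol3) OVER ()
FROM @Listing AS L;

PERCENTILE_CONT ignores NULLs and interpolates between the two middle values when the count is even, so the sorted values 1, 5, 7, 9 give the midpoint of 5 and 7. Unlike a plain aggregate query, this form returns no row at all when the table is empty.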
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.