Why does changing the column in the ORDER BY clause of the window function “MAX() OVER()” affect the final result?


I have a table with the structure and data below:

create table test_table
(
Item_index   int,
Item_name    varchar(50)
)

insert into test_table (Item_index,Item_name) values (0,'A')
insert into test_table (Item_index,Item_name) values (1,'B')
insert into test_table (Item_index,Item_name) values (0,'C')
insert into test_table (Item_index,Item_name) values (1,'D')
insert into test_table (Item_index,Item_name) values (0,'E')

I want to know why changing the column in the ORDER BY clause of the query changes the result. In QUERY-1 I used item_index, and in QUERY-2 I used the item_name column in the ORDER BY clause. I thought both queries must generate the same result, because I used item_index for partitioning in both of them. I’m completely confused now. Why should the ORDER BY column affect the final result?

QUERY-1:

select t.*,
       max(t.Item_name)over(partition by t.item_index order by item_index) new_column
from test_table t;

RESULT:

Item_index  Item_name  new_column
----------- ---------- ----------
0           A          E
0           C          E
0           E          E
1           D          D
1           B          D

QUERY-2:

select t.*,
       max(t.Item_name)over(partition by t.item_index order by item_name) new_column
from test_table t;

RESULT:

Item_index  Item_name  new_column
----------- ---------- ----------
0           A          A
0           C          C
0           E          E
1           B          B
1           D          D

Can anybody explain how exactly these two queries are executed and why they generate different results?

Thanks in advance

How to solve :


Method 1

max(t.Item_name)over(partition by t.item_index order by item_index) new_column

Let’s take the partition where t.item_index = 0. It is

Item_index Item_name
0 A
0 C
0 E

When order by item_index is applied, all rows in the partition have the same sort value. Under the default RANGE frame, rows with the same sort value as the current row are peers and are included in the frame, so all three rows are used for the MAX() selection. The value 'E' is therefore returned for every row.


max(t.Item_name)over(partition by t.item_index order by item_name)

Let’s take the same group.

Item_index Item_name
0 A
0 C
0 E

Now the sorting key differs from row to row, so when the default frame RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW is applied, a different set of rows is included in the frame for each row.

For the 1st row, only this row is included in the frame, and 'A' is the only value in the frame, so it is returned.

For the 2nd row, the first 2 rows are included in the frame, the values 'A' and 'C' are compared, and 'C' is returned as the maximum value in the frame.

For the 3rd row, all 3 rows are included in the frame, the values 'A', 'C' and 'E' are compared, and 'E' is returned as the maximum value in the frame.
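
To make the implicit frame visible, both window expressions can be written with the default frame spelled out. This is only a sketch against the test_table from the question; the column aliases max_by_index and max_by_name are made up for illustration.

```sql
-- Same two expressions as QUERY-1 and QUERY-2, with the default frame written explicitly.
select t.*,
       -- ordering by the partition column: every row is a peer of every other row,
       -- so the frame is always the whole partition
       max(t.Item_name) over (partition by t.item_index
                              order by t.item_index
                              range between unbounded preceding and current row) as max_by_index,
       -- ordering by item_name: the frame grows by one row at a time
       max(t.Item_name) over (partition by t.item_index
                              order by t.item_name
                              range between unbounded preceding and current row) as max_by_name
from test_table t;
```

For the partition with item_index = 0, max_by_index should be 'E' on all three rows, while max_by_name should be 'A', 'C' and 'E' respectively, matching the two results in the question.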

Method 2

The explanation for the different results is given in SQL Server’s documentation about window functions, the ORDER BY section:

ORDER BY

Defines the logical order of the rows within each partition of the result set. That is, it specifies the logical order in which the window function calculation is performed.

If it is specified, and a ROWS/RANGE is not specified, then default
RANGE UNBOUNDED PRECEDING AND CURRENT ROW is used as default for
window frame by the functions that can accept optional ROWS/RANGE
specification (for example min or max).

Note that MIN() and MAX() window aggregates accept an optional ROWS or RANGE specification.

When there is neither an ORDER BY nor a ROWS/RANGE specification, they calculate the MIN and MAX over the whole partition. When there is an ORDER BY, they calculate the MIN or MAX over the specified (or default) frame. Since your two queries order by different columns, the default frame covers different rows, and they yield different results.

If you want the MAX over the whole partition, remove the ORDER BY from the OVER clause. The documentation says:

If it is not specified, the default order is ASC and window function will use all rows in partition.
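
A minimal sketch of both ways to get the maximum over the whole partition, using the test_table from the question (the aliases max_no_order and max_full_frame are illustrative only):

```sql
select t.*,
       -- no ORDER BY: the frame is the entire partition
       max(t.Item_name) over (partition by t.item_index) as max_no_order,
       -- ORDER BY kept, but the frame is explicitly widened to the entire partition
       max(t.Item_name) over (partition by t.item_index
                              order by t.item_name
                              rows between unbounded preceding and unbounded following) as max_full_frame
from test_table t;
```

Both columns should return 'E' for every row with item_index = 0 and 'D' for every row with item_index = 1. The second form is useful when the same OVER clause needs an ORDER BY for some other reason.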

Method 3

The windowing clause you use (in this case the default of "range between unbounded preceding and current row") operates on what you order by.
There is some logic to that: to be able to know what is "preceding" or "following" you have to talk about ordered data.

In query-1 you order by item_index, which is also what you partition by.
So let’s look at the partition with item_index = 0.
Every row in the partition will have an item_index of 0.
That means that for each row the window will effectively be "All rows with an item_index <= 0".
So, basically all rows in the partition.
And the max(item_name) of that is ‘E’.

In query-2 you order by item_name.
For that same partition that means that the window for the ‘A’ row will be "All rows with an item_name <= ‘A’"
That is just the ‘A’ row, so the max(item_name) is ‘A’.
The window for the ‘C’ row will be "All rows with an item_name <= ‘C’"
That’s the ‘A’ and ‘C’ row, so the max(item_name) is ‘C’ .
The window for the ‘E’ row will be "All rows with an item_name <= ‘E’"
That’s the ‘A’, ‘C’ and ‘E’ row, so the max(item_name) is ‘E’ .
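
The "all rows with an item_name <= the current one" reading can be written down literally as a correlated subquery. This is only an illustration of the semantics, assuming item_name contains no NULLs; it is not how the engine necessarily executes the window function.

```sql
-- Equivalent of QUERY-2 for this data, written without a window function.
select t.*,
       (select max(s.Item_name)
        from test_table s
        where s.item_index = t.item_index      -- same partition
          and s.Item_name <= t.Item_name       -- "unbounded preceding" up to the current row and its peers
       ) as new_column
from test_table t;
```

For each row, the subquery sees exactly the rows the default RANGE frame would contain, so new_column should equal the row's own item_name, as in the QUERY-2 result.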

Hope this helps to visualise what happens.

Method 4

There is something you need to keep in mind about window functions: they are evaluated for every row. That is why window functions like lag and lead can refer to the previous and next rows. By adding an order by, you are basically asking “in this partition, up to and including the current row, what is the max value of column item_name?”.

This doesn’t make sense when using max on the same column you are ordering by (assuming asc), as the result is always going to be the same as the current row’s value, but it makes sense for min and avg. And having it do anything else for max would be way too complicated.
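
As a rough sketch of where the "up to and including the current row" behaviour is useful, the same OVER clause can drive a running count and a running minimum on the question's test_table (the aliases running_count and running_min are hypothetical):

```sql
select t.*,
       -- number of rows seen so far in the partition, in item_name order
       count(*)         over (partition by t.item_index order by t.item_name) as running_count,
       -- smallest item_name seen so far in the partition
       min(t.Item_name) over (partition by t.item_index order by t.item_name) as running_min
from test_table t;
```

For item_index = 0 this should give running_count 1, 2, 3 and running_min 'A' on every row; for item_index = 1 it should give running_count 1, 2 and running_min 'B'.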


All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0
