I have the following query:
SELECT TOP 100 ID FROM [dbo].[TableName] WITH (NOLOCK) WHERE TypeId = 2 AND DateTimeUTC < '2022-Aug-04 07:02:40' AND DateTimeUTC > '4/26/2022 7:36:36 AM' ORDER BY ID ASC
The table [dbo].[TableName] (Not its real name, btw) has just over 118 million rows.
I’ve created the following index on this table:
CREATE INDEX [ix_TableName_DateTimeUTC_TypeId] ON [dbo].[TableName] (DateTimeUTC, TypeId) WITH FILLFACTOR = 90;
If I run this query (excluding the ORDER BY), the query performs a SEEK on the above index, and completes instantly. However, as soon as I include the ORDER BY, the query performs a SCAN on the PK instead, reading all 118+ million rows. As you can imagine, this tanks the performance and the query takes a long time to finish.
The simplest way to resolve this problem is to just remove the ORDER BY clause altogether; however, I don’t think that’s possible because the application (which makes this call) requires the data to be returned in order.
Any suggestions on how to improve this?
How to solve :
I would change the index to look like this:
CREATE INDEX [TypeId_Id_DateTimeUTC] ON [dbo].[TableName] ( TypeId, Id, DateTimeUTC ) WITH ( FILLFACTOR = 100, SORT_IN_TEMPDB = ON );
The idea is to make the initial data location and sorting free, and also support the range predicate. I discuss this in some detail in these blog posts:
It is usually better, as a practical matter, to avoid a sort than a residual predicate.
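One way to check this on your own data is to run the query against that index and look at the logical reads. This is only a sketch: it assumes the TypeId_Id_DateTimeUTC index above has been created, and the INDEX hint is there just for the experiment, not as a recommendation for production code.

```sql
-- Assumes the TypeId_Id_DateTimeUTC index suggested above exists.
SET STATISTICS IO ON;   -- logical reads per table appear in the Messages output

SELECT TOP (100) ID
FROM [dbo].[TableName] WITH (NOLOCK, INDEX ([TypeId_Id_DateTimeUTC]))  -- hint only for testing
WHERE TypeId = 2                            -- equality: seek straight to the TypeId = 2 rows
  AND DateTimeUTC > '2022-04-26T07:36:36'   -- range: checked row by row as a residual predicate
  AND DateTimeUTC < '2022-08-04T07:02:40'
ORDER BY ID ASC;                            -- within one TypeId the index is already in Id order

SET STATISTICS IO OFF;
```

If the plan shows no Sort operator and the read count is small, the index is doing what it was designed for.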
You should use a consistent, unambiguous format for datetime literals. It is weird having two entirely different formats in the same query.
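For example, the query from the question could be written with ISO 8601 literals, which SQL Server interprets the same way regardless of language or DATEFORMAT settings. The values are the ones from the question; only the format changes.

```sql
SELECT TOP (100) ID
FROM [dbo].[TableName] WITH (NOLOCK)
WHERE TypeId = 2
  AND DateTimeUTC > '2022-04-26T07:36:36'   -- was '4/26/2022 7:36:36 AM'
  AND DateTimeUTC < '2022-08-04T07:02:40'   -- was '2022-Aug-04 07:02:40'
ORDER BY ID ASC;
```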
(DateTimeUTC, TypeId) is not the optimal key order for that index. Columns used in equality conditions should be listed first, so if this index exists specifically to optimise that query, TypeId should come first: (TypeId, DateTimeUTC). Otherwise the best it can do is a range seek on the date part with a residual predicate on TypeId.
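A minimal sketch of that key order, reusing the fill factor from the question's index. The index name is made up for illustration.

```sql
-- Hypothetical name; the equality column (TypeId) comes first, then the range column.
CREATE INDEX [ix_TableName_TypeId_DateTimeUTC]
ON [dbo].[TableName] (TypeId, DateTimeUTC)
WITH (FILLFACTOR = 90);
-- Assumption: the PK on ID mentioned in the question is the clustered index.
-- If so, ID is carried in this nonclustered index automatically and needs no INCLUDE.
```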
If you make that indexing change and still see the scan on the clustered index, this is presumably because SQL Server thinks it is quicker to read the rows from a source that already has them in the desired order and discard the non-matching ones than to sort them at run time. Because of the TOP 100, it only needs to find the first 100 matches and can then stop the scan.
Yours may well be a case where the date is largely correlated with id rather than being independent of it, so the optimizer underestimates the number of rows that will need to be read in id order before it finds 100 matching the predicate.
ID is an ascending identity column, and given that your DateTimeUTC predicate ends today, the matching rows are likely all at the end of the index rather than scattered evenly throughout it, so this is pretty much the worst case.
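A rough way to test that guess is to compare the ID range of the matching rows with the ID range of the whole table. This is a diagnostic sketch only, and the column aliases are made up.

```sql
-- Where do the matching rows sit in ID order?
SELECT MIN(ID)      AS FirstMatchingId,
       MAX(ID)      AS LastMatchingId,
       COUNT_BIG(*) AS MatchingRows
FROM [dbo].[TableName]
WHERE TypeId = 2
  AND DateTimeUTC > '2022-04-26T07:36:36'
  AND DateTimeUTC < '2022-08-04T07:02:40';

-- ID range of the whole table, for comparison.
SELECT MIN(ID) AS FirstId,
       MAX(ID) AS LastId
FROM [dbo].[TableName];
```

If FirstMatchingId is close to LastId, a scan in ID order has to read most of the table before it meets its first match.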
Possible query hints to look at are:
- DISABLE_OPTIMIZER_ROWGOAL, to remove the row goal effect
- FORCESEEK, to just tell it to use the seek anyway
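As a sketch, the two hints could be applied to the query from the question like this. Try them one at a time and compare the plans. The USE HINT syntax needs SQL Server 2016 SP1 or later.

```sql
-- Variant 1: optimize as if there were no row goal from TOP.
SELECT TOP (100) ID
FROM [dbo].[TableName] WITH (NOLOCK)
WHERE TypeId = 2
  AND DateTimeUTC > '2022-04-26T07:36:36'
  AND DateTimeUTC < '2022-08-04T07:02:40'
ORDER BY ID ASC
OPTION (USE HINT ('DISABLE_OPTIMIZER_ROWGOAL'));

-- Variant 2: require an index seek on this table.
SELECT TOP (100) ID
FROM [dbo].[TableName] WITH (NOLOCK, FORCESEEK)
WHERE TypeId = 2
  AND DateTimeUTC > '2022-04-26T07:36:36'
  AND DateTimeUTC < '2022-08-04T07:02:40'
ORDER BY ID ASC;
```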