OrderBy int slow for large table join

All we need is an easy explanation of the problem, so here it is.

I have roughly 1.5 million records. This query works fine for smaller data, but it gets a second slower for every few hundred thousand records. For 1.5 million, the query is noticeably struggling.

I mocked up a whole new Code First project to verify the problem is independent of my implementation, and to provide this example query:

    SELECT TOP 1 Widgets.*
    FROM [WidgetSandbox].[dbo].[Widgets] Widgets
    INNER JOIN [WidgetSandbox].[dbo].[Status] Statuses ON Widgets.StatusId = Statuses.Id
    INNER JOIN [WidgetSandbox].[dbo].[Colors] Colors ON Widgets.ColorCode = Colors.ColorCode
    INNER JOIN [WidgetSandbox].[dbo].[Sizes] Sizes ON Widgets.SizeId = Sizes.Id
    WHERE Statuses.Name = 'Available'
    AND Colors.Name = 'Red'
    ORDER BY Sizes.DiameterInches

DiameterInches is an int, as a metaphor for “PriorityLevel” in my actual code.

If I comment out ORDER BY Sizes.DiameterInches, it returns immediately, but if I want to find “the smallest available red widget”, then it crawls.

Is there a better way?

enter image description here

Execution plan XML: https://gist.github.com/RobertBaldini/57c8b61d135cc5c84c38b2da243611ad

DDL: https://gist.github.com/RobertBaldini/3740c7bb85eea47d7fe63cb8602ac2d6

Repo (data loader takes several minutes): https://github.com/RobertBaldini/WidgetSandbox

How to solve :

I know you bored from this bug, So we are here to help you! Take a deep breath and look at the explanation of your problem. We have many solutions to this problem, But we recommend you to use the first method because it is tested & true method that will 100% work for you.

Method 1

You should retype the various Name columns from nvarchar(max) to nvarchar([right size]). It’s unlikely that names will be up to 2GB in length, and making them max sized prevents them being used as a key in an index. A good general rule of thumb is to avoid large object data types wherever possible.

You’ll probably need to make that change to the EF code, but in T-SQL:

-- Guessing at 100 characters maximum
ALTER TABLE dbo.[Status]
ALTER COLUMN Name nvarchar(100) NOT NULL;

ALTER TABLE dbo.Colors
ALTER COLUMN Name nvarchar(100) NOT NULL;

ALTER COLUMN Name nvarchar(100) NOT NULL;

ALTER TABLE dbo.Widgets
ALTER COLUMN Name nvarchar(100) NOT NULL;

I have also changed the column definition to NOT NULL there.

Anyway, given indexes:

ON dbo.Sizes (DiameterInches);

CREATE NONCLUSTERED INDEX IX_dbo_Widgets__SizeId_ColorCode_StatusId__Name
ON dbo.Widgets (SizeId, ColorCode, StatusId)

…you should end up with a plan that avoids lookups and sorts, something like:

Expected plan shape

With the small number of rows in Status and Colors, it is probably not worth indexing their Name columns right now, but that could change over time. In any case, if the name columns should be unique, you should constrain them to be so using a unique constraint or unique nonclustered index, for example:

ON dbo.Colors (Name);

ON dbo.[Status] (Name);

ON dbo.Sizes (Name);

ON dbo.Widgets (Name);

Second approach

With the same indexes, if you are able to change the SQL, you could also separate out the Color and Status lookups:

    @StatusId integer,
    @ColorCode nvarchar(6);

SELECT @StatusId = ST.Id 
FROM dbo.[Status] AS ST 
WHERE ST.Name = N'Available';

SELECT @ColorCode = C.ColorCode 
FROM dbo.Colors AS C 
WHERE C.Name = N'Red';

FROM dbo.Widgets AS W
JOIN dbo.Sizes AS S
    ON W.SizeId = S.Id
    W.StatusId = @StatusId
    AND W.ColorCode = @ColorCode

This makes for a simpler set of operations that also happen to make life easier for the optimizer:

Plan with lookups separate

Note also that Unicode string literals should be prefixed with N for correct data typing.

Oh, and you should also patch your instance. It is currently SQL Server 2008 Service Pack 1 – Service Pack 4 has been available for some time now.

Method 2

The problem is that TopN Sort in the top left. It has to pull in all possible rows to then find the smallest one.

If you have an index on DiameterInches, it’s likely to start searching on the smallest ones first, and give you vastly improved performance.

Of course, then you’ll need an index on Widgets.SizeId, so that it can find those easily. And it’ll need to check each one to see if it’s Red and Available. You might find it’s a little over-normalised. If “Available” means StatusId = ‘A’, then use that, for example.

A suitable index might be:

CREATE INDEX ixDiameterInches 
ON dbo.Sizes (DiameterInches, Id)

But also consider having:

CREATE INDEX ixSizeStatusColor 
ON dbo.Widgets (SizeId, StatusId, ColorCode);

…so that you can quickly find those Available Red small things. You may find that (StatusId, ColorCode, SizeId) works better if you don’t have many Available Red things.

Note: Use and implement method 1 because this method fully tested our system.
Thank you 🙂

All methods was sourced from stackoverflow.com or stackexchange.com, is licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0

Leave a Reply