All we need is an easy explanation of the problem, so here it is.
I have a table containing 10 years worth of ‘package scans’. Somebody scans a package and it records the date and username. Let’s pretend for now that retaining 10 years of data actually has a purpose.
I have a page to show a summary for the past week, so clearly I only want to read 1 week’s worth of data.
Here’s the query, to be run in SSMS twice, once with a hardcoded recent date and again with an old date in 2013. It’s originally a parameterized query, but in SSMS I’m replacing
@p0 with the date:
SELECT [t0].[VerifyDate], [t0].[PackageId], [t0].[Username] FROM [dbo].[PackageVerification] AS [t0] INNER JOIN [dbo].[Package] AS [t1] ON [t1].[PackageId] = [t0].[PackageId] WHERE ([t1].[PackageStatus] <> 99) AND ([t0].[VerifyDate] > @p0) ORDER BY [t0].[VerifyDate] DESC
Before I execute it, I’d like to introduce my date index.
Now my date index is not on my
PackageVerification table, but instead is on a ‘helper view’ which performs the same join seen above. The query above is able to magically use this indexed view because I have SCHEMABINDING enabled.
CREATE NONCLUSTERED INDEX [IX_Helper_PackageVerification_USER_SCAN_HISTORY] ON [dbo].[Helper_PackageVerification] ( [VerifyDate] DESC, [PackageStatus] ASC ) INCLUDE ( [VerifyDateDate], [Username] )
When I run the query in SSMS with an old and new date it uses scan or seek as expected. The threshold seems to be somewhere around 2015. So anything remotely recent should definitely be using a seek. Here’s the results of that:
When I run it as a parameterized query from my application I always get a full scan, which for some reason uses a parallelized plan.
At least it’s using my helper index.
I’m actually not sure why I don’t get parameter sniffing for this. I always pass a very recent date so I would have thought it may have preferred a scan but I’m fine with it choosing the above plan given the circumstances. There’s a million+ rows and it takes about 150ms.
Incidentally this is a SQL Azure database with 2vCores. Parameter sniffing is enabled and parameterization is set to simple.
If I change the query and run my application using
OPTION (RECOMPILE) I do get the desired SEEK and a very good performance of just a few ms. The recompile time seems to be negligible and frankly this is perfectly fine performance I can use.
When I look in query store I can verify OPTION RECOMPILE uses the seek for a recent date, and scan for an old date! Awesome.
However, and I’ve never tried this before – I thought how about improving it even further with
OPTION (OPTIMIZE FOR @p0 = '4/1/2021').
I was expecting this to also use the seek, but without the need for recompilation every time. I’d just periodically change the date passed to OPTIMIZE FOR – maybe to the beginning of the previous month.
However, this is the query in the query store.
And it goes and does a full scan of all 1+ million rows when setting the date parameter to 4/7/21!
So now I’m lost. I’ve tried to read about everything I can on the subject but haven’t come across this issue. RECOMPILE works, but OPTIMIZE FOR doesn’t seem to do anything when I’m expecting it to effectively simulate running the query in SSMS with hardcoded values.
This first plan is the only unexpected plan – it’s a scan and I want a seek.
OPTIMIZE FOR @p1 = ‘2021/4/1’ – https://www.brentozar.com/pastetheplan/?id=H1JB43AUu
OPTIMIZE FOR BOTH PARAMS – https://www.brentozar.com/pastetheplan/?id=rkV9U3AUu
OPTION RECOMPILE – https://www.brentozar.com/pastetheplan/?id=SJ5cS3CUd
These are to prove that the optimizer knows that recent dates should be a seek!
HARDCODED 2013 – SCAN – https://www.brentozar.com/pastetheplan/?id=BkeA42RLu
HARDCODED 2015 – SEEK – https://www.brentozar.com/pastetheplan/?id=S1c8r3R8O
I’m starting to wonder if this version doesn’t support OPTIMIZE FOR, even though I can’t find anything saying it wouldn’t
Edit: (After Paul’s answer)
I tried a few additional things. First here is the VIEW definition I didn’t include before. This does a JOIN and since it uses SCHEMABINDING the optimizer is able to substitute for it:
CREATE VIEW [dbo].[Helper_PackageVerification] WITH SCHEMABINDING AS
— Package Verification columns
[t0].Verfied, — spelling mistake from long ago!
[t0].VerifyDate, — this is non nullable in [t0] btw
— Package columns
FROM [dbo].[PackageVerification] AS [t0] INNER JOIN [dbo].[Package] AS [t1] ON [t1].[PackageId] = [t0].[PackageId]
WHERE (Verfied = 1 AND VerifyDate IS NOT NULL AND PackageStatus <> 99)
The CLUSTERED index is on
PackageVerificationId and the main NON CLUSTERED index is shown above. I actually created half a dozen convering indexes to see which it would pick.
PackageStatus <> 99. It was originally a parameter.
I tried adding NOT NULL to the filter on the view to see what would happen. That did actually give me a SEEK, but a useless one since the SEEK predicate was actually on
VerifyDate IS NOT NULL.
You can’t add a filtered index to an indexed view, so even though the view filters out NOT NULL dates it probably can’t be matched against. So that could be the ultimate reason I was unable to get my date to be used for the SEEK predicate?
- I didn’t in this case try using the helper index directly in the query, but I’d pretty much expect that to work with NOEXPAND as I’m doing that elsewhere.
How to solve :
I know you bored from this bug, So we are here to help you! Take a deep breath and look at the explanation of your problem. We have many solutions to this problem, But we recommend you to use the first method because it is tested & true method that will 100% work for you.
OPTIMIZE FOR is not the same as
OPTION (RECOMPILE). The former uses supplied parameter values in cardinality estimation for a plan that might be reused with other parameter values. The recompile option embeds the runtime parameter value, and produces a disposable plan that will never be reused.
OPTIMIZE FOR plan therefore needs to ensure correct operation for all possible values. The recompile plan can use additional optimizations that are only valid for the present value. It can also use optimizations that only work with literal values e.g. pushing a filter past a window function.
This matters in your case because when the
OPTIMIZE FOR plan matches the indexed view, it adds additional
IS NOT NULL residual predicates on VerifyDate and PackageStatus:
The recompile plan can remove this logic because the supplied values are known to be not null. The presence of these extra implied predicates are enough to prevent index matching for a seek. It’s usually best to ensure source columns are constrained to be not null, or explicitly rejected in the indexed view definition to minimize this sort of thing.
Now, the optimizer has a wide variety of plan choices for your queries. One indication of that is the number of statistics objects loaded – 17. Small differences in the path taken through the optimizer can produce different outcomes.
Automatic indexed view matching is a neat feature, technically, but it does have limitations. SQL Server needs to add some things and apply particular rewrites to achieve matching, which can have unexpected side-effects (note the reversed @p1 predicate above). The post-match plan is also not always completely cleaned up to match what a query written against the view would produce. These are not bugs, just implementation details.
I normally advise people to write queries against the view directly and specify a
NOEXPAND hint, where this is practical. You might well find that testing your queries written that way would produce the outcomes you are looking for.
Related articles I have written:
- Parameter Sniffing, Embedding, and the RECOMPILE Options
- Indexed Views and Statistics
- Another Reason to Use NOEXPAND hints in Enterprise Edition
Note: Use and implement method 1 because this method fully tested our system.
Thank you 🙂