All we need is an easy explanation of the problem, so here it is.
I’m just looking to understand why this is happening, and my Google searches were failing me. We are on SQL Server 2016 SP1.
This is the situation: Vendor table that manages IDs by keeping track of current values for each table. A function can be called to return a block of IDs if you are doing an insert.
So we set up a temp table by selecting from the real table using
select into (we are cloning a set of data to be reinserted with a different property set).
Then we call the function and get new ids for the number of records (it just returns the max ID, so we do some math to get the next id).
Then we update the table as such:
update #temp set @nextId = Id = @nextId + 1
with the expectation that it will increment by one for each record and set the ids.
Instead, the same ID was set for every 4 records, then it would increment and the next 4 get the next id, etc. Why every 4 records? What went wrong?
Even more fun, if we put a clustered index on the table, everything works correctly.
I’m sure it has to do with the table being a heap…but not sure why.
How to solve :
I know you bored from this bug, So we are here to help you! Take a deep breath and look at the explanation of your problem. We have many solutions to this problem, But we recommend you to use the first method because it is tested & true method that will 100% work for you.
The documentation for the
UPDATE statement says (emphasis added):
Variable names can be used in UPDATE statements to show the old and new values affected, but this should be used only when the UPDATE statement affects a single record.
Some people have tried to make this "quirky update" technique work reliably for multiple rows by applying ever-greater constraints on its use. The fact remains that this relies on observed effects and undocumented behaviour, so you should not expect it to work in general, or to keep ‘working’ in future.
I can’t say exactly what went ‘wrong’ in your case, without a repro script or an execution plan, but ultimately it doesn’t really matter. If forced to guess, I would say your update ran at DOP 4 and four threads read the same variable value concurrently.
As mentioned by @PaulWhite in his answer, the "quirky update" method is undocumented and can be unreliable. It requires careful coding, and checking the execution plan to ensure everything works correctly.
And you can quite clearly see, from the execution plan you have now provided, that he was right: there are four threads:
This method can be exceedingly fiddly to get working correctly. Slight changes in execution plans can throw a complete wrench in it.
Some of the many undocumented rules mentioned by @JeffModen in a well-researched article
MAXDOP 1hint to turn off parallelism
- A clustered index
- Must use an anchor column
- No joins
This is why many advise against it, and given that so many methods to achieve the same thing are now available, it does seem pointless.
In your particular situation, an
IDENTITY column would probably be the simplest. But here is the
ROW_NUMBER method that also ensures correct results. You can put it into a CTE, and update the CTE directly.
I note that your IDs are not ordered, so I have used
ORDER BY (SELECT 1), you can use a different ordering if you want.
WITH cte AS ( SELECT *, rn = ROW_NUMBER() OVER (ORDER BY (SELECT 1)) FROM #temp t ) UPDATE cte SET Id = rn;
Note: Use and implement method 1 because this method fully tested our system.
Thank you 🙂