I’ve got a SQL table that needs to be updated daily.
There may or may not be queries against that table while the update is happening. It’s around 500,000 rows.
We have an issue where there are locking conflicts when the job that updates the table runs at the same time as a query against it.
So I have rewritten the process to update the table as follows:
ALTER procedure [dbo].[Table_Generate] as

declare @d datetime = getdate(), @c as int

--Check temp tables
IF OBJECT_ID('tempdb..#final') IS NOT NULL
    DROP TABLE #final
IF OBJECT_ID('tempdb..#base') IS NOT NULL
    DROP TABLE #base

--Get source data from linked server
select
    ID,
    Reference,
    StartDate,
    EndDate,
    Description,
    SomeCode
into #base
from [LinkedServer].[Database].dbo.[A_View]

--Generate row_hash
select
    ID,
    Reference,
    StartDate,
    EndDate,
    Description,
    SomeCode,
    hashbytes('SHA2_256', (
        select
            ID,
            Reference,
            StartDate,
            EndDate,
            Description,
            SomeCode
        from #base sub
        where sub.ID = main.ID
        for xml raw)) as row_hash
into #final
from #base main

select @c = count(*) from #final

if @c > 0
begin
    merge [The_Table_Staging] as target
    using #final as source
        on source.ID = target.ID
    --New rows
    when not matched by target then
        insert (
            RunDate,
            ID,
            Reference,
            StartDate,
            EndDate,
            Description,
            SomeCode,
            Row_Hash
        ) values (
            @d,
            source.ID,
            source.Reference,
            source.StartDate,
            source.EndDate,
            source.Description,
            source.SomeCode,
            source.row_hash)
    --Existing changed rows
    when matched and source.row_hash != target.row_hash then
        update set
            target.RunDate = @d,
            target.Reference = source.Reference,
            target.StartDate = source.StartDate,
            target.EndDate = source.EndDate,
            target.Description = source.Description,
            target.SomeCode = source.SomeCode,
            target.row_hash = source.row_hash
    --Deleted rows
    when not matched by source then delete;

    --Existing unchanged rows
    update [The_Table_Staging] set RunDate = @d where RunDate != @d

    --Swap staging and live by renaming the tables, then commit
    begin transaction
        exec sp_rename 'The_Table_Live', 'The_Table_Staging_T'
        exec sp_rename 'The_Table_Staging', 'The_Table_Live'
        exec sp_rename 'The_Table_Staging_T', 'The_Table_Staging'
    commit transaction
end
The idea is to reduce unnecessary row updates, and also to minimise the locking of the live table.
I don’t really like doing a table rename, but doing an update/insert takes 5-10 seconds, whereas the table rename is virtually instantaneous.
So my question is: is this approach OK, and/or could I improve it?
Thanks!
Edit to respond to JD:
Hi JD,
Please, no need to apologise – I’m here for constructive criticism.
- I didn’t know MERGE was problematic. I’ve never had issues with it myself, but thanks - I may rewrite this part into separate INSERT/UPDATE/DELETE statements.
- I normally agree. The reason I did this is that if I did a TRUNCATE/INSERT from staging at that point, it takes 6-10 seconds, whereas the sp_rename takes under a second - so less time locking the table.
- This doesn’t affect the table locking, as the data is pulled into the staging table first. I have no choice but to use a linked server or SSIS, and in this case I prefer the linked server to keep all the SQL in one place.
- I always use XML instead of CONCAT because otherwise ‘a’, ‘bc’ would hash the same as ‘ab’, ‘c’, which is not correct (see the sketch after this list).
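To illustrate that boundary problem, here is a minimal sketch using literal values only (no real column names): without a separator the concatenated input is identical for both rows, so HASHBYTES returns the same hash; with a separator the inputs differ, and so do the hashes.

--Both concatenations produce the string 'abc', so the two hashes are identical
select hashbytes('SHA2_256', 'a' + 'bc') as hash_1,
       hashbytes('SHA2_256', 'ab' + 'c') as hash_2
--With a separator between the columns the strings differ, so the hashes differ
select hashbytes('SHA2_256', 'a' + '||' + 'bc') as hash_1,
       hashbytes('SHA2_256', 'ab' + '||' + 'c') as hash_2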
All of the processing up to the populating of the live table from staging is fine – I just want to minimise the amount of time the final live table is locked for.
Answer:
Unfortunately I see a lot of sins in your code at a quick glance:
- There are so many known bugs with the MERGE statement that it really should be avoided altogether in production code.
- MERGE is also known to be less performant than writing the individual INSERT, UPDATE, and DELETE statements, which may be the reason for some of your blocking issues (a sketch of that rewrite follows after this list).
- While tempting, using the sp_rename function to minimize blocking issues can actually result in worse blocking issues, as discussed in Kendra Little’s Why You Should Switch in Staging Tables Instead of Renaming Them. (If I recall correctly, this discusses using partition switching as a better solution; a sketch of that pattern also follows below.)
- Linked Servers are also known to be performance bottlenecks at times (fixed cardinality estimates, and all the data coming across the network before it is processed). If you were previously relying on one inside the transaction in your code, it’s possible you can just keep that part outside of the transaction to minimize the lock time of your local tables.
- Also, I do like the use of the HASHBYTES() function to generate a row hash - that’s something I’ve used in the past. But you may find it more performant to call it on the row itself, using CONCAT() across every column with a safe column separator (e.g. a double pipe ||) as the parameter, rather than wrapping it around a subquery that leverages XML (a CONCAT_WS-based sketch follows below).
- That being said, as noted by Erik Darling, HASHBYTES() itself is sometimes susceptible to performance issues at scale. If it is your bottleneck here, you can try materializing the results of the function in a computed column or indexed view (it is deterministic), or use an alternative method, such as CLR, as mentioned in the linked post.
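If you do decide to break up the MERGE, a rough sketch of the equivalent separate statements (reusing the @d variable, #final temp table, and column names from the procedure in the question) could look like this:

--Existing changed rows
update target set
    RunDate = @d,
    Reference = source.Reference,
    StartDate = source.StartDate,
    EndDate = source.EndDate,
    Description = source.Description,
    SomeCode = source.SomeCode,
    row_hash = source.row_hash
from [The_Table_Staging] target
join #final source on source.ID = target.ID
where source.row_hash != target.row_hash

--New rows
insert [The_Table_Staging] (RunDate, ID, Reference, StartDate, EndDate, Description, SomeCode, Row_Hash)
select @d, source.ID, source.Reference, source.StartDate, source.EndDate, source.Description, source.SomeCode, source.row_hash
from #final source
where not exists (select 1 from [The_Table_Staging] target where target.ID = source.ID)

--Deleted rows
delete target
from [The_Table_Staging] target
where not exists (select 1 from #final source where source.ID = target.ID)

Each statement can then be indexed, tuned, and troubleshot independently, which is a large part of the appeal over MERGE.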
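For the final swap, here is a rough sketch of the switch-in pattern the linked article describes, instead of sp_rename. It assumes a third, normally empty table dbo.The_Table_Old with exactly the same columns and indexes as the other two; that table name, and the SQL Server 2014+ WAIT_AT_LOW_PRIORITY option, are my additions rather than something in your posted code.

--Swap staging into live with metadata-only switches instead of renames
begin transaction
    truncate table dbo.The_Table_Old

    --Switch the current live rows out; after one minute, kill blocking readers
    --(use ABORT_AFTER_WAIT = SELF to give up instead of killing them)
    alter table dbo.The_Table_Live
        switch to dbo.The_Table_Old
        with (wait_at_low_priority (max_duration = 1 minutes, abort_after_wait = blockers));

    --Switch the freshly built staging rows in (live is empty at this point)
    alter table dbo.The_Table_Staging
        switch to dbo.The_Table_Live;

    --Put yesterday's live rows back into staging, as the rename version effectively did
    alter table dbo.The_Table_Old
        switch to dbo.The_Table_Staging;
commit transaction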
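Finally, for the CONCAT() suggestion, a sketch of the row_hash generation from the question rewritten with CONCAT_WS() (SQL Server 2017+; on older versions you would use CONCAT() and place the separator between each pair of columns yourself):

--Generate row_hash from the row's own columns rather than a correlated FOR XML subquery
select
    ID,
    Reference,
    StartDate,
    EndDate,
    Description,
    SomeCode,
    hashbytes('SHA2_256', concat_ws('||',
        ID, Reference, StartDate, EndDate, Description, SomeCode)) as row_hash
into #final
from #base
--Note: CONCAT_WS skips NULL arguments (no separator is added for them), so consider
--wrapping nullable columns in ISNULL(..., '') to keep every column position in the string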