All we need is an easy explanation of the problem, so here it is.
I have an "Basic" pricing tier Azure SQL database with a table of 7 columns, an ID column of int that is the clustered index and primary key, one datetime2(0) column, 3 varchar(100) columns and 2 varchar(MAX) columns, all nullable.
The table has no triggers, constraints or foreign keys.
Now I’m inserting a large amount of test data, I’m doing an
INSERT INTO table_name (<all columns, except the ID one>) values (<just some values, the ones for varchar(MAX) being 221 characters long>)` GO 680000
However the query has been running for 5 hours and only 290000 rows have been inserted.
I’m trying to find out why.
How to solve :
I know you bored from this bug, So we are here to help you! Take a deep breath and look at the explanation of your problem. We have many solutions to this problem, But we recommend you to use the first method because it is tested & true method that will 100% work for you.
You’ll need to look at the waits for the session doing the insert to figure out what the bottleneck is. Given that you are on a "Basic" tier, your insert is probably being artificially throttled based on the service tier.
If you run a query like this…
SELECT * FROM sys.dm_exec_session_wait_stats WHERE session_id = <session doing the insert> ORDER BY wait_time_ms DESC
…I suspect you’ll see that the top wait is probably something like
HADR_THROTTLE_LOG_RATE_GOVERNOR. These wait types are caused specifically due to the artificial limits put on the rate at which you can write to the transaction log in Azure SQL DB, and is a common bottleneck on large inserts when using the Basic Tier. The Basic Tier is extremely limited on available system resources. Note: It’s possible to hit the limit on log rate without hitting the service tier’s DTU limit.
One solution is to simply use a higher service tier, which will allow you to have more DTUs (and thus more overall system resources) to use for your large insert. After your load is complete, you can then switch back to a lower service tier. I’ve written more about DTUs, and attempted to correlate DTUs back to traditional on-prem hardware you may be more familiar with –you can read that here.
There may be more options for improving throughput on a lower service tier, but to do so, you’ll need to look in detail at exactly what you’re doing, and what your resource bottleneck is.
Single row inserts (especially followed by an implicit commit) are going to generate a LOT more transaction log data than bulk inserts.
Using a transaction log backup as a rough and ready example of how much transaction log data gets written:
CREATE TABLE new_employees ( id_num int IDENTITY(1,1), fname varchar (20), minit char(1), lname varchar(30), lob_col varchar(max) ); set nocount on BACKUP LOG [demo_db] TO DISK = N'V:\SQL\Backups\demo_db_log_clear.bak' WITH NOFORMAT, NOINIT, NAME = N'demo_db-Full Database Backup', SKIP, NOREWIND, NOUNLOAD, STATS = 10 GO select getdate() go insert into new_employees ( fname ,minit ,lname ,lob_col ) values ('Andrew' ,'J' ,'Sayer' ,replicate('X',221) ); go 100000 select getdate() BACKUP LOG [demo_db] TO DISK = N'V:\SQL\Backups\demo_db_log_single.bak' WITH NOFORMAT, NOINIT, NAME = N'demo_db-Full Database Backup', SKIP, NOREWIND, NOUNLOAD, STATS = 10 GO select getdate() go insert into new_employees ( fname ,minit ,lname ,lob_col ) select top 100000 'Andrew' ,'J' ,'Sayer' ,replicate('X',221) FROM sys.all_columns ac cross join sys.all_columns ac2 go select getdate() go BACKUP LOG [demo_db] TO DISK = N'V:\SQL\Backups\demo_db_log_bulk.bak' WITH NOFORMAT, NOINIT, NAME = N'demo_db-Full Database Backup', SKIP, NOREWIND, NOUNLOAD, STATS = 10 GO
(I’m using 100,000 rows as I got impatient waiting for the single value inserts to finish with your counts).
The results on my home machine:
Single row insert Time taken: 2021-08-16 20:49:41.510 to 2021-08-16 20:50:04.477 = 23 seconds Transaction log backup size: 50010 pages Bulk row insert Time taken: 2021-08-16 20:50:04.787 to 2021-08-16 20:50:05.177 = 0.4 seconds Transaction log backup size: 4601 pages
So it’s roughly 50 times faster and generates a tenth of the transaction log data.
Only thing to make sure is that the row generation source can generate enough rows, I just cross joined
sys.all_columns with itself which produces plenty in my pretty empty database.
Note: Use and implement method 1 because this method fully tested our system.
Thank you 🙂