All we need is an easy explanation of the problem, so here it is.
We recently migrated from LogShipping
standby/read-only setup to Multi Subnet AG setup with readable secondaries.
Generally on old setup we have select queries running for longer duration as the database in question is over 20 TB and has mix of read write workload on primary.
After moving to new setup of AG we have started seeing blocking which i am not able to understand. Why select queries on secondary are blocking other select queries in my readable secondary replica instance, even when the database being queried has
Below is what i have captured
Lead blocker is some long running
SELECTquery does not show any
specific waittype as particular, lets say SPID
SPID 129blocks a session ID
45( i am sure this is not a user
id) for almost 6 hours which is dependent on spid129 and wait type is
Here comes the problem when this
SPID 45just blocks all other select
queries now in that 6 hour duration.
I am not able to understand what is happening. Can someone help me troubleshoot or look in correct direction?
How to solve :
I know you bored from this bug, So we are here to help you! Take a deep breath and look at the explanation of your problem. We have many solutions to this problem, But we recommend you to use the first method because it is tested & true method that will 100% work for you.
Queries against read-only secondary replicas implicitly run in snapshot isolation regardless of the session isolation level or RCSI setting. This avoids blocking due to DML changes. Read-only queries still acquire schema stability locks, which will block DDL operations by redo threads and visa-versa.
SPID 129 blocks a session ID 45 ( i am sure this is not a user id) for
almost 6 hours which is dependent on spid129 and wait type is
In your case, it seems a redo thread is waiting for a schema modification lock but is blocked by a very long-running query/transaction schema stability lock. Other queries are in turn blocked by the redo thread.
Take a look at the
SELECT query execution plan and duration of the transaction for remediation.
I found the same issue that was coming from an ALTER INDEX REBUILD activity.
I solved moving this activity in another time window.
Note: Use and implement method 1 because this method fully tested our system.
Thank you 🙂