Blocking on readable secondary replica

All we need is an easy explanation of the problem, so here it is.

We recently migrated from LogShipping standby/read-only setup to Multi Subnet AG setup with readable secondaries.

Generally on old setup we have select queries running for longer duration as the database in question is over 20 TB and has mix of read write workload on primary.

After moving to new setup of AG we have started seeing blocking which i am not able to understand. Why select queries on secondary are blocking other select queries in my readable secondary replica instance, even when the database being queried has RCSI enabled?

Below is what i have captured

  • Lead blocker is some long running SELECT query does not show any
    specific waittype as particular, lets say SPID 129

  • SPID 129 blocks a session ID 45 ( i am sure this is not a user
    id) for almost 6 hours which is dependent on spid129 and wait type is
    LCK_M_SCH_M

  • Here comes the problem when this SPID 45 just blocks all other select
    queries now in that 6 hour duration.

I am not able to understand what is happening. Can someone help me troubleshoot or look in correct direction?

How to solve :

I know you bored from this bug, So we are here to help you! Take a deep breath and look at the explanation of your problem. We have many solutions to this problem, But we recommend you to use the first method because it is tested & true method that will 100% work for you.

Method 1

Queries against read-only secondary replicas implicitly run in snapshot isolation regardless of the session isolation level or RCSI setting. This avoids blocking due to DML changes. Read-only queries still acquire schema stability locks, which will block DDL operations by redo threads and visa-versa.

SPID 129 blocks a session ID 45 ( i am sure this is not a user id) for
almost 6 hours which is dependent on spid129 and wait type is
LCK_M_SCH_M

In your case, it seems a redo thread is waiting for a schema modification lock but is blocked by a very long-running query/transaction schema stability lock. Other queries are in turn blocked by the redo thread.

Take a look at the SELECT query execution plan and duration of the transaction for remediation.

Method 2

I found the same issue that was coming from an ALTER INDEX REBUILD activity.
I solved moving this activity in another time window.

Note: Use and implement method 1 because this method fully tested our system.
Thank you 🙂

All methods was sourced from stackoverflow.com or stackexchange.com, is licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0

Leave a Reply