How to prevent primary node's SQL jobs from running when the primary node goes into resolving state?

I have 2 nodes in an AG. Non-readable secondary, sync commit mode.

There are SQL Agent jobs set up on both machines. The schedules are turned off on the secondary so those jobs won't run. If there is a failover, we manually enable the schedules for the jobs on the secondary.

The schedules run various jobs almost every minute.

When the primary goes down (technically because of a network issue) but the machine is still running, how do I immediately prevent the scheduled SQL Agent jobs on the primary from running?

How to solve:

Method 1

Wrap the T-SQL of your jobs in this code:

if  ((select primary_replica from sys.dm_hadr_availability_group_states) = @@SERVERNAME
    and (select primary_recovery_health_desc from sys.dm_hadr_availability_group_states)  = 'ONLINE'
    and (select synchronization_health_desc from sys.dm_hadr_availability_group_states)  = 'HEALTHY')
begin
    <T-SQL here>
end

This way the T-SQL only runs if this SQL Server instance is the primary replica, the primary is online, and synchronization is healthy.

See the documentation for sys.dm_hadr_availability_group_states: https://docs.microsoft.com/en-us/sql/relational-databases/system-dynamic-management-views/sys-dm-hadr-availability-group-states-transact-sql?view=sql-server-ver16
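
The scalar subqueries above assume the instance hosts a single availability group; with more than one, each subquery returns several rows and the IF fails with a "subquery returned more than 1 value" error. Below is a minimal sketch of the same check scoped to one group by name. The group name MyAG is a hypothetical placeholder, and the PRINT stands in for the job's real T-SQL.

```sql
-- Assumption: N'MyAG' is a hypothetical availability group name; replace it with yours.
IF EXISTS (
    SELECT 1
    FROM sys.availability_groups AS ag
    JOIN sys.dm_hadr_availability_group_states AS ags
        ON ags.group_id = ag.group_id
    WHERE ag.name = N'MyAG'
      AND ags.primary_replica = @@SERVERNAME              -- this instance is the primary
      AND ags.primary_recovery_health_desc = 'ONLINE'     -- primary is online
      AND ags.synchronization_health_desc = 'HEALTHY'     -- synchronization is healthy
)
BEGIN
    -- Placeholder for the job's T-SQL.
    PRINT 'Primary, online and healthy: run the job T-SQL here.';
END
```

Using EXISTS also means that a missing row or a NULL column simply skips the block instead of raising an error.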

Method 2

You can use the built-in system function inside the job step to check whether the replica is the primary.

sys.fn_hadr_is_primary_replica('<db_name>')

If 1 is returned, do something, else exit the job.

IF COALESCE(sys.fn_hadr_is_primary_replica('<db_name>'),1) = 1
BEGIN
    Do something…
END

COALESCE accounts for the function returning NULL, which will be the case if the database passed to the function is not part of an availability group.
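
If wrapping the whole job body in a BEGIN ... END block is awkward, the same test can be inverted into an early exit at the top of the step. This is only a sketch: ag_database is a placeholder database name, the PRINT messages are illustrative, and it assumes the step contains no GO separators, because RETURN only ends the current batch.

```sql
-- Assumption: N'ag_database' is a placeholder for a database in the availability group.
IF COALESCE(sys.fn_hadr_is_primary_replica(N'ag_database'), 1) <> 1
BEGIN
    PRINT 'Not the primary replica; skipping.';
    RETURN;  -- ends the current batch, so nothing below runs
END

-- Job T-SQL goes here; it is reached only when the check above passes.
PRINT 'Running on the primary replica.';
```

As in the wrapped version, the COALESCE fallback of 1 means a database that is not in any availability group is treated as runnable.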

Method 3

We keep our job definitions the same across all the nodes in the AG (primary and secondaries), including their schedules. The only thing we add to each job is one step, "Test write status", before the actual job step.

The "Test write status" step does this:

IF (DATABASEPROPERTYEX('ag_database', 'Updateability') <> 'READ_WRITE')
BEGIN
    PRINT 'EXITING GRACEFULLY';
    THROW 51000, 'This is not a writeable replica', 1;
END
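
One caveat worth guarding against: DATABASEPROPERTYEX is documented to return NULL on error or invalid input, and in T-SQL a comparison against NULL is neither true nor false, so the IF above would be skipped and the job would carry on. Below is a minimal sketch of a NULL-safe variant that reuses the placeholder name ag_database. The fallback string 'UNKNOWN' is an arbitrary choice, and whether the property actually comes back NULL on a given secondary is something to verify in your own environment.

```sql
-- NULL-safe variant of the "Test write status" step.
-- DATABASEPROPERTYEX returns sql_variant, so convert it before comparing.
DECLARE @updateability nvarchar(128) =
    COALESCE(CONVERT(nvarchar(128), DATABASEPROPERTYEX(N'ag_database', 'Updateability')), N'UNKNOWN');

IF (@updateability <> N'READ_WRITE')
BEGIN
    PRINT 'EXITING GRACEFULLY';
    THROW 51000, 'This is not a writeable replica', 1;
END
```

Because THROW fails the step, the step's on-failure action would presumably be set to quit the job, reporting success if failure alerts from secondaries are not wanted.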

All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
