Why did taking SQL Agent offline cause WSFC to fail over to the passive node?

I have a two-node Windows Server Failover Cluster (WSFC) with a quorum disk.
SQL Agent is NOT a resource of the cluster.

I needed to enable Service Broker on the server; to do that, I needed to take SQL Agent offline, run a T-SQL statement, and then simply bring it back online.

However, as soon as I stopped SQL Agent using SSMS, the cluster failed over to the passive node.
I thought that, because SQL Agent is not listed as a resource in Failover Cluster Manager, I could stop it on the active node, make the change, and bring it back online.

The questions are:

  1. Why did stopping a service that is not part of the cluster cause the cluster to fail over?

  2. What would be the proper way to stop SQL Agent in my case, for maintenance for example?

I repeated the same actions on my test cluster and everything worked fine; the cluster didn’t fail over. It has the same cluster structure, but without a quorum disk.

UPDATE:
When I right-click the cluster name itself, I can see SQL Agent under property type.
Does that mean all those resources are in the cluster even though they are not visible under "Roles"?

How to solve:

Method 1

In Failover Cluster Manager, select the role for the Failover Cluster Instance (FCI), then select the "Resources" tab at the bottom. You’ll see that the role is actually built with both the SQL Server service and the SQL Server Agent service as resources under that role.

When you stopped the Agent service outside of the cluster's control, the Windows cluster detected that it had stopped "unexpectedly" and failed the role over to the other node.

Instead of stopping the service from SSMS or the Services control panel, you’ll want to right-click the "SQL Server Agent" resource in Failover Cluster Manager and take the resource offline there. That tells the WSFC your intent, so it will not fail over. Instead, it will show the FCI role as being partially online. To restart SQL Agent, right-click the resource again and bring it online.
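If you would rather script this than click through Failover Cluster Manager, the same offline/online step can be sketched with the `Stop-ClusterResource` and `Start-ClusterResource` cmdlets from the FailoverClusters PowerShell module. The Python wrapper below is only a rough sketch: the resource name `SQL Server Agent` is an assumption (a named instance usually shows up with the instance name in parentheses, so check the "Resources" tab first), and the helper names `cluster_resource_command` and `set_agent_state` are made up for illustration.

```python
import subprocess

# Assumed resource name; verify it on the FCI role's "Resources" tab.
AGENT_RESOURCE = "SQL Server Agent"

# Cluster-aware cmdlets from the FailoverClusters PowerShell module.
CMDLETS = {"offline": "Stop-ClusterResource", "online": "Start-ClusterResource"}


def cluster_resource_command(action, resource=AGENT_RESOURCE):
    """Build the PowerShell call that takes a cluster resource offline or online."""
    if action not in CMDLETS:
        raise ValueError("action must be 'offline' or 'online'")
    # Escape single quotes for a PowerShell single-quoted string.
    quoted = resource.replace("'", "''")
    return [
        "powershell.exe",
        "-NoProfile",
        "-Command",
        f"{CMDLETS[action]} -Name '{quoted}'",
    ]


def set_agent_state(action, resource=AGENT_RESOURCE, dry_run=True):
    """Return the command; run it only when dry_run is False (on a cluster node)."""
    cmd = cluster_resource_command(action, resource)
    if not dry_run:
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    # Dry run: print what would be executed around the maintenance step.
    print(set_agent_state("offline"))
    print(set_agent_state("online"))
```

With `dry_run=True` (the default here) nothing is executed, so you can check the generated commands before running them on the active node with `dry_run=False`.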

All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.
