Linked server connections to a multi-subnet failover cluster


We recently migrated our SQL Server distribution database to a multi-subnet failover cluster, and have found that occasionally, when adding articles or viewing replication status from the publisher, we get errors like:

Cannot initialize the data source object of OLE DB provider “SQLNCLI11” for linked server “repl_distributor”. (.Net SqlClient Data Provider)

This appears to be caused by the linked server connecting to a multi-subnet failover cluster (as described here).

Other than following the solution of setting up an ODBC connection on each node of our publisher (also a multi-subnet failover cluster), is there anything else we can try to fix linked servers connecting to a multi-subnet FCI?

How to solve:


Method 1

The solution you linked to overcomes the problem of connecting to multi-subnet failover clusters by using the MultiSubnetFailover=True connection string setting.

That setting deals with the problem by attempting connections to all of the IP addresses registered for the network name in parallel, rather than one after the other (the default behavior). This increases the chance that a connection succeeds before the connection timeout is reached.

If you don’t want to use that option (you mentioned not wanting the overhead of configuring ODBC connections on all your cluster servers), another solution would be to increase the (linked) server-level connection timeout setting to a value that’s high enough to deal with the serial connection attempts.

The default connection timeout on linked servers is 10 seconds (represented by “0” on the settings screen). You can increase the timeout to 20 seconds by doing the following:

EXEC master.dbo.sp_serveroption 
    @server=N'YourLinkedServerName', 
    @optname=N'connect timeout', 
    @optvalue=N'20'

If 20 seconds is still too short, set a higher value. For example, to use 60 seconds on the repl_distributor linked server:

EXEC master.dbo.sp_serveroption 
    @server=N'repl_distributor', 
    @optname=N'connect timeout', 
    @optvalue=N'60'
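To confirm the new value took effect, the linked server definition can be read back from the sys.servers catalog view. A minimal sketch, assuming the linked server is the repl_distributor one from the question; a connect_timeout of 0 means the instance-wide default applies.

```sql
-- Read back the per-linked-server timeouts after calling sp_serveroption.
-- A value of 0 means "use the sp_configure default".
SELECT name,
       provider,
       data_source,
       connect_timeout,  -- seconds; 0 = instance-wide default
       query_timeout     -- seconds; 0 = instance-wide default
FROM sys.servers
WHERE name = N'repl_distributor';  -- linked server name from the question
```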

The docs on the default value for “connect timeout” are a little confusing.

The sp_serveroption documentation says:

Time-out value in seconds for connecting to a linked server.

If 0, use the sp_configure default.

sp_configure doesn’t have an option called “connect timeout”, but it does have one called “remote login timeout (s)” (see Server Configuration Options). Its description sounds like the thing we’re looking for:

The remote login timeout option specifies the number of seconds to wait before returning from a failed attempt to log in to a remote server.

The default value for this option is 10 seconds.
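The instance-wide value that a connect timeout of 0 falls back to can be inspected in sys.configurations. A sketch, assuming the option's full name is 'remote login timeout (s)'; raising it changes the timeout for every linked server still set to 0, so the per-server sp_serveroption change shown earlier is the narrower fix.

```sql
-- Instance-wide fallback used when a linked server's connect timeout is 0.
SELECT name, value, value_in_use, description
FROM sys.configurations
WHERE name = N'remote login timeout (s)';

-- Optional instance-wide alternative (affects all linked servers left at 0):
-- EXEC sp_configure N'remote login timeout (s)', 20;
-- RECONFIGURE;
```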

In practice, I just created a linked server locally with the default settings, turned off the target instance, and tried to query it. Timed out at 10 seconds =)
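The same measurement can be scripted with sys.sp_testlinkedserver, which raises an error when the linked server cannot be reached; wrapping it in TRY...CATCH keeps the batch running so the elapsed time can be reported. A sketch only: repl_distributor is a placeholder for whichever linked server you are testing, and the elapsed time will only approximate the configured timeout.

```sql
DECLARE @server sysname = N'repl_distributor';  -- placeholder: linked server to test
DECLARE @start  datetime2 = SYSDATETIME();
DECLARE @result nvarchar(4000) = N'connected';

BEGIN TRY
    -- Attempts a connection to the linked server; raises an error on failure.
    EXEC sys.sp_testlinkedserver @servername = @server;
END TRY
BEGIN CATCH
    SET @result = ERROR_MESSAGE();  -- e.g. the "Cannot initialize the data source object" error
END CATCH;

SELECT @server AS linked_server,
       @result AS outcome,
       DATEDIFF(MILLISECOND, @start, SYSDATETIME()) / 1000.0 AS seconds_elapsed;
```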

Method 2

Adding to what jadarnel27 already said, the newest OLE DB driver (Microsoft OLE DB Driver for SQL Server, provider name MSOLEDBSQL) supports MultiSubnetFailover:

https://blogs.msdn.microsoft.com/sqlnativeclient/2018/03/30/released-microsoft-ole-db-driver-for-sql-server/
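With that driver installed, the option can be passed in the linked server's provider string rather than through an ODBC DSN. A minimal sketch, assuming MSOLEDBSQL is installed on every publisher node; TestDistributorLink and DistributorNetworkName are hypothetical names. Note that the OLE DB provider string uses Yes/No rather than True. Because repl_distributor is created by replication itself, trying the setting on a separate test linked server first is safer than dropping and recreating that one.

```sql
-- Hypothetical test linked server using the Microsoft OLE DB Driver (MSOLEDBSQL).
EXEC master.dbo.sp_addlinkedserver
    @server     = N'TestDistributorLink',      -- hypothetical linked server name
    @srvproduct = N'',
    @provider   = N'MSOLEDBSQL',
    @datasrc    = N'DistributorNetworkName',   -- hypothetical FCI network name
    @provstr    = N'MultiSubnetFailover=Yes';  -- OLE DB provider-string syntax

-- Check that a connection can be made through the new definition.
EXEC sys.sp_testlinkedserver @servername = N'TestDistributorLink';
```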


All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
