Availability Group: Redo Rate Displayed on AG Dashboard vs. Perfmon Counter


I am a bit confused when checking the Redo Rate (KB/sec) metric on the Always On AG dashboard. In some scenarios it matches the Perfmon counter Database Replica: Redone Bytes/sec (hopefully that's the correct Perfmon counter for redo rate), and sometimes it doesn't match at all.

Most of the time, when there is a lot of activity, the AG dashboard and the DMV show a redo rate of, for example, 40 MB/sec at a given moment, which matches the Perfmon counter.

However, during quiet periods, when there is little or nothing to send to the secondary, the redo rate on the dashboard and in the DMV seems to show incorrect values compared to the Perfmon counter.

I can't tell which value is correct or how to analyze this. Any idea why this happens, or is it a bug in the dashboard?

Screenshot as requested:

[Screenshot: AG dashboard redo rate compared with the Perfmon counter on the secondary]

No transactions occurred around that time; there was no major activity on the primary. I am collecting the Perfmon counters on both the primary and the secondary, because after a failover we still need those counters running on the new secondary. However, the Perfmon data in the screenshot was pulled from the secondary.

How to solve:


Method 1

Those two numbers measure slightly different things. You're right that they both measure redo, but they do it in different ways.

The Perfmon counter is updated in near real time; it's the number of bytes redone in the last second:

Amount of log records redone in the last second to catch up the database replica
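If you want to see where that per-second number comes from without opening Perfmon, the same counter is exposed through sys.dm_os_performance_counters. The sketch below assumes, as is usual for per-second counters in that DMV, that the stored value is a running total, so it takes two samples about one second apart and divides the difference by the elapsed time. Run it on the secondary; the table variable and column aliases are illustrative names, not anything from the original.

```sql
-- Sketch: approximate Perfmon's "Redone Bytes/sec" from T-SQL.
-- Assumption: cntr_value for this counter is cumulative, so we sample twice
-- and divide the delta by the elapsed time. Run on the secondary replica.
DECLARE @first_sample TABLE
(
    database_name nvarchar(128) NOT NULL,
    cntr_value    bigint        NOT NULL
);

-- First sample; object_name carries a SQLServer: or MSSQL$<instance>: prefix.
INSERT INTO @first_sample (database_name, cntr_value)
SELECT RTRIM(instance_name), cntr_value
FROM sys.dm_os_performance_counters
WHERE object_name LIKE N'%Database Replica%'
  AND counter_name = N'Redone Bytes/sec';

DECLARE @start datetime2(3) = SYSDATETIME();
WAITFOR DELAY '00:00:01';   -- roughly a one-second sampling interval

-- Second sample: bytes redone since the first one, converted to KB/sec.
SELECT s.database_name,
       (pc.cntr_value - s.cntr_value) / 1024.0
           / (DATEDIFF(MILLISECOND, @start, SYSDATETIME()) / 1000.0) AS redone_kb_per_sec
FROM sys.dm_os_performance_counters AS pc
JOIN @first_sample AS s
  ON s.database_name = RTRIM(pc.instance_name)
WHERE pc.object_name LIKE N'%Database Replica%'
  AND pc.counter_name = N'Redone Bytes/sec';
```

A database with no redo during that second comes out as 0, which is the flat line Perfmon shows when the primary is idle.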

The AG dashboard is based on the sys.dm_hadr_database_replica_states DMV, specifically the redo_rate column:

Average Rate at which the log records are being redone on a given secondary database, in kilobytes (KB)/second.

So the AG dashboard is based on an average, but over what period? I suspect it’s "the last active period" based on the phrasing in the log_send_rate description from the same DMV:

Average rate at which primary replica instance sent data during last active period, in kilobytes (KB)/second.
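To look at the same averaged values the dashboard uses, you can query the DMV directly. This is a minimal sketch: the join to sys.availability_replicas is only there to label each row with its server name, and the column comments paraphrase the DMV documentation quoted above.

```sql
-- Sketch: read the averaged rates behind the AG dashboard.
-- On the primary this returns rows for every replica's databases;
-- on a secondary, only its local databases.
SELECT ar.replica_server_name,
       DB_NAME(drs.database_id) AS database_name,
       drs.is_local,
       drs.log_send_queue_size,  -- KB of log not yet sent to the secondary
       drs.log_send_rate,        -- KB/sec, averaged over the "last active period"
       drs.redo_queue_size,      -- KB of log not yet redone on the secondary
       drs.redo_rate,            -- KB/sec, averaged (the dashboard's Redo Rate)
       drs.last_redone_time      -- when the last log record was redone
FROM sys.dm_hadr_database_replica_states AS drs
JOIN sys.availability_replicas AS ar
  ON ar.replica_id = drs.replica_id
ORDER BY ar.replica_server_name, database_name;
```

Reading redo_rate next to last_redone_time is a quick way to tell whether the average has been refreshed recently.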

Let's test it. I'll open the AG dashboard in a lab environment, and the first thing I notice is that the redo rate is not zero, even though I haven't used this AG in a couple of weeks:

[Screenshot: lab AG dashboard showing a nonzero redo rate]

Perfmon is flat on the secondary, as I’m not doing anything yet:

[Screenshot: Perfmon flat on the secondary]

Now I’ll insert some data into my test database:

INSERT INTO dbo.A
SELECT TOP (1000)
    REPLICATE(N'A', 50)
FROM master.dbo.spt_values;
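The original doesn't show how dbo.A is defined. If you want to repeat this test, a hypothetical single-column table like the one below is enough for that INSERT to run; the column name and type are assumptions, chosen only so that REPLICATE(N'A', 50) fits.

```sql
-- Hypothetical setup: dbo.A's real schema is not shown in the original.
-- Run on the primary, in a database that belongs to the availability group,
-- before running the INSERT.
IF OBJECT_ID(N'dbo.A', N'U') IS NULL
BEGIN
    CREATE TABLE dbo.A
    (
        val nvarchar(50) NOT NULL   -- REPLICATE(N'A', 50) fills it exactly
    );
END;
```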

Now Perfmon on the secondary shows a brief blip:

[Screenshot: Perfmon on the secondary showing a brief blip]

And if I open the AG dashboard, I can see that the redo rate changed (from 3535 to 3873 KB/sec), but it didn't drop back down to zero:

[Screenshot: AG dashboard redo rate changed from 3535 to 3873]

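So it looks like this DMV (and the dashboard) is only updated when redo is actually happening, and it holds the last value it calculated. That explains what you're seeing: when the primary is quiet, Perfmon correctly drops to zero, while the dashboard keeps showing the average from the last period in which redo ran. This is most likely expected behavior rather than a dashboard bug, and during quiet periods the Perfmon counter is the one that reflects current redo activity.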
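One way to apply this when analyzing your own numbers is to check how long ago redo last ran before trusting redo_rate. The sketch below is a heuristic, not an official rule: the 60-second threshold and the labels are assumptions, and it assumes last_redone_time is recorded in the server's local time so that it can be compared with GETDATE().

```sql
-- Sketch: judge whether redo_rate reflects current activity or is a value
-- held over from the last time redo ran. Run on the secondary replica.
-- Assumptions: 60-second idle threshold (arbitrary; tune it), and
-- last_redone_time in server local time, comparable with GETDATE().
DECLARE @idle_threshold_sec int = 60;

SELECT DB_NAME(drs.database_id)                          AS database_name,
       drs.redo_rate                                     AS dashboard_redo_rate_kb_sec,
       drs.redo_queue_size                               AS redo_queue_kb,
       drs.last_redone_time,
       DATEDIFF(SECOND, drs.last_redone_time, GETDATE()) AS seconds_since_last_redo,
       CASE
           WHEN drs.last_redone_time IS NULL
               THEN N'no redo recorded yet'
           WHEN drs.redo_queue_size = 0
            AND DATEDIFF(SECOND, drs.last_redone_time, GETDATE()) > @idle_threshold_sec
               THEN N'idle: redo_rate is likely a held-over value'
           ELSE N'recent or pending redo: redo_rate is more likely current'
       END                                               AS redo_rate_interpretation
FROM sys.dm_hadr_database_replica_states AS drs
WHERE drs.is_local = 1;
```

Under sustained load, redo_rate and the Perfmon counter should roughly agree; when this query reports the database as idle, treat the dashboard figure as historical.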


All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.
