Always On: Supervision of Availability Groups

Key counters and metrics to monitor for SQL Server Always On Availability Groups.

SQL Server Always On Availability Groups (AGs) provide high availability and disaster recovery for critical databases. Proactive monitoring is essential to ensure the stability and performance of these environments.

Key Performance Counters

Availability of Replicas

CounterObjectDescription
Database Replica StateAvailability Replica (SQLServer:Availability Replica)Replica status (1=Healthy, 2=Syncing, etc.).
Synchronization HealthDatabase Replica (SQLServer:Database Replica)Database synchronization status (0=Not synchronized, 1=Synchronized, 2=In progress).

Sync Latency

CounterObjectDescription
Log Send Queue (KB)Database Replica (SQLServer:Database Replica)Size of the log sending queue (in KB). A high value indicates a sync delay.
Redo Queue (KB)Database Replica (SQLServer:Database Replica)Size of the log replay queue (in KB). A high value may indicate a bottleneck on the secondary replica.
Estimated Recovery Time (ms)Database Replica (SQLServer:Database Replica)Estimated time for recovery in the event of a failover.

Replica Performance

CounterObjectDescription
Bytes Received/secAvailability Replica (SQLServer:Availability Replica)Data rate received by the secondary replica.
Bytes Sent/secAvailability Replica (SQLServer:Availability Replica)Data rate sent by the primary replica.
Flow Control Time (ms)Database Replica (SQLServer:Database Replica)Time spent in flow control (in ms). A high value may indicate network or performance issues.

Use of Resources

CounterObjectDescription
CPU Usage (%)Processor (SQLServer:Processor)CPU usage by SQL Server.
Memory Grants PendingSQLServer:Memory ManagerNumber of pending memory requests.
Disk Read/Write Latency (ms)PhysicalDisk (SQLServer:PhysicalDisk)Latency of reads/writes to disk.

Connections and Transactions

CounterObjectDescription
Transactions/secDatabases (SQLServer:Databases)Number of transactions per second.
Active TransactionsDatabases (SQLServer:Databases)Number of active transactions.
Log File(s) Used Size (KB)Databases (SQLServer:Databases)Used size of log files.
  • Critical threshold for Log Send Queue: > 100 MB for more than 5 minutes.
  • Critical threshold for Redo Queue: > 50 MB for more than 5 minutes.
  • Sync latency: > 30 seconds.
  • Estimated recovery time: > 1 minute.

References

[Need consulting for SQL Server ? Contact me]