Always On: Supervision of Availability Groups
Key counters and metrics to monitor for SQL Server Always On Availability Groups.
SQL Server Always On Availability Groups (AGs) provide high availability and disaster recovery for critical databases. Proactive monitoring is essential to ensure the stability and performance of these environments.
Availability of Replicas
| Counter | Object | Description |
|---|
| Database Replica State | Availability Replica (SQLServer:Availability Replica) | Replica status (1=Healthy, 2=Syncing, etc.). |
| Synchronization Health | Database Replica (SQLServer:Database Replica) | Database synchronization status (0=Not synchronized, 1=Synchronized, 2=In progress). |
Sync Latency
| Counter | Object | Description |
|---|
| Log Send Queue (KB) | Database Replica (SQLServer:Database Replica) | Size of the log sending queue (in KB). A high value indicates a sync delay. |
| Redo Queue (KB) | Database Replica (SQLServer:Database Replica) | Size of the log replay queue (in KB). A high value may indicate a bottleneck on the secondary replica. |
| Estimated Recovery Time (ms) | Database Replica (SQLServer:Database Replica) | Estimated time for recovery in the event of a failover. |
| Counter | Object | Description |
|---|
| Bytes Received/sec | Availability Replica (SQLServer:Availability Replica) | Data rate received by the secondary replica. |
| Bytes Sent/sec | Availability Replica (SQLServer:Availability Replica) | Data rate sent by the primary replica. |
| Flow Control Time (ms) | Database Replica (SQLServer:Database Replica) | Time spent in flow control (in ms). A high value may indicate network or performance issues. |
Use of Resources
| Counter | Object | Description |
|---|
| CPU Usage (%) | Processor (SQLServer:Processor) | CPU usage by SQL Server. |
| Memory Grants Pending | SQLServer:Memory Manager | Number of pending memory requests. |
| Disk Read/Write Latency (ms) | PhysicalDisk (SQLServer:PhysicalDisk) | Latency of reads/writes to disk. |
Connections and Transactions
| Counter | Object | Description |
|---|
| Transactions/sec | Databases (SQLServer:Databases) | Number of transactions per second. |
| Active Transactions | Databases (SQLServer:Databases) | Number of active transactions. |
| Log File(s) Used Size (KB) | Databases (SQLServer:Databases) | Used size of log files. |
Recommended Alerts
- Critical threshold for Log Send Queue: > 100 MB for more than 5 minutes.
- Critical threshold for Redo Queue: > 50 MB for more than 5 minutes.
- Sync latency: > 30 seconds.
- Estimated recovery time: > 1 minute.
References
[Need consulting for SQL Server ? Contact me]