Network High Availability: HSRP, VRRP, GLBP, Redundancy Design, Failover Testing, and SLA Guarantee
Network high availability means designing the network to keep running even when a component fails. HSRP (Hot Standby Router Protocol) provides Cisco-proprietary gateway redundancy; VRRP (Virtual Router Redundancy Protocol) is the multi-vendor standard; GLBP (Gateway Load Balancing Protocol) distributes load across gateways; redundancy design plans a backup for every point of failure; failover testing verifies that failover really works; and an SLA guarantee commits to a level of uptime.
Downtime is very expensive: the average cost of network downtime is $5,600 per minute (Gartner), so one hour costs $336,000 for a mid-sized enterprise. High availability means "design for failure": every single point of failure (SPOF) needs redundancy, including dual switches, dual links, dual power, dual ISPs, and gateway redundancy (HSRP/VRRP). The target is 99.99% uptime, which allows no more than about 52 minutes of downtime per year.
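The cost arithmetic above is easy to check. The following is a minimal sketch that uses only the $5,600/minute figure quoted in the text; the function name `downtime_cost` is hypothetical, and the real cost per minute for any given organization will differ.

```python
# Cost-of-downtime arithmetic using the per-minute figure quoted in the text.
COST_PER_MINUTE_USD = 5_600


def downtime_cost(minutes: float, cost_per_minute: float = COST_PER_MINUTE_USD) -> float:
    """Return the cost in USD of an outage lasting `minutes`."""
    if minutes < 0:
        raise ValueError("minutes must be non-negative")
    return minutes * cost_per_minute


one_hour = downtime_cost(60)
four_nines_year = downtime_cost(52)  # roughly the yearly budget at 99.99% uptime

print(f"1 hour of downtime:   ${one_hour:,.0f}")
print(f"52 minutes of downtime: ${four_nines_year:,.0f}")
```

Even a network that meets the 99.99% target can therefore lose a six-figure sum per year at this rate, which is why the remaining minutes are worth engineering for.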
Availability Levels
| Level | Uptime % | Downtime/Year | Requires |
|---|---|---|---|
| 99% (two nines) | 99% | 3.65 days | Basic redundancy; some planned downtime is acceptable |
| 99.9% (three nines) | 99.9% | 8.76 hours | Redundant components, automated failover |
| 99.99% (four nines) | 99.99% | 52.6 minutes | Full redundancy, no single point of failure |
| 99.999% (five nines) | 99.999% | 5.26 minutes | Carrier-grade: geo-redundancy, hitless upgrades |
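The downtime figures in the table follow directly from the uptime percentage. The sketch below reproduces them, assuming a 365-day year; the helper names `downtime_budget_seconds` and `humanize` are hypothetical.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # assumes a non-leap year


def downtime_budget_seconds(uptime_pct: float) -> float:
    """Yearly downtime allowed, in seconds, for a given uptime percentage."""
    if not 0 <= uptime_pct <= 100:
        raise ValueError("uptime_pct must be between 0 and 100")
    return SECONDS_PER_YEAR * (100 - uptime_pct) / 100


def humanize(seconds: float) -> str:
    """Render a duration in the largest convenient unit."""
    if seconds >= 86_400:
        return f"{seconds / 86_400:.2f} days"
    if seconds >= 3_600:
        return f"{seconds / 3_600:.2f} hours"
    return f"{seconds / 60:.2f} minutes"


for pct in (99, 99.9, 99.99, 99.999):
    print(f"{pct}% uptime allows {humanize(downtime_budget_seconds(pct))} of downtime per year")
```

Each extra nine divides the budget by ten, so moving from three nines to four leaves under an hour per year for every outage and every maintenance event combined.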
HSRP (Hot Standby Router Protocol)
| Feature | Details |
|---|---|
| Vendor | Cisco proprietary |
| How it works | Active/Standby: one router forwards traffic while the other waits in standby |
| Virtual IP | A shared virtual IP serves as the default gateway; clients point to the VIP, not to a physical IP |
| Election | Highest priority wins the Active role (default priority 100); on a tie, the highest IP address wins |
| Preemption | Optional: a higher-priority router takes the Active role back when it recovers |
| Tracking | Track the uplink interface: if the uplink fails, the router lowers its priority, which triggers failover |
| Timers | Default hello 3s, hold 10s; can be tuned (for example hello 1s, hold 3s) for faster failover |
| HSRPv2 | Supports IPv6, more groups (0-4095 vs 0-255 in v1), millisecond timers |
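The election, tracking, and preemption rules in the table can be sketched as a small model. This is an illustration of the decision logic only, not an implementation of the protocol: the `Router` class, `elect_active`, `next_active`, and the decrement values are all hypothetical.

```python
from dataclasses import dataclass
from ipaddress import IPv4Address


@dataclass
class Router:
    name: str
    ip: str
    priority: int = 100        # HSRP default priority
    track_decrement: int = 10  # assumed amount subtracted when the tracked uplink is down
    uplink_up: bool = True

    @property
    def effective_priority(self) -> int:
        if self.uplink_up:
            return self.priority
        return self.priority - self.track_decrement


def elect_active(routers):
    """Fresh election: highest effective priority wins, highest IP breaks ties."""
    return max(routers, key=lambda r: (r.effective_priority, IPv4Address(r.ip)))


def next_active(current, routers, preempt):
    """Without preemption, the current Active router keeps its role while it is alive."""
    if not preempt and any(r is current for r in routers):
        return current
    return elect_active(routers)


r1 = Router("R1", "10.0.0.2", priority=110, track_decrement=20)
r2 = Router("R2", "10.0.0.3")

print(elect_active([r1, r2]).name)                    # R1 has the higher priority
r1.uplink_up = False                                  # tracking lowers R1 to 110 - 20 = 90
print(next_active(r1, [r1, r2], preempt=False).name)  # R1 stays Active
print(next_active(r1, [r1, r2], preempt=True).name)   # R2 takes over
```

The last two lines show why interface tracking is usually paired with preemption: lowering the priority alone does not move the Active role unless the other router is allowed to claim it.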
VRRP vs HSRP vs GLBP
| Feature | HSRP | VRRP | GLBP |
|---|---|---|---|
| Standard | Cisco proprietary | RFC 5798 (multi-vendor) | Cisco proprietary |
| Model | Active/Standby | Master/Backup | One AVG (Active Virtual Gateway) + multiple AVFs (Active Virtual Forwarders) |
| Load Balancing | No (one active router only) | No (one master only) | Yes (multiple forwarders share the load) |
| Virtual MAC | 0000.0c07.acXX (v1) / 0000.0c9f.fXXX (v2) | 0000.5e00.01XX | 0007.b400.XXYY |
| Preemption | Disabled by default | Enabled by default | AVG: disabled by default; AVF: enabled by default |
| Multicast | 224.0.0.2 (v1) / 224.0.0.102 (v2) | 224.0.0.18 | 224.0.0.102 |
| Best For | Cisco-only environments | Multi-vendor environments | Cisco environments that need load balancing |
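The virtual MAC patterns in the table encode the group number in the trailing hex digits, which helps when reading ARP tables or packet captures. A minimal sketch follows; the function names are hypothetical, and the GLBP helper is deliberately limited to groups 0-255 so that the group fits the two-digit XX field shown above.

```python
def _dotted(mac_hex: str) -> str:
    """Format 12 hex digits in dotted notation (xxxx.xxxx.xxxx)."""
    return ".".join(mac_hex[i:i + 4] for i in range(0, 12, 4))


def hsrp_v1_mac(group: int) -> str:
    """HSRPv1 virtual MAC: 0000.0c07.acXX, where XX is the group number."""
    if not 0 <= group <= 255:
        raise ValueError("HSRPv1 group must be 0-255")
    return _dotted(f"00000c07ac{group:02x}")


def vrrp_mac(vrid: int) -> str:
    """VRRP (IPv4) virtual MAC: 0000.5e00.01XX, where XX is the virtual router ID."""
    if not 1 <= vrid <= 255:
        raise ValueError("VRRP VRID must be 1-255")
    return _dotted(f"00005e0001{vrid:02x}")


def glbp_mac(group: int, forwarder: int) -> str:
    """GLBP virtual MAC: 0007.b400.XXYY (XX = group, YY = forwarder number)."""
    if not 0 <= group <= 255:
        raise ValueError("this sketch only handles groups 0-255")
    if not 1 <= forwarder <= 4:
        raise ValueError("forwarder number must be 1-4")
    return _dotted(f"0007b400{group:02x}{forwarder:02x}")


print(hsrp_v1_mac(10))
print(vrrp_mac(1))
print(glbp_mac(1, 2))
```

Because each GLBP forwarder has its own virtual MAC, different clients can be given different MAC addresses for the same gateway IP, which is how the load ends up shared.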
Redundancy Design Layers
| Layer | Redundancy Method | Failover Time |
|---|---|---|
| Default Gateway | HSRP/VRRP/GLBP: virtual gateway IP | 3-10 seconds (tunable to < 1s with BFD) |
| Layer 2 Links | EtherChannel/LAG: multiple links bundled | Sub-second (link failure within the bundle) |
| Layer 3 Routing | ECMP, fast convergence (BFD + tuned timers) | 50ms-3s (BFD: 50ms, routing: 1-3s) |
| WAN Links | Dual ISP, SD-WAN multi-link, MPLS + internet backup | Seconds (SD-WAN) to minutes (BGP reconvergence) |
| Power | Dual power supplies, UPS, generator | 0 (seamless with dual PSU) to seconds (UPS transfer) |
| Devices | VSS, StackWise, MC-LAG: two chassis acting as one logical device | Sub-second (stateful switchover) |
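To see why every layer needs its own redundancy, it helps to combine availabilities: layers that traffic must pass through multiply (series), while a redundant pair fails only when both members fail (parallel). The sketch below assumes independent failures and instant failover, which is optimistic, and every availability value in it is hypothetical.

```python
from math import prod


def series(*availabilities: float) -> float:
    """All components are required: availabilities multiply."""
    return prod(availabilities)


def parallel(*availabilities: float) -> float:
    """Any one component is enough: the set is down only if all are down."""
    return 1 - prod(1 - a for a in availabilities)


single = 0.999  # hypothetical availability of one gateway, link, or ISP

# Three single components in a chain: each one is a single point of failure.
no_redundancy = series(single, single, single)

# The same chain with every layer duplicated.
full_redundancy = series(
    parallel(single, single),  # gateway pair
    parallel(single, single),  # link pair
    parallel(single, single),  # ISP pair
)

print(f"No redundancy:   {no_redundancy:.6f}")
print(f"Full redundancy: {full_redundancy:.6f}")
```

Under these assumptions a chain of three-nines components drops below three nines end to end, while duplicating each layer lifts the same chain above five nines. Real networks fall short of this figure because failures correlate and failover takes time, which is what the failover times in the table quantify.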
Failover Testing
| Test | How | Verify |
|---|---|---|
| Gateway Failover | Shut down the active HSRP/VRRP router and confirm the standby takes over | Ping the gateway continuously and count dropped pings (should be < 5) |
| Link Failure | Disconnect the uplink cable and confirm traffic routes via the alternate path | Traceroute before and after to confirm the path change; measure failover time |
| Device Failure | Power off the primary switch/router and confirm the redundant device takes over | End-to-end application test; confirm users don't notice an outage |
| ISP Failover | Simulate a primary ISP failure and confirm traffic switches to the backup | External monitoring (ThousandEyes) to confirm internet access stays continuous |
| Power Failure | Disconnect one PSU and confirm the device continues on the second PSU | Check SNMP alerts and device health; confirm no service interruption |
| Scheduled Testing | Monthly or quarterly failover tests during a maintenance window | Document results, fix any failures, update runbooks |
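The "count dropped pings" check from the gateway failover row can be automated once the probe results are collected. The sketch below assumes the results are already available as a list of booleans, one per probe sent at a fixed interval; the names `FailoverResult` and `analyze_probes` are hypothetical, and the pass threshold of fewer than 5 lost probes comes from the table.

```python
from typing import NamedTuple, Sequence


class FailoverResult(NamedTuple):
    lost: int                # total probes without a reply
    longest_outage_s: float  # longest run of consecutive losses, in seconds
    passed: bool             # True if fewer than `max_lost` probes were lost


def analyze_probes(replies: Sequence[bool], interval_s: float = 1.0,
                   max_lost: int = 5) -> FailoverResult:
    """Summarize a continuous ping test run during a failover drill."""
    lost = 0
    run = 0
    longest_run = 0
    for ok in replies:
        if ok:
            run = 0
        else:
            lost += 1
            run += 1
            longest_run = max(longest_run, run)
    return FailoverResult(lost, longest_run * interval_s, lost < max_lost)


# Hypothetical sample: 5 replies, 3 losses while the standby takes over, 5 replies.
sample = [True] * 5 + [False] * 3 + [True] * 5
result = analyze_probes(sample)
print(result)
```

The outage estimate is only as precise as the probe interval, so a shorter interval gives a tighter measurement of the actual failover time.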
Closing Thoughts: High Availability = Design for Failure, Test Regularly
Cost of downtime: $5,600/minute; 99.99% uptime allows at most about 52 minutes of downtime per year.
Gateway: HSRP (Cisco), VRRP (standard, multi-vendor), GLBP (Cisco, load balancing).
Redundancy: gateway (HSRP/VRRP), links (LAG), routing (ECMP + BFD), WAN (dual ISP), power (dual PSU), devices (VSS/stack).
HSRP/VRRP: virtual IP gateway, active/standby, priority-based election, interface tracking, tunable timers.
Testing: run monthly failover tests covering gateway, link, device, ISP, and power; document the results and fix any failures.
Key: redundancy without testing is just hope. Test every failover path regularly to guarantee the SLA.