Network Troubleshooting Methodology: OSI Layer, Packet Capture, Root Cause Analysis
Network troubleshooting is a core skill for every network engineer. The OSI layer approach splits a problem into layers, from Physical to Application, which narrows the scope quickly. Packet capture with Wireshark/tcpdump gives clear evidence of what actually happened on the wire, and Root Cause Analysis (RCA) finds the underlying cause rather than just the symptoms, so the problem does not recur.
Most network engineers troubleshoot at random (the shotgun approach): they try one thing after another until something works. This wastes time and can make the problem worse. A structured methodology can resolve problems 3-5× faster and lowers the chance of breaking the system further.
Troubleshooting Methodologies
| Method | Approach | Best For |
|---|---|---|
| Top-Down (Layer 7→1) | Start at Application and work down to Physical | Application-specific issues (the app does not work) |
| Bottom-Up (Layer 1→7) | Start at Physical and work up to Application | Connectivity issues (no connection at all) |
| Divide and Conquer | Start at Layer 3 (Network), then move up or down | Most efficient (experienced engineers) |
| Follow the Path | Trace the packet path from source to destination | Intermittent issues, routing problems |
| Compare and Swap | Compare with a working system, then swap components | Hardware failures, quick isolation |
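The divide-and-conquer row can be sketched in a few lines of Python. This is only an illustration: the `checks` mapping and its per-layer callables are hypothetical stand-ins for real tests (a ping for L3, a port test for L4, and so on), and the sketch assumes that a passing L3 check means L1 and L2 are healthy.

```python
from typing import Callable, Dict, Optional

def divide_and_conquer(checks: Dict[int, Callable[[], bool]]) -> Optional[int]:
    """Return the lowest failing layer, or None if every check passes.

    Starts at Layer 3. If L3 works, the layers beneath it are presumed
    healthy and the search moves up; if L3 fails, the search moves down.
    """
    if checks[3]():
        # L3 is fine, so walk upward through the remaining layers.
        for layer in (4, 7):
            if not checks[layer]():
                return layer
        return None
    # L3 failed: the fault is at L3 or below, so test from the bottom up.
    for layer in (1, 2):
        if not checks[layer]():
            return layer
    return 3

# Stubbed results: physical and data link are healthy, routing is broken.
results = {1: True, 2: True, 3: False, 4: False, 7: False}
checks = {layer: (lambda ok=ok: ok) for layer, ok in results.items()}
print(divide_and_conquer(checks))  # 3
```

Starting in the middle means a single L3 test rules out either the lower or the upper layers, which is why the table lists this method as the most efficient.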
OSI Layer Troubleshooting
| Layer | Check | Tools | Common Issues |
|---|---|---|---|
| L1 Physical | Cable, LEDs, power, SFP | Cable tester, OTDR, show interface | Bad cable, duplex mismatch, SFP failure, power |
| L2 Data Link | MAC, VLAN, STP, ARP | show mac address-table, show spanning-tree | VLAN mismatch, STP blocking, MAC flapping, ARP issues |
| L3 Network | IP, subnet, routing, ACL | ping, traceroute, show ip route, show access-list | Wrong IP/mask, missing route, ACL blocking, MTU |
| L4 Transport | TCP/UDP, ports, firewall | telnet/nc port test, show conn (firewall) | Port blocked, firewall rule, NAT issue, TCP reset |
| L7 Application | DNS, HTTP, app config | nslookup, curl, app logs | DNS failure, certificate error, app misconfiguration |
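The "Wrong IP/mask" issue in the L3 row is easy to check offline. The sketch below uses Python's standard `ipaddress` module; the function names and the example addresses are illustrative assumptions, not part of any vendor tool.

```python
import ipaddress

def same_subnet(host_a: str, host_b: str) -> bool:
    """True if both interfaces (address/prefix) sit in the same IP network."""
    a = ipaddress.ip_interface(host_a)
    b = ipaddress.ip_interface(host_b)
    return a.network == b.network

def needs_gateway(src: str, dst_ip: str) -> bool:
    """True if src (address/prefix) must use a gateway to reach dst_ip."""
    return ipaddress.ip_address(dst_ip) not in ipaddress.ip_interface(src).network

# Two hosts that look adjacent but were configured with different masks.
print(same_subnet("192.168.1.10/24", "192.168.1.200/24"))  # True
print(same_subnet("192.168.1.10/24", "192.168.1.200/25"))  # False
```

A mask mismatch like the second case often produces one-way reachability: one host ARPs directly for its peer while the other sends the reply to its gateway.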
Essential Troubleshooting Commands
| Command | Purpose | Platform |
|---|---|---|
| ping / ping -t | Basic connectivity test (ICMP); -t pings continuously on Windows | All (-t: Windows) |
| traceroute / tracert | Path discovery (hop-by-hop) | All (tracert: Windows) |
| show interface | Interface status, errors, counters | Cisco/Juniper/Arista |
| show ip route | Routing table (verify route exists) | Cisco |
| show mac address-table | MAC → port mapping (verify L2 forwarding) | Cisco switches |
| show arp | ARP table (IP → MAC resolution) | All routers |
| show logging / show log messages | System logs (errors, warnings) | Cisco (show logging), Juniper (show log messages) |
| nslookup / dig | DNS resolution test | All OS (dig is not installed by default on Windows) |
| netstat / ss | Active connections, listening ports | Linux/Windows (ss: Linux only) |
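When `telnet` or `nc` is not installed on a host, the same L4 port test can be approximated with Python's standard `socket` module. This is a minimal sketch; the function name `tcp_port_open` and the 2-second default timeout are assumptions chosen for illustration.

```python
import socket

def tcp_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Attempt a TCP handshake; True if the port accepts the connection."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, DNS failures, and unreachable hosts.
        return False
```

For example, `tcp_port_open("192.0.2.10", 443)` (a documentation address used as a placeholder) returns `False` both when a firewall silently drops the SYN and when the server answers with a reset, so a packet capture is still needed to tell those cases apart.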
Packet Capture
| Tool | Platform | Use Case |
|---|---|---|
| Wireshark | Windows/Mac/Linux (GUI) | Deep packet analysis, protocol decode, filtering |
| tcpdump | Linux/Mac (CLI) | Quick capture on servers, remote devices |
| SPAN/Mirror Port | Switch | Copy traffic from port/VLAN → capture port |
| RSPAN/ERSPAN | Switch (remote) | Remote SPAN: RSPAN carries mirrored traffic over a dedicated VLAN (Layer 2); ERSPAN encapsulates it in GRE so it can cross routed links between sites |
| Network TAP | Physical device | Non-intrusive capture (production traffic) |
| EPC (Embedded Packet Capture) | Cisco IOS-XE | Capture on router/switch itself |
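For repeatable captures on servers, it can help to build the tcpdump command in a script. The sketch below only assembles the argument list; the helper name, interface, filter, and file name are hypothetical examples. It uses the standard tcpdump flags `-i` (interface), `-n` (no name resolution), `-w` (write pcap file), and `-c` (packet count).

```python
import shlex
from typing import List, Optional

def build_tcpdump_cmd(interface: str, bpf_filter: str, outfile: str,
                      count: Optional[int] = None) -> List[str]:
    """Build a tcpdump argument list that writes matching packets to a pcap file."""
    cmd = ["tcpdump", "-i", interface, "-n", "-w", outfile]
    if count is not None:
        cmd += ["-c", str(count)]  # stop after this many packets
    # Capture filter in BPF syntax (not Wireshark display-filter syntax).
    cmd.append(bpf_filter)
    return cmd

cmd = build_tcpdump_cmd("eth0", "host 10.0.0.1 and tcp port 443",
                        "capture.pcap", count=1000)
print(shlex.join(cmd))  # shell-quoted form, ready to paste into a terminal
```

The list can be passed to `subprocess.run(cmd)` as is; capturing usually requires root privileges, and the resulting pcap file opens directly in Wireshark.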
Wireshark Filters
| Filter | Purpose |
|---|---|
| ip.addr == 10.0.0.1 | Traffic to/from specific IP |
| tcp.port == 443 | HTTPS traffic |
| tcp.flags.syn == 1 && tcp.flags.ack == 0 | TCP SYN (new connections only) |
| tcp.flags.reset == 1 | TCP RST (connection resets, useful when troubleshooting failures) |
| tcp.analysis.retransmission | TCP retransmissions (packet loss indicator) |
| dns | DNS queries and responses |
| icmp | ICMP (ping, unreachable, TTL exceeded) |
| frame.time_delta > 1 | Frames captured more than 1 second after the previous frame (gaps that may indicate latency issues) |
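To make the logic behind three of these filters concrete, here is a small Python sketch that mirrors them on raw values. It is not Wireshark's implementation; the function names are hypothetical, and the inputs are assumed to be the TCP flags byte and a list of capture timestamps in seconds.

```python
from typing import List, Tuple

SYN, RST, ACK = 0x02, 0x04, 0x10  # TCP flag bits

def is_new_connection(flags: int) -> bool:
    """Mirror of tcp.flags.syn == 1 && tcp.flags.ack == 0."""
    return bool(flags & SYN) and not (flags & ACK)

def is_reset(flags: int) -> bool:
    """Mirror of tcp.flags.reset == 1."""
    return bool(flags & RST)

def slow_gaps(timestamps: List[float], threshold: float = 1.0) -> List[Tuple[int, float]]:
    """Mirror of frame.time_delta > threshold.

    Returns (index, delta) for each frame captured more than
    `threshold` seconds after the previous captured frame.
    """
    gaps = []
    for i in range(1, len(timestamps)):
        delta = timestamps[i] - timestamps[i - 1]
        if delta > threshold:
            gaps.append((i, delta))
    return gaps

print(is_new_connection(0x02))                   # True: SYN only
print(is_new_connection(0x12))                   # False: SYN+ACK is a reply
print(slow_gaps([0.0, 0.25, 1.5, 1.75, 4.0]))    # [(2, 1.25), (4, 2.25)]
```

Note that the delta is measured against the previous captured frame of any flow, so a gap on a quiet link is not necessarily a slow response.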
Root Cause Analysis (RCA)
| Technique | How |
|---|---|
| 5 Whys | Ask "Why?" five times until you reach the real root cause |
| Fishbone Diagram | Group causes into categories: People, Process, Technology, Environment |
| Timeline Analysis | Build a timeline of events, then look for correlation between changes and the problem |
| Change Analysis | Ask "What changed?" and compare before vs after |
| Fault Tree | Build a tree of possible causes, then test and eliminate one branch at a time |
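Timeline and change analysis both come down to lining up change records against the moment the problem began. A minimal Python sketch, assuming change records are available as (timestamp, description) pairs; the function name, the 24-hour window, and the sample records are all hypothetical.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

Change = Tuple[datetime, str]

def changes_before_incident(changes: List[Change],
                            incident_start: datetime,
                            window: timedelta = timedelta(hours=24)) -> List[Change]:
    """Return changes made within `window` before the incident, most recent first."""
    earliest = incident_start - window
    suspects = [c for c in changes if earliest <= c[0] <= incident_start]
    return sorted(suspects, key=lambda c: c[0], reverse=True)

# Hypothetical change log and incident time.
log = [
    (datetime(2024, 5, 1, 9, 0), "Firmware upgrade on core switch"),
    (datetime(2024, 5, 3, 22, 30), "ACL update on WAN router"),
    (datetime(2024, 5, 4, 1, 15), "New VLAN added on access switch"),
]
incident = datetime(2024, 5, 4, 2, 0)
for when, what in changes_before_incident(log, incident):
    print(when.isoformat(), what)
```

The most recent change before the incident is listed first, but correlation in time is only a lead: each suspect still has to be tested before it is named as the root cause.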
Common Network Issues and Quick Fixes
| Symptom | Likely Cause | Quick Check |
|---|---|---|
| No connectivity | Cable, port down, IP/VLAN mismatch | show interface, check cable, verify VLAN |
| Intermittent drops | Duplex mismatch, CRC errors, STP issues | show interface counters, show spanning-tree |
| Slow performance | Congestion, QoS, MTU, TCP window | show interface (check utilization), packet capture |
| Can’t reach remote site | Routing, ACL, firewall, WAN link down | traceroute, show ip route, check firewall logs |
| DNS not resolving | DNS server down, wrong DNS config | nslookup, check DNS server, verify /etc/resolv.conf |
| One-way audio (VoIP) | NAT/firewall blocking RTP, codec mismatch | Check NAT, open RTP ports, verify codec |
Troubleshooting Process
| Step | Action |
|---|---|
| 1. Define problem | What exactly is not working? Who is affected? When did it start? |
| 2. Gather information | Logs, show commands, topology diagram, recent changes |
| 3. Analyze | OSI layer approach, compare with working state, identify scope |
| 4. Plan solution | Identify possible causes → plan fix → assess risk of fix |
| 5. Implement | Apply fix (one change at a time, with rollback plan) |
| 6. Verify | Test if problem is resolved, monitor for recurrence |
| 7. Document | RCA report: problem, root cause, fix, prevention |
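Step 7 is easier to do consistently when the report has a fixed shape. The sketch below is one possible structure in Python, using the four fields named in the table; the class name, the rendering format, and the sample incident are assumptions for illustration.

```python
from dataclasses import dataclass

@dataclass
class RcaReport:
    problem: str
    root_cause: str
    fix: str
    prevention: str

    def render(self) -> str:
        """Render the report as plain text, one field per line."""
        return "\n".join([
            f"Problem: {self.problem}",
            f"Root cause: {self.root_cause}",
            f"Fix: {self.fix}",
            f"Prevention: {self.prevention}",
        ])

# Hypothetical incident used only to show the output shape.
report = RcaReport(
    problem="Branch users could not reach the ERP server",
    root_cause="ACL update blocked TCP 443 from the branch subnet",
    fix="Restored the previous ACL entry",
    prevention="Peer review and pre-change test for ACL updates",
)
print(report.render())
```

Requiring all four fields makes it harder to close an incident with only a symptom and a fix recorded.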
Closing Thoughts: Structured Troubleshooting = Faster Resolution
Network troubleshooting methods: top-down, bottom-up, divide-and-conquer (most efficient)
OSI approach: L1 physical → L2 switching → L3 routing → L4 transport → L7 application
Packet capture: Wireshark + tcpdump + SPAN port (evidence-based troubleshooting)
Key filters: tcp.analysis.retransmission, tcp.flags.reset, dns, frame.time_delta
RCA: 5 Whys, change analysis, timeline correlation
Process: define → gather → analyze → plan → implement → verify → document