Network Observability: OpenTelemetry, Prometheus, Grafana, eBPF และ Streaming Telemetry
Network observability goes beyond traditional monitoring by combining metrics, logs, and traces. OpenTelemetry is an open standard for collecting telemetry data. Prometheus is a time-series database for metrics, and Grafana is a visualization platform for dashboards. eBPF is a Linux kernel technology for deep packet- and flow-level observability. Streaming telemetry replaces SNMP polling with push-based, real-time data.
Traditional network monitoring polls devices over SNMP every 5 minutes, which is not enough for modern networks:

- Microbursts that last milliseconds are never captured.
- SNMP overhead becomes high when polling thousands of devices.
- There is no context, because metrics alone do not explain why something happened.

Network observability addresses this with streaming telemetry (real-time push), eBPF (kernel-level visibility), and correlation across metrics, logs, and traces.
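To make the microburst point concrete, here is a minimal sketch that compares two views of the same 5-minute window. One is what a single poll reports; the other is the busiest 1-second sample. The link speed, background load, and burst are made-up numbers. Real microbursts often last only milliseconds, so even 1-second sampling smooths them, but the dilution effect is the same.

```python
# Sketch: why a 5-minute polling average hides a short traffic burst.
# Assumptions (illustrative only): a 10 Gbit/s link, one sample per second,
# steady 1 Gbit/s background, and a 2-second burst at line rate.

LINK_BPS = 10_000_000_000  # assumed link capacity in bits per second
WINDOW_S = 300             # 5-minute SNMP polling interval


def synthetic_traffic() -> list[int]:
    """Bits sent in each 1-second slot of one polling window."""
    bits = [1_000_000_000] * WINDOW_S  # steady 1 Gbit/s background
    bits[120] = LINK_BPS               # burst at line rate...
    bits[121] = LINK_BPS               # ...lasting two seconds
    return bits


def utilization_polled(bits: list[int]) -> float:
    """What one SNMP poll sees: counter delta divided by the whole interval."""
    return sum(bits) / (WINDOW_S * LINK_BPS)


def utilization_peak_1s(bits: list[int]) -> float:
    """What 1-second streaming telemetry can see: the busiest second."""
    return max(bits) / LINK_BPS


if __name__ == "__main__":
    traffic = synthetic_traffic()
    print(f"5-min average: {utilization_polled(traffic):.1%}")   # 10.6%
    print(f"1-s peak:      {utilization_peak_1s(traffic):.1%}")  # 100.0%
```

With these numbers the poll reports about 10.6% utilization, even though the link was saturated for two seconds. The longer the averaging window, the more a short burst is diluted.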
Monitoring vs Observability
| Feature | Traditional Monitoring | Network Observability |
| --- | --- | --- |
| Data Collection | SNMP polling (pull, 5-min intervals) | Streaming telemetry (push, sub-second) |
| Data Types | Metrics only (CPU, bandwidth, errors) | Metrics + Logs + Traces + Flows |
| Alerting | Threshold-based (static) | Anomaly detection + correlation (dynamic) |
| Visibility | Device-level (interface up/down, CPU) | Application-level (flow, latency, path) |
| Troubleshooting | Manual (check device by device) | Correlated (trace request across network) |
| Scale | Limited (SNMP overhead) | Scalable (gRPC streaming, distributed) |
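The Alerting row can be illustrated with a small sketch. It compares a static threshold with a rolling z-score baseline. The z-score is one simple form of dynamic detection chosen here for illustration; the article does not name a specific algorithm. The latency series is synthetic.

```python
# Sketch: static threshold vs a dynamic baseline (rolling z-score).
# The z-score detector is one simple technique chosen for illustration;
# the latency values are synthetic.
from statistics import mean, pstdev


def static_alerts(values: list[float], threshold: float) -> list[int]:
    """Indices where a value exceeds a fixed, hand-set threshold."""
    return [i for i, v in enumerate(values) if v > threshold]


def zscore_alerts(values: list[float], window: int = 10, z: float = 3.0) -> list[int]:
    """Indices whose value sits more than `z` standard deviations away
    from the mean of the preceding `window` samples."""
    alerts = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu, sigma = mean(history), pstdev(history)
        if sigma == 0:
            if values[i] != mu:  # flat history: any change is unusual
                alerts.append(i)
        elif abs(values[i] - mu) / sigma > z:
            alerts.append(i)
    return alerts


# Synthetic link latency in ms: ~2 ms normally, one 9 ms spike at index 10.
LATENCY_MS = [2.0, 2.1, 1.9, 2.0, 2.2, 1.8, 2.0, 2.1, 1.9, 2.0, 9.0, 2.0, 2.1]

if __name__ == "__main__":
    print(static_alerts(LATENCY_MS, threshold=50.0))  # []
    print(zscore_alerts(LATENCY_MS))                  # [10]
```

The 9 ms spike is more than four times the normal latency but far below a 50 ms static threshold. Only the baseline-relative detector flags it.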
Three Pillars of Observability
| Pillar | What It Is | Network Example |
| --- | --- | --- |
| Metrics | Numeric measurements over time (counters, gauges, histograms) | Interface utilization, packet drops, latency, CPU/memory |
| Logs | Timestamped event records | Syslog, config changes, BGP state changes, firewall logs |
| Traces | End-to-end request path tracking | Packet path through the network (traceroute++), flow records |
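Correlation across pillars usually means joining signals on shared labels (such as a device name) and on time. This sketch finds log events from the same device within a minute of a metric spike. Traces would be joined the same way, typically via a trace ID. The record shapes, device names, and messages are hypothetical.

```python
# Sketch: correlating a metric spike with logs from the same device.
# Record shapes, device names, and log messages are hypothetical.
from dataclasses import dataclass


@dataclass
class MetricSample:   # metrics pillar: a numeric value at a point in time
    ts: float         # Unix timestamp (seconds)
    device: str
    name: str
    value: float


@dataclass
class LogRecord:      # logs pillar: a timestamped event
    ts: float
    device: str
    message: str


def correlate(spike: MetricSample, logs: list[LogRecord], window_s: float = 60.0) -> list[LogRecord]:
    """Logs from the same device within +/- window_s seconds of the spike."""
    return [
        log for log in logs
        if log.device == spike.device and abs(log.ts - spike.ts) <= window_s
    ]


spike = MetricSample(ts=1000.0, device="core-sw1", name="packet_drops", value=5000)
logs = [
    LogRecord(970.0, "core-sw1", "BGP neighbor 10.0.0.2 Down"),
    LogRecord(990.0, "edge-r2", "Interface Gi0/1 up"),        # other device
    LogRecord(1500.0, "core-sw1", "Config change by admin"),  # outside window
]

if __name__ == "__main__":
    for log in correlate(spike, logs):
        print(log.message)  # BGP neighbor 10.0.0.2 Down
```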
Streaming Telemetry
| Feature | SNMP | Streaming Telemetry |
| --- | --- | --- |
| Model | Pull (manager polls agent) | Push (device streams to collector) |
| Transport | UDP (unreliable) | gRPC over HTTP/2 and TCP (reliable, efficient) |
| Data Model | MIBs (many vendor-specific, inconsistent) | YANG models (structured; OpenConfig models are vendor-neutral) |
| Encoding | ASN.1/BER (complex) | Protocol Buffers / JSON (efficient, modern) |
| Frequency | Minutes (5-min typical) | Seconds or sub-second (near real-time) |
| Overhead | High at scale (N devices × M OIDs per poll) | Lower (no request per poll; on-change subscriptions send only changed data) |
| Subscription | N/A (poll-based) | Dial-in (collector connects and subscribes) / Dial-out (device initiates) |
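To put rough numbers on the Overhead and Subscription rows, this sketch counts how many values cross the wire in one hour under each model. SAMPLE and ON_CHANGE are subscription modes defined in the gNMI specification. The fleet size, path count, intervals, and change counts are made-up assumptions.

```python
# Sketch: how many values cross the wire in one hour under each model.
# SAMPLE and ON_CHANGE are subscription modes defined in the gNMI spec;
# the device, path, interval, and change counts are made-up numbers.


def snmp_values_polled(devices: int, oids: int, interval_s: int, duration_s: int) -> int:
    """Pull: every OID on every device is requested on every poll."""
    return devices * oids * (duration_s // interval_s)


def sample_updates(devices: int, paths: int, interval_s: int, duration_s: int) -> int:
    """SAMPLE: every subscribed path is pushed once per sample interval."""
    return devices * paths * (duration_s // interval_s)


def on_change_updates(devices: int, paths: int, changes_per_path: int) -> int:
    """ON_CHANGE: one initial value per path, then one update per change."""
    return devices * paths * (1 + changes_per_path)


HOUR = 3600
if __name__ == "__main__":
    # 1,000 devices x 20 interface oper-status paths (hypothetical fleet)
    print(snmp_values_polled(1000, 20, 300, HOUR))  # 240000 (5-min polling)
    print(sample_updates(1000, 20, 10, HOUR))       # 7200000 (10-s SAMPLE)
    print(on_change_updates(1000, 20, 1))           # 40000 (1 change per path)
```

Streaming is not automatically lighter. A high-rate SAMPLE subscription moves far more values than 5-minute polling, though without a request per poll. ON_CHANGE is the cheapest way to follow state such as interface status.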
OpenTelemetry (OTel)
| Feature | Details |
| --- | --- |
| What It Is | Open standard + SDKs + Collector for collecting metrics, logs, and traces |
| OTel Collector | Service (deployable as an agent or gateway) that receives, processes, and exports telemetry data (vendor-neutral pipeline) |
| Receivers | Ingest data from many sources (OTLP, Prometheus, SNMP, syslog, NetFlow) |
| Processors | Transform, filter, batch, and enrich data before export |
| Exporters | Send data to multiple backends (Prometheus, Jaeger via OTLP, Elasticsearch, Datadog) |
| Network Use | Collect SNMP + streaming telemetry + syslog → unified pipeline → Prometheus/Grafana |
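The receivers → processors → exporters flow can be modeled in a few lines of plain Python. This is a conceptual sketch of the data flow only. It is not the OTel Collector API, which is written in Go and configured in YAML. Every receiver and processor name below is hypothetical.

```python
# Conceptual sketch of an OpenTelemetry Collector-style pipeline:
# receivers -> processors -> exporters. This models the data flow only;
# it is NOT the OTel Collector API. All names here are hypothetical.
from collections.abc import Callable, Iterable

Record = dict  # one data point: {"name": ..., "value": ..., "attrs": {...}}


def snmp_receiver() -> Iterable[Record]:
    """Hypothetical receiver yielding values as if polled via SNMP."""
    yield {"name": "if_in_octets", "value": 1200, "attrs": {"device": "core-sw1"}}
    yield {"name": "if_in_octets", "value": 900, "attrs": {"device": "lab-sw9"}}


def syslog_receiver() -> Iterable[Record]:
    """Hypothetical receiver yielding a parsed syslog event."""
    yield {"name": "log", "value": "BGP neighbor Down", "attrs": {"device": "core-sw1"}}


def drop_lab_devices(records: list[Record]) -> list[Record]:
    """Filter processor: discard telemetry from lab devices."""
    return [r for r in records if not r["attrs"]["device"].startswith("lab-")]


def add_site(records: list[Record]) -> list[Record]:
    """Enrich processor: attach a (hypothetical) site attribute."""
    return [{**r, "attrs": {**r["attrs"], "site": "dc1"}} for r in records]


def run_pipeline(
    receivers: list[Callable[[], Iterable[Record]]],
    processors: list[Callable[[list[Record]], list[Record]]],
    exporters: list[Callable[[list[Record]], None]],
    batch_size: int = 100,
) -> None:
    records = [r for receive in receivers for r in receive()]
    for process in processors:                    # run in declared order
        records = process(records)
    for i in range(0, len(records), batch_size):  # batch before export
        for export in exporters:
            export(records[i:i + batch_size])


if __name__ == "__main__":
    run_pipeline([snmp_receiver, syslog_receiver], [drop_lab_devices, add_site], [print])
```

For brevity this sketch mixes metrics and logs in one list. The real Collector declares a separate pipeline per signal type (traces, metrics, logs) under `service.pipelines`.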
Prometheus
| Feature | Details |
| --- | --- |
| What It Is | Open-source time-series database + monitoring system (CNCF graduated) |
| Data Model | Metric name + labels (key-value pairs) + timestamp + value |
| PromQL | Query language for aggregating, filtering, and calculating over metrics |
| Pull Model | Prometheus scrapes HTTP /metrics endpoints |
| Pushgateway | Accepts metrics pushed by short-lived batch jobs, which Prometheus then scrapes from the gateway (not a general push path for network devices) |
| SNMP Exporter | Translates SNMP data → Prometheus metrics (bridges legacy SNMP devices) |
| Alertmanager | Handles alerts fired by Prometheus alerting rules → routes them to Slack, PagerDuty, email |
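The data model and the usual counter query can be sketched together. This sketch renders one sample in the Prometheus text exposition format. It then computes a simplified per-second rate that handles counter resets. Real PromQL `rate()` also extrapolates to the edges of the range window, so its results can differ slightly. The metric and label values are illustrative.

```python
# Sketch: the Prometheus text exposition format and a simplified rate().
# Real PromQL rate() also extrapolates to the edges of the range window;
# this version only handles counter resets and divides the total increase
# by the time between the first and last sample. Names are illustrative.


def exposition_line(name: str, labels: dict[str, str], value: float) -> str:
    """Render one sample as: name{label="value",...} value"""
    def esc(v: str) -> str:
        # Label values escape backslash, double quote, and newline.
        return v.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    label_str = ",".join(f'{k}="{esc(v)}"' for k, v in sorted(labels.items()))
    return f"{name}{{{label_str}}} {value}"


def simple_rate(samples: list[tuple[float, float]]) -> float:
    """Per-second increase of a counter from (timestamp, value) samples.
    A drop in value is treated as a counter reset to zero."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # After a reset the counter restarted from 0, so all of `cur` is new.
        increase += cur - prev if cur >= prev else cur
    return increase / (samples[-1][0] - samples[0][0])


if __name__ == "__main__":
    print(exposition_line("ifHCInOctets", {"ifName": "Gi0/1", "device": "core-sw1"}, 12345))
    # ifHCInOctets{device="core-sw1",ifName="Gi0/1"} 12345
    print(simple_rate([(0, 100), (20, 700), (40, 300), (60, 900)]))  # 25.0
```

In the rate example the counter drops from 700 to 300. That drop is counted as a reset, so the total increase is 600 + 300 + 600 = 1500 over 60 seconds.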
Grafana
| Feature | Details |
| --- | --- |
| What It Is | Open-source visualization and dashboarding platform |
| Data Sources | Prometheus, InfluxDB, Elasticsearch, Loki, MySQL, CloudWatch, etc. |
| Dashboards | Customizable dashboards built from panels (graphs, tables, heatmaps, gauges) |
| Alerting | Alert rules in Grafana → notify via Slack, Teams, PagerDuty, webhook |
| Loki | Log aggregation system (like Prometheus, but for logs) |
| Tempo | Distributed tracing backend (trace storage + query) |
| Network Plugins | Network topology visualization (e.g. the Node Graph panel), flow analysis dashboards |
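Dashboards can also be created programmatically. This sketch builds the JSON body that Grafana's dashboard HTTP API (`POST /api/dashboards/db`) accepts, with one time-series panel. The panel title, layout, and PromQL query are assumptions, including the `device` label. The body is only built and printed, not sent.

```python
# Sketch: the JSON body for Grafana's dashboard HTTP API
# (POST /api/dashboards/db) with one time-series panel. The panel title,
# layout, and PromQL query (including the `device` label) are assumptions;
# the request body is built and printed, not sent.
import json


def throughput_dashboard(title: str, promql: str) -> dict:
    panel = {
        "type": "timeseries",                          # time-series panel
        "title": "Interface throughput (bits/s)",
        "gridPos": {"h": 8, "w": 24, "x": 0, "y": 0},  # full-width row
        "targets": [{"refId": "A", "expr": promql}],   # PromQL query
    }
    return {
        # id/uid of None ask Grafana to create a new dashboard
        "dashboard": {"id": None, "uid": None, "title": title, "panels": [panel]},
        "overwrite": False,
    }


if __name__ == "__main__":
    body = throughput_dashboard(
        "Core network",
        'rate(ifHCInOctets{device="core-sw1"}[5m]) * 8',  # octets/s -> bits/s
    )
    print(json.dumps(body, indent=2))
```

To apply the body, you would send it with an API token in an `Authorization: Bearer` header. The panel uses the default data source because none is specified.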
eBPF for Networking
| Feature | Details |
| --- | --- |
| What It Is | Extended Berkeley Packet Filter: runs sandboxed programs inside the Linux kernel (checked by the in-kernel verifier, JIT-compiled for speed) |
| Network Visibility | Captures and analyzes packets at kernel level; can summarize in the kernel instead of copying every captured packet to user space as tcpdump does (low, but not zero, overhead) |
| Flow Tracking | Tracks TCP/UDP flows, connection states, retransmissions, per-flow latency |
| DNS Monitoring | Captures DNS queries/responses in the kernel → detects anomalies |
| Cilium Hubble | eBPF-based network observability for Kubernetes (flow visualization + service map) |
| Pixie | eBPF-based auto-instrumentation (captures HTTP, SQL, gRPC traffic without code changes) |
| Advantage | Low overhead, kernel-level accuracy, can see every packet/flow without sampling |
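Flow tracking boils down to a table keyed by the 5-tuple. This user-space Python simulation mimics the per-flow bookkeeping that an eBPF program typically keeps in a kernel hash map. It does not load or run eBPF. Treating a repeated TCP sequence number as a retransmission is a simplifying assumption.

```python
# User-space simulation of the per-flow bookkeeping an eBPF program
# typically keeps in a kernel hash map keyed by the 5-tuple. It does not
# load or run eBPF; packets are given as plain tuples. A repeated TCP
# sequence number is used as a rough stand-in for a retransmission.
from dataclasses import dataclass, field
from typing import NamedTuple, Optional


class FlowKey(NamedTuple):
    src: str
    dst: str
    sport: int
    dport: int
    proto: str


@dataclass
class FlowStats:
    packets: int = 0
    byte_count: int = 0
    retransmits: int = 0
    seen_seqs: set[int] = field(default_factory=set)


def track(packets: list[tuple[FlowKey, int, Optional[int]]]) -> dict[FlowKey, FlowStats]:
    """packets: (flow key, length in bytes, TCP sequence number or None)."""
    flows: dict[FlowKey, FlowStats] = {}
    for key, length, seq in packets:
        stats = flows.setdefault(key, FlowStats())
        stats.packets += 1
        stats.byte_count += length
        if seq is not None:
            if seq in stats.seen_seqs:
                stats.retransmits += 1
            stats.seen_seqs.add(seq)
    return flows


web = FlowKey("10.0.0.5", "10.0.1.9", 51515, 443, "tcp")
dns = FlowKey("10.0.0.5", "10.0.0.53", 40000, 53, "udp")

if __name__ == "__main__":
    flows = track([(web, 1500, 1000), (web, 1500, 2500), (web, 1500, 2500), (dns, 80, None)])
    for key, s in flows.items():
        print(key.proto, key.dport, s.packets, s.byte_count, s.retransmits)
```

A real eBPF program is limited by bounded map sizes and by verifier constraints. It would keep compact per-flow state rather than an unbounded set of sequence numbers.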
Observability Stack
| Component | Tool | Role |
| --- | --- | --- |
| Collection | OTel Collector + SNMP Exporter + syslog | Gather metrics, logs, flows from network devices |
| Metrics Storage | Prometheus / Mimir / InfluxDB | Store time-series metrics |
| Log Storage | Loki / Elasticsearch | Store and index logs |
| Trace Storage | Tempo / Jaeger | Store distributed traces |
| Visualization | Grafana | Dashboards, alerts, correlation |
| Alerting | Alertmanager / Grafana Alerting | Route alerts to teams |
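Each storage layer in this stack is queried over HTTP. For example, Grafana's Prometheus data source talks to Prometheus through its HTTP API. This sketch builds an instant-query URL for `GET /api/v1/query` and parses the documented response shape. The server address is an assumption, and the sample response is illustrative, not real data.

```python
# Sketch: querying Prometheus's HTTP API (GET /api/v1/query) and parsing
# an instant-vector result. The server address is an assumption, and the
# sample response mirrors the documented JSON shape; it is not real data.
import json
import urllib.request
from urllib.parse import urlencode

PROM_URL = "http://localhost:9090"  # assumed; 9090 is Prometheus's default port


def query_url(promql: str) -> str:
    return f"{PROM_URL}/api/v1/query?{urlencode({'query': promql})}"


def fetch(promql: str) -> str:
    """Perform the HTTP request (needs a reachable Prometheus server)."""
    with urllib.request.urlopen(query_url(promql), timeout=5) as resp:
        return resp.read().decode()


def parse_vector(body: str) -> dict[str, float]:
    """Map each series' `instance` label to its sample value."""
    payload = json.loads(body)
    if payload["status"] != "success":
        raise RuntimeError(payload.get("error", "query failed"))
    # Each result is {"metric": {labels}, "value": [unix_ts, "value as string"]}
    return {
        r["metric"].get("instance", "?"): float(r["value"][1])
        for r in payload["data"]["result"]
    }


SAMPLE = ('{"status":"success","data":{"resultType":"vector","result":'
          '[{"metric":{"instance":"core-sw1"},"value":[1700000000.0,"0.73"]}]}}')

if __name__ == "__main__":
    print(query_url("up"))       # http://localhost:9090/api/v1/query?query=up
    print(parse_vector(SAMPLE))  # {'core-sw1': 0.73}
```

Against a live server, `parse_vector(fetch("up"))` would return the current value of `up` for each scraped instance.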
Closing Thoughts: Network Observability = See Everything, Understand Everything
Three Pillars of Network Observability: metrics (numbers) + logs (events) + traces (paths) → correlated
Streaming Telemetry: push-based, gRPC, YANG models, sub-second granularity (replaces SNMP polling)
OpenTelemetry: vendor-neutral collection pipeline (receivers → processors → exporters)
Prometheus: time-series DB + PromQL + Alertmanager (metrics storage + querying)
Grafana: visualization + dashboards + Loki (logs) + Tempo (traces)
eBPF: kernel-level network visibility (flow tracking, DNS, latency) with low overhead
Stack: OTel Collector → Prometheus + Loki + Tempo → Grafana (unified observability)