Observability for Kubernetes: How to Install a Monitoring Stack on K8s (2026)

April 11, 2026

Observability Challenges in Kubernetes

Kubernetes has fundamentally changed how applications are deployed, but it also creates new monitoring challenges. Several characteristics of K8s make traditional monitoring tools insufficient:

Ephemeral Pods: Pods are created and destroyed constantly, and a Pod gets a new IP whenever it is recreated, unlike a VM with a fixed IP. IP-based monitoring therefore does not work.
Dynamic Scaling: The number of Pods changes with load (HPA). Three Pods may grow to 50 at peak, so dynamic service discovery is required.
Multi-layer Complexity: There are many layers to monitor (Node, kubelet, container runtime, Pod, container, application), and a problem can originate in any of them.
Distributed Microservices: A single request may pass through 10+ services. When something goes wrong, you need to trace which service is slow.
Log Aggregation: Each Pod has its own logs. With 500 Pods, the logs must be collected into one place.

This is why observability, not just monitoring, matters. Observability rests on three pillars:

Pillar	What it is	Tools	Questions it answers
Metrics	Measurable numeric values (CPU, memory, request rate, error rate)	Prometheus + Grafana	"What" is happening? Is the system healthy?
Logs	Text-based records of events	Loki + Promtail	"Why" did it happen? What is the error message?
Traces	The path of a request across multiple services	Tempo / Jaeger	"Where" is it slow? Which service is the problem?

Prometheus Operator — Metrics Collection

ServiceMonitor and PodMonitor

Prometheus Operator makes setting up Prometheus on K8s much easier. Instead of editing the Prometheus config file by hand, you create Custom Resources:

# ServiceMonitor: scrape a Service that exposes a metrics endpoint
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app-monitor
  labels:
    release: kube-prometheus-stack  # must match the Prometheus serviceMonitorSelector
spec:
  selector:
    matchLabels:
      app: my-app  # matches the labels on the Service
  endpoints:
    - port: metrics   # port name defined in the Service
      interval: 15s   # scrape every 15 seconds
      path: /metrics   # endpoint path
  namespaceSelector:
    matchNames:
      - production

---
# PodMonitor: scrape Pods directly (no Service required)
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: my-batch-job-monitor
  labels:
    release: kube-prometheus-stack  # must match the Prometheus podMonitorSelector
spec:
  selector:
    matchLabels:
      app: batch-processor
  podMetricsEndpoints:
    - port: metrics
      interval: 30s
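
A common reason a ServiceMonitor scrapes nothing is that its matchLabels do not line up with the labels on the Service. The matching rule itself is simple, and the sketch below models it in plain Python. The service list is hypothetical, and the function only covers matchLabels equality, not matchExpressions.

```python
def selector_matches(match_labels: dict, labels: dict) -> bool:
    """True if every key/value pair in match_labels is present in labels."""
    return all(labels.get(key) == value for key, value in match_labels.items())


# Hypothetical Services, reduced to a name and their labels
services = [
    {"name": "my-app", "labels": {"app": "my-app", "tier": "web"}},
    {"name": "other", "labels": {"app": "other"}},
]

# Same selector as the ServiceMonitor above
selected = [s["name"] for s in services if selector_matches({"app": "my-app"}, s["labels"])]
print(selected)
```

Extra labels on the Service (such as tier here) do not prevent a match; only the keys listed in the selector are compared.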

kube-state-metrics

kube-state-metrics is a service that generates metrics from Kubernetes API objects: Pod status, Deployment replicas, Node conditions, PVC status, and so on. Node Exporter cannot provide this data because it is Kubernetes-level state rather than OS-level state:

# Example metrics exposed by kube-state-metrics:
# kube_pod_status_phase{pod="web-xyz", phase="Running"}  1
# kube_pod_container_status_restarts_total{pod="web-xyz"} 5
# kube_deployment_status_replicas_available{deployment="web"} 3
# kube_deployment_spec_replicas{deployment="web"} 3
# kube_node_status_condition{node="node1", condition="Ready", status="true"} 1
# kube_persistentvolumeclaim_status_phase{pvc="data", phase="Bound"} 1

# Standalone install (not needed with kube-prometheus-stack, which already bundles it)
helm install kube-state-metrics prometheus-community/kube-state-metrics -n monitoring
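
To illustrate how these state metrics are typically combined, here is a minimal sketch that parses a few exposition-format lines and reports Deployments with fewer available replicas than desired. The sample values (including the "api" Deployment) are made up, and the small regex parser is only meant for this simple label format, not as a full parser for the Prometheus text format.

```python
import re

# Hypothetical scrape output in the same shape as the metrics listed above
SAMPLE = """\
kube_deployment_spec_replicas{deployment="web"} 3
kube_deployment_status_replicas_available{deployment="web"} 3
kube_deployment_spec_replicas{deployment="api"} 4
kube_deployment_status_replicas_available{deployment="api"} 2
"""

LINE = re.compile(r"^(\w+)\{(.*)\}\s+(\S+)$")


def parse(text):
    """Parse 'name{labels} value' lines into (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        match = LINE.match(line.strip())
        if not match:
            continue
        labels = dict(re.findall(r'(\w+)="([^"]*)"', match.group(2)))
        samples.append((match.group(1), labels, float(match.group(3))))
    return samples


def degraded_deployments(samples):
    """Return Deployments whose available replicas are below the desired count."""
    spec, available = {}, {}
    for name, labels, value in samples:
        if name == "kube_deployment_spec_replicas":
            spec[labels["deployment"]] = value
        elif name == "kube_deployment_status_replicas_available":
            available[labels["deployment"]] = value
    return sorted(d for d in spec if available.get(d, 0) < spec[d])


print(degraded_deployments(parse(SAMPLE)))
```

In a real cluster the same comparison is usually written directly in PromQL; the Python version just makes the logic explicit.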

Node Exporter DaemonSet

Node Exporter runs as a DaemonSet (one Pod on every Node) and collects OS-level metrics: CPU usage, memory usage, disk I/O, network I/O, and filesystem usage:

# Key Node Exporter metrics:
# node_cpu_seconds_total: cumulative CPU seconds per core and mode
# node_memory_MemAvailable_bytes: available memory
# node_filesystem_avail_bytes: available disk space
# node_disk_io_time_seconds_total: time spent on disk I/O
# node_network_receive_bytes_total: network bytes received
# node_network_transmit_bytes_total: network bytes transmitted

# Commonly used PromQL queries:
# CPU usage per node (%)
100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)

# Memory usage per node (%)
(1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100

# Disk usage per node (%)
(1 - node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"}) * 100
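
The CPU query above can look opaque at first, so here is the same arithmetic as a small Python sketch. It takes two readings of the idle counter per core, turns them into a per-second idle rate, averages across cores, and subtracts from 100. The counter values are invented, and real rate() also handles counter resets and extrapolation, which this sketch ignores.

```python
def cpu_usage_percent(idle_before, idle_after, window_seconds):
    """Approximate 100 - avg(rate(idle)) * 100 from two samples per core."""
    idle_rates = [
        (after - before) / window_seconds
        for before, after in zip(idle_before, idle_after)
    ]
    avg_idle = sum(idle_rates) / len(idle_rates)
    return 100 - avg_idle * 100


# Hypothetical node with 2 cores, sampled 300 seconds (5m) apart.
# Core 0 was idle for 240 of the 300 seconds, core 1 for 180 of them.
usage = cpu_usage_percent([1000.0, 2000.0], [1240.0, 2180.0], 300)
print(round(usage, 1))
```

Here the cores were idle 80% and 60% of the window, so the node averages 70% idle, which is roughly 30% CPU usage.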

kube-prometheus-stack: Installing the Full Monitoring Stack

kube-prometheus-stack is a Helm chart that bundles everything in one place: Prometheus Operator, Prometheus, Grafana, Alertmanager, Node Exporter, and kube-state-metrics. A single install gives you all of them:

# Add the Helm repo
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

# Create the namespace
kubectl create namespace monitoring

# Install kube-prometheus-stack
helm install kube-prometheus-stack prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --set grafana.adminPassword="YourSecurePassword" \
  --set prometheus.prometheusSpec.retention=30d \
  --set prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.resources.requests.storage=50Gi \
  --set prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.storageClassName=standard \
  --set alertmanager.alertmanagerSpec.storage.volumeClaimTemplate.spec.resources.requests.storage=10Gi \
  --set grafana.persistence.enabled=true \
  --set grafana.persistence.size=5Gi

# Verify
kubectl get pods -n monitoring
# NAME                                                     READY   STATUS
# alertmanager-kube-prometheus-stack-alertmanager-0         2/2     Running
# kube-prometheus-stack-grafana-xxx                         3/3     Running
# kube-prometheus-stack-kube-state-metrics-xxx              1/1     Running
# kube-prometheus-stack-operator-xxx                        1/1     Running
# kube-prometheus-stack-prometheus-node-exporter-xxx        1/1     Running  (DaemonSet)
# prometheus-kube-prometheus-stack-prometheus-0             2/2     Running

# Open the Grafana UI
kubectl port-forward -n monitoring svc/kube-prometheus-stack-grafana 3000:80
# Browse to http://localhost:3000 and log in as admin / YourSecurePassword
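
Once the stack is running, a quick way to confirm that Prometheus is actually scraping targets is to call its HTTP API. The sketch below uses only the standard library and the /api/v1/query endpoint. It assumes you have port-forwarded Prometheus to localhost:9090; the Service name in the comment follows from the release name used above but should be checked with kubectl get svc -n monitoring.

```python
import json
import urllib.parse
import urllib.request


def build_query_url(base_url: str, promql: str) -> str:
    """Build an instant-query URL for the Prometheus HTTP API."""
    params = urllib.parse.urlencode({"query": promql})
    return f"{base_url.rstrip('/')}/api/v1/query?{params}"


def instant_query(base_url: str, promql: str, timeout: float = 5.0):
    """Run an instant query and return the list of result series."""
    with urllib.request.urlopen(build_query_url(base_url, promql), timeout=timeout) as resp:
        body = json.load(resp)
    if body.get("status") != "success":
        raise RuntimeError(f"query failed: {body}")
    return body["data"]["result"]


# Example usage (assumed port-forward, run manually):
#   kubectl port-forward -n monitoring svc/kube-prometheus-stack-prometheus 9090:9090
#   for series in instant_query("http://localhost:9090", "up"):
#       print(series["metric"].get("job"), series["value"][1])
print(build_query_url("http://localhost:9090", "up"))
```

A value of 1 for the up metric means the target was scraped successfully; 0 means the last scrape failed.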

Pre-built Grafana Dashboards for K8s

kube-prometheus-stack ships with dozens of dashboards that are ready to use:

Dashboard	What it shows	When to use it
Kubernetes / Compute Resources / Cluster	Cluster-wide CPU/memory overview	Capacity planning
Kubernetes / Compute Resources / Namespace	CPU/memory per namespace	Seeing how much each team consumes
Kubernetes / Compute Resources / Pod	CPU/memory per Pod	Debugging the performance of a specific app
Kubernetes / Networking / Cluster	Cluster-wide network traffic	Checking bandwidth usage
Node Exporter / Nodes	CPU, memory, disk, and network per Node	Checking infrastructure health
CoreDNS	DNS query rate, errors, latency	Debugging DNS issues

Loki for Pod Logs

Grafana Loki is a log aggregation system designed to fit Kubernetes well. It plays a role similar to Elasticsearch but is much lighter, because it indexes only labels rather than the full text of each log line:

# Install Loki + Promtail (add the Grafana chart repo first)
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update
helm install loki grafana/loki-stack \
  --namespace monitoring \
  --set promtail.enabled=true \
  --set loki.persistence.enabled=true \
  --set loki.persistence.size=50Gi

# A Promtail DaemonSet is created on every Node
# Promtail reads container logs from /var/log/pods/ and ships them to Loki
# Labels Promtail attaches automatically:
# - namespace, pod, container, node, stream (stdout/stderr)

# Verify
kubectl get pods -n monitoring -l app=promtail
kubectl get pods -n monitoring -l app=loki

LogQL: Querying Logs in Grafana

# Logs from the production namespace
{namespace="production"}

# Logs from specific Pods
{namespace="production", pod=~"web-app-.*"}

# Search for errors
{namespace="production"} |= "error"
{namespace="production"} |~ "ERROR|Exception"

# Search with a regex (backticks avoid having to escape the backslash)
{namespace="production"} |~ `status=(4|5)\d{2}`

# Filter + parse + format
{namespace="production", container="nginx"}
  | json
  | status >= 500
  | line_format "{{.method}} {{.path}} {{.status}}"

# Count errors per minute (metric derived from logs)
sum(count_over_time({namespace="production"} |= "error" [1m]))

# Top 10 error messages
topk(10, sum by (message) (count_over_time({namespace="production"} |= "error" | json | keep message [1h])))
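
When queries like these are generated from scripts or runbooks, it helps to build the stream selector from a dictionary rather than by string concatenation. The following is a minimal sketch with hypothetical helper names (quote, build_logql). It only covers equality matchers and a single "contains" line filter.

```python
def quote(value: str) -> str:
    """Double-quote a LogQL string, escaping backslashes and quotes."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def build_logql(labels: dict, contains: str = "") -> str:
    """Build '{k="v", ...}' with an optional '|= "text"' line filter."""
    selector = ", ".join(f"{key}={quote(val)}" for key, val in sorted(labels.items()))
    query = "{" + selector + "}"
    if contains:
        query += " |= " + quote(contains)
    return query


query = build_logql({"namespace": "production", "container": "nginx"}, "error")
print(query)
```

The resulting string can be pasted into Grafana Explore or sent to Loki's query API as the query parameter.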

Tempo for Distributed Traces

Grafana Tempo is a distributed tracing backend that stores traces from applications instrumented with OpenTelemetry, Jaeger, or Zipkin:

# Install Tempo
helm install tempo grafana/tempo \
  --namespace monitoring \
  --set tempo.storage.trace.backend=local \
  --set tempo.storage.trace.local.path=/var/tempo/traces \
  --set persistence.enabled=true \
  --set persistence.size=20Gi

# Configure Grafana to connect to Tempo
# Data Sources -> Add data source -> Tempo
# URL: http://tempo.monitoring.svc.cluster.local:3100

# The application must send its traces to Tempo
# Example: a Python app using OpenTelemetry
# pip install opentelemetry-api opentelemetry-sdk opentelemetry-exporter-otlp

# OTEL_EXPORTER_OTLP_ENDPOINT=http://tempo.monitoring:4317
# OTEL_SERVICE_NAME=my-app

Connecting Metrics + Logs + Traces (Correlations)

Grafana can link the three pillars together, which speeds up debugging:

# In Grafana:
# 1. A metric shows the error rate rising → click "Explore Logs"
# 2. Grafana queries Loki with the same labels (namespace, pod)
# 3. A log line contains a TraceID → click the TraceID
# 4. Grafana queries Tempo and shows the trace for that request
# 5. You can see which service is slow or where the error occurred

# This requires a Derived Field on the Loki data source:
# Name: TraceID
# Regex: traceID=(\w+)
# Internal link: Tempo
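
The derived field works by running the regex against each log line and using the first capture group as the trace ID. Before saving the data source, it can be worth testing the pattern against a sample line. The sketch below does that in Python with the same regex; the log line is a made-up example in logfmt style.

```python
import re

# Same pattern as the Derived Field regex above
TRACE_ID = re.compile(r"traceID=(\w+)")


def extract_trace_id(line: str):
    """Return the captured trace ID, or None if the line has none."""
    match = TRACE_ID.search(line)
    return match.group(1) if match else None


# Hypothetical log line
line = 'level=error msg="payment failed" traceID=4bf92f3577b34da6a3ce929d0e0e4736 status=502'
print(extract_trace_id(line))
```

If your application logs the ID under a different key (for example trace_id in JSON logs), the regex must be adjusted to match, otherwise Grafana will not render the link.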

Custom Metrics — Application Metrics Exposure

Beyond infrastructure metrics, applications should expose custom metrics as well:

# Python (Flask + prometheus_client)
import time

from flask import Flask, Response, g, request
from prometheus_client import CONTENT_TYPE_LATEST, Counter, Histogram, generate_latest

app = Flask(__name__)

# Custom metrics
REQUEST_COUNT = Counter('app_requests_total', 'Total requests', ['method', 'endpoint', 'status'])
REQUEST_LATENCY = Histogram('app_request_duration_seconds', 'Request latency', ['endpoint'])

@app.before_request
def before_request():
    # Store the start time on Flask's per-request context object
    g.start_time = time.time()

@app.after_request
def after_request(response):
    latency = time.time() - g.start_time
    REQUEST_COUNT.labels(request.method, request.endpoint, response.status_code).inc()
    REQUEST_LATENCY.labels(request.endpoint).observe(latency)
    return response

@app.route('/metrics')
def metrics():
    # Serve the default registry in the Prometheus text exposition format
    return Response(generate_latest(), content_type=CONTENT_TYPE_LATEST)

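---
// Go (Gin + client_golang)
package main
import (
	"github.com/gin-gonic/gin"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)
func main() {
	r := gin.Default()
	r.GET("/metrics", gin.WrapH(promhttp.Handler())) // serve the default registry
	r.Run(":8080")                                   // listen port is an assumption
}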

---
// Node.js (Express + prom-client)
const express = require('express');
const client = require('prom-client');

const app = express();
client.collectDefaultMetrics();  // process and runtime metrics

app.get('/metrics', async (req, res) => {
    res.set('Content-Type', client.register.contentType);
    res.end(await client.register.metrics());
});

app.listen(3000);  // listen port is an assumption

Alerting for K8s

Monitoring without alerting is like a CCTV camera that nobody watches. Set up alerts so you are notified when something goes wrong:

# PrometheusRule: alert rules for K8s
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: kubernetes-alerts
  namespace: monitoring
  labels:
    release: kube-prometheus-stack
spec:
  groups:
    - name: kubernetes-pod-alerts
      rules:
        # Pod is restarting abnormally often
        - alert: PodRestartingTooMuch
          expr: increase(kube_pod_container_status_restarts_total[1h]) > 5
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: "Pod {{ $labels.pod }} restarted {{ $value }} times in 1 hour"
            description: "Pod {{ $labels.namespace }}/{{ $labels.pod }} is restarting frequently"

        # Pod memory usage above 90% of its limit (containers without a limit are skipped)
        - alert: PodHighMemoryUsage
          expr: |
            container_memory_working_set_bytes{container!=""} /
            (container_spec_memory_limit_bytes{container!=""} > 0) > 0.9
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "Pod {{ $labels.pod }} memory usage > 90%"

        # PVC is more than 85% full
        - alert: PVCAlmostFull
          expr: |
            kubelet_volume_stats_used_bytes /
            kubelet_volume_stats_capacity_bytes > 0.85
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "PVC {{ $labels.persistentvolumeclaim }} is {{ $value | humanizePercentage }} full"

    - name: kubernetes-node-alerts
      rules:
        # Node NotReady
        - alert: NodeNotReady
          expr: kube_node_status_condition{condition="Ready", status="true"} == 0
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "Node {{ $labels.node }} is NotReady"

        # Node CPU usage is high
        - alert: NodeHighCPU
          expr: |
            100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 85
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: "Node {{ $labels.instance }} CPU usage > 85%"

        # Node disk space is running low
        - alert: NodeDiskAlmostFull
          expr: |
            (1 - node_filesystem_avail_bytes{mountpoint="/"} /
            node_filesystem_size_bytes{mountpoint="/"}) * 100 > 85
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "Node {{ $labels.instance }} disk usage > 85%"
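
The for: field in each rule is what keeps short spikes from paging anyone: the expression must stay true for the whole duration before the alert moves from pending to firing. The sketch below simulates that state machine over a series of rule evaluations. It is a simplified model (fixed evaluation interval, one alert instance), not Prometheus's actual implementation.

```python
def alert_states(condition_by_eval, interval_s, for_s):
    """Return 'inactive', 'pending', or 'firing' for each rule evaluation."""
    states = []
    active_since = None
    for i, condition_true in enumerate(condition_by_eval):
        now = i * interval_s
        if not condition_true:
            active_since = None  # any false evaluation resets the timer
            states.append("inactive")
            continue
        if active_since is None:
            active_since = now
        states.append("firing" if now - active_since >= for_s else "pending")
    return states


# Evaluate every 60s with for: 2m (120s); the condition drops out once at t=120s
timeline = alert_states([True, True, False, True, True, True], 60, 120)
print(timeline)
```

Note how the single false evaluation resets the timer, so the alert only fires after the condition has held for a full two minutes again.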

Alertmanager Configuration

# values.yaml for kube-prometheus-stack
alertmanager:
  config:
    global:
      resolve_timeout: 5m
    route:
      group_by: ['alertname', 'namespace']
      group_wait: 30s
      group_interval: 5m
      repeat_interval: 4h
      receiver: 'slack-notifications'
      routes:
        - matchers:
            - severity="critical"
          receiver: 'slack-critical'
          repeat_interval: 1h
    receivers:
      - name: 'slack-notifications'
        slack_configs:
          - api_url: 'https://hooks.slack.com/services/YOUR/SLACK/WEBHOOK'
            channel: '#alerts'
            title: '{{ .CommonLabels.alertname }}'
            text: '{{ range .Alerts }}{{ .Annotations.summary }}{{ end }}'
      - name: 'slack-critical'
        slack_configs:
          - api_url: 'https://hooks.slack.com/services/YOUR/SLACK/WEBHOOK'
            channel: '#alerts-critical'
            title: 'CRITICAL: {{ .CommonLabels.alertname }}'
            text: '{{ range .Alerts }}{{ .Annotations.description }}{{ end }}'
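
To see how this routing tree picks a receiver, the following sketch walks a simplified copy of the route above: the first child route whose labels all match wins, and anything else falls through to the default receiver. It models routes as plain dictionaries with equality-only matching under a hypothetical "match" key, and it ignores features such as continue and regex matchers.

```python
def pick_receiver(labels: dict, route: dict, inherited=None):
    """Return the receiver for an alert by walking the routing tree depth-first."""
    receiver = route.get("receiver", inherited)
    for child in route.get("routes", []):
        wanted = child.get("match", {})
        if all(labels.get(key) == value for key, value in wanted.items()):
            return pick_receiver(labels, child, receiver)
    return receiver


# Simplified copy of the route section above
ROUTE = {
    "receiver": "slack-notifications",
    "routes": [
        {"match": {"severity": "critical"}, "receiver": "slack-critical"},
    ],
}

print(pick_receiver({"alertname": "NodeNotReady", "severity": "critical"}, ROUTE))
print(pick_receiver({"alertname": "NodeHighCPU", "severity": "warning"}, ROUTE))
```

With the rules defined earlier, NodeNotReady lands in the critical channel while NodeHighCPU goes to the general alerts channel.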

Grafana OnCall Integration

Grafana OnCall is an incident management tool that integrates with Alertmanager and handles on-call rotation, escalation, and notifications across multiple channels:

# Install Grafana OnCall
helm install grafana-oncall grafana/oncall \
  --namespace monitoring \
  --set base_url=https://oncall.company.com \
  --set grafana.url=http://kube-prometheus-stack-grafana.monitoring

# Features:
# - On-call schedules (who is on duty and when)
# - Escalation policies (if the first responder does not answer, notify the next)
# - Multiple notification channels (Slack, Telegram, SMS, phone call)
# - Alert grouping (related alerts are merged into a single incident)
# - Maintenance windows (silence alerts during maintenance)

The Cost of Observability in K8s

An observability stack consumes a non-trivial amount of resources, so plan capacity for it:

Component	CPU Request	Memory Request	Storage	Notes
Prometheus	500m – 2 cores	2-8 GB	50-500 GB	Depends on the number of time series
Grafana	100m – 500m	256 MB – 1 GB	5-10 GB	Dashboards + data source configs
Alertmanager	50m – 200m	128-512 MB	5 GB	Very lightweight
Node Exporter (per node)	50m	64 MB	–	DaemonSet on every Node
kube-state-metrics	100m – 500m	128-512 MB	–	Depends on the number of objects
Loki	500m – 2 cores	1-4 GB	50-200 GB	Depends on log volume
Promtail (per node)	50m – 200m	128-512 MB	–	DaemonSet on every Node
Tempo	500m – 1 core	1-2 GB	20-100 GB	Depends on trace volume

Example: a 10-node cluster needs roughly 4-6 CPU cores + 10-20 GB RAM + 200-800 GB of storage for the observability stack, which works out to about 10-15% of cluster capacity.
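
To sanity-check an estimate like this for your own cluster size, you can sum the per-component requests from the table, multiplying the DaemonSet rows by the node count. The sketch below does that with the table's lower and upper bounds, in millicores and MB (treating 1 GB as 1000 MB for simplicity). The figures are the table's ranges, not measurements.

```python
# name: (cpu_low_m, cpu_high_m, mem_low_mb, mem_high_mb, runs_on_every_node)
COMPONENTS = {
    "prometheus": (500, 2000, 2000, 8000, False),
    "grafana": (100, 500, 256, 1000, False),
    "alertmanager": (50, 200, 128, 512, False),
    "node-exporter": (50, 50, 64, 64, True),
    "kube-state-metrics": (100, 500, 128, 512, False),
    "loki": (500, 2000, 1000, 4000, False),
    "promtail": (50, 200, 128, 512, True),
    "tempo": (500, 1000, 1000, 2000, False),
}


def totals(nodes: int):
    """Return (cpu_low_m, cpu_high_m, mem_low_mb, mem_high_mb) for the whole stack."""
    sums = [0, 0, 0, 0]
    for *values, per_node in COMPONENTS.values():
        factor = nodes if per_node else 1
        for i, value in enumerate(values):
            sums[i] += value * factor
    return tuple(sums)


cpu_low, cpu_high, mem_low, mem_high = totals(nodes=10)
print(f"CPU: {cpu_low / 1000:.2f} to {cpu_high / 1000:.2f} cores")
print(f"Memory: {mem_low / 1000:.1f} to {mem_high / 1000:.1f} GB")
```

For 10 nodes the bounds come out to about 2.75 to 8.7 cores and 6.4 to 21.8 GB, so the 4-6 cores and 10-20 GB quoted above sit in the middle of the range. The per-node DaemonSets are the part that grows as the cluster scales.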

OpenTelemetry Operator — Auto-instrumentation

The OpenTelemetry Operator can auto-inject instrumentation into applications without code changes:

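# Install the OpenTelemetry Operator (add its chart repo first)
helm repo add open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts
helm install opentelemetry-operator open-telemetry/opentelemetry-operator \
  --namespace monitoring

# Create an Instrumentation resource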
apiVersion: opentelemetry.io/v1alpha1
kind: Instrumentation
metadata:
  name: my-instrumentation
  namespace: production
spec:
  exporter:
    endpoint: http://tempo.monitoring:4317
  propagators:
    - tracecontext
    - baggage
  sampler:
    type: parentbased_traceidratio
    argument: "0.25"  # sample 25% of traces
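  python:
    image: ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-python:latest
    env:
      # Python auto-instrumentation exports OTLP over HTTP by default, so point it at port 4318
      - name: OTEL_EXPORTER_OTLP_ENDPOINT
        value: http://tempo.monitoring:4318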
  java:
    image: ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-java:latest
  nodejs:
    image: ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-nodejs:latest

---
# Add an annotation to the Deployment to enable auto-instrumentation (fragment: selector and labels omitted)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-python-app
spec:
  template:
    metadata:
      annotations:
        instrumentation.opentelemetry.io/inject-python: "true"
    spec:
      containers:
        - name: app
          image: my-python-app:latest
# The Operator injects an init container and the required environment variables automatically
# The application starts sending traces to Tempo with no code changes
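
The parentbased_traceidratio sampler in the Instrumentation resource combines two ideas: follow the parent's sampling decision when there is one, and otherwise keep a fixed fraction of new traces based on the trace ID. The sketch below shows one common way to implement that decision, by comparing the low 64 bits of the trace ID against a threshold. It is an illustration of the technique, not the code of any specific SDK, and the function names are hypothetical.

```python
import random

MASK_64 = (1 << 64) - 1


def ratio_sample(trace_id: int, ratio: float) -> bool:
    """Keep a trace if the low 64 bits of its ID fall below ratio * 2^64."""
    bound = round(ratio * (1 << 64))
    return (trace_id & MASK_64) < bound


def parent_based(parent_sampled, trace_id: int, ratio: float) -> bool:
    """Follow the parent's decision if present, else fall back to the ratio."""
    if parent_sampled is not None:
        return parent_sampled
    return ratio_sample(trace_id, ratio)


# Simulate 10,000 root spans (no parent) with random 128-bit trace IDs
rng = random.Random(42)
ids = [rng.getrandbits(128) for _ in range(10_000)]
kept = sum(parent_based(None, trace_id, 0.25) for trace_id in ids)
print(f"sampled {kept / len(ids):.1%} of root traces")
```

Because the decision is derived from the trace ID and then inherited by child spans, every service in a request path agrees on whether to keep the trace, which avoids partial traces in Tempo.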

Observability-as-Code

Manage the observability stack with GitOps, with everything stored in Git:

# Repository layout
observability/
├── base/
│   ├── kustomization.yaml
│   ├── kube-prometheus-stack/
│   │   └── values.yaml
│   ├── loki/
│   │   └── values.yaml
│   └── tempo/
│       └── values.yaml
├── alerts/
│   ├── pod-alerts.yaml          # PrometheusRule
│   ├── node-alerts.yaml
│   └── app-alerts.yaml
├── dashboards/
│   ├── app-overview.json        # Grafana Dashboard JSON
│   ├── slo-dashboard.json
│   └── cost-dashboard.json
├── servicemonitors/
│   ├── app-a-monitor.yaml
│   ├── app-b-monitor.yaml
│   └── nginx-monitor.yaml
└── oncall/
    ├── schedules.yaml
    └── escalation-policies.yaml

# Deploy with ArgoCD or Flux CD
# When changes are pushed, the GitOps controller syncs them automatically
# Benefits:
# - Version control for alert rules (full change history)
# - Code review before applying (PR review)
# - Rollback if something goes wrong
# - Reproducible across environments (dev → staging → prod)

Grafana Dashboard as Code

# ConfigMap for a Grafana dashboard
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-dashboard
  namespace: monitoring
  labels:
    grafana_dashboard: "1"  # the Grafana sidecar auto-loads ConfigMaps with this label
data:
  app-overview.json: |
    {
      "title": "Application Overview",
      "panels": [
        {
          "title": "Request Rate",
          "type": "timeseries",
          "targets": [
            { "expr": "sum by (method, endpoint) (rate(app_requests_total[5m]))", "legendFormat": "{{method}} {{endpoint}}" }
          ]
        },
        {
          "title": "Error Rate (%)",
          "type": "stat",
          "targets": [
            { "expr": "sum(rate(app_requests_total{status=~'5..'}[5m])) / sum(rate(app_requests_total[5m])) * 100" }
          ]
        },
        {
          "title": "P99 Latency",
          "type": "gauge",
          "targets": [
            { "expr": "histogram_quantile(0.99, sum by (le) (rate(app_request_duration_seconds_bucket[5m])))" }
          ]
        }
      ]
    }
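
The P99 panel relies on histogram_quantile, which estimates a quantile from cumulative bucket counts by linear interpolation inside the bucket that contains the target rank. The sketch below reproduces that estimate for classic histograms in plain Python. The bucket bounds and counts are invented, and the function assumes 0 < q <= 1, at least two buckets, and a final +Inf bucket.

```python
import math


def histogram_quantile(q: float, buckets):
    """Estimate a quantile from (upper_bound, cumulative_count) pairs."""
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # Rank falls in the +Inf bucket: report the highest finite bound
                return buckets[-2][0]
            # Interpolate linearly between the bucket's lower and upper bound
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return math.nan


# Hypothetical request-duration buckets (seconds): 100 observations in total
BUCKETS = [(0.1, 50), (0.5, 90), (1.0, 99), (math.inf, 100)]
print(round(histogram_quantile(0.95, BUCKETS), 3))
```

The estimate can never be more precise than the bucket layout: with these bounds, any quantile between the 90th and 99th percentile is reported somewhere between 0.5s and 1.0s, so choose bucket boundaries around the latencies you care about.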

Summary: An Observability Stack for K8s

Observability in Kubernetes is not just "install Prometheus and you're done." It needs all three pillars: metrics (Prometheus + Grafana) to know "what" is happening, logs (Loki + Promtail) to know "why" it happened, and traces (Tempo + OpenTelemetry) to know "where" the problem is.

Start with kube-prometheus-stack, since it ships with Prometheus, Grafana, Alertmanager, Node Exporter, and ready-made dashboards. Then add Loki for logs and Tempo for traces as the team matures.

Most important of all are good alerting, which notifies you of real problems rather than firing every 5 minutes until nobody pays attention (alert fatigue), and Observability-as-Code, which manages everything through Git for version control, code review, and reproducibility.

© 2026 SiamLancard: LAN cards, server equipment, and receipt printers
