Observability
Two parallel stacks cover metrics and logs.
Metrics — Prometheus + Grafana
Deployed via the kube-prometheus-stack Helm chart (ArgoCD-managed), running in the prometheus namespace.
- Prometheus scrapes all nodes, pods, and K8s control plane components
- Grafana dashboards: cluster overview, node resource usage, Longhorn, ArgoCD, Traefik
- Alertmanager routes alerts to Ntfy (self-hosted push notifications) via a custom webhook bridge
- Node Exporter runs on all VMs including docker-host11 and the edge VPS (Ansible-deployed)
- Goldilocks + VPA analyse actual resource usage and recommend request/limit values
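The Alertmanager-to-Ntfy bridge mentioned above is not shown in this document. As a rough sketch of what such a bridge might do, the following converts the standard Alertmanager webhook payload into Ntfy JSON messages. The function names, the severity-to-priority mapping, the tag choices, and the topic name are all assumptions made for illustration, not the actual bridge code.

```python
import json
import urllib.request

# Assumed mapping from the Alertmanager `severity` label to an Ntfy priority (1-5).
SEVERITY_TO_PRIORITY = {"critical": 5, "warning": 3, "info": 2}


def alert_to_ntfy(alert, topic):
    """Convert one entry of the webhook payload's `alerts` array into an Ntfy message."""
    labels = alert.get("labels", {})
    annotations = alert.get("annotations", {})
    status = alert.get("status", "firing")
    name = labels.get("alertname", "unknown alert")

    # Resolved alerts are sent at low priority; firing alerts use the mapping above.
    if status == "resolved":
        priority = 2
        tags = ["white_check_mark"]
    else:
        priority = SEVERITY_TO_PRIORITY.get(labels.get("severity", ""), 3)
        tags = ["warning"]

    return {
        "topic": topic,
        "title": f"[{status.upper()}] {name}",
        "message": annotations.get("summary") or annotations.get("description") or name,
        "priority": priority,
        "tags": tags,
    }


def webhook_to_ntfy(payload, topic):
    """Convert a whole Alertmanager webhook payload into a list of Ntfy messages."""
    return [alert_to_ntfy(alert, topic) for alert in payload.get("alerts", [])]


def publish(message, server_url):
    """POST one message as JSON to the Ntfy server root URL (server_url is a placeholder)."""
    request = urllib.request.Request(
        server_url,
        data=json.dumps(message).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        response.read()
```

Keeping the conversion separate from `publish` means the mapping can be checked without a running Ntfy server.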
Logs + Fleet — Elastic Stack (ECK)
Deployed via the ECK operator (Elastic Cloud on Kubernetes), running in the elastic-system namespace.
| Component | Purpose |
|---|---|
| Elasticsearch | Log storage and search (single-node, 15 Gi heap) |
| Kibana | Log exploration and dashboards |
| Fleet Server | Manages Elastic Agent enrollment and policies |
| Elastic Agent (DaemonSet) | Ships logs and metrics from every cluster node |
| Elastic Agent (standalone) | Runs on docker-host11 and the edge VPS |
The Elastic Agent DaemonSet tolerates the control-plane NoSchedule taint, so logs are collected from server (control-plane) nodes as well as agent nodes.
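To illustrate why the toleration matters, here is a small sketch of the Kubernetes toleration-matching rule, written as plain Python over dictionaries. The taint key `node-role.kubernetes.io/control-plane` is the conventional one and is assumed here; the actual taint and toleration used in this cluster are not shown in this document, and the helper names are hypothetical.

```python
def tolerates(toleration, taint):
    """Return True if a single toleration matches a single taint."""
    operator = toleration.get("operator", "Equal")
    key = toleration.get("key", "")
    effect = toleration.get("effect", "")

    # An empty effect on the toleration matches every taint effect.
    if effect and effect != taint.get("effect"):
        return False
    # An empty key is only valid with operator Exists and matches every taint key.
    if not key:
        return operator == "Exists"
    if key != taint.get("key"):
        return False
    if operator == "Exists":
        return True
    # Operator Equal: the values must match as well.
    return toleration.get("value", "") == taint.get("value", "")


def schedulable(tolerations, taints):
    """A pod can be placed on a node if every NoSchedule/NoExecute taint is tolerated."""
    blocking = [t for t in taints if t.get("effect") in ("NoSchedule", "NoExecute")]
    return all(any(tolerates(tol, taint) for tol in tolerations) for taint in blocking)


# Assumed taint on server nodes and the matching DaemonSet toleration.
CONTROL_PLANE_TAINT = {"key": "node-role.kubernetes.io/control-plane", "effect": "NoSchedule"}
AGENT_TOLERATIONS = [
    {"key": "node-role.kubernetes.io/control-plane", "operator": "Exists", "effect": "NoSchedule"}
]
```

With these values, `schedulable(AGENT_TOLERATIONS, [CONTROL_PLANE_TAINT])` is true, while a pod with no tolerations would be kept off the same node.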
Alerts from Elasticsearch rules are bridged to Ntfy via a small CronJob (elastic-ntfy-bridge) that polls the Elasticsearch alerts API and forwards new alerts as push notifications.
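The polling step could look roughly like the sketch below, which selects alerts newer than the previous run from a list of Elasticsearch search hits and turns them into Ntfy messages. This is not the real elastic-ntfy-bridge code: the `rule_name` and `reason` fields, the function names, and the idea of tracking a `since` timestamp between runs are hypothetical and only illustrate the "forward new alerts" behaviour.

```python
from datetime import datetime, timezone


def parse_ts(value):
    """Parse an ISO 8601 timestamp such as 2024-05-01T12:00:00.000Z."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def select_new_alerts(hits, since):
    """Keep hits newer than `since`, oldest first, dropping duplicate document ids."""
    seen = set()
    fresh = []
    for hit in sorted(hits, key=lambda h: parse_ts(h["_source"]["@timestamp"])):
        if hit["_id"] in seen or parse_ts(hit["_source"]["@timestamp"]) <= since:
            continue
        seen.add(hit["_id"])
        fresh.append(hit)
    return fresh


def hit_to_ntfy(hit, topic):
    """Build an Ntfy JSON message from one hit (`rule_name` and `reason` are hypothetical fields)."""
    source = hit["_source"]
    return {
        "topic": topic,
        "title": source.get("rule_name", "Elasticsearch alert"),
        "message": source.get("reason", "An alert rule fired."),
    }
```

Because a CronJob keeps no state between runs, `since` would have to come from somewhere durable, or be derived from the schedule interval; which approach the real bridge takes is not stated here.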
Alerting Flow
Prometheus Alertmanager ──────────────────────────────► Ntfy (push notification)
                                                          ▲
Elasticsearch alert rule ──► elastic-ntfy-bridge CronJob ─┘
All alerts land in the same Ntfy topic, accessible on mobile and desktop.