Observability
Two parallel stacks: Prometheus for metrics, Elastic for logs.
Metrics
kube-prometheus-stack runs in the prometheus namespace (ArgoCD-managed). Prometheus scrapes all nodes, pods, and control plane components. Grafana has dashboards for cluster overview, node resources, Longhorn, ArgoCD, and Traefik.
Node Exporter is deployed via Ansible on every VM including docker-host11 and the edge VPS, so coverage isn't limited to what's inside Kubernetes.
Goldilocks and VPA run alongside and analyze actual resource usage to suggest better request/limit values.
Alertmanager routes alerts to Ntfy via a custom webhook bridge.
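The bridge itself is not shown here. As a rough sketch of what a webhook bridge like this has to do, the function below turns an Alertmanager webhook payload into the headers and body of an ntfy publish request. The payload fields (`alerts`, `status`, `labels`, `annotations`) follow Alertmanager's webhook format, and `Title`, `Priority` and `Tags` are ntfy publish headers. The function name, the severity-to-priority mapping and the sample payload are assumptions for illustration, not the actual bridge.

```python
"""Sketch: map an Alertmanager webhook payload to ntfy messages (hypothetical bridge logic)."""

# Assumed mapping from the conventional `severity` label to ntfy priority names.
SEVERITY_TO_PRIORITY = {"critical": "urgent", "warning": "high", "info": "default"}


def alertmanager_to_ntfy(payload: dict) -> list[dict]:
    """Return one ntfy message (headers + body) per alert in the webhook payload."""
    messages = []
    for alert in payload.get("alerts", []):
        labels = alert.get("labels", {})
        annotations = alert.get("annotations", {})
        # Each alert carries its own status; fall back to the group status.
        status = alert.get("status", payload.get("status", "firing"))
        resolved = status == "resolved"
        name = labels.get("alertname", "unknown alert")
        if resolved:
            priority = "low"  # resolved notifications should not be noisy
        else:
            priority = SEVERITY_TO_PRIORITY.get(labels.get("severity", ""), "default")
        messages.append({
            "headers": {
                "Title": f"[{status.upper()}] {name}",
                "Priority": priority,
                "Tags": "white_check_mark" if resolved else "rotating_light",
            },
            # Prefer the summary annotation, then description, then the alert name.
            "body": annotations.get("summary") or annotations.get("description") or name,
        })
    return messages


if __name__ == "__main__":
    # Hypothetical payload, trimmed to the fields used above.
    sample = {
        "status": "firing",
        "alerts": [{
            "status": "firing",
            "labels": {"alertname": "NodeDown", "severity": "critical"},
            "annotations": {"summary": "node is down"},
        }],
    }
    for message in alertmanager_to_ntfy(sample):
        print(message)
```

Each returned message could then be sent as an HTTP POST to the ntfy topic URL, with `headers` as request headers and `body` as the request body.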
Logs and fleet management
The ECK operator (Elastic Cloud on Kubernetes) manages the Elastic stack in the elastic-system namespace:
| Component | Purpose |
|---|---|
| Elasticsearch | Log storage and search (single-node, 15 Gi heap) |
| Kibana | Log exploration and dashboards |
| Fleet Server | Manages Elastic Agent enrollment and policies |
| Elastic Agent (DaemonSet) | Ships logs and metrics from every cluster node |
| Elastic Agent (standalone) | Runs on docker-host11 and the edge VPS |
The DaemonSet tolerates the control-plane NoSchedule taint so server nodes are covered too.
Elastic alert rules are bridged to Ntfy via elastic-ntfy-bridge, a small CronJob that polls the Elasticsearch alerts API and forwards new alerts as push notifications.
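A CronJob starts fresh on every run, so "forwards new alerts" implies the bridge has some way to tell which alerts it has already sent. The sketch below shows one possible approach: keep a set of already-forwarded alert IDs and publish only the unseen ones. Everything in it is an assumption for illustration. The alert shape (`id`, `rule`, `reason`), `forward_new_alerts` and the injected `publish` callable are hypothetical, and the real bridge may track state differently (for example by timestamp).

```python
"""Sketch: forward only alerts not seen in earlier polls (hypothetical bridge logic)."""
from typing import Callable, Iterable


def forward_new_alerts(
    alerts: Iterable[dict],
    seen_ids: set[str],
    publish: Callable[[str, str], None],
) -> set[str]:
    """Publish each unseen alert once and return the updated set of seen IDs."""
    updated = set(seen_ids)  # copy so the caller's set is not mutated
    for alert in alerts:
        alert_id = alert["id"]
        if alert_id in updated:
            continue  # already forwarded, either earlier in this run or in a previous one
        publish(alert["rule"], alert["reason"])  # (title, body) of the notification
        updated.add(alert_id)
    return updated


if __name__ == "__main__":
    sent: list[tuple[str, str]] = []
    # Hypothetical poll result; "a1" was already forwarded by a previous run.
    polled = [
        {"id": "a1", "rule": "disk-usage", "reason": "disk above 90%"},
        {"id": "a2", "rule": "error-rate", "reason": "5xx spike"},
    ]
    seen = forward_new_alerts(polled, {"a1"}, lambda title, body: sent.append((title, body)))
    print(sent, sorted(seen))
```

In a real CronJob the returned set would have to be persisted between runs; where it is stored is left open here.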
Alerting flow
Prometheus Alertmanager ───────────────────────────────► Ntfy (push notification)
                                                          ▲
Elasticsearch alert rule ──► elastic-ntfy-bridge CronJob ─┘
Both sources land in the same Ntfy topic.