Homelab
17-node Kubernetes cluster on five bare-metal Proxmox hosts, provisioned with Terraform and Ansible, managed through ArgoCD GitOps. Runs my home automation, media stack, photo backup, documents, and a few side projects.
Architecture
```mermaid
graph TB
    subgraph ext["External"]
        CF["Cloudflare CDN"]
        Admin["Remote Admin"]
    end
    subgraph vps["Edge VPS"]
        WG["WireGuard VPN Gateway"]
        TraefikVPS["Traefik"]
        Pangolin["Pangolin Tunnel Server"]
    end
    subgraph proxmox["Proxmox Cluster (5 physical nodes)"]
        subgraph cp["Control Plane x3 (HA etcd + kube-vip)"]
            S["k3s-server"]
        end
        subgraph workers["Worker Nodes x14"]
            W["k3s-agent"]
        end
        DH["docker-host (Intel QuickSync GPU)"]
        NFS["NFS Server (dedicated storage node)"]
    end
    subgraph k8s["Kubernetes"]
        subgraph platform["Platform"]
            direction LR
            MetalLB
            Traefik
            Longhorn
            ArgoCD
            Prometheus
            ECK["Elastic Stack"]
            Istio["Istio Ambient"]
        end
        subgraph apps["Applications"]
            direction LR
            Immich
            VW["Vaultwarden"]
            HA["Home Assistant"]
            Media["Arr Stack + Jellyfin"]
            Other["Paperless, N8n, Ntfy ..."]
        end
    end
    Admin -->|WireGuard VPN| WG
    WG -->|tunnel| k8s
    CF -->|Cloudflare tunnel| k8s
    TraefikVPS --> Pangolin
    Pangolin -->|Newt client| k8s
    cp --- workers
    workers --- Longhorn
    NFS -->|NFS mount| Media
    DH -->|Docker| Media
```
Hardware
| Layer | Host | Role | Resources / workloads |
|---|---|---|---|
| Physical | aya01 | Proxmox node + NFS server | Dedicated storage, no VMs |
| Physical | lulu | Proxmox node | k3s agents |
| Physical | inko01 | Proxmox node | k3s server + agents + docker host |
| Physical | naruto01 | Proxmox node | k3s server + agents |
| Physical | mii01 | Proxmox node | k3s server + agents |
| VM | k3s-server-{10,11,12} | K3s control plane (HA etcd + kube-vip VIP) | 2 vCPU · 4 GB RAM · 64 GB disk |
| VM | k3s-agent-{10…23} | K3s worker nodes ×14 | 2 vCPU · 4 GB RAM · 128 GB disk |
| VM | docker-host11 | Docker host with GPU passthrough | 2 vCPU · 4 GB RAM · 192 GB disk · Intel QuickSync |
| VM | docker-lb | Caddy reverse proxy (LAN only) | 1 vCPU · 2 GB RAM |
| VPS | mii | Edge node (Netcup) | WireGuard · Traefik · Pangolin |
All VMs run Debian 12 with virtio NICs on Proxmox network bridges and are provisioned from cloud-init templates via Terraform + Ansible.
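Adding up the VM rows in the hardware table gives a quick view of how much CPU, RAM and disk is committed to guests across the Proxmox hosts. Here is a minimal sketch of that arithmetic, using only the per-VM figures listed. docker-lb has no disk size listed, so it is counted as 0 GB, and the edge VPS is left out:

```python
# Totals for the VM rows of the hardware table.
# Assumption: docker-lb's disk size is not listed, so it counts as 0 GB.
VMS = {
    # name: (count, vCPU each, RAM GB each, disk GB each)
    "k3s-server": (3, 2, 4, 64),
    "k3s-agent": (14, 2, 4, 128),
    "docker-host11": (1, 2, 4, 192),
    "docker-lb": (1, 1, 2, 0),
}


def totals(vms: dict[str, tuple[int, int, int, int]]) -> tuple[int, int, int]:
    """Return (total vCPU, total RAM in GB, total disk in GB)."""
    vcpu = sum(n * cpu for n, cpu, _, _ in vms.values())
    ram = sum(n * mem for n, _, mem, _ in vms.values())
    disk = sum(n * size for n, _, _, size in vms.values())
    return vcpu, ram, disk


if __name__ == "__main__":
    vcpu, ram, disk = totals(VMS)
    print(f"{vcpu} vCPU, {ram} GB RAM, {disk} GB disk")
```

Run as-is, it prints `37 vCPU, 74 GB RAM, 2176 GB disk`. The 17 Kubernetes nodes alone account for 34 vCPU and 68 GB of RAM.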
Platform Stack
| Component | How deployed | Purpose |
|---|---|---|
| ArgoCD | Helm (App-of-Apps) | GitOps CD — all cluster state driven from Git |
| ArgoCD Image Updater | Helm | Watches registries, commits updated image tags back to Git |
| kube-vip | DaemonSet on control plane | HA VIP for the K8s API server |
| Traefik | k3s built-in | Ingress controller, fronted by MetalLB |
| MetalLB | Helm (ArgoCD) | Bare-metal load balancer, assigns IPs from reserved pool |
| Cert-Manager | Helm (ArgoCD) | Automated TLS via Let's Encrypt DNS-01 (Cloudflare API) |
| Sealed Secrets | Helm (ArgoCD) | Encrypts secrets for safe storage in Git |
| Longhorn | Helm (ArgoCD) | Distributed block storage (RWO + RWX) across all 14 agents |
| CloudNativePG | Operator (ArgoCD) | HA PostgreSQL — used by Immich |
| Elastic Stack (ECK) | Operator (ArgoCD) | Elasticsearch + Kibana + Fleet + Elastic Agents for observability |
| Kube-Prometheus-Stack | Helm (ArgoCD) | Prometheus + Grafana monitoring |
| Goldilocks + VPA | Helm (ArgoCD) | Resource usage analysis and request/limit rightsizing |
| Istio (Ambient) | Helm (ArgoCD) | Service mesh — ztunnel DaemonSet on all nodes (L4); no Waypoint proxies yet |
| K3s Upgrade Controller | Operator (ArgoCD) | Automated rolling K3s version upgrades |
| mii-wireguard | Manifest (ArgoCD) | WireGuard pod — connects cluster to edge VPS, masquerades service CIDR |
| Newt | Deployment (ArgoCD) | Pangolin tunnel client for VPS-proxied services |
| Cloudflared | Deployment ×2 (ArgoCD) | Cloudflare tunnel — exposes selected services to the internet |
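Cert-Manager's DNS-01 flow proves control of a domain by publishing a TXT record through the Cloudflare API. The record's name and value are defined by RFC 8555 (ACME). The sketch below shows how they are derived. It computes the standard values rather than reproducing cert-manager's code, and the token and account-key thumbprint are placeholders. In practice, the ACME server issues the token and the thumbprint comes from the ACME account key:

```python
import base64
import hashlib


def b64url(data: bytes) -> str:
    """Unpadded base64url, as used throughout ACME (RFC 8555)."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def dns01_record(domain: str, token: str, account_thumbprint: str) -> tuple[str, str]:
    """Return (TXT record name, TXT value) for an ACME DNS-01 challenge."""
    # A wildcard name such as *.example.com is validated on the base domain.
    base = domain[2:] if domain.startswith("*.") else domain
    # Key authorization = token + "." + base64url JWK thumbprint of the account key.
    key_authorization = f"{token}.{account_thumbprint}"
    digest = hashlib.sha256(key_authorization.encode("ascii")).digest()
    return f"_acme-challenge.{base}", b64url(digest)


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    name, value = dns01_record("*.home.example.com", "example-token", "example-thumbprint")
    print(name, value)
```

DNS-01 never needs an inbound connection to the cluster, which is why it works with no ports open on the home router. HTTP-01, by contrast, would need port 80 reachable from the internet.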
Applications
| Service | Description | Notable tech |
|---|---|---|
| Immich | Photo & video backup (self-hosted Google Photos) | CloudNativePG · Redis · ML pod |
| Vaultwarden | Bitwarden-compatible password manager | – |
| Paperless-ngx | Document management + OCR | – |
| Home Assistant | Home automation hub | – |
| N8n | Workflow automation | – |
| Ntfy | Self-hosted push notifications | – |
| Stirling PDF | PDF tools | – |
| Karakeep | Bookmark manager | – |
| Gitea | Self-hosted Git (source of truth for ArgoCD) | Docker on docker-host11 |
| Gitea Runner | CI/CD runner | – |
| Zeroclaw | Per-user instances (×3) via Kustomize overlays | – |
| Arr Stack | Media automation suite | Prowlarr · Sonarr · Radarr · Unpackerr |
| Download clients | VPN-isolated download clients (×2) | Gluetun sidecar |
| Jellyfin | Media server with hardware transcoding | Docker · Intel QuickSync |
Design notes
Everything goes through Git. ArgoCD owns the cluster state; nothing gets kubectl apply'd directly. ArgoCD Image Updater handles the image update loop: when a new tag appears in the registry, it commits the change back to Git and ArgoCD picks it up from there.
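ArgoCD Image Updater decides what counts as a "new" tag based on its configured update strategy. The sketch below shows a rough semver-style version of that loop. It assumes tags look like `v1.2.3` and that the tag lives on a `tag:` line in a Helm values file. Both are assumptions, and Image Updater's actual parsing and write-back mechanism differ:

```python
import re
from typing import Optional

# Assumption: stable releases are tagged MAJOR.MINOR.PATCH with an optional "v".
SEMVER = re.compile(r"^v?(\d+)\.(\d+)\.(\d+)$")


def parse(tag: str) -> Optional[tuple[int, int, int]]:
    """Parse a stable semver tag into a comparable tuple; None if it isn't one."""
    m = SEMVER.match(tag)
    return (int(m[1]), int(m[2]), int(m[3])) if m else None


def newest_tag(current: str, available: list[str]) -> Optional[str]:
    """Return the highest stable tag strictly newer than `current`, or None."""
    cur = parse(current)
    if cur is None:
        return None
    newer = []
    for tag in available:
        version = parse(tag)
        if version is not None and version > cur:
            newer.append((version, tag))
    return max(newer)[1] if newer else None


def bump_values(values_yaml: str, new_tag: str) -> str:
    """Rewrite the first `tag:` line of a (hypothetical) Helm values file."""
    pattern = r"(?m)^([ \t]*tag:[ \t]*)\S+$"
    return re.sub(pattern, lambda m: m.group(1) + new_tag, values_yaml, count=1)


if __name__ == "__main__":
    values = "image:\n  repository: ghcr.io/example/app\n  tag: v1.2.3\n"
    tag = newest_tag("v1.2.3", ["v1.2.3", "v1.9.9", "v1.10.0", "v2.0.0-rc1", "latest"])
    if tag:
        print(bump_values(values, tag))  # the real loop would commit this to Git
```

Comparing parsed integer tuples rather than strings is what ranks `v1.10.0` above `v1.9.9`. Pre-release tags such as `v2.0.0-rc1` are skipped because they don't match the pattern.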
Secrets are committed to Git too, encrypted via Sealed Secrets. Only the in-cluster controller holds the decryption key.
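No ports are open on the home router. Internal load balancing goes through MetalLB + Traefik. External access only uses tunnels the cluster opens outbound: Cloudflare tunnels, the Newt client's tunnel to Pangolin on the edge VPS, and a WireGuard link to the edge VPS for remote admin.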
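MetalLB gives each LoadBalancer Service, Traefik's included, an address from a reserved pool on the LAN. The bookkeeping part of that can be sketched as follows. The pool range and service names are hypothetical. This toy model is not MetalLB's allocator and ignores how the address is announced (L2/ARP or BGP):

```python
import ipaddress
from typing import Optional


class Pool:
    """Toy model of a reserved LoadBalancer address range (not MetalLB's code)."""

    def __init__(self, first: str, last: str) -> None:
        start = int(ipaddress.IPv4Address(first))
        end = int(ipaddress.IPv4Address(last))
        # Free addresses are kept sorted so the lowest one is handed out first.
        self.free = [ipaddress.IPv4Address(i) for i in range(start, end + 1)]
        self.assigned: dict[str, ipaddress.IPv4Address] = {}

    def assign(self, service: str, requested: Optional[str] = None) -> ipaddress.IPv4Address:
        """Give `service` an address, honouring a specific request if it is free."""
        if service in self.assigned:  # idempotent: a Service keeps its address
            return self.assigned[service]
        if requested is not None:
            ip = ipaddress.IPv4Address(requested)
            if ip not in self.free:
                raise ValueError(f"{ip} is outside the pool or already in use")
        elif self.free:
            ip = self.free[0]
        else:
            raise RuntimeError("address pool exhausted")
        self.free.remove(ip)
        self.assigned[service] = ip
        return ip

    def release(self, service: str) -> None:
        """Return a deleted Service's address to the pool."""
        self.free.append(self.assigned.pop(service))
        self.free.sort()


if __name__ == "__main__":
    pool = Pool("192.168.1.240", "192.168.1.250")  # hypothetical reserved range
    print(pool.assign("kube-system/traefik"))
```

Keeping the assignment idempotent mirrors the GitOps setup. When ArgoCD re-syncs a Service, the Service should keep the address it already has rather than receive a new one.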
Longhorn provides block storage, spreading volume replicas across the 14 agent nodes. The media library lives on a dedicated NFS host instead: latency matters when Jellyfin is reading large video files, and NFS is simpler for that.
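Longhorn keeps several replicas of each volume and, by default, places them on different nodes, so losing one agent doesn't lose data. The sketch below shows a simplified greedy version of that spread. It is not Longhorn's scheduler, which also weighs free disk space, node/disk tags and zones. It assumes 3 replicas per volume (Longhorn's default replica count) and uses the agent names from the hardware table:

```python
from collections import Counter

AGENTS = [f"k3s-agent-{i}" for i in range(10, 24)]  # k3s-agent-10 … k3s-agent-23


def place_replicas(volumes: list[str], nodes: list[str], replicas: int = 3) -> dict[str, list[str]]:
    """Assign each volume `replicas` distinct nodes, always picking the least-loaded ones."""
    if replicas > len(nodes):
        raise ValueError("not enough nodes to keep every replica on a different node")
    load = {node: 0 for node in nodes}
    placement = {}
    for volume in volumes:
        # Least-loaded first; the node name breaks ties so the result is deterministic.
        chosen = sorted(nodes, key=lambda n: (load[n], n))[:replicas]
        for node in chosen:
            load[node] += 1
        placement[volume] = chosen
    return placement


if __name__ == "__main__":
    plan = place_replicas([f"pvc-{i}" for i in range(14)], AGENTS)
    print(Counter(n for nodes in plan.values() for n in nodes))
```

With 14 hypothetical volumes, the 42 replicas land exactly three per agent. Because each volume's replicas sit on separate nodes, a single agent failure never takes out every copy of a volume.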
Metrics go to Prometheus + Grafana. Logs go to the Elastic Stack, deployed via the ECK operator, with Elastic Agents running as a DaemonSet so every node is covered and Fleet managing those agents centrally.
All VMs are provisioned with Terraform and configured by Ansible, so rebuilding from scratch means rerunning them rather than remembering manual steps.
Repo layout
```
ansible-homelab/
├── roles/
│   ├── common/        # base OS config, SSH hardening, node-exporter
│   ├── k3s_server/    # control plane install + NoSchedule taint
│   ├── k3s_agent/     # worker node install
│   ├── kube_vip/      # kube-vip DaemonSet + TLS SAN config
│   ├── docker_host/   # Docker + Intel QuickSync GPU passthrough
│   ├── proxmox/       # Proxmox node setup
│   └── edge_vps/      # VPS: WireGuard, Traefik, Pangolin, Elastic Agent
└── playbooks/

argocd-homelab/
├── infrastructure/    # MetalLB, Longhorn, Cert-Manager, ECK, Istio, ...
├── services/          # Immich, Vaultwarden, arr-stack, Home Assistant, ...
└── cluster-apps/      # ArgoCD App-of-Apps root + ApplicationSets
```