# Homelab

A production-grade homelab running on bare-metal Proxmox, with a 17-node Kubernetes cluster managed entirely through GitOps.

![k3s](https://img.shields.io/badge/k3s-v1.34-orange?logo=kubernetes) ![nodes](https://img.shields.io/badge/nodes-17-blue) ![ArgoCD](https://img.shields.io/badge/GitOps-ArgoCD-red?logo=argo) ![Ansible](https://img.shields.io/badge/provisioned-Ansible-black?logo=ansible)

---

## Architecture

```mermaid
graph TB
    subgraph ext[" External"]
        CF["Cloudflare\nCDN + DNS"]
        Admin["Remote Admin"]
    end

    subgraph vps["Edge VPS"]
        WG["WireGuard\nVPN Gateway"]
        TraefikVPS["Traefik\nReverse Proxy"]
        Pangolin["Pangolin\nTunnel Server"]
    end

    subgraph proxmox["Proxmox Cluster — 5 physical nodes"]
        subgraph cp["Control Plane ×3 (HA etcd)"]
            S["k3s-server"]
        end
        LB["nginx\nLoad Balancer"]
        subgraph workers["Worker Nodes ×14"]
            W["k3s-agent"]
        end
        DH["docker-host\nIntel QuickSync GPU"]
        NFS["NFS Server\nDedicated storage node"]
    end

    subgraph k8s["Kubernetes"]
        subgraph platform["Platform layer"]
            direction LR
            MetalLB["MetalLB"]
            Traefik["Traefik"]
            Longhorn["Longhorn"]
            ArgoCD["ArgoCD"]
            Prometheus["Prometheus\n+ Grafana"]
            ECK["Elastic Stack\n(ECK)"]
            Istio["Istio\nAmbient"]
        end
        subgraph apps["Applications"]
            direction LR
            Immich["Immich"]
            VW["Vaultwarden"]
            HA["Home Assistant"]
            Media["Arr Stack\n+ Jellyfin"]
            Other["Paperless · N8n\nNtfy · Gitea · …"]
        end
    end

    Admin -->|WireGuard VPN| WG
    WG -->|tunnel| k8s
    CF -->|Cloudflare tunnel| k8s
    TraefikVPS --> Pangolin
    Pangolin -->|Newt client| k8s
    LB --> cp
    cp --- workers
    workers --- Longhorn
    NFS -->|NFS mount| Media
    DH -->|Jellyfin\nDocker| Media
```

---

## Hardware

| Layer | Host | Role | Resources |
|-------|------|------|-----------|
| Physical | `aya01` | Proxmox node + NFS server | Dedicated storage — no VMs |
| Physical | `lulu` | Proxmox node | k3s agents |
| Physical | `inko01` | Proxmox node | k3s server + agents + docker host |
| Physical | `naruto01` | Proxmox node | k3s server + agents + LB |
| Physical | `mii01` | Proxmox node | k3s server + agents |
| VM | `k3s-server-{10,11,12}` | K3s control plane (HA etcd) | 2 vCPU · 4 GB RAM · 64 GB |
| VM | `k3s-agent-{10…23}` | K3s worker nodes ×14 | 2 vCPU · 4 GB RAM · 128 GB |
| VM | `docker-host11` | Docker host w/ GPU passthrough | 2 vCPU · 4 GB RAM · 192 GB · Intel QuickSync |
| VM | `k3s-loadbalancer` | nginx LB fronting control plane | 1 vCPU · 2 GB RAM |
| VM | `docker-lb` | Caddy reverse proxy (LAN only) | 1 vCPU · 2 GB RAM |
| VPS | `mii` | Edge node (Netcup) | WireGuard · Traefik · Pangolin |

All VMs run **Debian 12** on `virtio` network bridges, provisioned from cloud-init templates via **Terraform + Ansible**.

---

## Platform Stack

| Component | How deployed | Purpose |
|-----------|-------------|---------|
| **ArgoCD** | Helm (App-of-Apps) | GitOps CD — all cluster state driven from Git |
| **ArgoCD Image Updater** | Helm | Watches registries, commits updated image tags back to Git |
| **Traefik** | k3s built-in | Ingress controller, fronted by MetalLB |
| **MetalLB** | Helm (ArgoCD) | Bare-metal load balancer, assigns IPs from a reserved pool |
| **Cert-Manager** | Helm (ArgoCD) | Automated TLS via Let's Encrypt DNS-01 (Cloudflare API) |
| **Sealed Secrets** | Helm (ArgoCD) | Encrypts secrets for safe storage in Git |
| **Longhorn** | Helm (ArgoCD) | Distributed block storage (RWO + RWX) across all 14 agents |
| **CloudNativePG** | Operator (ArgoCD) | HA PostgreSQL — used by Immich |
| **Elastic Stack (ECK)** | Operator (ArgoCD) | Elasticsearch + Kibana + Fleet + Elastic Agents for observability |
| **Kube-Prometheus-Stack** | Helm (ArgoCD) | Prometheus + Grafana monitoring |
| **Goldilocks + VPA** | Helm (ArgoCD) | Resource usage analysis and request/limit rightsizing |
| **Istio (Ambient)** | Helm (ArgoCD) | Service mesh — ztunnel DaemonSet on all nodes (L4); no Waypoint proxies yet |
| **K3s Upgrade Controller** | Operator (ArgoCD) | Automated rolling K3s version upgrades |
| **mii-wireguard** | Manifest (ArgoCD) | WireGuard pod — connects the cluster to the edge VPS, masquerades the service CIDR |
| **Newt** | Deployment (ArgoCD) | Pangolin tunnel client for VPS-proxied services |
| **Cloudflared** | Deployment ×2 (ArgoCD) | Cloudflare tunnel — exposes selected services to the internet |

---

## Applications

| Service | Description | Notable tech |
|---------|-------------|--------------|
| **Immich** | Photo & video backup (self-hosted Google Photos) | CloudNativePG · Redis · ML pod |
| **Vaultwarden** | Bitwarden-compatible password manager | – |
| **Paperless-ngx** | Document management + OCR | – |
| **Home Assistant** | Home automation hub | – |
| **N8n** | Workflow automation | – |
| **Ntfy** | Self-hosted push notifications | – |
| **Stirling PDF** | PDF tools | – |
| **Karakeep** | Bookmark manager | – |
| **Gitea** | Self-hosted Git (source of truth for ArgoCD) | Docker on docker-host11 |
| **Gitea Runner** | CI/CD runner | – |
| **Zeroclaw** | Per-user instances (×3) via Kustomize overlays | – |
| **Arr Stack** | Media automation suite | Prowlarr · Sonarr · Radarr · Unpackarr |
| **qBittorrent** | Torrent clients (×2) | Gluetun VPN sidecar · ProtonVPN |
| **Jellyfin** | Media server with hardware transcoding | Docker · Intel QuickSync |

---

## Key Design Decisions

**GitOps end-to-end.** Every cluster resource is declared in Git and applied by ArgoCD; nothing is `kubectl apply`'d by hand. ArgoCD Image Updater closes the loop by writing image tag updates back to Git automatically.

**Secrets in Git, safely.** Sealed Secrets lets encrypted `SealedSecret` manifests live in the same repo as everything else. Only the in-cluster controller can decrypt them.

**No cloud dependency for ingress.** MetalLB + Traefik handle all internal load balancing. External access goes through Cloudflare tunnels or a WireGuard VPN — no ports are open on the home router.

**Distributed storage without a SAN.** Longhorn replicates volumes across all 14 agent nodes.
NFS on a dedicated bare-metal host serves the media library to Jellyfin with low latency.

**Observability from day one.** Prometheus + Grafana for metrics, Elastic Stack (via the ECK operator) for logs and fleet management. Elastic Agents run as a DaemonSet across the whole cluster.

**Provisioning is reproducible.** Proxmox VMs are created via Terraform (Proxmox provider), then configured by Ansible roles — from base OS hardening to k3s installation and kubeconfig management.

---

## Repository Layout

```
ansible-homelab/              # Ansible roles + playbooks for all VM provisioning
├── roles/
│   ├── common/               # Base OS config, SSH hardening, node-exporter
│   ├── k3s_server/           # HA control plane install + taint config
│   ├── k3s_agent/            # Worker node install
│   ├── k3s_loadbalancer/     # nginx LB config
│   ├── kube_vip/             # VIP setup
│   ├── docker_host/          # Docker + GPU passthrough
│   ├── proxmox/              # Proxmox node config
│   └── edge_vps/             # VPS services (WireGuard, Traefik, Pangolin)
└── playbooks/                # Top-level playbooks per host group

argocd-homelab/               # All Kubernetes manifests (ArgoCD App-of-Apps)
├── infrastructure/           # Platform: MetalLB, Longhorn, Cert-Manager, ECK, …
├── services/                 # Applications: Immich, Vaultwarden, arr-stack, …
└── cluster-apps/             # ArgoCD ApplicationSets + root app
```
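---

## Example Manifests

The App-of-Apps pattern used for GitOps boils down to a single root `Application` that points ArgoCD at the `cluster-apps/` directory; everything below it is then created and synced automatically. This is a minimal sketch, not the actual manifest: the repo URL, names, and sync policy are assumptions.

```yaml
# Hypothetical root Application (App-of-Apps); repo URL and names are illustrative.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: root
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://git.example.lan/homelab/argocd-homelab.git  # assumed Gitea URL
    targetRevision: main
    path: cluster-apps          # directory holding ApplicationSets + child apps
  destination:
    server: https://kubernetes.default.svc
    namespace: argocd
  syncPolicy:
    automated:
      prune: true               # delete resources removed from Git
      selfHeal: true            # revert manual drift back to Git state
```

Syncing this one app is the only manual step; from then on, a Git commit is the unit of change for the whole cluster.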
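A `SealedSecret` stored in Git looks like the sketch below; the name, namespace, and ciphertext are placeholders (real ciphertext comes from piping a plain `Secret` through `kubeseal`), since only the in-cluster controller holds the private key to decrypt it.

```yaml
# Illustrative shape only — produced by: kubeseal --format yaml < secret.yaml
apiVersion: bitnami.com/v1alpha1
kind: SealedSecret
metadata:
  name: vaultwarden-admin-token      # placeholder name
  namespace: vaultwarden
spec:
  encryptedData:
    ADMIN_TOKEN: AgB3...             # asymmetric ciphertext, safe to commit
  template:
    metadata:
      name: vaultwarden-admin-token  # the plain Secret the controller will create
      namespace: vaultwarden
```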
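The MetalLB side of "no cloud dependency for ingress" typically needs only two resources: an address pool and an L2 advertisement. The range below is a placeholder, not the lab's actual reserved pool.

```yaml
# Placeholder pool — reserve the range outside the router's DHCP scope.
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: lan-pool
  namespace: metallb-system
spec:
  addresses:
    - 192.168.1.240-192.168.1.250
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: lan-l2
  namespace: metallb-system
spec:
  ipAddressPools:
    - lan-pool
```

Any `LoadBalancer` Service (Traefik included) then gets an IP from this pool, announced on the LAN via ARP.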
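The automated TLS flow (Let's Encrypt DNS-01 through the Cloudflare API) is usually expressed as a single `ClusterIssuer`; the email and secret names here are illustrative assumptions, and the referenced API-token `Secret` could itself be shipped as a SealedSecret.

```yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-dns
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: admin@example.com                 # placeholder contact address
    privateKeySecretRef:
      name: letsencrypt-account-key          # ACME account key, created by cert-manager
    solvers:
      - dns01:
          cloudflare:
            apiTokenSecretRef:
              name: cloudflare-api-token     # assumed Secret holding a scoped CF token
              key: api-token
```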