Compare commits


2 Commits

| Author | SHA1 | Message | Date |
|--------|------|---------|------|
| Tuan-Dat Tran | 3ac7d91101 | humanize README: intro, design notes, remove mechanical formatting | 2026-04-28 18:52:33 +02:00 |
| Tuan-Dat Tran | a187b648e7 | fix nginx LB -> kube-vip, mermaid labels, abstract VPN details | 2026-04-28 18:49:54 +02:00 |


@@ -1,6 +1,6 @@
# Homelab # Homelab
A production-grade homelab running on bare-metal Proxmox, with a 17-node Kubernetes cluster managed entirely through GitOps. 17-node Kubernetes cluster on five bare-metal Proxmox hosts, provisioned with Terraform and Ansible, managed through ArgoCD GitOps. Runs my home automation, media stack, photo backup, documents, and a few side projects.
![k3s](https://img.shields.io/badge/k3s-v1.34-orange?logo=kubernetes) ![k3s](https://img.shields.io/badge/k3s-v1.34-orange?logo=kubernetes)
![nodes](https://img.shields.io/badge/nodes-17-blue) ![nodes](https://img.shields.io/badge/nodes-17-blue)
@@ -14,46 +14,45 @@ A production-grade homelab running on bare-metal Proxmox, with a 17-node Kuberne
```mermaid ```mermaid
graph TB graph TB
subgraph ext[" External"] subgraph ext[" External"]
CF["Cloudflare\nCDN + DNS"] CF["Cloudflare CDN"]
Admin["Remote Admin"] Admin["Remote Admin"]
end end
subgraph vps["Edge VPS"] subgraph vps["Edge VPS"]
WG["WireGuard\nVPN Gateway"] WG["WireGuard VPN Gateway"]
TraefikVPS["Traefik\nReverse Proxy"] TraefikVPS["Traefik"]
Pangolin["Pangolin\nTunnel Server"] Pangolin["Pangolin Tunnel Server"]
end end
subgraph proxmox["Proxmox Cluster — 5 physical nodes"] subgraph proxmox["Proxmox Cluster — 5 physical nodes"]
subgraph cp["Control Plane ×3 (HA etcd)"] subgraph cp["Control Plane x3 HA etcd + kube-vip"]
S["k3s-server"] S["k3s-server"]
end end
LB["nginx\nLoad Balancer"] subgraph workers["Worker Nodes x14"]
subgraph workers["Worker Nodes ×14"]
W["k3s-agent"] W["k3s-agent"]
end end
DH["docker-host\nIntel QuickSync GPU"] DH["docker-host Intel QuickSync GPU"]
NFS["NFS Server\nDedicated storage node"] NFS["NFS Server — dedicated storage node"]
end end
subgraph k8s["Kubernetes"] subgraph k8s["Kubernetes"]
subgraph platform["Platform layer"] subgraph platform["Platform"]
direction LR direction LR
MetalLB["MetalLB"] MetalLB
Traefik["Traefik"] Traefik
Longhorn["Longhorn"] Longhorn
ArgoCD["ArgoCD"] ArgoCD
Prometheus["Prometheus\n+ Grafana"] Prometheus
ECK["Elastic Stack\n(ECK)"] ECK["Elastic Stack"]
Istio["Istio\nAmbient"] Istio["Istio Ambient"]
end end
subgraph apps["Applications"] subgraph apps["Applications"]
direction LR direction LR
Immich["Immich"] Immich
VW["Vaultwarden"] VW["Vaultwarden"]
HA["Home Assistant"] HA["Home Assistant"]
Media["Arr Stack\n+ Jellyfin"] Media["Arr Stack + Jellyfin"]
Other["Paperless · N8n\nNtfy · Gitea · …"] Other["Paperless, N8n, Ntfy ..."]
end end
end end
@@ -62,11 +61,10 @@ graph TB
CF -->|Cloudflare tunnel| k8s CF -->|Cloudflare tunnel| k8s
TraefikVPS --> Pangolin TraefikVPS --> Pangolin
Pangolin -->|Newt client| k8s Pangolin -->|Newt client| k8s
LB --> cp
cp --- workers cp --- workers
workers --- Longhorn workers --- Longhorn
NFS -->|NFS mount| Media NFS -->|NFS mount| Media
DH -->|Jellyfin\nDocker| Media DH -->|Docker| Media
``` ```
--- ---
@@ -78,16 +76,15 @@ graph TB
| Physical | `aya01` | Proxmox node + NFS server | Dedicated storage — no VMs | | Physical | `aya01` | Proxmox node + NFS server | Dedicated storage — no VMs |
| Physical | `lulu` | Proxmox node | k3s agents | | Physical | `lulu` | Proxmox node | k3s agents |
| Physical | `inko01` | Proxmox node | k3s server + agents + docker host | | Physical | `inko01` | Proxmox node | k3s server + agents + docker host |
| Physical | `naruto01` | Proxmox node | k3s server + agents + LB | | Physical | `naruto01` | Proxmox node | k3s server + agents |
| Physical | `mii01` | Proxmox node | k3s server + agents | | Physical | `mii01` | Proxmox node | k3s server + agents |
| VM | `k3s-server-{10,11,12}` | K3s control plane (HA etcd) | 2 vCPU · 4 GB RAM · 64 GB | | VM | `k3s-server-{10,11,12}` | K3s control plane (HA etcd + kube-vip VIP) | 2 vCPU · 4 GB RAM · 64 GB |
| VM | `k3s-agent-{10…23}` | K3s worker nodes ×14 | 2 vCPU · 4 GB RAM · 128 GB | | VM | `k3s-agent-{10…23}` | K3s worker nodes ×14 | 2 vCPU · 4 GB RAM · 128 GB |
| VM | `docker-host11` | Docker host w/ GPU passthrough | 2 vCPU · 4 GB RAM · 192 GB · Intel QuickSync | | VM | `docker-host11` | Docker host w/ GPU passthrough | 2 vCPU · 4 GB RAM · 192 GB · Intel QuickSync |
| VM | `k3s-loadbalancer` | nginx LB fronting control plane | 1 vCPU · 2 GB RAM |
| VM | `docker-lb` | Caddy reverse proxy (LAN only) | 1 vCPU · 2 GB RAM | | VM | `docker-lb` | Caddy reverse proxy (LAN only) | 1 vCPU · 2 GB RAM |
| VPS | `mii` | Edge node (Netcup) | WireGuard · Traefik · Pangolin | | VPS | `mii` | Edge node (Netcup) | WireGuard · Traefik · Pangolin |
All VMs run **Debian 12** on `virtio` network bridges, provisioned from cloud-init templates via **Terraform + Ansible**. All VMs run Debian 12 on `virtio` network bridges, provisioned from cloud-init templates via Terraform + Ansible.
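The 17-node figure follows from the VM naming ranges in the table above. A minimal sketch of that arithmetic, assuming the hostnames are the table's prefix followed by the number; `expand_range` is a hypothetical helper, not something from the repo:

```python
def expand_range(prefix: str, first: int, last: int) -> list[str]:
    """Return prefix + N for every N from first to last, inclusive."""
    return [f"{prefix}{n}" for n in range(first, last + 1)]


# Ranges taken from the table: k3s-server-{10,11,12} and k3s-agent-{10…23}.
servers = expand_range("k3s-server-", 10, 12)
agents = expand_range("k3s-agent-", 10, 23)
cluster_nodes = servers + agents

print(len(servers), len(agents), len(cluster_nodes))  # prints: 3 14 17
```

The docker host, `docker-lb`, and the edge VPS are not k3s nodes, so they are left out of the count.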
--- ---
@@ -97,6 +94,7 @@ All VMs run **Debian 12** on `virtio` network bridges, provisioned from cloud-in
|-----------|-------------|---------| |-----------|-------------|---------|
| **ArgoCD** | Helm (App-of-Apps) | GitOps CD — all cluster state driven from Git | | **ArgoCD** | Helm (App-of-Apps) | GitOps CD — all cluster state driven from Git |
| **ArgoCD Image Updater** | Helm | Watches registries, commits updated image tags back to Git | | **ArgoCD Image Updater** | Helm | Watches registries, commits updated image tags back to Git |
| **kube-vip** | DaemonSet on control plane | HA VIP for the K8s API server |
| **Traefik** | k3s built-in | Ingress controller, fronted by MetalLB | | **Traefik** | k3s built-in | Ingress controller, fronted by MetalLB |
| **MetalLB** | Helm (ArgoCD) | Bare-metal load balancer, assigns IPs from reserved pool | | **MetalLB** | Helm (ArgoCD) | Bare-metal load balancer, assigns IPs from reserved pool |
| **Cert-Manager** | Helm (ArgoCD) | Automated TLS via Let's Encrypt DNS-01 (Cloudflare API) | | **Cert-Manager** | Helm (ArgoCD) | Automated TLS via Let's Encrypt DNS-01 (Cloudflare API) |
@@ -130,44 +128,43 @@ All VMs run **Debian 12** on `virtio` network bridges, provisioned from cloud-in
| **Gitea Runner** | CI/CD runner | | | **Gitea Runner** | CI/CD runner | |
| **Zeroclaw** | Per-user instances (×3) via Kustomize overlays | | | **Zeroclaw** | Per-user instances (×3) via Kustomize overlays | |
| **Arr Stack** | Media automation suite | Prowlarr · Sonarr · Radarr · Unpackarr | | **Arr Stack** | Media automation suite | Prowlarr · Sonarr · Radarr · Unpackarr |
| **qBittorrent** | Torrent clients (×2) | Gluetun VPN sidecar · ProtonVPN | | **qBittorrent** | Torrent clients (×2) with VPN isolation | Gluetun sidecar |
| **Jellyfin** | Media server with hardware transcoding | Docker · Intel QuickSync | | **Jellyfin** | Media server with hardware transcoding | Docker · Intel QuickSync |
--- ---
## Key Design Decisions ## Design notes
**GitOps end-to-end.** Every cluster resource is declared in Git and applied by ArgoCD. Nothing is `kubectl apply`'d by hand. ArgoCD Image Updater closes the loop by writing image tag updates back to Git automatically. Everything goes through Git. ArgoCD owns the cluster state; nothing gets `kubectl apply`'d directly. ArgoCD Image Updater handles the image update loop: when a new tag appears in the registry, it commits the change back to Git and ArgoCD picks it up from there.
**Secrets in Git, safely.** Sealed Secrets lets encrypted `SealedSecret` manifests live in the same repo as everything else. Only the in-cluster controller can decrypt them. Secrets are committed to Git too, encrypted via Sealed Secrets. Only the in-cluster controller holds the decryption key.
**No cloud dependency for ingress.** MetalLB + Traefik handles all internal load balancing. External access goes through Cloudflare tunnels or a WireGuard VPN — no ports open on the home router. No ports are open on the home router. Internal load balancing goes through MetalLB + Traefik. External access uses Cloudflare tunnels or a WireGuard VPN routed through the edge VPS.
**Distributed storage without a SAN.** Longhorn replicates volumes across all 14 agent nodes. NFS on a dedicated bare-metal host serves the media library to Jellyfin with low latency. Longhorn handles block storage by replicating volumes across all 14 agent nodes. The media library lives on a dedicated NFS host instead — latency matters when Jellyfin is reading large video files, and NFS is simpler for that.
**Observability from day one.** Prometheus + Grafana for metrics, Elastic Stack (via ECK operator) for logs and fleet management. Elastic Agents run as a DaemonSet across the whole cluster. Metrics go to Prometheus + Grafana. Logs and fleet management go to Elastic Stack via the ECK operator, with Elastic Agents running as a DaemonSet so every node is covered.
**Provisioning is reproducible.** Proxmox VMs are created via Terraform (Proxmox provider), then configured by Ansible roles — from base OS hardening to k3s installation and kubeconfig management. All VMs are provisioned with Terraform and configured by Ansible. Rebuilding from scratch doesn't require remembering anything.
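The image update loop described above can be sketched roughly as follows. This is an illustration of the idea only, not how ArgoCD Image Updater is implemented: it assumes plain `major.minor.patch` tags and a "highest version wins" rule, and the function names (`parse_semver`, `newest_tag`, `reconcile`) and version numbers are made up for the example.

```python
import re


def parse_semver(tag: str):
    """Return (major, minor, patch) for tags like 'v1.2.3' or '1.2.3', else None."""
    m = re.fullmatch(r"v?(\d+)\.(\d+)\.(\d+)", tag)
    return tuple(int(p) for p in m.groups()) if m else None


def newest_tag(current: str, registry_tags: list[str]) -> str:
    """Pick the highest semver tag; keep the current one if nothing is newer."""
    best, best_version = current, parse_semver(current)
    for tag in registry_tags:
        version = parse_semver(tag)
        if version is None:
            continue  # skip 'latest' and other non-semver tags
        if best_version is None or version > best_version:
            best, best_version = tag, version
    return best


def reconcile(git_state: dict, registry: dict) -> list[str]:
    """Update git_state in place; return one commit message per changed image."""
    commits = []
    for image, current in git_state.items():
        candidate = newest_tag(current, registry.get(image, []))
        if candidate != current:
            git_state[image] = candidate  # stands in for the commit back to Git
            commits.append(f"update {image}: {current} -> {candidate}")
    return commits


# Hypothetical tags; only the image names come from the application list.
git_state = {"immich": "v1.2.0", "vaultwarden": "1.30.1"}
registry = {"immich": ["v1.2.0", "v1.3.0", "latest"], "vaultwarden": ["1.30.1"]}
print(reconcile(git_state, registry))  # prints: ['update immich: v1.2.0 -> v1.3.0']
```

Git stays the single source of truth here: the updater only ever writes to `git_state`, and applying that change to the cluster is left to ArgoCD's normal sync.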
--- ---
## Repository Layout ## Repo layout
``` ```
ansible-homelab/ # Ansible roles + playbooks for all VM provisioning ansible-homelab/
├── roles/ ├── roles/
│ ├── common/ # Base OS config, SSH hardening, node-exporter │ ├── common/ # base OS config, SSH hardening, node-exporter
│ ├── k3s_server/ # HA control plane install + taint config │ ├── k3s_server/ # control plane install + NoSchedule taint
│ ├── k3s_agent/ # Worker node install │ ├── k3s_agent/ # worker node install
│ ├── k3s_loadbalancer/ # nginx LB config │ ├── kube_vip/ # kube-vip DaemonSet + TLS SAN config
│ ├── kube_vip/ # VIP setup │ ├── docker_host/ # Docker + Intel QuickSync GPU passthrough
│ ├── docker_host/ # Docker + GPU passthrough │ ├── proxmox/ # Proxmox node setup
│ ├── proxmox/ # Proxmox node config │ └── edge_vps/ # VPS: WireGuard, Traefik, Pangolin, Elastic Agent
│ └── edge_vps/ # VPS services (WireGuard, Traefik, Pangolin) └── playbooks/
└── playbooks/ # Top-level playbooks per host group
argocd-homelab/ # All Kubernetes manifests (ArgoCD App-of-Apps) argocd-homelab/
├── infrastructure/ # Platform: MetalLB, Longhorn, Cert-Manager, ECK, ├── infrastructure/ # MetalLB, Longhorn, Cert-Manager, ECK, Istio, ...
├── services/ # Applications: Immich, Vaultwarden, arr-stack, ├── services/ # Immich, Vaultwarden, arr-stack, Home Assistant, ...
└── cluster-apps/ # ArgoCD ApplicationSets + root app └── cluster-apps/ # ArgoCD App-of-Apps root + ApplicationSets
``` ```