# Homelab
17-node Kubernetes cluster on five bare-metal Proxmox hosts, provisioned with Terraform and Ansible, managed through ArgoCD GitOps. Runs my home automation, media stack, photo backup, documents, and a few side projects.

---
## Architecture

```mermaid
graph TB
    subgraph ext["External"]
        CF["Cloudflare CDN"]
        Admin["Remote Admin"]
    end

    subgraph vps["Edge VPS"]
        WG["WireGuard VPN Gateway"]
        TraefikVPS["Traefik"]
        Pangolin["Pangolin Tunnel Server"]
    end

    subgraph proxmox["Proxmox Cluster — 5 physical nodes"]
        subgraph cp["Control Plane x3 — HA etcd + kube-vip"]
            S["k3s-server"]
        end
        subgraph workers["Worker Nodes x14"]
            W["k3s-agent"]
        end
        DH["docker-host — Intel QuickSync GPU"]
        NFS["NFS Server — dedicated storage node"]
    end

    subgraph k8s["Kubernetes"]
        subgraph platform["Platform"]
            direction LR
            MetalLB
            Traefik
            Longhorn
            ArgoCD
            Prometheus
            ECK["Elastic Stack"]
            Istio["Istio Ambient"]
        end
        subgraph apps["Applications"]
            direction LR
            Immich
            VW["Vaultwarden"]
            HA["Home Assistant"]
            Media["Arr Stack + Jellyfin"]
            Other["Paperless, N8n, Ntfy ..."]
        end
    end

    Admin -->|WireGuard VPN| WG
    WG -->|tunnel| k8s
    CF -->|Cloudflare tunnel| k8s
    TraefikVPS --> Pangolin
    Pangolin -->|Newt client| k8s
    cp --- workers
    workers --- Longhorn
    NFS -->|NFS mount| Media
    DH -->|Docker| Media
```
---

## Hardware
| Layer | Host | Role | Resources |
|-------|------|------|-----------|
| Physical | `aya01` | Proxmox node + NFS server | Dedicated storage — no VMs |
| Physical | `lulu` | Proxmox node | k3s agents |
| Physical | `inko01` | Proxmox node | k3s server + agents + docker host |
| Physical | `naruto01` | Proxmox node | k3s server + agents |
| Physical | `mii01` | Proxmox node | k3s server + agents |
| VM | `k3s-server-{10,11,12}` | K3s control plane (HA etcd + kube-vip VIP) | 2 vCPU · 4 GB RAM · 64 GB |
| VM | `k3s-agent-{10…23}` | K3s worker nodes ×14 | 2 vCPU · 4 GB RAM · 128 GB |
| VM | `docker-host11` | Docker host w/ GPU passthrough | 2 vCPU · 4 GB RAM · 192 GB · Intel QuickSync |
| VM | `docker-lb` | Caddy reverse proxy (LAN only) | 1 vCPU · 2 GB RAM |
| VPS | `mii` | Edge node (Netcup) | WireGuard · Traefik · Pangolin |
All VMs run Debian 12 on `virtio` network bridges, provisioned from cloud-init templates via Terraform + Ansible.
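The cloud-init side of those templates is small. A sketch of the user-data, where the hostname, user name, SSH key, and package list are illustrative placeholders rather than the real values:

```yaml
#cloud-config
# Hypothetical cloud-init user-data for a k3s VM template.
# Hostname, user, key, and packages are placeholders.
hostname: k3s-agent-10
users:
  - name: admin
    groups: [sudo]
    shell: /bin/bash
    ssh_authorized_keys:
      - ssh-ed25519 AAAA... admin@workstation
package_update: true
packages:
  - qemu-guest-agent   # lets Proxmox/Terraform query the VM's IP
  - nfs-common         # needed to mount the media NFS export
runcmd:
  - systemctl enable --now qemu-guest-agent
```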
---

## Platform Stack
| Component | How deployed | Purpose |
|-----------|-------------|---------|
| **ArgoCD** | Helm (App-of-Apps) | GitOps CD — all cluster state driven from Git |
| **ArgoCD Image Updater** | Helm | Watches registries, commits updated image tags back to Git |
| **kube-vip** | DaemonSet on control plane | HA VIP for the K8s API server |
| **Traefik** | k3s built-in | Ingress controller, fronted by MetalLB |
| **MetalLB** | Helm (ArgoCD) | Bare-metal load balancer, assigns IPs from reserved pool |
| **Cert-Manager** | Helm (ArgoCD) | Automated TLS via Let's Encrypt DNS-01 (Cloudflare API) |
| **Sealed Secrets** | Helm (ArgoCD) | Encrypts secrets for safe storage in Git |
| **Longhorn** | Helm (ArgoCD) | Distributed block storage (RWO + RWX) across all 14 agents |
| **CloudNativePG** | Operator (ArgoCD) | HA PostgreSQL — used by Immich |
| **Elastic Stack (ECK)** | Operator (ArgoCD) | Elasticsearch + Kibana + Fleet + Elastic Agents for observability |
| **Kube-Prometheus-Stack** | Helm (ArgoCD) | Prometheus + Grafana monitoring |
| **Goldilocks + VPA** | Helm (ArgoCD) | Resource usage analysis and request/limit rightsizing |
| **Istio (Ambient)** | Helm (ArgoCD) | Service mesh — ztunnel DaemonSet on all nodes (L4); no Waypoint proxies yet |
| **K3s Upgrade Controller** | Operator (ArgoCD) | Automated rolling K3s version upgrades |
| **mii-wireguard** | Manifest (ArgoCD) | WireGuard pod — connects cluster to edge VPS, masquerades service CIDR |
| **Newt** | Deployment (ArgoCD) | Pangolin tunnel client for VPS-proxied services |
| **Cloudflared** | Deployment ×2 (ArgoCD) | Cloudflare tunnel — exposes selected services to the internet |
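MetalLB's part of this amounts to an address pool plus an L2 advertisement. A sketch, where the pool name and address range are placeholders for the reserved LAN range:

```yaml
# Hypothetical MetalLB config; the address range is a placeholder.
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: lan-pool
  namespace: metallb-system
spec:
  addresses:
    - 192.168.1.240-192.168.1.250   # reserved outside the router's DHCP range
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: lan-l2
  namespace: metallb-system
spec:
  ipAddressPools:
    - lan-pool   # answer ARP for Service IPs from this pool
```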
---

## Applications
| Service | Description | Notable tech |
|---------|-------------|--------------|
| **Immich** | Photo & video backup (self-hosted Google Photos) | CloudNativePG · Redis · ML pod |
| **Vaultwarden** | Bitwarden-compatible password manager | – |
| **Paperless-ngx** | Document management + OCR | – |
| **Home Assistant** | Home automation hub | – |
| **N8n** | Workflow automation | – |
| **Ntfy** | Self-hosted push notifications | – |
| **Stirling PDF** | PDF tools | – |
| **Karakeep** | Bookmark manager | – |
| **Gitea** | Self-hosted Git (source of truth for ArgoCD) | Docker on docker-host11 |
| **Gitea Runner** | CI/CD runner | – |
| **Zeroclaw** | Per-user instances (×3) via Kustomize overlays | – |
| **Arr Stack** | Media automation suite | Prowlarr · Sonarr · Radarr · Unpackerr |
| **Download clients** | VPN-isolated download clients (×2) | Gluetun sidecar |
| **Jellyfin** | Media server with hardware transcoding | Docker · Intel QuickSync |
---

## Design notes
Everything goes through Git. ArgoCD owns the cluster state; nothing gets `kubectl apply`'d directly. ArgoCD Image Updater handles the image update loop: when a new tag appears in the registry, it commits the change back to Git and ArgoCD picks it up from there.

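The pattern per app is a child `Application` carrying Image Updater annotations. A sketch, assuming a Gitea-hosted repo; the app name, repo URL, path, and image are placeholders:

```yaml
# Hypothetical child Application; name, repo, path, and image are placeholders.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: immich
  namespace: argocd
  annotations:
    # Image Updater watches this image and writes new tags back to Git.
    argocd-image-updater.argoproj.io/image-list: immich=ghcr.io/immich-app/immich-server
    argocd-image-updater.argoproj.io/write-back-method: git
spec:
  project: default
  source:
    repoURL: https://git.example.com/homelab/argocd-homelab.git
    path: services/immich
    targetRevision: main
  destination:
    server: https://kubernetes.default.svc
    namespace: immich
  syncPolicy:
    automated:
      prune: true      # delete resources removed from Git
      selfHeal: true   # revert out-of-band changes
```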
Secrets are committed to Git too, encrypted via Sealed Secrets. Only the in-cluster controller holds the decryption key.

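What ends up in Git looks roughly like this: the ciphertext below is a placeholder for the opaque blob that `kubeseal` produces against the controller's public key, and the names are illustrative:

```yaml
# Shape of a SealedSecret as stored in Git; names and ciphertext are placeholders.
apiVersion: bitnami.com/v1alpha1
kind: SealedSecret
metadata:
  name: vaultwarden-admin
  namespace: vaultwarden
spec:
  encryptedData:
    ADMIN_TOKEN: AgB3...   # only the in-cluster controller can decrypt this
  template:
    metadata:
      name: vaultwarden-admin   # the plain Secret the controller unseals
      namespace: vaultwarden
```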
No ports are open on the home router. Internal load balancing goes through MetalLB + Traefik. External access uses Cloudflare tunnels or a WireGuard VPN routed through the edge VPS.

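Internally, Traefik routes by hostname via its `IngressRoute` CRD. A sketch, assuming Traefik v3's `traefik.io` API group (older versions use `traefik.containo.us`); host, service, and secret names are placeholders:

```yaml
# Hypothetical internal route; host, service, and TLS secret are placeholders.
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: immich
  namespace: immich
spec:
  entryPoints:
    - websecure   # Traefik's HTTPS entrypoint
  routes:
    - match: Host(`photos.example.com`)
      kind: Rule
      services:
        - name: immich-server
          port: 2283
  tls:
    secretName: photos-example-com-tls   # issued by Cert-Manager via DNS-01
```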
Longhorn handles block storage by replicating volumes across all 14 agent nodes. The media library lives on a dedicated NFS host instead — latency matters when Jellyfin is reading large video files, and NFS is simpler for that.

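The NFS side is a static PersistentVolume bound to a claim, bypassing any storage class. A sketch; the server address, export path, and sizes are placeholders:

```yaml
# Hypothetical static NFS volume for the media library; server, path, and size are placeholders.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: media-nfs
spec:
  capacity:
    storage: 10Ti
  accessModes: [ReadWriteMany]
  persistentVolumeReclaimPolicy: Retain
  nfs:
    server: 192.168.1.10      # the dedicated NFS node
    path: /export/media
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: media
  namespace: media
spec:
  accessModes: [ReadWriteMany]
  storageClassName: ""        # skip dynamic provisioning (i.e. Longhorn)
  volumeName: media-nfs       # bind to the static PV above
  resources:
    requests:
      storage: 10Ti
```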
Metrics go to Prometheus + Grafana. Logs and fleet management go to Elastic Stack via the ECK operator, with Elastic Agents running as a DaemonSet so every node is covered.

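Getting an app scraped by Kube-Prometheus-Stack is one `ServiceMonitor` per service. A sketch; the labels and port name are placeholders for whatever the app's Service exposes:

```yaml
# Hypothetical scrape config; selector labels and port name are placeholders.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: immich
  namespace: monitoring
spec:
  namespaceSelector:
    matchNames: [immich]
  selector:
    matchLabels:
      app: immich-server   # must match the target Service's labels
  endpoints:
    - port: metrics        # named port on the Service
      interval: 30s
```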
All VMs are provisioned with Terraform and configured by Ansible, so a from-scratch rebuild requires no undocumented steps.

---
## Repo layout
```
ansible-homelab/
├── roles/
│   ├── common/       # base OS config, SSH hardening, node-exporter
│   ├── k3s_server/   # control plane install + NoSchedule taint
│   ├── k3s_agent/    # worker node install
│   ├── kube_vip/     # kube-vip DaemonSet + TLS SAN config
│   ├── docker_host/  # Docker + Intel QuickSync GPU passthrough
│   ├── proxmox/      # Proxmox node setup
│   └── edge_vps/     # VPS: WireGuard, Traefik, Pangolin, Elastic Agent
└── playbooks/

argocd-homelab/
├── infrastructure/   # MetalLB, Longhorn, Cert-Manager, ECK, Istio, ...
├── services/         # Immich, Vaultwarden, arr-stack, Home Assistant, ...
└── cluster-apps/     # ArgoCD App-of-Apps root + ApplicationSets
```