commit 8b7554630537f2286c6887bca0e2f6dcd5eb0a6b
Author: Tuan-Dat Tran
Date:   Tue Apr 28 00:36:53 2026 +0200

    init

    Signed-off-by: Tuan-Dat Tran

diff --git a/README.md b/README.md
new file mode 100644
index 0000000..2ed4cd4
--- /dev/null
+++ b/README.md
@@ -0,0 +1,173 @@
+# Homelab
+
+A production-grade homelab running on bare-metal Proxmox, with a 17-node Kubernetes cluster managed entirely through GitOps.
+
+![k3s](https://img.shields.io/badge/k3s-v1.34-orange?logo=kubernetes)
+![nodes](https://img.shields.io/badge/nodes-17-blue)
+![ArgoCD](https://img.shields.io/badge/GitOps-ArgoCD-red?logo=argo)
+![Ansible](https://img.shields.io/badge/provisioned-Ansible-black?logo=ansible)
+
+---
+
+## Architecture
+
+```mermaid
+graph TB
+    subgraph ext[" External"]
+        CF["Cloudflare\nCDN + DNS"]
+        Admin["Remote Admin"]
+    end
+
+    subgraph vps["Edge VPS"]
+        WG["WireGuard\nVPN Gateway"]
+        TraefikVPS["Traefik\nReverse Proxy"]
+        Pangolin["Pangolin\nTunnel Server"]
+    end
+
+    subgraph proxmox["Proxmox Cluster — 5 physical nodes"]
+        subgraph cp["Control Plane ×3 (HA etcd)"]
+            S["k3s-server"]
+        end
+        LB["nginx\nLoad Balancer"]
+        subgraph workers["Worker Nodes ×14"]
+            W["k3s-agent"]
+        end
+        DH["docker-host\nIntel QuickSync GPU"]
+        NFS["NFS Server\nDedicated storage node"]
+    end
+
+    subgraph k8s["Kubernetes"]
+        subgraph platform["Platform layer"]
+            direction LR
+            MetalLB["MetalLB"]
+            Traefik["Traefik"]
+            Longhorn["Longhorn"]
+            ArgoCD["ArgoCD"]
+            Prometheus["Prometheus\n+ Grafana"]
+            ECK["Elastic Stack\n(ECK)"]
+            Istio["Istio\nAmbient"]
+        end
+        subgraph apps["Applications"]
+            direction LR
+            Immich["Immich"]
+            VW["Vaultwarden"]
+            HA["Home Assistant"]
+            Media["Arr Stack\n+ Jellyfin"]
+            Other["Paperless · N8n\nNtfy · Gitea · …"]
+        end
+    end
+
+    Admin -->|WireGuard VPN| WG
+    WG -->|tunnel| k8s
+    CF -->|Cloudflare tunnel| k8s
+    TraefikVPS --> Pangolin
+    Pangolin -->|Newt client| k8s
+    LB --> cp
+    cp --- workers
+    workers --- Longhorn
+    NFS -->|NFS mount| Media
+    DH -->|Jellyfin\nDocker| Media
+```
+
+---
+
+## Hardware
+
+| Layer | Host | Role | Resources |
+|-------|------|------|-----------|
+| Physical | `aya01` | Proxmox node + NFS server | Dedicated storage — no VMs |
+| Physical | `lulu` | Proxmox node | k3s agents |
+| Physical | `inko01` | Proxmox node | k3s server + agents + docker host |
+| Physical | `naruto01` | Proxmox node | k3s server + agents + LB |
+| Physical | `mii01` | Proxmox node | k3s server + agents |
+| VM | `k3s-server-{10,11,12}` | K3s control plane (HA etcd) | 2 vCPU · 4 GB RAM · 64 GB |
+| VM | `k3s-agent-{10…23}` | K3s worker nodes ×14 | 2 vCPU · 4 GB RAM · 128 GB |
+| VM | `docker-host11` | Docker host w/ GPU passthrough | 2 vCPU · 4 GB RAM · 192 GB · Intel QuickSync |
+| VM | `k3s-loadbalancer` | nginx LB fronting control plane | 1 vCPU · 2 GB RAM |
+| VM | `docker-lb` | Caddy reverse proxy (LAN only) | 1 vCPU · 2 GB RAM |
+| VPS | `mii` | Edge node (Netcup) | WireGuard · Traefik · Pangolin |
+
+All VMs run **Debian 12** on `virtio` network bridges, provisioned from cloud-init templates via **Terraform + Ansible**.
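As a quick sanity check on the hardware table, the sketch below expands the two Kubernetes VM rows into hostnames and sums their resources. It assumes that `k3s-agent-{10…23}` denotes a contiguous, inclusive range of suffixes (which gives the 14 agents the table states). The `VmGroup` class and helper names are hypothetical and exist only for this illustration; they are not part of the repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VmGroup:
    """One row of the hardware table (hypothetical helper type)."""
    prefix: str   # hostname prefix, e.g. "k3s-agent-"
    first: int    # first numeric suffix (inclusive)
    last: int     # last numeric suffix (inclusive, assumed contiguous)
    vcpu: int     # vCPUs per VM
    ram_gb: int   # RAM per VM in GB

    @property
    def count(self) -> int:
        return self.last - self.first + 1

    def hostnames(self) -> list[str]:
        return [f"{self.prefix}{n}" for n in range(self.first, self.last + 1)]


# Values taken from the table: 2 vCPU and 4 GB RAM per k3s VM.
SERVERS = VmGroup("k3s-server-", 10, 12, vcpu=2, ram_gb=4)
AGENTS = VmGroup("k3s-agent-", 10, 23, vcpu=2, ram_gb=4)


def cluster_nodes() -> list[str]:
    """All Kubernetes node hostnames: control plane first, then workers."""
    return SERVERS.hostnames() + AGENTS.hostnames()


def totals(groups: list[VmGroup]) -> tuple[int, int]:
    """Return (total vCPU, total RAM in GB) across the given groups."""
    vcpu = sum(g.vcpu * g.count for g in groups)
    ram_gb = sum(g.ram_gb * g.count for g in groups)
    return vcpu, ram_gb


if __name__ == "__main__":
    nodes = cluster_nodes()
    vcpu, ram_gb = totals([SERVERS, AGENTS])
    print(f"{len(nodes)} nodes, {vcpu} vCPU, {ram_gb} GB RAM")
```

Under these assumptions the three servers and fourteen agents add up to the 17 nodes shown in the badge. The load balancer, Docker host and `docker-lb` VMs are left out because they are not cluster nodes.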
+
+---
+
+## Platform Stack
+
+| Component | How deployed | Purpose |
+|-----------|-------------|---------|
+| **ArgoCD** | Helm (App-of-Apps) | GitOps CD — all cluster state driven from Git |
+| **ArgoCD Image Updater** | Helm | Watches registries, commits updated image tags back to Git |
+| **Traefik** | k3s built-in | Ingress controller, fronted by MetalLB |
+| **MetalLB** | Helm (ArgoCD) | Bare-metal load balancer, assigns IPs from reserved pool |
+| **Cert-Manager** | Helm (ArgoCD) | Automated TLS via Let's Encrypt DNS-01 (Cloudflare API) |
+| **Sealed Secrets** | Helm (ArgoCD) | Encrypts secrets for safe storage in Git |
+| **Longhorn** | Helm (ArgoCD) | Distributed block storage (RWO + RWX) across all 14 agents |
+| **CloudNativePG** | Operator (ArgoCD) | HA PostgreSQL — used by Immich |
+| **Elastic Stack (ECK)** | Operator (ArgoCD) | Elasticsearch + Kibana + Fleet + Elastic Agents for observability |
+| **Kube-Prometheus-Stack** | Helm (ArgoCD) | Prometheus + Grafana monitoring |
+| **Goldilocks + VPA** | Helm (ArgoCD) | Resource usage analysis and request/limit rightsizing |
+| **Istio (Ambient)** | Helm (ArgoCD) | Service mesh — ztunnel DaemonSet on all nodes (L4); no Waypoint proxies yet |
+| **K3s Upgrade Controller** | Operator (ArgoCD) | Automated rolling K3s version upgrades |
+| **mii-wireguard** | Manifest (ArgoCD) | WireGuard pod — connects cluster to edge VPS, masquerades service CIDR |
+| **Newt** | Deployment (ArgoCD) | Pangolin tunnel client for VPS-proxied services |
+| **Cloudflared** | Deployment ×2 (ArgoCD) | Cloudflare tunnel — exposes selected services to the internet |
+
+---
+
+## Applications
+
+| Service | Description | Notable tech |
+|---------|-------------|--------------|
+| **Immich** | Photo & video backup (self-hosted Google Photos) | CloudNativePG · Redis · ML pod |
+| **Vaultwarden** | Bitwarden-compatible password manager | – |
+| **Paperless-ngx** | Document management + OCR | – |
+| **Home Assistant** | Home automation hub | – |
+| **N8n** | Workflow automation | – |
+| **Ntfy** | Self-hosted push notifications | – |
+| **Stirling PDF** | PDF tools | – |
+| **Karakeep** | Bookmark manager | – |
+| **Gitea** | Self-hosted Git (source of truth for ArgoCD) | Docker on docker-host11 |
+| **Gitea Runner** | CI/CD runner | – |
+| **Zeroclaw** | Per-user instances (×3) via Kustomize overlays | – |
+| **Arr Stack** | Media automation suite | Prowlarr · Sonarr · Radarr · Unpackarr |
+| **qBittorrent** | Torrent clients (×2) | Gluetun VPN sidecar · ProtonVPN |
+| **Jellyfin** | Media server with hardware transcoding | Docker · Intel QuickSync |
+
+---
+
+## Key Design Decisions
+
+**GitOps end-to-end.** Every cluster resource is declared in Git and applied by ArgoCD. Nothing is `kubectl apply`'d by hand. ArgoCD Image Updater closes the loop by writing image tag updates back to Git automatically.
+
+**Secrets in Git, safely.** Sealed Secrets lets encrypted `SealedSecret` manifests live in the same repo as everything else. Only the in-cluster controller can decrypt them.
+
+**No cloud dependency for ingress.** MetalLB + Traefik handles all internal load balancing. External access goes through Cloudflare tunnels or a WireGuard VPN — no ports open on the home router.
+
+**Distributed storage without a SAN.** Longhorn replicates volumes across all 14 agent nodes. NFS on a dedicated bare-metal host serves the media library to Jellyfin with low latency.
+
+**Observability from day one.** Prometheus + Grafana for metrics, Elastic Stack (via ECK operator) for logs and fleet management. Elastic Agents run as a DaemonSet across the whole cluster.
+
+**Provisioning is reproducible.** Proxmox VMs are created via Terraform (Proxmox provider), then configured by Ansible roles — from base OS hardening to k3s installation and kubeconfig management.
+
+---
+
+## Repository Layout
+
+```
+ansible-homelab/          # Ansible roles + playbooks for all VM provisioning
+├── roles/
+│   ├── common/           # Base OS config, SSH hardening, node-exporter
+│   ├── k3s_server/       # HA control plane install + taint config
+│   ├── k3s_agent/        # Worker node install
+│   ├── k3s_loadbalancer/ # nginx LB config
+│   ├── kube_vip/         # VIP setup
+│   ├── docker_host/      # Docker + GPU passthrough
+│   ├── proxmox/          # Proxmox node config
+│   └── edge_vps/         # VPS services (WireGuard, Traefik, Pangolin)
+└── playbooks/            # Top-level playbooks per host group
+
+argocd-homelab/           # All Kubernetes manifests (ArgoCD App-of-Apps)
+├── infrastructure/       # Platform: MetalLB, Longhorn, Cert-Manager, ECK, …
+├── services/             # Applications: Immich, Vaultwarden, arr-stack, …
+└── cluster-apps/         # ArgoCD ApplicationSets + root app
+```
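To illustrate what the "root app" in `cluster-apps/` might look like, here is a minimal sketch that builds an Argo CD `Application` manifest as a Python dictionary and prints it as JSON (which is also valid YAML). The README does not show the real manifest, so several values are assumptions: the repository URL is a placeholder, the application name `root`, the `argocd` namespace and the `default` project are guesses, and automated sync with pruning and self-healing is only one plausible reading of "nothing is applied by hand".

```python
import json


def root_application(repo_url: str,
                     path: str = "cluster-apps",
                     revision: str = "HEAD") -> dict:
    """Build an Argo CD Application pointing at the App-of-Apps directory.

    repo_url is a placeholder supplied by the caller; the README does not
    state the actual Gitea URL.
    """
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Application",
        "metadata": {
            "name": "root",          # assumed name
            "namespace": "argocd",   # assumed Argo CD install namespace
        },
        "spec": {
            "project": "default",    # assumed project
            "source": {
                "repoURL": repo_url,
                "path": path,        # directory taken from the layout above
                "targetRevision": revision,
            },
            "destination": {
                # Standard in-cluster API server address used by Argo CD.
                "server": "https://kubernetes.default.svc",
                "namespace": "argocd",
            },
            # Assumed policy: sync automatically, remove resources deleted
            # from Git, and revert manual drift.
            "syncPolicy": {"automated": {"prune": True, "selfHeal": True}},
        },
    }


if __name__ == "__main__":
    # Hypothetical URL, for illustration only.
    app = root_application("https://gitea.example.invalid/homelab/argocd-homelab.git")
    print(json.dumps(app, indent=2))
```

In an App-of-Apps setup, a root application like this is typically the only manifest applied manually during bootstrap. Everything under `infrastructure/` and `services/` would then be discovered through the ApplicationSets it points at.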