Signed-off-by: Tuan-Dat Tran <tuan-dat.tran@tudattr.dev>
Tuan-Dat Tran
2026-04-28 00:36:53 +02:00
commit 8b75546305

README.md Normal file

@@ -0,0 +1,173 @@
# Homelab
A production-grade homelab running on bare-metal Proxmox, with a 17-node Kubernetes cluster managed entirely through GitOps.
![k3s](https://img.shields.io/badge/k3s-v1.34-orange?logo=kubernetes)
![nodes](https://img.shields.io/badge/nodes-17-blue)
![ArgoCD](https://img.shields.io/badge/GitOps-ArgoCD-red?logo=argo)
![Ansible](https://img.shields.io/badge/provisioned-Ansible-black?logo=ansible)
---
## Architecture
```mermaid
graph TB
subgraph ext["External"]
CF["Cloudflare\nCDN + DNS"]
Admin["Remote Admin"]
end
subgraph vps["Edge VPS"]
WG["WireGuard\nVPN Gateway"]
TraefikVPS["Traefik\nReverse Proxy"]
Pangolin["Pangolin\nTunnel Server"]
end
subgraph proxmox["Proxmox Cluster — 5 physical nodes"]
subgraph cp["Control Plane ×3 (HA etcd)"]
S["k3s-server"]
end
LB["nginx\nLoad Balancer"]
subgraph workers["Worker Nodes ×14"]
W["k3s-agent"]
end
DH["docker-host\nIntel QuickSync GPU"]
NFS["NFS Server\nDedicated storage node"]
end
subgraph k8s["Kubernetes"]
subgraph platform["Platform layer"]
direction LR
MetalLB["MetalLB"]
Traefik["Traefik"]
Longhorn["Longhorn"]
ArgoCD["ArgoCD"]
Prometheus["Prometheus\n+ Grafana"]
ECK["Elastic Stack\n(ECK)"]
Istio["Istio\nAmbient"]
end
subgraph apps["Applications"]
direction LR
Immich["Immich"]
VW["Vaultwarden"]
HA["Home Assistant"]
Media["Arr Stack\n+ Jellyfin"]
Other["Paperless · N8n\nNtfy · Gitea · …"]
end
end
Admin -->|WireGuard VPN| WG
WG -->|tunnel| k8s
CF -->|Cloudflare tunnel| k8s
TraefikVPS --> Pangolin
Pangolin -->|Newt client| k8s
LB --> cp
cp --- workers
workers --- Longhorn
NFS -->|NFS mount| Media
DH -->|Jellyfin\nDocker| Media
```
---
## Hardware
| Layer | Host | Role | Resources |
|-------|------|------|-----------|
| Physical | `aya01` | Proxmox node + NFS server | Dedicated storage — no VMs |
| Physical | `lulu` | Proxmox node | k3s agents |
| Physical | `inko01` | Proxmox node | k3s server + agents + docker host |
| Physical | `naruto01` | Proxmox node | k3s server + agents + LB |
| Physical | `mii01` | Proxmox node | k3s server + agents |
| VM | `k3s-server-{10,11,12}` | k3s control plane (HA etcd) | 2 vCPU · 4 GB RAM · 64 GB disk |
| VM | `k3s-agent-{10…23}` | k3s worker nodes ×14 | 2 vCPU · 4 GB RAM · 128 GB disk |
| VM | `docker-host11` | Docker host w/ GPU passthrough | 2 vCPU · 4 GB RAM · 192 GB disk · Intel QuickSync |
| VM | `k3s-loadbalancer` | nginx LB fronting control plane | 1 vCPU · 2 GB RAM |
| VM | `docker-lb` | Caddy reverse proxy (LAN only) | 1 vCPU · 2 GB RAM |
| VPS | `mii` | Edge node (Netcup) | WireGuard · Traefik · Pangolin |
All VMs run **Debian 12** with `virtio` network devices attached to Proxmox bridges and are provisioned from cloud-init templates via **Terraform + Ansible**.
---
## Platform Stack
| Component | How deployed | Purpose |
|-----------|-------------|---------|
| **ArgoCD** | Helm (App-of-Apps) | GitOps CD — all cluster state driven from Git |
| **ArgoCD Image Updater** | Helm | Watches registries, commits updated image tags back to Git |
| **Traefik** | k3s built-in | Ingress controller, fronted by MetalLB |
| **MetalLB** | Helm (ArgoCD) | Bare-metal load balancer, assigns IPs from reserved pool |
| **Cert-Manager** | Helm (ArgoCD) | Automated TLS via Let's Encrypt DNS-01 (Cloudflare API) |
| **Sealed Secrets** | Helm (ArgoCD) | Encrypts secrets for safe storage in Git |
| **Longhorn** | Helm (ArgoCD) | Distributed block storage (RWO + RWX) across all 14 agents |
| **CloudNativePG** | Operator (ArgoCD) | HA PostgreSQL — used by Immich |
| **Elastic Stack (ECK)** | Operator (ArgoCD) | Elasticsearch + Kibana + Fleet + Elastic Agents for observability |
| **Kube-Prometheus-Stack** | Helm (ArgoCD) | Prometheus + Grafana monitoring |
| **Goldilocks + VPA** | Helm (ArgoCD) | Resource usage analysis and request/limit rightsizing |
| **Istio (Ambient)** | Helm (ArgoCD) | Service mesh — ztunnel DaemonSet on all nodes (L4); no Waypoint proxies yet |
| **K3s Upgrade Controller** | Operator (ArgoCD) | Automated rolling K3s version upgrades |
| **mii-wireguard** | Manifest (ArgoCD) | WireGuard pod — connects cluster to edge VPS, masquerades service CIDR |
| **Newt** | Deployment (ArgoCD) | Pangolin tunnel client for VPS-proxied services |
| **Cloudflared** | Deployment ×2 (ArgoCD) | Cloudflare tunnel — exposes selected services to the internet |
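
To illustrate the Cert-Manager row, a DNS-01 issuer backed by the Cloudflare API might look roughly like the following. This is a minimal sketch, not the manifest from this repository: the issuer name, e-mail address, and Secret name/key are placeholders.

```yaml
# Sketch only: names, e-mail and Secret reference are assumptions.
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-dns01            # hypothetical issuer name
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: admin@example.com         # placeholder contact address
    privateKeySecretRef:
      name: letsencrypt-dns01-account-key
    solvers:
      - dns01:
          cloudflare:
            apiTokenSecretRef:
              name: cloudflare-api-token   # hypothetical Secret holding the API token
              key: api-token
```

Because validation happens through DNS records, certificates can be issued for services that are never reachable from the public internet.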
---
## Applications
| Service | Description | Notable tech |
|---------|-------------|--------------|
| **Immich** | Photo & video backup (self-hosted Google Photos) | CloudNativePG · Redis · ML pod |
| **Vaultwarden** | Bitwarden-compatible password manager | |
| **Paperless-ngx** | Document management + OCR | |
| **Home Assistant** | Home automation hub | |
| **N8n** | Workflow automation | |
| **Ntfy** | Self-hosted push notifications | |
| **Stirling PDF** | PDF tools | |
| **Karakeep** | Bookmark manager | |
| **Gitea** | Self-hosted Git (source of truth for ArgoCD) | Docker on docker-host11 |
| **Gitea Runner** | CI/CD runner | |
| **Zeroclaw** | Per-user instances (×3) via Kustomize overlays | |
| **Arr Stack** | Media automation suite | Prowlarr · Sonarr · Radarr · Unpackerr |
| **qBittorrent** | Torrent clients (×2) | Gluetun VPN sidecar · ProtonVPN |
| **Jellyfin** | Media server with hardware transcoding | Docker · Intel QuickSync |
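
The CloudNativePG database listed for Immich could be declared along these lines. This is a hedged sketch: the cluster name, namespace, instance count, volume size, and storage class are assumptions rather than values taken from this repository.

```yaml
# Sketch only: every value below is illustrative.
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: immich-db                    # hypothetical cluster name
  namespace: immich                  # hypothetical namespace
spec:
  instances: 3                       # one primary plus two replicas (assumed)
  storage:
    size: 10Gi                       # assumed size
    storageClass: longhorn           # assumes Longhorn's default StorageClass name
```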
---
## Key Design Decisions
**GitOps end-to-end.** Every cluster resource is declared in Git and applied by ArgoCD. Nothing is `kubectl apply`'d by hand. ArgoCD Image Updater closes the loop by writing image tag updates back to Git automatically.
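
As a sketch of the App-of-Apps pattern described here, a root `Application` pointing at the `cluster-apps/` directory might look like this. The repository URL and application name are hypothetical, and the path assumes `cluster-apps/` sits at the root of the Git repository.

```yaml
# Sketch only: repoURL and metadata.name are assumptions.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: root                         # hypothetical name for the root app
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://gitea.example.internal/homelab/argocd-homelab.git  # placeholder URL
    targetRevision: HEAD
    path: cluster-apps               # directory named in the repository layout
  destination:
    server: https://kubernetes.default.svc
    namespace: argocd
  syncPolicy:
    automated:
      prune: true                    # delete resources that were removed from Git
      selfHeal: true                 # revert manual drift back to the Git state
```

With `selfHeal` enabled, a manual change made with `kubectl` is reverted on the next reconciliation, which is what keeps Git authoritative.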
**Secrets in Git, safely.** Sealed Secrets lets encrypted `SealedSecret` manifests live in the same repo as everything else. Only the in-cluster controller can decrypt them.
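
A `SealedSecret` committed to Git has roughly the following shape. The names are hypothetical and the ciphertext is a placeholder; in practice the `encryptedData` values are produced by the `kubeseal` CLI against the controller's public key.

```yaml
# Sketch only: names are illustrative and the ciphertext is a placeholder.
apiVersion: bitnami.com/v1alpha1
kind: SealedSecret
metadata:
  name: example-credentials          # hypothetical name
  namespace: example                 # hypothetical namespace
spec:
  encryptedData:
    password: <ciphertext-from-kubeseal>   # placeholder, not real output
  template:
    metadata:
      name: example-credentials      # name of the Secret the controller creates
      namespace: example
    type: Opaque
```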
**No cloud load balancer for ingress.** MetalLB + Traefik handle all in-cluster load balancing. External access goes through Cloudflare tunnels, the Pangolin tunnel on the edge VPS, or the WireGuard VPN, so no ports are open on the home router.
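
The reserved pool that MetalLB assigns from could be expressed as below. This is a sketch under stated assumptions: the address range is a documentation range rather than the real LAN range, the resource names are hypothetical, and Layer 2 mode is assumed (the source does not say whether L2 or BGP is used).

```yaml
# Sketch only: address range, names and L2 mode are assumptions.
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: reserved-pool                # hypothetical pool name
  namespace: metallb-system
spec:
  addresses:
    - 192.0.2.240-192.0.2.250        # placeholder range (TEST-NET-1)
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: reserved-pool
  namespace: metallb-system
spec:
  ipAddressPools:
    - reserved-pool                  # announce addresses from the pool above
```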
**Distributed storage without a SAN.** Longhorn keeps replicas of each volume on the agents' local disks, with all 14 agent nodes contributing storage. NFS on a dedicated bare-metal host serves the media library to Jellyfin with low latency.
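
Replica placement in Longhorn is typically controlled per `StorageClass`. The sketch below assumes three replicas per volume, which is Longhorn's usual default; the class name and replica count are not taken from this repository.

```yaml
# Sketch only: class name and parameter values are assumptions.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-replicated          # hypothetical class name
provisioner: driver.longhorn.io
allowVolumeExpansion: true
reclaimPolicy: Delete
parameters:
  numberOfReplicas: "3"              # replicas per volume, each on a different node
  staleReplicaTimeout: "2880"        # minutes before a failed replica is cleaned up
```

With a replica count of three, each volume survives the loss of up to two of the nodes holding its replicas, regardless of how many agents are in the cluster.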
**Observability from day one.** Prometheus + Grafana for metrics, Elastic Stack (via ECK operator) for logs and fleet management. Elastic Agents run as a DaemonSet across the whole cluster.
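
On the metrics side, an application is usually wired into kube-prometheus-stack with a `ServiceMonitor`. The example below is illustrative only: the application name, port name, and the `release` label (which must match the Helm release name for the default selector to pick it up) are assumptions.

```yaml
# Sketch only: names, labels and port are assumptions.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: example-app                  # hypothetical application
  namespace: example
  labels:
    release: kube-prometheus-stack   # assumed Helm release name
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: example-app
  endpoints:
    - port: metrics                  # named port on the target Service
      interval: 30s
```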
**Provisioning is reproducible.** Proxmox VMs are created via Terraform (Proxmox provider), then configured by Ansible roles — from base OS hardening to k3s installation and kubeconfig management.
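
The Ansible half of that flow might be driven by a playbook like the one below. The role names come from the repository layout; the file name and inventory group names are hypothetical.

```yaml
# playbooks/k3s.yml (hypothetical file name)
# Sketch only: inventory group names are assumptions.
- name: Configure k3s control plane
  hosts: k3s_server                  # assumed inventory group
  become: true
  roles:
    - common                         # base OS config, SSH hardening, node-exporter
    - k3s_server                     # HA control plane install + taint config

- name: Configure k3s workers
  hosts: k3s_agent                   # assumed inventory group
  become: true
  roles:
    - common
    - k3s_agent                      # worker node install
```

Running the control plane play first means the agents have a server to join by the time their play starts.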
---
## Repository Layout
```
ansible-homelab/ # Ansible roles + playbooks for all VM provisioning
├── roles/
│ ├── common/ # Base OS config, SSH hardening, node-exporter
│ ├── k3s_server/ # HA control plane install + taint config
│ ├── k3s_agent/ # Worker node install
│ ├── k3s_loadbalancer/ # nginx LB config
│ ├── kube_vip/ # VIP setup
│ ├── docker_host/ # Docker + GPU passthrough
│ ├── proxmox/ # Proxmox node config
│ └── edge_vps/ # VPS services (WireGuard, Traefik, Pangolin)
└── playbooks/ # Top-level playbooks per host group
argocd-homelab/ # All Kubernetes manifests (ArgoCD App-of-Apps)
├── infrastructure/ # Platform: MetalLB, Longhorn, Cert-Manager, ECK, …
├── services/ # Applications: Immich, Vaultwarden, arr-stack, …
└── cluster-apps/ # ArgoCD ApplicationSets + root app
```
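
Given this layout, an `ApplicationSet` in `cluster-apps/` could generate one ArgoCD `Application` per directory under `services/`. This is a sketch, not the repository's actual definition: the repository URL, the set name, and the namespace-per-directory convention are assumptions.

```yaml
# Sketch only: repoURL, name and namespace convention are assumptions.
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: services                     # hypothetical name
  namespace: argocd
spec:
  goTemplate: true
  generators:
    - git:
        repoURL: https://gitea.example.internal/homelab/argocd-homelab.git  # placeholder URL
        revision: HEAD
        directories:
          - path: services/*         # one Application per subdirectory
  template:
    metadata:
      name: '{{.path.basename}}'
    spec:
      project: default
      source:
        repoURL: https://gitea.example.internal/homelab/argocd-homelab.git  # placeholder URL
        targetRevision: HEAD
        path: '{{.path.path}}'
      destination:
        server: https://kubernetes.default.svc
        namespace: '{{.path.basename}}'   # assumed: namespace named after the directory
      syncPolicy:
        automated:
          prune: true
          selfHeal: true
        syncOptions:
          - CreateNamespace=true
```

Adding a new service then amounts to committing a new directory under `services/`; no ArgoCD object has to be edited by hand.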