# Homelab

A production-grade homelab running on bare-metal Proxmox, with a 17-node Kubernetes cluster managed entirely through GitOps.



## Architecture

```mermaid
graph TB
    subgraph ext["External"]
        CF["Cloudflare CDN"]
        Admin["Remote Admin"]
    end

    subgraph vps["Edge VPS"]
        WG["WireGuard VPN Gateway"]
        TraefikVPS["Traefik"]
        Pangolin["Pangolin Tunnel Server"]
    end

    subgraph proxmox["Proxmox Cluster — 5 physical nodes"]
        subgraph cp["Control Plane x3  —  HA etcd + kube-vip"]
            S["k3s-server"]
        end
        subgraph workers["Worker Nodes x14"]
            W["k3s-agent"]
        end
        DH["docker-host — Intel QuickSync GPU"]
        NFS["NFS Server — dedicated storage node"]
    end

    subgraph k8s["Kubernetes"]
        subgraph platform["Platform"]
            direction LR
            MetalLB
            Traefik
            Longhorn
            ArgoCD
            Prometheus
            ECK["Elastic Stack"]
            Istio["Istio Ambient"]
        end
        subgraph apps["Applications"]
            direction LR
            Immich
            VW["Vaultwarden"]
            HA["Home Assistant"]
            Media["Arr Stack + Jellyfin"]
            Other["Paperless, N8n, Ntfy ..."]
        end
    end

    Admin -->|WireGuard VPN| WG
    WG -->|tunnel| k8s
    CF -->|Cloudflare tunnel| k8s
    TraefikVPS --> Pangolin
    Pangolin -->|Newt client| k8s
    cp --- workers
    workers --- Longhorn
    NFS -->|NFS mount| Media
    DH -->|Docker| Media
```

## Hardware

| Layer | Host | Role | Resources |
|-------|------|------|-----------|
| Physical | aya01 | Proxmox node + NFS server | Dedicated storage — no VMs |
| Physical | lulu | Proxmox node | k3s agents |
| Physical | inko01 | Proxmox node | k3s server + agents + docker host |
| Physical | naruto01 | Proxmox node | k3s server + agents |
| Physical | mii01 | Proxmox node | k3s server + agents |
| VM | k3s-server-{10,11,12} | K3s control plane (HA etcd + kube-vip VIP) | 2 vCPU · 4 GB RAM · 64 GB |
| VM | k3s-agent-{10…23} | K3s worker nodes ×14 | 2 vCPU · 4 GB RAM · 128 GB |
| VM | docker-host11 | Docker host w/ GPU passthrough | 2 vCPU · 4 GB RAM · 192 GB · Intel QuickSync |
| VM | docker-lb | Caddy reverse proxy (LAN only) | 1 vCPU · 2 GB RAM |
| VPS | mii | Edge node (Netcup) | WireGuard · Traefik · Pangolin |

All VMs run Debian 12 on virtio network bridges, provisioned from cloud-init templates via Terraform + Ansible.
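The Ansible side of that pipeline can be sketched roughly like this. The `k3s_server` role name comes from the repository layout below; the host group, variables, and version string are illustrative, not taken from the actual playbooks:

```yaml
# Hypothetical playbook sketch: configure the HA control-plane VMs.
# Group name and variables are examples only.
- hosts: k3s_servers
  become: true
  roles:
    - role: k3s_server
      vars:
        k3s_version: v1.30.4+k3s1   # example pin; real value lives in group_vars
        k3s_server_args: "--disable servicelb"  # MetalLB replaces the built-in LB
```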


## Platform Stack

| Component | How deployed | Purpose |
|-----------|--------------|---------|
| ArgoCD | Helm (App-of-Apps) | GitOps CD — all cluster state driven from Git |
| ArgoCD Image Updater | Helm | Watches registries, commits updated image tags back to Git |
| kube-vip | DaemonSet on control plane | HA VIP for the K8s API server |
| Traefik | k3s built-in | Ingress controller, fronted by MetalLB |
| MetalLB | Helm (ArgoCD) | Bare-metal load balancer, assigns IPs from reserved pool |
| Cert-Manager | Helm (ArgoCD) | Automated TLS via Let's Encrypt DNS-01 (Cloudflare API) |
| Sealed Secrets | Helm (ArgoCD) | Encrypts secrets for safe storage in Git |
| Longhorn | Helm (ArgoCD) | Distributed block storage (RWO + RWX) across all 14 agents |
| CloudNativePG | Operator (ArgoCD) | HA PostgreSQL — used by Immich |
| Elastic Stack (ECK) | Operator (ArgoCD) | Elasticsearch + Kibana + Fleet + Elastic Agents for observability |
| Kube-Prometheus-Stack | Helm (ArgoCD) | Prometheus + Grafana monitoring |
| Goldilocks + VPA | Helm (ArgoCD) | Resource usage analysis and request/limit rightsizing |
| Istio (Ambient) | Helm (ArgoCD) | Service mesh — ztunnel DaemonSet on all nodes (L4); no Waypoint proxies yet |
| K3s Upgrade Controller | Operator (ArgoCD) | Automated rolling K3s version upgrades |
| mii-wireguard | Manifest (ArgoCD) | WireGuard pod — connects cluster to edge VPS, masquerades service CIDR |
| Newt | Deployment (ArgoCD) | Pangolin tunnel client for VPS-proxied services |
| Cloudflared | Deployment ×2 (ArgoCD) | Cloudflare tunnel — exposes selected services to the internet |
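Istio Ambient enrollment is namespace-scoped: labeling a namespace puts all of its pods behind ztunnel with no sidecar injection or restarts. A minimal sketch, using `immich` as an example namespace:

```yaml
# Enroll a namespace in the ambient mesh; ztunnel then handles
# L4 mTLS for every pod in it (no Waypoint proxy required).
apiVersion: v1
kind: Namespace
metadata:
  name: immich   # example; any workload namespace works the same way
  labels:
    istio.io/dataplane-mode: ambient
```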

## Applications

| Service | Description | Notable tech |
|---------|-------------|--------------|
| Immich | Photo & video backup (self-hosted Google Photos) | CloudNativePG · Redis · ML pod |
| Vaultwarden | Bitwarden-compatible password manager | |
| Paperless-ngx | Document management + OCR | |
| Home Assistant | Home automation hub | |
| N8n | Workflow automation | |
| Ntfy | Self-hosted push notifications | |
| Stirling PDF | PDF tools | |
| Karakeep | Bookmark manager | |
| Gitea | Self-hosted Git (source of truth for ArgoCD) | Docker on docker-host11 |
| Gitea Runner | CI/CD runner | |
| Zeroclaw | Per-user instances (×3) | Kustomize overlays |
| Arr Stack | Media automation suite | Prowlarr · Sonarr · Radarr · Unpackarr |
| qBittorrent | Torrent clients (×2) with VPN isolation | Gluetun sidecar |
| Jellyfin | Media server with hardware transcoding | Docker · Intel QuickSync |
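The Gluetun sidecar pattern used for qBittorrent works because containers in a pod share one network namespace, so all torrent traffic egresses through the VPN tunnel. A trimmed sketch (image tags, provider settings, and volumes omitted; names are illustrative):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: qbittorrent   # example name
spec:
  replicas: 1
  selector:
    matchLabels: { app: qbittorrent }
  template:
    metadata:
      labels: { app: qbittorrent }
    spec:
      containers:
        - name: gluetun
          image: qmcgaw/gluetun
          securityContext:
            capabilities: { add: ["NET_ADMIN"] }  # needed to manage the tunnel
          env:
            - name: VPN_SERVICE_PROVIDER
              value: "custom"   # real provider credentials come from a Secret
        - name: qbittorrent
          image: lscr.io/linuxserver/qbittorrent
          # Shares the pod's network namespace: if gluetun's tunnel is down,
          # its built-in firewall blocks egress, isolating the client.
```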

## Key Design Decisions

**GitOps end-to-end.** Every cluster resource is declared in Git and applied by ArgoCD; nothing is applied by hand with `kubectl`. ArgoCD Image Updater closes the loop by committing image tag updates back to Git automatically.
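In practice each workload is one ArgoCD `Application` pointing at a path in the Git repo. A sketch with automated sync enabled (the repo URL and path are placeholders, not the real ones):

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: immich           # example app
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://git.example.lan/homelab/argocd-homelab.git  # placeholder
    targetRevision: main
    path: services/immich
  destination:
    server: https://kubernetes.default.svc
    namespace: immich
  syncPolicy:
    automated:
      prune: true      # delete resources removed from Git
      selfHeal: true   # revert manual drift back to the Git state
```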

**Secrets in Git, safely.** Sealed Secrets lets encrypted `SealedSecret` manifests live in the same repo as everything else. Only the in-cluster controller can decrypt them.
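A `SealedSecret` is produced by piping a normal `Secret` through `kubeseal`, which encrypts each value against the controller's public key; the ciphertext is then safe to commit. An illustrative manifest (name, namespace, and ciphertext are placeholders):

```yaml
apiVersion: bitnami.com/v1alpha1
kind: SealedSecret
metadata:
  name: db-credentials   # example
  namespace: immich
spec:
  encryptedData:
    # Ciphertext emitted by kubeseal; only the in-cluster controller's
    # private key can decrypt it into a regular Secret.
    password: AgB7...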

**No cloud dependency for ingress.** MetalLB and Traefik handle all internal load balancing. External access goes through Cloudflare tunnels or a WireGuard VPN; no ports are open on the home router.
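MetalLB in L2 mode needs only two resources: a pool of reserved LAN addresses and an advertisement that answers ARP for them. A sketch (the address range is an example, not the lab's actual pool):

```yaml
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: default-pool
  namespace: metallb-system
spec:
  addresses:
    - 192.168.1.240-192.168.1.250   # example reserved range outside DHCP
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: default-l2
  namespace: metallb-system
spec:
  ipAddressPools:
    - default-pool
```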

**Distributed storage without a SAN.** Longhorn replicates volumes across all 14 agent nodes. NFS on a dedicated bare-metal host serves the media library to Jellyfin with low latency.
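Replica count is controlled per StorageClass, trading capacity for fault tolerance. A sketch of a two-replica class (the class name and parameter values are illustrative):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-2r          # example: survives one node failure
provisioner: driver.longhorn.io
allowVolumeExpansion: true
parameters:
  numberOfReplicas: "2"      # Longhorn places replicas on distinct nodes
  staleReplicaTimeout: "30"  # minutes before a dead replica is rebuilt
```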

**Observability from day one.** Prometheus + Grafana for metrics, Elastic Stack (via the ECK operator) for logs and fleet management. Elastic Agents run as a DaemonSet across the whole cluster.
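With kube-prometheus-stack, a workload opts into scraping via a `ServiceMonitor`. A generic sketch (the `release` label must match the Helm release's selector; all names here are examples):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app              # example
  namespace: monitoring
  labels:
    release: kube-prometheus-stack   # assumed release name; check the selector
spec:
  selector:
    matchLabels:
      app: my-app           # matches the Service exposing the metrics port
  endpoints:
    - port: metrics
      interval: 30s
```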

**Provisioning is reproducible.** Proxmox VMs are created via Terraform (Proxmox provider), then configured by Ansible roles — from base OS hardening to k3s installation and kubeconfig management.


## Repository Layout

```
ansible-homelab/          # Ansible roles + playbooks for all VM provisioning
├── roles/
│   ├── common/           # Base OS config, SSH hardening, node-exporter
│   ├── k3s_server/       # HA control plane install + taint config
│   ├── k3s_agent/        # Worker node install
│   ├── kube_vip/         # HA VIP (kube-vip DaemonSet on control plane nodes)
│   ├── docker_host/      # Docker + GPU passthrough
│   ├── proxmox/          # Proxmox node config
│   └── edge_vps/         # VPS services (WireGuard, Traefik, Pangolin)
└── playbooks/            # Top-level playbooks per host group

argocd-homelab/           # All Kubernetes manifests (ArgoCD App-of-Apps)
├── infrastructure/       # Platform: MetalLB, Longhorn, Cert-Manager, ECK, …
├── services/             # Applications: Immich, Vaultwarden, arr-stack, …
└── cluster-apps/         # ArgoCD ApplicationSets + root app
```
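An `ApplicationSet` with a Git directory generator can turn each folder under `services/` into its own ArgoCD Application automatically. A sketch under that assumption (repo URL is a placeholder; the generator and template fields follow the ApplicationSet CRD):

```yaml
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: services
  namespace: argocd
spec:
  generators:
    - git:
        repoURL: https://git.example.lan/homelab/argocd-homelab.git  # placeholder
        revision: main
        directories:
          - path: services/*     # one Application per service folder
  template:
    metadata:
      name: "{{path.basename}}"
    spec:
      project: default
      source:
        repoURL: https://git.example.lan/homelab/argocd-homelab.git  # placeholder
        targetRevision: main
        path: "{{path}}"
      destination:
        server: https://kubernetes.default.svc
        namespace: "{{path.basename}}"
```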