Compare commits: 4563ef83f1...main (4 commits)
| Author | SHA1 | Date |
|---|---|---|
| | 40fa132e0d | |
| | c48ced6207 | |
| | 3ac7d91101 | |
| | a187b648e7 | |
97
README.md
@@ -1,6 +1,6 @@
 # Homelab
 
-A production-grade homelab running on bare-metal Proxmox, with a 17-node Kubernetes cluster managed entirely through GitOps.
+17-node Kubernetes cluster on five bare-metal Proxmox hosts, provisioned with Terraform and Ansible, managed through ArgoCD GitOps. Runs my home automation, media stack, photo backup, documents, and a few side projects.
 
 
 
@@ -14,46 +14,45 @@ A production-grade homelab running on bare-metal Proxmox, with a 17-node Kuberne
 ```mermaid
 graph TB
 subgraph ext[" External"]
-CF["Cloudflare\nCDN + DNS"]
+CF["Cloudflare CDN"]
 Admin["Remote Admin"]
 end
 
 subgraph vps["Edge VPS"]
-WG["WireGuard\nVPN Gateway"]
-TraefikVPS["Traefik\nReverse Proxy"]
-Pangolin["Pangolin\nTunnel Server"]
+WG["WireGuard VPN Gateway"]
+TraefikVPS["Traefik"]
+Pangolin["Pangolin Tunnel Server"]
 end
 
 subgraph proxmox["Proxmox Cluster — 5 physical nodes"]
-subgraph cp["Control Plane ×3 (HA etcd)"]
+subgraph cp["Control Plane x3 — HA etcd + kube-vip"]
 S["k3s-server"]
 end
-LB["nginx\nLoad Balancer"]
-subgraph workers["Worker Nodes ×14"]
+subgraph workers["Worker Nodes x14"]
 W["k3s-agent"]
 end
-DH["docker-host\nIntel QuickSync GPU"]
-NFS["NFS Server\nDedicated storage node"]
+DH["docker-host — Intel QuickSync GPU"]
+NFS["NFS Server — dedicated storage node"]
 end
 
 subgraph k8s["Kubernetes"]
-subgraph platform["Platform layer"]
+subgraph platform["Platform"]
 direction LR
-MetalLB["MetalLB"]
-Traefik["Traefik"]
-Longhorn["Longhorn"]
-ArgoCD["ArgoCD"]
-Prometheus["Prometheus\n+ Grafana"]
-ECK["Elastic Stack\n(ECK)"]
-Istio["Istio\nAmbient"]
+MetalLB
+Traefik
+Longhorn
+ArgoCD
+Prometheus
+ECK["Elastic Stack"]
+Istio["Istio Ambient"]
 end
 subgraph apps["Applications"]
 direction LR
-Immich["Immich"]
+Immich
 VW["Vaultwarden"]
 HA["Home Assistant"]
-Media["Arr Stack\n+ Jellyfin"]
-Other["Paperless · N8n\nNtfy · Gitea · …"]
+Media["Arr Stack + Jellyfin"]
+Other["Paperless, N8n, Ntfy ..."]
 end
 end
 
@@ -62,11 +61,10 @@ graph TB
 CF -->|Cloudflare tunnel| k8s
 TraefikVPS --> Pangolin
 Pangolin -->|Newt client| k8s
-LB --> cp
 cp --- workers
 workers --- Longhorn
 NFS -->|NFS mount| Media
-DH -->|Jellyfin\nDocker| Media
+DH -->|Docker| Media
 ```
 
 ---
@@ -78,16 +76,15 @@ graph TB
 | Physical | `aya01` | Proxmox node + NFS server | Dedicated storage — no VMs |
 | Physical | `lulu` | Proxmox node | k3s agents |
 | Physical | `inko01` | Proxmox node | k3s server + agents + docker host |
-| Physical | `naruto01` | Proxmox node | k3s server + agents + LB |
+| Physical | `naruto01` | Proxmox node | k3s server + agents |
 | Physical | `mii01` | Proxmox node | k3s server + agents |
-| VM | `k3s-server-{10,11,12}` | K3s control plane (HA etcd) | 2 vCPU · 4 GB RAM · 64 GB |
+| VM | `k3s-server-{10,11,12}` | K3s control plane (HA etcd + kube-vip VIP) | 2 vCPU · 4 GB RAM · 64 GB |
 | VM | `k3s-agent-{10…23}` | K3s worker nodes ×14 | 2 vCPU · 4 GB RAM · 128 GB |
 | VM | `docker-host11` | Docker host w/ GPU passthrough | 2 vCPU · 4 GB RAM · 192 GB · Intel QuickSync |
-| VM | `k3s-loadbalancer` | nginx LB fronting control plane | 1 vCPU · 2 GB RAM |
 | VM | `docker-lb` | Caddy reverse proxy (LAN only) | 1 vCPU · 2 GB RAM |
 | VPS | `mii` | Edge node (Netcup) | WireGuard · Traefik · Pangolin |
 
-All VMs run **Debian 12** on `virtio` network bridges, provisioned from cloud-init templates via **Terraform + Ansible**.
+All VMs run Debian 12 on `virtio` network bridges, provisioned from cloud-init templates via Terraform + Ansible.
 
 ---
 
@@ -97,6 +94,7 @@ All VMs run **Debian 12** on `virtio` network bridges, provisioned from cloud-in
 |-----------|-------------|---------|
 | **ArgoCD** | Helm (App-of-Apps) | GitOps CD — all cluster state driven from Git |
 | **ArgoCD Image Updater** | Helm | Watches registries, commits updated image tags back to Git |
+| **kube-vip** | DaemonSet on control plane | HA VIP for the K8s API server |
 | **Traefik** | k3s built-in | Ingress controller, fronted by MetalLB |
 | **MetalLB** | Helm (ArgoCD) | Bare-metal load balancer, assigns IPs from reserved pool |
 | **Cert-Manager** | Helm (ArgoCD) | Automated TLS via Let's Encrypt DNS-01 (Cloudflare API) |
@@ -130,44 +128,43 @@ All VMs run **Debian 12** on `virtio` network bridges, provisioned from cloud-in
 | **Gitea Runner** | CI/CD runner | – |
 | **Zeroclaw** | Per-user instances (×3) via Kustomize overlays | – |
 | **Arr Stack** | Media automation suite | Prowlarr · Sonarr · Radarr · Unpackarr |
-| **qBittorrent** | Torrent clients (×2) | Gluetun VPN sidecar · ProtonVPN |
+| **Download clients** | VPN-isolated download clients (×2) | Gluetun sidecar |
 | **Jellyfin** | Media server with hardware transcoding | Docker · Intel QuickSync |
 
 ---
 
-## Key Design Decisions
+## Design notes
 
-**GitOps end-to-end.** Every cluster resource is declared in Git and applied by ArgoCD. Nothing is `kubectl apply`'d by hand. ArgoCD Image Updater closes the loop by writing image tag updates back to Git automatically.
+Everything goes through Git. ArgoCD owns the cluster state; nothing gets `kubectl apply`'d directly. ArgoCD Image Updater handles the image update loop: when a new tag appears in the registry, it commits the change back to Git and ArgoCD picks it up from there.
 
-**Secrets in Git, safely.** Sealed Secrets lets encrypted `SealedSecret` manifests live in the same repo as everything else. Only the in-cluster controller can decrypt them.
+Secrets are committed to Git too, encrypted via Sealed Secrets. Only the in-cluster controller holds the decryption key.
 
-**No cloud dependency for ingress.** MetalLB + Traefik handles all internal load balancing. External access goes through Cloudflare tunnels or a WireGuard VPN — no ports open on the home router.
+No ports are open on the home router. Internal load balancing goes through MetalLB + Traefik. External access uses Cloudflare tunnels or a WireGuard VPN routed through the edge VPS.
 
-**Distributed storage without a SAN.** Longhorn replicates volumes across all 14 agent nodes. NFS on a dedicated bare-metal host serves the media library to Jellyfin with low latency.
+Longhorn handles block storage by replicating volumes across all 14 agent nodes. The media library lives on a dedicated NFS host instead — latency matters when Jellyfin is reading large video files, and NFS is simpler for that.
 
-**Observability from day one.** Prometheus + Grafana for metrics, Elastic Stack (via ECK operator) for logs and fleet management. Elastic Agents run as a DaemonSet across the whole cluster.
+Metrics go to Prometheus + Grafana. Logs and fleet management go to Elastic Stack via the ECK operator, with Elastic Agents running as a DaemonSet so every node is covered.
 
-**Provisioning is reproducible.** Proxmox VMs are created via Terraform (Proxmox provider), then configured by Ansible roles — from base OS hardening to k3s installation and kubeconfig management.
+All VMs are provisioned with Terraform and configured by Ansible. Rebuilding from scratch doesn't require remembering anything.
 
 ---
 
-## Repository Layout
+## Repo layout
 
 ```
-ansible-homelab/ # Ansible roles + playbooks for all VM provisioning
+ansible-homelab/
 ├── roles/
-│ ├── common/ # Base OS config, SSH hardening, node-exporter
-│ ├── k3s_server/ # HA control plane install + taint config
-│ ├── k3s_agent/ # Worker node install
-│ ├── k3s_loadbalancer/ # nginx LB config
-│ ├── kube_vip/ # VIP setup
-│ ├── docker_host/ # Docker + GPU passthrough
-│ ├── proxmox/ # Proxmox node config
-│ └── edge_vps/ # VPS services (WireGuard, Traefik, Pangolin)
-└── playbooks/ # Top-level playbooks per host group
+│ ├── common/ # base OS config, SSH hardening, node-exporter
+│ ├── k3s_server/ # control plane install + NoSchedule taint
+│ ├── k3s_agent/ # worker node install
+│ ├── kube_vip/ # kube-vip DaemonSet + TLS SAN config
+│ ├── docker_host/ # Docker + Intel QuickSync GPU passthrough
+│ ├── proxmox/ # Proxmox node setup
+│ └── edge_vps/ # VPS: WireGuard, Traefik, Pangolin, Elastic Agent
+└── playbooks/
 
-argocd-homelab/ # All Kubernetes manifests (ArgoCD App-of-Apps)
-├── infrastructure/ # Platform: MetalLB, Longhorn, Cert-Manager, ECK, …
-├── services/ # Applications: Immich, Vaultwarden, arr-stack, …
-└── cluster-apps/ # ArgoCD ApplicationSets + root app
+argocd-homelab/
+├── infrastructure/ # MetalLB, Longhorn, Cert-Manager, ECK, Istio, ...
+├── services/ # Immich, Vaultwarden, arr-stack, Home Assistant, ...
+└── cluster-apps/ # ArgoCD App-of-Apps root + ApplicationSets
 ```
@@ -1,6 +1,6 @@
 # Networking
 
-## IP Layout
+## IP layout
 
 | Segment | Range | Purpose |
 |---------|-------|---------|
@@ -12,7 +12,7 @@
 
 ---
 
-## Traffic Flows
+## Traffic flows
 
 ### Public services (Cloudflare tunnel)
 
@@ -20,7 +20,7 @@
 User → Cloudflare (CDN + DDoS) → Cloudflared pod (×2, in-cluster) → Traefik → Service
 ```
 
-Cloudflare acts as both CDN and the TLS termination point for public services. No ports are forwarded on the home router.
+Cloudflare handles CDN and TLS termination. No ports are forwarded on the home router.
 
 ### VPS-proxied services (Pangolin tunnel)
 
@@ -38,7 +38,7 @@ Admin → WireGuard client → Edge VPS (WireGuard server)
 → K8s service CIDR (10.43.0.0/16)
 ```
 
-The `mii-wireguard` pod acts as the WireGuard client inside the cluster. It masquerades the K8s service CIDR so all cluster services are reachable over the VPN — no split-DNS required.
+The `mii-wireguard` pod is the WireGuard client inside the cluster. It masquerades the K8s service CIDR so all cluster services are reachable over the VPN without split-DNS.
 
 ### Gitea → ArgoCD webhook
 
@@ -46,7 +46,7 @@ The `mii-wireguard` pod acts as the WireGuard client inside the cluster. It masq
 Gitea (docker-host11) → push webhook → ArgoCD (in-cluster) → reconcile manifests
 ```
 
-ArgoCD polls on a schedule and also receives webhooks from the self-hosted Gitea instance on git push.
+ArgoCD polls on a schedule and also receives webhooks on git push.
 
 ### ArgoCD Image Updater → Gitea
 
@@ -63,7 +63,7 @@ Keeps image versions in Git without a human in the loop.
 ```
 Prowlarr (indexer aggregator)
 → Sonarr / Radarr (request management)
-→ qBittorrent + Gluetun sidecar (download over ProtonVPN)
+→ download client + Gluetun sidecar (VPN-isolated)
 → Unpackarr (extract archives)
 → NFS share on aya01
 → Jellyfin (on docker-host11, hardware transcoding via Intel QuickSync)
@@ -71,14 +71,14 @@ Prowlarr (indexer aggregator)
 
 ---
 
-## Certificate Management
+## Certificate management
 
-Cert-Manager handles all TLS automatically via **Let's Encrypt DNS-01** using the Cloudflare API. No HTTP-01 challenges — DNS-01 works for internal-only domains and wildcard certs.
+Cert-Manager handles all TLS automatically via Let's Encrypt DNS-01 using the Cloudflare API. DNS-01 works for internal-only domains and wildcard certs without exposing any HTTP endpoint.
 
-The edge VPS (Traefik) uses Netcup DNS API for its own certs.
+The edge VPS uses the Netcup DNS API for its own certs.
 
 ---
 
-## Service Mesh
+## Service mesh
 
-Istio runs in **Ambient mode** (no sidecars). The `ztunnel` DaemonSet runs on every node and handles transparent L4 proxying for all pods in the mesh. Waypoint proxies (L7) are not yet deployed.
+Istio runs in Ambient mode — no sidecars. The `ztunnel` DaemonSet runs on every node and handles transparent L4 proxying for all pods in the mesh. Waypoint proxies (L7) are not yet deployed.
@@ -1,24 +1,24 @@
 # Observability
 
-Two parallel stacks cover metrics and logs.
+Two parallel stacks: Prometheus for metrics, Elastic for logs.
 
 ---
 
-## Metrics — Prometheus + Grafana
+## Metrics
 
-Deployed via the **kube-prometheus-stack** Helm chart (ArgoCD-managed), running in the `prometheus` namespace.
+kube-prometheus-stack runs in the `prometheus` namespace (ArgoCD-managed). Prometheus scrapes all nodes, pods, and control plane components. Grafana has dashboards for cluster overview, node resources, Longhorn, ArgoCD, and Traefik.
 
-- **Prometheus** scrapes all nodes, pods, and K8s control plane components
-- **Grafana** dashboards: cluster overview, node resource usage, Longhorn, ArgoCD, Traefik
-- **Alertmanager** routes alerts to Ntfy (self-hosted push notifications) via a custom webhook bridge
-- **Node Exporter** runs on all VMs including docker-host11 and the edge VPS (Ansible-deployed)
-- **Goldilocks + VPA** analyse actual resource usage and recommend request/limit values
+Node Exporter is deployed via Ansible on every VM including `docker-host11` and the edge VPS, so coverage isn't limited to what's inside Kubernetes.
+
+Goldilocks and VPA run alongside and analyze actual resource usage to suggest better request/limit values.
+
+Alertmanager routes alerts to Ntfy via a custom webhook bridge.
 
 ---
 
-## Logs + Fleet — Elastic Stack (ECK)
+## Logs and fleet management
 
-Deployed via the **ECK operator** (Elastic Cloud on Kubernetes), running in the `elastic-system` namespace.
+The ECK operator (Elastic Cloud on Kubernetes) manages the Elastic stack in the `elastic-system` namespace:
 
 | Component | Purpose |
 |-----------|---------|
@@ -28,13 +28,13 @@ Deployed via the **ECK operator** (Elastic Cloud on Kubernetes), running in the
 | Elastic Agent (DaemonSet) | Ships logs and metrics from every cluster node |
 | Elastic Agent (standalone) | Runs on docker-host11 and the edge VPS |
 
-The Elastic Agent DaemonSet tolerates the control-plane `NoSchedule` taint so logs are collected from server nodes as well as agents.
+The DaemonSet tolerates the control-plane `NoSchedule` taint so server nodes are covered too.
 
-Alerts from Elasticsearch rules are bridged to Ntfy via a small CronJob (`elastic-ntfy-bridge`) that polls the Elasticsearch alerts API and forwards new alerts as push notifications.
+Elastic alert rules are bridged to Ntfy via `elastic-ntfy-bridge`, a small CronJob that polls the Elasticsearch alerts API and forwards new alerts as push notifications.
 
 ---
 
-## Alerting Flow
+## Alerting flow
 
 ```
 Prometheus Alertmanager ──► Ntfy (push notification)
@@ -42,4 +42,4 @@ Prometheus Alertmanager ──► Ntfy (push notification)
 Elasticsearch alert rule ──► elastic-ntfy-bridge CronJob ─┘
 ```
 
-All alerts land in the same Ntfy topic, accessible on mobile and desktop.
+Both sources land in the same Ntfy topic.
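The bridging step described in the Observability diff above (poll for alerts, forward the new ones to Ntfy) can be sketched roughly as follows. This is not the actual `elastic-ntfy-bridge` code: the alert shape (`id`, `rule`, `reason`), the Ntfy host, and the topic name are assumptions made for illustration, and the Elasticsearch polling side is left out entirely. The only external interface relied on is Ntfy's HTTP publish endpoint, which accepts a POST to the topic URL with the message as the body and an optional `Title` header.

```python
import urllib.request

NTFY_URL = "https://ntfy.example.internal"  # assumption: placeholder host
TOPIC = "homelab-alerts"                    # assumption: placeholder topic


def build_ntfy_request(alert: dict) -> urllib.request.Request:
    """Build (but do not send) an Ntfy publish request for one alert."""
    return urllib.request.Request(
        url=f"{NTFY_URL}/{TOPIC}",
        data=alert["reason"].encode("utf-8"),
        headers={"Title": f"Elastic alert: {alert['rule']}"},
        method="POST",
    )


def forward_new_alerts(alerts: list[dict], seen_ids: set[str],
                       send=urllib.request.urlopen) -> int:
    """Forward alerts whose id has not been seen yet; return how many were sent."""
    sent = 0
    for alert in alerts:
        if alert["id"] in seen_ids:
            continue  # already forwarded earlier
        send(build_ntfy_request(alert))
        seen_ids.add(alert["id"])
        sent += 1
    return sent
```

Because a CronJob starts fresh on every run, a real bridge would have to persist `seen_ids` or filter by timestamp; how the actual job handles that is not stated in the docs.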
@@ -2,7 +2,7 @@
 
 ## Overview
 
-Three storage tiers serve different workloads:
+Three storage tiers, each doing a different job:
 
 | Tier | System | Access | Used by |
 |------|--------|--------|---------|
@@ -14,46 +14,30 @@ Three storage tiers serve different workloads:
 
 ## Longhorn
 
-Longhorn provides distributed block storage across all 14 agent nodes. Each volume is replicated (default: 3 replicas) across different nodes.
+Longhorn gives distributed block storage across all 14 agent nodes. Each volume is replicated (default: 3 replicas) across different nodes, using the local disk on each agent (128 GB each).
 
-- **RWO** (ReadWriteOnce) — used for most services (Vaultwarden, Paperless, etc.)
-- **RWX** (ReadWriteMany) — used where multiple pods need shared access
-- Volumes are backed by the local disk on each agent node (128 GB each)
-- Longhorn manager runs as a DaemonSet; the CSI plugin integrates with the K8s storage layer
-- Snapshots and backups are supported via the Longhorn UI
+RWO (ReadWriteOnce) covers most services. RWX (ReadWriteMany) is used where multiple pods need access to the same volume. Snapshots and backups are available through the Longhorn UI.
 
-Control plane nodes (`k3s-server-*`) are tainted `NoSchedule` — Longhorn manager tolerates this taint and runs everywhere, but user workloads are pushed to agent nodes only.
+Control plane nodes are tainted `NoSchedule` — Longhorn manager tolerates this and runs everywhere, but user workloads stay on agent nodes.
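As a rough check on the Longhorn figures quoted above (14 agent nodes, 128 GB of local disk each, 3 replicas per volume), the sketch below estimates an upper bound on usable replicated capacity. It assumes, hypothetically, that each agent's whole disk is available to Longhorn; in practice the OS, container images, and Longhorn's reserved space reduce this. The function name is illustrative only.

```python
def usable_capacity_gb(nodes: int, disk_gb: float, replicas: int) -> float:
    """Upper bound on logical volume capacity when every volume is stored `replicas` times."""
    if nodes < replicas:
        raise ValueError("need at least as many nodes as replicas")
    raw_gb = nodes * disk_gb  # total disk across all agent nodes
    return raw_gb / replicas  # each logical GB consumes `replicas` raw GB


raw = 14 * 128
usable = usable_capacity_gb(nodes=14, disk_gb=128, replicas=3)
print(f"raw: {raw} GB, usable at 3 replicas: {usable:.0f} GB")
# prints "raw: 1792 GB, usable at 3 replicas: 597 GB"
```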
 
 ---
 
 ## CloudNativePG
 
-The CNPG operator manages HA PostgreSQL clusters as first-class Kubernetes resources. Currently used by:
-
-- **Immich** — primary database (photos, albums, users, ML embeddings)
-
-CNPG handles streaming replication, failover, and scheduled backups. Data is stored on Longhorn PVCs.
+CloudNativePG manages HA PostgreSQL clusters as Kubernetes resources. Immich uses it for its primary database (photos, albums, users, ML embeddings). CNPG handles streaming replication, failover, and scheduled backups, with data stored on Longhorn PVCs.
 
 ---
 
 ## NFS
 
-A dedicated physical node (`aya01`) runs a bare-metal NFS server. This serves the media library to Jellyfin.
+`aya01` is a dedicated bare-metal NFS server. Jellyfin mounts the share from `docker-host11` to access movies, TV shows, and music. Keeping the media library on a separate host means the Jellyfin VM can be rebuilt without touching the data.
 
-- Movies, TV shows, and music live on `aya01`
-- `docker-host11` (where Jellyfin runs) mounts the NFS share
-- Separating media storage from the compute host means the Jellyfin VM can be rebuilt without touching the library
-- NFS is not used for K8s workloads — Longhorn handles all PVC-backed storage
+NFS is not used for K8s workloads — Longhorn handles all PVC-backed storage.
 
 ---
 
-## Secret Storage
+## Secrets
 
-Kubernetes secrets are managed with **Sealed Secrets** (Bitnami). The workflow:
+Kubernetes secrets go through Sealed Secrets (Bitnami). The workflow: create a regular `Secret`, encrypt it with `kubeseal` using the cluster's public key into a `SealedSecret`, then commit that to Git. Only the in-cluster controller can decrypt it.
 
-1. Create a regular K8s `Secret`
-2. Encrypt it with `kubeseal` using the cluster's public key → produces a `SealedSecret`
-3. Commit the `SealedSecret` to Git — it is safe to store publicly
-4. The in-cluster Sealed Secrets controller decrypts it into a regular `Secret` at apply time
-
-Ansible secrets (VM credentials, API tokens) are encrypted with **Ansible Vault** and stored in `vars/group_vars/*/secrets_*.yaml`.
+Ansible secrets (VM credentials, API tokens) are encrypted with Ansible Vault and live in `vars/group_vars/*/secrets_*.yaml`.
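The first step of the Sealed Secrets workflow described above, creating a regular `Secret` manifest to hand to `kubeseal`, could look like the sketch below. The secret name, namespace, and key are hypothetical. The script only builds the plaintext manifest as JSON; sealing is left to the `kubeseal` CLI, which reads a Secret on stdin.

```python
import base64
import json


def secret_manifest(name: str, namespace: str, values: dict[str, str]) -> dict:
    """Build a plain Kubernetes Secret manifest with base64-encoded data."""
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name, "namespace": namespace},
        "type": "Opaque",
        "data": {
            key: base64.b64encode(value.encode("utf-8")).decode("ascii")
            for key, value in values.items()
        },
    }


if __name__ == "__main__":
    # Hypothetical names and value, for illustration only.
    manifest = secret_manifest("example-app", "default", {"API_TOKEN": "changeme"})
    print(json.dumps(manifest, indent=2))
```

Piping the output through `kubeseal --format yaml` would produce the `SealedSecret` to commit. The plaintext manifest itself should never be written into the repository.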