4.0 KiB
4.0 KiB
Changelog
Technical evolution of the infrastructure stack, tracking the migration from standalone Docker hosts to a fully automated, GitOps-managed Kubernetes cluster.
Phase 5: GitOps & Cluster Hardening (July 2025 - Present)
Shifted control plane management to ArgoCD and expanded storage capabilities.
- GitOps Implementation:
- Deployed ArgoCD in an App-of-Apps pattern to manage cluster state (
89c51aa). - Integrated Sealed Secrets (implied via vault diffs) and Cert-Manager for automated TLS management (
76000f8). - Migrated core services (Traefik, MetalLB) to Helm charts managed via ArgoCD manifests.
- Deployed ArgoCD in an App-of-Apps pattern to manage cluster state (
- Storage Architecture:
- Implemented Longhorn with iSCSI support for distributed block storage (
48aec11). - Added NFS Provisioner (
e1a2248) for ReadWriteMany volumes capabilities.
- Implemented Longhorn with iSCSI support for distributed block storage (
- Networking:
- Centralized primary server IP logic (
97a5d6c) to support HA control plane capability. - Replaced Netcup DNS webhooks with Cloudflare for Caddy ACME challenges (
9cb90a8).
- Centralized primary server IP logic (
- Observability:
- Added healthcheck definitions to Docker Compose services (
0e8e07e) and K3s probes.
- Added healthcheck definitions to Docker Compose services (
Phase 4: IaaC Refactoring & Proxmox API Integration (Nov 2024 - June 2025)
Refactored Ansible roles for modularity and implemented Proxmox API automation for "click-less" provisioning.
- Proxmox Automation:
- Developed
roles/proxmoxto interface with Proxmox API: automated VM creation, cloning from templates, and Cloud-Init injection (f2ea03b). - Configured PCI Passthrough (
591342f) and hardware acceleration for media transcoding nodes. - Added cron-based VM state reconciliation (
a1da69a).
- Developed
- Ansible Restructuring:
- Inventory Refactor: Moved from root-level inventory files to a hierarchical
vars/structure (609e000). - Linting Pipeline: Integrated
ansible-lintandpre-commithooks (6eef96b) to enforce YAML standards and best practices. - Vault Security: Configured
.gitattributesto enableansible-vault viewfor cleartext diffs in git (c3905ed).
- Inventory Refactor: Moved from root-level inventory files to a hierarchical
- Identity Management:
- Deployed Keycloak (
42196a3) for OIDC/SAML authentication across the stack.
- Deployed Keycloak (
Phase 3: The Kubernetes Migration (Sep 2024 - Oct 2024)
Architectural pivot from Docker Compose to K3s.
- Control Plane Setup:
- Bootstrapped K3s cluster with dedicated server/agent split.
- Configured HAProxy/Nginx load balancers (
51a49d0) for API server high availability.
- Node Provisioning:
- Standardized node bootstrapping (kernel modules, sysctl params) for K3s compatibility.
- Deployed specialized storage nodes for Longhorn (
7d58de9).
- Decommissioning:
- Drained and removed legacy Docker hosts (
0aed818). - Migrated stateful workloads (Postgres) to cluster-managed resources.
- Drained and removed legacy Docker hosts (
Phase 2: Docker Service Expansion (2023 - 2024)
Vertical scaling of Docker hosts and introduction of the monitoring stack.
- Service Stack:
- Deployed the *arr suite (Sonarr, Radarr, etc.) and Jellyfin with hardware mapping (
3d7f143). - Integrated Paperless-ngx with Redis and Tika consumption (
3f88065). - Self-hosted Gitea and GitLab (later removed) for source control.
- Deployed the *arr suite (Sonarr, Radarr, etc.) and Jellyfin with hardware mapping (
- Observability V1:
- Deployed Prometheus and Grafana stack (
b3ae5ef). - Added Node Exporter and SmartCTL Exporter (
0a361d9) to bare metal hosts. - Implemented Uptime Kuma for external availability monitoring.
- Deployed Prometheus and Grafana stack (
- Reverse Proxy:
- Transitioned ingress from Traefik v2 to Nginx Proxy Manager, then to Caddy for simpler configuration management (
a9af3c7,1a1b8cb).
- Transitioned ingress from Traefik v2 to Nginx Proxy Manager, then to Caddy for simpler configuration management (
Phase 1: Genesis & Networking (Late 2022)
Initial infrastructure bring-up.
- Base Configuration:
- Established Ansible role structure for baseline system hardening (SSH, users, packages).
- Configured Wireguard mesh for secure inter-node communication (
2ba4259). - Set up Backblaze B2 offsite backups via Restic/Rclone (
b371e24).
- Network:
- Experimented with macvlan Docker networks for direct container IP assignment.