214 Commits

Author SHA1 Message Date
Tuan-Dat Tran
e10e449333 feat(proxmox): per-node CPU type based on hardware capabilities
Add proxmox_node_cpu map: aya01 (Celeron N5105, no AVX2) stays at
x86-64-v2-AES; inko01/lulu/mii01/naruto01 (all AVX2-capable) use x86-64-v3.
The task looks up the CPU type by vm.node, with x86-64-v2-AES as the fallback.
2026-06-04 23:32:18 +02:00
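A minimal Python sketch of the per-node lookup described above. It assumes the map is a plain node-name to CPU-type dictionary. The actual Ansible task is not shown here; in Jinja the same fallback could be written with the `default` filter. The names `PROXMOX_NODE_CPU`, `DEFAULT_CPU` and `cpu_type_for` are hypothetical.

```python
# Hypothetical sketch of the proxmox_node_cpu lookup; not the real Ansible task.
PROXMOX_NODE_CPU = {
    "aya01": "x86-64-v2-AES",  # Celeron N5105, no AVX2
    "inko01": "x86-64-v3",     # AVX2-capable nodes get x86-64-v3
    "lulu": "x86-64-v3",
    "mii01": "x86-64-v3",
    "naruto01": "x86-64-v3",
}
DEFAULT_CPU = "x86-64-v2-AES"  # fallback for nodes missing from the map


def cpu_type_for(vm: dict) -> str:
    """Return the CPU type for a VM based on the Proxmox node it is assigned to."""
    return PROXMOX_NODE_CPU.get(vm.get("node"), DEFAULT_CPU)


print(cpu_type_for({"name": "docker-host11", "node": "aya01"}))
```

Falling back to the baseline type instead of failing means a VM placed on a node that was never added to the map still boots. The cost is that it runs without AVX2.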
Tuan-Dat Tran
f57ca9ac44 fix(proxmox): correct VM node assignments and upgrade CPU to x86-64-v3
- docker-host11, k3s-server11, k3s-agent21 moved from inko01 → aya01
- CPU type x86-64-v2-AES → x86-64-v3 to enable AVX2 (required by vLLM CPU image)
2026-06-04 23:19:08 +02:00
Tuan-Dat Tran
6325941078 docs: add raspberry-pi ansible management plan and spec 2026-06-04 01:45:16 +02:00
Tuan-Dat Tran
36f944d1c4 feat(edge_vps): add vps playbook 2026-06-04 01:45:16 +02:00
Tuan-Dat Tran
cce6aba4cd fix(edge_vps): fix wireguard route template and update elastic/vps vars 2026-06-04 01:45:16 +02:00
Tuan-Dat Tran
f873256f65 feat(edge_vps): add traefik dynamic config template 2026-06-04 01:45:01 +02:00
Tuan-Dat Tran
a331265bde feat(edge_vps): add pangolin/gerbil/traefik stack with versioned images 2026-06-04 01:44:55 +02:00
Tuan-Dat Tran
a905b25190 fix(raspberry_pi): switch zigbee2mqtt adapter from ezsp to ember 2026-06-03 20:06:21 +02:00
Tuan-Dat Tran
25cc5ac271 fix(inventory): remove undefined k3s_storage group 2026-06-03 19:53:43 +02:00
Tuan-Dat Tran
2b857903a7 fix(raspberry_pi): use /dev/ttyUSB0 and set ezsp adapter for SONOFF MG21 2026-06-03 19:50:30 +02:00
Tuan-Dat Tran
eb4e8445fc fix(raspberry_pi): isolate z2m to own compose dir, fix port conflict 2026-06-03 19:43:35 +02:00
Tuan-Dat Tran
3799dc16d9 fix(raspberry_pi): install docker-compose-plugin before starting stack 2026-06-03 08:31:21 +02:00
Tuan-Dat Tran
585c01ca62 feat(raspberry_pi): wire up role tasks 2026-06-03 08:27:16 +02:00
Tuan-Dat Tran
14b93bf4f5 feat(raspberry_pi): add zigbee2mqtt deploy task 2026-06-03 08:26:04 +02:00
Tuan-Dat Tran
42e790656d feat(raspberry_pi): add zigbee2mqtt and mosquitto templates 2026-06-03 03:12:20 +02:00
Tuan-Dat Tran
da92fb0ccc feat(raspberry_pi): add directory setup task 2026-06-03 03:11:17 +02:00
Tuan-Dat Tran
d655cc54e2 fix(raspberry_pi): remove host condition from handler 2026-06-03 03:03:20 +02:00
Tuan-Dat Tran
9115d30c59 feat(raspberry_pi): add defaults, handlers, and secrets placeholder 2026-06-03 03:01:20 +02:00
Tuan-Dat Tran
8dcb429573 docs: add zigbee2mqtt implementation plan for naruto 2026-06-03 02:57:22 +02:00
Tuan-Dat Tran
29cc38872c docs: add zigbee2mqtt design spec for naruto 2026-06-03 02:54:18 +02:00
Tuan-Dat Tran
f6e2ce8c1a fix(common): replace deprecated apt_repository with deb822_repository 2026-06-03 02:31:33 +02:00
Tuan-Dat Tran
956836dc67 fix(common): replace deprecated ansible_ fact references with ansible_facts[] 2026-06-03 02:17:08 +02:00
Tuan-Dat Tran
aa8b591afd feat(raspberry_pi): add playbook 2026-06-03 01:23:48 +02:00
Tuan-Dat Tran
935389dc6d feat(raspberry_pi): add empty role scaffold 2026-06-03 01:23:48 +02:00
Tuan-Dat Tran
c4327a7596 fix(common): support aarch64 in extra_packages 2026-05-31 23:41:39 +02:00
Tuan-Dat Tran
b190022ff0 feat(raspberry_pi): add inventory and group vars 2026-05-31 23:29:07 +02:00
Tuan-Dat Tran
8da0ab98f8 fix(k3s_server): skip installation if k3s binary already exists
Primary and secondary install tasks now check k3s_status.stat.exists
so re-running the playbook is idempotent on already-provisioned nodes.
2026-04-27 21:43:42 +02:00
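The guard amounts to "install only when the binary is absent." Below is a minimal Python equivalent. It assumes the stat task checks `/usr/local/bin/k3s`, which is the k3s install script's default location. The path the role actually checks is not stated in the commit. `should_install_k3s` is a hypothetical name.

```python
from pathlib import Path

# Assumed binary location; the role's actual stat path is not given in the commit.
K3S_BINARY = Path("/usr/local/bin/k3s")


def should_install_k3s(binary: Path = K3S_BINARY) -> bool:
    """Mirror of a `when: not k3s_status.stat.exists` guard: install only if absent."""
    return not binary.exists()


if should_install_k3s():
    print("k3s not found, running installer")
else:
    print("k3s already present, skipping install")
```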
Tuan-Dat Tran
b4e093c9b1 fix(k3s_server): use VIP address in kubeconfig instead of k3s_server_name
k3s_server_name evaluates to k3s.seyshiro.de, which has no DNS entry.
Use k3s_vip (192.168.20.2) instead so the kubeconfig always works.
2026-04-27 21:41:55 +02:00
Tuan-Dat Tran
e8df950e87 chore(k3s): update vault-encrypted cluster join token 2026-04-27 21:39:37 +02:00
Tuan-Dat Tran
5b44c46e10 docs(arr-cleanup): improve runbook and fix api key paths
Rewrites findings.md with a how-to section, cleaner summary tables,
and more detailed per-pass results. Fixes the relative path to the
sonarr/radarr API key files after the runbook moved deeper in the repo.
2026-04-27 21:39:28 +02:00
Tuan-Dat Tran
95715c7748 feat(k3s_server): persist control-plane NoSchedule taint in k3s config
Adds node-taint to /etc/rancher/k3s/config.yaml so the taint
survives node reboots. Taint is already applied live via kubectl.
2026-04-27 21:35:24 +02:00
Tuan-Dat Tran
5bc3024eaf feat(k3s): replace nginx loadbalancer with kube-vip for control-plane HA
Deploys kube-vip as a DaemonSet on all k3s server nodes, advertising a
VIP (192.168.20.2) via ARP. Eliminates the single-point-of-failure
k3s-loadbalancer VM.

- New kube_vip role: RBAC + DaemonSet templates, TLS SAN cert rotation
- playbooks/kube-vip.yaml: migration playbook (serial=1, idempotent)
- Updated k3s install tasks (server primary/secondary, agent) to use k3s_vip
  instead of the loadbalancer VM IP
- Added k3s_vip: 192.168.20.2 to group_vars (below DHCP range .11-.250)

Migration steps are documented in the playbook header comment.
2026-04-26 12:08:42 +02:00
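Placing the VIP at .2 keeps it out of the DHCP pool, so the DHCP server never hands it out as a lease. Below is a small Python check of that constraint. It assumes a 192.168.20.0/24 subnet, which the addresses above imply but do not state. It also treats the gateway at 192.168.20.1, mentioned in the NTP commit further down this log, as reserved. `vip_is_safe` is a hypothetical helper.

```python
import ipaddress

# Assumed /24 subnet; DHCP range and VIP are taken from the commit message above.
SUBNET = ipaddress.ip_network("192.168.20.0/24")
GATEWAY = ipaddress.ip_address("192.168.20.1")
DHCP_START = ipaddress.ip_address("192.168.20.11")
DHCP_END = ipaddress.ip_address("192.168.20.250")
K3S_VIP = ipaddress.ip_address("192.168.20.2")


def vip_is_safe(vip, subnet=SUBNET, dhcp_start=DHCP_START, dhcp_end=DHCP_END,
                reserved=(GATEWAY,)) -> bool:
    """True if vip is a usable host address in the subnet and outside the DHCP pool."""
    return (
        vip in subnet
        and vip not in reserved
        and vip not in (subnet.network_address, subnet.broadcast_address)
        and not (dhcp_start <= vip <= dhcp_end)
    )


print(f"k3s_vip {K3S_VIP} safe: {vip_is_safe(K3S_VIP)}")
```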
Tuan-Dat Tran
fce6f913ff docs(plan): add docker version update plan for jellyfin and gitea 2026-04-23 08:06:35 +02:00
Tuan-Dat Tran
8239988a70 docs(runbook): add arr-stack downloads cleanup investigation and scripts
~16T freed on aya01 (mergerfs pool usage 92% → 57%). Documents the root cause
(no hardlinks across mergerfs due to cross-device mounts), the cleanup
passes verified via the Sonarr/Radarr APIs, and pending decisions
(Bleach remux, 111 skipped Sonarr entries).
2026-04-23 08:06:27 +02:00
Tuan-Dat Tran
e87dcd06f3 chore(k3s): rotate cluster token secret 2026-04-23 08:06:08 +02:00
Tuan-Dat Tran
543e9a2c97 fix(docker_host): remove /media/docker from NFS mount loop
/media/docker is no longer a valid NFS-backed path; it was causing
mount failures on docker_host nodes.
2026-04-23 08:06:03 +02:00
Tuan-Dat Tran
afbc3e3c57 docs(runbook): add Longhorn orphan auto-deletion fix and etcd defrag procedure 2026-04-22 22:03:45 +02:00
Tuan-Dat Tran
b157dd0b89 feat(k3s_server): install etcd-client on control plane nodes 2026-04-22 19:40:24 +02:00
Tuan-Dat Tran
057cd7a7f0 docs(runbook): mark vaultwarden as resolved 2026-04-22 00:52:58 +02:00
Tuan-Dat Tran
db2d5dccd4 docs(runbook): mark Longhorn orphan/etcd defrag as resolved
138 orphans deleted; all 3 etcd members defragged from 634MB to ~57MB.
2026-04-22 00:40:23 +02:00
Tuan-Dat Tran
db7e130515 docs: mark server11 disk issue resolved in runbook 2026-04-21 23:41:13 +02:00
Tuan-Dat Tran
c16e7cf740 fix(k3s_server): use inventory_hostname for primary detection and delegate token fetch
Primary server detection previously used ansible_default_ipv4.address compared against
k3s_primary_server_ip, which breaks with --limit since facts are only gathered for the
targeted hosts, causing the variable to resolve to the wrong IP.

- Replace IP comparisons with `inventory_hostname == groups['k3s_server'] | first`
  in main.yaml (primary install, secondary install, kubeconfig tasks)
- Delegate the node-token slurp to the primary server unconditionally so
  pull_token.yaml works correctly when run against any single node with --limit

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-21 23:30:57 +02:00
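The inventory-based check survives `--limit` because Ansible's `groups` variable is built from the whole inventory. Facts such as `ansible_default_ipv4`, by contrast, exist only for hosts that were actually targeted. The Python model below illustrates the new logic. The host names `srv-a`, `srv-b` and `srv-c` are hypothetical stand-ins for the real inventory; only the ordering of the `k3s_server` group matters.

```python
# Hypothetical inventory; only the ordering of the k3s_server group matters.
GROUPS = {"k3s_server": ["srv-a", "srv-b", "srv-c"]}


def is_primary(inventory_hostname: str, groups: dict) -> bool:
    """Python equivalent of `inventory_hostname == groups['k3s_server'] | first`."""
    return inventory_hostname == groups["k3s_server"][0]


def token_delegate(groups: dict) -> str:
    """Host the node-token slurp is delegated to: always the primary server."""
    return groups["k3s_server"][0]


# Simulate `--limit srv-b`: only srv-b is targeted, but `groups` is unchanged,
# so srv-b is classified as a secondary and fetches the token from srv-a.
limited_host = "srv-b"
role = "primary" if is_primary(limited_host, GROUPS) else "secondary"
print(limited_host, role, "token from", token_delegate(GROUPS))
```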
Tuan-Dat Tran
c084572521 docs: add k3s-server11 reprovision implementation plan
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-21 21:58:13 +02:00
Tuan-Dat Tran
da7bd42f07 docs: add k3s-server11 reprovision spec and cluster outage runbook
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-21 21:55:18 +02:00
Tuan-Dat Tran
f0a45e3fda fix: configure explicit NTP servers in timesyncd instead of relying on DHCP
The gateway at 192.168.20.1 was being advertised via DHCP as the NTP server but
does not serve NTP, causing NodeClockNotSynchronising alerts across all nodes.
2026-04-20 20:56:30 +02:00
Tuan-Dat Tran
b5f82e2978 fix: install kitty terminfo on all nodes via common role 2026-04-20 20:36:23 +02:00
Tuan-Dat Tran
29561c44c8 fix: enable and start systemd-timesyncd in common time role
systemd-timesyncd was installed via common_packages but never enabled or
started, causing NodeClockNotSynchronising alerts across all k3s nodes.
2026-04-20 20:18:19 +02:00
Tuan-Dat Tran
d33117a752 chore(docker): update jellyfin to 10.11.7 and gitea to 1.25.5-rootless 2026-04-01 21:20:02 +02:00
Tuan-Dat Tran
e9e4864456 docs: add design spec for docker service version updates (jellyfin 10.11.7, gitea 1.25.5) 2026-04-01 21:17:05 +02:00
Tuan-Dat Tran
043f97ebac docs: add design spec and implementation plan for docker service redeployment 2026-04-01 21:00:51 +02:00