29 Commits

Author SHA1 Message Date
Tuan-Dat Tran
8da0ab98f8 fix(k3s_server): skip installation if k3s binary already exists
Primary and secondary install tasks now check k3s_status.stat.exists
so re-running the playbook is idempotent on already-provisioned nodes.
2026-04-27 21:43:42 +02:00
Tuan-Dat Tran
b4e093c9b1 fix(k3s_server): use VIP address in kubeconfig instead of k3s_server_name
k3s_server_name resolves to k3s.seyshiro.de which has no DNS entry.
Use k3s_vip (192.168.20.2) so the kubeconfig always works.
2026-04-27 21:41:55 +02:00
Tuan-Dat Tran
e8df950e87 chore(k3s): update vault-encrypted cluster join token 2026-04-27 21:39:37 +02:00
Tuan-Dat Tran
5b44c46e10 docs(arr-cleanup): improve runbook and fix api key paths
Rewrites findings.md with how-to section, cleaner summary tables,
and more detailed per-pass results. Fixes relative path for
sonarr/radarr API key files after runbook moved deeper in repo.
2026-04-27 21:39:28 +02:00
Tuan-Dat Tran
95715c7748 feat(k3s_server): persist control-plane NoSchedule taint in k3s config
Adds node-taint to /etc/rancher/k3s/config.yaml so the taint
survives node reboots. Taint is already applied live via kubectl.
2026-04-27 21:35:24 +02:00
Tuan-Dat Tran
5bc3024eaf feat(k3s): replace nginx loadbalancer with kube-vip for control-plane HA
Deploys kube-vip as a DaemonSet on all k3s server nodes, advertising a
VIP (192.168.20.2) via ARP. Eliminates the single-point-of-failure
k3s-loadbalancer VM.

- New kube_vip role: RBAC + DaemonSet templates, TLS SAN cert rotation
- playbooks/kube-vip.yaml: migration playbook (serial=1, idempotent)
- Updated k3s install tasks (server primary/secondary, agent) to use k3s_vip
  instead of the loadbalancer VM IP
- Added k3s_vip: 192.168.20.2 to group_vars (below DHCP range .11-.250)

Migration steps in playbook header comment.
2026-04-26 12:08:42 +02:00
Tuan-Dat Tran
fce6f913ff docs(plan): add docker version update plan for jellyfin and gitea 2026-04-23 08:06:35 +02:00
Tuan-Dat Tran
8239988a70 docs(runbook): add arr-stack downloads cleanup investigation and scripts
~16T freed on aya01 (92% → 57% mergerfs pool). Documents root cause
(no hardlinks across mergerfs due to cross-device mounts), cleanup
passes via Sonarr/Radarr API verification, and pending decisions
(Bleach remux, 111 skipped Sonarr entries).
2026-04-23 08:06:27 +02:00
Tuan-Dat Tran
e87dcd06f3 chore(k3s): rotate cluster token secret 2026-04-23 08:06:08 +02:00
Tuan-Dat Tran
543e9a2c97 fix(docker_host): remove /media/docker from NFS mount loop
/media/docker is no longer a valid NFS-backed path; was causing
mount failures on docker_host nodes.
2026-04-23 08:06:03 +02:00
Tuan-Dat Tran
afbc3e3c57 docs(runbook): add Longhorn orphan auto-deletion fix and etcd defrag procedure 2026-04-22 22:03:45 +02:00
Tuan-Dat Tran
b157dd0b89 feat(k3s_server): install etcd-client on control plane nodes 2026-04-22 19:40:24 +02:00
Tuan-Dat Tran
057cd7a7f0 docs(runbook): mark vaultwarden as resolved 2026-04-22 00:52:58 +02:00
Tuan-Dat Tran
db2d5dccd4 docs(runbook): mark Longhorn orphan/etcd defrag as resolved
138 orphans deleted, all 3 etcd members defragged from 634MB to ~57MB.
2026-04-22 00:40:23 +02:00
Tuan-Dat Tran
db7e130515 docs: mark server11 disk issue resolved in runbook 2026-04-21 23:41:13 +02:00
Tuan-Dat Tran
c16e7cf740 fix(k3s_server): use inventory_hostname for primary detection and delegate token fetch
Primary server detection previously used ansible_default_ipv4.address compared against
k3s_primary_server_ip, which breaks with --limit since facts are only gathered for the
targeted hosts, causing the variable to resolve to the wrong IP.

- Replace IP comparisons with `inventory_hostname == groups['k3s_server'] | first`
  in main.yaml (primary install, secondary install, kubeconfig tasks)
- Delegate the node-token slurp to the primary server unconditionally so
  pull_token.yaml works correctly when run against any single node with --limit

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-21 23:30:57 +02:00
Tuan-Dat Tran
c084572521 docs: add k3s-server11 reprovision implementation plan
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-21 21:58:13 +02:00
Tuan-Dat Tran
da7bd42f07 docs: add k3s-server11 reprovision spec and cluster outage runbook
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-21 21:55:18 +02:00
Tuan-Dat Tran
f0a45e3fda fix: configure explicit NTP servers in timesyncd instead of relying on DHCP
Gateway at 192.168.20.1 was being provided via DHCP as the NTP server but
does not serve NTP, causing NodeClockNotSynchronising across all nodes.
2026-04-20 20:56:30 +02:00
Tuan-Dat Tran
b5f82e2978 fix: install kitty terminfo on all nodes via common role 2026-04-20 20:36:23 +02:00
Tuan-Dat Tran
29561c44c8 fix: enable and start systemd-timesyncd in common time role
systemd-timesyncd was installed via common_packages but never enabled or
started, causing NodeClockNotSynchronising alerts across all k3s nodes.
2026-04-20 20:18:19 +02:00
Tuan-Dat Tran
d33117a752 chore(docker): update jellyfin to 10.11.7 and gitea to 1.25.5-rootless 2026-04-01 21:20:02 +02:00
Tuan-Dat Tran
e9e4864456 docs: add design spec for docker service version updates (jellyfin 10.11.7, gitea 1.25.5) 2026-04-01 21:17:05 +02:00
Tuan-Dat Tran
043f97ebac docs: add design spec and implementation plan for docker service redeployment 2026-04-01 21:00:51 +02:00
Tuan-Dat Tran
134eceee0f Update Jellyfin and Gitea image versions 2026-04-01 20:55:20 +02:00
Tuan-Dat Tran
80f98a9c4b docs: update Proxmox cluster debugging design with findings and fixes 2026-03-01 20:58:04 +01:00
Tuan-Dat Tran
d4ac3dae60 feat(k3s): Added 2 nodes
Signed-off-by: Tuan-Dat Tran <tuan-dat.tran@tudattr.dev>
2026-03-01 17:01:51 +01:00
Tuan-Dat Tran
5a8c7f0248 feat(proxmox): add hosts config
Signed-off-by: Tuan-Dat Tran <tuan-dat.tran@tudattr.dev>
2026-02-28 11:30:58 +01:00
Tuan-Dat Tran
bf7c7c9562 ci: add GitHub Actions workflow for linting 2026-02-25 06:00:20 +01:00
46 changed files with 4698 additions and 96 deletions

45
.github/workflows/ci.yaml vendored Normal file
View File

@@ -0,0 +1,45 @@
name: CI
on:
push:
branches: [main, master]
pull_request:
branches: [main, master]
jobs:
lint:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Set up Python
uses: actions/setup-python@v5
with:
python-version: '3.11'
- name: Install dependencies
run: |
python -m pip install --upgrade pip
pip install ansible-lint==6.22.2 ansible-core==2.15.8
- name: Install Ansible collections
run: ansible-galaxy collection install -r requirements.yaml
- name: Run ansible-lint
run: ansible-lint
pre-commit:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Set up Python
uses: actions/setup-python@v5
with:
python-version: '3.11'
- name: Install pre-commit
run: pip install pre-commit
- name: Run pre-commit
run: pre-commit run --all-files

View File

@@ -0,0 +1,750 @@
# Proxmox Cluster Debugging Plan
## Overview
This document outlines the plan to debug the Proxmox cluster issue where nodes `mii01` and `naruto01` are showing up with `?` in the Web UI, indicating a potential version mismatch.
## Architecture
The investigation will focus on the following components:
- Proxmox VE versions across all nodes
- Cluster health and quorum status
- Corosync service status and logs
- Node-to-node connectivity
- Time synchronization
## Data Flow
1. **Version Check:** Verify Proxmox VE versions on all nodes.
2. **Cluster Health:** Check cluster status and quorum.
3. **Corosync Logs:** Analyze Corosync logs for errors.
4. **Connectivity:** Verify network connectivity between nodes.
5. **Time Synchronization:** Ensure time is synchronized across all nodes.
## Error Handling
- If a version mismatch is detected, document the versions and proceed with upgrading the nodes to match the cluster version.
- If Corosync errors are found, analyze the logs to determine the root cause and apply appropriate fixes.
- If connectivity issues are detected, troubleshoot network configurations and ensure proper communication between nodes.
## Testing
- Verify that all nodes are visible and operational in the Web UI after applying fixes.
- Ensure that cluster quorum is maintained and all services are running correctly.
## Verification
- Confirm that the cluster is stable and all nodes are functioning as expected.
- Document any changes made and the steps taken to resolve the issue.
## Next Steps
Proceed with the implementation plan to execute the debugging steps outlined in this document.
## Findings
The investigation revealed several critical issues:
1. **Version Mismatch**: The cluster nodes were running different versions of Proxmox VE:
- aya01: 8.1.4 (kernel 6.5.11-8-pve)
- lulu: 8.2.2 (kernel 6.8.4-2-pve)
- inko01: 8.4.0 (kernel 6.8.12-9-pve)
- naruto01: 8.4.0 (kernel 6.8.12-9-pve)
- mii01: 9.0.3 (kernel 6.14.8-2-pve)
2. **Corosync Network Instability**: Frequent link failures and resets were observed, particularly for host 3 (lulu) and host 5 (mii01). The logs showed repeated patterns of:
- "link: host: X link: 0 is down"
- "host: host: X has no active links"
- "Token has not been received in 3712 ms"
- Frequent MTU resets and PMTUD changes
3. **Token Timeout Issues**: Multiple "Token has not been received in 3712 ms" errors indicated that the default token timeout was insufficient for the network conditions.
## Proposed Fixes
Based on the analysis, the following fixes were proposed:
1. **Corosync Configuration Updates**:
- Increase token timeout to 5000ms (from default)
- Increase token_retransmits_before_loss_const to 10
- Set join timeout to 60 seconds
- Set consensus timeout to 6000ms
- Limit max_messages to 20
- Update config_version to reflect changes
2. **Version Alignment**: Upgrade all nodes to the same Proxmox VE version to ensure compatibility
3. **Network Stability Improvements**:
- Verify physical network connections
- Ensure consistent MTU settings across all nodes
- Monitor network latency and packet loss
## Changes Made
The following changes were successfully implemented:
1. **Corosync Configuration**: Updated `/etc/pve/corosync.conf` on aya01 with improved timeout settings:
- token: 5000
- token_retransmits_before_loss_const: 10
- join: 60
- consensus: 6000
- max_messages: 20
- config_version: 10
2. **Service Restart**: Restarted corosync and pve-cluster services to apply the new configuration
3. **Verification**: Confirmed that all 5 nodes are now properly connected and the cluster is quorate
## Results
After applying the fixes:
- All nodes are visible and operational in the cluster
- Cluster status shows "Quorate: Yes"
- No recent token timeout errors in Corosync logs
- All nodes maintain stable connections
- Cluster membership is complete with all 5 nodes active
The cluster is now functioning as expected with improved stability and resilience against network fluctuations.
## Findings
## Proposed Fixes
## Changes Made
Cluster Debugging Findings:
Proxmox VE Versions:
Cluster Status:
Node Membership:
Corosync Logs:
Time Synchronization:
Local time: Sun 2026-03-01 20:50:58 CET
Universal time: Sun 2026-03-01 19:50:58 UTC
RTC time: Sun 2026-03-01 19:50:58
Time zone: Europe/Berlin (CET, +0100)
System clock synchronized: yes
NTP service: active
RTC in local TZ: no
Local time: Sun 2026-03-01 20:50:58 CET
Universal time: Sun 2026-03-01 19:50:58 UTC
RTC time: Sun 2026-03-01 19:50:58
Time zone: Europe/Berlin (CET, +0100)
System clock synchronized: yes
NTP service: active
RTC in local TZ: no
Feb 27 14:39:13 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 27 14:39:13 aya01 corosync[1049]: [KNET ] host: host: 3 has no active links
Feb 27 14:39:14 aya01 corosync[1049]: [KNET ] link: Resetting MTU for link 0 because host 3 joined
Feb 27 14:39:14 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 27 14:39:14 aya01 corosync[1049]: [KNET ] pmtud: Global data MTU changed to: 1397
Feb 27 14:57:21 aya01 corosync[1049]: [KNET ] link: host: 3 link: 0 is down
Feb 27 14:57:21 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 27 14:57:21 aya01 corosync[1049]: [KNET ] host: host: 3 has no active links
Feb 27 14:57:24 aya01 corosync[1049]: [KNET ] rx: host: 3 link: 0 is up
Feb 27 14:57:24 aya01 corosync[1049]: [KNET ] link: Resetting MTU for link 0 because host 3 joined
Feb 27 14:57:24 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 27 14:57:24 aya01 corosync[1049]: [KNET ] pmtud: Global data MTU changed to: 1397
Feb 27 15:48:27 aya01 corosync[1049]: [KNET ] link: host: 3 link: 0 is down
Feb 27 15:48:27 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 27 15:48:27 aya01 corosync[1049]: [KNET ] host: host: 3 has no active links
Feb 27 15:48:29 aya01 corosync[1049]: [KNET ] link: Resetting MTU for link 0 because host 3 joined
Feb 27 15:48:29 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 27 15:48:29 aya01 corosync[1049]: [KNET ] pmtud: Global data MTU changed to: 1397
Feb 27 18:46:04 aya01 corosync[1049]: [TOTEM ] Retransmit List: 48a1b
Feb 27 19:03:17 aya01 corosync[1049]: [KNET ] link: host: 3 link: 0 is down
Feb 27 19:03:17 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 27 19:03:17 aya01 corosync[1049]: [KNET ] host: host: 3 has no active links
Feb 27 19:03:20 aya01 corosync[1049]: [KNET ] rx: host: 3 link: 0 is up
Feb 27 19:03:20 aya01 corosync[1049]: [KNET ] link: Resetting MTU for link 0 because host 3 joined
Feb 27 19:03:20 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 27 19:03:20 aya01 corosync[1049]: [KNET ] pmtud: Global data MTU changed to: 1397
Feb 27 19:41:49 aya01 corosync[1049]: [KNET ] link: host: 3 link: 0 is down
Feb 27 19:41:49 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 27 19:41:49 aya01 corosync[1049]: [KNET ] host: host: 3 has no active links
Feb 27 19:41:50 aya01 corosync[1049]: [KNET ] link: Resetting MTU for link 0 because host 3 joined
Feb 27 19:41:50 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 27 19:41:51 aya01 corosync[1049]: [KNET ] pmtud: Global data MTU changed to: 1397
Feb 27 20:12:44 aya01 corosync[1049]: [KNET ] link: host: 3 link: 0 is down
Feb 27 20:12:44 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 27 20:12:44 aya01 corosync[1049]: [KNET ] host: host: 3 has no active links
Feb 27 20:12:47 aya01 corosync[1049]: [KNET ] rx: host: 3 link: 0 is up
Feb 27 20:12:47 aya01 corosync[1049]: [KNET ] link: Resetting MTU for link 0 because host 3 joined
Feb 27 20:12:47 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 27 20:12:47 aya01 corosync[1049]: [KNET ] pmtud: Global data MTU changed to: 1397
Feb 27 20:19:21 aya01 corosync[1049]: [KNET ] link: host: 3 link: 0 is down
Feb 27 20:19:21 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 27 20:19:21 aya01 corosync[1049]: [KNET ] host: host: 3 has no active links
Feb 27 20:19:24 aya01 corosync[1049]: [KNET ] rx: host: 3 link: 0 is up
Feb 27 20:19:24 aya01 corosync[1049]: [KNET ] link: Resetting MTU for link 0 because host 3 joined
Feb 27 20:19:24 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 27 20:19:24 aya01 corosync[1049]: [KNET ] pmtud: Global data MTU changed to: 1397
Feb 27 21:40:33 aya01 corosync[1049]: [KNET ] link: host: 3 link: 0 is down
Feb 27 21:40:33 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 27 21:40:33 aya01 corosync[1049]: [KNET ] host: host: 3 has no active links
Feb 27 21:40:33 aya01 corosync[1049]: [KNET ] link: Resetting MTU for link 0 because host 3 joined
Feb 27 21:40:33 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 27 21:40:33 aya01 corosync[1049]: [KNET ] pmtud: Global data MTU changed to: 1397
Feb 27 21:42:58 aya01 corosync[1049]: [KNET ] link: host: 3 link: 0 is down
Feb 27 21:42:58 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 27 21:42:58 aya01 corosync[1049]: [KNET ] host: host: 3 has no active links
Feb 27 21:43:00 aya01 corosync[1049]: [KNET ] rx: host: 3 link: 0 is up
Feb 27 21:43:00 aya01 corosync[1049]: [KNET ] link: Resetting MTU for link 0 because host 3 joined
Feb 27 21:43:00 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 27 21:43:00 aya01 corosync[1049]: [KNET ] pmtud: Global data MTU changed to: 1397
Feb 27 21:49:55 aya01 corosync[1049]: [KNET ] link: host: 3 link: 0 is down
Feb 27 21:49:55 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 27 21:49:55 aya01 corosync[1049]: [KNET ] host: host: 3 has no active links
Feb 27 21:49:57 aya01 corosync[1049]: [KNET ] rx: host: 3 link: 0 is up
Feb 27 21:49:57 aya01 corosync[1049]: [KNET ] link: Resetting MTU for link 0 because host 3 joined
Feb 27 21:49:57 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 27 21:49:57 aya01 corosync[1049]: [KNET ] pmtud: Global data MTU changed to: 1397
Feb 27 22:53:39 aya01 corosync[1049]: [KNET ] link: host: 3 link: 0 is down
Feb 27 22:53:39 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 27 22:53:39 aya01 corosync[1049]: [KNET ] host: host: 3 has no active links
Feb 27 22:53:40 aya01 corosync[1049]: [KNET ] link: Resetting MTU for link 0 because host 3 joined
Feb 27 22:53:40 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 27 22:53:40 aya01 corosync[1049]: [KNET ] pmtud: Global data MTU changed to: 1397
Feb 27 23:04:51 aya01 corosync[1049]: [KNET ] link: host: 3 link: 0 is down
Feb 27 23:04:51 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 27 23:04:51 aya01 corosync[1049]: [KNET ] host: host: 3 has no active links
Feb 27 23:04:54 aya01 corosync[1049]: [KNET ] rx: host: 3 link: 0 is up
Feb 27 23:04:54 aya01 corosync[1049]: [KNET ] link: Resetting MTU for link 0 because host 3 joined
Feb 27 23:04:54 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 27 23:04:54 aya01 corosync[1049]: [KNET ] pmtud: Global data MTU changed to: 1397
Feb 28 00:18:24 aya01 corosync[1049]: [TOTEM ] Retransmit List: 5d988
Feb 28 00:18:24 aya01 corosync[1049]: [TOTEM ] Retransmit List: 5d989
Feb 28 00:18:24 aya01 corosync[1049]: [TOTEM ] Retransmit List: 5d98a
Feb 28 00:18:24 aya01 corosync[1049]: [TOTEM ] Retransmit List: 5d98b
Feb 28 00:18:26 aya01 corosync[1049]: [TOTEM ] Retransmit List: 5d98c
Feb 28 00:18:26 aya01 corosync[1049]: [TOTEM ] Retransmit List: 5d98d
Feb 28 00:53:03 aya01 corosync[1049]: [KNET ] link: host: 3 link: 0 is down
Feb 28 00:53:03 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 28 00:53:03 aya01 corosync[1049]: [KNET ] host: host: 3 has no active links
Feb 28 00:53:03 aya01 corosync[1049]: [KNET ] link: Resetting MTU for link 0 because host 3 joined
Feb 28 00:53:03 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 28 00:53:03 aya01 corosync[1049]: [KNET ] pmtud: Global data MTU changed to: 1397
Feb 28 01:36:27 aya01 corosync[1049]: [KNET ] link: host: 3 link: 0 is down
Feb 28 01:36:27 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 28 01:36:27 aya01 corosync[1049]: [KNET ] host: host: 3 has no active links
Feb 28 01:36:29 aya01 corosync[1049]: [KNET ] rx: host: 3 link: 0 is up
Feb 28 01:36:29 aya01 corosync[1049]: [KNET ] link: Resetting MTU for link 0 because host 3 joined
Feb 28 01:36:29 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 28 01:36:29 aya01 corosync[1049]: [KNET ] pmtud: Global data MTU changed to: 1397
Feb 28 03:20:45 aya01 corosync[1049]: [KNET ] link: host: 3 link: 0 is down
Feb 28 03:20:45 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 28 03:20:45 aya01 corosync[1049]: [KNET ] host: host: 3 has no active links
Feb 28 03:20:45 aya01 corosync[1049]: [KNET ] link: Resetting MTU for link 0 because host 3 joined
Feb 28 03:20:45 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 28 03:20:45 aya01 corosync[1049]: [KNET ] pmtud: Global data MTU changed to: 1397
Feb 28 05:47:56 aya01 corosync[1049]: [KNET ] link: host: 3 link: 0 is down
Feb 28 05:47:56 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 28 05:47:56 aya01 corosync[1049]: [KNET ] host: host: 3 has no active links
Feb 28 05:47:57 aya01 corosync[1049]: [KNET ] link: Resetting MTU for link 0 because host 3 joined
Feb 28 05:47:57 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 28 05:47:57 aya01 corosync[1049]: [KNET ] pmtud: Global data MTU changed to: 1397
Feb 28 05:57:03 aya01 corosync[1049]: [KNET ] link: host: 3 link: 0 is down
Feb 28 05:57:03 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 28 05:57:03 aya01 corosync[1049]: [KNET ] host: host: 3 has no active links
Feb 28 05:57:04 aya01 corosync[1049]: [KNET ] link: Resetting MTU for link 0 because host 3 joined
Feb 28 05:57:04 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 28 05:57:04 aya01 corosync[1049]: [KNET ] pmtud: Global data MTU changed to: 1397
Feb 28 06:10:35 aya01 corosync[1049]: [KNET ] link: host: 3 link: 0 is down
Feb 28 06:10:35 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 28 06:10:35 aya01 corosync[1049]: [KNET ] host: host: 3 has no active links
Feb 28 06:10:37 aya01 corosync[1049]: [KNET ] link: Resetting MTU for link 0 because host 3 joined
Feb 28 06:10:37 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 28 06:10:37 aya01 corosync[1049]: [KNET ] pmtud: Global data MTU changed to: 1397
Feb 28 07:09:26 aya01 corosync[1049]: [KNET ] link: host: 3 link: 0 is down
Feb 28 07:09:26 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 28 07:09:26 aya01 corosync[1049]: [KNET ] host: host: 3 has no active links
Feb 28 07:09:26 aya01 corosync[1049]: [KNET ] link: Resetting MTU for link 0 because host 3 joined
Feb 28 07:09:26 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 28 07:09:26 aya01 corosync[1049]: [KNET ] pmtud: Global data MTU changed to: 1397
Feb 28 07:38:11 aya01 corosync[1049]: [KNET ] link: host: 3 link: 0 is down
Feb 28 07:38:11 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 28 07:38:11 aya01 corosync[1049]: [KNET ] host: host: 3 has no active links
Feb 28 07:38:11 aya01 corosync[1049]: [KNET ] link: Resetting MTU for link 0 because host 3 joined
Feb 28 07:38:11 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 28 07:38:11 aya01 corosync[1049]: [KNET ] pmtud: Global data MTU changed to: 1397
Feb 28 08:00:03 aya01 corosync[1049]: [KNET ] link: host: 3 link: 0 is down
Feb 28 08:00:03 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 28 08:00:03 aya01 corosync[1049]: [KNET ] host: host: 3 has no active links
Feb 28 08:00:03 aya01 corosync[1049]: [KNET ] link: Resetting MTU for link 0 because host 3 joined
Feb 28 08:00:03 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 28 08:00:04 aya01 corosync[1049]: [KNET ] pmtud: Global data MTU changed to: 1397
Feb 28 08:23:05 aya01 corosync[1049]: [KNET ] link: host: 3 link: 0 is down
Feb 28 08:23:05 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 28 08:23:05 aya01 corosync[1049]: [KNET ] host: host: 3 has no active links
Feb 28 08:23:08 aya01 corosync[1049]: [KNET ] rx: host: 3 link: 0 is up
Feb 28 08:23:08 aya01 corosync[1049]: [KNET ] link: Resetting MTU for link 0 because host 3 joined
Feb 28 08:23:08 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 28 08:23:08 aya01 corosync[1049]: [KNET ] pmtud: Global data MTU changed to: 1397
Feb 28 08:36:41 aya01 corosync[1049]: [KNET ] link: host: 3 link: 0 is down
Feb 28 08:36:41 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 28 08:36:41 aya01 corosync[1049]: [KNET ] host: host: 3 has no active links
Feb 28 08:36:41 aya01 corosync[1049]: [KNET ] link: Resetting MTU for link 0 because host 3 joined
Feb 28 08:36:41 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 28 08:36:42 aya01 corosync[1049]: [KNET ] pmtud: Global data MTU changed to: 1397
Feb 28 08:45:39 aya01 corosync[1049]: [KNET ] link: host: 3 link: 0 is down
Feb 28 08:45:39 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 28 08:45:39 aya01 corosync[1049]: [KNET ] host: host: 3 has no active links
Feb 28 08:45:42 aya01 corosync[1049]: [KNET ] link: Resetting MTU for link 0 because host 3 joined
Feb 28 08:45:42 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 28 08:45:42 aya01 corosync[1049]: [KNET ] pmtud: Global data MTU changed to: 1397
Feb 28 09:23:56 aya01 corosync[1049]: [KNET ] link: host: 3 link: 0 is down
Feb 28 09:23:56 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 28 09:23:56 aya01 corosync[1049]: [KNET ] host: host: 3 has no active links
Feb 28 09:23:58 aya01 corosync[1049]: [KNET ] rx: host: 3 link: 0 is up
Feb 28 09:23:58 aya01 corosync[1049]: [KNET ] link: Resetting MTU for link 0 because host 3 joined
Feb 28 09:23:58 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 28 09:23:58 aya01 corosync[1049]: [KNET ] pmtud: Global data MTU changed to: 1397
Feb 28 09:34:48 aya01 corosync[1049]: [KNET ] link: host: 3 link: 0 is down
Feb 28 09:34:48 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 28 09:34:48 aya01 corosync[1049]: [KNET ] host: host: 3 has no active links
Feb 28 09:34:51 aya01 corosync[1049]: [KNET ] rx: host: 3 link: 0 is up
Feb 28 09:34:51 aya01 corosync[1049]: [KNET ] link: Resetting MTU for link 0 because host 3 joined
Feb 28 09:34:51 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 28 09:34:51 aya01 corosync[1049]: [KNET ] pmtud: Global data MTU changed to: 1397
Feb 28 09:54:09 aya01 corosync[1049]: [KNET ] link: host: 3 link: 0 is down
Feb 28 09:54:09 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 28 09:54:09 aya01 corosync[1049]: [KNET ] host: host: 3 has no active links
Feb 28 09:54:11 aya01 corosync[1049]: [KNET ] rx: host: 3 link: 0 is up
Feb 28 09:54:11 aya01 corosync[1049]: [KNET ] link: Resetting MTU for link 0 because host 3 joined
Feb 28 09:54:11 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 28 09:54:11 aya01 corosync[1049]: [KNET ] pmtud: Global data MTU changed to: 1397
Feb 28 10:18:51 aya01 corosync[1049]: [KNET ] link: host: 3 link: 0 is down
Feb 28 10:18:51 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 28 10:18:51 aya01 corosync[1049]: [KNET ] host: host: 3 has no active links
Feb 28 10:18:52 aya01 corosync[1049]: [KNET ] link: Resetting MTU for link 0 because host 3 joined
Feb 28 10:18:52 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 28 10:18:52 aya01 corosync[1049]: [KNET ] pmtud: Global data MTU changed to: 1397
Feb 28 10:31:07 aya01 corosync[1049]: [KNET ] link: host: 3 link: 0 is down
Feb 28 10:31:07 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 28 10:31:07 aya01 corosync[1049]: [KNET ] host: host: 3 has no active links
Feb 28 10:31:09 aya01 corosync[1049]: [KNET ] rx: host: 3 link: 0 is up
Feb 28 10:31:09 aya01 corosync[1049]: [KNET ] link: Resetting MTU for link 0 because host 3 joined
Feb 28 10:31:09 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 28 10:31:09 aya01 corosync[1049]: [KNET ] pmtud: Global data MTU changed to: 1397
Feb 28 12:25:03 aya01 corosync[1049]: [KNET ] link: host: 3 link: 0 is down
Feb 28 12:25:03 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 28 12:25:03 aya01 corosync[1049]: [KNET ] host: host: 3 has no active links
Feb 28 12:25:05 aya01 corosync[1049]: [KNET ] link: Resetting MTU for link 0 because host 3 joined
Feb 28 12:25:05 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 28 12:25:05 aya01 corosync[1049]: [KNET ] pmtud: Global data MTU changed to: 1397
Feb 28 12:38:06 aya01 corosync[1049]: [TOTEM ] Retransmit List: 8c34b
Feb 28 12:38:06 aya01 corosync[1049]: [TOTEM ] Retransmit List: 8c34e
Feb 28 12:38:06 aya01 corosync[1049]: [TOTEM ] Retransmit List: 8c355
Feb 28 14:39:02 aya01 corosync[1049]: [TOTEM ] Token has not been received in 3712 ms
Feb 28 18:31:43 aya01 corosync[1049]: [KNET ] link: host: 3 link: 0 is down
Feb 28 18:31:43 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 28 18:31:43 aya01 corosync[1049]: [KNET ] host: host: 3 has no active links
Feb 28 18:31:43 aya01 corosync[1049]: [KNET ] link: Resetting MTU for link 0 because host 3 joined
Feb 28 18:31:43 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 28 18:31:43 aya01 corosync[1049]: [KNET ] pmtud: Global data MTU changed to: 1397
Feb 28 19:45:51 aya01 corosync[1049]: [KNET ] link: host: 3 link: 0 is down
Feb 28 19:45:51 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 28 19:45:51 aya01 corosync[1049]: [KNET ] host: host: 3 has no active links
Feb 28 19:45:53 aya01 corosync[1049]: [KNET ] rx: host: 3 link: 0 is up
Feb 28 19:45:53 aya01 corosync[1049]: [KNET ] link: Resetting MTU for link 0 because host 3 joined
Feb 28 19:45:53 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 28 19:45:53 aya01 corosync[1049]: [KNET ] pmtud: Global data MTU changed to: 1397
Feb 28 20:22:47 aya01 corosync[1049]: [KNET ] link: host: 3 link: 0 is down
Feb 28 20:22:47 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 28 20:22:47 aya01 corosync[1049]: [KNET ] host: host: 3 has no active links
Feb 28 20:22:47 aya01 corosync[1049]: [KNET ] link: Resetting MTU for link 0 because host 3 joined
Feb 28 20:22:47 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 28 20:22:47 aya01 corosync[1049]: [KNET ] pmtud: Global data MTU changed to: 1397
Feb 28 21:26:43 aya01 corosync[1049]: [KNET ] link: host: 3 link: 0 is down
Feb 28 21:26:43 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 28 21:26:43 aya01 corosync[1049]: [KNET ] host: host: 3 has no active links
Feb 28 21:26:45 aya01 corosync[1049]: [KNET ] rx: host: 3 link: 0 is up
Feb 28 21:26:45 aya01 corosync[1049]: [KNET ] link: Resetting MTU for link 0 because host 3 joined
Feb 28 21:26:45 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 28 21:26:45 aya01 corosync[1049]: [KNET ] pmtud: Global data MTU changed to: 1397
Feb 28 21:50:41 aya01 corosync[1049]: [KNET ] link: host: 3 link: 0 is down
Feb 28 21:50:41 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 28 21:50:41 aya01 corosync[1049]: [KNET ] host: host: 3 has no active links
Feb 28 21:50:43 aya01 corosync[1049]: [KNET ] rx: host: 3 link: 0 is up
Feb 28 21:50:43 aya01 corosync[1049]: [KNET ] link: Resetting MTU for link 0 because host 3 joined
Feb 28 21:50:43 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 28 21:50:44 aya01 corosync[1049]: [KNET ] pmtud: Global data MTU changed to: 1397
Feb 28 22:02:38 aya01 corosync[1049]: [TOTEM ] Retransmit List: b0004
Feb 28 22:46:07 aya01 corosync[1049]: [KNET ] link: host: 3 link: 0 is down
Feb 28 22:46:07 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 28 22:46:07 aya01 corosync[1049]: [KNET ] host: host: 3 has no active links
Feb 28 22:46:09 aya01 corosync[1049]: [KNET ] rx: host: 3 link: 0 is up
Feb 28 22:46:09 aya01 corosync[1049]: [KNET ] link: Resetting MTU for link 0 because host 3 joined
Feb 28 22:46:09 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 28 22:46:09 aya01 corosync[1049]: [KNET ] pmtud: Global data MTU changed to: 1397
Mar 01 00:26:09 aya01 corosync[1049]: [KNET ] link: host: 3 link: 0 is down
Mar 01 00:26:09 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Mar 01 00:26:09 aya01 corosync[1049]: [KNET ] host: host: 3 has no active links
Mar 01 00:26:12 aya01 corosync[1049]: [KNET ] rx: host: 3 link: 0 is up
Mar 01 00:26:12 aya01 corosync[1049]: [KNET ] link: Resetting MTU for link 0 because host 3 joined
Mar 01 00:26:12 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Mar 01 00:26:12 aya01 corosync[1049]: [KNET ] pmtud: Global data MTU changed to: 1397
Mar 01 01:28:54 aya01 corosync[1049]: [KNET ] link: host: 3 link: 0 is down
Mar 01 01:28:54 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Mar 01 01:28:54 aya01 corosync[1049]: [KNET ] host: host: 3 has no active links
Mar 01 01:28:57 aya01 corosync[1049]: [KNET ] rx: host: 3 link: 0 is up
Mar 01 01:28:57 aya01 corosync[1049]: [KNET ] link: Resetting MTU for link 0 because host 3 joined
Mar 01 01:28:57 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Mar 01 01:28:57 aya01 corosync[1049]: [KNET ] pmtud: Global data MTU changed to: 1397
Mar 01 01:56:02 aya01 corosync[1049]: [TOTEM ] Token has not been received in 3712 ms
Mar 01 04:30:28 aya01 corosync[1049]: [KNET ] link: host: 3 link: 0 is down
Mar 01 04:30:28 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Mar 01 04:30:28 aya01 corosync[1049]: [KNET ] host: host: 3 has no active links
Mar 01 04:30:30 aya01 corosync[1049]: [KNET ] link: Resetting MTU for link 0 because host 3 joined
Mar 01 04:30:30 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Mar 01 04:30:30 aya01 corosync[1049]: [KNET ] pmtud: Global data MTU changed to: 1397
Mar 01 04:58:04 aya01 corosync[1049]: [KNET ] link: host: 3 link: 0 is down
Mar 01 04:58:04 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Mar 01 04:58:04 aya01 corosync[1049]: [KNET ] host: host: 3 has no active links
Mar 01 04:58:04 aya01 corosync[1049]: [KNET ] link: Resetting MTU for link 0 because host 3 joined
Mar 01 04:58:04 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Mar 01 04:58:05 aya01 corosync[1049]: [KNET ] pmtud: Global data MTU changed to: 1397
Mar 01 05:02:59 aya01 corosync[1049]: [KNET ] link: host: 3 link: 0 is down
Mar 01 05:02:59 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Mar 01 05:02:59 aya01 corosync[1049]: [KNET ] host: host: 3 has no active links
Mar 01 05:03:02 aya01 corosync[1049]: [KNET ] rx: host: 3 link: 0 is up
Mar 01 05:03:02 aya01 corosync[1049]: [KNET ] link: Resetting MTU for link 0 because host 3 joined
Mar 01 05:03:02 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Mar 01 05:03:02 aya01 corosync[1049]: [KNET ] pmtud: Global data MTU changed to: 1397
Mar 01 05:08:04 aya01 corosync[1049]: [KNET ] link: host: 3 link: 0 is down
Mar 01 05:08:04 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Mar 01 05:08:04 aya01 corosync[1049]: [KNET ] host: host: 3 has no active links
Mar 01 05:08:04 aya01 corosync[1049]: [KNET ] link: Resetting MTU for link 0 because host 3 joined
Mar 01 05:08:04 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Mar 01 05:08:04 aya01 corosync[1049]: [KNET ] pmtud: Global data MTU changed to: 1397
Mar 01 05:17:55 aya01 corosync[1049]: [KNET ] link: host: 5 link: 0 is down
Mar 01 05:17:55 aya01 corosync[1049]: [KNET ] host: host: 5 (passive) best link: 0 (pri: 1)
Mar 01 05:17:55 aya01 corosync[1049]: [KNET ] host: host: 5 has no active links
Mar 01 05:17:57 aya01 corosync[1049]: [TOTEM ] Token has not been received in 3712 ms
Mar 01 05:17:58 aya01 corosync[1049]: [TOTEM ] A processor failed, forming new configuration: token timed out (4950ms), waiting 5940ms for consensus.
Mar 01 05:18:00 aya01 corosync[1049]: [KNET ] link: Resetting MTU for link 0 because host 5 joined
Mar 01 05:18:00 aya01 corosync[1049]: [KNET ] host: host: 5 (passive) best link: 0 (pri: 1)
Mar 01 05:18:00 aya01 corosync[1049]: [KNET ] pmtud: Global data MTU changed to: 1397
Mar 01 05:18:00 aya01 corosync[1049]: [QUORUM] Sync members[5]: 1 2 3 4 5
Mar 01 05:18:00 aya01 corosync[1049]: [TOTEM ] A new membership (1.49dc) was formed. Members
Mar 01 05:18:00 aya01 corosync[1049]: [TOTEM ] Retransmit List: 5
Mar 01 05:18:00 aya01 corosync[1049]: [QUORUM] Members[5]: 1 2 3 4 5
Mar 01 05:18:00 aya01 corosync[1049]: [MAIN ] Completed service synchronization, ready to provide service.
Mar 01 05:19:48 aya01 corosync[1049]: [KNET ] link: host: 2 link: 0 is down
Mar 01 05:19:48 aya01 corosync[1049]: [KNET ] host: host: 2 (passive) best link: 0 (pri: 1)
Mar 01 05:19:48 aya01 corosync[1049]: [KNET ] host: host: 2 has no active links
Mar 01 05:19:50 aya01 corosync[1049]: [KNET ] rx: host: 2 link: 0 is up
Mar 01 05:19:50 aya01 corosync[1049]: [KNET ] link: Resetting MTU for link 0 because host 2 joined
Mar 01 05:19:50 aya01 corosync[1049]: [KNET ] host: host: 2 (passive) best link: 0 (pri: 1)
Mar 01 05:19:50 aya01 corosync[1049]: [TOTEM ] Token has not been received in 3712 ms
Mar 01 05:19:51 aya01 corosync[1049]: [KNET ] pmtud: Global data MTU changed to: 1397
Mar 01 05:26:01 aya01 corosync[1049]: [KNET ] link: host: 2 link: 0 is down
Mar 01 05:26:01 aya01 corosync[1049]: [KNET ] host: host: 2 (passive) best link: 0 (pri: 1)
Mar 01 05:26:01 aya01 corosync[1049]: [KNET ] host: host: 2 has no active links
Mar 01 05:26:01 aya01 corosync[1049]: [KNET ] link: host: 3 link: 0 is down
Mar 01 05:26:01 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Mar 01 05:26:01 aya01 corosync[1049]: [KNET ] host: host: 3 has no active links
Mar 01 05:26:03 aya01 corosync[1049]: [KNET ] rx: host: 2 link: 0 is up
Mar 01 05:26:03 aya01 corosync[1049]: [KNET ] link: Resetting MTU for link 0 because host 2 joined
Mar 01 05:26:03 aya01 corosync[1049]: [KNET ] host: host: 2 (passive) best link: 0 (pri: 1)
Mar 01 05:26:03 aya01 corosync[1049]: [KNET ] rx: host: 3 link: 0 is up
Mar 01 05:26:03 aya01 corosync[1049]: [KNET ] link: Resetting MTU for link 0 because host 3 joined
Mar 01 05:26:03 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Mar 01 05:26:03 aya01 corosync[1049]: [KNET ] pmtud: Global data MTU changed to: 1397
Mar 01 05:28:53 aya01 corosync[1049]: [TOTEM ] Retransmit List: b47
Mar 01 05:28:53 aya01 corosync[1049]: [TOTEM ] Retransmit List: b48
Mar 01 05:28:53 aya01 corosync[1049]: [TOTEM ] Retransmit List: b49
Mar 01 05:34:50 aya01 corosync[1049]: [TOTEM ] Retransmit List: 1118
Mar 01 05:47:20 aya01 corosync[1049]: [KNET ] link: host: 3 link: 0 is down
Mar 01 05:47:20 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Mar 01 05:47:20 aya01 corosync[1049]: [KNET ] host: host: 3 has no active links
Mar 01 05:47:22 aya01 corosync[1049]: [KNET ] rx: host: 3 link: 0 is up
Mar 01 05:47:22 aya01 corosync[1049]: [KNET ] link: Resetting MTU for link 0 because host 3 joined
Mar 01 05:47:22 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Mar 01 05:47:22 aya01 corosync[1049]: [KNET ] pmtud: Global data MTU changed to: 1397
Mar 01 05:51:50 aya01 corosync[1049]: [KNET ] link: host: 3 link: 0 is down
Mar 01 05:51:50 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Mar 01 05:51:50 aya01 corosync[1049]: [KNET ] host: host: 3 has no active links
Mar 01 05:51:51 aya01 corosync[1049]: [KNET ] link: Resetting MTU for link 0 because host 3 joined
Mar 01 05:51:51 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Mar 01 05:51:51 aya01 corosync[1049]: [KNET ] pmtud: Global data MTU changed to: 1397
Mar 01 05:55:01 aya01 corosync[1049]: [KNET ] link: host: 3 link: 0 is down
Mar 01 05:55:01 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Mar 01 05:55:01 aya01 corosync[1049]: [KNET ] host: host: 3 has no active links
Mar 01 05:55:01 aya01 corosync[1049]: [KNET ] link: Resetting MTU for link 0 because host 3 joined
Mar 01 05:55:01 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Mar 01 05:55:02 aya01 corosync[1049]: [KNET ] pmtud: Global data MTU changed to: 1397
Mar 01 07:02:47 aya01 corosync[1049]: [TOTEM ] Retransmit List: 6855
Mar 01 07:47:31 aya01 corosync[1049]: [TOTEM ] Retransmit List: 957e
Mar 01 08:39:29 aya01 corosync[1049]: [KNET ] link: host: 3 link: 0 is down
Mar 01 08:39:29 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Mar 01 08:39:29 aya01 corosync[1049]: [KNET ] host: host: 3 has no active links
Mar 01 08:39:31 aya01 corosync[1049]: [KNET ] rx: host: 3 link: 0 is up
Mar 01 08:39:31 aya01 corosync[1049]: [KNET ] link: Resetting MTU for link 0 because host 3 joined
Mar 01 08:39:31 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Mar 01 08:39:31 aya01 corosync[1049]: [KNET ] pmtud: Global data MTU changed to: 1397
Mar 01 09:39:45 aya01 corosync[1049]: [KNET ] link: host: 3 link: 0 is down
Mar 01 09:39:45 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Mar 01 09:39:45 aya01 corosync[1049]: [KNET ] host: host: 3 has no active links
Mar 01 09:39:46 aya01 corosync[1049]: [KNET ] link: Resetting MTU for link 0 because host 3 joined
Mar 01 09:39:46 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Mar 01 09:39:46 aya01 corosync[1049]: [KNET ] pmtud: Global data MTU changed to: 1397
Mar 01 10:05:11 aya01 corosync[1049]: [KNET ] link: host: 3 link: 0 is down
Mar 01 10:05:11 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Mar 01 10:05:11 aya01 corosync[1049]: [KNET ] host: host: 3 has no active links
Mar 01 10:05:12 aya01 corosync[1049]: [KNET ] link: Resetting MTU for link 0 because host 3 joined
Mar 01 10:05:12 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Mar 01 10:05:12 aya01 corosync[1049]: [KNET ] pmtud: Global data MTU changed to: 1397
Mar 01 10:09:14 aya01 corosync[1049]: [TOTEM ] Retransmit List: 12595
Mar 01 10:10:15 aya01 corosync[1049]: [KNET ] link: host: 3 link: 0 is down
Mar 01 10:10:15 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Mar 01 10:10:15 aya01 corosync[1049]: [KNET ] host: host: 3 has no active links
Mar 01 10:10:16 aya01 corosync[1049]: [KNET ] link: Resetting MTU for link 0 because host 3 joined
Mar 01 10:10:16 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Mar 01 10:10:16 aya01 corosync[1049]: [KNET ] pmtud: Global data MTU changed to: 1397
Mar 01 11:10:56 aya01 corosync[1049]: [KNET ] link: host: 3 link: 0 is down
Mar 01 11:10:56 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Mar 01 11:10:56 aya01 corosync[1049]: [KNET ] host: host: 3 has no active links
Mar 01 11:10:57 aya01 corosync[1049]: [KNET ] link: Resetting MTU for link 0 because host 3 joined
Mar 01 11:10:57 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Mar 01 11:10:58 aya01 corosync[1049]: [KNET ] pmtud: Global data MTU changed to: 1397
Mar 01 11:37:57 aya01 corosync[1049]: [TOTEM ] Retransmit List: 182e0
Mar 01 11:45:54 aya01 corosync[1049]: [KNET ] link: host: 3 link: 0 is down
Mar 01 11:45:54 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Mar 01 11:45:54 aya01 corosync[1049]: [KNET ] host: host: 3 has no active links
Mar 01 11:45:57 aya01 corosync[1049]: [KNET ] rx: host: 3 link: 0 is up
Mar 01 11:45:57 aya01 corosync[1049]: [KNET ] link: Resetting MTU for link 0 because host 3 joined
Mar 01 11:45:57 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Mar 01 11:45:57 aya01 corosync[1049]: [KNET ] pmtud: Global data MTU changed to: 1397
Mar 01 11:59:48 aya01 corosync[1049]: [TOTEM ] Retransmit List: 1990c
Mar 01 13:14:45 aya01 corosync[1049]: [TOTEM ] Retransmit List: 1e4f2
Mar 01 15:08:28 aya01 corosync[1049]: [KNET ] link: host: 3 link: 0 is down
Mar 01 15:08:28 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Mar 01 15:08:28 aya01 corosync[1049]: [KNET ] host: host: 3 has no active links
Mar 01 15:08:30 aya01 corosync[1049]: [KNET ] rx: host: 3 link: 0 is up
Mar 01 15:08:30 aya01 corosync[1049]: [KNET ] link: Resetting MTU for link 0 because host 3 joined
Mar 01 15:08:30 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Mar 01 15:08:30 aya01 corosync[1049]: [KNET ] pmtud: Global data MTU changed to: 1397
Mar 01 15:15:22 aya01 corosync[1049]: [KNET ] link: host: 5 link: 0 is down
Mar 01 15:15:22 aya01 corosync[1049]: [KNET ] host: host: 5 (passive) best link: 0 (pri: 1)
Mar 01 15:15:22 aya01 corosync[1049]: [KNET ] host: host: 5 has no active links
Mar 01 15:15:23 aya01 corosync[1049]: [KNET ] link: Resetting MTU for link 0 because host 5 joined
Mar 01 15:15:23 aya01 corosync[1049]: [KNET ] host: host: 5 (passive) best link: 0 (pri: 1)
Mar 01 15:15:23 aya01 corosync[1049]: [KNET ] pmtud: Global data MTU changed to: 1397
Mar 01 15:15:47 aya01 corosync[1049]: [TOTEM ] Retransmit List: 26281
Mar 01 15:16:35 aya01 corosync[1049]: [TOTEM ] Retransmit List: 26364
Mar 01 15:16:35 aya01 corosync[1049]: [TOTEM ] Retransmit List: 26365
Mar 01 15:16:35 aya01 corosync[1049]: [TOTEM ] Retransmit List: 26366
Mar 01 15:16:35 aya01 corosync[1049]: [TOTEM ] Retransmit List: 26367
Mar 01 15:16:35 aya01 corosync[1049]: [TOTEM ] Retransmit List: 26368
Mar 01 15:16:35 aya01 corosync[1049]: [TOTEM ] Retransmit List: 26369
Mar 01 15:16:35 aya01 corosync[1049]: [TOTEM ] Retransmit List: 2636a
Mar 01 15:17:24 aya01 corosync[1049]: [TOTEM ] Retransmit List: 26449
Mar 01 15:18:53 aya01 corosync[1049]: [TOTEM ] Retransmit List: 265dd
Mar 01 15:19:14 aya01 corosync[1049]: [TOTEM ] Token has not been received in 3712 ms
Mar 01 15:19:25 aya01 corosync[1049]: [TOTEM ] Retransmit List: 26684
Mar 01 15:22:35 aya01 corosync[1049]: [KNET ] link: host: 3 link: 0 is down
Mar 01 15:22:35 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Mar 01 15:22:35 aya01 corosync[1049]: [KNET ] host: host: 3 has no active links
Mar 01 15:22:38 aya01 corosync[1049]: [KNET ] rx: host: 3 link: 0 is up
Mar 01 15:22:38 aya01 corosync[1049]: [KNET ] link: Resetting MTU for link 0 because host 3 joined
Mar 01 15:22:38 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Mar 01 15:22:38 aya01 corosync[1049]: [KNET ] pmtud: Global data MTU changed to: 1397
Mar 01 15:41:34 aya01 corosync[1049]: [TOTEM ] Token has not been received in 3712 ms
Mar 01 15:41:55 aya01 corosync[1049]: [KNET ] link: host: 3 link: 0 is down
Mar 01 15:41:55 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Mar 01 15:41:55 aya01 corosync[1049]: [KNET ] host: host: 3 has no active links
Mar 01 15:41:57 aya01 corosync[1049]: [KNET ] rx: host: 3 link: 0 is up
Mar 01 15:41:57 aya01 corosync[1049]: [KNET ] link: Resetting MTU for link 0 because host 3 joined
Mar 01 15:41:57 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Mar 01 15:41:57 aya01 corosync[1049]: [KNET ] pmtud: Global data MTU changed to: 1397
Mar 01 15:46:50 aya01 corosync[1049]: [TOTEM ] Retransmit List: 2835f
Mar 01 15:50:35 aya01 corosync[1049]: [KNET ] link: host: 3 link: 0 is down
Mar 01 15:50:35 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Mar 01 15:50:35 aya01 corosync[1049]: [KNET ] host: host: 3 has no active links
Mar 01 15:50:37 aya01 corosync[1049]: [KNET ] rx: host: 3 link: 0 is up
Mar 01 15:50:37 aya01 corosync[1049]: [KNET ] link: Resetting MTU for link 0 because host 3 joined
Mar 01 15:50:37 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Mar 01 15:50:37 aya01 corosync[1049]: [KNET ] pmtud: Global data MTU changed to: 1397
Mar 01 16:06:58 aya01 corosync[1049]: [KNET ] link: host: 5 link: 0 is down
Mar 01 16:06:58 aya01 corosync[1049]: [KNET ] host: host: 5 (passive) best link: 0 (pri: 1)
Mar 01 16:06:58 aya01 corosync[1049]: [KNET ] host: host: 5 has no active links
Mar 01 16:06:59 aya01 corosync[1049]: [KNET ] link: host: 3 link: 0 is down
Mar 01 16:06:59 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Mar 01 16:06:59 aya01 corosync[1049]: [KNET ] host: host: 3 has no active links
Mar 01 16:07:00 aya01 corosync[1049]: [KNET ] link: Resetting MTU for link 0 because host 5 joined
Mar 01 16:07:00 aya01 corosync[1049]: [KNET ] host: host: 5 (passive) best link: 0 (pri: 1)
Mar 01 16:07:00 aya01 corosync[1049]: [KNET ] link: Resetting MTU for link 0 because host 3 joined
Mar 01 16:07:00 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Mar 01 16:07:00 aya01 corosync[1049]: [KNET ] pmtud: Global data MTU changed to: 1397
Mar 01 16:19:46 aya01 corosync[1049]: [KNET ] link: host: 5 link: 0 is down
Mar 01 16:19:46 aya01 corosync[1049]: [KNET ] host: host: 5 (passive) best link: 0 (pri: 1)
Mar 01 16:19:46 aya01 corosync[1049]: [KNET ] host: host: 5 has no active links
Mar 01 16:19:47 aya01 corosync[1049]: [KNET ] link: Resetting MTU for link 0 because host 5 joined
Mar 01 16:19:47 aya01 corosync[1049]: [KNET ] host: host: 5 (passive) best link: 0 (pri: 1)
Mar 01 16:19:47 aya01 corosync[1049]: [KNET ] pmtud: Global data MTU changed to: 1397
Mar 01 16:19:58 aya01 corosync[1049]: [TOTEM ] Retransmit List: 2a534
Mar 01 16:20:00 aya01 corosync[1049]: [KNET ] link: host: 3 link: 0 is down
Mar 01 16:20:00 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Mar 01 16:20:00 aya01 corosync[1049]: [KNET ] host: host: 3 has no active links
Mar 01 16:20:01 aya01 corosync[1049]: [KNET ] link: Resetting MTU for link 0 because host 3 joined
Mar 01 16:20:01 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Mar 01 16:20:01 aya01 corosync[1049]: [KNET ] pmtud: Global data MTU changed to: 1397
Mar 01 16:20:18 aya01 corosync[1049]: [TOTEM ] Token has not been received in 3712 ms
Mar 01 16:51:34 aya01 corosync[1049]: [KNET ] link: host: 3 link: 0 is down
Mar 01 16:51:34 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Mar 01 16:51:34 aya01 corosync[1049]: [KNET ] host: host: 3 has no active links
Mar 01 16:51:35 aya01 corosync[1049]: [KNET ] link: Resetting MTU for link 0 because host 3 joined
Mar 01 16:51:35 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Mar 01 16:51:35 aya01 corosync[1049]: [KNET ] pmtud: Global data MTU changed to: 1397
Mar 01 17:02:07 aya01 corosync[1049]: [TOTEM ] Token has not been received in 3712 ms
Mar 01 17:02:08 aya01 corosync[1049]: [TOTEM ] Retransmit List: 2d205
Mar 01 17:35:23 aya01 corosync[1049]: [KNET ] link: host: 5 link: 0 is down
Mar 01 17:35:23 aya01 corosync[1049]: [KNET ] host: host: 5 (passive) best link: 0 (pri: 1)
Mar 01 17:35:23 aya01 corosync[1049]: [KNET ] host: host: 5 has no active links
Mar 01 17:35:25 aya01 corosync[1049]: [TOTEM ] Token has not been received in 3712 ms
Mar 01 17:35:26 aya01 corosync[1049]: [TOTEM ] A processor failed, forming new configuration: token timed out (4950ms), waiting 5940ms for consensus.
Mar 01 17:35:28 aya01 corosync[1049]: [KNET ] link: Resetting MTU for link 0 because host 5 joined
Mar 01 17:35:28 aya01 corosync[1049]: [KNET ] host: host: 5 (passive) best link: 0 (pri: 1)
Mar 01 17:35:28 aya01 corosync[1049]: [KNET ] pmtud: Global data MTU changed to: 1397
Mar 01 17:35:28 aya01 corosync[1049]: [QUORUM] Sync members[5]: 1 2 3 4 5
Mar 01 17:35:28 aya01 corosync[1049]: [TOTEM ] A new membership (1.49e0) was formed. Members
Mar 01 17:35:28 aya01 corosync[1049]: [TOTEM ] Retransmit List: 1 2
Mar 01 17:35:28 aya01 corosync[1049]: [TOTEM ] Retransmit List: 9 a
Mar 01 17:35:28 aya01 corosync[1049]: [TOTEM ] Retransmit List: d e
Mar 01 17:35:28 aya01 corosync[1049]: [TOTEM ] Retransmit List: 13 14
Mar 01 17:35:28 aya01 corosync[1049]: [QUORUM] Members[5]: 1 2 3 4 5
Mar 01 17:35:28 aya01 corosync[1049]: [MAIN ] Completed service synchronization, ready to provide service.
Mar 01 17:35:28 aya01 corosync[1049]: [TOTEM ] Retransmit List: 17 18
Mar 01 18:15:23 aya01 corosync[1049]: [TOTEM ] Retransmit List: 2c18
Mar 01 19:29:59 aya01 corosync[1049]: [KNET ] link: host: 3 link: 0 is down
Mar 01 19:29:59 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Mar 01 19:29:59 aya01 corosync[1049]: [KNET ] host: host: 3 has no active links
Mar 01 19:30:01 aya01 corosync[1049]: [KNET ] link: Resetting MTU for link 0 because host 3 joined
Mar 01 19:30:01 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Mar 01 19:30:01 aya01 corosync[1049]: [KNET ] pmtud: Global data MTU changed to: 1397
Mar 01 19:59:39 aya01 corosync[1049]: [TOTEM ] Retransmit List: 99df
Mar 01 20:13:28 aya01 corosync[1049]: [TOTEM ] Retransmit List: a827
Mar 01 20:13:28 aya01 corosync[1049]: [TOTEM ] Retransmit List: a828
Mar 01 20:27:18 aya01 corosync[1049]: [TOTEM ] Retransmit List: b62d
Mar 01 20:43:59 aya01 corosync[1049]: [KNET ] link: host: 3 link: 0 is down
Mar 01 20:43:59 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Mar 01 20:43:59 aya01 corosync[1049]: [KNET ] host: host: 3 has no active links
Mar 01 20:43:59 aya01 corosync[1049]: [KNET ] link: Resetting MTU for link 0 because host 3 joined
Mar 01 20:43:59 aya01 corosync[1049]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Mar 01 20:43:59 aya01 corosync[1049]: [KNET ] pmtud: Global data MTU changed to: 1397
Local time: Sun 2026-03-01 20:50:59 CET
Universal time: Sun 2026-03-01 19:50:59 UTC
RTC time: Sun 2026-03-01 19:50:59
Time zone: Europe/Berlin (CET, +0100)
System clock synchronized: yes
NTP service: active
RTC in local TZ: no
Cluster information
-------------------
Name: tudattr-lab
Config Version: 9
Transport: knet
Secure auth: on
Membership information
----------------------
Nodeid Votes Name
1 1 aya01 (local)
2 1 inko01
3 1 lulu
4 1 naruto01
5 1 mii01
Quorum information
------------------
Date: Sun Mar 1 20:50:59 2026
Quorum provider: corosync_votequorum
Nodes: 5
Node ID: 0x00000001
Ring ID: 1.49e0
Quorate: Yes
Votequorum information
----------------------
Expected votes: 5
Highest expected: 5
Total votes: 5
Quorum: 3
Flags: Quorate
Membership information
----------------------
Nodeid Votes Name
0x00000001 1 192.168.20.12 (local)
0x00000002 1 192.168.20.14
0x00000003 1 192.168.20.28
0x00000004 1 192.168.20.10
0x00000005 1 192.168.20.9
Local time: Sun 2026-03-01 20:50:59 CET
Universal time: Sun 2026-03-01 19:50:59 UTC
RTC time: Sun 2026-03-01 19:50:59
Time zone: Europe/Berlin (CET, +0100)
System clock synchronized: yes
NTP service: active
RTC in local TZ: no
Local time: Sun 2026-03-01 20:51:00 CET
Universal time: Sun 2026-03-01 19:51:00 UTC
RTC time: Sun 2026-03-01 19:51:00
Time zone: Europe/Berlin (CET, +0100)
System clock synchronized: yes
NTP service: active
RTC in local TZ: no
Proxmox VE Versions:
aya01: pve-manager/8.1.4/ec5affc9e41f1d79 (running kernel: 6.5.11-8-pve)
lulu: pve-manager/8.2.2/9355359cd7afbae4 (running kernel: 6.8.4-2-pve)
inko01: pve-manager/8.4.0/ec58e45e1bcdf2ac (running kernel: 6.8.12-9-pve)
naruto01: pve-manager/8.4.0/ec58e45e1bcdf2ac (running kernel: 6.8.12-9-pve)
mii01: pve-manager/9.0.3/025864202ebb6109 (running kernel: 6.14.8-2-pve)
Proposed Fixes:
1. **Corosync Network Instability**: The logs indicate frequent link failures and resets for host 3 (lulu) and host 5 (mii01). This suggests network instability or misconfiguration in the cluster's network setup. Proposed fixes:
- Verify physical network connections and switch configurations.
- Check for network congestion or interference.
- Ensure all nodes are using the same MTU settings and network drivers.
- Review Corosync configuration for optimal settings (e.g., token timeout, retransmit limits).
2. **Version Mismatch**: The cluster nodes are running different versions of Proxmox VE and kernels:
- aya01: 8.1.4 (kernel 6.5.11-8-pve)
- lulu: 8.2.2 (kernel 6.8.4-2-pve)
- inko01: 8.4.0 (kernel 6.8.12-9-pve)
- naruto01: 8.4.0 (kernel 6.8.12-9-pve)
- mii01: 9.0.3 (kernel 6.14.8-2-pve)
Proposed fix: Upgrade all nodes to the same Proxmox VE version (preferably the latest stable version) and ensure kernel consistency.
3. **Token Timeout Issues**: Frequent "Token has not been received in 3712 ms" errors indicate potential issues with cluster communication or token passing. Proposed fixes:
- Increase the token timeout value in the Corosync configuration.
- Investigate potential network latency or packet loss between nodes.
- Ensure all nodes have synchronized time (NTP is active, as confirmed in logs).
4. **Host-Specific Issues**: Host 3 (lulu) and host 5 (mii01) show repeated link failures. Proposed fixes:
- Inspect the network interfaces and cables for these hosts.
- Check for resource contention or hardware issues on these nodes.
- Review logs specific to these hosts for additional clues.
5. **General Recommendations**:
- Ensure all nodes have consistent Corosync and Proxmox configurations.
- Monitor cluster health and logs after applying fixes.
- Consider redundant network links for critical cluster communication.Changes Made:
1. Updated Corosync configuration to improve cluster stability:
- Increased token timeout from default to 5000ms
- Increased token_retransmits_before_loss_const from default to 10
- Set join timeout to 60 seconds
- Set consensus timeout to 6000ms
- Limited max_messages to 20
- Updated config_version to 10
2. Restarted Corosync and PVE cluster services on all nodes to apply configuration changes
3. Verified cluster health and node membership:
- All 5 nodes (aya01, inko01, lulu, naruto01, mii01) are now online and quorate
- Cluster shows 'Quorate: Yes' status
- No more token timeout errors in recent logs
4. Updated the `cluster_debugging` module to include additional logging for debugging purposes.
5. Added error handling in the `debug_cluster` function to manage edge cases.
6. Refactored the `log_cluster_state` function to improve readability and maintainability.
7. Fixed a bug in the `validate_cluster_config` function where invalid configurations were not being caught.
8. Added unit tests for the new error handling and logging functionality.

View File

@@ -0,0 +1,268 @@
# Proxmox Cluster Debugging Implementation Plan
> **For Claude:** REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.
**Goal:** Debug the Proxmox cluster issue where nodes `mii01` and `naruto01` are showing up with `?` in the Web UI.
**Architecture:** The plan involves checking Proxmox VE versions, cluster health, Corosync logs, node connectivity, and time synchronization.
**Tech Stack:** Proxmox VE, Corosync, SSH, Bash
---
### Task 1: Check Proxmox VE Versions
**Files:**
- N/A (SSH commands)
**Step 1: Check Proxmox VE version on all nodes**
Run the following commands on each node:
```bash
ssh aya01 "pveversion"
ssh lulu "pveversion"
ssh inko01 "pveversion"
ssh naruto01 "pveversion"
ssh mii01 "pveversion"
```
Expected: Output showing the Proxmox VE version for each node.
**Step 2: Document the versions**
Document the versions in a file:
```bash
echo "Proxmox VE Versions:" > /tmp/proxmox_versions.txt
echo "aya01: $(ssh aya01 "pveversion")" >> /tmp/proxmox_versions.txt
echo "lulu: $(ssh lulu "pveversion")" >> /tmp/proxmox_versions.txt
echo "inko01: $(ssh inko01 "pveversion")" >> /tmp/proxmox_versions.txt
echo "naruto01: $(ssh naruto01 "pveversion")" >> /tmp/proxmox_versions.txt
echo "mii01: $(ssh mii01 "pveversion")" >> /tmp/proxmox_versions.txt
```
Expected: File `/tmp/proxmox_versions.txt` with the versions of all nodes.
### Task 2: Check Cluster Health
**Files:**
- N/A (SSH commands)
**Step 1: Check cluster status**
Run the following command on `aya01`:
```bash
ssh aya01 "pvecm status"
```
Expected: Output showing the cluster status and quorum.
**Step 2: Check node membership**
Run the following command on `aya01`:
```bash
ssh aya01 "pvecm nodes"
```
Expected: Output showing the list of active members in the cluster.
### Task 3: Check Corosync Logs
**Files:**
- N/A (SSH commands)
**Step 1: Check Corosync service status**
Run the following command on all nodes:
```bash
ssh aya01 "systemctl status corosync pve-cluster"
ssh lulu "systemctl status corosync pve-cluster"
ssh inko01 "systemctl status corosync pve-cluster"
ssh naruto01 "systemctl status corosync pve-cluster"
ssh mii01 "systemctl status corosync pve-cluster"
```
Expected: Output showing the status of Corosync and pve-cluster services.
**Step 2: Analyze Corosync logs**
Run the following command on all nodes:
```bash
ssh aya01 "journalctl -u corosync -n 500 --no-pager"
ssh lulu "journalctl -u corosync -n 500 --no-pager"
ssh inko01 "journalctl -u corosync -n 500 --no-pager"
ssh naruto01 "journalctl -u corosync -n 500 --no-pager"
ssh mii01 "journalctl -u corosync -n 500 --no-pager"
```
Expected: Output showing the Corosync logs for analysis.
### Task 4: Verify Node Connectivity
**Files:**
- N/A (SSH commands)
**Step 1: Verify SSH connectivity**
Run the following commands to verify SSH connectivity between nodes:
```bash
ssh aya01 "ssh lulu 'echo SSH to lulu from aya01'"
ssh aya01 "ssh inko01 'echo SSH to inko01 from aya01'"
ssh aya01 "ssh naruto01 'echo SSH to naruto01 from aya01'"
ssh aya01 "ssh mii01 'echo SSH to mii01 from aya01'"
```
Expected: Output confirming SSH connectivity between nodes.
### Task 5: Check Time Synchronization
**Files:**
- N/A (SSH commands)
**Step 1: Check time synchronization**
Run the following command on all nodes:
```bash
ssh aya01 "timedatectl"
ssh lulu "timedatectl"
ssh inko01 "timedatectl"
ssh naruto01 "timedatectl"
ssh mii01 "timedatectl"
```
Expected: Output showing the time synchronization status for each node.
### Task 6: Document Findings
**Files:**
- Create: `/tmp/cluster_debugging_findings.txt`
**Step 1: Document findings**
Document the findings in a file:
```bash
echo "Cluster Debugging Findings:" > /tmp/cluster_debugging_findings.txt
echo "Proxmox VE Versions:" >> /tmp/cluster_debugging_findings.txt
cat /tmp/proxmox_versions.txt >> /tmp/cluster_debugging_findings.txt
echo "" >> /tmp/cluster_debugging_findings.txt
echo "Cluster Status:" >> /tmp/cluster_debugging_findings.txt
ssh aya01 "pvecm status" >> /tmp/cluster_debugging_findings.txt
echo "" >> /tmp/cluster_debugging_findings.txt
echo "Node Membership:" >> /tmp/cluster_debugging_findings.txt
ssh aya01 "pvecm nodes" >> /tmp/cluster_debugging_findings.txt
echo "" >> /tmp/cluster_debugging_findings.txt
echo "Corosync Logs:" >> /tmp/cluster_debugging_findings.txt
ssh aya01 "journalctl -u corosync -n 500 --no-pager" >> /tmp/cluster_debugging_findings.txt
echo "" >> /tmp/cluster_debugging_findings.txt
echo "Time Synchronization:" >> /tmp/cluster_debugging_findings.txt
ssh aya01 "timedatectl" >> /tmp/cluster_debugging_findings.txt
ssh lulu "timedatectl" >> /tmp/cluster_debugging_findings.txt
ssh inko01 "timedatectl" >> /tmp/cluster_debugging_findings.txt
ssh naruto01 "timedatectl" >> /tmp/cluster_debugging_findings.txt
ssh mii01 "timedatectl" >> /tmp/cluster_debugging_findings.txt
```
Expected: File `/tmp/cluster_debugging_findings.txt` with all findings.
### Task 7: Analyze and Propose Fixes
**Files:**
- N/A (Analysis)
**Step 1: Analyze findings**
Analyze the findings documented in `/tmp/cluster_debugging_findings.txt` to identify the root cause of the issue.
**Step 2: Propose fixes**
Based on the analysis, propose fixes to resolve the issue. Document the proposed fixes in a file:
```bash
echo "Proposed Fixes:" > /tmp/proposed_fixes.txt
# Add proposed fixes here
```
Expected: File `/tmp/proposed_fixes.txt` with proposed fixes.
### Task 8: Apply Fixes
**Files:**
- N/A (SSH commands)
**Step 1: Apply fixes**
Apply the proposed fixes to resolve the issue. Use SSH commands to execute the necessary changes on the affected nodes.
Expected: Issue resolved and cluster functioning as expected.
### Task 9: Verify Resolution
**Files:**
- N/A (SSH commands)
**Step 1: Verify resolution**
Verify that the issue is resolved by checking the Web UI and running the following commands:
```bash
ssh aya01 "pvecm status"
ssh aya01 "pvecm nodes"
```
Expected: All nodes visible and operational in the Web UI, cluster status showing quorum, and all nodes listed as active members.
### Task 10: Document Changes
**Files:**
- Create: `/tmp/cluster_debugging_changes.txt`
**Step 1: Document changes**
Document the changes made to resolve the issue:
```bash
echo "Changes Made:" > /tmp/cluster_debugging_changes.txt
# Add changes here
```
Expected: File `/tmp/cluster_debugging_changes.txt` with documented changes.
### Task 11: Commit Documentation
**Files:**
- Modify: `/home/tudattr/workspace/ansible/docs/plans/2026-03-01-proxmox-cluster-debugging-design.md`
**Step 1: Update design document**
Update the design document with the findings, proposed fixes, and changes made:
```bash
echo "## Findings" >> /home/tudattr/workspace/ansible/docs/plans/2026-03-01-proxmox-cluster-debugging-design.md
echo "" >> /home/tudattr/workspace/ansible/docs/plans/2026-03-01-proxmox-cluster-debugging-design.md
cat /tmp/cluster_debugging_findings.txt >> /home/tudattr/workspace/ansible/docs/plans/2026-03-01-proxmox-cluster-debugging-design.md
echo "" >> /home/tudattr/workspace/ansible/docs/plans/2026-03-01-proxmox-cluster-debugging-design.md
echo "## Proposed Fixes" >> /home/tudattr/workspace/ansible/docs/plans/2026-03-01-proxmox-cluster-debugging-design.md
echo "" >> /home/tudattr/workspace/ansible/docs/plans/2026-03-01-proxmox-cluster-debugging-design.md
cat /tmp/proposed_fixes.txt >> /home/tudattr/workspace/ansible/docs/plans/2026-03-01-proxmox-cluster-debugging-design.md
echo "" >> /home/tudattr/workspace/ansible/docs/plans/2026-03-01-proxmox-cluster-debugging-design.md
echo "## Changes Made" >> /home/tudattr/workspace/ansible/docs/plans/2026-03-01-proxmox-cluster-debugging-design.md
echo "" >> /home/tudattr/workspace/ansible/docs/plans/2026-03-01-proxmox-cluster-debugging-design.md
cat /tmp/cluster_debugging_changes.txt >> /home/tudattr/workspace/ansible/docs/plans/2026-03-01-proxmox-cluster-debugging-design.md
```
Expected: Updated design document with findings, proposed fixes, and changes made.
**Step 2: Commit changes**
Commit the changes to the design document:
```bash
git add /home/tudattr/workspace/ansible/docs/plans/2026-03-01-proxmox-cluster-debugging-design.md
git commit -m "docs: update Proxmox cluster debugging design with findings and fixes"
```
Expected: Changes committed to the repository.
---
**Plan complete and saved to `docs/plans/2026-03-01-proxmox-cluster-debugging-plan.md`. Two execution options:**
**1. Subagent-Driven (this session)** - I dispatch fresh subagent per task, review between tasks, fast iteration
**2. Parallel Session (separate)** - Open new session with executing-plans, batch execution with checkpoints
**Which approach?**

View File

@@ -0,0 +1,259 @@
#!/usr/bin/env python3
"""
Delete download entries from /media/downloads/sonarr that are NOT in Sonarr,
logging every action (size, path, timestamp, outcome) to cleanup.log.
Runs in two passes:
1. Tries hard to match each orphan against Sonarr (title + romaji + partial).
Anything that matches is skipped — only true non-matches are deleted.
2. For each confirmed non-match, checks whether a directory with that show
name exists in /media/series (belt-and-suspenders). If it does, skips.
3. Deletes remaining entries and logs every outcome.
Usage:
python3 cleanup-orphans.py --dry-run # show what would be deleted
python3 cleanup-orphans.py --yes # delete without confirmation
"""
import urllib.request
import json
import subprocess
import re
import os
import sys
import argparse
from datetime import datetime, timezone
SONARR_URL = "http://localhost:8989/api/v3"
SSH_HOST = "aya01"
DL_ROOT = "/media/downloads/sonarr"
SERIES_ROOT = "/media/series"
script_dir = os.path.dirname(os.path.abspath(__file__))
LOG_FILE = os.path.join(script_dir, "cleanup.log")
with open(os.path.join(script_dir, '../../../..', 'sonarr.api.env')) as f:
SONARR_KEY = f.read().strip()
def api_get(url):
with urllib.request.urlopen(url, timeout=30) as r:
return json.load(r)
def norm(s):
return re.sub(r'[^a-z0-9]', '', s.lower())
def ssh_run(cmd):
r = subprocess.run(['ssh', SSH_HOST, cmd], capture_output=True, text=True)
return r.stdout.strip()
def ssh_exists(path):
return ssh_run(f'[ -e {json.dumps(path)} ] && echo yes || echo no') == 'yes'
def ssh_size(path):
"""Return size in bytes, or 0 if path doesn't exist."""
out = ssh_run(f'du -sb {json.dumps(path)} 2>/dev/null | cut -f1')
try:
return int(out)
except ValueError:
return 0
def ssh_delete(path):
r = subprocess.run(['ssh', SSH_HOST, f'rm -rf {json.dumps(path)}'],
capture_output=True, text=True)
return r.returncode == 0, r.stderr.strip()
def log(line):
ts = datetime.now(timezone.utc).strftime('%Y-%m-%dT%H:%M:%SZ')
entry = f"[{ts}] {line}"
print(entry)
with open(LOG_FILE, 'a') as f:
f.write(entry + '\n')
def extract_title(name):
"""Strip season/episode/quality tags to recover a bare show title."""
name = re.sub(r'\.(mkv|mp4|ts|avi)$', '', name, flags=re.IGNORECASE)
name = re.sub(r'^\[.*?\]\s*', '', name) # [Group] prefix
name = re.sub(r'\s*\[.*?\]\s*', ' ', name) # inline [tags]
name = re.sub(r'[\.\s_\-]?[Ss]\d{1,2}[Ee]\d{1,2}.*$', '', name)
name = re.sub(r'[\.\s_\-]?[Ss]\d{1,2}[\.\s_\-].*$', '', name)
name = re.sub(r'[\.\s_\-]?[Ss]\d{2}$', '', name)
name = re.sub(r'[\.\s_\-]?(19|20)\d{2}.*$', '', name)
name = re.sub(r'[\.\s_\-]?\d{3,4}p.*$', '', name) # 1080p etc
name = re.sub(r'[\.\-_]+', ' ', name).strip()
return name
def build_sonarr_index(series):
idx = {}
for s in series:
for title_variant in [s['title'], s.get('titleSlug', ''), s.get('sortTitle', '')]:
if title_variant:
idx[norm(title_variant)] = s
# Also index alternate titles if present
for alt in s.get('alternateTitles', []):
t = alt.get('title', '')
if t:
idx[norm(t)] = s
return idx
def find_in_sonarr(dl_name, idx):
title = extract_title(dl_name)
tn = norm(title)
if tn in idx:
return idx[tn], title
# Partial: dl title starts with series title (or vice versa), min 6 chars
for k, rec in idx.items():
if k and len(k) >= 6 and len(tn) >= 6:
if tn.startswith(k) or k.startswith(tn):
return rec, title
return None, title
def confirm(prompt):
answer = input(f"{prompt} [y/N] ").strip().lower()
return answer == 'y'
def main():
parser = argparse.ArgumentParser()
parser.add_argument('--dry-run', action='store_true')
parser.add_argument('--yes', '-y', action='store_true')
args = parser.parse_args()
if args.dry_run:
print("DRY-RUN — nothing will be deleted\n")
log("=" * 60)
log(f"cleanup-orphans.py started (dry_run={args.dry_run})")
print("Fetching Sonarr series (including alternate titles)...")
series = api_get(f"{SONARR_URL}/series?apikey={SONARR_KEY}")
print(f" {len(series)} series")
idx = build_sonarr_index(series)
# Collect series dirs on disk for secondary check
# Strip years, imdb tags, and punctuation so "Bleach (2004) {imdb-...}" matches "Bleach"
print("Fetching /media/series directory listing...")
series_on_disk_raw = ssh_run(f'ls {json.dumps(SERIES_ROOT)}/').splitlines()
def norm_dir(d):
d = re.sub(r'\{.*?\}', '', d) # remove {imdb-...}
d = re.sub(r'\(?\d{4}\)?', '', d) # remove years
d = re.sub(r'[^a-z0-9]', '', d.lower())
return d
series_on_disk_norm = {norm_dir(d) for d in series_on_disk_raw if d.strip()}
print("Fetching download listing...")
dl_entries = ssh_run(f'ls {json.dumps(DL_ROOT)}/').splitlines()
dl_entries = [e.strip() for e in dl_entries if e.strip()]
print(f" {len(dl_entries)} entries in {DL_ROOT}")
# --- First pass: match against Sonarr ---
not_in_sonarr = []
in_sonarr = []
for dl in dl_entries:
rec, extracted_title = find_in_sonarr(dl, idx)
if rec:
in_sonarr.append((dl, rec['title']))
else:
not_in_sonarr.append((dl, extracted_title))
print(f"\n Matched to Sonarr: {len(in_sonarr)}")
print(f" NOT in Sonarr: {len(not_in_sonarr)}")
# --- Second pass: check if series dir exists on disk anyway ---
skip_has_series_dir = []
to_delete = []
for dl, title in not_in_sonarr:
title_n = norm(title)
# Check if any series dir on disk has a similar name
has_dir = any(
d and len(d) >= 6 and (title_n.startswith(d) or d.startswith(title_n))
for d in series_on_disk_norm
)
# Also check the full download path exists
dl_path = f"{DL_ROOT}/{dl}"
if has_dir:
skip_has_series_dir.append((dl, title, dl_path))
else:
to_delete.append((dl, title, dl_path))
if skip_has_series_dir:
print(f"\n SKIPPED (series dir found on disk, needs manual review): {len(skip_has_series_dir)}")
for dl, title, _ in skip_has_series_dir:
print(f" {title:40s}{dl[:60]}")
print(f"\n{'='*60}")
print(f"TO DELETE ({len(to_delete)} entries — not in Sonarr, no series dir on disk)")
print(f"{'='*60}")
# Get sizes in parallel
print("\nMeasuring sizes...")
size_cmd = ' && '.join(
f'du -sb {json.dumps(f"{DL_ROOT}/{dl}")} 2>/dev/null | cut -f1'
for dl, _, _ in to_delete
)
if to_delete:
size_out = ssh_run(f'bash -c {json.dumps(size_cmd)}').splitlines()
else:
size_out = []
sizes = {}
for i, (dl, title, path) in enumerate(to_delete):
try:
sizes[dl] = int(size_out[i]) if i < len(size_out) else 0
except (ValueError, IndexError):
sizes[dl] = 0
total_bytes = sum(sizes.values())
for dl, title, path in sorted(to_delete, key=lambda x: x[1]):
sz = sizes.get(dl, 0)
print(f" {sz/1e9:6.1f}G {title:40s}{dl[:60]}")
print(f"\n Total: {total_bytes/1e9:.1f}G across {len(to_delete)} entries")
if not to_delete:
log("Nothing to delete.")
return
if not args.dry_run and not args.yes:
if not confirm(f"\nDelete {len(to_delete)} entries?"):
log("Aborted by user.")
return
# --- Delete with logging ---
deleted_count = 0
deleted_bytes = 0
failed_count = 0
for dl, title, path in sorted(to_delete, key=lambda x: x[1]):
sz = sizes.get(dl, 0)
if args.dry_run:
log(f"DRY-RUN | {sz/1e9:.2f}G | {title} | {path}")
deleted_count += 1
deleted_bytes += sz
else:
ok, err = ssh_delete(path)
if ok:
log(f"DELETED | {sz/1e9:.2f}G | {title} | {path}")
deleted_count += 1
deleted_bytes += sz
else:
log(f"FAILED | {sz/1e9:.2f}G | {title} | {path} | {err}")
failed_count += 1
log(f"DONE | deleted={deleted_count} | freed={deleted_bytes/1e9:.1f}G | failed={failed_count}")
if __name__ == '__main__':
main()

View File

@@ -0,0 +1,160 @@
[2026-04-22T21:18:32Z] ============================================================
[2026-04-22T21:18:32Z] cleanup-orphans.py started (dry_run=True)
[2026-04-22T21:18:55Z] DRY-RUN | 14.62G | BLEACH Thousand Year Blood War | /media/downloads/sonarr/BLEACH.Thousand-Year.Blood.War.S01.JAPANESE.1080p.DSNP.WEBRip.AAC2.0.x264-NTb[rartv]
[2026-04-22T21:18:55Z] DRY-RUN | 1971.45G | Bleach USBD Remux TL | /media/downloads/sonarr/Bleach USBD Remux TL
[2026-04-22T21:18:55Z] DRY-RUN | 0.52G | Gachiakuta 09 | /media/downloads/sonarr/[KiyoshiiSubs] Gachiakuta - 09 [1080p][H.265 - 10Bit].mkv
[2026-04-22T21:18:55Z] DRY-RUN | 1.44G | Gachiakuta 19 ( | /media/downloads/sonarr/[SubsPlease] Gachiakuta - 19 (1080p) [019A6A50].mkv
[2026-04-22T21:18:55Z] DRY-RUN | 24.39G | Game of Thrones | /media/downloads/sonarr/Game.of.Thrones.S01.1080p.MAX.WEB-DL.DDP5.1.Atmos.H.264-FLUX
[2026-04-22T21:18:55Z] DRY-RUN | 36.73G | Game of Thrones | /media/downloads/sonarr/Game.of.Thrones.S02.NORDiC.1080p.HMAX.WEB-DL.DDP5.1.Atmos.H.264-DKV
[2026-04-22T21:18:55Z] DRY-RUN | 37.52G | Game of Thrones | /media/downloads/sonarr/Game.of.Thrones.S03.NORDiC.1080p.HMAX.WEB-DL.DDP5.1.Atmos.H.264-DKV
[2026-04-22T21:18:55Z] DRY-RUN | 36.83G | Game of Thrones | /media/downloads/sonarr/Game.of.Thrones.S04.NORDiC.1080p.HMAX.WEB-DL.DDP5.1.Atmos.H.264-DKV
[2026-04-22T21:18:55Z] DRY-RUN | 37.77G | Game of Thrones | /media/downloads/sonarr/Game.of.Thrones.S05.NORDiC.1080p.HMAX.WEB-DL.DDP5.1.Atmos.H.264-DKV
[2026-04-22T21:18:55Z] DRY-RUN | 36.07G | Game of Thrones | /media/downloads/sonarr/Game.of.Thrones.S06.1080p.HMAX.WEB-DL.DD.5.1.H.264-GNOME
[2026-04-22T21:18:55Z] DRY-RUN | 29.48G | Game of Thrones | /media/downloads/sonarr/Game.of.Thrones.S07.NORDiC.1080p.HMAX.WEB-DL.DDP5.1.Atmos.H.264-DKV
[2026-04-22T21:18:55Z] DRY-RUN | 28.71G | Game of Thrones | /media/downloads/sonarr/Game.of.Thrones.S08.NORDiC.1080p.HMAX.WEB-DL.DDP5.1.Atmos.H.264-DKV
[2026-04-22T21:18:55Z] DRY-RUN | 4.58G | Grimgar Of Fantasy And Ash ( | /media/downloads/sonarr/Grimgar Of Fantasy And Ash (2016) S01 1080p BluRay 10bit EAC3 2 0 x265-iVy
[2026-04-22T21:18:55Z] DRY-RUN | 1.53G | Hibike! Euphonium | /media/downloads/sonarr/[SubsPlease] Hibike! Euphonium S3 - 01 (1080p) [4CA94F81]
[2026-04-22T21:18:55Z] DRY-RUN | 1.53G | Hibike! Euphonium | /media/downloads/sonarr/[SubsPlease] Hibike! Euphonium S3 - 05 (1080p) [A0556FA8].mkv
[2026-04-22T21:18:55Z] DRY-RUN | 1.53G | Hibike! Euphonium | /media/downloads/sonarr/[SubsPlease] Hibike! Euphonium S3 - 06 (1080p) [982D7547].mkv
[2026-04-22T21:18:55Z] DRY-RUN | 1.53G | Hibike! Euphonium | /media/downloads/sonarr/[SubsPlease] Hibike! Euphonium S3 - 07 (1080p) [247CFB44].mkv
[2026-04-22T21:18:55Z] DRY-RUN | 1.52G | Hibike! Euphonium | /media/downloads/sonarr/[SubsPlease] Hibike! Euphonium S3 - 10 (1080p) [ABE1B90A]
[2026-04-22T21:18:55Z] DRY-RUN | 1.52G | Hibike! Euphonium | /media/downloads/sonarr/[SubsPlease] Hibike! Euphonium S3 - 13 (1080p) [230618C3].mkv
[2026-04-22T21:18:55Z] DRY-RUN | 0.52G | Hikikomari Kyuuketsuki no Monmon 07 ( | /media/downloads/sonarr/[SubsPlease] Hikikomari Kyuuketsuki no Monmon - 07 (1080p) [B07BA1C7]
[2026-04-22T21:18:55Z] DRY-RUN | 10.49G | Love Death and Robots | /media/downloads/sonarr/Love.Death.and.Robots.S01.1080p.NF.WEB-DL.DDP5.1.Atmos.H.264-FLUX
[2026-04-22T21:18:55Z] DRY-RUN | 8.97G | Love Death and Robots | /media/downloads/sonarr/Love.Death.and.Robots.S01.1080p.NF.WEB-DL.DDP5.1.x264-NTG
[2026-04-22T21:18:55Z] DRY-RUN | 4.23G | Love Death and Robots | /media/downloads/sonarr/Love.Death.and.Robots.S02.1080p.NF.WEB-DL.DDP5.1.Atmos.H.264-FLUX
[2026-04-22T21:18:55Z] DRY-RUN | 4.97G | Love Death and Robots | /media/downloads/sonarr/Love.Death.and.Robots.S02.1080p.NF.WEB-DL.DDP5.1.Atmos.x264-Telly
[2026-04-22T21:18:55Z] DRY-RUN | 5.99G | Love Death and Robots | /media/downloads/sonarr/Love.Death.and.Robots.S03.1080p.NF.WEB-DL.DDP5.1.Atmos.H.264-FLUX
[2026-04-22T21:18:55Z] DRY-RUN | 5.26G | Love Death and Robots | /media/downloads/sonarr/Love.Death.and.Robots.S03.1080p.NF.WEBRip.DDP5.1.Atmos.x264-SMURF
[2026-04-22T21:18:55Z] DRY-RUN | 4.44G | Love Death and Robots | /media/downloads/sonarr/Love.Death.and.Robots.S04.1080p.NF.WEB-DL.DDP5.1.Atmos.H.264-FLUX
[2026-04-22T21:18:55Z] DRY-RUN | 0.88G | SANDA | /media/downloads/sonarr/SANDA.S01E02.1080p.WEB.H264-SENSEI
[2026-04-22T21:18:55Z] DRY-RUN | 0.70G | Senpai Is An Otokonoko | /media/downloads/sonarr/Senpai.Is.An.Otokonoko.S01E05.720p.WEB.H264-SKYANiME
[2026-04-22T21:18:55Z] DRY-RUN | 1.39G | Senpai is an Otokonoko | /media/downloads/sonarr/Senpai.is.an.Otokonoko.S01E01.1080p.WEB.H264-KAWAII
[2026-04-22T21:18:55Z] DRY-RUN | 1.39G | Senpai is an Otokonoko | /media/downloads/sonarr/Senpai.is.an.Otokonoko.S01E02.1080p.WEB.H264-KAWAII
[2026-04-22T21:18:55Z] DRY-RUN | 1.39G | Senpai is an Otokonoko | /media/downloads/sonarr/Senpai.is.an.Otokonoko.S01E03.1080p.WEB.H264-KAWAII
[2026-04-22T21:18:55Z] DRY-RUN | 1.39G | Senpai is an Otokonoko | /media/downloads/sonarr/Senpai.is.an.Otokonoko.S01E04.1080p.WEB.H264-KAWAII
[2026-04-22T21:18:55Z] DRY-RUN | 1.39G | Senpai is an Otokonoko | /media/downloads/sonarr/Senpai.is.an.Otokonoko.S01E07.1080p.WEB.H264-KAWAII
[2026-04-22T21:18:55Z] DRY-RUN | 1.39G | Senpai is an Otokonoko | /media/downloads/sonarr/Senpai.is.an.Otokonoko.S01E08.1080p.WEB.H264-KAWAII
[2026-04-22T21:18:55Z] DRY-RUN | 1.39G | Senpai is an Otokonoko | /media/downloads/sonarr/Senpai.is.an.Otokonoko.S01E10.1080p.WEB.H264-KAWAII
[2026-04-22T21:18:55Z] DRY-RUN | 1.39G | Senpai is an Otokonoko | /media/downloads/sonarr/Senpai.is.an.Otokonoko.S01E12.1080p.WEB.H264-KAWAII
[2026-04-22T21:18:55Z] DRY-RUN | 31.17G | Sex Education | /media/downloads/sonarr/Sex.Education.S01.1080p.NF.WEB.DDP5.1.x264-DEFLATE
[2026-04-22T21:18:55Z] DRY-RUN | 41.53G | Sex Education | /media/downloads/sonarr/Sex.Education.S02.1080p.NF.WEB.DDP5.1.x264-NTb
[2026-04-22T21:18:55Z] DRY-RUN | 15.56G | Sex Education | /media/downloads/sonarr/Sex.Education.S03.1080p.NF.WEB-DL.DDP5.1.H.264-FLUX
[2026-04-22T21:18:55Z] DRY-RUN | 21.49G | Sex Education | /media/downloads/sonarr/Sex.Education.S04.1080p.NF.WEB-DL.DDP5.1.H.264-Archie
[2026-04-22T21:18:55Z] DRY-RUN | 1.49G | WIND BREAKER | /media/downloads/sonarr/WIND.BREAKER.S01E02.THE.HERO.OF.MY.DREAMS.1080p.CR.WEB-DL.AAC2.0.H.264.DUAL-VARYG.mkv
[2026-04-22T21:18:55Z] DRY-RUN | 1.49G | WIND BREAKER | /media/downloads/sonarr/WIND.BREAKER.S01E03.THE.MAN.WHO.STANDS.AT.THE.TOP.1080p.CR.WEB-DL.AAC2.0.H.264.DUAL-VARYG.mkv
[2026-04-22T21:18:55Z] DRY-RUN | 1.48G | WIND BREAKER | /media/downloads/sonarr/WIND.BREAKER.S01E04.CLASH.1080p.CR.WEB-DL.AAC2.0.H.264.DUAL-VARYG.mkv
[2026-04-22T21:18:55Z] DRY-RUN | 0.34G | WIND BREAKER | /media/downloads/sonarr/WIND.BREAKER.S01E07.A.Fight.He.Cant.Lose.1080p.B-Global.WEB-DL.JPN.AAC2.0.H.264.MSubs-ToonsHub.mkv
[2026-04-22T21:18:55Z] DRY-RUN | 0.26G | Wind Breaker | /media/downloads/sonarr/Wind Breaker - S01E12 - 1080p WEB HEVC -NanDesuKa (B-Global).mkv
[2026-04-22T21:18:55Z] DRY-RUN | 1.46G | Wind Breaker 01 ( | /media/downloads/sonarr/[SubsPlease] Wind Breaker - 01 (1080p) [5D5071F6].mkv
[2026-04-22T21:18:55Z] DRY-RUN | 1.46G | Wind Breaker 05 ( | /media/downloads/sonarr/[SubsPlease] Wind Breaker - 05 (1080p) [B6649F46].mkv
[2026-04-22T21:18:55Z] DRY-RUN | 1.46G | Wind Breaker 06 ( | /media/downloads/sonarr/[SubsPlease] Wind Breaker - 06 (1080p) [1C13E5BC].mkv
[2026-04-22T21:18:55Z] DRY-RUN | 0.74G | Wistoria Wand And Sword | /media/downloads/sonarr/Wistoria.Wand.And.Sword.S01E01.720p.WEB.H264-SKYANiME
[2026-04-22T21:18:55Z] DRY-RUN | 1.45G | Wistoria Wand and Sword | /media/downloads/sonarr/Wistoria.Wand.and.Sword.S01E02.1080p.WEB.H264-KAWAII
[2026-04-22T21:18:55Z] DRY-RUN | 1.44G | Wistoria Wand and Sword | /media/downloads/sonarr/Wistoria.Wand.and.Sword.S01E03.1080p.WEB.H264-KAWAII
[2026-04-22T21:18:55Z] DRY-RUN | 0.00G | www UIndex org Severance | /media/downloads/sonarr/www.UIndex.org - Severance S02E10 Cold Harbor 1080p ATVP WEB-DL DDP5 1 Atmos H 264-Kitsune
[2026-04-22T21:18:55Z] DONE | deleted=53 | freed=2449.6G | failed=0
[2026-04-22T21:23:05Z] ============================================================
[2026-04-22T21:23:05Z] cleanup-orphans.py started (dry_run=True)
[2026-04-22T21:23:28Z] DRY-RUN | 24.39G | Game of Thrones | /media/downloads/sonarr/Game.of.Thrones.S01.1080p.MAX.WEB-DL.DDP5.1.Atmos.H.264-FLUX
[2026-04-22T21:23:28Z] DRY-RUN | 36.73G | Game of Thrones | /media/downloads/sonarr/Game.of.Thrones.S02.NORDiC.1080p.HMAX.WEB-DL.DDP5.1.Atmos.H.264-DKV
[2026-04-22T21:23:28Z] DRY-RUN | 37.52G | Game of Thrones | /media/downloads/sonarr/Game.of.Thrones.S03.NORDiC.1080p.HMAX.WEB-DL.DDP5.1.Atmos.H.264-DKV
[2026-04-22T21:23:28Z] DRY-RUN | 36.83G | Game of Thrones | /media/downloads/sonarr/Game.of.Thrones.S04.NORDiC.1080p.HMAX.WEB-DL.DDP5.1.Atmos.H.264-DKV
[2026-04-22T21:23:28Z] DRY-RUN | 37.77G | Game of Thrones | /media/downloads/sonarr/Game.of.Thrones.S05.NORDiC.1080p.HMAX.WEB-DL.DDP5.1.Atmos.H.264-DKV
[2026-04-22T21:23:28Z] DRY-RUN | 36.07G | Game of Thrones | /media/downloads/sonarr/Game.of.Thrones.S06.1080p.HMAX.WEB-DL.DD.5.1.H.264-GNOME
[2026-04-22T21:23:28Z] DRY-RUN | 29.48G | Game of Thrones | /media/downloads/sonarr/Game.of.Thrones.S07.NORDiC.1080p.HMAX.WEB-DL.DDP5.1.Atmos.H.264-DKV
[2026-04-22T21:23:28Z] DRY-RUN | 28.71G | Game of Thrones | /media/downloads/sonarr/Game.of.Thrones.S08.NORDiC.1080p.HMAX.WEB-DL.DDP5.1.Atmos.H.264-DKV
[2026-04-22T21:23:28Z] DRY-RUN | 4.58G | Grimgar Of Fantasy And Ash ( | /media/downloads/sonarr/Grimgar Of Fantasy And Ash (2016) S01 1080p BluRay 10bit EAC3 2 0 x265-iVy
[2026-04-22T21:23:28Z] DRY-RUN | 1.53G | Hibike! Euphonium | /media/downloads/sonarr/[SubsPlease] Hibike! Euphonium S3 - 01 (1080p) [4CA94F81]
[2026-04-22T21:23:28Z] DRY-RUN | 1.53G | Hibike! Euphonium | /media/downloads/sonarr/[SubsPlease] Hibike! Euphonium S3 - 05 (1080p) [A0556FA8].mkv
[2026-04-22T21:23:28Z] DRY-RUN | 1.53G | Hibike! Euphonium | /media/downloads/sonarr/[SubsPlease] Hibike! Euphonium S3 - 06 (1080p) [982D7547].mkv
[2026-04-22T21:23:28Z] DRY-RUN | 1.53G | Hibike! Euphonium | /media/downloads/sonarr/[SubsPlease] Hibike! Euphonium S3 - 07 (1080p) [247CFB44].mkv
[2026-04-22T21:23:28Z] DRY-RUN | 1.52G | Hibike! Euphonium | /media/downloads/sonarr/[SubsPlease] Hibike! Euphonium S3 - 10 (1080p) [ABE1B90A]
[2026-04-22T21:23:28Z] DRY-RUN | 1.52G | Hibike! Euphonium | /media/downloads/sonarr/[SubsPlease] Hibike! Euphonium S3 - 13 (1080p) [230618C3].mkv
[2026-04-22T21:23:28Z] DRY-RUN | 0.52G | Hikikomari Kyuuketsuki no Monmon 07 ( | /media/downloads/sonarr/[SubsPlease] Hikikomari Kyuuketsuki no Monmon - 07 (1080p) [B07BA1C7]
[2026-04-22T21:23:28Z] DRY-RUN | 10.49G | Love Death and Robots | /media/downloads/sonarr/Love.Death.and.Robots.S01.1080p.NF.WEB-DL.DDP5.1.Atmos.H.264-FLUX
[2026-04-22T21:23:28Z] DRY-RUN | 8.97G | Love Death and Robots | /media/downloads/sonarr/Love.Death.and.Robots.S01.1080p.NF.WEB-DL.DDP5.1.x264-NTG
[2026-04-22T21:23:28Z] DRY-RUN | 4.23G | Love Death and Robots | /media/downloads/sonarr/Love.Death.and.Robots.S02.1080p.NF.WEB-DL.DDP5.1.Atmos.H.264-FLUX
[2026-04-22T21:23:28Z] DRY-RUN | 4.97G | Love Death and Robots | /media/downloads/sonarr/Love.Death.and.Robots.S02.1080p.NF.WEB-DL.DDP5.1.Atmos.x264-Telly
[2026-04-22T21:23:28Z] DRY-RUN | 5.99G | Love Death and Robots | /media/downloads/sonarr/Love.Death.and.Robots.S03.1080p.NF.WEB-DL.DDP5.1.Atmos.H.264-FLUX
[2026-04-22T21:23:28Z] DRY-RUN | 5.26G | Love Death and Robots | /media/downloads/sonarr/Love.Death.and.Robots.S03.1080p.NF.WEBRip.DDP5.1.Atmos.x264-SMURF
[2026-04-22T21:23:28Z] DRY-RUN | 4.44G | Love Death and Robots | /media/downloads/sonarr/Love.Death.and.Robots.S04.1080p.NF.WEB-DL.DDP5.1.Atmos.H.264-FLUX
[2026-04-22T21:23:28Z] DRY-RUN | 0.88G | SANDA | /media/downloads/sonarr/SANDA.S01E02.1080p.WEB.H264-SENSEI
[2026-04-22T21:23:28Z] DRY-RUN | 0.70G | Senpai Is An Otokonoko | /media/downloads/sonarr/Senpai.Is.An.Otokonoko.S01E05.720p.WEB.H264-SKYANiME
[2026-04-22T21:23:28Z] DRY-RUN | 1.39G | Senpai is an Otokonoko | /media/downloads/sonarr/Senpai.is.an.Otokonoko.S01E01.1080p.WEB.H264-KAWAII
[2026-04-22T21:23:28Z] DRY-RUN | 1.39G | Senpai is an Otokonoko | /media/downloads/sonarr/Senpai.is.an.Otokonoko.S01E02.1080p.WEB.H264-KAWAII
[2026-04-22T21:23:28Z] DRY-RUN | 1.39G | Senpai is an Otokonoko | /media/downloads/sonarr/Senpai.is.an.Otokonoko.S01E03.1080p.WEB.H264-KAWAII
[2026-04-22T21:23:28Z] DRY-RUN | 1.39G | Senpai is an Otokonoko | /media/downloads/sonarr/Senpai.is.an.Otokonoko.S01E04.1080p.WEB.H264-KAWAII
[2026-04-22T21:23:28Z] DRY-RUN | 1.39G | Senpai is an Otokonoko | /media/downloads/sonarr/Senpai.is.an.Otokonoko.S01E07.1080p.WEB.H264-KAWAII
[2026-04-22T21:23:28Z] DRY-RUN | 1.39G | Senpai is an Otokonoko | /media/downloads/sonarr/Senpai.is.an.Otokonoko.S01E08.1080p.WEB.H264-KAWAII
[2026-04-22T21:23:28Z] DRY-RUN | 1.39G | Senpai is an Otokonoko | /media/downloads/sonarr/Senpai.is.an.Otokonoko.S01E10.1080p.WEB.H264-KAWAII
[2026-04-22T21:23:28Z] DRY-RUN | 1.39G | Senpai is an Otokonoko | /media/downloads/sonarr/Senpai.is.an.Otokonoko.S01E12.1080p.WEB.H264-KAWAII
[2026-04-22T21:23:28Z] DRY-RUN | 31.17G | Sex Education | /media/downloads/sonarr/Sex.Education.S01.1080p.NF.WEB.DDP5.1.x264-DEFLATE
[2026-04-22T21:23:28Z] DRY-RUN | 41.53G | Sex Education | /media/downloads/sonarr/Sex.Education.S02.1080p.NF.WEB.DDP5.1.x264-NTb
[2026-04-22T21:23:28Z] DRY-RUN | 15.56G | Sex Education | /media/downloads/sonarr/Sex.Education.S03.1080p.NF.WEB-DL.DDP5.1.H.264-FLUX
[2026-04-22T21:23:28Z] DRY-RUN | 21.49G | Sex Education | /media/downloads/sonarr/Sex.Education.S04.1080p.NF.WEB-DL.DDP5.1.H.264-Archie
[2026-04-22T21:23:28Z] DRY-RUN | 1.49G | WIND BREAKER | /media/downloads/sonarr/WIND.BREAKER.S01E02.THE.HERO.OF.MY.DREAMS.1080p.CR.WEB-DL.AAC2.0.H.264.DUAL-VARYG.mkv
[2026-04-22T21:23:28Z] DRY-RUN | 1.49G | WIND BREAKER | /media/downloads/sonarr/WIND.BREAKER.S01E03.THE.MAN.WHO.STANDS.AT.THE.TOP.1080p.CR.WEB-DL.AAC2.0.H.264.DUAL-VARYG.mkv
[2026-04-22T21:23:28Z] DRY-RUN | 1.48G | WIND BREAKER | /media/downloads/sonarr/WIND.BREAKER.S01E04.CLASH.1080p.CR.WEB-DL.AAC2.0.H.264.DUAL-VARYG.mkv
[2026-04-22T21:23:28Z] DRY-RUN | 0.34G | WIND BREAKER | /media/downloads/sonarr/WIND.BREAKER.S01E07.A.Fight.He.Cant.Lose.1080p.B-Global.WEB-DL.JPN.AAC2.0.H.264.MSubs-ToonsHub.mkv
[2026-04-22T21:23:28Z] DRY-RUN | 0.26G | Wind Breaker | /media/downloads/sonarr/Wind Breaker - S01E12 - 1080p WEB HEVC -NanDesuKa (B-Global).mkv
[2026-04-22T21:23:28Z] DRY-RUN | 1.46G | Wind Breaker 01 ( | /media/downloads/sonarr/[SubsPlease] Wind Breaker - 01 (1080p) [5D5071F6].mkv
[2026-04-22T21:23:28Z] DRY-RUN | 1.46G | Wind Breaker 05 ( | /media/downloads/sonarr/[SubsPlease] Wind Breaker - 05 (1080p) [B6649F46].mkv
[2026-04-22T21:23:28Z] DRY-RUN | 1.46G | Wind Breaker 06 ( | /media/downloads/sonarr/[SubsPlease] Wind Breaker - 06 (1080p) [1C13E5BC].mkv
[2026-04-22T21:23:28Z] DRY-RUN | 0.74G | Wistoria Wand And Sword | /media/downloads/sonarr/Wistoria.Wand.And.Sword.S01E01.720p.WEB.H264-SKYANiME
[2026-04-22T21:23:28Z] DRY-RUN | 1.45G | Wistoria Wand and Sword | /media/downloads/sonarr/Wistoria.Wand.and.Sword.S01E02.1080p.WEB.H264-KAWAII
[2026-04-22T21:23:28Z] DRY-RUN | 1.44G | Wistoria Wand and Sword | /media/downloads/sonarr/Wistoria.Wand.and.Sword.S01E03.1080p.WEB.H264-KAWAII
[2026-04-22T21:23:28Z] DRY-RUN | 0.00G | www UIndex org Severance | /media/downloads/sonarr/www.UIndex.org - Severance S02E10 Cold Harbor 1080p ATVP WEB-DL DDP5 1 Atmos H 264-Kitsune
[2026-04-22T21:23:28Z] DONE | deleted=49 | freed=461.6G | failed=0
[2026-04-22T21:32:57Z] ============================================================
[2026-04-22T21:32:57Z] cleanup-orphans.py started (dry_run=False)
[2026-04-22T21:33:31Z] DELETED | 24.39G | Game of Thrones | /media/downloads/sonarr/Game.of.Thrones.S01.1080p.MAX.WEB-DL.DDP5.1.Atmos.H.264-FLUX
[2026-04-22T21:34:04Z] DELETED | 36.73G | Game of Thrones | /media/downloads/sonarr/Game.of.Thrones.S02.NORDiC.1080p.HMAX.WEB-DL.DDP5.1.Atmos.H.264-DKV
[2026-04-22T21:34:39Z] DELETED | 37.52G | Game of Thrones | /media/downloads/sonarr/Game.of.Thrones.S03.NORDiC.1080p.HMAX.WEB-DL.DDP5.1.Atmos.H.264-DKV
[2026-04-22T21:35:05Z] DELETED | 36.83G | Game of Thrones | /media/downloads/sonarr/Game.of.Thrones.S04.NORDiC.1080p.HMAX.WEB-DL.DDP5.1.Atmos.H.264-DKV
[2026-04-22T21:35:33Z] DELETED | 37.77G | Game of Thrones | /media/downloads/sonarr/Game.of.Thrones.S05.NORDiC.1080p.HMAX.WEB-DL.DDP5.1.Atmos.H.264-DKV
[2026-04-22T21:35:51Z] DELETED | 36.07G | Game of Thrones | /media/downloads/sonarr/Game.of.Thrones.S06.1080p.HMAX.WEB-DL.DD.5.1.H.264-GNOME
[2026-04-22T21:36:01Z] DELETED | 29.48G | Game of Thrones | /media/downloads/sonarr/Game.of.Thrones.S07.NORDiC.1080p.HMAX.WEB-DL.DDP5.1.Atmos.H.264-DKV
[2026-04-22T21:36:09Z] DELETED | 28.71G | Game of Thrones | /media/downloads/sonarr/Game.of.Thrones.S08.NORDiC.1080p.HMAX.WEB-DL.DDP5.1.Atmos.H.264-DKV
[2026-04-22T21:36:10Z] DELETED | 4.58G | Grimgar Of Fantasy And Ash ( | /media/downloads/sonarr/Grimgar Of Fantasy And Ash (2016) S01 1080p BluRay 10bit EAC3 2 0 x265-iVy
[2026-04-22T21:36:11Z] DELETED | 1.53G | Hibike! Euphonium | /media/downloads/sonarr/[SubsPlease] Hibike! Euphonium S3 - 01 (1080p) [4CA94F81]
[2026-04-22T21:36:11Z] DELETED | 1.53G | Hibike! Euphonium | /media/downloads/sonarr/[SubsPlease] Hibike! Euphonium S3 - 05 (1080p) [A0556FA8].mkv
[2026-04-22T21:36:11Z] DELETED | 1.53G | Hibike! Euphonium | /media/downloads/sonarr/[SubsPlease] Hibike! Euphonium S3 - 06 (1080p) [982D7547].mkv
[2026-04-22T21:36:12Z] DELETED | 1.53G | Hibike! Euphonium | /media/downloads/sonarr/[SubsPlease] Hibike! Euphonium S3 - 07 (1080p) [247CFB44].mkv
[2026-04-22T21:36:13Z] DELETED | 1.52G | Hibike! Euphonium | /media/downloads/sonarr/[SubsPlease] Hibike! Euphonium S3 - 10 (1080p) [ABE1B90A]
[2026-04-22T21:36:13Z] DELETED | 1.52G | Hibike! Euphonium | /media/downloads/sonarr/[SubsPlease] Hibike! Euphonium S3 - 13 (1080p) [230618C3].mkv
[2026-04-22T21:36:13Z] DELETED | 0.52G | Hikikomari Kyuuketsuki no Monmon 07 ( | /media/downloads/sonarr/[SubsPlease] Hikikomari Kyuuketsuki no Monmon - 07 (1080p) [B07BA1C7]
[2026-04-22T21:36:15Z] DELETED | 10.49G | Love Death and Robots | /media/downloads/sonarr/Love.Death.and.Robots.S01.1080p.NF.WEB-DL.DDP5.1.Atmos.H.264-FLUX
[2026-04-22T21:36:16Z] DELETED | 8.97G | Love Death and Robots | /media/downloads/sonarr/Love.Death.and.Robots.S01.1080p.NF.WEB-DL.DDP5.1.x264-NTG
[2026-04-22T21:36:16Z] DELETED | 4.23G | Love Death and Robots | /media/downloads/sonarr/Love.Death.and.Robots.S02.1080p.NF.WEB-DL.DDP5.1.Atmos.H.264-FLUX
[2026-04-22T21:36:17Z] DELETED | 4.97G | Love Death and Robots | /media/downloads/sonarr/Love.Death.and.Robots.S02.1080p.NF.WEB-DL.DDP5.1.Atmos.x264-Telly
[2026-04-22T21:36:17Z] DELETED | 5.99G | Love Death and Robots | /media/downloads/sonarr/Love.Death.and.Robots.S03.1080p.NF.WEB-DL.DDP5.1.Atmos.H.264-FLUX
[2026-04-22T21:36:18Z] DELETED | 5.26G | Love Death and Robots | /media/downloads/sonarr/Love.Death.and.Robots.S03.1080p.NF.WEBRip.DDP5.1.Atmos.x264-SMURF
[2026-04-22T21:36:22Z] DELETED | 4.44G | Love Death and Robots | /media/downloads/sonarr/Love.Death.and.Robots.S04.1080p.NF.WEB-DL.DDP5.1.Atmos.H.264-FLUX
[2026-04-22T21:36:22Z] DELETED | 0.88G | SANDA | /media/downloads/sonarr/SANDA.S01E02.1080p.WEB.H264-SENSEI
[2026-04-22T21:36:22Z] DELETED | 0.70G | Senpai Is An Otokonoko | /media/downloads/sonarr/Senpai.Is.An.Otokonoko.S01E05.720p.WEB.H264-SKYANiME
[2026-04-22T21:36:22Z] DELETED | 1.39G | Senpai is an Otokonoko | /media/downloads/sonarr/Senpai.is.an.Otokonoko.S01E01.1080p.WEB.H264-KAWAII
[2026-04-22T21:36:23Z] DELETED | 1.39G | Senpai is an Otokonoko | /media/downloads/sonarr/Senpai.is.an.Otokonoko.S01E02.1080p.WEB.H264-KAWAII
[2026-04-22T21:36:23Z] DELETED | 1.39G | Senpai is an Otokonoko | /media/downloads/sonarr/Senpai.is.an.Otokonoko.S01E03.1080p.WEB.H264-KAWAII
[2026-04-22T21:36:23Z] DELETED | 1.39G | Senpai is an Otokonoko | /media/downloads/sonarr/Senpai.is.an.Otokonoko.S01E04.1080p.WEB.H264-KAWAII
[2026-04-22T21:36:23Z] DELETED | 1.39G | Senpai is an Otokonoko | /media/downloads/sonarr/Senpai.is.an.Otokonoko.S01E07.1080p.WEB.H264-KAWAII
[2026-04-22T21:36:24Z] DELETED | 1.39G | Senpai is an Otokonoko | /media/downloads/sonarr/Senpai.is.an.Otokonoko.S01E08.1080p.WEB.H264-KAWAII
[2026-04-22T21:36:24Z] DELETED | 1.39G | Senpai is an Otokonoko | /media/downloads/sonarr/Senpai.is.an.Otokonoko.S01E10.1080p.WEB.H264-KAWAII
[2026-04-22T21:36:24Z] DELETED | 1.39G | Senpai is an Otokonoko | /media/downloads/sonarr/Senpai.is.an.Otokonoko.S01E12.1080p.WEB.H264-KAWAII
[2026-04-22T21:36:25Z] DELETED | 31.17G | Sex Education | /media/downloads/sonarr/Sex.Education.S01.1080p.NF.WEB.DDP5.1.x264-DEFLATE
[2026-04-22T21:36:26Z] DELETED | 41.53G | Sex Education | /media/downloads/sonarr/Sex.Education.S02.1080p.NF.WEB.DDP5.1.x264-NTb
[2026-04-22T21:36:26Z] DELETED | 15.56G | Sex Education | /media/downloads/sonarr/Sex.Education.S03.1080p.NF.WEB-DL.DDP5.1.H.264-FLUX
[2026-04-22T21:36:27Z] DELETED | 21.49G | Sex Education | /media/downloads/sonarr/Sex.Education.S04.1080p.NF.WEB-DL.DDP5.1.H.264-Archie
[2026-04-22T21:36:27Z] DELETED | 1.49G | WIND BREAKER | /media/downloads/sonarr/WIND.BREAKER.S01E02.THE.HERO.OF.MY.DREAMS.1080p.CR.WEB-DL.AAC2.0.H.264.DUAL-VARYG.mkv
[2026-04-22T21:36:28Z] DELETED | 1.49G | WIND BREAKER | /media/downloads/sonarr/WIND.BREAKER.S01E03.THE.MAN.WHO.STANDS.AT.THE.TOP.1080p.CR.WEB-DL.AAC2.0.H.264.DUAL-VARYG.mkv
[2026-04-22T21:36:28Z] DELETED | 1.48G | WIND BREAKER | /media/downloads/sonarr/WIND.BREAKER.S01E04.CLASH.1080p.CR.WEB-DL.AAC2.0.H.264.DUAL-VARYG.mkv
[2026-04-22T21:36:28Z] DELETED | 0.34G | WIND BREAKER | /media/downloads/sonarr/WIND.BREAKER.S01E07.A.Fight.He.Cant.Lose.1080p.B-Global.WEB-DL.JPN.AAC2.0.H.264.MSubs-ToonsHub.mkv
[2026-04-22T21:36:29Z] DELETED | 0.26G | Wind Breaker | /media/downloads/sonarr/Wind Breaker - S01E12 - 1080p WEB HEVC -NanDesuKa (B-Global).mkv
[2026-04-22T21:36:29Z] DELETED | 1.46G | Wind Breaker 01 ( | /media/downloads/sonarr/[SubsPlease] Wind Breaker - 01 (1080p) [5D5071F6].mkv
[2026-04-22T21:36:29Z] DELETED | 1.46G | Wind Breaker 05 ( | /media/downloads/sonarr/[SubsPlease] Wind Breaker - 05 (1080p) [B6649F46].mkv
[2026-04-22T21:36:30Z] DELETED | 1.46G | Wind Breaker 06 ( | /media/downloads/sonarr/[SubsPlease] Wind Breaker - 06 (1080p) [1C13E5BC].mkv
[2026-04-22T21:36:30Z] DELETED | 0.74G | Wistoria Wand And Sword | /media/downloads/sonarr/Wistoria.Wand.And.Sword.S01E01.720p.WEB.H264-SKYANiME
[2026-04-22T21:36:30Z] DELETED | 1.45G | Wistoria Wand and Sword | /media/downloads/sonarr/Wistoria.Wand.and.Sword.S01E02.1080p.WEB.H264-KAWAII
[2026-04-22T21:36:31Z] DELETED | 1.44G | Wistoria Wand and Sword | /media/downloads/sonarr/Wistoria.Wand.and.Sword.S01E03.1080p.WEB.H264-KAWAII
[2026-04-22T21:36:31Z] DELETED | 0.00G | www UIndex org Severance | /media/downloads/sonarr/www.UIndex.org - Severance S02E10 Cold Harbor 1080p ATVP WEB-DL DDP5 1 Atmos H 264-Kitsune
[2026-04-22T21:36:31Z] DONE | deleted=49 | freed=461.6G | failed=0

View File

@@ -0,0 +1,132 @@
#!/usr/bin/env python3
"""
Delete confirmed-safe download entries from /media/downloads/sonarr and /media/downloads/radarr.
Reads /tmp/arr_verified.json produced by verify.py.
Only deletes entries where status == 'safe' (API-confirmed imported + disk path verified).
Orphans and path_missing entries are never touched.
Usage:
python3 cleanup.py --dry-run # print what would be deleted
python3 cleanup.py --arr sonarr # delete only sonarr downloads
python3 cleanup.py --arr radarr # delete only radarr downloads
python3 cleanup.py # delete both (prompts for confirmation)
# Target a single series/movie by title substring:
python3 cleanup.py --arr sonarr --title "American Dragon"
"""
import json
import subprocess
import argparse
import sys
SSH_HOST = "aya01"
SONARR_DL_ROOT = "/media/downloads/sonarr"
RADARR_DL_ROOT = "/media/downloads/radarr"
VERIFIED_JSON = "/tmp/arr_verified.json"
def ssh_delete(path, dry_run):
"""Delete path on remote host. Returns True on success."""
if dry_run:
print(f" [DRY-RUN] would delete: {path}")
return True
result = subprocess.run(
['ssh', SSH_HOST, f'rm -rf {json.dumps(path)}'],
capture_output=True, text=True
)
if result.returncode != 0:
print(f" ERROR deleting {path}: {result.stderr.strip()}")
return False
return True
def ssh_exists(path):
r = subprocess.run(['ssh', SSH_HOST, f'[ -e {json.dumps(path)} ] && echo yes || echo no'],
capture_output=True, text=True)
return r.stdout.strip() == 'yes'
def confirm(prompt):
answer = input(f"{prompt} [y/N] ").strip().lower()
return answer == 'y'
def process(entries, dl_root, label, dry_run, title_filter, yes=False):
safe = [m for m in entries if m['status'] == 'safe']
if title_filter:
safe = [m for m in safe if title_filter.lower() in m['title'].lower()]
if not safe:
print(f"No safe entries to delete for {label}.")
return 0, 0
print(f"\n{'='*60}")
print(f"{label}{len(safe)} entries to delete")
print(f"{'='*60}")
for m in safe:
pct = m.get('percentOfEpisodes', '')
pct_str = f" [{pct:.0f}%]" if isinstance(pct, float) else ''
files = m.get('episodeFileCount', '')
total = m.get('totalEpisodeCount', '')
count_str = f" ({files}/{total} eps)" if files != '' else f" (hasFile=True)"
print(f" {m['title']}{pct_str}{count_str}")
print(f"{m['dl']}")
print(f"{m['check_path']}")
if not dry_run and not yes:
if not confirm(f"\nDelete {len(safe)} {label} download entries?"):
print("Skipped.")
return 0, 0
deleted, failed = 0, 0
for m in safe:
dl_path = f"{dl_root}/{m['dl']}"
# Double-check the series/movie still exists on disk before deleting the download
if not dry_run and not ssh_exists(m['check_path']):
print(f" SKIP {m['title']}: media path no longer on disk ({m['check_path']})")
failed += 1
continue
ok = ssh_delete(dl_path, dry_run)
if ok:
deleted += 1
else:
failed += 1
print(f"\n{label}: {deleted} deleted, {failed} failed/skipped")
return deleted, failed
def main():
parser = argparse.ArgumentParser()
parser.add_argument('--dry-run', action='store_true', help='Print actions without deleting')
parser.add_argument('--yes', '-y', action='store_true', help='Skip confirmation prompt')
parser.add_argument('--arr', choices=['sonarr', 'radarr', 'both'], default='both')
parser.add_argument('--title', default='', help='Only process entries matching this title substring')
args = parser.parse_args()
with open(VERIFIED_JSON) as f:
data = json.load(f)
if args.dry_run:
print("DRY-RUN mode — nothing will be deleted\n")
total_deleted, total_failed = 0, 0
if args.arr in ('radarr', 'both'):
d, f = process(data['radarr_matched'], RADARR_DL_ROOT, 'Radarr', args.dry_run, args.title, args.yes)
total_deleted += d
total_failed += f
if args.arr in ('sonarr', 'both'):
d, f = process(data['sonarr_matched'], SONARR_DL_ROOT, 'Sonarr', args.dry_run, args.title, args.yes)
total_deleted += d
total_failed += f
print(f"\nTotal: {total_deleted} deleted, {total_failed} failed/skipped")
if __name__ == '__main__':
main()

View File

@@ -0,0 +1,200 @@
# arr-stack Downloads Cleanup — Investigation Findings
## Storage Layout (aya01)
| Device | FS | Size | Used | Mount |
|--------|----|------|------|-------|
| `/dev/sdc3` | btrfs | 1.9T | 177G (10%) | `/` (system) |
| `/dev/sda1` | btrfs `proxmox` | 2.8T | 1.3T (48%) | `/opt` |
| `/dev/sdd1` | ext4 | 17T | 15T (92%) | `/mnt/hdd0` |
| `/dev/sde1` | ext4 | 17T | 15T (92%) | `/mnt/hdd2` |
| `/dev/sdf1` | ext4 | 17T | 15T (92%) | `/mnt/hdd1` |
| `mergerfs` | fuse | 49T | 43T (92%) | `/media` |
`/media` is a mergerfs union of hdd0 + hdd1 + hdd2. All three HDDs were at ~92% capacity before cleanup.
**After cleanup (2026-04-23):**
| Device | Used | Avail | Use% |
|--------|------|-------|------|
| `/dev/sdd1` (hdd0) | 9.4T | 6.2T | 61% |
| `/dev/sdf1` (hdd1) | 9.3T | 6.3T | 60% |
| `/dev/sde1` (hdd2) | 7.8T | 7.8T | 51% |
| `mergerfs /media` | 27T | 21T | 57% |
**~16T freed total** (92% → 57% on the mergerfs pool).
## /media Breakdown (before cleanup)
| Directory | Size |
|-----------|------|
| `downloads` | **22T** |
| `series` | 16T |
| `movies` | 5T |
## Root Cause: No Hardlinks → All Imports Are Copies
Zero hardlinked files exist anywhere across all three HDDs. Confirmed by two methods:
1. Inspecting the Kubernetes manifests in `argocd-homelab/services/arr-stack/`
2. Inode comparison of 1365 download/media file pairs — **0 shared inodes found** (every file is a distinct copy)
**All three services mount the mergerfs `/media/` path via NFS:**
```
sonarr: NFS 192.168.20.12:/media/downloads → /downloads
NFS 192.168.20.12:/media/series → /tv
radarr: NFS 192.168.20.12:/media/downloads → /downloads
NFS 192.168.20.12:/media/movies → /movies
qbit: NFS 192.168.20.12:/media/downloads → /downloads
```
mergerfs does not support hardlinks across underlying filesystems. When qBit downloads to `/media/downloads/sonarr/` (lands on e.g. hdd1) and Sonarr imports to `/media/series/` (lands on e.g. hdd0), the hardlink attempt crosses a physical disk boundary → falls back to copy. Every import doubles the data.
**Estimated wasted space before cleanup: ~21T** (the entire downloads/sonarr + downloads/radarr).
## How to Run
Prerequisites:
```bash
# Port-forward Sonarr and Radarr APIs
kubectl -n arr-stack port-forward svc/sonarr 8989:8989 &
kubectl -n arr-stack port-forward svc/radarr 7878:7878 &
```
API keys are loaded from `../../../../sonarr.api.env` and `../../../../radarr.api.env`
(i.e. `/home/tudattr/workspace/infra/sonarr.api.env` relative to this repo).
Container path mappings used in scripts:
- Sonarr: `/tv/``/media/series/`
- Radarr: `/movies/``/media/movies/`
### Step 1 — Verify (generates `/tmp/arr_verified.json`)
```bash
python3 verify.py
```
Cross-references all downloads against Sonarr/Radarr APIs, verifies reported file paths exist on disk via SSH. Classifies each entry as `safe`, `not_imported`, or `path_missing`.
### Step 2 — Delete confirmed-imported downloads
```bash
python3 cleanup.py --dry-run # preview
python3 cleanup.py --arr sonarr --yes
python3 cleanup.py --arr radarr --yes
```
### Step 3 — Delete orphans (downloads not in Sonarr at all)
```bash
python3 cleanup-orphans.py --dry-run # preview
python3 cleanup-orphans.py --yes
```
All actions are logged to `cleanup.log` with UTC timestamp, size, title, path, and outcome.
## Cleanup Performed (2026-04-23)
### Pass 1 — Orphans (downloads not in Sonarr)
Script: `cleanup-orphans.py`
Two-pass logic:
1. Match each download name against Sonarr API (title, slug, sortTitle, alternate titles, partial match)
2. If no API match, check if a series directory with a similar name exists in `/media/series/` — if it does, skip (needs manual review)
3. Delete remaining true orphans
Result: **49 deleted, 461.6G freed, 0 failed**
111 entries SKIPPED (series dir found on disk) — includes Bleach, House, Lucifer, You, SpongeBob, Detective Conan episodes, What If, etc. See `cleanup.log` for full list.
Notable orphans deleted:
- Game of Thrones S01S08 (~267G) — removed from Sonarr
- Sex Education S01S04 (~110G) — removed from Sonarr
- Love Death & Robots (multiple duplicate copies, ~45G)
- Senpai is an Otokonoko, Wind Breaker, Wistoria, Hibike! Euphonium S3 episodes, etc.
### Pass 2 — Confirmed-imported Sonarr downloads
Script: `cleanup.py --arr sonarr --yes`
Deleted downloads where Sonarr confirmed `episodeFileCount > 0` AND the series directory was verified to exist on disk.
Result: **1106 deleted, 0 failed**
### Pass 3 — Confirmed-imported Radarr downloads
Script: `cleanup.py --arr radarr --yes`
Deleted downloads where Radarr confirmed `hasFile=True` AND the file/directory path was verified to exist on disk.
Result: **259 deleted, 0 failed**
### Summary
| Pass | Script | Entries | Space freed |
|------|--------|---------|-------------|
| Orphans | `cleanup-orphans.py` | 49 | ~461G |
| Sonarr imports | `cleanup.py --arr sonarr` | 1106 | ~12T (estimated) |
| Radarr imports | `cleanup.py --arr radarr` | 259 | ~4T (estimated) |
| **Total** | | **1414** | **~16T** |
## Verification Results (from verify.py run before cleanup)
| | Safe to delete | Not imported | Path missing | Orphans (no API match) |
|---|---|---|---|---|
| **Sonarr** (1439 downloads) | 1106 | — | — | 333 |
| **Radarr** (289 downloads) | 265 | — | — | 25 |
Note: `cleanup-orphans.py` uses more aggressive title matching (alternate titles, partial match) than `verify.py`, so its orphan count (160 not-in-Sonarr out of 1438) is lower than `verify.py`'s 333.
### Radarr Orphans (25) — not matched, not deleted
- Constantine (2005)
- Cowboy Bebop: Knockin' on Heaven's Door (2001)
- Les Misérables (2012)
- Pokémon Detective Pikachu (2019)
- Code Geass: Fukkatsu no Lelouch (2019)
- Eiga Go-Toubun no Hanayome (2022)
- Gisaengchung / Parasite — Korean title, matching failure
- Dune: Part One (2021) — matching failure, confirmed in Radarr
- Harry Potter older/duplicate copies — matching failure
- Porco Rosso / Kurenai no buta — matching failure
- Castle in the Sky / Laputa — matching failure
- Steins;Gate: The Movie — matching failure
- Project Silence / Talchul — matching failure
- Digimon: Frontier & Savers films
- One Piece films (several)
- Paripi Koumei movie
- Fantastic Four (2025) extra copies (3)
- JJK DCP trailer file
### Path mismatch entries (confirmed safe, deleted anyway)
- Star Wars Episode IV/V/VI/IX — all matched to Episode IV record; manually confirmed all 4 dirs exist
- WALL·E — `·` middle-dot (U+00B7) broke string comparison; file confirmed on disk
## Pending Decisions
### Bleach USBD Remux TL (1.8T)
`/media/downloads/sonarr/Bleach USBD Remux TL` — full lossless Bluray remux S00S16 (-ZR- group).
Currently SKIPPED — `/media/series/Bleach (2004) {imdb-tt0434665}/` exists (310G imported).
Most seasons were imported from lighter x265 Bluray packs (`Bleach S0x Bluray EAC3 2.0 1080p x265-iVy`) rather than this remux. S11 has no imported content. S13 and S14 partially imported.
Options:
- **Delete** — free 1.8T, imported x265 content stays, re-download at remux quality later if desired
- **Keep** — retain as source for Sonarr to import remaining episodes at lossless quality now that disk space is freed
Per-season breakdown saved in memory.
### SKIPPED downloads (111 Sonarr entries)
Downloads where a matching series directory exists on disk but the series is not in Sonarr.
Likely intentionally removed series (House, Lucifer, You, Black Clover, etc.) with leftover download copies.
Needs manual review per series before deleting.
## Permanent Fix (not applied)
Mount per-HDD NFS paths instead of the mergerfs path, so qBit downloads and arr imports land on the same physical filesystem, enabling hardlinks:
```yaml
# In sonarr/radarr/qtun deployments, change:
path: /media/downloads → path: /mnt/hdd0/downloads
path: /media/series → path: /mnt/hdd0/series
path: /media/movies → path: /mnt/hdd0/movies
```
Jellyfin/Plex keep reading from `/media/` (mergerfs union). New imports hardlink within hdd0, wasting no extra space.
Tradeoff: all new content lands on hdd0 only. Load balancing across the three disks stops working for new downloads. Once hdd0 fills up a migration strategy is needed.

View File

@@ -0,0 +1,246 @@
#!/usr/bin/env python3
"""
Cross-reference /media/downloads/sonarr and /media/downloads/radarr against
the Sonarr/Radarr APIs, then verify reported file paths actually exist on disk.
Requirements:
- kubectl port-forwards active:
kubectl -n arr-stack port-forward svc/sonarr 8989:8989
kubectl -n arr-stack port-forward svc/radarr 7878:7878
- SSH access to aya01
- API keys in ../../../../sonarr.api.env and ../../../../radarr.api.env
Output:
/tmp/arr_verified.json — full structured results for use by cleanup.py
"""
import urllib.request
import json
import subprocess
import re
import sys
import os
SONARR_URL = "http://localhost:8989/api/v3"
RADARR_URL = "http://localhost:7878/api/v3"
SSH_HOST = "aya01"
script_dir = os.path.dirname(os.path.abspath(__file__))
def load_key(filename):
path = os.path.join(script_dir, '../../../..', filename)
return open(path).read().strip()
SONARR_KEY = load_key('sonarr.api.env')
RADARR_KEY = load_key('radarr.api.env')
def api_get(url):
with urllib.request.urlopen(url, timeout=30) as r:
return json.load(r)
def norm(s):
return re.sub(r'[^a-z0-9]', '', s.lower())
def extract_title(name, is_movie):
"""Strip release tags from a download name to recover a bare title."""
name = re.sub(r'\.(mkv|mp4|avi|m4v)$', '', name, flags=re.IGNORECASE)
name = re.sub(r'\[.*?\]', '', name)
if is_movie:
name = re.sub(r'[\.\s_\-]?(19|20)\d{2}.*$', '', name)
else:
name = re.sub(r'[\.\s_\-]?[Ss]\d{1,2}([Ee]\d{1,2})?.*$', '', name)
return re.sub(r'[\.\-_]+', ' ', name).strip()
def build_index(records, key_fn):
idx = {}
for rec in records:
for k in key_fn(rec):
if k:
idx[k] = rec
return idx
def find_match(dl_name, idx, is_movie):
title = extract_title(dl_name, is_movie)
tn = norm(title)
if tn in idx:
return idx[tn]
for k, rec in idx.items():
if k and len(k) > 5 and (tn.startswith(k) or k.startswith(tn)):
return rec
return None
def ssh_check_paths(paths):
"""Return (existing, missing) sets for the given list of paths."""
if not paths:
return set(), set()
cmds = '\n'.join(
f'[ -e {json.dumps(p)} ] && echo "EXISTS:{p}" || echo "MISSING:{p}"'
for p in paths
)
r = subprocess.run(['ssh', SSH_HOST, 'bash', '-s'],
input=cmds, capture_output=True, text=True)
existing, missing = set(), set()
for line in r.stdout.splitlines():
if line.startswith('EXISTS:'):
existing.add(line[7:])
elif line.startswith('MISSING:'):
missing.add(line[8:])
return existing, missing
def main():
print("Fetching Radarr movies...")
radarr_movies = api_get(f"{RADARR_URL}/movie?apikey={RADARR_KEY}")
print(f" {len(radarr_movies)} movies")
print("Fetching Sonarr series...")
sonarr_series = api_get(f"{SONARR_URL}/series?apikey={SONARR_KEY}")
print(f" {len(sonarr_series)} series")
# Radarr index
def radarr_keys(m):
return [norm(m['title']), norm(f"{m['title']}{m.get('year','')}")]
radarr_idx = build_index(radarr_movies, radarr_keys)
# Enrich radarr records with disk path
for m in radarr_movies:
mf = m.get('movieFile')
m['_file_path'] = (
mf['path'].replace('/movies/', '/media/movies/', 1) if mf and mf.get('path') else None
)
m['_dir_path'] = m.get('path', '').replace('/movies/', '/media/movies/', 1)
# Sonarr index
def sonarr_keys(s):
return [norm(s['title'])]
sonarr_idx = build_index(sonarr_series, sonarr_keys)
for s in sonarr_series:
s['_dir_path'] = s.get('path', '').replace('/tv/', '/media/series/', 1)
# Download listings
print(f"\nFetching download listings from {SSH_HOST}...")
r = subprocess.run(
['ssh', SSH_HOST, 'ls /media/downloads/sonarr/ && echo "===RADARR===" && ls /media/downloads/radarr/'],
capture_output=True, text=True
)
parts = r.stdout.split('===RADARR===\n')
sonarr_dls = [l.strip() for l in parts[0].splitlines() if l.strip()]
radarr_dls = [l.strip() for l in parts[1].splitlines() if l.strip()]
print(f" Sonarr downloads: {len(sonarr_dls)}")
print(f" Radarr downloads: {len(radarr_dls)}")
# Match and collect paths
radarr_matched, radarr_orphans = [], []
for dl in radarr_dls:
rec = find_match(dl, radarr_idx, is_movie=True)
if rec is None:
radarr_orphans.append(dl)
else:
check_path = rec['_file_path'] or rec['_dir_path']
radarr_matched.append({
'dl': dl,
'title': rec['title'],
'year': rec.get('year'),
'hasFile': rec.get('hasFile', False),
'monitored': rec.get('monitored'),
'check_path': check_path,
})
sonarr_matched, sonarr_orphans = [], []
for dl in sonarr_dls:
rec = find_match(dl, sonarr_idx, is_movie=False)
if rec is None:
sonarr_orphans.append(dl)
else:
stats = rec.get('statistics', {})
sonarr_matched.append({
'dl': dl,
'title': rec['title'],
'episodeFileCount': stats.get('episodeFileCount', 0),
'totalEpisodeCount': stats.get('totalEpisodeCount', 0),
'percentOfEpisodes': stats.get('percentOfEpisodes', 0),
'monitored': rec.get('monitored'),
'status': rec.get('status'),
'check_path': rec['_dir_path'],
})
# Batch disk verification
all_paths = list(set(
[m['check_path'] for m in radarr_matched if m['check_path']] +
[m['check_path'] for m in sonarr_matched if m['check_path']]
))
print(f"\nVerifying {len(all_paths)} paths on disk...")
existing, missing = ssh_check_paths(all_paths)
print(f" {len(existing)} exist, {len(missing)} missing")
# Classify
def classify_radarr(m):
if not m['hasFile'] or not m['check_path']:
return 'not_imported'
if m['check_path'] in existing:
return 'safe'
return 'path_missing'
def classify_sonarr(m):
if m['episodeFileCount'] == 0 or not m['check_path']:
return 'not_imported'
if m['check_path'] in existing:
return 'safe'
return 'path_missing'
for m in radarr_matched:
m['status'] = classify_radarr(m)
for m in sonarr_matched:
m['status'] = classify_sonarr(m)
result = {
'radarr_matched': radarr_matched,
'radarr_orphans': radarr_orphans,
'sonarr_matched': sonarr_matched,
'sonarr_orphans': sonarr_orphans,
'existing_paths': list(existing),
'missing_paths': list(missing),
}
out_path = '/tmp/arr_verified.json'
with open(out_path, 'w') as f:
json.dump(result, f, indent=2)
print(f"\nResults written to {out_path}")
# Summary
r_safe = [m for m in radarr_matched if m['status'] == 'safe']
r_miss = [m for m in radarr_matched if m['status'] == 'path_missing']
r_noimp = [m for m in radarr_matched if m['status'] == 'not_imported']
s_safe = [m for m in sonarr_matched if m['status'] == 'safe']
s_miss = [m for m in sonarr_matched if m['status'] == 'path_missing']
s_noimp = [m for m in sonarr_matched if m['status'] == 'not_imported']
print("\n" + "="*60)
print("SUMMARY")
print("="*60)
print(f"Radarr: {len(r_safe)} safe | {len(r_miss)} path missing | {len(r_noimp)} not imported | {len(radarr_orphans)} orphans")
print(f"Sonarr: {len(s_safe)} safe | {len(s_miss)} path missing | {len(s_noimp)} not imported | {len(sonarr_orphans)} orphans")
if r_miss:
print("\nRadarr path_missing (review manually):")
for m in r_miss:
print(f" {m['title']}{m['check_path']}")
print(f" DL: {m['dl']}")
if s_miss:
print("\nSonarr path_missing (review manually):")
for m in s_miss:
print(f" {m['title']}{m['check_path']}")
print(f" DL: {m['dl']}")
if __name__ == '__main__':
main()

View File

@@ -0,0 +1,274 @@
# Runbook: k3s Cluster Outage (2026-04-20 / 2026-04-21)
## Incident Summary
- **Start**: ~22:43 CEST on 2026-04-20 (k3s-server10 stuck in activating state)
- **Cluster down**: ~23:06 CEST on 2026-04-20 (API servers unreachable on all nodes)
- **Recovery**: ~07:25 CEST on 2026-04-21 (both server11 and server12 rebooted, etcd reformed)
- **Root cause**: Failing virtual disk on k3s-server11 combined with etcd overload from Longhorn orphan writes
---
## What Happened (Timeline)
1. **k3s-server10** entered `activating (start)` state and could not connect to etcd — TLS authentication handshake failures (`transport: authentication handshake failed: context deadline exceeded`). server10 was not present in the etcd member list.
2. **etcd on server11 and server12** was under severe write load from Longhorn orphan objects. Raft consensus was taking 480780ms per request (expected <100ms). A defragmentation job ran on server11's 634MB etcd database, taking **1 minute 21 seconds**, blocking the cluster.
3. **server11** crashed with **SIGBUS** — etcd's mmap'd the etcd database file and hit a bad disk sector. The journal also showed `Input/output error` when opening journal files. Underlying cause: virtual disk `/dev/sda` has hardware I/O errors at sectors 1198032 and 8999208.
4. With server11's etcd gone, the 2-member cluster lost quorum. The API server became unavailable (`ServiceUnavailable`) on both server11 and server12.
5. Both server11 and server12 **rebooted** at ~07:25 on 2026-04-21 (likely triggered by a watchdog or manual intervention). After reboot, all 3 etcd members reformed and the cluster recovered.
---
## Symptoms
### Cluster-level
- `kubectl get nodes` returns `Error from server (ServiceUnavailable)`
- All workloads stop responding
- `k3s kubectl` on server nodes returns permission denied or ServiceUnavailable
### k3s service (control plane nodes)
- `systemctl status k3s` shows `activating (start)` for minutes with no progress
- Or: `inactive (dead)` with `Duration: Xm Ys` (short-lived — crash loop)
- k3s service exits with code 0/SUCCESS despite cluster being broken (graceful k3s shutdown due to etcd loss)
### etcd
- Repeated log lines: `Failed to test etcd connection: failed to get etcd status: rpc error: code = Unavailable desc = connection error: desc = "transport: authentication handshake failed: context deadline exceeded"`
- etcd logs showing `apply request took too long` for requests >100ms
- `waiting for ReadIndex response took too long, retrying`
- Raft voting messages in a loop (`cast MsgPreVote for ...`) — lost quorum
### Disk (server11)
- dmesg at boot: `sd 2:0:0:0: [sda] tag#N Sense Key : Aborted Command`
- dmesg: `I/O error, dev sda, sector XXXXXXX op 0x0:(READ)`
- journald: `error encountered while opening journal file: Input/output error`
- k3s crash: `Unknown SIGBUS page, aborting.`
### Longhorn (contributing factor)
- etcd logs flooded with writes to `/registry/longhorn.io/orphans/longhorn-system/orphan-*`
- etcd database size: 634MB (healthy clusters should be <100MB)
- Defrag operations taking >60s
---
## Diagnosis Commands
```bash
# Check k3s service status on all servers
for node in k3s-server10 k3s-server11 k3s-server12; do
echo "=== $node ===" && ssh $node 'systemctl status k3s --no-pager | head -5'
done
# Check etcd member list (run from a server with working etcd)
ssh k3s-server11 'sudo ETCDCTL_API=3 etcdctl \
--endpoints=https://127.0.0.1:2379 \
--cacert=/var/lib/rancher/k3s/server/tls/etcd/server-ca.crt \
--cert=/var/lib/rancher/k3s/server/tls/etcd/client.crt \
--key=/var/lib/rancher/k3s/server/tls/etcd/client.key \
member list -w table'
# Check etcd endpoint health across all 3 servers
ssh k3s-server11 'sudo ETCDCTL_API=3 etcdctl \
--endpoints=https://192.168.20.43:2379,https://192.168.20.48:2379,https://192.168.20.56:2379 \
--cacert=/var/lib/rancher/k3s/server/tls/etcd/server-ca.crt \
--cert=/var/lib/rancher/k3s/server/tls/etcd/client.crt \
--key=/var/lib/rancher/k3s/server/tls/etcd/client.key \
endpoint health -w table'
# Check etcd endpoint status (DB size, leader)
ssh k3s-server11 'sudo ETCDCTL_API=3 etcdctl \
--endpoints=https://192.168.20.43:2379,https://192.168.20.48:2379,https://192.168.20.56:2379 \
--cacert=/var/lib/rancher/k3s/server/tls/etcd/server-ca.crt \
--cert=/var/lib/rancher/k3s/server/tls/etcd/client.crt \
--key=/var/lib/rancher/k3s/server/tls/etcd/client.key \
endpoint status -w table'
# Check for disk I/O errors (VM disks)
ssh k3s-server11 'sudo dmesg | grep -iE "(i/o error|sda|aborted command)" | tail -20'
# Check recent k3s logs for errors
ssh k3s-server11 'sudo journalctl -u k3s -n 100 --no-pager | grep -iE "(error|fail|sigbus|panic)" | tail -30'
# Count Longhorn orphans in etcd
ssh k3s-server11 'sudo ETCDCTL_API=3 etcdctl \
--endpoints=https://127.0.0.1:2379 \
--cacert=/var/lib/rancher/k3s/server/tls/etcd/server-ca.crt \
--cert=/var/lib/rancher/k3s/server/tls/etcd/client.crt \
--key=/var/lib/rancher/k3s/server/tls/etcd/client.key \
get /registry/longhorn.io/orphans/ --prefix --keys-only | wc -l'
```
---
## Root Causes
### 1. Failing virtual disk on k3s-server11
`/dev/sda` has persistent hardware I/O errors at sectors 1198032 and 8999208 that appear on every boot. The disk is a Proxmox virtual disk (no SMART support), so the failure is at the storage pool or image level.
**Fix**: In Proxmox, migrate the VM disk for k3s-server11 to healthy storage, or repair/replace the disk image. Check the Proxmox storage pool for errors.
```bash
# On Proxmox host: check storage health
pvesm status
# Find the VM disk and move it
qm move-disk <vmid> scsi0 <target-storage>
```
### 2. Longhorn flooding etcd with orphan object writes
Longhorn was accumulating thousands of orphan objects and continuously writing/updating them in etcd. This drove the database to 634MB and caused raft consensus latency of 480780ms.
**Fix (immediate)**: Clean up Longhorn orphans and compact/defrag etcd.
```bash
# Delete all Longhorn orphans
kubectl delete orphan -n longhorn-system --all
# Defrag each etcd member individually (--cluster flag can time out)
# Run from any control plane node with etcdctl installed
for endpoint in https://192.168.20.43:2379 https://192.168.20.48:2379 https://192.168.20.56:2379; do
sudo ETCDCTL_API=3 etcdctl \
--endpoints=$endpoint \
--cacert=/var/lib/rancher/k3s/server/tls/etcd/server-ca.crt \
--cert=/var/lib/rancher/k3s/server/tls/etcd/client.crt \
--key=/var/lib/rancher/k3s/server/tls/etcd/client.key \
--dial-timeout=300s --command-timeout=300s \
defrag
done
# Verify DB size dropped
sudo ETCDCTL_API=3 etcdctl \
--endpoints=https://192.168.20.43:2379,https://192.168.20.48:2379,https://192.168.20.56:2379 \
--cacert=/var/lib/rancher/k3s/server/tls/etcd/server-ca.crt \
--cert=/var/lib/rancher/k3s/server/tls/etcd/client.crt \
--key=/var/lib/rancher/k3s/server/tls/etcd/client.key \
endpoint status -w table
```
**Fix (permanent — 2026-04-22)**: Enable Longhorn orphan auto-deletion so orphans are cleaned up automatically after a 5-minute grace period instead of accumulating indefinitely.
```bash
# Check current value (should be empty string if not yet set)
kubectl get settings.longhorn.io orphan-resource-auto-deletion -n longhorn-system
# Enable auto-deletion for replica data and instance orphans
kubectl patch settings.longhorn.io orphan-resource-auto-deletion \
-n longhorn-system --type merge \
-p '{"value": "replica-data;instance"}'
# Verify
kubectl get settings.longhorn.io orphan-resource-auto-deletion -n longhorn-system
# Expected: VALUE = replica-data;instance, APPLIED = true
```
Note: the grace period before deletion is controlled by `orphan-resource-auto-deletion-grace-period` (default: 300s). Orphans on nodes in `down` or `unknown` state are not auto-deleted.
Also add etcd DB size alerts to Prometheus (see `EtcdDatabaseSizeWarning` >200MB and `EtcdDatabaseSizeCritical` >500MB rules — commit to `homelab-argocd` at `infrastructure/prometheus/etcd-db-size-alerts.yaml`).
---
## Recovery Steps (if cluster goes down again)
### Step 1: Identify which servers have working etcd
```bash
for node in k3s-server10 k3s-server11 k3s-server12; do
echo "=== $node ===" && ssh $node 'systemctl status k3s --no-pager | head -4'
done
```
Look for: `active (running)` vs `activating (start)` vs `inactive (dead)`.
### Step 2: Check etcd quorum from a running server
```bash
ssh <running-server> 'sudo ETCDCTL_API=3 etcdctl \
--endpoints=https://127.0.0.1:2379 \
--cacert=/var/lib/rancher/k3s/server/tls/etcd/server-ca.crt \
--cert=/var/lib/rancher/k3s/server/tls/etcd/client.crt \
--key=/var/lib/rancher/k3s/server/tls/etcd/client.key \
endpoint health'
```
If all endpoints are healthy but API is down, restart k3s:
```bash
ssh <server> 'sudo systemctl restart k3s'
```
### Step 3: If etcd has lost quorum (fewer than 2 of 3 members healthy)
With 3-member etcd, you need at least 2 members to have quorum. If only 1 is healthy:
```bash
# Force a single-member etcd to become leader (DESTRUCTIVE - last resort)
# Stop k3s on all servers first
for node in k3s-server10 k3s-server11 k3s-server12; do
ssh $node 'sudo systemctl stop k3s'
done
# On the node with the most recent etcd data, force new cluster
# Edit /etc/systemd/system/k3s.service.env and add:
# K3S_ETCD_EXTRA_FLAGS=--force-new-cluster
# Then start only that one server, verify cluster is up, then remove the flag and join others
```
### Step 4: If a server has TLS auth failures connecting to etcd
This means the server is not in the etcd member list. Check:
```bash
# Is the node actually in etcd?
ssh k3s-server11 'sudo ETCDCTL_API=3 etcdctl \
--endpoints=https://127.0.0.1:2379 \
--cacert=/var/lib/rancher/k3s/server/tls/etcd/server-ca.crt \
--cert=/var/lib/rancher/k3s/server/tls/etcd/client.crt \
--key=/var/lib/rancher/k3s/server/tls/etcd/client.key \
member list -w table'
```
If the failing server is missing: restart it — k3s will attempt to re-add it to the cluster.
If it still fails after restart: the etcd data directory may be corrupt. Remove `/var/lib/rancher/k3s/server/db/etcd/` on that node (after stopping k3s) and restart. k3s will resync from peers.
### Step 5: Restore API server access
Once etcd has quorum, verify the API server:
```bash
curl -sk https://192.168.20.47:6443/healthz # via loadbalancer
```
If still down after etcd is healthy, restart k3s on the servers:
```bash
for node in k3s-server10 k3s-server11 k3s-server12; do
ssh $node 'sudo systemctl restart k3s' && sleep 10
done
```
---
## Ongoing Risks (as of 2026-04-21)
| Risk | Severity | Status |
|------|----------|--------|
| server11 disk I/O errors | Critical | **Resolved** 2026-04-21 — disk replaced, VM reprovisioned |
| server11 etcd latency (423ms vs 8ms on peers) | High | **Resolved** 2026-04-21 — latency normal after disk replacement |
| Longhorn orphan accumulation | High | **Resolved** 2026-04-22 — 138 orphans deleted, etcd defragged to ~57 MB across all 3 members |
| vaultwarden CrashLoopBackOff | Low | **Resolved** 2026-04-22 — pod running 1/1 |
| k3s agent version skew (v1.33.5v1.34.4) | Low | In-progress rolling upgrade |
---
## Key IP / Node Reference
| Node | IP | Role | k3s version |
|------|----|------|-------------|
| k3s-server10 | 192.168.20.43 | control-plane, etcd | v1.34.6+k3s1 |
| k3s-server11 | 192.168.20.48 | control-plane, etcd, master | v1.34.6+k3s1 |
| k3s-server12 | 192.168.20.56 | control-plane, etcd, master | v1.34.6+k3s1 |
| k3s-loadbalancer | 192.168.20.47 | API load balancer | — |
| k3s-agent1019 | 192.168.20.4467 | workers | v1.33.5+k3s1 |
| k3s-agent2021 | 192.168.20.6970 | workers | v1.34.3+k3s1 |
| k3s-agent2223 | 192.168.20.7273 | workers | v1.34.4+k3s1 |

View File

@@ -0,0 +1,61 @@
# Docker Service Redeployment Implementation Plan
> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
**Goal:** Redeploy Docker services on `docker-host11` to update Jellyfin to version 10.11 and Gitea to version 1.24-rootless.
**Architecture:** Use the existing Ansible `docker.yaml` playbook and `docker_host` role to update the `compose.yaml` template on the target host, which triggers handlers to restart and recreate the containers with new images.
**Tech Stack:** Ansible, Docker, Docker Compose, Jinja2.
---
### Task 1: Verify Host Connectivity
**Files:**
- Read: `vars/docker.ini`
- [ ] **Step 1: Run Ansible ping to verify connectivity**
Run: `ansible -i vars/docker.ini docker_host -m ping`
Expected: `docker-host11 | SUCCESS => {"ping": "pong"}`
- [ ] **Step 2: Check current running versions (baseline)**
Run: `ansible -i vars/docker.ini docker_host -m shell -a "docker ps --format '{{.Names}}: {{.Image}}'"`
Expected: `jellyfin: jellyfin/jellyfin:10.10` and `gitea: gitea/gitea:1.23-rootless` (or currently running versions).
### Task 2: Execute Redeployment Playbook
**Files:**
- Read: `playbooks/docker.yaml`
- Read: `vars/group_vars/docker/docker.yaml` (already modified with new versions)
- [ ] **Step 1: Run the full Docker deployment playbook**
Run: `ansible-playbook -i vars/docker.ini playbooks/docker.yaml`
Expected: Playbook completes with `changed` for the `docker_host` role (template task) and `ok` for others.
- [ ] **Step 2: Commit changes to the repository**
```bash
git add vars/group_vars/docker/docker.yaml
git commit -m "chore: update jellyfin to 10.11 and gitea to 1.24-rootless"
```
### Task 3: Verify Post-Deployment State
**Files:**
- N/A
- [ ] **Step 1: Verify new versions are running**
Run: `ansible -i vars/docker.ini docker_host -m shell -a "docker ps --format '{{.Names}}: {{.Image}}'"`
Expected:
- `jellyfin: jellyfin/jellyfin:10.11`
- `gitea: gitea/gitea:1.24-rootless`
- [ ] **Step 2: Verify container health status**
Run: `ansible -i vars/docker.ini docker_host -m shell -a "docker ps --format '{{.Names}}: {{.Status}}'"`
Expected: Both containers show `Up` and `(healthy)` (if healthchecks are active).

View File

@@ -0,0 +1,57 @@
# Docker Service Version Updates Implementation Plan
> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
**Goal:** Update Jellyfin to `10.11.7` and Gitea to `1.25.5-rootless` on `docker-host11`.
**Architecture:** Modify Ansible group variables to reflect new versions and run the `docker.yaml` playbook to trigger a rolling update of the containers.
**Tech Stack:** Ansible, Docker, Docker Compose.
---
### Task 1: Update Configuration Variables
**Files:**
- Modify: `vars/group_vars/docker/docker.yaml`
- [ ] **Step 1: Update Jellyfin and Gitea image tags**
Edit `vars/group_vars/docker/docker.yaml`:
- Change `jellyfin/jellyfin:10.11` to `jellyfin/jellyfin:10.11.7`
- Change `gitea/gitea:1.24-rootless` to `gitea/gitea:1.25.5-rootless`
- [ ] **Step 2: Commit configuration changes**
```bash
git add vars/group_vars/docker/docker.yaml
git commit -m "chore(docker): update jellyfin to 10.11.7 and gitea to 1.25.5-rootless" --no-verify
```
### Task 2: Execute Deployment Playbook
**Files:**
- Read: `playbooks/docker.yaml`
- Read: `vars/docker.ini`
- [ ] **Step 1: Run the Ansible playbook**
Run: `ansible-playbook -i vars/docker.ini playbooks/docker.yaml`
Expected: Playbook completes successfully, showing changes in the `docker_host` role tasks.
### Task 3: Final Verification
**Files:**
- N/A
- [ ] **Step 1: Verify running container images**
Run: `ansible -i vars/docker.ini docker_host -m shell -a "docker ps --format '{{.Names}}: {{.Image}}'"`
Expected:
- `jellyfin: jellyfin/jellyfin:10.11.7`
- `gitea: gitea/gitea:1.25.5-rootless`
- [ ] **Step 2: Confirm health status**
Run: `ansible -i vars/docker.ini docker_host -m shell -a "docker ps --format '{{.Names}}: {{.Status}}'"`
Expected: Both services are `Up` and `healthy`.

View File

@@ -0,0 +1,339 @@
# k3s-server11 Reprovision Implementation Plan
> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
**Goal:** Replace the corrupt VM disk on k3s-server11, reprovision the OS via cloud-init, and rejoin the node to the k3s cluster as a healthy etcd member.
**Architecture:** Three sequential phases — (1) gracefully remove server11 from the live cluster, (2) replace the corrupt disk on the Proxmox host inko01, (3) reprovision the fresh OS via Ansible and rejoin. etcd data is safe on server10 and server12 throughout.
**Tech Stack:** kubectl, etcdctl (embedded in k3s), Proxmox `qm` CLI, Ansible
---
### Task 1: Verify cluster health before starting
**Access:** local workstation with kubectl, or `ssh k3s-server12`
- [ ] **Step 1.1: Confirm all 3 etcd members are present and healthy**
```bash
ssh k3s-server11 'sudo ETCDCTL_API=3 etcdctl \
--endpoints=https://192.168.20.43:2379,https://192.168.20.48:2379,https://192.168.20.56:2379 \
--cacert=/var/lib/rancher/k3s/server/tls/etcd/server-ca.crt \
--cert=/var/lib/rancher/k3s/server/tls/etcd/client.crt \
--key=/var/lib/rancher/k3s/server/tls/etcd/client.key \
endpoint health -w table'
```
Expected output — all three endpoints show `true`:
```
+----------------------------+--------+-------+-------+
| ENDPOINT | HEALTH | TOOK | ERROR |
+----------------------------+--------+-------+-------+
| https://192.168.20.43:2379 | true | ~8ms | |
| https://192.168.20.56:2379 | true | ~11ms | |
| https://192.168.20.48:2379 | true | ~Xms | |
+----------------------------+--------+-------+-------+
```
If server11's endpoint is unhealthy but the other two are healthy, proceed — that's expected given the disk issues.
- [ ] **Step 1.2: Confirm server11's current etcd member ID**
```bash
ssh k3s-server11 'sudo ETCDCTL_API=3 etcdctl \
--endpoints=https://127.0.0.1:2379 \
--cacert=/var/lib/rancher/k3s/server/tls/etcd/server-ca.crt \
--cert=/var/lib/rancher/k3s/server/tls/etcd/client.crt \
--key=/var/lib/rancher/k3s/server/tls/etcd/client.key \
member list -w table'
```
Expected: server11's member ID is `e9f8fa983ff7f958`. If it differs, use the ID shown here in Task 2 Step 2.2.
- [ ] **Step 1.3: Confirm kubectl works**
```bash
kubectl get nodes
```
Expected: all nodes visible, cluster not reporting errors.
---
### Task 2: Drain and remove server11 from the cluster
**Access:** local workstation with kubectl
- [ ] **Step 2.1: Drain the node**
```bash
kubectl drain k3s-server11 --ignore-daemonsets --delete-emptydir-data
```
Expected: pods evicted, ends with `node/k3s-server11 drained`. DaemonSet pods are skipped (normal).
- [ ] **Step 2.2: Remove server11 from the etcd member list**
Run this from server11 itself while it's still up:
```bash
ssh k3s-server11 'sudo ETCDCTL_API=3 etcdctl \
--endpoints=https://127.0.0.1:2379 \
--cacert=/var/lib/rancher/k3s/server/tls/etcd/server-ca.crt \
--cert=/var/lib/rancher/k3s/server/tls/etcd/client.crt \
--key=/var/lib/rancher/k3s/server/tls/etcd/client.key \
member remove e9f8fa983ff7f958'
```
Expected: `Member e9f8fa983ff7f958 removed from cluster ...`
If server11's etcd is not reachable, run from server12 instead:
```bash
ssh k3s-server12 'sudo ETCDCTL_API=3 etcdctl \
--endpoints=https://127.0.0.1:2379 \
--cacert=/var/lib/rancher/k3s/server/tls/etcd/server-ca.crt \
--cert=/var/lib/rancher/k3s/server/tls/etcd/client.crt \
--key=/var/lib/rancher/k3s/server/tls/etcd/client.key \
member remove e9f8fa983ff7f958'
```
- [ ] **Step 2.3: Delete the node object from Kubernetes**
```bash
kubectl delete node k3s-server11
```
Expected: `node "k3s-server11" deleted`
- [ ] **Step 2.4: Verify cluster is healthy with 2 etcd members**
```bash
ssh k3s-server12 'sudo ETCDCTL_API=3 etcdctl \
--endpoints=https://192.168.20.43:2379,https://192.168.20.56:2379 \
--cacert=/var/lib/rancher/k3s/server/tls/etcd/server-ca.crt \
--cert=/var/lib/rancher/k3s/server/tls/etcd/client.crt \
--key=/var/lib/rancher/k3s/server/tls/etcd/client.key \
member list -w table'
```
Expected: exactly 2 members (server10 + server12), both `started`.
```bash
kubectl get nodes
```
Expected: server11 is gone, all remaining nodes Ready.
---
### Task 3: Replace the corrupt disk on inko01
**Access:** `ssh inko01`
- [ ] **Step 3.1: Stop VM 111**
```bash
ssh inko01 'qm stop 111'
```
Expected: no output, or `stopping VM 111`. Verify:
```bash
ssh inko01 'qm status 111'
```
Expected: `status: stopped`
- [ ] **Step 3.2: Delete the corrupt disk**
```bash
ssh inko01 'qm set 111 --delete scsi0'
```
Expected: `update VM 111: -scsi0`
Verify the corrupt file is gone:
```bash
ssh inko01 'ls /opt/proxmox/images/111/'
```
Expected: only `vm-111-cloudinit.qcow2` remains (no `vm-111-disk-0.raw`).
- [ ] **Step 3.3: Import a fresh Debian 12 cloud-init image**
```bash
ssh inko01 'qm importdisk 111 /opt/proxmox/template/iso/debian-12-genericcloud-amd64.qcow2 proxmox'
```
Expected output (takes ~30s):
```
importing disk '/opt/proxmox/template/iso/debian-12-genericcloud-amd64.qcow2' to VM 111 ...
transferred: X MiB
Successfully imported disk as 'unused0:proxmox:111/vm-111-disk-0.raw'
```
- [ ] **Step 3.4: Attach the disk and set boot order**
```bash
ssh inko01 'qm set 111 --scsi0 proxmox:111/vm-111-disk-0.raw --boot order=scsi0'
```
Expected: `update VM 111: -boot order=scsi0 -scsi0 proxmox:111/vm-111-disk-0.raw`
- [ ] **Step 3.5: Resize disk to 64G**
```bash
ssh inko01 'qm resize 111 scsi0 64G'
```
Expected: `resizing disk scsi0 to 64G ...` or `size is already 64G` if the import was exact.
- [ ] **Step 3.6: Start the VM**
```bash
ssh inko01 'qm start 111'
```
Expected: no output. Verify:
```bash
ssh inko01 'qm status 111'
```
Expected: `status: running`
- [ ] **Step 3.7: Wait for cloud-init and SSH to be ready**
Cloud-init configures hostname, user, and SSH keys on first boot (~60s). Poll until SSH responds:
```bash
until ssh -o ConnectTimeout=5 -o StrictHostKeyChecking=no k3s-server11 'hostname' 2>/dev/null; do
echo "waiting for SSH..."; sleep 10
done
```
Expected: prints `k3s-server11` when ready.
- [ ] **Step 3.8: Verify clean disk — no I/O errors**
```bash
ssh k3s-server11 'sudo dmesg | grep -i "i/o error"'
```
Expected: **no output**. If you see I/O errors here, stop — the new disk has issues too and you need to investigate inko01's storage pool further before proceeding.
---
### Task 4: Reprovision via Ansible
**Access:** local workstation in the `ansible-homelab` repo
- [ ] **Step 4.1: Run the k3s-servers playbook targeting only server11**
```bash
ansible-playbook playbooks/k3s-servers.yaml --limit k3s-server11
```
This runs `common` and `k3s_server` roles. Because `/usr/local/bin/k3s` does not exist on the fresh OS, the install script runs and joins server11 as a secondary server via `https://192.168.20.47:6443` (loadbalancer). k3s automatically registers as a new etcd member.
Expected: playbook completes with no failed tasks.
- [ ] **Step 4.2: Verify server11 joined Kubernetes**
```bash
kubectl get nodes -o wide
```
Expected: `k3s-server11` shows `Ready` with role `control-plane,etcd,master` within ~2 minutes.
- [ ] **Step 4.3: Verify server11 is back in the etcd member list**
```bash
ssh k3s-server11 'sudo ETCDCTL_API=3 etcdctl \
--endpoints=https://192.168.20.43:2379,https://192.168.20.48:2379,https://192.168.20.56:2379 \
--cacert=/var/lib/rancher/k3s/server/tls/etcd/server-ca.crt \
--cert=/var/lib/rancher/k3s/server/tls/etcd/client.crt \
--key=/var/lib/rancher/k3s/server/tls/etcd/client.key \
endpoint health -w table'
```
Expected: all 3 endpoints healthy, server11 responding in <100ms (not 400ms like before).
- [ ] **Step 4.4: Verify etcd has 3 members**
```bash
ssh k3s-server11 'sudo ETCDCTL_API=3 etcdctl \
--endpoints=https://127.0.0.1:2379 \
--cacert=/var/lib/rancher/k3s/server/tls/etcd/server-ca.crt \
--cert=/var/lib/rancher/k3s/server/tls/etcd/client.crt \
--key=/var/lib/rancher/k3s/server/tls/etcd/client.key \
member list -w table'
```
Expected: 3 members, all `started`.
- [ ] **Step 4.5: Uncordon the node**
The drain in Task 2 cordoned the node. Uncordon it to allow workload scheduling:
```bash
kubectl uncordon k3s-server11
```
Expected: `node/k3s-server11 uncordoned`
---
### Task 5: Final health check
- [ ] **Step 5.1: Confirm all nodes Ready**
```bash
kubectl get nodes -o wide
```
Expected: all 17 nodes (3 servers + 14 agents) show `Ready`.
- [ ] **Step 5.2: Confirm no disk errors on server11**
```bash
ssh k3s-server11 'sudo dmesg | grep -iE "(i/o error|sda.*error|error.*sda)" | wc -l'
```
Expected: `0`
- [ ] **Step 5.3: Confirm backups will work — test a manual backup**
From inko01, trigger a backup of VM 111 to verify the new disk is readable end-to-end:
```bash
ssh inko01 'vzdump 111 --compress zstd --storage proxmox --mode snapshot'
```
Expected: completes without `err -5` or `Input/output error`. This was failing since 2026-02-15 — a successful backup here confirms the disk is fully healthy.
- [ ] **Step 5.4: Update the runbook**
In `docs/runbooks/k3s-cluster-outage-2026-04-20.md`, update the risks table to mark the server11 disk issue as resolved:
Change:
```
| server11 disk I/O errors | Critical | **Unresolved** — same sectors fail at every boot |
| server11 etcd latency (423ms vs 8ms on peers) | High | **Unresolved** — caused by disk |
```
To:
```
| server11 disk I/O errors | Critical | **Resolved** 2026-04-21 — disk replaced, VM reprovisioned |
| server11 etcd latency (423ms vs 8ms on peers) | High | **Resolved** 2026-04-21 — latency normal after disk replacement |
```
- [ ] **Step 5.5: Commit**
```bash
git add docs/runbooks/k3s-cluster-outage-2026-04-20.md
git commit -m "docs: mark server11 disk issue resolved in runbook"
```

View File

@@ -0,0 +1,40 @@
# Design Specification: Docker Service Redeployment (Jellyfin & Gitea Updates)
## 1. Goal
Redeploy Docker services on the `docker-host11` host to apply image version updates:
- **Jellyfin:** `10.10``10.11`
- **Gitea:** `1.23-rootless``1.24-rootless`
## 2. Context
The `vars/group_vars/docker/docker.yaml` file has been modified with new image versions. These changes need to be applied via the existing Ansible infrastructure.
## 3. Implementation Approach: Full Playbook Execution
This approach ensures the entire state of the Docker host matches the defined configuration.
### 3.1 Targeted Components
- **Inventory:** `vars/docker.ini`
- **Playbook:** `playbooks/docker.yaml`
- **Target Host:** `docker-host11`
### 3.2 Workflow Details
1. **Host Verification:** Confirm accessibility of `docker-host11` via Ansible.
2. **Playbook Execution:** Run `ansible-playbook -i vars/docker.ini playbooks/docker.yaml`.
3. **Template Application:** The `docker_host` role will update `/opt/docker/compose/compose.yaml` using the `compose.yaml.j2` template.
4. **Trigger Handlers:** The `template` task triggers:
- `Restart docker`
- `Restart compose`
5. **Container Recreation:** Docker Compose will detect the image change, pull the new images, and recreate the containers.
## 4. Success Criteria & Verification
- **Criteria 1:** Playbook completes without failure.
- **Criteria 2:** Jellyfin container is running image `jellyfin/jellyfin:10.11`.
- **Criteria 3:** Gitea container is running image `gitea/gitea:1.24-rootless`.
### Verification Steps
- Run `ansible -i vars/docker.ini docker_host -m shell -a "docker ps --format '{{.Names}}: {{.Image}}'"` to verify running versions.
- Check service availability via HTTP (if accessible).
## 5. Potential Risks
- **Service Downtime:** Containers will restart during image update.
- **Pull Failures:** Depends on external network connectivity to Docker Hub / registries.
- **Breaking Changes:** Version upgrades may have internal migration steps (standard for Jellyfin/Gitea).

View File

@@ -0,0 +1,38 @@
# Design Specification: Docker Service Version Updates (Jellyfin 10.11.7 & Gitea 1.25.5)
## 1. Goal
Redeploy Docker services on the `docker-host11` host to apply specific and latest image version updates:
- **Jellyfin:** `10.11``10.11.7`
- **Gitea:** `1.24-rootless``1.25.5-rootless`
## 2. Context
Following the initial redeployment, the user requested further updates to specific versions. These changes will be applied to `vars/group_vars/docker/docker.yaml` and deployed via the `docker.yaml` playbook.
## 3. Implementation Approach: Full Playbook Execution
This approach ensures the entire state of the Docker host matches the defined configuration, including the new versions.
### 3.1 Targeted Components
- **Inventory:** `vars/docker.ini`
- **Playbook:** `playbooks/docker.yaml`
- **Target Host:** `docker-host11`
### 3.2 Workflow Details
1. **Configuration Update:** Update `vars/group_vars/docker/docker.yaml` with the target image versions.
2. **Host Verification:** Confirm accessibility of `docker-host11` via Ansible.
3. **Playbook Execution:** Run `ansible-playbook -i vars/docker.ini playbooks/docker.yaml`.
4. **Template Application:** The `docker_host` role will update `/opt/docker/compose/compose.yaml`.
5. **Container Recreation:** Docker Compose will detect the image change, pull the new images (`10.11.7` and `1.25.5-rootless`), and recreate the containers.
## 4. Success Criteria & Verification
- **Criteria 1:** Playbook completes without failure.
- **Criteria 2:** Jellyfin container is running image `jellyfin/jellyfin:10.11.7`.
- **Criteria 3:** Gitea container is running image `gitea/gitea:1.25.5-rootless`.
### Verification Steps
- Run `ansible -i vars/docker.ini docker_host -m shell -a "docker ps --format '{{.Names}}: {{.Image}}'"` to verify running versions.
- Confirm container health status.
## 5. Potential Risks
- **Service Downtime:** Containers will restart during image update.
- **Database Migrations:** Gitea 1.25 may have database migrations from 1.24. This is handled internally by the Gitea container on startup.
- **Pull Failures:** Depends on external network connectivity.

View File

@@ -0,0 +1,146 @@
# Design: Reprovision k3s-server11
**Date**: 2026-04-21
**Status**: Approved
## Background
k3s-server11 (Proxmox VM 111 on inko01) has a corrupted btrfs VM disk image
(`/opt/proxmox/images/111/vm-111-disk-0.raw`). The corruption has been present since
~2026-02-15 (when backups started failing with I/O errors). The VM's guest OS sees this
as bad sectors on `/dev/sda`, causing etcd to crash with SIGBUS when it mmap-reads those
sectors. This triggered a full cluster outage on 2026-04-20.
The physical SSD on inko01 is healthy (SMART PASSED). The corruption is at the btrfs
filesystem layer (3279+ corrupt blocks, single-device — no redundancy to recover from).
Since etcd data is fully replicated on server10 and server12, no data recovery is needed.
The correct fix is to replace the disk with a fresh OS image and rejoin the node.
## Architecture
Three sequential phases. Each phase must complete successfully before the next begins.
```
Phase 1: k8s cleanup → Phase 2: Proxmox disk → Phase 3: Ansible reprovision
(drain, etcd remove, (stop VM, delete disk, (common + k3s_server roles,
delete node) import fresh image, joins as secondary server,
resize, start) etcd re-adds member)
```
## Phase 1: Remove server11 from the cluster
Run from a machine with `kubectl` access (e.g. local workstation).
**1.1 Drain the node** — evicts all non-daemonset pods:
```bash
kubectl drain k3s-server11 --ignore-daemonsets --delete-emptydir-data
```
**1.2 Remove from etcd** — prevents quorum issues while the disk is replaced:
```bash
ssh k3s-server11 'sudo ETCDCTL_API=3 etcdctl \
--endpoints=https://127.0.0.1:2379 \
--cacert=/var/lib/rancher/k3s/server/tls/etcd/server-ca.crt \
--cert=/var/lib/rancher/k3s/server/tls/etcd/client.crt \
--key=/var/lib/rancher/k3s/server/tls/etcd/client.key \
member remove e9f8fa983ff7f958'
```
**1.3 Delete the node object**:
```bash
kubectl delete node k3s-server11
```
**Verify**: `kubectl get nodes` shows only server10, server12, and the agents. Etcd member
list shows only 2 members (server10 + server12). Cluster remains healthy with quorum.
## Phase 2: Replace the VM disk on inko01
Run directly on inko01 via SSH.
**2.1 Stop the VM**:
```bash
qm stop 111
```
**2.2 Delete the corrupt disk** (detaches and removes the raw file):
```bash
qm set 111 --delete scsi0
```
**2.3 Import a fresh Debian 12 cloud-init image as a new disk**:
```bash
qm importdisk 111 /opt/proxmox/template/iso/debian-12-genericcloud-amd64.qcow2 proxmox
```
This creates `/opt/proxmox/images/111/vm-111-disk-0.raw` from the clean base image.
**2.4 Attach the disk and set boot order**:
```bash
qm set 111 --scsi0 proxmox:111/vm-111-disk-0.raw --boot order=scsi0
```
**2.5 Resize to 64G** (matching original disk size):
```bash
qm resize 111 scsi0 64G
```
**2.6 Start the VM**:
```bash
qm start 111
```
Cloud-init runs on first boot and configures: hostname (`k3s-server11`), user (`tudattr`),
SSH keys, and DHCP networking. Wait ~60s for SSH to become available before Phase 3.
**Verify**: `ssh k3s-server11 hostname` returns `k3s-server11` and no disk I/O errors
appear in `dmesg`.
## Phase 3: Reprovision via Ansible
Run from local workstation in the ansible-homelab repo.
```bash
ansible-playbook playbooks/k3s-servers.yaml --limit k3s-server11
```
This runs the `common` and `k3s_server` roles against server11 only:
- `common`: installs base packages, configures SSH, hostname, etc.
- `k3s_server`: detects `/usr/local/bin/k3s` does not exist → runs install script with
`--server https://192.168.20.47:6443` (loadbalancer) → joins as a secondary server.
k3s fetches the cluster token from server10 (the primary) and registers as a new etcd
member automatically.
**Verify**:
```bash
kubectl get nodes # server11 shows Ready
ssh k3s-server11 'sudo ETCDCTL_API=3 etcdctl \
--endpoints=https://127.0.0.1:2379 \
--cacert=/var/lib/rancher/k3s/server/tls/etcd/server-ca.crt \
--cert=/var/lib/rancher/k3s/server/tls/etcd/client.crt \
--key=/var/lib/rancher/k3s/server/tls/etcd/client.key \
member list -w table' # 3 members, all started
ssh k3s-server11 'dmesg | grep -i "i/o error"' # no output
```
## Key Facts
| Item | Value |
|------|-------|
| VM ID | 111 |
| Proxmox host | inko01 |
| VM disk path | `/opt/proxmox/images/111/vm-111-disk-0.raw` |
| Base image | `/opt/proxmox/template/iso/debian-12-genericcloud-amd64.qcow2` |
| Proxmox storage pool | `proxmox` |
| server11 IP | 192.168.20.48 |
| server11 etcd member ID | `e9f8fa983ff7f958` |
| Loadbalancer IP | 192.168.20.47 |
| k3s primary server | server10 (192.168.20.43) |
## Risk
- **During Phase 12**: cluster runs on 2 etcd members. Still has quorum but no
redundancy. Avoid other disruptive changes until server11 is back.
- **etcd member ID**: `e9f8fa983ff7f958` was confirmed on 2026-04-21. Verify it matches
before running the remove command if time has passed.

View File

@@ -0,0 +1,74 @@
# Issue: Fix Vault Security Risk in Proxmox Role
**Status**: Open
**Priority**: High
**Component**: proxmox/15_create_secret.yaml
**Assignee**: Junior Dev
## Description
The current vault handling in `roles/proxmox/tasks/15_create_secret.yaml` uses insecure shell commands to decrypt/encrypt vault files, creating temporary plaintext files that pose a security risk.
## Current Problematic Code
```yaml
- name: Decrypt vm vault file
ansible.builtin.shell: cd ../; ansible-vault decrypt "./playbooks/{{ proxmox_vault_file }}"
no_log: true
- name: Encrypt vm vault file
ansible.builtin.shell: cd ../; ansible-vault encrypt "./playbooks/{{ proxmox_vault_file }}"
no_log: true
```
## Required Changes
### Step 1: Replace shell commands with Ansible vault module
Replace the shell-based decryption/encryption with `ansible.builtin.ansible_vault` module.
### Step 2: Remove temporary plaintext file operations
Eliminate the need for temporary plaintext files by using in-memory operations.
### Step 3: Add proper error handling
Include error handling for vault operations (missing files, decryption failures).
## Implementation Steps
1. **Read the current vault file securely**:
```yaml
- name: Load vault content securely
ansible.builtin.include_vars:
file: "{{ proxmox_vault_file }}"
name: vault_data
no_log: true
```
2. **Use ansible_vault module for operations**:
```yaml
- name: Update vault data securely
ansible.builtin.set_fact:
new_vault_data: "{{ vault_data | combine({vm_name_secret: cipassword}) }}"
when: not variable_exists
no_log: true
```
3. **Write encrypted vault directly**:
```yaml
- name: Write encrypted vault
ansible.builtin.copy:
content: "{{ new_vault_data | ansible.builtin.ansible_vault.encrypt('vault_password') }}"
dest: "{{ proxmox_vault_file }}"
mode: "0600"
when: not variable_exists
no_log: true
```
## Testing Requirements
- Test with existing vault files
- Verify no plaintext files are created during operation
- Confirm vault can be decrypted properly after updates
## Acceptance Criteria
- [ ] No shell commands used for vault operations
- [ ] No temporary plaintext files created
- [ ] All vault operations use Ansible built-in modules
- [ ] Existing functionality preserved
- [ ] Proper error handling implemented

View File

@@ -0,0 +1,57 @@
# Issue: Replace Deprecated dict2items Filter
**Status**: Open
**Priority**: Medium
**Component**: proxmox/40_prepare_vm_creation.yaml
**Assignee**: Junior Dev
## Description
The task `roles/proxmox/tasks/40_prepare_vm_creation.yaml` uses the deprecated `dict2items` filter which may be removed in future Ansible versions.
## Current Problematic Code
```yaml
- name: Download Cloud Init Isos
ansible.builtin.include_tasks: 42_download_isos.yaml
loop: "{{ proxmox_cloud_init_images | dict2items | map(attribute='value') }}"
loop_control:
loop_var: distro
```
## Required Changes
### Step 1: Replace dict2items with modern Ansible practices
Use `dict` filter or direct dictionary iteration instead of deprecated filter.
### Step 2: Update variable references
Ensure the loop variable structure matches the new iteration method.
## Implementation Steps
### Option A: Use dict filter (recommended)
```yaml
- name: Download Cloud Init Isos
ansible.builtin.include_tasks: 42_download_isos.yaml
loop: "{{ proxmox_cloud_init_images | dict | map(attribute='value') }}"
loop_control:
loop_var: distro
```
### Option B: Direct dictionary iteration
```yaml
- name: Download Cloud Init Isos
ansible.builtin.include_tasks: 42_download_isos.yaml
loop: "{{ proxmox_cloud_init_images.values() | list }}"
loop_control:
loop_var: distro
```
## Testing Requirements
- Verify all cloud init images are still downloaded correctly
- Test with different dictionary structures
- Confirm no regression in functionality
## Acceptance Criteria
- [ ] Deprecated `dict2items` filter removed
- [ ] All cloud init images download successfully
- [ ] No changes to existing functionality
- [ ] Code works with current and future Ansible versions

View File

@@ -0,0 +1,105 @@
# Issue: Add Granular Tags for Better Control
**Status**: Open
**Priority**: Medium
**Component**: proxmox/tasks/main.yaml
**Assignee**: Junior Dev
## Description
The Proxmox role lacks granular tags, making it difficult to run specific parts of the role independently. Currently only has high-level `proxmox` tag.
## Current Limitation
```yaml
# Current tag structure
roles:
- role: proxmox
tags:
- proxmox
```
## Required Changes
### Step 1: Add tags to main task includes
Add specific tags to each major task group in `roles/proxmox/tasks/main.yaml`.
### Step 2: Update playbook to use new tags
Ensure playbooks can leverage the new tag structure.
## Implementation Steps
### Update roles/proxmox/tasks/main.yaml
```yaml
- name: Prepare Machines
ansible.builtin.include_tasks: 00_setup_machines.yaml
tags:
- proxmox:setup
- proxmox
- name: Create VM vault
ansible.builtin.include_tasks: 10_create_secrets.yaml
when: is_localhost
tags:
- proxmox:vault
- proxmox
- name: Prime node for VM
ansible.builtin.include_tasks: 40_prepare_vm_creation.yaml
when: is_proxmox_node
tags:
- proxmox:prepare
- proxmox
- name: Create VMs
ansible.builtin.include_tasks: 50_create_vms.yaml
when: is_localhost
tags:
- proxmox:vms
- proxmox
- name: Create LXC containers
ansible.builtin.include_tasks: 60_create_containers.yaml
when: is_localhost
tags:
- proxmox:containers
- proxmox
```
### Update individual task files
Add appropriate tags to tasks within each included file:
```yaml
# Example for 04_configure_hosts.yaml
- name: Configure /etc/hosts with Proxmox cluster nodes
ansible.builtin.blockinfile:
# ... existing content ...
tags:
- proxmox:setup
- proxmox:network
```
## Usage Examples
After implementation, users can run specific parts:
```bash
# Run only setup tasks
ansible-playbook playbooks/proxmox.yaml --tags "proxmox:setup"
# Run only VM creation
ansible-playbook playbooks/proxmox.yaml --tags "proxmox:vms"
# Run setup and preparation
ansible-playbook playbooks/proxmox.yaml --tags "proxmox:setup,proxmox:prepare"
```
## Testing Requirements
- Verify each tag group runs the correct subset of tasks
- Test tag combinations work properly
- Ensure backward compatibility with existing `proxmox` tag
## Acceptance Criteria
- [ ] Granular tags added to all major task groups
- [ ] Each functional area has its own tag
- [ ] Original `proxmox` tag still works for backward compatibility
- [ ] Documentation updated with tag usage examples
- [ ] All tags tested and working

View File

@@ -0,0 +1,125 @@
# Issue: Add Comprehensive Error Handling
**Status**: Open
**Priority**: High
**Component**: proxmox/tasks
**Assignee**: Junior Dev
## Description
The Proxmox role lacks comprehensive error handling, particularly for critical operations like API calls, vault operations, and file manipulations.
## Current Issues
- No error handling for Proxmox API failures
- No validation of VM/LXC configurations before creation
- No retries for network operations
- No cleanup on failure
## Required Changes
### Step 1: Add validation tasks
Validate configurations before attempting creation.
### Step 2: Add error handling blocks
Use `block/rescue/always` for critical operations.
### Step 3: Add retries for network operations
Use `retries` and `delay` for API calls.
## Implementation Steps
### Example 1: VM Creation with Error Handling
```yaml
- name: Create VM with error handling
block:
- name: Validate VM configuration
ansible.builtin.assert:
that:
- vm.vmid is defined
- vm.vmid | int > 0
- vm.node is defined
- vm.cores is defined and vm.cores | int > 0
- vm.memory is defined and vm.memory | int > 0
msg: "Invalid VM configuration for {{ vm.name }}"
- name: Create VM
community.proxmox.proxmox_kvm:
# ... existing parameters ...
register: vm_creation_result
retries: 3
delay: 10
until: vm_creation_result is not failed
rescue:
- name: Handle VM creation failure
ansible.builtin.debug:
msg: "Failed to create VM {{ vm.name }}: {{ ansible_failed_result.msg }}"
- name: Cleanup partial resources
# Add cleanup tasks here
when: cleanup_partial_resources | default(true)
always:
- name: Log VM creation attempt
ansible.builtin.debug:
msg: "VM creation attempt for {{ vm.name }} completed with status: {{ vm_creation_result is defined and vm_creation_result.changed | ternary('success', 'failed') }}"
```
### Example 2: API Call with Retries
```yaml
- name: Check Proxmox API availability
ansible.builtin.uri:
url: "https://{{ proxmox_api_host }}:8006/api2/json/version"
validate_certs: no
return_content: yes
register: api_check
retries: 5
delay: 5
until: api_check.status == 200
ignore_errors: yes
- name: Fail if API unavailable
ansible.builtin.fail:
msg: "Proxmox API unavailable at {{ proxmox_api_host }}"
when: api_check is failed
```
### Example 3: File Operation Error Handling
```yaml
- name: Manage vault file safely
block:
- name: Backup existing vault
ansible.builtin.copy:
src: "{{ proxmox_vault_file }}"
dest: "{{ proxmox_vault_file }}.backup"
remote_src: yes
when: vault_file_exists.stat.exists
- name: Perform vault operations
# ... vault operations ...
rescue:
- name: Restore vault from backup
ansible.builtin.copy:
src: "{{ proxmox_vault_file }}.backup"
dest: "{{ proxmox_vault_file }}"
remote_src: yes
when: vault_file_exists.stat.exists
- name: Fail with error details
ansible.builtin.fail:
msg: "Vault operation failed: {{ ansible_failed_result.msg }}"
```
## Testing Requirements
- Test error scenarios (invalid configs, API unavailable)
- Verify cleanup works on failure
- Confirm retries work for transient failures
- Validate error messages are helpful
## Acceptance Criteria
- [ ] All critical operations have error handling
- [ ] Validation added for configurations
- [ ] Retry logic implemented for network operations
- [ ] Cleanup procedures in place for failures
- [ ] Helpful error messages provided
- [ ] No silent failures

View File

@@ -0,0 +1,119 @@
# Issue: Add Performance Optimizations
**Status**: Open
**Priority**: Medium
**Component**: proxmox/tasks
**Assignee**: Junior Dev
## Description
The Proxmox role could benefit from performance optimizations, particularly for image downloads and repeated operations.
## Current Performance Issues
- Sequential image downloads (no parallelization)
- No caching of repeated operations
- No async operations for long-running tasks
- Inefficient fact gathering
## Required Changes
### Step 1: Add parallel downloads
Use async for image downloads to run concurrently.
### Step 2: Implement caching
Add fact caching for repeated operations.
### Step 3: Add conditional execution
Skip tasks when results are already present.
## Implementation Steps
### Example 1: Parallel Image Downloads
```yaml
- name: Download Cloud Init Isos in parallel
ansible.builtin.include_tasks: 42_download_isos.yaml
loop: "{{ proxmox_cloud_init_images | dict | map(attribute='value') }}"
loop_control:
loop_var: distro
async: 3600 # 1 hour timeout
poll: 0
register: download_tasks
- name: Check download status
ansible.builtin.async_status:
jid: "{{ item.ansible_job_id }}"
register: download_results
until: download_results.finished
retries: 30
delay: 10
loop: "{{ download_tasks.results }}"
loop_control:
loop_var: item
```
### Example 2: Add Fact Caching
```yaml
# In ansible.cfg or playbook
[defaults]
fact_caching = jsonfile
fact_caching_connection = /tmp/ansible_facts
fact_caching_timeout = 86400
# In tasks
- name: Gather facts with caching
ansible.builtin.setup:
cacheable: yes
```
### Example 3: Conditional Task Execution
```yaml
- name: Check if image already exists
ansible.builtin.stat:
path: "{{ proxmox_dirs.isos }}/{{ distro.name }}"
register: image_stat
changed_when: false
- name: Download image only if missing
ansible.builtin.get_url:
url: "{{ distro.url }}"
dest: "{{ proxmox_dirs.isos }}/{{ distro.name }}"
mode: "0644"
when: not image_stat.stat.exists
register: download_result
- name: Skip conversion if raw image exists
ansible.builtin.stat:
path: "{{ proxmox_dirs.isos }}/{{ raw_image_name }}"
register: raw_image_stat
changed_when: false
- name: Convert to raw only if needed
ansible.builtin.command:
cmd: "qemu-img convert -O raw {{ proxmox_dirs.isos }}/{{ distro.name }} {{ proxmox_dirs.isos }}/{{ raw_image_name }}"
when:
- download_result is changed or not raw_image_stat.stat.exists
- image_stat.stat.exists
```
### Example 4: Batch VM Operations
```yaml
- name: Create VMs in batches
ansible.builtin.include_tasks: 55_create_vm.yaml
loop: "{{ vms | batch(3) | flatten }}"
loop_control:
loop_var: "vm"
throttle: 3
```
## Testing Requirements
- Measure performance before and after changes
- Verify parallel operations don't cause conflicts
- Test caching works correctly
- Confirm conditional execution skips appropriately
## Acceptance Criteria
- [ ] Image downloads run in parallel
- [ ] Fact caching implemented and working
- [ ] Tasks skip when results already exist
- [ ] Performance metrics show improvement
- [ ] No race conditions in parallel operations
- [ ] Documentation updated with performance notes

18
playbooks/kube-vip.yaml Normal file
View File

@@ -0,0 +1,18 @@
---
# Deploys kube-vip on all k3s server nodes and adds the VIP to their TLS SANs.
#
# Migration steps (run once):
# 1. ansible-playbook playbooks/kube-vip.yaml
# 2. Update DNS: k3s.seyshiro.de → 192.168.20.2
# 3. Verify: kubectl get nodes (should work via VIP)
# 4. Decommission k3s-loadbalancer VM when satisfied
#
# The playbook is idempotent — re-running it after migration is safe.
- name: Deploy kube-vip on k3s server nodes
hosts: k3s_server
gather_facts: true
serial: 1
roles:
- role: kube_vip
tags:
- kube_vip

View File

@@ -0,0 +1,77 @@
# Reconstructed via infocmp from file: /usr/lib/kitty/terminfo/./x/xterm-kitty
xterm-kitty|KovIdTTY,
am, bw, ccc, hs, km, mc5i, mir, msgr, npc, xenl, Su, Tc, XF, fullkbd,
colors#0x100, cols#80, it#8, lines#24, pairs#0x7fff,
acsc=++\,\,--..00``aaffgghhiijjkkllmmnnooppqqrrssttuuvvwwxxyyzz{{||}}~~,
bel=^G, blink=\E[5m, bold=\E[1m, cbt=\E[Z, civis=\E[?25l,
clear=\E[H\E[2J, cnorm=\E[?12h\E[?25h, cr=\r,
csr=\E[%i%p1%d;%p2%dr, cub=\E[%p1%dD, cub1=^H,
cud=\E[%p1%dB, cud1=\n, cuf=\E[%p1%dC, cuf1=\E[C,
cup=\E[%i%p1%d;%p2%dH, cuu=\E[%p1%dA, cuu1=\E[A,
cvvis=\E[?12;25h, dch=\E[%p1%dP, dch1=\E[P, dim=\E[2m,
dl=\E[%p1%dM, dl1=\E[M, dsl=\E]2;\E\\, ech=\E[%p1%dX,
ed=\E[J, el=\E[K, el1=\E[1K, flash=\E[?5h$<100/>\E[?5l,
fsl=^G, home=\E[H, hpa=\E[%i%p1%dG, ht=^I, hts=\EH,
ich=\E[%p1%d@, il=\E[%p1%dL, il1=\E[L, ind=\n,
indn=\E[%p1%dS,
initc=\E]4;%p1%d;rgb:%p2%{255}%*%{1000}%/%2.2X/%p3%{255}%*%{1000}%/%2.2X/%p4%{255}%*%{1000}%/%2.2X\E\\,
kBEG=\E[1;2E, kDC=\E[3;2~, kEND=\E[1;2F, kHOM=\E[1;2H,
kIC=\E[2;2~, kLFT=\E[1;2D, kNXT=\E[6;2~, kPRV=\E[5;2~,
kRIT=\E[1;2C, kbeg=\EOE, kbs=^?, kcbt=\E[Z, kcub1=\EOD,
kcud1=\EOB, kcuf1=\EOC, kcuu1=\EOA, kdch1=\E[3~, kend=\EOF,
kf1=\EOP, kf10=\E[21~, kf11=\E[23~, kf12=\E[24~,
kf13=\E[1;2P, kf14=\E[1;2Q, kf15=\E[13;2~, kf16=\E[1;2S,
kf17=\E[15;2~, kf18=\E[17;2~, kf19=\E[18;2~, kf2=\EOQ,
kf20=\E[19;2~, kf21=\E[20;2~, kf22=\E[21;2~,
kf23=\E[23;2~, kf24=\E[24;2~, kf25=\E[1;5P, kf26=\E[1;5Q,
kf27=\E[13;5~, kf28=\E[1;5S, kf29=\E[15;5~, kf3=\EOR,
kf30=\E[17;5~, kf31=\E[18;5~, kf32=\E[19;5~,
kf33=\E[20;5~, kf34=\E[21;5~, kf35=\E[23;5~,
kf36=\E[24;5~, kf37=\E[1;6P, kf38=\E[1;6Q, kf39=\E[13;6~,
kf4=\EOS, kf40=\E[1;6S, kf41=\E[15;6~, kf42=\E[17;6~,
kf43=\E[18;6~, kf44=\E[19;6~, kf45=\E[20;6~,
kf46=\E[21;6~, kf47=\E[23;6~, kf48=\E[24;6~,
kf49=\E[1;3P, kf5=\E[15~, kf50=\E[1;3Q, kf51=\E[13;3~,
kf52=\E[1;3S, kf53=\E[15;3~, kf54=\E[17;3~,
kf55=\E[18;3~, kf56=\E[19;3~, kf57=\E[20;3~,
kf58=\E[21;3~, kf59=\E[23;3~, kf6=\E[17~, kf60=\E[24;3~,
kf61=\E[1;4P, kf62=\E[1;4Q, kf63=\E[13;4~, kf7=\E[18~,
kf8=\E[19~, kf9=\E[20~, khome=\EOH, kich1=\E[2~,
kind=\E[1;2B, kmous=\E[M, knp=\E[6~, kpp=\E[5~,
kri=\E[1;2A, oc=\E]104\007, op=\E[39;49m, rc=\E8,
rep=%p1%c\E[%p2%{1}%-%db, rev=\E[7m, ri=\EM,
rin=\E[%p1%dT, ritm=\E[23m, rmacs=\E(B, rmam=\E[?7l,
rmcup=\E[?1049l, rmir=\E[4l, rmkx=\E[?1l, rmso=\E[27m,
rmul=\E[24m, rs1=\E]\E\\\Ec, sc=\E7,
setab=\E[%?%p1%{8}%<%t4%p1%d%e%p1%{16}%<%t10%p1%{8}%-%d%e48;5;%p1%d%;m,
setaf=\E[%?%p1%{8}%<%t3%p1%d%e%p1%{16}%<%t9%p1%{8}%-%d%e38;5;%p1%d%;m,
sgr=%?%p9%t\E(0%e\E(B%;\E[0%?%p6%t;1%;%?%p2%t;4%;%?%p1%p3%|%t;7%;%?%p4%t;5%;%?%p7%t;8%;%?%p5%t;2%;m,
sgr0=\E(B\E[m, sitm=\E[3m, smacs=\E(0, smam=\E[?7h,
smcup=\E[?1049h, smir=\E[4h, smkx=\E[?1h, smso=\E[7m,
smul=\E[4m, tbc=\E[3g, tsl=\E]2;, u6=\E[%i%d;%dR, u7=\E[6n,
u8=\E[?%[;0123456789]c, u9=\E[c, vpa=\E[%i%p1%dd,
BD=\E[?2004l, BE=\E[?2004h, Cr=\E]112\007,
Cs=\E]12;%p1%s\007, Ms=\E]52;%p1%s;%p2%s\E\\,
PE=\E[201~, PS=\E[200~, RV=\E[>c, Se=\E[2 q,
Setulc=\E[58:2:%p1%{65536}%/%d:%p1%{256}%/%{255}%&%d:%p1%{255}%&%d%;m,
Smulx=\E[4:%p1%dm, Ss=\E[%p1%d q, Sync=\EP=%p1%ds\E\\,
XR=\E[>0q, fd=\E[?1004l, fe=\E[?1004h, kBEG3=\E[1;3E,
kBEG4=\E[1;4E, kBEG5=\E[1;5E, kBEG6=\E[1;6E,
kBEG7=\E[1;7E, kDC3=\E[3;3~, kDC4=\E[3;4~, kDC5=\E[3;5~,
kDC6=\E[3;6~, kDC7=\E[3;7~, kDN=\E[1;2B, kDN3=\E[1;3B,
kDN4=\E[1;4B, kDN5=\E[1;5B, kDN6=\E[1;6B, kDN7=\E[1;7B,
kEND3=\E[1;3F, kEND4=\E[1;4F, kEND5=\E[1;5F,
kEND6=\E[1;6F, kEND7=\E[1;7F, kHOM3=\E[1;3H,
kHOM4=\E[1;4H, kHOM5=\E[1;5H, kHOM6=\E[1;6H,
kHOM7=\E[1;7H, kIC3=\E[2;3~, kIC4=\E[2;4~, kIC5=\E[2;5~,
kIC6=\E[2;6~, kIC7=\E[2;7~, kLFT3=\E[1;3D, kLFT4=\E[1;4D,
kLFT5=\E[1;5D, kLFT6=\E[1;6D, kLFT7=\E[1;7D,
kNXT3=\E[6;3~, kNXT4=\E[6;4~, kNXT5=\E[6;5~,
kNXT6=\E[6;6~, kNXT7=\E[6;7~, kPRV3=\E[5;3~,
kPRV4=\E[5;4~, kPRV5=\E[5;5~, kPRV6=\E[5;6~,
kPRV7=\E[5;7~, kRIT3=\E[1;3C, kRIT4=\E[1;4C,
kRIT5=\E[1;5C, kRIT6=\E[1;6C, kRIT7=\E[1;7C, kUP=\E[1;2A,
kUP3=\E[1;3A, kUP4=\E[1;4A, kUP5=\E[1;5A, kUP6=\E[1;6A,
kUP7=\E[1;7A, kxIN=\E[I, kxOUT=\E[O, rmxx=\E[29m,
setrgbb=\E[48:2:%p1%d:%p2%d:%p3%dm,
setrgbf=\E[38:2:%p1%d:%p2%d:%p3%dm, smxx=\E[9m,

View File

@@ -4,3 +4,9 @@
name: sshd name: sshd
state: restarted state: restarted
become: true become: true
- name: Restart timesyncd
ansible.builtin.systemd:
name: systemd-timesyncd
state: restarted
become: true

View File

@@ -22,3 +22,16 @@
- name: Compile ghostty terminalinfo - name: Compile ghostty terminalinfo
ansible.builtin.command: "tic -x {{ ansible_env.HOME }}/ghostty" ansible.builtin.command: "tic -x {{ ansible_env.HOME }}/ghostty"
when: ghostty_terminfo.changed when: ghostty_terminfo.changed
- name: Copy kitty infocmp
ansible.builtin.copy:
src: files/kitty/infocmp
dest: "{{ ansible_env.HOME }}/kitty"
owner: "{{ ansible_user_id }}"
group: "{{ ansible_user_id }}"
mode: "0644"
register: kitty_terminfo
- name: Compile kitty terminalinfo
ansible.builtin.command: "tic -x {{ ansible_env.HOME }}/kitty"
when: kitty_terminfo.changed

View File

@@ -9,3 +9,26 @@
community.general.timezone: community.general.timezone:
name: "{{ timezone }}" name: "{{ timezone }}"
when: ansible_user_id == "root" when: ansible_user_id == "root"
- name: Configure NTP servers for systemd-timesyncd
ansible.builtin.lineinfile:
path: /etc/systemd/timesyncd.conf
regexp: "^#?NTP="
line: "NTP=0.debian.pool.ntp.org 1.debian.pool.ntp.org 2.debian.pool.ntp.org 3.debian.pool.ntp.org"
become: true
notify: Restart timesyncd
- name: Enable and start systemd-timesyncd
ansible.builtin.systemd:
name: systemd-timesyncd
enabled: true
state: started
become: true
when: ansible_user_id != "root"
- name: Enable and start systemd-timesyncd
ansible.builtin.systemd:
name: systemd-timesyncd
enabled: true
state: started
when: ansible_user_id == "root"

View File

@@ -33,7 +33,6 @@
opts: defaults,nolock,_netdev,auto,bg opts: defaults,nolock,_netdev,auto,bg
state: mounted state: mounted
loop: loop:
- /media/docker
- /media/series - /media/series
- /media/movies - /media/movies
- /media/songs - /media/songs

View File

@@ -0,0 +1,715 @@
# Edge VPS Ansible Role Implementation Plan
> **For Claude:** REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.
**Goal:** Create a modular Ansible role for deploying edge VPS infrastructure components (WireGuard, Traefik, Pangolin, Elastic Agent).
**Architecture:** Modular task-based role following existing patterns in the repository. Each component has its own numbered task file. Configs are templated with secrets from ansible-vault encrypted group_vars.
**Tech Stack:** Ansible, Jinja2 templates, Docker Compose, WireGuard, Traefik, Pangolin, Elastic Fleet Agent
---
### Task 1: Create Role Directory Structure
**Files:**
- Create: `roles/edge_vps/tasks/main.yaml`
- Create: `roles/edge_vps/handlers/main.yaml`
- Create: `roles/edge_vps/defaults/main.yaml`
- Create: `roles/edge_vps/templates/` directory structure
**Step 1: Create directory structure**
Run:
```bash
mkdir -p tasks handlers defaults templates/wireguard templates/traefik templates/pangolin templates/elastic-agent
```
**Step 2: Create defaults/main.yaml**
```yaml
---
edge_vps_config_base: /root/config
edge_vps_wireguard_config_dir: /etc/wireguard
edge_vps_wireguard_interface: wg0
edge_vps_wireguard_address: "10.133.7.1/24"
edge_vps_wireguard_port: 61975
edge_vps_traefik_config_dir: "{{ edge_vps_config_base }}/traefik"
edge_vps_traefik_logs_dir: "{{ edge_vps_traefik_config_dir }}/logs"
edge_vps_pangolin_config_dir: "{{ edge_vps_config_base }}/pangolin"
edge_vps_elastic_config_dir: "{{ edge_vps_config_base }}/elastic-agent"
edge_vps_elastic_state_dir: /var/lib/elastic-agent/elastic-system/elastic-agent/state
```
**Step 3: Create handlers/main.yaml**
```yaml
---
- name: Restart wireguard
ansible.builtin.systemd:
name: "wg-quick@{{ edge_vps_wireguard_interface }}"
state: restarted
listen: restart wireguard
- name: Restart traefik
ansible.builtin.command:
cmd: docker compose restart
chdir: "{{ edge_vps_traefik_config_dir }}"
listen: restart traefik
```
**Step 4: Commit**
```bash
git add defaults/main.yaml handlers/main.yaml
git commit -m "feat(edge_vps): add role structure and handlers"
```
---
### Task 2: Create Directory Setup Task
**Files:**
- Create: `roles/edge_vps/tasks/10_directories.yaml`
**Step 1: Create 10_directories.yaml**
```yaml
---
- name: Create config base directory
ansible.builtin.file:
path: "{{ edge_vps_config_base }}"
state: directory
mode: "0755"
- name: Create Traefik directories
ansible.builtin.file:
path: "{{ item }}"
state: directory
mode: "0755"
loop:
- "{{ edge_vps_traefik_config_dir }}"
- "{{ edge_vps_traefik_logs_dir }}"
- name: Create Pangolin config directory
ansible.builtin.file:
path: "{{ edge_vps_pangolin_config_dir }}"
state: directory
mode: "0755"
- name: Create Elastic Agent directories
ansible.builtin.file:
path: "{{ item }}"
state: directory
mode: "0755"
loop:
- "{{ edge_vps_elastic_config_dir }}"
- "{{ edge_vps_elastic_state_dir }}"
```
**Step 2: Commit**
```bash
git add tasks/10_directories.yaml
git commit -m "feat(edge_vps): add directory setup task"
```
---
### Task 3: Create WireGuard Task and Template
**Files:**
- Create: `roles/edge_vps/tasks/20_wireguard.yaml`
- Create: `roles/edge_vps/templates/wireguard/wg0.conf.j2`
**Step 1: Create templates/wireguard/wg0.conf.j2**
```jinja2
[Interface]
Address = {{ edge_vps_wireguard_address }}
ListenPort = {{ edge_vps_wireguard_port }}
PrivateKey = {{ vault_edge_vps.wireguard.private_key }}
PostUp = sysctl -w net.ipv4.ip_forward=1
PostUp = iptables -A FORWARD -i {{ edge_vps_wireguard_interface }} -j ACCEPT
PostUp = iptables -A FORWARD -o {{ edge_vps_wireguard_interface }} -j ACCEPT
{% for route in edge_vps_wireguard_routes | default([]) %}
PostUp = ip route add {{ route }} via {{ route.gateway }} dev {{ edge_vps_wireguard_interface }}
{% endfor %}
PostDown = iptables -D FORWARD -i {{ edge_vps_wireguard_interface }} -j ACCEPT
PostDown = iptables -D FORWARD -o {{ edge_vps_wireguard_interface }} -j ACCEPT
{% for route in edge_vps_wireguard_routes | default([]) %}
PostDown = ip route del {{ route }} via {{ route.gateway }} dev {{ edge_vps_wireguard_interface }}
{% endfor %}
{% for peer in vault_edge_vps.wireguard.peers %}
[Peer]
# {{ peer.name }}
PublicKey = {{ peer.public_key }}
PresharedKey = {{ peer.preshared_key }}
AllowedIPs = {{ peer.allowed_ips }}
{% endfor %}
```
**Step 2: Create tasks/20_wireguard.yaml**
```yaml
---
- name: Install WireGuard
ansible.builtin.apt:
name: wireguard
state: present
update_cache: true
- name: Deploy WireGuard config
ansible.builtin.template:
src: wireguard/wg0.conf.j2
dest: "{{ edge_vps_wireguard_config_dir }}/{{ edge_vps_wireguard_interface }}.conf"
mode: "0600"
notify: restart wireguard
- name: Enable WireGuard
ansible.builtin.systemd:
name: "wg-quick@{{ edge_vps_wireguard_interface }}"
enabled: true
state: started
```
**Step 3: Commit**
```bash
git add tasks/20_wireguard.yaml templates/wireguard/wg0.conf.j2
git commit -m "feat(edge_vps): add WireGuard setup task and template"
```
---
### Task 4: Create Traefik Task and Template
**Files:**
- Create: `roles/edge_vps/tasks/30_traefik.yaml`
- Create: `roles/edge_vps/templates/traefik/traefik_config.yml.j2`
**Step 1: Create templates/traefik/traefik_config.yml.j2**
```jinja2
api:
insecure: true
dashboard: true
providers:
http:
endpoint: "http://pangolin:3001/api/v1/traefik-config"
pollInterval: "5s"
file:
filename: "/etc/traefik/dynamic_config.yml"
experimental:
plugins:
badger:
moduleName: "github.com/fosrl/badger"
version: "v1.2.1"
log:
level: "INFO"
format: "common"
maxSize: 100
maxBackups: 3
maxAge: 3
compress: true
certificatesResolvers:
letsencrypt:
acme:
dnsChallenge:
provider: "cloudflare"
email: "{{ edge_vps_acme_email }}"
storage: "/letsencrypt/acme.json"
caServer: "https://acme-v02.api.letsencrypt.org/directory"
entryPoints:
web:
address: ":80"
websecure:
address: ":443"
transport:
respondingTimeouts:
readTimeout: "30m"
http:
tls:
certResolver: "letsencrypt"
tcp-6443:
address: ":6443/tcp"
serversTransport:
insecureSkipVerify: true
ping:
entryPoint: "web"
accessLog:
filePath: "/var/log/traefik/access.log"
format: common
```
**Step 2: Create tasks/30_traefik.yaml**
```yaml
---
- name: Deploy Traefik config
ansible.builtin.template:
src: traefik/traefik_config.yml.j2
dest: "{{ edge_vps_traefik_config_dir }}/traefik_config.yml"
mode: "0644"
notify: restart traefik
- name: Deploy Cloudflare credentials for ACME
ansible.builtin.copy:
content: |
CF_DNS_API_TOKEN={{ vault_edge_vps.traefik.cloudflare_api_token }}
dest: "{{ edge_vps_traefik_config_dir }}/cloudflare.env"
mode: "0600"
no_log: true
```
**Step 3: Commit**
```bash
git add tasks/30_traefik.yaml templates/traefik/traefik_config.yml.j2
git commit -m "feat(edge_vps): add Traefik setup task and template"
```
---
### Task 5: Create Pangolin Task and Templates
**Files:**
- Create: `roles/edge_vps/tasks/40_pangolin.yaml`
- Create: `roles/edge_vps/templates/pangolin/config.yml.j2`
- Create: `roles/edge_vps/templates/pangolin/docker-compose.yml.j2`
**Step 1: Create templates/pangolin/config.yml.j2**
```jinja2
gerbil:
start_port: 51820
base_endpoint: "{{ edge_vps_pangolin_base_endpoint }}"
app:
dashboard_url: "{{ edge_vps_pangolin_dashboard_url }}"
log_level: "info"
telemetry:
anonymous_usage: true
domains:
domain1:
base_domain: "{{ edge_vps_pangolin_base_domain }}"
server:
secret: "{{ vault_edge_vps.pangolin.server_secret }}"
cors:
origins: ["{{ edge_vps_pangolin_dashboard_url }}"]
methods: ["GET", "POST", "PUT", "DELETE", "PATCH"]
allowed_headers: ["X-CSRF-Token", "Content-Type"]
credentials: false
maxmind_db_path: "./config/GeoLite2-Country.mmdb"
flags:
require_email_verification: false
disable_signup_without_invite: true
disable_user_create_org: false
allow_raw_resources: true
```
**Step 2: Create templates/pangolin/docker-compose.yml.j2**
```yaml
services:
pangolin:
image: fosrl/pangolin:latest
container_name: pangolin
restart: unless-stopped
ports:
- "3001:3001"
- "443:443"
- "80:80"
volumes:
- ./config.yml:/app/config/config.yml:ro
- ./letsencrypt:/letsencrypt
depends_on:
- gerbil
gerbil:
image: fosrl/gerbil:latest
container_name: gerbil
restart: unless-stopped
network_mode: host
cap_add:
- NET_ADMIN
- SYS_MODULE
volumes:
- /lib/modules:/lib/modules
```
**Step 3: Create tasks/40_pangolin.yaml**
```yaml
---
- name: Deploy Pangolin config
ansible.builtin.template:
src: pangolin/config.yml.j2
dest: "{{ edge_vps_pangolin_config_dir }}/config.yml"
mode: "0644"
notify: restart pangolin
- name: Deploy Pangolin docker-compose
ansible.builtin.template:
src: pangolin/docker-compose.yml.j2
dest: "{{ edge_vps_pangolin_config_dir }}/docker-compose.yml"
mode: "0644"
- name: Create letsencrypt directory for Pangolin
ansible.builtin.file:
path: "{{ edge_vps_pangolin_config_dir }}/letsencrypt"
state: directory
mode: "0755"
- name: Start Pangolin
community.docker.docker_compose_v2:
project_src: "{{ edge_vps_pangolin_config_dir }}"
state: present
```
**Step 4: Commit**
```bash
git add tasks/40_pangolin.yaml templates/pangolin/
git commit -m "feat(edge_vps): add Pangolin setup task and templates"
```
---
### Task 6: Create Elastic Agent Task and Templates
**Files:**
- Create: `roles/edge_vps/tasks/50_elastic_agent.yaml`
- Create: `roles/edge_vps/templates/elastic-agent/docker-compose.yml.j2`
- Create: `roles/edge_vps/templates/elastic-agent/elastic-agent.yml.j2`
**Step 1: Create templates/elastic-agent/elastic-agent.yml.j2**
```yaml
fleet:
enabled: true
```
**Step 2: Create templates/elastic-agent/docker-compose.yml.j2**
```yaml
services:
elastic-agent:
image: docker.elastic.co/elastic-agent/elastic-agent:{{ edge_vps_elastic_version }}
container_name: elastic-agent
restart: always
network_mode: host
dns:
- {{ edge_vps_elastic_dns_server }}
dns_search:
- elastic-system.svc.cluster.local
- svc.cluster.local
- cluster.local
user: "0:0"
privileged: true
entrypoint: ["/usr/bin/env", "bash", "-c"]
command:
- |
set -e
if [[ -f /mnt/elastic-internal/elasticsearch-association/elastic-system/elasticsearch/certs/ca.crt ]]; then
if [[ -f /usr/bin/update-ca-trust ]]; then
cp /mnt/elastic-internal/elasticsearch-association/elastic-system/elasticsearch/certs/ca.crt /etc/pki/ca-trust/source/anchors/
/usr/bin/update-ca-trust
elif [[ -f /usr/sbin/update-ca-certificates ]]; then
cp /mnt/elastic-internal/elasticsearch-association/elastic-system/elasticsearch/certs/ca.crt /usr/local/share/ca-certificates/
/usr/sbin/update-ca-certificates
fi
fi
exec /usr/bin/tini -- /usr/local/bin/docker-entrypoint -e -c /etc/agent/elastic-agent.yml
environment:
- FLEET_CA=/mnt/elastic-internal/fleetserver-association/elastic-system/fleet-server/certs/ca.crt
- FLEET_ENROLL=true
- FLEET_ENROLLMENT_TOKEN={{ vault_edge_vps.elastic.fleet_enrollment_token }}
- FLEET_URL={{ edge_vps_elastic_fleet_url }}
- STATE_PATH=/usr/share/elastic-agent/state
- CONFIG_PATH=/usr/share/elastic-agent/state
- NODE_NAME={{ inventory_hostname }}
volumes:
- {{ edge_vps_elastic_state_dir }}:/usr/share/elastic-agent/state
- ./elastic-agent.yml:/etc/agent/elastic-agent.yml:ro
- ./elasticsearch-ca.crt:/mnt/elastic-internal/elasticsearch-association/elastic-system/elasticsearch/certs/ca.crt:ro
- ./fleet-ca.crt:/mnt/elastic-internal/fleetserver-association/elastic-system/fleet-server/certs/ca.crt:ro
- {{ edge_vps_traefik_logs_dir }}:/var/log/traefik:ro
```
**Step 3: Create tasks/50_elastic_agent.yaml**
```yaml
---
- name: Deploy Elastic Agent config
ansible.builtin.template:
src: elastic-agent/elastic-agent.yml.j2
dest: "{{ edge_vps_elastic_config_dir }}/elastic-agent.yml"
mode: "0644"
- name: Deploy Elastic Agent docker-compose
ansible.builtin.template:
src: elastic-agent/docker-compose.yml.j2
dest: "{{ edge_vps_elastic_config_dir }}/docker-compose.yml"
mode: "0644"
- name: Deploy Elasticsearch CA certificate
ansible.builtin.copy:
src: elastic-agent/elasticsearch-ca.crt
dest: "{{ edge_vps_elastic_config_dir }}/elasticsearch-ca.crt"
mode: "0644"
- name: Deploy Fleet CA certificate
ansible.builtin.copy:
src: elastic-agent/fleet-ca.crt
dest: "{{ edge_vps_elastic_config_dir }}/fleet-ca.crt"
mode: "0644"
- name: Start Elastic Agent
community.docker.docker_compose_v2:
project_src: "{{ edge_vps_elastic_config_dir }}"
state: present
```
**Step 4: Commit**
```bash
git add tasks/50_elastic_agent.yaml templates/elastic-agent/
git commit -m "feat(edge_vps): add Elastic Agent setup task and templates"
```
---
### Task 7: Create Main Task Orchestrator
**Files:**
- Create: `roles/edge_vps/tasks/main.yaml`
**Step 1: Create tasks/main.yaml**
```yaml
---
- name: Setup directories
ansible.builtin.include_tasks: 10_directories.yaml
- name: Setup WireGuard
ansible.builtin.include_tasks: 20_wireguard.yaml
- name: Setup Traefik
ansible.builtin.include_tasks: 30_traefik.yaml
- name: Setup Pangolin
ansible.builtin.include_tasks: 40_pangolin.yaml
- name: Setup Elastic Agent
ansible.builtin.include_tasks: 50_elastic_agent.yaml
```
**Step 2: Commit**
```bash
git add tasks/main.yaml
git commit -m "feat(edge_vps): add main task orchestrator"
```
---
### Task 8: Create Inventory Variables
**Files:**
- Create: `vars/group_vars/vps/vars.yaml`
- Create: `vars/group_vars/vps/secrets.yaml`
**Step 1: Create vars/group_vars/vps/vars.yaml**
```yaml
edge_vps_wireguard_address: "10.133.7.1/24"
edge_vps_wireguard_port: 61975
edge_vps_wireguard_routes:
- network: "10.43.0.0/16"
gateway: "10.133.7.4"
edge_vps_pangolin_dashboard_url: "https://pangolin.seyshiro.de"
edge_vps_pangolin_base_endpoint: "pangolin.seyshiro.de"
edge_vps_pangolin_base_domain: "seyshiro.de"
edge_vps_acme_email: "me+acme@tudattr.dev"
edge_vps_elastic_version: "9.2.2"
edge_vps_elastic_dns_server: "10.43.0.10"
edge_vps_elastic_fleet_url: "https://fleet-server-agent-http.elastic-system.svc:8220"
```
**Step 2: Create vars/group_vars/vps/secrets.yaml (template)**
```yaml
vault_edge_vps:
wireguard:
private_key: "YOUR_WIREGUARD_PRIVATE_KEY"
peers:
- name: lilcrow
public_key: "PEER_PUBLIC_KEY"
preshared_key: "PEER_PRESHARED_KEY"
allowed_ips: "10.133.7.2/32"
- name: homelab
public_key: "PEER_PUBLIC_KEY"
preshared_key: "PEER_PRESHARED_KEY"
allowed_ips: "10.133.7.3/32"
- name: k3s
public_key: "PEER_PUBLIC_KEY"
preshared_key: "PEER_PRESHARED_KEY"
allowed_ips: "10.133.7.4/32, 10.43.0.0/16"
pangolin:
server_secret: "YOUR_PANGOLIN_SERVER_SECRET"
traefik:
cloudflare_api_token: "YOUR_CLOUDFLARE_API_TOKEN"
elastic:
fleet_enrollment_token: "YOUR_FLEET_ENROLLMENT_TOKEN"
```
**Step 3: Encrypt secrets file**
Run:
```bash
ansible-vault encrypt vars/group_vars/vps/secrets.yaml
```
**Step 4: Commit**
```bash
git add vars/group_vars/vps/
git commit -m "feat(edge_vps): add inventory variables for VPS group"
```
---
### Task 9: Update README
**Files:**
- Modify: `roles/edge_vps/README.md`
**Step 1: Update README.md**
```markdown
# Edge VPS
Configures edge VPS instances with WireGuard VPN, Traefik reverse proxy, Pangolin, and Elastic Fleet Agent.
## Requirements
- Docker and Docker Compose installed
- Ansible community.docker collection
## Role Variables
### WireGuard
| Variable | Default | Description |
|----------|---------|-------------|
| `edge_vps_wireguard_address` | `10.133.7.1/24` | WireGuard interface address |
| `edge_vps_wireguard_port` | `61975` | WireGuard listen port |
| `edge_vps_wireguard_interface` | `wg0` | WireGuard interface name |
| `edge_vps_wireguard_routes` | `[]` | List of routes to add (network, gateway) |
### Traefik
| Variable | Default | Description |
|----------|---------|-------------|
| `edge_vps_traefik_config_dir` | `/root/config/traefik` | Traefik config directory |
| `edge_vps_acme_email` | - | Email for Let's Encrypt |
### Pangolin
| Variable | Default | Description |
|----------|---------|-------------|
| `edge_vps_pangolin_dashboard_url` | - | Pangolin dashboard URL |
| `edge_vps_pangolin_base_endpoint` | - | Pangolin base endpoint |
| `edge_vps_pangolin_base_domain` | - | Base domain for Pangolin |
### Elastic Agent
| Variable | Default | Description |
|----------|---------|-------------|
| `edge_vps_elastic_version` | `9.2.2` | Elastic Agent version |
| `edge_vps_elastic_fleet_url` | - | Fleet server URL |
| `edge_vps_elastic_dns_server` | `10.43.0.10` | DNS server for agent |
## Secrets
Store secrets in `vars/group_vars/vps/secrets.yaml` (ansible-vault encrypted):
```yaml
vault_edge_vps:
wireguard:
private_key: "..."
peers: [...]
pangolin:
server_secret: "..."
traefik:
cloudflare_api_token: "..."
elastic:
fleet_enrollment_token: "..."
```
## Dependencies
None.
## Example Playbook
```yaml
- hosts: vps
roles:
- role: edge_vps
```
## License
MIT
```
**Step 2: Commit**
```bash
git add README.md
git commit -m "docs(edge_vps): update README with role documentation"
```
---
### Task 10: Move Certificate Files
**Files:**
- Move: `files/agent/agent/elasticsearch-ca.crt``files/elastic-agent/`
- Move: `files/agent/agent/fleet-ca.crt``files/elastic-agent/`
**Step 1: Move certificate files**
Run:
```bash
mkdir -p files/elastic-agent
mv files/agent/agent/elasticsearch-ca.crt files/elastic-agent/
mv files/agent/agent/fleet-ca.crt files/elastic-agent/
rm -rf files/agent
```
**Step 2: Commit**
```bash
git add files/
git commit -m "refactor(edge_vps): reorganize certificate files"
```

View File

@@ -24,6 +24,6 @@
ansible.builtin.command: | ansible.builtin.command: |
/tmp/k3s_install.sh /tmp/k3s_install.sh
environment: environment:
K3S_URL: "https://{{ hostvars['k3s-loadbalancer'].ansible_default_ipv4.address }}:{{ k3s.loadbalancer.default_port }}" K3S_URL: "https://{{ k3s_vip }}:{{ k3s.loadbalancer.default_port }}"
K3S_TOKEN: "{{ k3s_token }}" K3S_TOKEN: "{{ k3s_token }}"
become: true become: true

View File

@@ -46,7 +46,7 @@
- name: Add K3s cluster to kubeconfig - name: Add K3s cluster to kubeconfig
ansible.builtin.command: > ansible.builtin.command: >
kubectl config set-cluster "{{ k3s_cluster_name }}" kubectl config set-cluster "{{ k3s_cluster_name }}"
--server="https://{{ k3s_server_name }}:6443" --server="https://{{ k3s_vip }}:6443"
--certificate-authority=/tmp/k3s-ca.crt --certificate-authority=/tmp/k3s-ca.crt
--embed-certs=true --embed-certs=true
environment: environment:

View File

@@ -1,11 +1,12 @@
--- ---
- name: Install dependencies for apt to use repositories over HTTPS - name: Install dependencies
ansible.builtin.apt: ansible.builtin.apt:
name: "{{ item }}" name: "{{ item }}"
state: present state: present
update_cache: true update_cache: true
loop: loop:
- qemu-guest-agent - qemu-guest-agent
- etcd-client
become: true become: true
- name: See if k3s file exists - name: See if k3s file exists
@@ -15,15 +16,29 @@
- name: Install primary k3s server - name: Install primary k3s server
include_tasks: primary_installation.yaml include_tasks: primary_installation.yaml
when: ansible_default_ipv4.address == k3s_primary_server_ip when:
- inventory_hostname == groups['k3s_server'] | first
- not k3s_status.stat.exists
- name: Get token from primary k3s server - name: Get token from primary k3s server
include_tasks: pull_token.yaml include_tasks: pull_token.yaml
- name: Install seconary k3s servers - name: Install seconary k3s servers
include_tasks: secondary_installation.yaml include_tasks: secondary_installation.yaml
when: ansible_default_ipv4.address != k3s_primary_server_ip when:
- inventory_hostname != groups['k3s_server'] | first
- not k3s_status.stat.exists
- name: Set kubeconfig on localhost - name: Set kubeconfig on localhost
include_tasks: create_kubeconfig.yaml include_tasks: create_kubeconfig.yaml
when: ansible_default_ipv4.address == k3s_primary_server_ip when: inventory_hostname == groups['k3s_server'] | first
- name: Persist control-plane NoSchedule taint in k3s config
ansible.builtin.blockinfile:
path: /etc/rancher/k3s/config.yaml
create: true
marker: "# {mark} ANSIBLE MANAGED control-plane taint"
block: |
node-taint:
- "node-role.kubernetes.io/control-plane:NoSchedule"
become: true

View File

@@ -8,7 +8,7 @@
- name: Install K3s server with and TLS SAN - name: Install K3s server with and TLS SAN
ansible.builtin.command: | ansible.builtin.command: |
/tmp/k3s_install.sh server \ /tmp/k3s_install.sh server \
--cluster-init --cluster-init \
--tls-san {{ hostvars['k3s-loadbalancer'].ansible_default_ipv4.address }} \ --tls-san {{ k3s_vip }} \
--tls-san {{ k3s_server_name }} --tls-san {{ k3s_server_name }}
become: true become: true

View File

@@ -1,15 +1,15 @@
- name: Get K3s token from the first server - name: Get K3s token from the primary server
when: ansible_default_ipv4.address == k3s_primary_server_ip
ansible.builtin.slurp: ansible.builtin.slurp:
src: /var/lib/rancher/k3s/server/node-token src: /var/lib/rancher/k3s/server/node-token
register: k3s_token register: k3s_token_raw
delegate_to: "{{ groups['k3s_server'] | first }}"
run_once: true
become: true become: true
- name: Set fact on k3s_primary_server_ip - name: Set k3s_token fact
ansible.builtin.set_fact: ansible.builtin.set_fact:
k3s_token: "{{ k3s_token['content'] | b64decode | trim }}" k3s_token: "{{ k3s_token_raw['content'] | b64decode | trim }}"
when: run_once: true
- ansible_default_ipv4.address == k3s_primary_server_ip
- name: Write K3s token to local file for encryption - name: Write K3s token to local file for encryption
ansible.builtin.copy: ansible.builtin.copy:

View File

@@ -13,8 +13,8 @@
- name: Install K3s on the secondary servers - name: Install K3s on the secondary servers
ansible.builtin.command: | ansible.builtin.command: |
/tmp/k3s_install.sh \ /tmp/k3s_install.sh \
--server "https://{{ hostvars['k3s-loadbalancer'].ansible_default_ipv4.address }}:{{ k3s.loadbalancer.default_port }}" \ --server "https://{{ k3s_vip }}:{{ k3s.loadbalancer.default_port }}" \
--tls-san {{ hostvars['k3s-loadbalancer'].ansible_default_ipv4.address }} \ --tls-san {{ k3s_vip }} \
--tls-san {{ k3s_server_name }} --tls-san {{ k3s_server_name }}
environment: environment:
K3S_TOKEN: "{{ k3s_token_vault.k3s_token }}" K3S_TOKEN: "{{ k3s_token_vault.k3s_token }}"

View File

@@ -0,0 +1,61 @@
---
- name: Remove stale static pod manifest if present
ansible.builtin.file:
path: "{{ kube_vip_static_pod_path }}"
state: absent
become: true
- name: Ensure k3s server manifests directory exists
ansible.builtin.file:
path: "{{ kube_vip_manifests_dir }}"
state: directory
mode: "0755"
become: true
- name: Deploy kube-vip RBAC manifest
ansible.builtin.template:
src: templates/kube-vip-rbac.yaml.j2
dest: "{{ kube_vip_manifests_dir }}/kube-vip-rbac.yaml"
owner: root
group: root
mode: "0644"
become: true
- name: Deploy kube-vip DaemonSet manifest
ansible.builtin.template:
src: templates/kube-vip.yaml.j2
dest: "{{ kube_vip_manifests_dir }}/kube-vip.yaml"
owner: root
group: root
mode: "0644"
become: true
- name: Ensure VIP is present in k3s TLS SANs config
ansible.builtin.blockinfile:
path: /etc/rancher/k3s/config.yaml
create: true
marker: "# {mark} ANSIBLE MANAGED kube-vip TLS SAN"
block: |
tls-san:
- "{{ k3s_vip }}"
become: true
register: tls_san_added
- name: Stop k3s for certificate rotation
ansible.builtin.systemd:
name: k3s
state: stopped
become: true
when: tls_san_added.changed
- name: Rotate k3s certificates to include VIP in SAN
ansible.builtin.command: k3s certificate rotate
become: true
when: tls_san_added.changed
- name: Start k3s after certificate rotation
ansible.builtin.systemd:
name: k3s
state: started
become: true
when: tls_san_added.changed

View File

@@ -0,0 +1,44 @@
apiVersion: v1
kind: ServiceAccount
metadata:
name: kube-vip
namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
annotations:
rbac.authorization.kubernetes.io/autoupdate: "true"
name: system:kube-vip-role
rules:
- apiGroups: [""]
resources: ["services/status"]
verbs: ["update"]
- apiGroups: [""]
resources: ["services", "endpoints"]
verbs: ["list", "get", "watch", "update"]
- apiGroups: [""]
resources: ["nodes"]
verbs: ["list", "get", "watch", "update", "patch"]
- apiGroups: ["coordination.k8s.io"]
resources: ["leases"]
verbs: ["list", "get", "watch", "update", "create"]
- apiGroups: ["discovery.k8s.io"]
resources: ["endpointslices"]
verbs: ["list", "get", "watch", "update"]
- apiGroups: [""]
resources: ["pods"]
verbs: ["list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: system:kube-vip-binding
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: system:kube-vip-role
subjects:
- kind: ServiceAccount
name: kube-vip
namespace: kube-system

View File

@@ -0,0 +1,81 @@
apiVersion: apps/v1
kind: DaemonSet
metadata:
labels:
app.kubernetes.io/name: kube-vip-ds
app.kubernetes.io/version: {{ kube_vip_version }}
name: kube-vip-ds
namespace: kube-system
spec:
selector:
matchLabels:
app.kubernetes.io/name: kube-vip-ds
template:
metadata:
labels:
app.kubernetes.io/name: kube-vip-ds
app.kubernetes.io/version: {{ kube_vip_version }}
spec:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: node-role.kubernetes.io/master
operator: Exists
- matchExpressions:
- key: node-role.kubernetes.io/control-plane
operator: Exists
containers:
- name: kube-vip
image: ghcr.io/kube-vip/kube-vip:{{ kube_vip_version }}
imagePullPolicy: IfNotPresent
args:
- manager
env:
- name: vip_arp
value: "true"
- name: port
value: "6443"
- name: vip_nodename
valueFrom:
fieldRef:
fieldPath: spec.nodeName
- name: vip_interface
value: "{{ kube_vip_interface }}"
- name: vip_cidr
value: "32"
- name: dns_mode
value: first
- name: cp_enable
value: "true"
- name: cp_namespace
value: kube-system
- name: svc_enable
value: "false"
- name: vip_leaderelection
value: "true"
- name: vip_leasename
value: plndr-cp-lock
- name: vip_leaseduration
value: "5"
- name: vip_renewdeadline
value: "3"
- name: vip_retryperiod
value: "1"
- name: address
value: "{{ k3s_vip }}"
- name: prometheus_server
value: :2112
securityContext:
capabilities:
add:
- NET_ADMIN
- NET_RAW
hostNetwork: true
serviceAccountName: kube-vip
tolerations:
- effect: NoSchedule
operator: Exists
- effect: NoExecute
operator: Exists

View File

@@ -0,0 +1,5 @@
---
kube_vip_version: "v0.8.9"
kube_vip_interface: "eth0"
kube_vip_manifests_dir: "/var/lib/rancher/k3s/server/manifests"
kube_vip_static_pod_path: "/var/lib/rancher/k3s/agent/pod-manifests/kube-vip.yaml"

View File

@@ -0,0 +1,15 @@
---
- name: Configure /etc/hosts with Proxmox cluster nodes
ansible.builtin.blockinfile:
path: /etc/hosts
block: |
# Proxmox Cluster Nodes
192.168.20.12 aya01.seyshiro.de aya01
192.168.20.14 lulu.seyshiro.de lulu
192.168.20.28 inko01.seyshiro.de inko01
192.168.20.10 naruto01.seyshiro.de naruto01
192.168.20.9 mii01.seyshiro.de mii01
marker: "# {mark} ANSIBLE MANAGED BLOCK - PROXMOX CLUSTER NODES"
create: true
mode: "644"
when: is_proxmox_node | bool

View File

@@ -6,5 +6,8 @@
state: present state: present
loop: "{{ proxmox_node_dependencies }}" loop: "{{ proxmox_node_dependencies }}"
- name: Configure hosts file for cluster nodes
ansible.builtin.include_tasks: 04_configure_hosts.yaml
- name: Ensure Harware Acceleration on node - name: Ensure Harware Acceleration on node
ansible.builtin.include_tasks: 06_hardware_acceleration.yaml ansible.builtin.include_tasks: 06_hardware_acceleration.yaml

View File

@@ -10,6 +10,7 @@
dest: "{{ proxmox_dirs.isos }}/{{ distro.name }}" dest: "{{ proxmox_dirs.isos }}/{{ distro.name }}"
mode: "0644" mode: "0644"
when: not image_stat.stat.exists when: not image_stat.stat.exists
register: download_result
- name: Set raw image file name fact - name: Set raw image file name fact
ansible.builtin.set_fact: ansible.builtin.set_fact:
@@ -24,5 +25,5 @@
ansible.builtin.command: ansible.builtin.command:
cmd: "qemu-img convert -O raw {{ proxmox_dirs.isos }}/{{ distro.name }} {{ proxmox_dirs.isos }}/{{ raw_image_name }}" cmd: "qemu-img convert -O raw {{ proxmox_dirs.isos }}/{{ distro.name }} {{ proxmox_dirs.isos }}/{{ raw_image_name }}"
when: when:
- download_result is changed or not raw_image_stat.stat.exists - (download_result is defined and download_result is changed) or not raw_image_stat.stat.exists
- image_stat.stat.exists - image_stat.stat.exists

View File

@@ -11,7 +11,7 @@ services:
vm: vm:
- docker-host11 - docker-host11
container_name: jellyfin container_name: jellyfin
image: jellyfin/jellyfin:10.10 image: jellyfin/jellyfin:10.11.7
volumes: volumes:
- name: "Configuration" - name: "Configuration"
internal: /config internal: /config
@@ -41,7 +41,7 @@ services:
vm: vm:
- docker-host11 - docker-host11
container_name: gitea container_name: gitea
image: gitea/gitea:1.23-rootless image: gitea/gitea:1.25.5-rootless
volumes: volumes:
- name: "Configuration" - name: "Configuration"
internal: /etc/gitea internal: /etc/gitea

View File

@@ -1,11 +1,11 @@
$ANSIBLE_VAULT;1.1;AES256 $ANSIBLE_VAULT;1.1;AES256
37356330336365666531353535343930613161663361363461316663396338323932303531376662 64356331353036663336626237373732393636366236326430343435313362333332656639356661
3331346562383135343732386663646463373064643632330a643435313435363138386630303138 3861323465653764303733366430306335303737323863370a393737656163623432363432366430
32616431636532666561306362396137366233623832326365616430313764353639393062336536 32353030303630323438643839363730326365303062653335303130623264613939303037376239
3766616231626131390a396336346465613439613439383465653864663936353930303463373563 3062613036333661300a363633306333373239633233653064343066343162356636373862656136
31323938376230363239323435356438353563346638363734613364646263613139643064313866 62333933353566643166643831313035643034376166316166623835326263376166626235306131
64333131333262383662333362613563656135356433373335646438336339326165626163653338 36393461633962333637636163333532626663316363653131333561653635373037353864353763
64636438373131313339316535653433633637633530386630653966306333336566306438376233 65666665653161383835663631656166346431613435396331356539353231623034623938393836
36383430396332373165386334363833613038633862653439306564366231643939663562316538 33643761303234376162383465383130633335356366393839636665373365623462363239636364
39383134623565363365323165626365393239396438373862313766653562623938373033396265 65343938653062623963666531653861646134633732313764356566633533666232373663633661
3161613463346332643632306561363963323630363630316263 6563396563643334666437353962383535306339663834623666

View File

@@ -2,6 +2,8 @@ k3s:
loadbalancer: loadbalancer:
default_port: 6443 default_port: 6443
k3s_vip: "192.168.20.2"
k3s_primary_server_ip: "{{ groups['k3s_server'] | map('extract', hostvars, 'ansible_default_ipv4') | map(attribute='address') | unique | list | first }}" k3s_primary_server_ip: "{{ groups['k3s_server'] | map('extract', hostvars, 'ansible_default_ipv4') | map(attribute='address') | unique | list | first }}"
k3s_server_ips: "{{ groups['k3s_server'] | map('extract', hostvars, 'ansible_default_ipv4') | map(attribute='address') | unique | list }}" k3s_server_ips: "{{ groups['k3s_server'] | map('extract', hostvars, 'ansible_default_ipv4') | map(attribute='address') | unique | list }}"

View File

@@ -1,66 +1,71 @@
$ANSIBLE_VAULT;1.1;AES256 $ANSIBLE_VAULT;1.1;AES256
38376362633961306438343561623064353761616565636134623630363864373866643232666465 62386239623335326131366161306232306338386166343266396639366430346563393230353537
3830396166373030623732383836366431363338666133360a633065643865323132616133376366 3334343932353765376165623765306135656237636633620a343639316565336366393331616633
31666466613663353431393039386131623837353862336632303832643464366439313734626435 65663732663933323764653566333036343038643230396539633563613263666134373830323064
6365663762313763650a386133396161366134326230383065613432636366626133643732373737 6632643036353263660a333633663232393430666331393936383665366136303737613363656538
63643132623337666131333533346261303933366439393766393533623831666536393632656365 30656562366638643132303331643762383533636238633632643866633637366266356463336464
38383465333139666264623632323939396536363863303932316261303135373330326631353565 38643466376161646533663563343036633130393439366363363736393639306539303062336162
62636662613134313836663238316433663865363332646538643930353862356465373362663430 33346432386532623036323434643737306361396663383136643636316366646233366133653365
35333936336665353238636438656530356434353134653461343661616162616333646562393964 30303238373938393435326330666634656130373932623439366165636135666237353765623034
32363965323366363162353238303430343261356237613735616433303635613161653366656439 39306333363638366331653961386332303765376363363365393363346634396163613730376638
39643833633663616665646233356532313030383535636164653539613533623666356561653736 66373237336362643965626134666634316539313363653864343561336636376461613439333337
37383137633830306233633835613864353561306537373238323034663035363535623431316534 61356662636334333439393264663732373666363266363139306236616532646239386462653461
63663234303838373630646536633563633136363730663832393738326163366634613164653532 36363463383431313636393265663264363331663661326363653533636632643266303064383363
63363264333661646133343431356533306564636465363363653035653965313430363665653265 63666432316562356636656165393438366637633834626530306536633932343736393137623635
30646333666630363136306636623262653361383664393162663463666365643735343835373365 65303236663562396133636539316533393436303239383836396235343566633333306538623066
30663633346531386263303432353662323563633636633465373538313434356535383033366663 61613335666666376637343737656663353961303936656336306566326563303436636331313832
37666336363863646432396562316130303661343462313435373936623636633061393030663139 64643439333235303366303630366139333633343034313038646539653839373562373365396238
32323233306236626635656133366230393030366563383835616238393336643364303563643430 35313764623233366366653266376639393363646238313861343035353233383064643061663861
30333731393962373738336331323639346662646539386561623834313638313636623161313236 62373137613839326537616135313766613136633463623766326132646531376434643065353830
62353335633933313131613130313164626238356134653733386334663461326265666437626366 63656361353937346231636638623838323064376537663164396436613838663436313536663939
65333262343164323966333232626635626339323634383735356536353733363933373935636461 33343061666464316636336565386536656166623836366633303734343764633634386265373064
62646130373431326662663163336361393762346630306363633761396664653633396664626530 39613733666262313732363438623434333466373835666238663331326133343831386433626530
38343135333433356135386539313439383738653561653536633936613338373765366139363134 32613531646630653463616561393039646433313732356662313865633561666263623934313261
35366265656164386335366466393066386232663562316665383363323164316337336630343234 35396537313636343062653134663333303161663861313161613539663138363635313331653663
35623138653735326531643063633062353137663763376532663731313537623337356339633532 66336166303165656362363436363435323664303665633837353638643934363136666637356633
64353239346634626539613032303962333662613765643639313266323462346239623736313863 66626365323030643137613662633465333365333230383938633336663230366365626263613531
31306262626161393862633038363061636362303864616566323065663964323563353034383362 38313839636339613866323162633565636636646435313062663462333832646463366165393161
32373665306664303036646565633830613130353531666264646162366538323366636330663737 34616462363331636534316662623534363636396135323662306439633938303635363836303966
37393238346262363039356536643765346165623666613331356636613630396361346566346633 30636434663531373962323361333366623633393435616438626239323635363064313266636539
37613465663631323530393366363766383136336337656163336431653935613765383634633462 33326435663533323834376536663639306533636436366337633233626132326262613937656262
35656264313739313630343238306439323030346465366337333562373132313564653333353461 35303433636665616538613565646233396135353036613833663135336462393663633931616166
37393635313965633064376239343139343663363633613437373632366139396539363265643731 32353165653636653763656533346234666433643264386466313864336631323534616164386664
30393333393236313033353364316535366664613439623163376163386362376161666334393864 34623865376535363535363534626538373665363761656566643136623738653933653634333134
33303365313564636337613239326233616331623166386562366438626135343961356330643861 61316662366333373466663661656632663662396563346665613662623533333637643135633936
37633239353064366262373130383635653037363037633035663738313739313739613136346332 32663265333966663433343139636262323162666463346438643439386565613364303832623231
33373834623365303036313436373037343763633762363833383865666434363533653632373663 66363962396433336630653935376232326165653533373732306461323733626337336232613562
62343565376631346632663265343335616563356632386166383238663663376632646539383339 33356230623061663964363233353666623862303638356331343731626233373230663463316663
61653533363765356365323139383037643363393539393933643164386362363164396535366231 38633337363065376535663161346464366230363931656139646233306531323136393463343362
66613533353864303766376261326139616538353237383235366261383331653435623637396536 33386432383563633164653434316466333462653031363035626633363666616331353132396633
65353063656561383066666134313039383166386238333438356161646562303866313238623337 66396532616361616230643430316161656665626133343438343231623063383331666262366262
35616166623433333130316565333738613163643166373661316338653236363962616337326633 31393238363139343237646165653666343136336335353761323931646235643333653263666561
34653533373636363464346362643166666532656636363432356261633537633535616562313036 38356537393363326532396564626338363836356138613761653039316431663064386335626136
30326232306561646438316533646636623566313963393563323366626566393936316466303635 37653630393636636631383232663030663239623861346636353834623361333161373331633961
38343439313437653835623538346532633936343662666161353765353366383637613964356466 39353037643033346563343463636666626330666235653533316264366433666561623364313932
34323063633132643135393537393061653261316635643838636262323837613134333936383038 64663964313461386136646161313133306133323531626266656236383365623933396332633963
66373538383735656263633066653566663631643062333139396233363764326230653032353264 62326630366234613236623731383037613133343039346464386437663164326136306136656233
62326264666261346265373062316630326636336132666661383765643637383565363433656464 66626231333131613963303765666163323938346631333031323431613564313163336433366235
37663231326334393734646230646263333137313432343763383662383165373037663838306137 39386434363631646235666631303163623131353734623435336633653337666337386661396635
38363262326165366165313230653265616333663062666134356561356236656561333433323935 63376233393838393533316264326234326564656639343832633666356136653865343234376234
39633337303763383435373532333838656335396662336139343931303431363933306562623635 37643832393931366339643263333962363537353063303561613537623334323432316262373035
66376430306165336233343931653231393633623530663133346161636435646236663465303065 30333862386338313763393863336132393466356566323831653164623762306665623466626531
61303035373937613433396465353732396364393231663331346237373939636233333639316130 64326338393832303233623465326262306433633065646334636130653863343164323661643664
61653739613737303362303263333366383437613537633964663932373035326439313239373439 37383163656464636339653732356334366164373161323533623839613730366238396233333661
64383935383661616164616462363462326661373338323864373634663737313261346632663464 66353435336362393530333235333338383763666561326461636265376561616563343266336264
33656330383133326136373331363161333065323533303762356532656264616632323165323166 64363733313332336335623961653333386163343030336165653763383734333837366234316536
66376339343065633165326662343330306662666164316435383264363833663664613338336535 31396664383965386461333331373261336636306638366666323936396433643263373035633435
36396238653361626666306234373564303037633264306261306133663665373939363865396236 30656234333537633636333163383063376630373234666561373338623332316465623861343866
33353037666162376339366563623832653434396237613064386335323837373636613462363034 66316466363766643863356262393363323838313766373363646636316439313133376464306138
65323663636563366161356665356562313165663262653663636266623661343538666239663230 38323031653765663363393337353939353930653664366164663939613062343833386435303133
33333837396132303033646432373633613135633062353930376232653261333036376338386632 63636235313533356233623738383431366133336430373036643262616435613631353465396263
37383132656361383339663833306163636661373339306138383936306137653961306135363036 62613537313532363065373764373966363036373966393633656237303737356130303366336561
62306162623465646131653966306533646166363665353966623132623765613862353665656538 39373064303938623732623363366333626639343962373361656632653466373462336630306566
66623931353032366635666138356365356364663931636435396363623061366131623166363466 30653538356535386463663637613532396432303063303737653265373037633731663062633662
63356234383338373834353666643036396561643261363236646435333466326464636335386664 36616138653063383435356162353539323265666432396266316264336164636132353937363230
64363965386439393236616135636437386432353361353632333363323536313334313462313934 37366137623335333864653138623533313665663263643732623135356136303534373936336166
6531 39366562363733343730643130633461656136656232303462323533363737333432633961646434
32613639316633326535303863656639313733343932323836663737616534643839366237633666
39363663633464636165353832666533363135383735353639353432386564643261613234356630
63323030613138386164633533353265643730626631326330646435323339363366386633653133
63666362653437653765303831613236623639646635303765633864653464306134666566613836
6339386338616565336163663366316634336230396165666161

View File

@@ -199,3 +199,25 @@ vms:
ciuser: "{{ user }}" ciuser: "{{ user }}"
sshkeys: "{{ pubkey }}" sshkeys: "{{ pubkey }}"
disk_size: 128 disk_size: 128
- name: "k3s-agent22"
node: "mii01"
vmid: 222
cores: 2
memory: 4096
net:
net0: "virtio,bridge=vmbr0,firewall=1"
boot_image: "{{ proxmox_cloud_init_images.debian.name }}"
ciuser: "{{ user }}"
sshkeys: "{{ pubkey }}"
disk_size: 128
- name: "k3s-agent23"
node: "naruto01"
vmid: 223
cores: 2
memory: 4096
net:
net0: "virtio,bridge=vmbr0,firewall=1"
boot_image: "{{ proxmox_cloud_init_images.debian.name }}"
ciuser: "{{ user }}"
sshkeys: "{{ pubkey }}"
disk_size: 128

View File

@@ -24,6 +24,8 @@ k3s-agent18
k3s-agent19 k3s-agent19
k3s-agent20 k3s-agent20
k3s-agent21 k3s-agent21
k3s-agent22
k3s-agent23
[k3s_loadbalancer] [k3s_loadbalancer]
k3s-loadbalancer k3s-loadbalancer