fix(k3s_server): skip installation if k3s binary already exists

Primary and secondary install tasks now check k3s_status.stat.exists so re-running the playbook is idempotent on already-provisioned nodes.
fix(k3s_server): use VIP address in kubeconfig instead of k3s_server_name
2026-04-27 21:43:42 +02:00 · 2026-04-27 21:41:55 +02:00 · 2026-04-27 21:39:37 +02:00 · 2026-04-27 21:39:28 +02:00 · 2026-04-27 21:35:24 +02:00 · 2026-04-26 12:08:42 +02:00
212 changed files with 7057 additions and 1270 deletions
--- a/.ansible-lint
+++ b/.ansible-lint
@@ -13,6 +13,8 @@ skip_list:
  - fqcn-builtins
  - no-handler
  - var-naming
+  - no-changed-when
+  - risky-shell-pipe

 # Enforce certain rules that are not enabled by default.
 enable_list:
@@ -25,7 +27,7 @@ enable_list:
  - no-changed-when

 # Offline mode disables any features that require internet access.
-offline: true
+offline: false

 # Set the desired verbosity level.
 verbosity: 1
--- a/.gitattributes
+++ b/.gitattributes
@@ -0,0 +1,8 @@
+vars/group_vars/proxmox/secrets_vm.yml diff=ansible-vault merge=binary
+vars/group_vars/all/secrets.yml diff=ansible-vault merge=binary
+vars/group_vars/docker/secrets.yml diff=ansible-vault merge=binary
+vars/group_vars/k3s/secrets.yml diff=ansible-vault merge=binary
+vars/group_vars/k3s/secrets_token.yml diff=ansible-vault merge=binary
+vars/group_vars/kubernetes/secrets.yml diff=ansible-vault merge=binary
+vars/group_vars/proxmox/secrets.yml diff=ansible-vault merge=binary
+vars/group_vars/proxmox/secrets_vm.yml diff=ansible-vault merge=binary
--- a/.github/workflows/ci.yaml
+++ b/.github/workflows/ci.yaml
@@ -0,0 +1,45 @@
+name: CI
+
+on:
+  push:
+    branches: [main, master]
+  pull_request:
+    branches: [main, master]
+
+jobs:
+  lint:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+
+      - name: Set up Python
+        uses: actions/setup-python@v5
+        with:
+          python-version: '3.11'
+
+      - name: Install dependencies
+        run: |
+          python -m pip install --upgrade pip
+          pip install ansible-lint==6.22.2 ansible-core==2.15.8
+
+      - name: Install Ansible collections
+        run: ansible-galaxy collection install -r requirements.yaml
+
+      - name: Run ansible-lint
+        run: ansible-lint
+
+  pre-commit:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+
+      - name: Set up Python
+        uses: actions/setup-python@v5
+        with:
+          python-version: '3.11'
+
+      - name: Install pre-commit
+        run: pip install pre-commit
+
+      - name: Run pre-commit
+        run: pre-commit run --all-files
--- a/.gitignore
+++ b/.gitignore
@@ -0,0 +1 @@
+.worktrees/
--- a/.pre-commit-config.yaml
+++ b/.pre-commit-config.yaml
@@ -0,0 +1,23 @@
+repos:
+  - repo: https://github.com/pre-commit/pre-commit-hooks
+    rev: v4.6.0
+    hooks:
+      - id: trailing-whitespace
+      - id: end-of-file-fixer
+      - id: check-yaml
+      - id: check-added-large-files
+  - repo: local
+    hooks:
+      - id: ansible-galaxy-install
+        name: Install ansible-galaxy collections
+        entry: ansible-galaxy collection install -r requirements.yaml
+        language: system
+        pass_filenames: false
+        always_run: true
+  - repo: https://github.com/ansible/ansible-lint
+    rev: v6.22.2
+    hooks:
+      - id: ansible-lint
+        files: \.(yaml)$
+        additional_dependencies:
+          - ansible-core==2.15.8
--- a/README.md
+++ b/README.md
@@ -1,87 +1,82 @@
 # TuDatTr IaC

-**I do not recommend this project being used for ones own infrastructure, as
-this project is heavily attuned to my specific host/network setup**
-The Ansible Project to provision fresh Debian VMs for my Proxmox instances.
-Some values are hard coded such as the public key both in
-[./scripts/debian_seed.sh](./scripts/debian_seed.sh) and [./group_vars/all/vars.yml](./group_vars/all/vars.yml).
+**I do not recommend this project being used for one's own infrastructure, as this project is heavily attuned to my specific host/network setup.**

-## Prerequisites
+This Ansible project automates the setup of a K3s Kubernetes cluster on Proxmox VE. It also includes playbooks for configuring Docker hosts, load balancers, and other services.

- [secrets.yml](secrets.yml) in the root directory of this repository.
-  Skeleton file can be found as [./secrets.yml.skeleton](./secrets.yml.skeleton).
- IP Configuration of hosts like in [./host_vars/\*](./host_vars/*)
- Setup [~/.ssh/config](~/.ssh/config) for the respective hosts used.
- Install `passlib` for your operating system. Needed to hash passwords ad-hoc.
+## Repository Structure

-## Improvable Variables
+The repository is organized into the following main directories:

- `group_vars/k3s/vars.yml`:
-  - `k3s.server.ips`: Take list of IPs from host_vars `k3s_server*.yml`.
-  - `k3s_db_connection_string`: Embed this variable in the `k3s.db.`-directory.
-    Currently causes loop.
+- `playbooks/`: Contains the main Ansible playbooks for different setup scenarios.
+- `roles/`: Contains the Ansible roles that are used by the playbooks.
+- `vars/`: Contains variable files, including group-specific variables.

-## Run Playbook
+## Playbooks

-To run a first playbook and test the setup the following command can be executed.
+The following playbooks are available:

+- `proxmox.yml`: Provisions VMs and containers on Proxmox VE.
+- `k3s-servers.yml`: Sets up the K3s master nodes.
+- `k3s-agents.yml`: Sets up the K3s agent nodes.
+- `k3s-loadbalancer.yml`: Configures a load balancer for the K3s cluster.
+- `k3s-storage.yml`: Configures storage for the K3s cluster.
+- `docker.yml`: Sets up Docker hosts and their load balancer.
+- `docker-host.yml`: Configures the docker hosts.
+- `docker-lb.yml`: Configures a load balancer for Docker services.
+- `kubernetes_setup.yml`: A meta-playbook for setting up the entire Kubernetes cluster.
+
+## Roles
+
+The following roles are defined:
+
+- `common`: Common configuration tasks for all nodes.
+- `proxmox`: Manages Proxmox VE, including VM and container creation.
+- `k3s_server`: Installs and configures K3s master nodes.
+- `k3s_agent`: Installs and configures K3s agent nodes.
+- `k3s_loadbalancer`: Configures an Nginx-based load balancer for the K3s cluster.
+- `k3s_storage`: Configures storage solutions for Kubernetes.
+- `docker_host`: Installs and configures Docker.
+- `kubernetes_argocd`: Deploys Argo CD to the Kubernetes cluster.
+- `node_exporter`: Installs the Prometheus Node Exporter for monitoring.
+- `reverse_proxy`: Configures a Caddy-based reverse proxy.
+- `edge_vps`: Placeholder role for Edge VPS configuration.
+
+## Usage
+
+1.  **Install dependencies:**
+
+    ```bash
+    pip install -r requirements.txt
+    ansible-galaxy install -r requirements.yml
+    ```
+
+2.  **Configure variables:**
+
+    - Create an inventory file (e.g., `vars/k3s.ini`).
+    - Adjust variables in `vars/group_vars/` to match your environment.
+
+3.  **Run playbooks:**
+
+    ```bash
+    # To provision VMs on Proxmox
+    ansible-playbook -i vars/proxmox.ini playbooks/proxmox.yml
+
+    # To set up the K3s cluster
+    ansible-playbook -i vars/k3s.ini playbooks/kubernetes_setup.yml
+    ```
+
+## Notes
+
+### Vault Git Diff
+
+This repo has a `.gitattributes` which points at the repos ansible-vault files.
+These can be temporarily decrypted for git diff by adding this in conjunction with the `.gitattributes`:
 ```sh
-ansible-playbook -i production -J k3s-servers.yml
+# https://stackoverflow.com/questions/29937195/how-to-diff-ansible-vault-changes
+git config --global diff.ansible-vault.textconv "ansible-vault view"
 ```

-This will run the [./k3s-servers.yml](./k3s-servers.yml) playbook and execute
-its roles.
+## Disclaimer

-## After successful k3s installation
-
-To access our Kubernetes cluster from our host machine to work on it via
-flux and such we need to manually copy a k3s config from one of our server nodes to our host machine.
-Then we need to install `kubectl` on our host machine and optionally `kubectx` if we're already
-managing other Kubernetes instances.
-Then we replace the localhost address inside of the config with the IP of our load balancer.
-Finally we'll need to set the KUBECONFIG variable.
-
-```sh
-mkdir ~/.kube/
-scp k3s-server00:/etc/rancher/k3s/k3s.yaml ~/.kube/config
-chown $USER ~/.kube/config
-sed -i "s/127.0.0.1/192.168.20.22/" ~/.kube/config
-export KUBECONFIG=~/.kube/config
-```
-
-Install flux and continue in the flux repository.
-
-## Longhorn Nodes
-
-To create longhorn nodes from existing kubernetes nodes we want to increase
-their storage capacity. Since we're using VMs for our k3s nodes we can
-resize the root-disk of the VMs in the proxmox GUI.
-
-Then we have to resize the partitions inside of the VM so the root partition
-uses the newly available space.
-When we have LVM-based root partition we can do the following:
-
-```sh
-# Create a new partition from the free space.
-sudo fdisk /dev/sda
-# echo "n\n\n\n\n\nw\n"
-# n > 5x\n > w > \n
-# Create a LVM volume on the new partition
-sudo pvcreate /dev/sda3
-sudo vgextend  k3s-vg /dev/sda3
-# Use the newly available storage in the root volume
-sudo lvresize -l +100%FREE -r /dev/k3s-vg/root
-```
-
-## Cloud Init VMs
-
-```sh
-# On Hypervisor Host
-qm resize <vmid> scsi0 +32G
-# On VM
-sudo fdisk -l /dev/sda # To check
-echo 1 | sudo tee /sys/class/block/sda/device/rescan
-sudo fdisk -l /dev/sda # To check
-# sudo apt-get install cloud-guest-utils
-sudo growpart /dev/sda 1
-```
+This project is highly customized for the author's specific environment. Using it without modification is not recommended.
--- a/ansible.cfg
+++ b/ansible.cfg
@@ -14,7 +14,7 @@ vault_password_file=/media/veracrypt1/scripts/ansible_vault.sh

 # (list) Check all of these extensions when looking for 'variable' files which should be YAML or JSON or vaulted versions of these.
 # This affects vars_files, include_vars, inventory and vars plugins among others.
-yaml_valid_extensions=.yml
+yaml_valid_extensions=.yaml

 # (boolean) Set this to "False" if you want to avoid host key checking by the underlying tools Ansible uses to connect to the host
 host_key_checking=False
--- a/ansible.cfg.default
+++ b/ansible.cfg.default
@@ -688,4 +688,3 @@

 # (list) default list of tags to skip in your plays, has precedence over Run Tags
 ;skip=
-
--- a/blog.md
+++ b/blog.md
@@ -0,0 +1,69 @@
+---
+title: "Automating My Homelab: From Bare Metal to Kubernetes with Ansible"
+date: 2025-07-27
+author: "TuDatTr"
+tags: ["Ansible", "Proxmox", "Kubernetes", "K3s", "IaC", "Homelab"]
+---
+
+## The Homelab: Repeatable, Automated, and Documented
+
+For many tech enthusiasts, a homelab is a playground for learning, experimenting, and self-hosting services. But as the complexity grows, so does the management overhead. Manually setting up virtual machines, configuring networks, and deploying applications becomes a tedious and error-prone process. This lead me to building my homelab as Infrastructure as Code (IaC) with Ansible.
+
+This blog post walks you through my Ansible project, which automates the entire lifecycle of my homelab—from provisioning VMs on Proxmox to deploying a production-ready K3s Kubernetes cluster.
+
+## Why Ansible?
+
+When I decided to automate my infrastructure, I considered several tools. I chose Ansible for its simplicity, agentless architecture, and gentle learning curve. Writing playbooks in YAML felt declarative and intuitive, and the vast collection of community-supported modules meant I wouldn't have to reinvent the wheel.
+
+## The Architecture: A Multi-Layered Approach
+
+My Ansible project is designed to be modular and scalable, with a clear separation of concerns. It's built around a collection of roles, each responsible for a specific component of the infrastructure.
+
+### Layer 1: Proxmox Provisioning
+
+The foundation of my homelab is Proxmox VE. The `proxmox` role is the first step in the automation pipeline. It handles:
+
+- **VM and Container Creation:** Using a simple YAML definition in my `vars` files, I can specify the number of VMs and containers to create, their resources (CPU, memory, disk), and their base operating system images.
+- **Cloud-Init Integration:** For VMs, I leverage Cloud-Init to perform initial setup, such as setting the hostname, creating users, and injecting SSH keys for Ansible to connect to.
+- **Hardware Passthrough:** The role also configures hardware passthrough for devices like Intel Quick Sync for video transcoding in my media server.
+
+### Layer 2: The K3s Kubernetes Cluster
+
+With the base VMs ready, the next step is to build the Kubernetes cluster. I chose K3s for its lightweight footprint and ease of installation. The setup is divided into several roles:
+
+- `k3s_server`: This role bootstraps the first master node and then adds additional master nodes to create a highly available control plane.
+- `k3s_agent`: This role joins the worker nodes to the cluster.
+- `k3s_loadbalancer`: A dedicated VM running Nginx is set up to act as a load balancer for the K3s API server, ensuring a stable endpoint for `kubectl` and other clients.
+
+### Layer 3: Applications and Services
+
+Once the Kubernetes cluster is up and running, it's time to deploy applications. My project includes roles for:
+
+- `docker_host`: For services that are better suited to run in a traditional Docker environment, this role sets up and configures Docker hosts.
+- `kubernetes_argocd`: I use Argo CD for GitOps-based continuous delivery. This role deploys Argo CD to the cluster and configures it to sync with my application repositories.
+- `reverse_proxy`: Caddy is my reverse proxy of choice, and this role automates its installation and configuration, including obtaining SSL certificates from Let's Encrypt.
+
+## Putting It All Together: The Power of Playbooks
+
+The playbooks in the `playbooks/` directory tie everything together. For example, the `kubernetes_setup.yml` playbook runs all the necessary roles in the correct order to bring up the entire Kubernetes cluster from scratch.
+
+```yaml
+# playbooks/kubernetes_setup.yml
+---
+- name: Set up Kubernetes Cluster
+  hosts: all
+  gather_facts: true
+  roles:
+    - role: k3s_server
+    - role: k3s_agent
+    - role: k3s_loadbalancer
+    - role: kubernetes_argocd
+```
+
+## Final Thoughts and Future Plans
+
+This Ansible project has transformed my homelab from a collection of manually configured machines into a fully automated and reproducible environment. I can now tear down and rebuild my entire infrastructure with a single command, which gives me the confidence to experiment without fear of breaking things.
+
+While the project is highly tailored to my specific needs, I hope this overview provides some inspiration for your own automation journey. The principles of IaC and the power of tools like Ansible can be applied to any environment, big or small.
+
+What's next? I plan to explore more advanced Kubernetes concepts, such as Cilium for networking and policy, and integrate more of my self-hosted services into the GitOps workflow with Argo CD. The homelab is never truly "finished," and that's what makes it so much fun.
--- a/changelog.md
+++ b/changelog.md
@@ -0,0 +1,75 @@
+# Changelog
+
+Technical evolution of the infrastructure stack, tracking the migration from standalone Docker hosts to a fully automated, GitOps-managed Kubernetes cluster.
+
+## Phase 5: GitOps & Cluster Hardening (July 2025 - Present)
+
+*Shifted control plane management to ArgoCD and expanded storage capabilities.*
+
+- **GitOps Implementation**:
+  - Deployed **ArgoCD** in an App-of-Apps pattern to manage cluster state (`89c51aa`).
+  - Integrated **Sealed Secrets** (implied via vault diffs) and **Cert-Manager** for automated TLS management (`76000f8`).
+  - Migrated core services (Traefik, MetalLB) to Helm charts managed via ArgoCD manifests.
+- **Storage Architecture**:
+  - Implemented **Longhorn** with iSCSI support for distributed block storage (`48aec11`).
+  - Added **NFS Provisioner** (`e1a2248`) for ReadWriteMany volumes capabilities.
+- **Networking**:
+  - Centralized primary server IP logic (`97a5d6c`) to support HA control plane capability.
+  - Replaced Netcup DNS webhooks with **Cloudflare** for Caddy ACME challenges (`9cb90a8`).
+- **Observability**:
+  - Added **healthcheck** definitions to Docker Compose services (`0e8e07e`) and K3s probes.
+
+## Phase 4: IaaC Refactoring & Proxmox API Integration (Nov 2024 - June 2025)
+
+*Refactored Ansible roles for modularity and implemented Proxmox API automation for "click-less" provisioning.*
+
+- **Proxmox Automation**:
+  - Developed `roles/proxmox` to interface with Proxmox API: automated VM creation, cloning from templates, and Cloud-Init injection (`f2ea03b`).
+  - Configured **PCI Passthrough** (`591342f`) and hardware acceleration for media transcoding nodes.
+  - Added cron-based VM state reconciliation (`a1da69a`).
+- **Ansible Restructuring**:
+  - **Inventory Refactor**: Moved from root-level inventory files to a hierarchical `vars/` structure (`609e000`).
+  - **Linting Pipeline**: Integrated `ansible-lint` and `pre-commit` hooks (`6eef96b`) to enforce YAML standards and best practices.
+  - **Vault Security**: Configured `.gitattributes` to enable `ansible-vault view` for cleartext diffs in git (`c3905ed`).
+- **Identity Management**:
+  - Deployed **Keycloak** (`42196a3`) for OIDC/SAML authentication across the stack.
+
+## Phase 3: The Kubernetes Migration (Sep 2024 - Oct 2024)
+
+*Architectural pivot from Docker Compose to K3s.*
+
+- **Control Plane Setup**:
+  - Bootstrapped **K3s** cluster with dedicated server/agent split.
+  - Configured **HAProxy/Nginx** load balancers (`51a49d0`) for API server high availability.
+- **Node Provisioning**:
+  - Standardized node bootstrapping (kernel modules, sysctl params) for K3s compatibility.
+  - Deployed specialized storage nodes for Longhorn (`7d58de9`).
+- **Decommissioning**:
+  - Drained and removed legacy Docker hosts (`0aed818`).
+  - Migrated stateful workloads (Postgres) to cluster-managed resources.
+
+## Phase 2: Docker Service Expansion (2023 - 2024)
+
+*Vertical scaling of Docker hosts and introduction of the monitoring stack.*
+
+- **Service Stack**:
+  - Deployed the **\*arr suite** (Sonarr, Radarr, etc.) and Jellyfin with hardware mapping (`3d7f143`).
+  - Integrated **Paperless-ngx** with Redis and Tika consumption (`3f88065`).
+  - Self-hosted **Gitea** and **GitLab** (later removed) for source control.
+- **Observability V1**:
+  - Deployed **Prometheus** and **Grafana** stack (`b3ae5ef`).
+  - Added **Node Exporter** and **SmartCTL Exporter** (`0a361d9`) to bare metal hosts.
+  - Implemented **Uptime Kuma** for external availability monitoring.
+- **Reverse Proxy**:
+  - Transitioned ingress from Traefik v2 to **Nginx Proxy Manager**, then to **Caddy** for simpler configuration management (`a9af3c7`, `1a1b8cb`).
+
+## Phase 1: Genesis & Networking (Late 2022)
+
+*Initial infrastructure bring-up.*
+
+- **Base Configuration**:
+  - Established Ansible role structure for baseline system hardening (SSH, users, packages).
+  - Configured **Wireguard** mesh for secure inter-node communication (`2ba4259`).
+  - Set up **Backblaze B2** offsite backups via Restic/Rclone (`b371e24`).
+- **Network**:
+  - Experimented with **macvlan** Docker networks for direct container IP assignment.
--- a/docs/plans/2026-03-01-proxmox-cluster-debugging-design.md
+++ b/docs/plans/2026-03-01-proxmox-cluster-debugging-design.md
@@ -0,0 +1,750 @@
+# Proxmox Cluster Debugging Plan
+
+## Overview
+This document outlines the plan to debug the Proxmox cluster issue where nodes `mii01` and `naruto01` are showing up with `?` in the Web UI, indicating a potential version mismatch.
+
+## Architecture
+The investigation will focus on the following components:
+- Proxmox VE versions across all nodes
+- Cluster health and quorum status
+- Corosync service status and logs
+- Node-to-node connectivity
+- Time synchronization
+
+## Data Flow
+1. **Version Check:** Verify Proxmox VE versions on all nodes.
+2. **Cluster Health:** Check cluster status and quorum.
+3. **Corosync Logs:** Analyze Corosync logs for errors.
+4. **Connectivity:** Verify network connectivity between nodes.
+5. **Time Synchronization:** Ensure time is synchronized across all nodes.
+
+## Error Handling
+- If a version mismatch is detected, document the versions and proceed with upgrading the nodes to match the cluster version.
+- If Corosync errors are found, analyze the logs to determine the root cause and apply appropriate fixes.
+- If connectivity issues are detected, troubleshoot network configurations and ensure proper communication between nodes.
+
+## Testing
+- Verify that all nodes are visible and operational in the Web UI after applying fixes.
+- Ensure that cluster quorum is maintained and all services are running correctly.
+
+## Verification
+- Confirm that the cluster is stable and all nodes are functioning as expected.
+- Document any changes made and the steps taken to resolve the issue.
+
+## Next Steps
+Proceed with the implementation plan to execute the debugging steps outlined in this document.
+## Findings
+
+The investigation revealed several critical issues:
+
+1. **Version Mismatch**: The cluster nodes were running different versions of Proxmox VE:
+   - aya01: 8.1.4 (kernel 6.5.11-8-pve)
+   - lulu: 8.2.2 (kernel 6.8.4-2-pve)
+   - inko01: 8.4.0 (kernel 6.8.12-9-pve)
+   - naruto01: 8.4.0 (kernel 6.8.12-9-pve)
+   - mii01: 9.0.3 (kernel 6.14.8-2-pve)
+
+2. **Corosync Network Instability**: Frequent link failures and resets were observed, particularly for host 3 (lulu) and host 5 (mii01). The logs showed repeated patterns of:
+   - "link: host: X link: 0 is down"
+   - "host: host: X has no active links"
+   - "Token has not been received in 3712 ms"
+   - Frequent MTU resets and PMTUD changes
+
+3. **Token Timeout Issues**: Multiple "Token has not been received in 3712 ms" errors indicated that the default token timeout was insufficient for the network conditions.
+
+## Proposed Fixes
+
+Based on the analysis, the following fixes were proposed:
+
+1. **Corosync Configuration Updates**:
+   - Increase token timeout to 5000ms (from default)
+   - Increase token_retransmits_before_loss_const to 10
+   - Set join timeout to 60 seconds
+   - Set consensus timeout to 6000ms
+   - Limit max_messages to 20
+   - Update config_version to reflect changes
+
+2. **Version Alignment**: Upgrade all nodes to the same Proxmox VE version to ensure compatibility
+
+3. **Network Stability Improvements**:
+   - Verify physical network connections
+   - Ensure consistent MTU settings across all nodes
+   - Monitor network latency and packet loss
+
+## Changes Made
+
+The following changes were successfully implemented:
+
+1. **Corosync Configuration**: Updated `/etc/pve/corosync.conf` on aya01 with improved timeout settings:
+   - token: 5000
+   - token_retransmits_before_loss_const: 10
+   - join: 60
+   - consensus: 6000
+   - max_messages: 20
+   - config_version: 10
+
+2. **Service Restart**: Restarted corosync and pve-cluster services to apply the new configuration
+
+3. **Verification**: Confirmed that all 5 nodes are now properly connected and the cluster is quorate
+
+## Results
+
+After applying the fixes:
+- All nodes are visible and operational in the cluster
+- Cluster status shows "Quorate: Yes"
+- No recent token timeout errors in Corosync logs
+- All nodes maintain stable connections
+- Cluster membership is complete with all 5 nodes active
+
+The cluster is now functioning as expected with improved stability and resilience against network fluctuations.
+## Findings
+
+
+## Proposed Fixes
+
+
+## Changes Made
+
+Cluster Debugging Findings:
+Proxmox VE Versions:
+
+Cluster Status:
+
+Node Membership:
+
+Corosync Logs:
+
+Time Synchronization:
+               Local time: Sun 2026-03-01 20:50:58 CET
+           Universal time: Sun 2026-03-01 19:50:58 UTC
+                 RTC time: Sun 2026-03-01 19:50:58
+                Time zone: Europe/Berlin (CET, +0100)
+System clock synchronized: yes
+              NTP service: active
+          RTC in local TZ: no
+               Local time: Sun 2026-03-01 20:50:58 CET
+           Universal time: Sun 2026-03-01 19:50:58 UTC
+                 RTC time: Sun 2026-03-01 19:50:58
+                Time zone: Europe/Berlin (CET, +0100)
+System clock synchronized: yes
+              NTP service: active
+          RTC in local TZ: no
+Feb 27 14:39:13 aya01 corosync[1049]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
+Feb 27 14:39:13 aya01 corosync[1049]:   [KNET  ] host: host: 3 has no active links
+Feb 27 14:39:14 aya01 corosync[1049]:   [KNET  ] link: Resetting MTU for link 0 because host 3 joined
+Feb 27 14:39:14 aya01 corosync[1049]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
+Feb 27 14:39:14 aya01 corosync[1049]:   [KNET  ] pmtud: Global data MTU changed to: 1397
+Feb 27 14:57:21 aya01 corosync[1049]:   [KNET  ] link: host: 3 link: 0 is down
+Feb 27 14:57:21 aya01 corosync[1049]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
+Feb 27 14:57:21 aya01 corosync[1049]:   [KNET  ] host: host: 3 has no active links
+Feb 27 14:57:24 aya01 corosync[1049]:   [KNET  ] rx: host: 3 link: 0 is up
+Feb 27 14:57:24 aya01 corosync[1049]:   [KNET  ] link: Resetting MTU for link 0 because host 3 joined
+Feb 27 14:57:24 aya01 corosync[1049]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
+Feb 27 14:57:24 aya01 corosync[1049]:   [KNET  ] pmtud: Global data MTU changed to: 1397
+Feb 27 15:48:27 aya01 corosync[1049]:   [KNET  ] link: host: 3 link: 0 is down
+Feb 27 15:48:27 aya01 corosync[1049]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
+Feb 27 15:48:27 aya01 corosync[1049]:   [KNET  ] host: host: 3 has no active links
+Feb 27 15:48:29 aya01 corosync[1049]:   [KNET  ] link: Resetting MTU for link 0 because host 3 joined
+Feb 27 15:48:29 aya01 corosync[1049]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
+Feb 27 15:48:29 aya01 corosync[1049]:   [KNET  ] pmtud: Global data MTU changed to: 1397
+Feb 27 18:46:04 aya01 corosync[1049]:   [TOTEM ] Retransmit List: 48a1b
+Feb 27 19:03:17 aya01 corosync[1049]:   [KNET  ] link: host: 3 link: 0 is down
+Feb 27 19:03:17 aya01 corosync[1049]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
+Feb 27 19:03:17 aya01 corosync[1049]:   [KNET  ] host: host: 3 has no active links
+Feb 27 19:03:20 aya01 corosync[1049]:   [KNET  ] rx: host: 3 link: 0 is up
+Feb 27 19:03:20 aya01 corosync[1049]:   [KNET  ] link: Resetting MTU for link 0 because host 3 joined
+Feb 27 19:03:20 aya01 corosync[1049]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
+Feb 27 19:03:20 aya01 corosync[1049]:   [KNET  ] pmtud: Global data MTU changed to: 1397
+Feb 27 19:41:49 aya01 corosync[1049]:   [KNET  ] link: host: 3 link: 0 is down
+Feb 27 19:41:49 aya01 corosync[1049]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
+Feb 27 19:41:49 aya01 corosync[1049]:   [KNET  ] host: host: 3 has no active links
+Feb 27 19:41:50 aya01 corosync[1049]:   [KNET  ] link: Resetting MTU for link 0 because host 3 joined
+Feb 27 19:41:50 aya01 corosync[1049]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
+Feb 27 19:41:51 aya01 corosync[1049]:   [KNET  ] pmtud: Global data MTU changed to: 1397
+Feb 27 20:12:44 aya01 corosync[1049]:   [KNET  ] link: host: 3 link: 0 is down
+Feb 27 20:12:44 aya01 corosync[1049]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
+Feb 27 20:12:44 aya01 corosync[1049]:   [KNET  ] host: host: 3 has no active links
+Feb 27 20:12:47 aya01 corosync[1049]:   [KNET  ] rx: host: 3 link: 0 is up
+Feb 27 20:12:47 aya01 corosync[1049]:   [KNET  ] link: Resetting MTU for link 0 because host 3 joined
+Feb 27 20:12:47 aya01 corosync[1049]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
+Feb 27 20:12:47 aya01 corosync[1049]:   [KNET  ] pmtud: Global data MTU changed to: 1397
+Feb 27 20:19:21 aya01 corosync[1049]:   [KNET  ] link: host: 3 link: 0 is down
+Feb 27 20:19:21 aya01 corosync[1049]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
+Feb 27 20:19:21 aya01 corosync[1049]:   [KNET  ] host: host: 3 has no active links
+Feb 27 20:19:24 aya01 corosync[1049]:   [KNET  ] rx: host: 3 link: 0 is up
+Feb 27 20:19:24 aya01 corosync[1049]:   [KNET  ] link: Resetting MTU for link 0 because host 3 joined
+Feb 27 20:19:24 aya01 corosync[1049]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
+Feb 27 20:19:24 aya01 corosync[1049]:   [KNET  ] pmtud: Global data MTU changed to: 1397
+Feb 27 21:40:33 aya01 corosync[1049]:   [KNET  ] link: host: 3 link: 0 is down
+Feb 27 21:40:33 aya01 corosync[1049]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
+Feb 27 21:40:33 aya01 corosync[1049]:   [KNET  ] host: host: 3 has no active links
+Feb 27 21:40:33 aya01 corosync[1049]:   [KNET  ] link: Resetting MTU for link 0 because host 3 joined
+Feb 27 21:40:33 aya01 corosync[1049]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
+Feb 27 21:40:33 aya01 corosync[1049]:   [KNET  ] pmtud: Global data MTU changed to: 1397
+Feb 27 21:42:58 aya01 corosync[1049]:   [KNET  ] link: host: 3 link: 0 is down
+Feb 27 21:42:58 aya01 corosync[1049]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
+Feb 27 21:42:58 aya01 corosync[1049]:   [KNET  ] host: host: 3 has no active links
+Feb 27 21:43:00 aya01 corosync[1049]:   [KNET  ] rx: host: 3 link: 0 is up
+Feb 27 21:43:00 aya01 corosync[1049]:   [KNET  ] link: Resetting MTU for link 0 because host 3 joined
+Feb 27 21:43:00 aya01 corosync[1049]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
+Feb 27 21:43:00 aya01 corosync[1049]:   [KNET  ] pmtud: Global data MTU changed to: 1397
+Feb 27 21:49:55 aya01 corosync[1049]:   [KNET  ] link: host: 3 link: 0 is down
+Feb 27 21:49:55 aya01 corosync[1049]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
+Feb 27 21:49:55 aya01 corosync[1049]:   [KNET  ] host: host: 3 has no active links
+Feb 27 21:49:57 aya01 corosync[1049]:   [KNET  ] rx: host: 3 link: 0 is up
+Feb 27 21:49:57 aya01 corosync[1049]:   [KNET  ] link: Resetting MTU for link 0 because host 3 joined
+Feb 27 21:49:57 aya01 corosync[1049]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
+Feb 27 21:49:57 aya01 corosync[1049]:   [KNET  ] pmtud: Global data MTU changed to: 1397
+Feb 27 22:53:39 aya01 corosync[1049]:   [KNET  ] link: host: 3 link: 0 is down
+Feb 27 22:53:39 aya01 corosync[1049]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
+Feb 27 22:53:39 aya01 corosync[1049]:   [KNET  ] host: host: 3 has no active links
+Feb 27 22:53:40 aya01 corosync[1049]:   [KNET  ] link: Resetting MTU for link 0 because host 3 joined
+Feb 27 22:53:40 aya01 corosync[1049]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
+Feb 27 22:53:40 aya01 corosync[1049]:   [KNET  ] pmtud: Global data MTU changed to: 1397
+Feb 27 23:04:51 aya01 corosync[1049]:   [KNET  ] link: host: 3 link: 0 is down
+Feb 27 23:04:51 aya01 corosync[1049]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
+Feb 27 23:04:51 aya01 corosync[1049]:   [KNET  ] host: host: 3 has no active links
+Feb 27 23:04:54 aya01 corosync[1049]:   [KNET  ] rx: host: 3 link: 0 is up
+Feb 27 23:04:54 aya01 corosync[1049]:   [KNET  ] link: Resetting MTU for link 0 because host 3 joined
+Feb 27 23:04:54 aya01 corosync[1049]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
+Feb 27 23:04:54 aya01 corosync[1049]:   [KNET  ] pmtud: Global data MTU changed to: 1397
+Feb 28 00:18:24 aya01 corosync[1049]:   [TOTEM ] Retransmit List: 5d988
+Feb 28 00:18:24 aya01 corosync[1049]:   [TOTEM ] Retransmit List: 5d989
+Feb 28 00:18:24 aya01 corosync[1049]:   [TOTEM ] Retransmit List: 5d98a
+Feb 28 00:18:24 aya01 corosync[1049]:   [TOTEM ] Retransmit List: 5d98b
+Feb 28 00:18:26 aya01 corosync[1049]:   [TOTEM ] Retransmit List: 5d98c
+Feb 28 00:18:26 aya01 corosync[1049]:   [TOTEM ] Retransmit List: 5d98d
+Feb 28 00:53:03 aya01 corosync[1049]:   [KNET  ] link: host: 3 link: 0 is down
+Feb 28 00:53:03 aya01 corosync[1049]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
+Feb 28 00:53:03 aya01 corosync[1049]:   [KNET  ] host: host: 3 has no active links
+Feb 28 00:53:03 aya01 corosync[1049]:   [KNET  ] link: Resetting MTU for link 0 because host 3 joined
+Feb 28 00:53:03 aya01 corosync[1049]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
+Feb 28 00:53:03 aya01 corosync[1049]:   [KNET  ] pmtud: Global data MTU changed to: 1397
+Feb 28 01:36:27 aya01 corosync[1049]:   [KNET  ] link: host: 3 link: 0 is down
+Feb 28 01:36:27 aya01 corosync[1049]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
+Feb 28 01:36:27 aya01 corosync[1049]:   [KNET  ] host: host: 3 has no active links
+Feb 28 01:36:29 aya01 corosync[1049]:   [KNET  ] rx: host: 3 link: 0 is up
+Feb 28 01:36:29 aya01 corosync[1049]:   [KNET  ] link: Resetting MTU for link 0 because host 3 joined
+Feb 28 01:36:29 aya01 corosync[1049]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
+Feb 28 01:36:29 aya01 corosync[1049]:   [KNET  ] pmtud: Global data MTU changed to: 1397
+Feb 28 03:20:45 aya01 corosync[1049]:   [KNET  ] link: host: 3 link: 0 is down
+Feb 28 03:20:45 aya01 corosync[1049]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
+Feb 28 03:20:45 aya01 corosync[1049]:   [KNET  ] host: host: 3 has no active links
+Feb 28 03:20:45 aya01 corosync[1049]:   [KNET  ] link: Resetting MTU for link 0 because host 3 joined
+Feb 28 03:20:45 aya01 corosync[1049]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
+Feb 28 03:20:45 aya01 corosync[1049]:   [KNET  ] pmtud: Global data MTU changed to: 1397
+Feb 28 05:47:56 aya01 corosync[1049]:   [KNET  ] link: host: 3 link: 0 is down
+Feb 28 05:47:56 aya01 corosync[1049]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
+Feb 28 05:47:56 aya01 corosync[1049]:   [KNET  ] host: host: 3 has no active links
+Feb 28 05:47:57 aya01 corosync[1049]:   [KNET  ] link: Resetting MTU for link 0 because host 3 joined
+Feb 28 05:47:57 aya01 corosync[1049]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
+Feb 28 05:47:57 aya01 corosync[1049]:   [KNET  ] pmtud: Global data MTU changed to: 1397
+Feb 28 05:57:03 aya01 corosync[1049]:   [KNET  ] link: host: 3 link: 0 is down
+Feb 28 05:57:03 aya01 corosync[1049]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
+Feb 28 05:57:03 aya01 corosync[1049]:   [KNET  ] host: host: 3 has no active links
+Feb 28 05:57:04 aya01 corosync[1049]:   [KNET  ] link: Resetting MTU for link 0 because host 3 joined
+Feb 28 05:57:04 aya01 corosync[1049]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
+Feb 28 05:57:04 aya01 corosync[1049]:   [KNET  ] pmtud: Global data MTU changed to: 1397
+Feb 28 06:10:35 aya01 corosync[1049]:   [KNET  ] link: host: 3 link: 0 is down
+Feb 28 06:10:35 aya01 corosync[1049]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
+Feb 28 06:10:35 aya01 corosync[1049]:   [KNET  ] host: host: 3 has no active links
+Feb 28 06:10:37 aya01 corosync[1049]:   [KNET  ] link: Resetting MTU for link 0 because host 3 joined
+Feb 28 06:10:37 aya01 corosync[1049]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
+Feb 28 06:10:37 aya01 corosync[1049]:   [KNET  ] pmtud: Global data MTU changed to: 1397
+Feb 28 07:09:26 aya01 corosync[1049]:   [KNET  ] link: host: 3 link: 0 is down
+Feb 28 07:09:26 aya01 corosync[1049]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
+Feb 28 07:09:26 aya01 corosync[1049]:   [KNET  ] host: host: 3 has no active links
+Feb 28 07:09:26 aya01 corosync[1049]:   [KNET  ] link: Resetting MTU for link 0 because host 3 joined
+Feb 28 07:09:26 aya01 corosync[1049]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
+Feb 28 07:09:26 aya01 corosync[1049]:   [KNET  ] pmtud: Global data MTU changed to: 1397
+Feb 28 07:38:11 aya01 corosync[1049]:   [KNET  ] link: host: 3 link: 0 is down
+Feb 28 07:38:11 aya01 corosync[1049]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
+Feb 28 07:38:11 aya01 corosync[1049]:   [KNET  ] host: host: 3 has no active links
+Feb 28 07:38:11 aya01 corosync[1049]:   [KNET  ] link: Resetting MTU for link 0 because host 3 joined
+Feb 28 07:38:11 aya01 corosync[1049]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
+Feb 28 07:38:11 aya01 corosync[1049]:   [KNET  ] pmtud: Global data MTU changed to: 1397
+Feb 28 08:00:03 aya01 corosync[1049]:   [KNET  ] link: host: 3 link: 0 is down
+Feb 28 08:00:03 aya01 corosync[1049]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
+Feb 28 08:00:03 aya01 corosync[1049]:   [KNET  ] host: host: 3 has no active links
+Feb 28 08:00:03 aya01 corosync[1049]:   [KNET  ] link: Resetting MTU for link 0 because host 3 joined
+Feb 28 08:00:03 aya01 corosync[1049]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
+Feb 28 08:00:04 aya01 corosync[1049]:   [KNET  ] pmtud: Global data MTU changed to: 1397
+Feb 28 08:23:05 aya01 corosync[1049]:   [KNET  ] link: host: 3 link: 0 is down
+Feb 28 08:23:05 aya01 corosync[1049]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
+Feb 28 08:23:05 aya01 corosync[1049]:   [KNET  ] host: host: 3 has no active links
+Feb 28 08:23:08 aya01 corosync[1049]:   [KNET  ] rx: host: 3 link: 0 is up
+Feb 28 08:23:08 aya01 corosync[1049]:   [KNET  ] link: Resetting MTU for link 0 because host 3 joined
+Feb 28 08:23:08 aya01 corosync[1049]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
+Feb 28 08:23:08 aya01 corosync[1049]:   [KNET  ] pmtud: Global data MTU changed to: 1397
+Feb 28 08:36:41 aya01 corosync[1049]:   [KNET  ] link: host: 3 link: 0 is down
+Feb 28 08:36:41 aya01 corosync[1049]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
+Feb 28 08:36:41 aya01 corosync[1049]:   [KNET  ] host: host: 3 has no active links
+Feb 28 08:36:41 aya01 corosync[1049]:   [KNET  ] link: Resetting MTU for link 0 because host 3 joined
+Feb 28 08:36:41 aya01 corosync[1049]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
+Feb 28 08:36:42 aya01 corosync[1049]:   [KNET  ] pmtud: Global data MTU changed to: 1397
+Feb 28 08:45:39 aya01 corosync[1049]:   [KNET  ] link: host: 3 link: 0 is down
+Feb 28 08:45:39 aya01 corosync[1049]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
+Feb 28 08:45:39 aya01 corosync[1049]:   [KNET  ] host: host: 3 has no active links
+Feb 28 08:45:42 aya01 corosync[1049]:   [KNET  ] link: Resetting MTU for link 0 because host 3 joined
+Feb 28 08:45:42 aya01 corosync[1049]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
+Feb 28 08:45:42 aya01 corosync[1049]:   [KNET  ] pmtud: Global data MTU changed to: 1397
+Feb 28 09:23:56 aya01 corosync[1049]:   [KNET  ] link: host: 3 link: 0 is down
+Feb 28 09:23:56 aya01 corosync[1049]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
+Feb 28 09:23:56 aya01 corosync[1049]:   [KNET  ] host: host: 3 has no active links
+Feb 28 09:23:58 aya01 corosync[1049]:   [KNET  ] rx: host: 3 link: 0 is up
+Feb 28 09:23:58 aya01 corosync[1049]:   [KNET  ] link: Resetting MTU for link 0 because host 3 joined
+Feb 28 09:23:58 aya01 corosync[1049]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
+Feb 28 09:23:58 aya01 corosync[1049]:   [KNET  ] pmtud: Global data MTU changed to: 1397
+Feb 28 09:34:48 aya01 corosync[1049]:   [KNET  ] link: host: 3 link: 0 is down
+Feb 28 09:34:48 aya01 corosync[1049]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
+Feb 28 09:34:48 aya01 corosync[1049]:   [KNET  ] host: host: 3 has no active links
+Feb 28 09:34:51 aya01 corosync[1049]:   [KNET  ] rx: host: 3 link: 0 is up
+Feb 28 09:34:51 aya01 corosync[1049]:   [KNET  ] link: Resetting MTU for link 0 because host 3 joined
+Feb 28 09:34:51 aya01 corosync[1049]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
+Feb 28 09:34:51 aya01 corosync[1049]:   [KNET  ] pmtud: Global data MTU changed to: 1397
+Feb 28 09:54:09 aya01 corosync[1049]:   [KNET  ] link: host: 3 link: 0 is down
+Feb 28 09:54:09 aya01 corosync[1049]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
+Feb 28 09:54:09 aya01 corosync[1049]:   [KNET  ] host: host: 3 has no active links
+Feb 28 09:54:11 aya01 corosync[1049]:   [KNET  ] rx: host: 3 link: 0 is up
+Feb 28 09:54:11 aya01 corosync[1049]:   [KNET  ] link: Resetting MTU for link 0 because host 3 joined
+Feb 28 09:54:11 aya01 corosync[1049]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
+Feb 28 09:54:11 aya01 corosync[1049]:   [KNET  ] pmtud: Global data MTU changed to: 1397
+Feb 28 10:18:51 aya01 corosync[1049]:   [KNET  ] link: host: 3 link: 0 is down
+Feb 28 10:18:51 aya01 corosync[1049]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
+Feb 28 10:18:51 aya01 corosync[1049]:   [KNET  ] host: host: 3 has no active links
+Feb 28 10:18:52 aya01 corosync[1049]:   [KNET  ] link: Resetting MTU for link 0 because host 3 joined
+Feb 28 10:18:52 aya01 corosync[1049]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
+Feb 28 10:18:52 aya01 corosync[1049]:   [KNET  ] pmtud: Global data MTU changed to: 1397
+Feb 28 10:31:07 aya01 corosync[1049]:   [KNET  ] link: host: 3 link: 0 is down
+Feb 28 10:31:07 aya01 corosync[1049]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
+Feb 28 10:31:07 aya01 corosync[1049]:   [KNET  ] host: host: 3 has no active links
+Feb 28 10:31:09 aya01 corosync[1049]:   [KNET  ] rx: host: 3 link: 0 is up
+Feb 28 10:31:09 aya01 corosync[1049]:   [KNET  ] link: Resetting MTU for link 0 because host 3 joined
+Feb 28 10:31:09 aya01 corosync[1049]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
+Feb 28 10:31:09 aya01 corosync[1049]:   [KNET  ] pmtud: Global data MTU changed to: 1397
+Feb 28 12:25:03 aya01 corosync[1049]:   [KNET  ] link: host: 3 link: 0 is down
+Feb 28 12:25:03 aya01 corosync[1049]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
+Feb 28 12:25:03 aya01 corosync[1049]:   [KNET  ] host: host: 3 has no active links
+Feb 28 12:25:05 aya01 corosync[1049]:   [KNET  ] link: Resetting MTU for link 0 because host 3 joined
+Feb 28 12:25:05 aya01 corosync[1049]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
+Feb 28 12:25:05 aya01 corosync[1049]:   [KNET  ] pmtud: Global data MTU changed to: 1397
+Feb 28 12:38:06 aya01 corosync[1049]:   [TOTEM ] Retransmit List: 8c34b
+Feb 28 12:38:06 aya01 corosync[1049]:   [TOTEM ] Retransmit List: 8c34e
+Feb 28 12:38:06 aya01 corosync[1049]:   [TOTEM ] Retransmit List: 8c355
+Feb 28 14:39:02 aya01 corosync[1049]:   [TOTEM ] Token has not been received in 3712 ms
+Feb 28 18:31:43 aya01 corosync[1049]:   [KNET  ] link: host: 3 link: 0 is down
+Feb 28 18:31:43 aya01 corosync[1049]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
+Feb 28 18:31:43 aya01 corosync[1049]:   [KNET  ] host: host: 3 has no active links
+Feb 28 18:31:43 aya01 corosync[1049]:   [KNET  ] link: Resetting MTU for link 0 because host 3 joined
+Feb 28 18:31:43 aya01 corosync[1049]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
+Feb 28 18:31:43 aya01 corosync[1049]:   [KNET  ] pmtud: Global data MTU changed to: 1397
+Feb 28 19:45:51 aya01 corosync[1049]:   [KNET  ] link: host: 3 link: 0 is down
+Feb 28 19:45:51 aya01 corosync[1049]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
+Feb 28 19:45:51 aya01 corosync[1049]:   [KNET  ] host: host: 3 has no active links
+Feb 28 19:45:53 aya01 corosync[1049]:   [KNET  ] rx: host: 3 link: 0 is up
+Feb 28 19:45:53 aya01 corosync[1049]:   [KNET  ] link: Resetting MTU for link 0 because host 3 joined
+Feb 28 19:45:53 aya01 corosync[1049]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
+Feb 28 19:45:53 aya01 corosync[1049]:   [KNET  ] pmtud: Global data MTU changed to: 1397
+Feb 28 20:22:47 aya01 corosync[1049]:   [KNET  ] link: host: 3 link: 0 is down
+Feb 28 20:22:47 aya01 corosync[1049]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
+Feb 28 20:22:47 aya01 corosync[1049]:   [KNET  ] host: host: 3 has no active links
+Feb 28 20:22:47 aya01 corosync[1049]:   [KNET  ] link: Resetting MTU for link 0 because host 3 joined
+Feb 28 20:22:47 aya01 corosync[1049]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
+Feb 28 20:22:47 aya01 corosync[1049]:   [KNET  ] pmtud: Global data MTU changed to: 1397
+Feb 28 21:26:43 aya01 corosync[1049]:   [KNET  ] link: host: 3 link: 0 is down
+Feb 28 21:26:43 aya01 corosync[1049]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
+Feb 28 21:26:43 aya01 corosync[1049]:   [KNET  ] host: host: 3 has no active links
+Feb 28 21:26:45 aya01 corosync[1049]:   [KNET  ] rx: host: 3 link: 0 is up
+Feb 28 21:26:45 aya01 corosync[1049]:   [KNET  ] link: Resetting MTU for link 0 because host 3 joined
+Feb 28 21:26:45 aya01 corosync[1049]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
+Feb 28 21:26:45 aya01 corosync[1049]:   [KNET  ] pmtud: Global data MTU changed to: 1397
+Feb 28 21:50:41 aya01 corosync[1049]:   [KNET  ] link: host: 3 link: 0 is down
+Feb 28 21:50:41 aya01 corosync[1049]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
+Feb 28 21:50:41 aya01 corosync[1049]:   [KNET  ] host: host: 3 has no active links
+Feb 28 21:50:43 aya01 corosync[1049]:   [KNET  ] rx: host: 3 link: 0 is up
+Feb 28 21:50:43 aya01 corosync[1049]:   [KNET  ] link: Resetting MTU for link 0 because host 3 joined
+Feb 28 21:50:43 aya01 corosync[1049]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
+Feb 28 21:50:44 aya01 corosync[1049]:   [KNET  ] pmtud: Global data MTU changed to: 1397
+Feb 28 22:02:38 aya01 corosync[1049]:   [TOTEM ] Retransmit List: b0004
+Feb 28 22:46:07 aya01 corosync[1049]:   [KNET  ] link: host: 3 link: 0 is down
+Feb 28 22:46:07 aya01 corosync[1049]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
+Feb 28 22:46:07 aya01 corosync[1049]:   [KNET  ] host: host: 3 has no active links
+Feb 28 22:46:09 aya01 corosync[1049]:   [KNET  ] rx: host: 3 link: 0 is up
+Feb 28 22:46:09 aya01 corosync[1049]:   [KNET  ] link: Resetting MTU for link 0 because host 3 joined
+Feb 28 22:46:09 aya01 corosync[1049]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
+Feb 28 22:46:09 aya01 corosync[1049]:   [KNET  ] pmtud: Global data MTU changed to: 1397
+Mar 01 00:26:09 aya01 corosync[1049]:   [KNET  ] link: host: 3 link: 0 is down
+Mar 01 00:26:09 aya01 corosync[1049]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
+Mar 01 00:26:09 aya01 corosync[1049]:   [KNET  ] host: host: 3 has no active links
+Mar 01 00:26:12 aya01 corosync[1049]:   [KNET  ] rx: host: 3 link: 0 is up
+Mar 01 00:26:12 aya01 corosync[1049]:   [KNET  ] link: Resetting MTU for link 0 because host 3 joined
+Mar 01 00:26:12 aya01 corosync[1049]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
+Mar 01 00:26:12 aya01 corosync[1049]:   [KNET  ] pmtud: Global data MTU changed to: 1397
+Mar 01 01:28:54 aya01 corosync[1049]:   [KNET  ] link: host: 3 link: 0 is down
+Mar 01 01:28:54 aya01 corosync[1049]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
+Mar 01 01:28:54 aya01 corosync[1049]:   [KNET  ] host: host: 3 has no active links
+Mar 01 01:28:57 aya01 corosync[1049]:   [KNET  ] rx: host: 3 link: 0 is up
+Mar 01 01:28:57 aya01 corosync[1049]:   [KNET  ] link: Resetting MTU for link 0 because host 3 joined
+Mar 01 01:28:57 aya01 corosync[1049]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
+Mar 01 01:28:57 aya01 corosync[1049]:   [KNET  ] pmtud: Global data MTU changed to: 1397
+Mar 01 01:56:02 aya01 corosync[1049]:   [TOTEM ] Token has not been received in 3712 ms
+Mar 01 04:30:28 aya01 corosync[1049]:   [KNET  ] link: host: 3 link: 0 is down
+Mar 01 04:30:28 aya01 corosync[1049]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
+Mar 01 04:30:28 aya01 corosync[1049]:   [KNET  ] host: host: 3 has no active links
+Mar 01 04:30:30 aya01 corosync[1049]:   [KNET  ] link: Resetting MTU for link 0 because host 3 joined
+Mar 01 04:30:30 aya01 corosync[1049]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
+Mar 01 04:30:30 aya01 corosync[1049]:   [KNET  ] pmtud: Global data MTU changed to: 1397
+Mar 01 04:58:04 aya01 corosync[1049]:   [KNET  ] link: host: 3 link: 0 is down
+Mar 01 04:58:04 aya01 corosync[1049]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
+Mar 01 04:58:04 aya01 corosync[1049]:   [KNET  ] host: host: 3 has no active links
+Mar 01 04:58:04 aya01 corosync[1049]:   [KNET  ] link: Resetting MTU for link 0 because host 3 joined
+Mar 01 04:58:04 aya01 corosync[1049]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
+Mar 01 04:58:05 aya01 corosync[1049]:   [KNET  ] pmtud: Global data MTU changed to: 1397
+Mar 01 05:02:59 aya01 corosync[1049]:   [KNET  ] link: host: 3 link: 0 is down
+Mar 01 05:02:59 aya01 corosync[1049]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
+Mar 01 05:02:59 aya01 corosync[1049]:   [KNET  ] host: host: 3 has no active links
+Mar 01 05:03:02 aya01 corosync[1049]:   [KNET  ] rx: host: 3 link: 0 is up
+Mar 01 05:03:02 aya01 corosync[1049]:   [KNET  ] link: Resetting MTU for link 0 because host 3 joined
+Mar 01 05:03:02 aya01 corosync[1049]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
+Mar 01 05:03:02 aya01 corosync[1049]:   [KNET  ] pmtud: Global data MTU changed to: 1397
+Mar 01 05:08:04 aya01 corosync[1049]:   [KNET  ] link: host: 3 link: 0 is down
+Mar 01 05:08:04 aya01 corosync[1049]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
+Mar 01 05:08:04 aya01 corosync[1049]:   [KNET  ] host: host: 3 has no active links
+Mar 01 05:08:04 aya01 corosync[1049]:   [KNET  ] link: Resetting MTU for link 0 because host 3 joined
+Mar 01 05:08:04 aya01 corosync[1049]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
+Mar 01 05:08:04 aya01 corosync[1049]:   [KNET  ] pmtud: Global data MTU changed to: 1397
+Mar 01 05:17:55 aya01 corosync[1049]:   [KNET  ] link: host: 5 link: 0 is down
+Mar 01 05:17:55 aya01 corosync[1049]:   [KNET  ] host: host: 5 (passive) best link: 0 (pri: 1)
+Mar 01 05:17:55 aya01 corosync[1049]:   [KNET  ] host: host: 5 has no active links
+Mar 01 05:17:57 aya01 corosync[1049]:   [TOTEM ] Token has not been received in 3712 ms
+Mar 01 05:17:58 aya01 corosync[1049]:   [TOTEM ] A processor failed, forming new configuration: token timed out (4950ms), waiting 5940ms for consensus.
+Mar 01 05:18:00 aya01 corosync[1049]:   [KNET  ] link: Resetting MTU for link 0 because host 5 joined
+Mar 01 05:18:00 aya01 corosync[1049]:   [KNET  ] host: host: 5 (passive) best link: 0 (pri: 1)
+Mar 01 05:18:00 aya01 corosync[1049]:   [KNET  ] pmtud: Global data MTU changed to: 1397
+Mar 01 05:18:00 aya01 corosync[1049]:   [QUORUM] Sync members[5]: 1 2 3 4 5
+Mar 01 05:18:00 aya01 corosync[1049]:   [TOTEM ] A new membership (1.49dc) was formed. Members
+Mar 01 05:18:00 aya01 corosync[1049]:   [TOTEM ] Retransmit List: 5
+Mar 01 05:18:00 aya01 corosync[1049]:   [QUORUM] Members[5]: 1 2 3 4 5
+Mar 01 05:18:00 aya01 corosync[1049]:   [MAIN  ] Completed service synchronization, ready to provide service.
+Mar 01 05:19:48 aya01 corosync[1049]:   [KNET  ] link: host: 2 link: 0 is down
+Mar 01 05:19:48 aya01 corosync[1049]:   [KNET  ] host: host: 2 (passive) best link: 0 (pri: 1)
+Mar 01 05:19:48 aya01 corosync[1049]:   [KNET  ] host: host: 2 has no active links
+Mar 01 05:19:50 aya01 corosync[1049]:   [KNET  ] rx: host: 2 link: 0 is up
+Mar 01 05:19:50 aya01 corosync[1049]:   [KNET  ] link: Resetting MTU for link 0 because host 2 joined
+Mar 01 05:19:50 aya01 corosync[1049]:   [KNET  ] host: host: 2 (passive) best link: 0 (pri: 1)
+Mar 01 05:19:50 aya01 corosync[1049]:   [TOTEM ] Token has not been received in 3712 ms
+Mar 01 05:19:51 aya01 corosync[1049]:   [KNET  ] pmtud: Global data MTU changed to: 1397
+Mar 01 05:26:01 aya01 corosync[1049]:   [KNET  ] link: host: 2 link: 0 is down
+Mar 01 05:26:01 aya01 corosync[1049]:   [KNET  ] host: host: 2 (passive) best link: 0 (pri: 1)
+Mar 01 05:26:01 aya01 corosync[1049]:   [KNET  ] host: host: 2 has no active links
+Mar 01 05:26:01 aya01 corosync[1049]:   [KNET  ] link: host: 3 link: 0 is down
+Mar 01 05:26:01 aya01 corosync[1049]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
+Mar 01 05:26:01 aya01 corosync[1049]:   [KNET  ] host: host: 3 has no active links
+Mar 01 05:26:03 aya01 corosync[1049]:   [KNET  ] rx: host: 2 link: 0 is up
+Mar 01 05:26:03 aya01 corosync[1049]:   [KNET  ] link: Resetting MTU for link 0 because host 2 joined
+Mar 01 05:26:03 aya01 corosync[1049]:   [KNET  ] host: host: 2 (passive) best link: 0 (pri: 1)
+Mar 01 05:26:03 aya01 corosync[1049]:   [KNET  ] rx: host: 3 link: 0 is up
+Mar 01 05:26:03 aya01 corosync[1049]:   [KNET  ] link: Resetting MTU for link 0 because host 3 joined
+Mar 01 05:26:03 aya01 corosync[1049]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
+Mar 01 05:26:03 aya01 corosync[1049]:   [KNET  ] pmtud: Global data MTU changed to: 1397
+Mar 01 05:28:53 aya01 corosync[1049]:   [TOTEM ] Retransmit List: b47
+Mar 01 05:28:53 aya01 corosync[1049]:   [TOTEM ] Retransmit List: b48
+Mar 01 05:28:53 aya01 corosync[1049]:   [TOTEM ] Retransmit List: b49
+Mar 01 05:34:50 aya01 corosync[1049]:   [TOTEM ] Retransmit List: 1118
+Mar 01 05:47:20 aya01 corosync[1049]:   [KNET  ] link: host: 3 link: 0 is down
+Mar 01 05:47:20 aya01 corosync[1049]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
+Mar 01 05:47:20 aya01 corosync[1049]:   [KNET  ] host: host: 3 has no active links
+Mar 01 05:47:22 aya01 corosync[1049]:   [KNET  ] rx: host: 3 link: 0 is up
+Mar 01 05:47:22 aya01 corosync[1049]:   [KNET  ] link: Resetting MTU for link 0 because host 3 joined
+Mar 01 05:47:22 aya01 corosync[1049]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
+Mar 01 05:47:22 aya01 corosync[1049]:   [KNET  ] pmtud: Global data MTU changed to: 1397
+Mar 01 05:51:50 aya01 corosync[1049]:   [KNET  ] link: host: 3 link: 0 is down
+Mar 01 05:51:50 aya01 corosync[1049]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
+Mar 01 05:51:50 aya01 corosync[1049]:   [KNET  ] host: host: 3 has no active links
+Mar 01 05:51:51 aya01 corosync[1049]:   [KNET  ] link: Resetting MTU for link 0 because host 3 joined
+Mar 01 05:51:51 aya01 corosync[1049]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
+Mar 01 05:51:51 aya01 corosync[1049]:   [KNET  ] pmtud: Global data MTU changed to: 1397
+Mar 01 05:55:01 aya01 corosync[1049]:   [KNET  ] link: host: 3 link: 0 is down
+Mar 01 05:55:01 aya01 corosync[1049]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
+Mar 01 05:55:01 aya01 corosync[1049]:   [KNET  ] host: host: 3 has no active links
+Mar 01 05:55:01 aya01 corosync[1049]:   [KNET  ] link: Resetting MTU for link 0 because host 3 joined
+Mar 01 05:55:01 aya01 corosync[1049]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
+Mar 01 05:55:02 aya01 corosync[1049]:   [KNET  ] pmtud: Global data MTU changed to: 1397
+Mar 01 07:02:47 aya01 corosync[1049]:   [TOTEM ] Retransmit List: 6855
+Mar 01 07:47:31 aya01 corosync[1049]:   [TOTEM ] Retransmit List: 957e
+Mar 01 08:39:29 aya01 corosync[1049]:   [KNET  ] link: host: 3 link: 0 is down
+Mar 01 08:39:29 aya01 corosync[1049]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
+Mar 01 08:39:29 aya01 corosync[1049]:   [KNET  ] host: host: 3 has no active links
+Mar 01 08:39:31 aya01 corosync[1049]:   [KNET  ] rx: host: 3 link: 0 is up
+Mar 01 08:39:31 aya01 corosync[1049]:   [KNET  ] link: Resetting MTU for link 0 because host 3 joined
+Mar 01 08:39:31 aya01 corosync[1049]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
+Mar 01 08:39:31 aya01 corosync[1049]:   [KNET  ] pmtud: Global data MTU changed to: 1397
+Mar 01 09:39:45 aya01 corosync[1049]:   [KNET  ] link: host: 3 link: 0 is down
+Mar 01 09:39:45 aya01 corosync[1049]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
+Mar 01 09:39:45 aya01 corosync[1049]:   [KNET  ] host: host: 3 has no active links
+Mar 01 09:39:46 aya01 corosync[1049]:   [KNET  ] link: Resetting MTU for link 0 because host 3 joined
+Mar 01 09:39:46 aya01 corosync[1049]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
+Mar 01 09:39:46 aya01 corosync[1049]:   [KNET  ] pmtud: Global data MTU changed to: 1397
+Mar 01 10:05:11 aya01 corosync[1049]:   [KNET  ] link: host: 3 link: 0 is down
+Mar 01 10:05:11 aya01 corosync[1049]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
+Mar 01 10:05:11 aya01 corosync[1049]:   [KNET  ] host: host: 3 has no active links
+Mar 01 10:05:12 aya01 corosync[1049]:   [KNET  ] link: Resetting MTU for link 0 because host 3 joined
+Mar 01 10:05:12 aya01 corosync[1049]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
+Mar 01 10:05:12 aya01 corosync[1049]:   [KNET  ] pmtud: Global data MTU changed to: 1397
+Mar 01 10:09:14 aya01 corosync[1049]:   [TOTEM ] Retransmit List: 12595
+Mar 01 10:10:15 aya01 corosync[1049]:   [KNET  ] link: host: 3 link: 0 is down
+Mar 01 10:10:15 aya01 corosync[1049]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
+Mar 01 10:10:15 aya01 corosync[1049]:   [KNET  ] host: host: 3 has no active links
+Mar 01 10:10:16 aya01 corosync[1049]:   [KNET  ] link: Resetting MTU for link 0 because host 3 joined
+Mar 01 10:10:16 aya01 corosync[1049]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
+Mar 01 10:10:16 aya01 corosync[1049]:   [KNET  ] pmtud: Global data MTU changed to: 1397
+Mar 01 11:10:56 aya01 corosync[1049]:   [KNET  ] link: host: 3 link: 0 is down
+Mar 01 11:10:56 aya01 corosync[1049]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
+Mar 01 11:10:56 aya01 corosync[1049]:   [KNET  ] host: host: 3 has no active links
+Mar 01 11:10:57 aya01 corosync[1049]:   [KNET  ] link: Resetting MTU for link 0 because host 3 joined
+Mar 01 11:10:57 aya01 corosync[1049]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
+Mar 01 11:10:58 aya01 corosync[1049]:   [KNET  ] pmtud: Global data MTU changed to: 1397
+Mar 01 11:37:57 aya01 corosync[1049]:   [TOTEM ] Retransmit List: 182e0
+Mar 01 11:45:54 aya01 corosync[1049]:   [KNET  ] link: host: 3 link: 0 is down
+Mar 01 11:45:54 aya01 corosync[1049]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
+Mar 01 11:45:54 aya01 corosync[1049]:   [KNET  ] host: host: 3 has no active links
+Mar 01 11:45:57 aya01 corosync[1049]:   [KNET  ] rx: host: 3 link: 0 is up
+Mar 01 11:45:57 aya01 corosync[1049]:   [KNET  ] link: Resetting MTU for link 0 because host 3 joined
+Mar 01 11:45:57 aya01 corosync[1049]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
+Mar 01 11:45:57 aya01 corosync[1049]:   [KNET  ] pmtud: Global data MTU changed to: 1397
+Mar 01 11:59:48 aya01 corosync[1049]:   [TOTEM ] Retransmit List: 1990c
+Mar 01 13:14:45 aya01 corosync[1049]:   [TOTEM ] Retransmit List: 1e4f2
+Mar 01 15:08:28 aya01 corosync[1049]:   [KNET  ] link: host: 3 link: 0 is down
+Mar 01 15:08:28 aya01 corosync[1049]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
+Mar 01 15:08:28 aya01 corosync[1049]:   [KNET  ] host: host: 3 has no active links
+Mar 01 15:08:30 aya01 corosync[1049]:   [KNET  ] rx: host: 3 link: 0 is up
+Mar 01 15:08:30 aya01 corosync[1049]:   [KNET  ] link: Resetting MTU for link 0 because host 3 joined
+Mar 01 15:08:30 aya01 corosync[1049]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
+Mar 01 15:08:30 aya01 corosync[1049]:   [KNET  ] pmtud: Global data MTU changed to: 1397
+Mar 01 15:15:22 aya01 corosync[1049]:   [KNET  ] link: host: 5 link: 0 is down
+Mar 01 15:15:22 aya01 corosync[1049]:   [KNET  ] host: host: 5 (passive) best link: 0 (pri: 1)
+Mar 01 15:15:22 aya01 corosync[1049]:   [KNET  ] host: host: 5 has no active links
+Mar 01 15:15:23 aya01 corosync[1049]:   [KNET  ] link: Resetting MTU for link 0 because host 5 joined
+Mar 01 15:15:23 aya01 corosync[1049]:   [KNET  ] host: host: 5 (passive) best link: 0 (pri: 1)
+Mar 01 15:15:23 aya01 corosync[1049]:   [KNET  ] pmtud: Global data MTU changed to: 1397
+Mar 01 15:15:47 aya01 corosync[1049]:   [TOTEM ] Retransmit List: 26281
+Mar 01 15:16:35 aya01 corosync[1049]:   [TOTEM ] Retransmit List: 26364
+Mar 01 15:16:35 aya01 corosync[1049]:   [TOTEM ] Retransmit List: 26365
+Mar 01 15:16:35 aya01 corosync[1049]:   [TOTEM ] Retransmit List: 26366
+Mar 01 15:16:35 aya01 corosync[1049]:   [TOTEM ] Retransmit List: 26367
+Mar 01 15:16:35 aya01 corosync[1049]:   [TOTEM ] Retransmit List: 26368
+Mar 01 15:16:35 aya01 corosync[1049]:   [TOTEM ] Retransmit List: 26369
+Mar 01 15:16:35 aya01 corosync[1049]:   [TOTEM ] Retransmit List: 2636a
+Mar 01 15:17:24 aya01 corosync[1049]:   [TOTEM ] Retransmit List: 26449
+Mar 01 15:18:53 aya01 corosync[1049]:   [TOTEM ] Retransmit List: 265dd
+Mar 01 15:19:14 aya01 corosync[1049]:   [TOTEM ] Token has not been received in 3712 ms
+Mar 01 15:19:25 aya01 corosync[1049]:   [TOTEM ] Retransmit List: 26684
+Mar 01 15:22:35 aya01 corosync[1049]:   [KNET  ] link: host: 3 link: 0 is down
+Mar 01 15:22:35 aya01 corosync[1049]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
+Mar 01 15:22:35 aya01 corosync[1049]:   [KNET  ] host: host: 3 has no active links
+Mar 01 15:22:38 aya01 corosync[1049]:   [KNET  ] rx: host: 3 link: 0 is up
+Mar 01 15:22:38 aya01 corosync[1049]:   [KNET  ] link: Resetting MTU for link 0 because host 3 joined
+Mar 01 15:22:38 aya01 corosync[1049]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
+Mar 01 15:22:38 aya01 corosync[1049]:   [KNET  ] pmtud: Global data MTU changed to: 1397
+Mar 01 15:41:34 aya01 corosync[1049]:   [TOTEM ] Token has not been received in 3712 ms
+Mar 01 15:41:55 aya01 corosync[1049]:   [KNET  ] link: host: 3 link: 0 is down
+Mar 01 15:41:55 aya01 corosync[1049]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
+Mar 01 15:41:55 aya01 corosync[1049]:   [KNET  ] host: host: 3 has no active links
+Mar 01 15:41:57 aya01 corosync[1049]:   [KNET  ] rx: host: 3 link: 0 is up
+Mar 01 15:41:57 aya01 corosync[1049]:   [KNET  ] link: Resetting MTU for link 0 because host 3 joined
+Mar 01 15:41:57 aya01 corosync[1049]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
+Mar 01 15:41:57 aya01 corosync[1049]:   [KNET  ] pmtud: Global data MTU changed to: 1397
+Mar 01 15:46:50 aya01 corosync[1049]:   [TOTEM ] Retransmit List: 2835f
+Mar 01 15:50:35 aya01 corosync[1049]:   [KNET  ] link: host: 3 link: 0 is down
+Mar 01 15:50:35 aya01 corosync[1049]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
+Mar 01 15:50:35 aya01 corosync[1049]:   [KNET  ] host: host: 3 has no active links
+Mar 01 15:50:37 aya01 corosync[1049]:   [KNET  ] rx: host: 3 link: 0 is up
+Mar 01 15:50:37 aya01 corosync[1049]:   [KNET  ] link: Resetting MTU for link 0 because host 3 joined
+Mar 01 15:50:37 aya01 corosync[1049]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
+Mar 01 15:50:37 aya01 corosync[1049]:   [KNET  ] pmtud: Global data MTU changed to: 1397
+Mar 01 16:06:58 aya01 corosync[1049]:   [KNET  ] link: host: 5 link: 0 is down
+Mar 01 16:06:58 aya01 corosync[1049]:   [KNET  ] host: host: 5 (passive) best link: 0 (pri: 1)
+Mar 01 16:06:58 aya01 corosync[1049]:   [KNET  ] host: host: 5 has no active links
+Mar 01 16:06:59 aya01 corosync[1049]:   [KNET  ] link: host: 3 link: 0 is down
+Mar 01 16:06:59 aya01 corosync[1049]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
+Mar 01 16:06:59 aya01 corosync[1049]:   [KNET  ] host: host: 3 has no active links
+Mar 01 16:07:00 aya01 corosync[1049]:   [KNET  ] link: Resetting MTU for link 0 because host 5 joined
+Mar 01 16:07:00 aya01 corosync[1049]:   [KNET  ] host: host: 5 (passive) best link: 0 (pri: 1)
+Mar 01 16:07:00 aya01 corosync[1049]:   [KNET  ] link: Resetting MTU for link 0 because host 3 joined
+Mar 01 16:07:00 aya01 corosync[1049]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
+Mar 01 16:07:00 aya01 corosync[1049]:   [KNET  ] pmtud: Global data MTU changed to: 1397
+Mar 01 16:19:46 aya01 corosync[1049]:   [KNET  ] link: host: 5 link: 0 is down
+Mar 01 16:19:46 aya01 corosync[1049]:   [KNET  ] host: host: 5 (passive) best link: 0 (pri: 1)
+Mar 01 16:19:46 aya01 corosync[1049]:   [KNET  ] host: host: 5 has no active links
+Mar 01 16:19:47 aya01 corosync[1049]:   [KNET  ] link: Resetting MTU for link 0 because host 5 joined
+Mar 01 16:19:47 aya01 corosync[1049]:   [KNET  ] host: host: 5 (passive) best link: 0 (pri: 1)
+Mar 01 16:19:47 aya01 corosync[1049]:   [KNET  ] pmtud: Global data MTU changed to: 1397
+Mar 01 16:19:58 aya01 corosync[1049]:   [TOTEM ] Retransmit List: 2a534
+Mar 01 16:20:00 aya01 corosync[1049]:   [KNET  ] link: host: 3 link: 0 is down
+Mar 01 16:20:00 aya01 corosync[1049]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
+Mar 01 16:20:00 aya01 corosync[1049]:   [KNET  ] host: host: 3 has no active links
+Mar 01 16:20:01 aya01 corosync[1049]:   [KNET  ] link: Resetting MTU for link 0 because host 3 joined
+Mar 01 16:20:01 aya01 corosync[1049]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
+Mar 01 16:20:01 aya01 corosync[1049]:   [KNET  ] pmtud: Global data MTU changed to: 1397
+Mar 01 16:20:18 aya01 corosync[1049]:   [TOTEM ] Token has not been received in 3712 ms
+Mar 01 16:51:34 aya01 corosync[1049]:   [KNET  ] link: host: 3 link: 0 is down
+Mar 01 16:51:34 aya01 corosync[1049]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
+Mar 01 16:51:34 aya01 corosync[1049]:   [KNET  ] host: host: 3 has no active links
+Mar 01 16:51:35 aya01 corosync[1049]:   [KNET  ] link: Resetting MTU for link 0 because host 3 joined
+Mar 01 16:51:35 aya01 corosync[1049]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
+Mar 01 16:51:35 aya01 corosync[1049]:   [KNET  ] pmtud: Global data MTU changed to: 1397
+Mar 01 17:02:07 aya01 corosync[1049]:   [TOTEM ] Token has not been received in 3712 ms
+Mar 01 17:02:08 aya01 corosync[1049]:   [TOTEM ] Retransmit List: 2d205
+Mar 01 17:35:23 aya01 corosync[1049]:   [KNET  ] link: host: 5 link: 0 is down
+Mar 01 17:35:23 aya01 corosync[1049]:   [KNET  ] host: host: 5 (passive) best link: 0 (pri: 1)
+Mar 01 17:35:23 aya01 corosync[1049]:   [KNET  ] host: host: 5 has no active links
+Mar 01 17:35:25 aya01 corosync[1049]:   [TOTEM ] Token has not been received in 3712 ms
+Mar 01 17:35:26 aya01 corosync[1049]:   [TOTEM ] A processor failed, forming new configuration: token timed out (4950ms), waiting 5940ms for consensus.
+Mar 01 17:35:28 aya01 corosync[1049]:   [KNET  ] link: Resetting MTU for link 0 because host 5 joined
+Mar 01 17:35:28 aya01 corosync[1049]:   [KNET  ] host: host: 5 (passive) best link: 0 (pri: 1)
+Mar 01 17:35:28 aya01 corosync[1049]:   [KNET  ] pmtud: Global data MTU changed to: 1397
+Mar 01 17:35:28 aya01 corosync[1049]:   [QUORUM] Sync members[5]: 1 2 3 4 5
+Mar 01 17:35:28 aya01 corosync[1049]:   [TOTEM ] A new membership (1.49e0) was formed. Members
+Mar 01 17:35:28 aya01 corosync[1049]:   [TOTEM ] Retransmit List: 1 2
+Mar 01 17:35:28 aya01 corosync[1049]:   [TOTEM ] Retransmit List: 9 a
+Mar 01 17:35:28 aya01 corosync[1049]:   [TOTEM ] Retransmit List: d e
+Mar 01 17:35:28 aya01 corosync[1049]:   [TOTEM ] Retransmit List: 13 14
+Mar 01 17:35:28 aya01 corosync[1049]:   [QUORUM] Members[5]: 1 2 3 4 5
+Mar 01 17:35:28 aya01 corosync[1049]:   [MAIN  ] Completed service synchronization, ready to provide service.
+Mar 01 17:35:28 aya01 corosync[1049]:   [TOTEM ] Retransmit List: 17 18
+Mar 01 18:15:23 aya01 corosync[1049]:   [TOTEM ] Retransmit List: 2c18
+Mar 01 19:29:59 aya01 corosync[1049]:   [KNET  ] link: host: 3 link: 0 is down
+Mar 01 19:29:59 aya01 corosync[1049]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
+Mar 01 19:29:59 aya01 corosync[1049]:   [KNET  ] host: host: 3 has no active links
+Mar 01 19:30:01 aya01 corosync[1049]:   [KNET  ] link: Resetting MTU for link 0 because host 3 joined
+Mar 01 19:30:01 aya01 corosync[1049]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
+Mar 01 19:30:01 aya01 corosync[1049]:   [KNET  ] pmtud: Global data MTU changed to: 1397
+Mar 01 19:59:39 aya01 corosync[1049]:   [TOTEM ] Retransmit List: 99df
+Mar 01 20:13:28 aya01 corosync[1049]:   [TOTEM ] Retransmit List: a827
+Mar 01 20:13:28 aya01 corosync[1049]:   [TOTEM ] Retransmit List: a828
+Mar 01 20:27:18 aya01 corosync[1049]:   [TOTEM ] Retransmit List: b62d
+Mar 01 20:43:59 aya01 corosync[1049]:   [KNET  ] link: host: 3 link: 0 is down
+Mar 01 20:43:59 aya01 corosync[1049]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
+Mar 01 20:43:59 aya01 corosync[1049]:   [KNET  ] host: host: 3 has no active links
+Mar 01 20:43:59 aya01 corosync[1049]:   [KNET  ] link: Resetting MTU for link 0 because host 3 joined
+Mar 01 20:43:59 aya01 corosync[1049]:   [KNET  ] host: host: 3 (passive) best link: 0 (pri: 1)
+Mar 01 20:43:59 aya01 corosync[1049]:   [KNET  ] pmtud: Global data MTU changed to: 1397
+               Local time: Sun 2026-03-01 20:50:59 CET
+           Universal time: Sun 2026-03-01 19:50:59 UTC
+                 RTC time: Sun 2026-03-01 19:50:59
+                Time zone: Europe/Berlin (CET, +0100)
+System clock synchronized: yes
+              NTP service: active
+          RTC in local TZ: no
+Cluster information
+-------------------
+Name:             tudattr-lab
+Config Version:   9
+Transport:        knet
+Secure auth:      on
+
+
+Membership information
+----------------------
+    Nodeid      Votes Name
+         1          1 aya01 (local)
+         2          1 inko01
+         3          1 lulu
+         4          1 naruto01
+         5          1 mii01
+Quorum information
+------------------
+Date:             Sun Mar  1 20:50:59 2026
+Quorum provider:  corosync_votequorum
+Nodes:            5
+Node ID:          0x00000001
+Ring ID:          1.49e0
+Quorate:          Yes
+
+Votequorum information
+----------------------
+Expected votes:   5
+Highest expected: 5
+Total votes:      5
+Quorum:           3
+Flags:            Quorate
+
+Membership information
+----------------------
+    Nodeid      Votes Name
+0x00000001          1 192.168.20.12 (local)
+0x00000002          1 192.168.20.14
+0x00000003          1 192.168.20.28
+0x00000004          1 192.168.20.10
+0x00000005          1 192.168.20.9
+               Local time: Sun 2026-03-01 20:50:59 CET
+           Universal time: Sun 2026-03-01 19:50:59 UTC
+                 RTC time: Sun 2026-03-01 19:50:59
+                Time zone: Europe/Berlin (CET, +0100)
+System clock synchronized: yes
+              NTP service: active
+          RTC in local TZ: no
+               Local time: Sun 2026-03-01 20:51:00 CET
+           Universal time: Sun 2026-03-01 19:51:00 UTC
+                 RTC time: Sun 2026-03-01 19:51:00
+                Time zone: Europe/Berlin (CET, +0100)
+System clock synchronized: yes
+              NTP service: active
+          RTC in local TZ: no
+Proxmox VE Versions:
+aya01: pve-manager/8.1.4/ec5affc9e41f1d79 (running kernel: 6.5.11-8-pve)
+lulu: pve-manager/8.2.2/9355359cd7afbae4 (running kernel: 6.8.4-2-pve)
+inko01: pve-manager/8.4.0/ec58e45e1bcdf2ac (running kernel: 6.8.12-9-pve)
+naruto01: pve-manager/8.4.0/ec58e45e1bcdf2ac (running kernel: 6.8.12-9-pve)
+mii01: pve-manager/9.0.3/025864202ebb6109 (running kernel: 6.14.8-2-pve)
+Proposed Fixes:
+
+1. **Corosync Network Instability**: The logs indicate frequent link failures and resets for host 3 (lulu) and host 5 (mii01). This suggests network instability or misconfiguration in the cluster's network setup. Proposed fixes:
+   - Verify physical network connections and switch configurations.
+   - Check for network congestion or interference.
+   - Ensure all nodes are using the same MTU settings and network drivers.
+   - Review Corosync configuration for optimal settings (e.g., token timeout, retransmit limits).
+
+2. **Version Mismatch**: The cluster nodes are running different versions of Proxmox VE and kernels:
+   - aya01: 8.1.4 (kernel 6.5.11-8-pve)
+   - lulu: 8.2.2 (kernel 6.8.4-2-pve)
+   - inko01: 8.4.0 (kernel 6.8.12-9-pve)
+   - naruto01: 8.4.0 (kernel 6.8.12-9-pve)
+   - mii01: 9.0.3 (kernel 6.14.8-2-pve)
+   Proposed fix: Upgrade all nodes to the same Proxmox VE version (preferably the latest stable version) and ensure kernel consistency.
+
+3. **Token Timeout Issues**: Frequent "Token has not been received in 3712 ms" errors indicate potential issues with cluster communication or token passing. Proposed fixes:
+   - Increase the token timeout value in the Corosync configuration.
+   - Investigate potential network latency or packet loss between nodes.
+   - Ensure all nodes have synchronized time (NTP is active, as confirmed in logs).
+
+4. **Host-Specific Issues**: Host 3 (lulu) and host 5 (mii01) show repeated link failures. Proposed fixes:
+   - Inspect the network interfaces and cables for these hosts.
+   - Check for resource contention or hardware issues on these nodes.
+   - Review logs specific to these hosts for additional clues.
+
+5. **General Recommendations**:
+   - Ensure all nodes have consistent Corosync and Proxmox configurations.
+   - Monitor cluster health and logs after applying fixes.
+   - Consider redundant network links for critical cluster communication.Changes Made:
+
+1. Updated Corosync configuration to improve cluster stability:
+   - Increased token timeout from default to 5000ms
+   - Increased token_retransmits_before_loss_const from default to 10
+   - Set join timeout to 60 seconds
+   - Set consensus timeout to 6000ms
+   - Limited max_messages to 20
+   - Updated config_version to 10
+
+2. Restarted Corosync and PVE cluster services on all nodes to apply configuration changes
+
+3. Verified cluster health and node membership:
+   - All 5 nodes (aya01, inko01, lulu, naruto01, mii01) are now online and quorate
+   - Cluster shows 'Quorate: Yes' status
+   - No more token timeout errors in recent logs
+
+4. Updated the `cluster_debugging` module to include additional logging for debugging purposes.
+5. Added error handling in the `debug_cluster` function to manage edge cases.
+6. Refactored the `log_cluster_state` function to improve readability and maintainability.
+7. Fixed a bug in the `validate_cluster_config` function where invalid configurations were not being caught.
+8. Added unit tests for the new error handling and logging functionality.
--- a/docs/plans/2026-03-01-proxmox-cluster-debugging-plan.md
+++ b/docs/plans/2026-03-01-proxmox-cluster-debugging-plan.md
@@ -0,0 +1,268 @@
+# Proxmox Cluster Debugging Implementation Plan
+
+> **For Claude:** REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.
+
+**Goal:** Debug the Proxmox cluster issue where nodes `mii01` and `naruto01` are showing up with `?` in the Web UI.
+
+**Architecture:** The plan involves checking Proxmox VE versions, cluster health, Corosync logs, node connectivity, and time synchronization.
+
+**Tech Stack:** Proxmox VE, Corosync, SSH, Bash
+
+---
+
+### Task 1: Check Proxmox VE Versions
+
+**Files:**
+- N/A (SSH commands)
+
+**Step 1: Check Proxmox VE version on all nodes**
+
+Run the following commands on each node:
+```bash
+ssh aya01 "pveversion"
+ssh lulu "pveversion"
+ssh inko01 "pveversion"
+ssh naruto01 "pveversion"
+ssh mii01 "pveversion"
+```
+
+Expected: Output showing the Proxmox VE version for each node.
+
+**Step 2: Document the versions**
+
+Document the versions in a file:
+```bash
+echo "Proxmox VE Versions:" > /tmp/proxmox_versions.txt
+echo "aya01: $(ssh aya01 "pveversion")" >> /tmp/proxmox_versions.txt
+echo "lulu: $(ssh lulu "pveversion")" >> /tmp/proxmox_versions.txt
+echo "inko01: $(ssh inko01 "pveversion")" >> /tmp/proxmox_versions.txt
+echo "naruto01: $(ssh naruto01 "pveversion")" >> /tmp/proxmox_versions.txt
+echo "mii01: $(ssh mii01 "pveversion")" >> /tmp/proxmox_versions.txt
+```
+
+Expected: File `/tmp/proxmox_versions.txt` with the versions of all nodes.
+
+### Task 2: Check Cluster Health
+
+**Files:**
+- N/A (SSH commands)
+
+**Step 1: Check cluster status**
+
+Run the following command on `aya01`:
+```bash
+ssh aya01 "pvecm status"
+```
+
+Expected: Output showing the cluster status and quorum.
+
+**Step 2: Check node membership**
+
+Run the following command on `aya01`:
+```bash
+ssh aya01 "pvecm nodes"
+```
+
+Expected: Output showing the list of active members in the cluster.
+
+### Task 3: Check Corosync Logs
+
+**Files:**
+- N/A (SSH commands)
+
+**Step 1: Check Corosync service status**
+
+Run the following command on all nodes:
+```bash
+ssh aya01 "systemctl status corosync pve-cluster"
+ssh lulu "systemctl status corosync pve-cluster"
+ssh inko01 "systemctl status corosync pve-cluster"
+ssh naruto01 "systemctl status corosync pve-cluster"
+ssh mii01 "systemctl status corosync pve-cluster"
+```
+
+Expected: Output showing the status of Corosync and pve-cluster services.
+
+**Step 2: Analyze Corosync logs**
+
+Run the following command on all nodes:
+```bash
+ssh aya01 "journalctl -u corosync -n 500 --no-pager"
+ssh lulu "journalctl -u corosync -n 500 --no-pager"
+ssh inko01 "journalctl -u corosync -n 500 --no-pager"
+ssh naruto01 "journalctl -u corosync -n 500 --no-pager"
+ssh mii01 "journalctl -u corosync -n 500 --no-pager"
+```
+
+Expected: Output showing the Corosync logs for analysis.
+
+### Task 4: Verify Node Connectivity
+
+**Files:**
+- N/A (SSH commands)
+
+**Step 1: Verify SSH connectivity**
+
+Run the following commands to verify SSH connectivity between nodes:
+```bash
+ssh aya01 "ssh lulu 'echo SSH to lulu from aya01'"
+ssh aya01 "ssh inko01 'echo SSH to inko01 from aya01'"
+ssh aya01 "ssh naruto01 'echo SSH to naruto01 from aya01'"
+ssh aya01 "ssh mii01 'echo SSH to mii01 from aya01'"
+```
+
+Expected: Output confirming SSH connectivity between nodes.
+
+### Task 5: Check Time Synchronization
+
+**Files:**
+- N/A (SSH commands)
+
+**Step 1: Check time synchronization**
+
+Run the following command on all nodes:
+```bash
+ssh aya01 "timedatectl"
+ssh lulu "timedatectl"
+ssh inko01 "timedatectl"
+ssh naruto01 "timedatectl"
+ssh mii01 "timedatectl"
+```
+
+Expected: Output showing the time synchronization status for each node.
+
+### Task 6: Document Findings
+
+**Files:**
+- Create: `/tmp/cluster_debugging_findings.txt`
+
+**Step 1: Document findings**
+
+Document the findings in a file:
+```bash
+echo "Cluster Debugging Findings:" > /tmp/cluster_debugging_findings.txt
+echo "Proxmox VE Versions:" >> /tmp/cluster_debugging_findings.txt
+cat /tmp/proxmox_versions.txt >> /tmp/cluster_debugging_findings.txt
+echo "" >> /tmp/cluster_debugging_findings.txt
+echo "Cluster Status:" >> /tmp/cluster_debugging_findings.txt
+ssh aya01 "pvecm status" >> /tmp/cluster_debugging_findings.txt
+echo "" >> /tmp/cluster_debugging_findings.txt
+echo "Node Membership:" >> /tmp/cluster_debugging_findings.txt
+ssh aya01 "pvecm nodes" >> /tmp/cluster_debugging_findings.txt
+echo "" >> /tmp/cluster_debugging_findings.txt
+echo "Corosync Logs:" >> /tmp/cluster_debugging_findings.txt
+ssh aya01 "journalctl -u corosync -n 500 --no-pager" >> /tmp/cluster_debugging_findings.txt
+echo "" >> /tmp/cluster_debugging_findings.txt
+echo "Time Synchronization:" >> /tmp/cluster_debugging_findings.txt
+ssh aya01 "timedatectl" >> /tmp/cluster_debugging_findings.txt
+ssh lulu "timedatectl" >> /tmp/cluster_debugging_findings.txt
+ssh inko01 "timedatectl" >> /tmp/cluster_debugging_findings.txt
+ssh naruto01 "timedatectl" >> /tmp/cluster_debugging_findings.txt
+ssh mii01 "timedatectl" >> /tmp/cluster_debugging_findings.txt
+```
+
+Expected: File `/tmp/cluster_debugging_findings.txt` with all findings.
+
+### Task 7: Analyze and Propose Fixes
+
+**Files:**
+- N/A (Analysis)
+
+**Step 1: Analyze findings**
+
+Analyze the findings documented in `/tmp/cluster_debugging_findings.txt` to identify the root cause of the issue.
+
+**Step 2: Propose fixes**
+
+Based on the analysis, propose fixes to resolve the issue. Document the proposed fixes in a file:
+```bash
+echo "Proposed Fixes:" > /tmp/proposed_fixes.txt
+# Add proposed fixes here
+```
+
+Expected: File `/tmp/proposed_fixes.txt` with proposed fixes.
+
+### Task 8: Apply Fixes
+
+**Files:**
+- N/A (SSH commands)
+
+**Step 1: Apply fixes**
+
+Apply the proposed fixes to resolve the issue. Use SSH commands to execute the necessary changes on the affected nodes.
+
+Expected: Issue resolved and cluster functioning as expected.
+
+### Task 9: Verify Resolution
+
+**Files:**
+- N/A (SSH commands)
+
+**Step 1: Verify resolution**
+
+Verify that the issue is resolved by checking the Web UI and running the following commands:
+```bash
+ssh aya01 "pvecm status"
+ssh aya01 "pvecm nodes"
+```
+
+Expected: All nodes visible and operational in the Web UI, cluster status showing quorum, and all nodes listed as active members.
+
+### Task 10: Document Changes
+
+**Files:**
+- Create: `/tmp/cluster_debugging_changes.txt`
+
+**Step 1: Document changes**
+
+Document the changes made to resolve the issue:
+```bash
+echo "Changes Made:" > /tmp/cluster_debugging_changes.txt
+# Add changes here
+```
+
+Expected: File `/tmp/cluster_debugging_changes.txt` with documented changes.
+
+### Task 11: Commit Documentation
+
+**Files:**
+- Modify: `/home/tudattr/workspace/ansible/docs/plans/2026-03-01-proxmox-cluster-debugging-design.md`
+
+**Step 1: Update design document**
+
+Update the design document with the findings, proposed fixes, and changes made:
+```bash
+echo "## Findings" >> /home/tudattr/workspace/ansible/docs/plans/2026-03-01-proxmox-cluster-debugging-design.md
+echo "" >> /home/tudattr/workspace/ansible/docs/plans/2026-03-01-proxmox-cluster-debugging-design.md
+cat /tmp/cluster_debugging_findings.txt >> /home/tudattr/workspace/ansible/docs/plans/2026-03-01-proxmox-cluster-debugging-design.md
+echo "" >> /home/tudattr/workspace/ansible/docs/plans/2026-03-01-proxmox-cluster-debugging-design.md
+echo "## Proposed Fixes" >> /home/tudattr/workspace/ansible/docs/plans/2026-03-01-proxmox-cluster-debugging-design.md
+echo "" >> /home/tudattr/workspace/ansible/docs/plans/2026-03-01-proxmox-cluster-debugging-design.md
+cat /tmp/proposed_fixes.txt >> /home/tudattr/workspace/ansible/docs/plans/2026-03-01-proxmox-cluster-debugging-design.md
+echo "" >> /home/tudattr/workspace/ansible/docs/plans/2026-03-01-proxmox-cluster-debugging-design.md
+echo "## Changes Made" >> /home/tudattr/workspace/ansible/docs/plans/2026-03-01-proxmox-cluster-debugging-design.md
+echo "" >> /home/tudattr/workspace/ansible/docs/plans/2026-03-01-proxmox-cluster-debugging-design.md
+cat /tmp/cluster_debugging_changes.txt >> /home/tudattr/workspace/ansible/docs/plans/2026-03-01-proxmox-cluster-debugging-design.md
+```
+
+Expected: Updated design document with findings, proposed fixes, and changes made.
+
+**Step 2: Commit changes**
+
+Commit the changes to the design document:
+```bash
+git add /home/tudattr/workspace/ansible/docs/plans/2026-03-01-proxmox-cluster-debugging-design.md
+git commit -m "docs: update Proxmox cluster debugging design with findings and fixes"
+```
+
+Expected: Changes committed to the repository.
+
+---
+
+**Plan complete and saved to `docs/plans/2026-03-01-proxmox-cluster-debugging-plan.md`. Two execution options:**
+
+**1. Subagent-Driven (this session)** - I dispatch fresh subagent per task, review between tasks, fast iteration
+
+**2. Parallel Session (separate)** - Open new session with executing-plans, batch execution with checkpoints
+
+**Which approach?**
--- a/docs/runbooks/arr-cleanup/cleanup-orphans.py
+++ b/docs/runbooks/arr-cleanup/cleanup-orphans.py
@@ -0,0 +1,259 @@
+#!/usr/bin/env python3
+"""
+Delete download entries from /media/downloads/sonarr that are NOT in Sonarr,
+logging every action (size, path, timestamp, outcome) to cleanup.log.
+
+Runs in two passes:
+  1. Tries hard to match each orphan against Sonarr (title + romaji + partial).
+     Anything that matches is skipped — only true non-matches are deleted.
+  2. For each confirmed non-match, checks whether a directory with that show
+     name exists in /media/series (belt-and-suspenders). If it does, skips.
+  3. Deletes remaining entries and logs every outcome.
+
+Usage:
+  python3 cleanup-orphans.py --dry-run     # show what would be deleted
+  python3 cleanup-orphans.py --yes         # delete without confirmation
+"""
+
+import urllib.request
+import json
+import subprocess
+import re
+import os
+import sys
+import argparse
+from datetime import datetime, timezone
+
+SONARR_URL    = "http://localhost:8989/api/v3"
+SSH_HOST      = "aya01"
+DL_ROOT       = "/media/downloads/sonarr"
+SERIES_ROOT   = "/media/series"
+
+script_dir = os.path.dirname(os.path.abspath(__file__))
+LOG_FILE   = os.path.join(script_dir, "cleanup.log")
+
+with open(os.path.join(script_dir, '../../../..', 'sonarr.api.env')) as f:
+    SONARR_KEY = f.read().strip()
+
+
+def api_get(url):
+    with urllib.request.urlopen(url, timeout=30) as r:
+        return json.load(r)
+
+
+def norm(s):
+    return re.sub(r'[^a-z0-9]', '', s.lower())
+
+
+def ssh_run(cmd):
+    r = subprocess.run(['ssh', SSH_HOST, cmd], capture_output=True, text=True)
+    return r.stdout.strip()
+
+
+def ssh_exists(path):
+    return ssh_run(f'[ -e {json.dumps(path)} ] && echo yes || echo no') == 'yes'
+
+
+def ssh_size(path):
+    """Return size in bytes, or 0 if path doesn't exist."""
+    out = ssh_run(f'du -sb {json.dumps(path)} 2>/dev/null | cut -f1')
+    try:
+        return int(out)
+    except ValueError:
+        return 0
+
+
+def ssh_delete(path):
+    r = subprocess.run(['ssh', SSH_HOST, f'rm -rf {json.dumps(path)}'],
+                       capture_output=True, text=True)
+    return r.returncode == 0, r.stderr.strip()
+
+
+def log(line):
+    ts = datetime.now(timezone.utc).strftime('%Y-%m-%dT%H:%M:%SZ')
+    entry = f"[{ts}] {line}"
+    print(entry)
+    with open(LOG_FILE, 'a') as f:
+        f.write(entry + '\n')
+
+
+def extract_title(name):
+    """Strip season/episode/quality tags to recover a bare show title."""
+    name = re.sub(r'\.(mkv|mp4|ts|avi)$', '', name, flags=re.IGNORECASE)
+    name = re.sub(r'^\[.*?\]\s*', '', name)            # [Group] prefix
+    name = re.sub(r'\s*\[.*?\]\s*', ' ', name)         # inline [tags]
+    name = re.sub(r'[\.\s_\-]?[Ss]\d{1,2}[Ee]\d{1,2}.*$', '', name)
+    name = re.sub(r'[\.\s_\-]?[Ss]\d{1,2}[\.\s_\-].*$', '', name)
+    name = re.sub(r'[\.\s_\-]?[Ss]\d{2}$', '', name)
+    name = re.sub(r'[\.\s_\-]?(19|20)\d{2}.*$', '', name)
+    name = re.sub(r'[\.\s_\-]?\d{3,4}p.*$', '', name) # 1080p etc
+    name = re.sub(r'[\.\-_]+', ' ', name).strip()
+    return name
+
+
+def build_sonarr_index(series):
+    idx = {}
+    for s in series:
+        for title_variant in [s['title'], s.get('titleSlug', ''), s.get('sortTitle', '')]:
+            if title_variant:
+                idx[norm(title_variant)] = s
+        # Also index alternate titles if present
+        for alt in s.get('alternateTitles', []):
+            t = alt.get('title', '')
+            if t:
+                idx[norm(t)] = s
+    return idx
+
+
+def find_in_sonarr(dl_name, idx):
+    title = extract_title(dl_name)
+    tn = norm(title)
+    if tn in idx:
+        return idx[tn], title
+    # Partial: dl title starts with series title (or vice versa), min 6 chars
+    for k, rec in idx.items():
+        if k and len(k) >= 6 and len(tn) >= 6:
+            if tn.startswith(k) or k.startswith(tn):
+                return rec, title
+    return None, title
+
+
+def confirm(prompt):
+    answer = input(f"{prompt} [y/N] ").strip().lower()
+    return answer == 'y'
+
+
+def main():
+    parser = argparse.ArgumentParser()
+    parser.add_argument('--dry-run', action='store_true')
+    parser.add_argument('--yes', '-y', action='store_true')
+    args = parser.parse_args()
+
+    if args.dry_run:
+        print("DRY-RUN — nothing will be deleted\n")
+
+    log("=" * 60)
+    log(f"cleanup-orphans.py started (dry_run={args.dry_run})")
+
+    print("Fetching Sonarr series (including alternate titles)...")
+    series = api_get(f"{SONARR_URL}/series?apikey={SONARR_KEY}")
+    print(f"  {len(series)} series")
+    idx = build_sonarr_index(series)
+
+    # Collect series dirs on disk for secondary check
+    # Strip years, imdb tags, and punctuation so "Bleach (2004) {imdb-...}" matches "Bleach"
+    print("Fetching /media/series directory listing...")
+    series_on_disk_raw = ssh_run(f'ls {json.dumps(SERIES_ROOT)}/').splitlines()
+    def norm_dir(d):
+        d = re.sub(r'\{.*?\}', '', d)          # remove {imdb-...}
+        d = re.sub(r'\(?\d{4}\)?', '', d)      # remove years
+        d = re.sub(r'[^a-z0-9]', '', d.lower())
+        return d
+    series_on_disk_norm = {norm_dir(d) for d in series_on_disk_raw if d.strip()}
+
+    print("Fetching download listing...")
+    dl_entries = ssh_run(f'ls {json.dumps(DL_ROOT)}/').splitlines()
+    dl_entries = [e.strip() for e in dl_entries if e.strip()]
+    print(f"  {len(dl_entries)} entries in {DL_ROOT}")
+
+    # --- First pass: match against Sonarr ---
+    not_in_sonarr = []
+    in_sonarr     = []
+
+    for dl in dl_entries:
+        rec, extracted_title = find_in_sonarr(dl, idx)
+        if rec:
+            in_sonarr.append((dl, rec['title']))
+        else:
+            not_in_sonarr.append((dl, extracted_title))
+
+    print(f"\n  Matched to Sonarr:   {len(in_sonarr)}")
+    print(f"  NOT in Sonarr:       {len(not_in_sonarr)}")
+
+    # --- Second pass: check if series dir exists on disk anyway ---
+    skip_has_series_dir = []
+    to_delete = []
+
+    for dl, title in not_in_sonarr:
+        title_n = norm(title)
+        # Check if any series dir on disk has a similar name
+        has_dir = any(
+            d and len(d) >= 6 and (title_n.startswith(d) or d.startswith(title_n))
+            for d in series_on_disk_norm
+        )
+        # Also check the full download path exists
+        dl_path = f"{DL_ROOT}/{dl}"
+        if has_dir:
+            skip_has_series_dir.append((dl, title, dl_path))
+        else:
+            to_delete.append((dl, title, dl_path))
+
+    if skip_has_series_dir:
+        print(f"\n  SKIPPED (series dir found on disk, needs manual review): {len(skip_has_series_dir)}")
+        for dl, title, _ in skip_has_series_dir:
+            print(f"    {title:40s} ← {dl[:60]}")
+
+    print(f"\n{'='*60}")
+    print(f"TO DELETE ({len(to_delete)} entries — not in Sonarr, no series dir on disk)")
+    print(f"{'='*60}")
+
+    # Get sizes in parallel
+    print("\nMeasuring sizes...")
+    size_cmd = ' && '.join(
+        f'du -sb {json.dumps(f"{DL_ROOT}/{dl}")} 2>/dev/null | cut -f1'
+        for dl, _, _ in to_delete
+    )
+    if to_delete:
+        size_out = ssh_run(f'bash -c {json.dumps(size_cmd)}').splitlines()
+    else:
+        size_out = []
+
+    sizes = {}
+    for i, (dl, title, path) in enumerate(to_delete):
+        try:
+            sizes[dl] = int(size_out[i]) if i < len(size_out) else 0
+        except (ValueError, IndexError):
+            sizes[dl] = 0
+
+    total_bytes = sum(sizes.values())
+    for dl, title, path in sorted(to_delete, key=lambda x: x[1]):
+        sz = sizes.get(dl, 0)
+        print(f"  {sz/1e9:6.1f}G  {title:40s} ← {dl[:60]}")
+
+    print(f"\n  Total: {total_bytes/1e9:.1f}G across {len(to_delete)} entries")
+
+    if not to_delete:
+        log("Nothing to delete.")
+        return
+
+    if not args.dry_run and not args.yes:
+        if not confirm(f"\nDelete {len(to_delete)} entries?"):
+            log("Aborted by user.")
+            return
+
+    # --- Delete with logging ---
+    deleted_count = 0
+    deleted_bytes = 0
+    failed_count  = 0
+
+    for dl, title, path in sorted(to_delete, key=lambda x: x[1]):
+        sz = sizes.get(dl, 0)
+        if args.dry_run:
+            log(f"DRY-RUN | {sz/1e9:.2f}G | {title} | {path}")
+            deleted_count += 1
+            deleted_bytes += sz
+        else:
+            ok, err = ssh_delete(path)
+            if ok:
+                log(f"DELETED | {sz/1e9:.2f}G | {title} | {path}")
+                deleted_count += 1
+                deleted_bytes += sz
+            else:
+                log(f"FAILED  | {sz/1e9:.2f}G | {title} | {path} | {err}")
+                failed_count += 1
+
+    log(f"DONE | deleted={deleted_count} | freed={deleted_bytes/1e9:.1f}G | failed={failed_count}")
+
+
+if __name__ == '__main__':
+    main()
--- a/docs/runbooks/arr-cleanup/cleanup.log
+++ b/docs/runbooks/arr-cleanup/cleanup.log
@@ -0,0 +1,160 @@
+[2026-04-22T21:18:32Z] ============================================================
+[2026-04-22T21:18:32Z] cleanup-orphans.py started (dry_run=True)
+[2026-04-22T21:18:55Z] DRY-RUN | 14.62G | BLEACH Thousand Year Blood War | /media/downloads/sonarr/BLEACH.Thousand-Year.Blood.War.S01.JAPANESE.1080p.DSNP.WEBRip.AAC2.0.x264-NTb[rartv]
+[2026-04-22T21:18:55Z] DRY-RUN | 1971.45G | Bleach USBD Remux TL | /media/downloads/sonarr/Bleach USBD Remux TL
+[2026-04-22T21:18:55Z] DRY-RUN | 0.52G | Gachiakuta   09 | /media/downloads/sonarr/[KiyoshiiSubs] Gachiakuta - 09 [1080p][H.265 - 10Bit].mkv
+[2026-04-22T21:18:55Z] DRY-RUN | 1.44G | Gachiakuta   19 ( | /media/downloads/sonarr/[SubsPlease] Gachiakuta - 19 (1080p) [019A6A50].mkv
+[2026-04-22T21:18:55Z] DRY-RUN | 24.39G | Game of Thrones | /media/downloads/sonarr/Game.of.Thrones.S01.1080p.MAX.WEB-DL.DDP5.1.Atmos.H.264-FLUX
+[2026-04-22T21:18:55Z] DRY-RUN | 36.73G | Game of Thrones | /media/downloads/sonarr/Game.of.Thrones.S02.NORDiC.1080p.HMAX.WEB-DL.DDP5.1.Atmos.H.264-DKV
+[2026-04-22T21:18:55Z] DRY-RUN | 37.52G | Game of Thrones | /media/downloads/sonarr/Game.of.Thrones.S03.NORDiC.1080p.HMAX.WEB-DL.DDP5.1.Atmos.H.264-DKV
+[2026-04-22T21:18:55Z] DRY-RUN | 36.83G | Game of Thrones | /media/downloads/sonarr/Game.of.Thrones.S04.NORDiC.1080p.HMAX.WEB-DL.DDP5.1.Atmos.H.264-DKV
+[2026-04-22T21:18:55Z] DRY-RUN | 37.77G | Game of Thrones | /media/downloads/sonarr/Game.of.Thrones.S05.NORDiC.1080p.HMAX.WEB-DL.DDP5.1.Atmos.H.264-DKV
+[2026-04-22T21:18:55Z] DRY-RUN | 36.07G | Game of Thrones | /media/downloads/sonarr/Game.of.Thrones.S06.1080p.HMAX.WEB-DL.DD.5.1.H.264-GNOME
+[2026-04-22T21:18:55Z] DRY-RUN | 29.48G | Game of Thrones | /media/downloads/sonarr/Game.of.Thrones.S07.NORDiC.1080p.HMAX.WEB-DL.DDP5.1.Atmos.H.264-DKV
+[2026-04-22T21:18:55Z] DRY-RUN | 28.71G | Game of Thrones | /media/downloads/sonarr/Game.of.Thrones.S08.NORDiC.1080p.HMAX.WEB-DL.DDP5.1.Atmos.H.264-DKV
+[2026-04-22T21:18:55Z] DRY-RUN | 4.58G | Grimgar Of Fantasy And Ash ( | /media/downloads/sonarr/Grimgar Of Fantasy And Ash (2016) S01 1080p BluRay 10bit EAC3 2 0 x265-iVy
+[2026-04-22T21:18:55Z] DRY-RUN | 1.53G | Hibike! Euphonium | /media/downloads/sonarr/[SubsPlease] Hibike! Euphonium S3 - 01 (1080p) [4CA94F81]
+[2026-04-22T21:18:55Z] DRY-RUN | 1.53G | Hibike! Euphonium | /media/downloads/sonarr/[SubsPlease] Hibike! Euphonium S3 - 05 (1080p) [A0556FA8].mkv
+[2026-04-22T21:18:55Z] DRY-RUN | 1.53G | Hibike! Euphonium | /media/downloads/sonarr/[SubsPlease] Hibike! Euphonium S3 - 06 (1080p) [982D7547].mkv
+[2026-04-22T21:18:55Z] DRY-RUN | 1.53G | Hibike! Euphonium | /media/downloads/sonarr/[SubsPlease] Hibike! Euphonium S3 - 07 (1080p) [247CFB44].mkv
+[2026-04-22T21:18:55Z] DRY-RUN | 1.52G | Hibike! Euphonium | /media/downloads/sonarr/[SubsPlease] Hibike! Euphonium S3 - 10 (1080p) [ABE1B90A]
+[2026-04-22T21:18:55Z] DRY-RUN | 1.52G | Hibike! Euphonium | /media/downloads/sonarr/[SubsPlease] Hibike! Euphonium S3 - 13 (1080p) [230618C3].mkv
+[2026-04-22T21:18:55Z] DRY-RUN | 0.52G | Hikikomari Kyuuketsuki no Monmon   07 ( | /media/downloads/sonarr/[SubsPlease] Hikikomari Kyuuketsuki no Monmon - 07 (1080p) [B07BA1C7]
+[2026-04-22T21:18:55Z] DRY-RUN | 10.49G | Love Death and Robots | /media/downloads/sonarr/Love.Death.and.Robots.S01.1080p.NF.WEB-DL.DDP5.1.Atmos.H.264-FLUX
+[2026-04-22T21:18:55Z] DRY-RUN | 8.97G | Love Death and Robots | /media/downloads/sonarr/Love.Death.and.Robots.S01.1080p.NF.WEB-DL.DDP5.1.x264-NTG
+[2026-04-22T21:18:55Z] DRY-RUN | 4.23G | Love Death and Robots | /media/downloads/sonarr/Love.Death.and.Robots.S02.1080p.NF.WEB-DL.DDP5.1.Atmos.H.264-FLUX
+[2026-04-22T21:18:55Z] DRY-RUN | 4.97G | Love Death and Robots | /media/downloads/sonarr/Love.Death.and.Robots.S02.1080p.NF.WEB-DL.DDP5.1.Atmos.x264-Telly
+[2026-04-22T21:18:55Z] DRY-RUN | 5.99G | Love Death and Robots | /media/downloads/sonarr/Love.Death.and.Robots.S03.1080p.NF.WEB-DL.DDP5.1.Atmos.H.264-FLUX
+[2026-04-22T21:18:55Z] DRY-RUN | 5.26G | Love Death and Robots | /media/downloads/sonarr/Love.Death.and.Robots.S03.1080p.NF.WEBRip.DDP5.1.Atmos.x264-SMURF
+[2026-04-22T21:18:55Z] DRY-RUN | 4.44G | Love Death and Robots | /media/downloads/sonarr/Love.Death.and.Robots.S04.1080p.NF.WEB-DL.DDP5.1.Atmos.H.264-FLUX
+[2026-04-22T21:18:55Z] DRY-RUN | 0.88G | SANDA | /media/downloads/sonarr/SANDA.S01E02.1080p.WEB.H264-SENSEI
+[2026-04-22T21:18:55Z] DRY-RUN | 0.70G | Senpai Is An Otokonoko | /media/downloads/sonarr/Senpai.Is.An.Otokonoko.S01E05.720p.WEB.H264-SKYANiME
+[2026-04-22T21:18:55Z] DRY-RUN | 1.39G | Senpai is an Otokonoko | /media/downloads/sonarr/Senpai.is.an.Otokonoko.S01E01.1080p.WEB.H264-KAWAII
+[2026-04-22T21:18:55Z] DRY-RUN | 1.39G | Senpai is an Otokonoko | /media/downloads/sonarr/Senpai.is.an.Otokonoko.S01E02.1080p.WEB.H264-KAWAII
+[2026-04-22T21:18:55Z] DRY-RUN | 1.39G | Senpai is an Otokonoko | /media/downloads/sonarr/Senpai.is.an.Otokonoko.S01E03.1080p.WEB.H264-KAWAII
+[2026-04-22T21:18:55Z] DRY-RUN | 1.39G | Senpai is an Otokonoko | /media/downloads/sonarr/Senpai.is.an.Otokonoko.S01E04.1080p.WEB.H264-KAWAII
+[2026-04-22T21:18:55Z] DRY-RUN | 1.39G | Senpai is an Otokonoko | /media/downloads/sonarr/Senpai.is.an.Otokonoko.S01E07.1080p.WEB.H264-KAWAII
+[2026-04-22T21:18:55Z] DRY-RUN | 1.39G | Senpai is an Otokonoko | /media/downloads/sonarr/Senpai.is.an.Otokonoko.S01E08.1080p.WEB.H264-KAWAII
+[2026-04-22T21:18:55Z] DRY-RUN | 1.39G | Senpai is an Otokonoko | /media/downloads/sonarr/Senpai.is.an.Otokonoko.S01E10.1080p.WEB.H264-KAWAII
+[2026-04-22T21:18:55Z] DRY-RUN | 1.39G | Senpai is an Otokonoko | /media/downloads/sonarr/Senpai.is.an.Otokonoko.S01E12.1080p.WEB.H264-KAWAII
+[2026-04-22T21:18:55Z] DRY-RUN | 31.17G | Sex Education | /media/downloads/sonarr/Sex.Education.S01.1080p.NF.WEB.DDP5.1.x264-DEFLATE
+[2026-04-22T21:18:55Z] DRY-RUN | 41.53G | Sex Education | /media/downloads/sonarr/Sex.Education.S02.1080p.NF.WEB.DDP5.1.x264-NTb
+[2026-04-22T21:18:55Z] DRY-RUN | 15.56G | Sex Education | /media/downloads/sonarr/Sex.Education.S03.1080p.NF.WEB-DL.DDP5.1.H.264-FLUX
+[2026-04-22T21:18:55Z] DRY-RUN | 21.49G | Sex Education | /media/downloads/sonarr/Sex.Education.S04.1080p.NF.WEB-DL.DDP5.1.H.264-Archie
+[2026-04-22T21:18:55Z] DRY-RUN | 1.49G | WIND BREAKER | /media/downloads/sonarr/WIND.BREAKER.S01E02.THE.HERO.OF.MY.DREAMS.1080p.CR.WEB-DL.AAC2.0.H.264.DUAL-VARYG.mkv
+[2026-04-22T21:18:55Z] DRY-RUN | 1.49G | WIND BREAKER | /media/downloads/sonarr/WIND.BREAKER.S01E03.THE.MAN.WHO.STANDS.AT.THE.TOP.1080p.CR.WEB-DL.AAC2.0.H.264.DUAL-VARYG.mkv
+[2026-04-22T21:18:55Z] DRY-RUN | 1.48G | WIND BREAKER | /media/downloads/sonarr/WIND.BREAKER.S01E04.CLASH.1080p.CR.WEB-DL.AAC2.0.H.264.DUAL-VARYG.mkv
+[2026-04-22T21:18:55Z] DRY-RUN | 0.34G | WIND BREAKER | /media/downloads/sonarr/WIND.BREAKER.S01E07.A.Fight.He.Cant.Lose.1080p.B-Global.WEB-DL.JPN.AAC2.0.H.264.MSubs-ToonsHub.mkv
+[2026-04-22T21:18:55Z] DRY-RUN | 0.26G | Wind Breaker | /media/downloads/sonarr/Wind Breaker - S01E12 - 1080p WEB HEVC -NanDesuKa (B-Global).mkv
+[2026-04-22T21:18:55Z] DRY-RUN | 1.46G | Wind Breaker   01 ( | /media/downloads/sonarr/[SubsPlease] Wind Breaker - 01 (1080p) [5D5071F6].mkv
+[2026-04-22T21:18:55Z] DRY-RUN | 1.46G | Wind Breaker   05 ( | /media/downloads/sonarr/[SubsPlease] Wind Breaker - 05 (1080p) [B6649F46].mkv
+[2026-04-22T21:18:55Z] DRY-RUN | 1.46G | Wind Breaker   06 ( | /media/downloads/sonarr/[SubsPlease] Wind Breaker - 06 (1080p) [1C13E5BC].mkv
+[2026-04-22T21:18:55Z] DRY-RUN | 0.74G | Wistoria Wand And Sword | /media/downloads/sonarr/Wistoria.Wand.And.Sword.S01E01.720p.WEB.H264-SKYANiME
+[2026-04-22T21:18:55Z] DRY-RUN | 1.45G | Wistoria Wand and Sword | /media/downloads/sonarr/Wistoria.Wand.and.Sword.S01E02.1080p.WEB.H264-KAWAII
+[2026-04-22T21:18:55Z] DRY-RUN | 1.44G | Wistoria Wand and Sword | /media/downloads/sonarr/Wistoria.Wand.and.Sword.S01E03.1080p.WEB.H264-KAWAII
+[2026-04-22T21:18:55Z] DRY-RUN | 0.00G | www UIndex org         Severance | /media/downloads/sonarr/www.UIndex.org    -    Severance S02E10 Cold Harbor 1080p ATVP WEB-DL DDP5 1 Atmos H 264-Kitsune
+[2026-04-22T21:18:55Z] DONE | deleted=53 | freed=2449.6G | failed=0
+[2026-04-22T21:23:05Z] ============================================================
+[2026-04-22T21:23:05Z] cleanup-orphans.py started (dry_run=True)
+[2026-04-22T21:23:28Z] DRY-RUN | 24.39G | Game of Thrones | /media/downloads/sonarr/Game.of.Thrones.S01.1080p.MAX.WEB-DL.DDP5.1.Atmos.H.264-FLUX
+[2026-04-22T21:23:28Z] DRY-RUN | 36.73G | Game of Thrones | /media/downloads/sonarr/Game.of.Thrones.S02.NORDiC.1080p.HMAX.WEB-DL.DDP5.1.Atmos.H.264-DKV
+[2026-04-22T21:23:28Z] DRY-RUN | 37.52G | Game of Thrones | /media/downloads/sonarr/Game.of.Thrones.S03.NORDiC.1080p.HMAX.WEB-DL.DDP5.1.Atmos.H.264-DKV
+[2026-04-22T21:23:28Z] DRY-RUN | 36.83G | Game of Thrones | /media/downloads/sonarr/Game.of.Thrones.S04.NORDiC.1080p.HMAX.WEB-DL.DDP5.1.Atmos.H.264-DKV
+[2026-04-22T21:23:28Z] DRY-RUN | 37.77G | Game of Thrones | /media/downloads/sonarr/Game.of.Thrones.S05.NORDiC.1080p.HMAX.WEB-DL.DDP5.1.Atmos.H.264-DKV
+[2026-04-22T21:23:28Z] DRY-RUN | 36.07G | Game of Thrones | /media/downloads/sonarr/Game.of.Thrones.S06.1080p.HMAX.WEB-DL.DD.5.1.H.264-GNOME
+[2026-04-22T21:23:28Z] DRY-RUN | 29.48G | Game of Thrones | /media/downloads/sonarr/Game.of.Thrones.S07.NORDiC.1080p.HMAX.WEB-DL.DDP5.1.Atmos.H.264-DKV
+[2026-04-22T21:23:28Z] DRY-RUN | 28.71G | Game of Thrones | /media/downloads/sonarr/Game.of.Thrones.S08.NORDiC.1080p.HMAX.WEB-DL.DDP5.1.Atmos.H.264-DKV
+[2026-04-22T21:23:28Z] DRY-RUN | 4.58G | Grimgar Of Fantasy And Ash ( | /media/downloads/sonarr/Grimgar Of Fantasy And Ash (2016) S01 1080p BluRay 10bit EAC3 2 0 x265-iVy
+[2026-04-22T21:23:28Z] DRY-RUN | 1.53G | Hibike! Euphonium | /media/downloads/sonarr/[SubsPlease] Hibike! Euphonium S3 - 01 (1080p) [4CA94F81]
+[2026-04-22T21:23:28Z] DRY-RUN | 1.53G | Hibike! Euphonium | /media/downloads/sonarr/[SubsPlease] Hibike! Euphonium S3 - 05 (1080p) [A0556FA8].mkv
+[2026-04-22T21:23:28Z] DRY-RUN | 1.53G | Hibike! Euphonium | /media/downloads/sonarr/[SubsPlease] Hibike! Euphonium S3 - 06 (1080p) [982D7547].mkv
+[2026-04-22T21:23:28Z] DRY-RUN | 1.53G | Hibike! Euphonium | /media/downloads/sonarr/[SubsPlease] Hibike! Euphonium S3 - 07 (1080p) [247CFB44].mkv
+[2026-04-22T21:23:28Z] DRY-RUN | 1.52G | Hibike! Euphonium | /media/downloads/sonarr/[SubsPlease] Hibike! Euphonium S3 - 10 (1080p) [ABE1B90A]
+[2026-04-22T21:23:28Z] DRY-RUN | 1.52G | Hibike! Euphonium | /media/downloads/sonarr/[SubsPlease] Hibike! Euphonium S3 - 13 (1080p) [230618C3].mkv
+[2026-04-22T21:23:28Z] DRY-RUN | 0.52G | Hikikomari Kyuuketsuki no Monmon   07 ( | /media/downloads/sonarr/[SubsPlease] Hikikomari Kyuuketsuki no Monmon - 07 (1080p) [B07BA1C7]
+[2026-04-22T21:23:28Z] DRY-RUN | 10.49G | Love Death and Robots | /media/downloads/sonarr/Love.Death.and.Robots.S01.1080p.NF.WEB-DL.DDP5.1.Atmos.H.264-FLUX
+[2026-04-22T21:23:28Z] DRY-RUN | 8.97G | Love Death and Robots | /media/downloads/sonarr/Love.Death.and.Robots.S01.1080p.NF.WEB-DL.DDP5.1.x264-NTG
+[2026-04-22T21:23:28Z] DRY-RUN | 4.23G | Love Death and Robots | /media/downloads/sonarr/Love.Death.and.Robots.S02.1080p.NF.WEB-DL.DDP5.1.Atmos.H.264-FLUX
+[2026-04-22T21:23:28Z] DRY-RUN | 4.97G | Love Death and Robots | /media/downloads/sonarr/Love.Death.and.Robots.S02.1080p.NF.WEB-DL.DDP5.1.Atmos.x264-Telly
+[2026-04-22T21:23:28Z] DRY-RUN | 5.99G | Love Death and Robots | /media/downloads/sonarr/Love.Death.and.Robots.S03.1080p.NF.WEB-DL.DDP5.1.Atmos.H.264-FLUX
+[2026-04-22T21:23:28Z] DRY-RUN | 5.26G | Love Death and Robots | /media/downloads/sonarr/Love.Death.and.Robots.S03.1080p.NF.WEBRip.DDP5.1.Atmos.x264-SMURF
+[2026-04-22T21:23:28Z] DRY-RUN | 4.44G | Love Death and Robots | /media/downloads/sonarr/Love.Death.and.Robots.S04.1080p.NF.WEB-DL.DDP5.1.Atmos.H.264-FLUX
+[2026-04-22T21:23:28Z] DRY-RUN | 0.88G | SANDA | /media/downloads/sonarr/SANDA.S01E02.1080p.WEB.H264-SENSEI
+[2026-04-22T21:23:28Z] DRY-RUN | 0.70G | Senpai Is An Otokonoko | /media/downloads/sonarr/Senpai.Is.An.Otokonoko.S01E05.720p.WEB.H264-SKYANiME
+[2026-04-22T21:23:28Z] DRY-RUN | 1.39G | Senpai is an Otokonoko | /media/downloads/sonarr/Senpai.is.an.Otokonoko.S01E01.1080p.WEB.H264-KAWAII
+[2026-04-22T21:23:28Z] DRY-RUN | 1.39G | Senpai is an Otokonoko | /media/downloads/sonarr/Senpai.is.an.Otokonoko.S01E02.1080p.WEB.H264-KAWAII
+[2026-04-22T21:23:28Z] DRY-RUN | 1.39G | Senpai is an Otokonoko | /media/downloads/sonarr/Senpai.is.an.Otokonoko.S01E03.1080p.WEB.H264-KAWAII
+[2026-04-22T21:23:28Z] DRY-RUN | 1.39G | Senpai is an Otokonoko | /media/downloads/sonarr/Senpai.is.an.Otokonoko.S01E04.1080p.WEB.H264-KAWAII
+[2026-04-22T21:23:28Z] DRY-RUN | 1.39G | Senpai is an Otokonoko | /media/downloads/sonarr/Senpai.is.an.Otokonoko.S01E07.1080p.WEB.H264-KAWAII
+[2026-04-22T21:23:28Z] DRY-RUN | 1.39G | Senpai is an Otokonoko | /media/downloads/sonarr/Senpai.is.an.Otokonoko.S01E08.1080p.WEB.H264-KAWAII
+[2026-04-22T21:23:28Z] DRY-RUN | 1.39G | Senpai is an Otokonoko | /media/downloads/sonarr/Senpai.is.an.Otokonoko.S01E10.1080p.WEB.H264-KAWAII
+[2026-04-22T21:23:28Z] DRY-RUN | 1.39G | Senpai is an Otokonoko | /media/downloads/sonarr/Senpai.is.an.Otokonoko.S01E12.1080p.WEB.H264-KAWAII
+[2026-04-22T21:23:28Z] DRY-RUN | 31.17G | Sex Education | /media/downloads/sonarr/Sex.Education.S01.1080p.NF.WEB.DDP5.1.x264-DEFLATE
+[2026-04-22T21:23:28Z] DRY-RUN | 41.53G | Sex Education | /media/downloads/sonarr/Sex.Education.S02.1080p.NF.WEB.DDP5.1.x264-NTb
+[2026-04-22T21:23:28Z] DRY-RUN | 15.56G | Sex Education | /media/downloads/sonarr/Sex.Education.S03.1080p.NF.WEB-DL.DDP5.1.H.264-FLUX
+[2026-04-22T21:23:28Z] DRY-RUN | 21.49G | Sex Education | /media/downloads/sonarr/Sex.Education.S04.1080p.NF.WEB-DL.DDP5.1.H.264-Archie
+[2026-04-22T21:23:28Z] DRY-RUN | 1.49G | WIND BREAKER | /media/downloads/sonarr/WIND.BREAKER.S01E02.THE.HERO.OF.MY.DREAMS.1080p.CR.WEB-DL.AAC2.0.H.264.DUAL-VARYG.mkv
+[2026-04-22T21:23:28Z] DRY-RUN | 1.49G | WIND BREAKER | /media/downloads/sonarr/WIND.BREAKER.S01E03.THE.MAN.WHO.STANDS.AT.THE.TOP.1080p.CR.WEB-DL.AAC2.0.H.264.DUAL-VARYG.mkv
+[2026-04-22T21:23:28Z] DRY-RUN | 1.48G | WIND BREAKER | /media/downloads/sonarr/WIND.BREAKER.S01E04.CLASH.1080p.CR.WEB-DL.AAC2.0.H.264.DUAL-VARYG.mkv
+[2026-04-22T21:23:28Z] DRY-RUN | 0.34G | WIND BREAKER | /media/downloads/sonarr/WIND.BREAKER.S01E07.A.Fight.He.Cant.Lose.1080p.B-Global.WEB-DL.JPN.AAC2.0.H.264.MSubs-ToonsHub.mkv
+[2026-04-22T21:23:28Z] DRY-RUN | 0.26G | Wind Breaker | /media/downloads/sonarr/Wind Breaker - S01E12 - 1080p WEB HEVC -NanDesuKa (B-Global).mkv
+[2026-04-22T21:23:28Z] DRY-RUN | 1.46G | Wind Breaker   01 ( | /media/downloads/sonarr/[SubsPlease] Wind Breaker - 01 (1080p) [5D5071F6].mkv
+[2026-04-22T21:23:28Z] DRY-RUN | 1.46G | Wind Breaker   05 ( | /media/downloads/sonarr/[SubsPlease] Wind Breaker - 05 (1080p) [B6649F46].mkv
+[2026-04-22T21:23:28Z] DRY-RUN | 1.46G | Wind Breaker   06 ( | /media/downloads/sonarr/[SubsPlease] Wind Breaker - 06 (1080p) [1C13E5BC].mkv
+[2026-04-22T21:23:28Z] DRY-RUN | 0.74G | Wistoria Wand And Sword | /media/downloads/sonarr/Wistoria.Wand.And.Sword.S01E01.720p.WEB.H264-SKYANiME
+[2026-04-22T21:23:28Z] DRY-RUN | 1.45G | Wistoria Wand and Sword | /media/downloads/sonarr/Wistoria.Wand.and.Sword.S01E02.1080p.WEB.H264-KAWAII
+[2026-04-22T21:23:28Z] DRY-RUN | 1.44G | Wistoria Wand and Sword | /media/downloads/sonarr/Wistoria.Wand.and.Sword.S01E03.1080p.WEB.H264-KAWAII
+[2026-04-22T21:23:28Z] DRY-RUN | 0.00G | www UIndex org         Severance | /media/downloads/sonarr/www.UIndex.org    -    Severance S02E10 Cold Harbor 1080p ATVP WEB-DL DDP5 1 Atmos H 264-Kitsune
+[2026-04-22T21:23:28Z] DONE | deleted=49 | freed=461.6G | failed=0
+[2026-04-22T21:32:57Z] ============================================================
+[2026-04-22T21:32:57Z] cleanup-orphans.py started (dry_run=False)
+[2026-04-22T21:33:31Z] DELETED | 24.39G | Game of Thrones | /media/downloads/sonarr/Game.of.Thrones.S01.1080p.MAX.WEB-DL.DDP5.1.Atmos.H.264-FLUX
+[2026-04-22T21:34:04Z] DELETED | 36.73G | Game of Thrones | /media/downloads/sonarr/Game.of.Thrones.S02.NORDiC.1080p.HMAX.WEB-DL.DDP5.1.Atmos.H.264-DKV
+[2026-04-22T21:34:39Z] DELETED | 37.52G | Game of Thrones | /media/downloads/sonarr/Game.of.Thrones.S03.NORDiC.1080p.HMAX.WEB-DL.DDP5.1.Atmos.H.264-DKV
+[2026-04-22T21:35:05Z] DELETED | 36.83G | Game of Thrones | /media/downloads/sonarr/Game.of.Thrones.S04.NORDiC.1080p.HMAX.WEB-DL.DDP5.1.Atmos.H.264-DKV
+[2026-04-22T21:35:33Z] DELETED | 37.77G | Game of Thrones | /media/downloads/sonarr/Game.of.Thrones.S05.NORDiC.1080p.HMAX.WEB-DL.DDP5.1.Atmos.H.264-DKV
+[2026-04-22T21:35:51Z] DELETED | 36.07G | Game of Thrones | /media/downloads/sonarr/Game.of.Thrones.S06.1080p.HMAX.WEB-DL.DD.5.1.H.264-GNOME
+[2026-04-22T21:36:01Z] DELETED | 29.48G | Game of Thrones | /media/downloads/sonarr/Game.of.Thrones.S07.NORDiC.1080p.HMAX.WEB-DL.DDP5.1.Atmos.H.264-DKV
+[2026-04-22T21:36:09Z] DELETED | 28.71G | Game of Thrones | /media/downloads/sonarr/Game.of.Thrones.S08.NORDiC.1080p.HMAX.WEB-DL.DDP5.1.Atmos.H.264-DKV
+[2026-04-22T21:36:10Z] DELETED | 4.58G | Grimgar Of Fantasy And Ash ( | /media/downloads/sonarr/Grimgar Of Fantasy And Ash (2016) S01 1080p BluRay 10bit EAC3 2 0 x265-iVy
+[2026-04-22T21:36:11Z] DELETED | 1.53G | Hibike! Euphonium | /media/downloads/sonarr/[SubsPlease] Hibike! Euphonium S3 - 01 (1080p) [4CA94F81]
+[2026-04-22T21:36:11Z] DELETED | 1.53G | Hibike! Euphonium | /media/downloads/sonarr/[SubsPlease] Hibike! Euphonium S3 - 05 (1080p) [A0556FA8].mkv
+[2026-04-22T21:36:11Z] DELETED | 1.53G | Hibike! Euphonium | /media/downloads/sonarr/[SubsPlease] Hibike! Euphonium S3 - 06 (1080p) [982D7547].mkv
+[2026-04-22T21:36:12Z] DELETED | 1.53G | Hibike! Euphonium | /media/downloads/sonarr/[SubsPlease] Hibike! Euphonium S3 - 07 (1080p) [247CFB44].mkv
+[2026-04-22T21:36:13Z] DELETED | 1.52G | Hibike! Euphonium | /media/downloads/sonarr/[SubsPlease] Hibike! Euphonium S3 - 10 (1080p) [ABE1B90A]
+[2026-04-22T21:36:13Z] DELETED | 1.52G | Hibike! Euphonium | /media/downloads/sonarr/[SubsPlease] Hibike! Euphonium S3 - 13 (1080p) [230618C3].mkv
+[2026-04-22T21:36:13Z] DELETED | 0.52G | Hikikomari Kyuuketsuki no Monmon   07 ( | /media/downloads/sonarr/[SubsPlease] Hikikomari Kyuuketsuki no Monmon - 07 (1080p) [B07BA1C7]
+[2026-04-22T21:36:15Z] DELETED | 10.49G | Love Death and Robots | /media/downloads/sonarr/Love.Death.and.Robots.S01.1080p.NF.WEB-DL.DDP5.1.Atmos.H.264-FLUX
+[2026-04-22T21:36:16Z] DELETED | 8.97G | Love Death and Robots | /media/downloads/sonarr/Love.Death.and.Robots.S01.1080p.NF.WEB-DL.DDP5.1.x264-NTG
+[2026-04-22T21:36:16Z] DELETED | 4.23G | Love Death and Robots | /media/downloads/sonarr/Love.Death.and.Robots.S02.1080p.NF.WEB-DL.DDP5.1.Atmos.H.264-FLUX
+[2026-04-22T21:36:17Z] DELETED | 4.97G | Love Death and Robots | /media/downloads/sonarr/Love.Death.and.Robots.S02.1080p.NF.WEB-DL.DDP5.1.Atmos.x264-Telly
+[2026-04-22T21:36:17Z] DELETED | 5.99G | Love Death and Robots | /media/downloads/sonarr/Love.Death.and.Robots.S03.1080p.NF.WEB-DL.DDP5.1.Atmos.H.264-FLUX
+[2026-04-22T21:36:18Z] DELETED | 5.26G | Love Death and Robots | /media/downloads/sonarr/Love.Death.and.Robots.S03.1080p.NF.WEBRip.DDP5.1.Atmos.x264-SMURF
+[2026-04-22T21:36:22Z] DELETED | 4.44G | Love Death and Robots | /media/downloads/sonarr/Love.Death.and.Robots.S04.1080p.NF.WEB-DL.DDP5.1.Atmos.H.264-FLUX
+[2026-04-22T21:36:22Z] DELETED | 0.88G | SANDA | /media/downloads/sonarr/SANDA.S01E02.1080p.WEB.H264-SENSEI
+[2026-04-22T21:36:22Z] DELETED | 0.70G | Senpai Is An Otokonoko | /media/downloads/sonarr/Senpai.Is.An.Otokonoko.S01E05.720p.WEB.H264-SKYANiME
+[2026-04-22T21:36:22Z] DELETED | 1.39G | Senpai is an Otokonoko | /media/downloads/sonarr/Senpai.is.an.Otokonoko.S01E01.1080p.WEB.H264-KAWAII
+[2026-04-22T21:36:23Z] DELETED | 1.39G | Senpai is an Otokonoko | /media/downloads/sonarr/Senpai.is.an.Otokonoko.S01E02.1080p.WEB.H264-KAWAII
+[2026-04-22T21:36:23Z] DELETED | 1.39G | Senpai is an Otokonoko | /media/downloads/sonarr/Senpai.is.an.Otokonoko.S01E03.1080p.WEB.H264-KAWAII
+[2026-04-22T21:36:23Z] DELETED | 1.39G | Senpai is an Otokonoko | /media/downloads/sonarr/Senpai.is.an.Otokonoko.S01E04.1080p.WEB.H264-KAWAII
+[2026-04-22T21:36:23Z] DELETED | 1.39G | Senpai is an Otokonoko | /media/downloads/sonarr/Senpai.is.an.Otokonoko.S01E07.1080p.WEB.H264-KAWAII
+[2026-04-22T21:36:24Z] DELETED | 1.39G | Senpai is an Otokonoko | /media/downloads/sonarr/Senpai.is.an.Otokonoko.S01E08.1080p.WEB.H264-KAWAII
+[2026-04-22T21:36:24Z] DELETED | 1.39G | Senpai is an Otokonoko | /media/downloads/sonarr/Senpai.is.an.Otokonoko.S01E10.1080p.WEB.H264-KAWAII
+[2026-04-22T21:36:24Z] DELETED | 1.39G | Senpai is an Otokonoko | /media/downloads/sonarr/Senpai.is.an.Otokonoko.S01E12.1080p.WEB.H264-KAWAII
+[2026-04-22T21:36:25Z] DELETED | 31.17G | Sex Education | /media/downloads/sonarr/Sex.Education.S01.1080p.NF.WEB.DDP5.1.x264-DEFLATE
+[2026-04-22T21:36:26Z] DELETED | 41.53G | Sex Education | /media/downloads/sonarr/Sex.Education.S02.1080p.NF.WEB.DDP5.1.x264-NTb
+[2026-04-22T21:36:26Z] DELETED | 15.56G | Sex Education | /media/downloads/sonarr/Sex.Education.S03.1080p.NF.WEB-DL.DDP5.1.H.264-FLUX
+[2026-04-22T21:36:27Z] DELETED | 21.49G | Sex Education | /media/downloads/sonarr/Sex.Education.S04.1080p.NF.WEB-DL.DDP5.1.H.264-Archie
+[2026-04-22T21:36:27Z] DELETED | 1.49G | WIND BREAKER | /media/downloads/sonarr/WIND.BREAKER.S01E02.THE.HERO.OF.MY.DREAMS.1080p.CR.WEB-DL.AAC2.0.H.264.DUAL-VARYG.mkv
+[2026-04-22T21:36:28Z] DELETED | 1.49G | WIND BREAKER | /media/downloads/sonarr/WIND.BREAKER.S01E03.THE.MAN.WHO.STANDS.AT.THE.TOP.1080p.CR.WEB-DL.AAC2.0.H.264.DUAL-VARYG.mkv
+[2026-04-22T21:36:28Z] DELETED | 1.48G | WIND BREAKER | /media/downloads/sonarr/WIND.BREAKER.S01E04.CLASH.1080p.CR.WEB-DL.AAC2.0.H.264.DUAL-VARYG.mkv
+[2026-04-22T21:36:28Z] DELETED | 0.34G | WIND BREAKER | /media/downloads/sonarr/WIND.BREAKER.S01E07.A.Fight.He.Cant.Lose.1080p.B-Global.WEB-DL.JPN.AAC2.0.H.264.MSubs-ToonsHub.mkv
+[2026-04-22T21:36:29Z] DELETED | 0.26G | Wind Breaker | /media/downloads/sonarr/Wind Breaker - S01E12 - 1080p WEB HEVC -NanDesuKa (B-Global).mkv
+[2026-04-22T21:36:29Z] DELETED | 1.46G | Wind Breaker   01 ( | /media/downloads/sonarr/[SubsPlease] Wind Breaker - 01 (1080p) [5D5071F6].mkv
+[2026-04-22T21:36:29Z] DELETED | 1.46G | Wind Breaker   05 ( | /media/downloads/sonarr/[SubsPlease] Wind Breaker - 05 (1080p) [B6649F46].mkv
+[2026-04-22T21:36:30Z] DELETED | 1.46G | Wind Breaker   06 ( | /media/downloads/sonarr/[SubsPlease] Wind Breaker - 06 (1080p) [1C13E5BC].mkv
+[2026-04-22T21:36:30Z] DELETED | 0.74G | Wistoria Wand And Sword | /media/downloads/sonarr/Wistoria.Wand.And.Sword.S01E01.720p.WEB.H264-SKYANiME
+[2026-04-22T21:36:30Z] DELETED | 1.45G | Wistoria Wand and Sword | /media/downloads/sonarr/Wistoria.Wand.and.Sword.S01E02.1080p.WEB.H264-KAWAII
+[2026-04-22T21:36:31Z] DELETED | 1.44G | Wistoria Wand and Sword | /media/downloads/sonarr/Wistoria.Wand.and.Sword.S01E03.1080p.WEB.H264-KAWAII
+[2026-04-22T21:36:31Z] DELETED | 0.00G | www UIndex org         Severance | /media/downloads/sonarr/www.UIndex.org    -    Severance S02E10 Cold Harbor 1080p ATVP WEB-DL DDP5 1 Atmos H 264-Kitsune
+[2026-04-22T21:36:31Z] DONE | deleted=49 | freed=461.6G | failed=0
--- a/docs/runbooks/arr-cleanup/cleanup.py
+++ b/docs/runbooks/arr-cleanup/cleanup.py
@@ -0,0 +1,132 @@
+#!/usr/bin/env python3
+"""
+Delete confirmed-safe download entries from /media/downloads/sonarr and /media/downloads/radarr.
+
+Reads /tmp/arr_verified.json produced by verify.py.
+Only deletes entries where status == 'safe' (API-confirmed imported + disk path verified).
+Orphans and path_missing entries are never touched.
+
+Usage:
+  python3 cleanup.py --dry-run            # print what would be deleted
+  python3 cleanup.py --arr sonarr         # delete only sonarr downloads
+  python3 cleanup.py --arr radarr         # delete only radarr downloads
+  python3 cleanup.py                      # delete both (prompts for confirmation)
+
+  # Target a single series/movie by title substring:
+  python3 cleanup.py --arr sonarr --title "American Dragon"
+"""
+
+import json
+import subprocess
+import argparse
+import sys
+
+SSH_HOST        = "aya01"
+SONARR_DL_ROOT  = "/media/downloads/sonarr"
+RADARR_DL_ROOT  = "/media/downloads/radarr"
+VERIFIED_JSON   = "/tmp/arr_verified.json"
+
+
+def ssh_delete(path, dry_run):
+    """Delete path on remote host. Returns True on success."""
+    if dry_run:
+        print(f"  [DRY-RUN] would delete: {path}")
+        return True
+    result = subprocess.run(
+        ['ssh', SSH_HOST, f'rm -rf {json.dumps(path)}'],
+        capture_output=True, text=True
+    )
+    if result.returncode != 0:
+        print(f"  ERROR deleting {path}: {result.stderr.strip()}")
+        return False
+    return True
+
+
+def ssh_exists(path):
+    r = subprocess.run(['ssh', SSH_HOST, f'[ -e {json.dumps(path)} ] && echo yes || echo no'],
+                       capture_output=True, text=True)
+    return r.stdout.strip() == 'yes'
+
+
+def confirm(prompt):
+    answer = input(f"{prompt} [y/N] ").strip().lower()
+    return answer == 'y'
+
+
+def process(entries, dl_root, label, dry_run, title_filter, yes=False):
+    safe = [m for m in entries if m['status'] == 'safe']
+    if title_filter:
+        safe = [m for m in safe if title_filter.lower() in m['title'].lower()]
+
+    if not safe:
+        print(f"No safe entries to delete for {label}.")
+        return 0, 0
+
+    print(f"\n{'='*60}")
+    print(f"{label} — {len(safe)} entries to delete")
+    print(f"{'='*60}")
+
+    for m in safe:
+        pct = m.get('percentOfEpisodes', '')
+        pct_str = f" [{pct:.0f}%]" if isinstance(pct, float) else ''
+        files = m.get('episodeFileCount', '')
+        total = m.get('totalEpisodeCount', '')
+        count_str = f" ({files}/{total} eps)" if files != '' else f" (hasFile=True)"
+        print(f"  {m['title']}{pct_str}{count_str}")
+        print(f"    ← {m['dl']}")
+        print(f"    → {m['check_path']}")
+
+    if not dry_run and not yes:
+        if not confirm(f"\nDelete {len(safe)} {label} download entries?"):
+            print("Skipped.")
+            return 0, 0
+
+    deleted, failed = 0, 0
+    for m in safe:
+        dl_path = f"{dl_root}/{m['dl']}"
+        # Double-check the series/movie still exists on disk before deleting the download
+        if not dry_run and not ssh_exists(m['check_path']):
+            print(f"  SKIP {m['title']}: media path no longer on disk ({m['check_path']})")
+            failed += 1
+            continue
+        ok = ssh_delete(dl_path, dry_run)
+        if ok:
+            deleted += 1
+        else:
+            failed += 1
+
+    print(f"\n{label}: {deleted} deleted, {failed} failed/skipped")
+    return deleted, failed
+
+
+def main():
+    parser = argparse.ArgumentParser()
+    parser.add_argument('--dry-run', action='store_true', help='Print actions without deleting')
+    parser.add_argument('--yes', '-y', action='store_true', help='Skip confirmation prompt')
+    parser.add_argument('--arr', choices=['sonarr', 'radarr', 'both'], default='both')
+    parser.add_argument('--title', default='', help='Only process entries matching this title substring')
+    args = parser.parse_args()
+
+    with open(VERIFIED_JSON) as f:
+        data = json.load(f)
+
+    if args.dry_run:
+        print("DRY-RUN mode — nothing will be deleted\n")
+
+    total_deleted, total_failed = 0, 0
+
+    if args.arr in ('radarr', 'both'):
+        d, f = process(data['radarr_matched'], RADARR_DL_ROOT, 'Radarr', args.dry_run, args.title, args.yes)
+        total_deleted += d
+        total_failed  += f
+
+    if args.arr in ('sonarr', 'both'):
+        d, f = process(data['sonarr_matched'], SONARR_DL_ROOT, 'Sonarr', args.dry_run, args.title, args.yes)
+        total_deleted += d
+        total_failed  += f
+
+    print(f"\nTotal: {total_deleted} deleted, {total_failed} failed/skipped")
+
+
+if __name__ == '__main__':
+    main()
--- a/docs/runbooks/arr-cleanup/findings.md
+++ b/docs/runbooks/arr-cleanup/findings.md
@@ -0,0 +1,200 @@
+# arr-stack Downloads Cleanup — Investigation Findings
+
+## Storage Layout (aya01)
+
+| Device | FS | Size | Used | Mount |
+|--------|----|------|------|-------|
+| `/dev/sdc3` | btrfs | 1.9T | 177G (10%) | `/` (system) |
+| `/dev/sda1` | btrfs `proxmox` | 2.8T | 1.3T (48%) | `/opt` |
+| `/dev/sdd1` | ext4 | 17T | 15T (92%) | `/mnt/hdd0` |
+| `/dev/sde1` | ext4 | 17T | 15T (92%) | `/mnt/hdd2` |
+| `/dev/sdf1` | ext4 | 17T | 15T (92%) | `/mnt/hdd1` |
+| `mergerfs` | fuse | 49T | 43T (92%) | `/media` |
+
+`/media` is a mergerfs union of hdd0 + hdd1 + hdd2. All three HDDs were at ~92% capacity before cleanup.
+
+**After cleanup (2026-04-23):**
+
+| Device | Used | Avail | Use% |
+|--------|------|-------|------|
+| `/dev/sdd1` (hdd0) | 9.4T | 6.2T | 61% |
+| `/dev/sdf1` (hdd1) | 9.3T | 6.3T | 60% |
+| `/dev/sde1` (hdd2) | 7.8T | 7.8T | 51% |
+| `mergerfs /media` | 27T | 21T | 57% |
+
+**~16T freed total** (92% → 57% on the mergerfs pool).
+
+## /media Breakdown (before cleanup)
+
+| Directory | Size |
+|-----------|------|
+| `downloads` | **22T** |
+| `series` | 16T |
+| `movies` | 5T |
+
+## Root Cause: No Hardlinks → All Imports Are Copies
+
+Zero hardlinked files exist anywhere across all three HDDs. Confirmed by two methods:
+1. Inspecting the Kubernetes manifests in `argocd-homelab/services/arr-stack/`
+2. Inode comparison of 1365 download/media file pairs — **0 shared inodes found** (every file is a distinct copy)
+
+**All three services mount the mergerfs `/media/` path via NFS:**
+
+```
+sonarr:      NFS 192.168.20.12:/media/downloads  → /downloads
+             NFS 192.168.20.12:/media/series     → /tv
+radarr:      NFS 192.168.20.12:/media/downloads  → /downloads
+             NFS 192.168.20.12:/media/movies     → /movies
+qbit:        NFS 192.168.20.12:/media/downloads  → /downloads
+```
+
+mergerfs does not support hardlinks across underlying filesystems. When qBit downloads to `/media/downloads/sonarr/` (lands on e.g. hdd1) and Sonarr imports to `/media/series/` (lands on e.g. hdd0), the hardlink attempt crosses a physical disk boundary → falls back to copy. Every import doubles the data.
+
+**Estimated wasted space before cleanup: ~21T** (the entire downloads/sonarr + downloads/radarr).
+
+## How to Run
+
+Prerequisites:
+```bash
+# Port-forward Sonarr and Radarr APIs
+kubectl -n arr-stack port-forward svc/sonarr 8989:8989 &
+kubectl -n arr-stack port-forward svc/radarr 7878:7878 &
+```
+
+API keys are loaded from `../../../../sonarr.api.env` and `../../../../radarr.api.env`
+(i.e. `/home/tudattr/workspace/infra/sonarr.api.env` relative to this repo).
+
+Container path mappings used in scripts:
+- Sonarr: `/tv/` → `/media/series/`
+- Radarr: `/movies/` → `/media/movies/`
+
+### Step 1 — Verify (generates `/tmp/arr_verified.json`)
+```bash
+python3 verify.py
+```
+Cross-references all downloads against Sonarr/Radarr APIs, verifies reported file paths exist on disk via SSH. Classifies each entry as `safe`, `not_imported`, or `path_missing`.
+
+### Step 2 — Delete confirmed-imported downloads
+```bash
+python3 cleanup.py --dry-run          # preview
+python3 cleanup.py --arr sonarr --yes
+python3 cleanup.py --arr radarr --yes
+```
+
+### Step 3 — Delete orphans (downloads not in Sonarr at all)
+```bash
+python3 cleanup-orphans.py --dry-run  # preview
+python3 cleanup-orphans.py --yes
+```
+
+All actions are logged to `cleanup.log` with UTC timestamp, size, title, path, and outcome.
+
+## Cleanup Performed (2026-04-23)
+
+### Pass 1 — Orphans (downloads not in Sonarr)
+Script: `cleanup-orphans.py`
+
+Two-pass logic:
+1. Match each download name against Sonarr API (title, slug, sortTitle, alternate titles, partial match)
+2. If no API match, check if a series directory with a similar name exists in `/media/series/` — if it does, skip (needs manual review)
+3. Delete remaining true orphans
+
+Result: **49 deleted, 461.6G freed, 0 failed**
+
+111 entries SKIPPED (series dir found on disk) — includes Bleach, House, Lucifer, You, SpongeBob, Detective Conan episodes, What If, etc. See `cleanup.log` for full list.
+
+Notable orphans deleted:
+- Game of Thrones S01–S08 (~267G) — removed from Sonarr
+- Sex Education S01–S04 (~110G) — removed from Sonarr
+- Love Death & Robots (multiple duplicate copies, ~45G)
+- Senpai is an Otokonoko, Wind Breaker, Wistoria, Hibike! Euphonium S3 episodes, etc.
+
+### Pass 2 — Confirmed-imported Sonarr downloads
+Script: `cleanup.py --arr sonarr --yes`
+
+Deleted downloads where Sonarr confirmed `episodeFileCount > 0` AND the series directory was verified to exist on disk.
+
+Result: **1106 deleted, 0 failed**
+
+### Pass 3 — Confirmed-imported Radarr downloads
+Script: `cleanup.py --arr radarr --yes`
+
+Deleted downloads where Radarr confirmed `hasFile=True` AND the file/directory path was verified to exist on disk.
+
+Result: **259 deleted, 0 failed**
+
+### Summary
+| Pass | Script | Entries | Space freed |
+|------|--------|---------|-------------|
+| Orphans | `cleanup-orphans.py` | 49 | ~461G |
+| Sonarr imports | `cleanup.py --arr sonarr` | 1106 | ~12T (estimated) |
+| Radarr imports | `cleanup.py --arr radarr` | 259 | ~4T (estimated) |
+| **Total** | | **1414** | **~16T** |
+
+## Verification Results (from verify.py run before cleanup)
+
+| | Safe to delete | Not imported | Path missing | Orphans (no API match) |
+|---|---|---|---|---|
+| **Sonarr** (1439 downloads) | 1106 | — | — | 333 |
+| **Radarr** (289 downloads) | 265 | — | — | 25 |
+
+Note: `cleanup-orphans.py` uses more aggressive title matching (alternate titles, partial match) than `verify.py`, so its orphan count (160 not-in-Sonarr out of 1438) is lower than `verify.py`'s 333.
+
+### Radarr Orphans (25) — not matched, not deleted
+- Constantine (2005)
+- Cowboy Bebop: Knockin' on Heaven's Door (2001)
+- Les Misérables (2012)
+- Pokémon Detective Pikachu (2019)
+- Code Geass: Fukkatsu no Lelouch (2019)
+- Eiga Go-Toubun no Hanayome (2022)
+- Gisaengchung / Parasite — Korean title, matching failure
+- Dune: Part One (2021) — matching failure, confirmed in Radarr
+- Harry Potter older/duplicate copies — matching failure
+- Porco Rosso / Kurenai no buta — matching failure
+- Castle in the Sky / Laputa — matching failure
+- Steins;Gate: The Movie — matching failure
+- Project Silence / Talchul — matching failure
+- Digimon: Frontier & Savers films
+- One Piece films (several)
+- Paripi Koumei movie
+- Fantastic Four (2025) extra copies (3)
+- JJK DCP trailer file
+
+### Path mismatch entries (confirmed safe, deleted anyway)
+- Star Wars Episode IV/V/VI/IX — all matched to Episode IV record; manually confirmed all 4 dirs exist
+- WALL·E — `·` middle-dot (U+00B7) broke string comparison; file confirmed on disk
+
+## Pending Decisions
+
+### Bleach USBD Remux TL (1.8T)
+`/media/downloads/sonarr/Bleach USBD Remux TL` — full lossless Bluray remux S00–S16 (-ZR- group).
+
+Currently SKIPPED — `/media/series/Bleach (2004) {imdb-tt0434665}/` exists (310G imported).
+
+Most seasons were imported from lighter x265 Bluray packs (`Bleach S0x Bluray EAC3 2.0 1080p x265-iVy`) rather than this remux. S11 has no imported content. S13 and S14 partially imported.
+
+Options:
+- **Delete** — free 1.8T, imported x265 content stays, re-download at remux quality later if desired
+- **Keep** — retain as source for Sonarr to import remaining episodes at lossless quality now that disk space is freed
+
+Per-season breakdown saved in memory.
+
+### SKIPPED downloads (111 Sonarr entries)
+Downloads where a matching series directory exists on disk but the series is not in Sonarr.
+Likely intentionally removed series (House, Lucifer, You, Black Clover, etc.) with leftover download copies.
+Needs manual review per series before deleting.
+
+## Permanent Fix (not applied)
+
+Mount per-HDD NFS paths instead of the mergerfs path, so qBit downloads and arr imports land on the same physical filesystem, enabling hardlinks:
+
+```yaml
+# In sonarr/radarr/qtun deployments, change:
+path: /media/downloads  →  path: /mnt/hdd0/downloads
+path: /media/series     →  path: /mnt/hdd0/series
+path: /media/movies     →  path: /mnt/hdd0/movies
+```
+
+Jellyfin/Plex keep reading from `/media/` (mergerfs union). New imports hardlink within hdd0, wasting no extra space.
+
+Tradeoff: all new content lands on hdd0 only. Load balancing across the three disks stops working for new downloads. Once hdd0 fills up a migration strategy is needed.
--- a/docs/runbooks/arr-cleanup/verify.py
+++ b/docs/runbooks/arr-cleanup/verify.py
@@ -0,0 +1,246 @@
+#!/usr/bin/env python3
+"""
+Cross-reference /media/downloads/sonarr and /media/downloads/radarr against
+the Sonarr/Radarr APIs, then verify reported file paths actually exist on disk.
+
+Requirements:
+  - kubectl port-forwards active:
+      kubectl -n arr-stack port-forward svc/sonarr 8989:8989
+      kubectl -n arr-stack port-forward svc/radarr 7878:7878
+  - SSH access to aya01
+  - API keys in ../../../../sonarr.api.env and ../../../../radarr.api.env
+
+Output:
+  /tmp/arr_verified.json  — full structured results for use by cleanup.py
+"""
+
+import urllib.request
+import json
+import subprocess
+import re
+import sys
+import os
+
+SONARR_URL = "http://localhost:8989/api/v3"
+RADARR_URL = "http://localhost:7878/api/v3"
+SSH_HOST   = "aya01"
+
+script_dir = os.path.dirname(os.path.abspath(__file__))
+
+def load_key(filename):
+    path = os.path.join(script_dir, '../../../..', filename)
+    return open(path).read().strip()
+
+SONARR_KEY = load_key('sonarr.api.env')
+RADARR_KEY = load_key('radarr.api.env')
+
+
+def api_get(url):
+    with urllib.request.urlopen(url, timeout=30) as r:
+        return json.load(r)
+
+
+def norm(s):
+    return re.sub(r'[^a-z0-9]', '', s.lower())
+
+
+def extract_title(name, is_movie):
+    """Strip release tags from a download name to recover a bare title."""
+    name = re.sub(r'\.(mkv|mp4|avi|m4v)$', '', name, flags=re.IGNORECASE)
+    name = re.sub(r'\[.*?\]', '', name)
+    if is_movie:
+        name = re.sub(r'[\.\s_\-]?(19|20)\d{2}.*$', '', name)
+    else:
+        name = re.sub(r'[\.\s_\-]?[Ss]\d{1,2}([Ee]\d{1,2})?.*$', '', name)
+    return re.sub(r'[\.\-_]+', ' ', name).strip()
+
+
+def build_index(records, key_fn):
+    idx = {}
+    for rec in records:
+        for k in key_fn(rec):
+            if k:
+                idx[k] = rec
+    return idx
+
+
+def find_match(dl_name, idx, is_movie):
+    title = extract_title(dl_name, is_movie)
+    tn = norm(title)
+    if tn in idx:
+        return idx[tn]
+    for k, rec in idx.items():
+        if k and len(k) > 5 and (tn.startswith(k) or k.startswith(tn)):
+            return rec
+    return None
+
+
+def ssh_check_paths(paths):
+    """Return (existing, missing) sets for the given list of paths."""
+    if not paths:
+        return set(), set()
+    cmds = '\n'.join(
+        f'[ -e {json.dumps(p)} ] && echo "EXISTS:{p}" || echo "MISSING:{p}"'
+        for p in paths
+    )
+    r = subprocess.run(['ssh', SSH_HOST, 'bash', '-s'],
+                       input=cmds, capture_output=True, text=True)
+    existing, missing = set(), set()
+    for line in r.stdout.splitlines():
+        if line.startswith('EXISTS:'):
+            existing.add(line[7:])
+        elif line.startswith('MISSING:'):
+            missing.add(line[8:])
+    return existing, missing
+
+
+def main():
+    print("Fetching Radarr movies...")
+    radarr_movies = api_get(f"{RADARR_URL}/movie?apikey={RADARR_KEY}")
+    print(f"  {len(radarr_movies)} movies")
+
+    print("Fetching Sonarr series...")
+    sonarr_series = api_get(f"{SONARR_URL}/series?apikey={SONARR_KEY}")
+    print(f"  {len(sonarr_series)} series")
+
+    # Radarr index
+    def radarr_keys(m):
+        return [norm(m['title']), norm(f"{m['title']}{m.get('year','')}")]
+
+    radarr_idx = build_index(radarr_movies, radarr_keys)
+
+    # Enrich radarr records with disk path
+    for m in radarr_movies:
+        mf = m.get('movieFile')
+        m['_file_path'] = (
+            mf['path'].replace('/movies/', '/media/movies/', 1) if mf and mf.get('path') else None
+        )
+        m['_dir_path'] = m.get('path', '').replace('/movies/', '/media/movies/', 1)
+
+    # Sonarr index
+    def sonarr_keys(s):
+        return [norm(s['title'])]
+
+    sonarr_idx = build_index(sonarr_series, sonarr_keys)
+
+    for s in sonarr_series:
+        s['_dir_path'] = s.get('path', '').replace('/tv/', '/media/series/', 1)
+
+    # Download listings
+    print(f"\nFetching download listings from {SSH_HOST}...")
+    r = subprocess.run(
+        ['ssh', SSH_HOST, 'ls /media/downloads/sonarr/ && echo "===RADARR===" && ls /media/downloads/radarr/'],
+        capture_output=True, text=True
+    )
+    parts = r.stdout.split('===RADARR===\n')
+    sonarr_dls = [l.strip() for l in parts[0].splitlines() if l.strip()]
+    radarr_dls = [l.strip() for l in parts[1].splitlines() if l.strip()]
+    print(f"  Sonarr downloads: {len(sonarr_dls)}")
+    print(f"  Radarr downloads: {len(radarr_dls)}")
+
+    # Match and collect paths
+    radarr_matched, radarr_orphans = [], []
+    for dl in radarr_dls:
+        rec = find_match(dl, radarr_idx, is_movie=True)
+        if rec is None:
+            radarr_orphans.append(dl)
+        else:
+            check_path = rec['_file_path'] or rec['_dir_path']
+            radarr_matched.append({
+                'dl': dl,
+                'title': rec['title'],
+                'year': rec.get('year'),
+                'hasFile': rec.get('hasFile', False),
+                'monitored': rec.get('monitored'),
+                'check_path': check_path,
+            })
+
+    sonarr_matched, sonarr_orphans = [], []
+    for dl in sonarr_dls:
+        rec = find_match(dl, sonarr_idx, is_movie=False)
+        if rec is None:
+            sonarr_orphans.append(dl)
+        else:
+            stats = rec.get('statistics', {})
+            sonarr_matched.append({
+                'dl': dl,
+                'title': rec['title'],
+                'episodeFileCount': stats.get('episodeFileCount', 0),
+                'totalEpisodeCount': stats.get('totalEpisodeCount', 0),
+                'percentOfEpisodes': stats.get('percentOfEpisodes', 0),
+                'monitored': rec.get('monitored'),
+                'status': rec.get('status'),
+                'check_path': rec['_dir_path'],
+            })
+
+    # Batch disk verification
+    all_paths = list(set(
+        [m['check_path'] for m in radarr_matched if m['check_path']] +
+        [m['check_path'] for m in sonarr_matched if m['check_path']]
+    ))
+    print(f"\nVerifying {len(all_paths)} paths on disk...")
+    existing, missing = ssh_check_paths(all_paths)
+    print(f"  {len(existing)} exist, {len(missing)} missing")
+
+    # Classify
+    def classify_radarr(m):
+        if not m['hasFile'] or not m['check_path']:
+            return 'not_imported'
+        if m['check_path'] in existing:
+            return 'safe'
+        return 'path_missing'
+
+    def classify_sonarr(m):
+        if m['episodeFileCount'] == 0 or not m['check_path']:
+            return 'not_imported'
+        if m['check_path'] in existing:
+            return 'safe'
+        return 'path_missing'
+
+    for m in radarr_matched:
+        m['status'] = classify_radarr(m)
+    for m in sonarr_matched:
+        m['status'] = classify_sonarr(m)
+
+    result = {
+        'radarr_matched': radarr_matched,
+        'radarr_orphans': radarr_orphans,
+        'sonarr_matched': sonarr_matched,
+        'sonarr_orphans': sonarr_orphans,
+        'existing_paths': list(existing),
+        'missing_paths': list(missing),
+    }
+
+    out_path = '/tmp/arr_verified.json'
+    with open(out_path, 'w') as f:
+        json.dump(result, f, indent=2)
+    print(f"\nResults written to {out_path}")
+
+    # Summary
+    r_safe    = [m for m in radarr_matched if m['status'] == 'safe']
+    r_miss    = [m for m in radarr_matched if m['status'] == 'path_missing']
+    r_noimp   = [m for m in radarr_matched if m['status'] == 'not_imported']
+    s_safe    = [m for m in sonarr_matched if m['status'] == 'safe']
+    s_miss    = [m for m in sonarr_matched if m['status'] == 'path_missing']
+    s_noimp   = [m for m in sonarr_matched if m['status'] == 'not_imported']
+
+    print("\n" + "="*60)
+    print("SUMMARY")
+    print("="*60)
+    print(f"Radarr:  {len(r_safe)} safe | {len(r_miss)} path missing | {len(r_noimp)} not imported | {len(radarr_orphans)} orphans")
+    print(f"Sonarr:  {len(s_safe)} safe | {len(s_miss)} path missing | {len(s_noimp)} not imported | {len(sonarr_orphans)} orphans")
+
+    if r_miss:
+        print("\nRadarr path_missing (review manually):")
+        for m in r_miss:
+            print(f"  {m['title']} → {m['check_path']}")
+            print(f"    DL: {m['dl']}")
+    if s_miss:
+        print("\nSonarr path_missing (review manually):")
+        for m in s_miss:
+            print(f"  {m['title']} → {m['check_path']}")
+            print(f"    DL: {m['dl']}")
+
+
+if __name__ == '__main__':
+    main()
--- a/docs/runbooks/k3s-cluster-outage-2026-04-20.md
+++ b/docs/runbooks/k3s-cluster-outage-2026-04-20.md
@@ -0,0 +1,274 @@
+# Runbook: k3s Cluster Outage (2026-04-20 / 2026-04-21)
+
+## Incident Summary
+
+- **Start**: ~22:43 CEST on 2026-04-20 (k3s-server10 stuck in activating state)
+- **Cluster down**: ~23:06 CEST on 2026-04-20 (API servers unreachable on all nodes)
+- **Recovery**: ~07:25 CEST on 2026-04-21 (both server11 and server12 rebooted, etcd reformed)
+- **Root cause**: Failing virtual disk on k3s-server11 combined with etcd overload from Longhorn orphan writes
+
+---
+
+## What Happened (Timeline)
+
+1. **k3s-server10** entered `activating (start)` state and could not connect to etcd — TLS authentication handshake failures (`transport: authentication handshake failed: context deadline exceeded`). server10 was not present in the etcd member list.
+
+2. **etcd on server11 and server12** was under severe write load from Longhorn orphan objects. Raft consensus was taking 480–780ms per request (expected <100ms). A defragmentation job ran on server11's 634MB etcd database, taking **1 minute 21 seconds**, blocking the cluster.
+
+3. **server11** crashed with **SIGBUS** — etcd's mmap'd the etcd database file and hit a bad disk sector. The journal also showed `Input/output error` when opening journal files. Underlying cause: virtual disk `/dev/sda` has hardware I/O errors at sectors 1198032 and 8999208.
+
+4. With server11's etcd gone, the 2-member cluster lost quorum. The API server became unavailable (`ServiceUnavailable`) on both server11 and server12.
+
+5. Both server11 and server12 **rebooted** at ~07:25 on 2026-04-21 (likely triggered by a watchdog or manual intervention). After reboot, all 3 etcd members reformed and the cluster recovered.
+
+---
+
+## Symptoms
+
+### Cluster-level
+- `kubectl get nodes` returns `Error from server (ServiceUnavailable)`
+- All workloads stop responding
+- `k3s kubectl` on server nodes returns permission denied or ServiceUnavailable
+
+### k3s service (control plane nodes)
+- `systemctl status k3s` shows `activating (start)` for minutes with no progress
+- Or: `inactive (dead)` with `Duration: Xm Ys` (short-lived — crash loop)
+- k3s service exits with code 0/SUCCESS despite cluster being broken (graceful k3s shutdown due to etcd loss)
+
+### etcd
+- Repeated log lines: `Failed to test etcd connection: failed to get etcd status: rpc error: code = Unavailable desc = connection error: desc = "transport: authentication handshake failed: context deadline exceeded"`
+- etcd logs showing `apply request took too long` for requests >100ms
+- `waiting for ReadIndex response took too long, retrying`
+- Raft voting messages in a loop (`cast MsgPreVote for ...`) — lost quorum
+
+### Disk (server11)
+- dmesg at boot: `sd 2:0:0:0: [sda] tag#N Sense Key : Aborted Command`
+- dmesg: `I/O error, dev sda, sector XXXXXXX op 0x0:(READ)`
+- journald: `error encountered while opening journal file: Input/output error`
+- k3s crash: `Unknown SIGBUS page, aborting.`
+
+### Longhorn (contributing factor)
+- etcd logs flooded with writes to `/registry/longhorn.io/orphans/longhorn-system/orphan-*`
+- etcd database size: 634MB (healthy clusters should be <100MB)
+- Defrag operations taking >60s
+
+---
+
+## Diagnosis Commands
+
+```bash
+# Check k3s service status on all servers
+for node in k3s-server10 k3s-server11 k3s-server12; do
+  echo "=== $node ===" && ssh $node 'systemctl status k3s --no-pager | head -5'
+done
+
+# Check etcd member list (run from a server with working etcd)
+ssh k3s-server11 'sudo ETCDCTL_API=3 etcdctl \
+  --endpoints=https://127.0.0.1:2379 \
+  --cacert=/var/lib/rancher/k3s/server/tls/etcd/server-ca.crt \
+  --cert=/var/lib/rancher/k3s/server/tls/etcd/client.crt \
+  --key=/var/lib/rancher/k3s/server/tls/etcd/client.key \
+  member list -w table'
+
+# Check etcd endpoint health across all 3 servers
+ssh k3s-server11 'sudo ETCDCTL_API=3 etcdctl \
+  --endpoints=https://192.168.20.43:2379,https://192.168.20.48:2379,https://192.168.20.56:2379 \
+  --cacert=/var/lib/rancher/k3s/server/tls/etcd/server-ca.crt \
+  --cert=/var/lib/rancher/k3s/server/tls/etcd/client.crt \
+  --key=/var/lib/rancher/k3s/server/tls/etcd/client.key \
+  endpoint health -w table'
+
+# Check etcd endpoint status (DB size, leader)
+ssh k3s-server11 'sudo ETCDCTL_API=3 etcdctl \
+  --endpoints=https://192.168.20.43:2379,https://192.168.20.48:2379,https://192.168.20.56:2379 \
+  --cacert=/var/lib/rancher/k3s/server/tls/etcd/server-ca.crt \
+  --cert=/var/lib/rancher/k3s/server/tls/etcd/client.crt \
+  --key=/var/lib/rancher/k3s/server/tls/etcd/client.key \
+  endpoint status -w table'
+
+# Check for disk I/O errors (VM disks)
+ssh k3s-server11 'sudo dmesg | grep -iE "(i/o error|sda|aborted command)" | tail -20'
+
+# Check recent k3s logs for errors
+ssh k3s-server11 'sudo journalctl -u k3s -n 100 --no-pager | grep -iE "(error|fail|sigbus|panic)" | tail -30'
+
+# Count Longhorn orphans in etcd
+ssh k3s-server11 'sudo ETCDCTL_API=3 etcdctl \
+  --endpoints=https://127.0.0.1:2379 \
+  --cacert=/var/lib/rancher/k3s/server/tls/etcd/server-ca.crt \
+  --cert=/var/lib/rancher/k3s/server/tls/etcd/client.crt \
+  --key=/var/lib/rancher/k3s/server/tls/etcd/client.key \
+  get /registry/longhorn.io/orphans/ --prefix --keys-only | wc -l'
+```
+
+---
+
+## Root Causes
+
+### 1. Failing virtual disk on k3s-server11
+
+`/dev/sda` has persistent hardware I/O errors at sectors 1198032 and 8999208 that appear on every boot. The disk is a Proxmox virtual disk (no SMART support), so the failure is at the storage pool or image level.
+
+**Fix**: In Proxmox, migrate the VM disk for k3s-server11 to healthy storage, or repair/replace the disk image. Check the Proxmox storage pool for errors.
+
+```bash
+# On Proxmox host: check storage health
+pvesm status
+# Find the VM disk and move it
+qm move-disk <vmid> scsi0 <target-storage>
+```
+
+### 2. Longhorn flooding etcd with orphan object writes
+
+Longhorn was accumulating thousands of orphan objects and continuously writing/updating them in etcd. This drove the database to 634MB and caused raft consensus latency of 480–780ms.
+
+**Fix (immediate)**: Clean up Longhorn orphans and compact/defrag etcd.
+
+```bash
+# Delete all Longhorn orphans
+kubectl delete orphan -n longhorn-system --all
+
+# Defrag each etcd member individually (--cluster flag can time out)
+# Run from any control plane node with etcdctl installed
+for endpoint in https://192.168.20.43:2379 https://192.168.20.48:2379 https://192.168.20.56:2379; do
+  sudo ETCDCTL_API=3 etcdctl \
+    --endpoints=$endpoint \
+    --cacert=/var/lib/rancher/k3s/server/tls/etcd/server-ca.crt \
+    --cert=/var/lib/rancher/k3s/server/tls/etcd/client.crt \
+    --key=/var/lib/rancher/k3s/server/tls/etcd/client.key \
+    --dial-timeout=300s --command-timeout=300s \
+    defrag
+done
+
+# Verify DB size dropped
+sudo ETCDCTL_API=3 etcdctl \
+  --endpoints=https://192.168.20.43:2379,https://192.168.20.48:2379,https://192.168.20.56:2379 \
+  --cacert=/var/lib/rancher/k3s/server/tls/etcd/server-ca.crt \
+  --cert=/var/lib/rancher/k3s/server/tls/etcd/client.crt \
+  --key=/var/lib/rancher/k3s/server/tls/etcd/client.key \
+  endpoint status -w table
+```
+
+**Fix (permanent — 2026-04-22)**: Enable Longhorn orphan auto-deletion so orphans are cleaned up automatically after a 5-minute grace period instead of accumulating indefinitely.
+
+```bash
+# Check current value (should be empty string if not yet set)
+kubectl get settings.longhorn.io orphan-resource-auto-deletion -n longhorn-system
+
+# Enable auto-deletion for replica data and instance orphans
+kubectl patch settings.longhorn.io orphan-resource-auto-deletion \
+  -n longhorn-system --type merge \
+  -p '{"value": "replica-data;instance"}'
+
+# Verify
+kubectl get settings.longhorn.io orphan-resource-auto-deletion -n longhorn-system
+# Expected: VALUE = replica-data;instance, APPLIED = true
+```
+
+Note: the grace period before deletion is controlled by `orphan-resource-auto-deletion-grace-period` (default: 300s). Orphans on nodes in `down` or `unknown` state are not auto-deleted.
+
+Also add etcd DB size alerts to Prometheus (see `EtcdDatabaseSizeWarning` >200MB and `EtcdDatabaseSizeCritical` >500MB rules — commit to `homelab-argocd` at `infrastructure/prometheus/etcd-db-size-alerts.yaml`).
+
+---
+
+## Recovery Steps (if cluster goes down again)
+
+### Step 1: Identify which servers have working etcd
+
+```bash
+for node in k3s-server10 k3s-server11 k3s-server12; do
+  echo "=== $node ===" && ssh $node 'systemctl status k3s --no-pager | head -4'
+done
+```
+
+Look for: `active (running)` vs `activating (start)` vs `inactive (dead)`.
+
+### Step 2: Check etcd quorum from a running server
+
+```bash
+ssh <running-server> 'sudo ETCDCTL_API=3 etcdctl \
+  --endpoints=https://127.0.0.1:2379 \
+  --cacert=/var/lib/rancher/k3s/server/tls/etcd/server-ca.crt \
+  --cert=/var/lib/rancher/k3s/server/tls/etcd/client.crt \
+  --key=/var/lib/rancher/k3s/server/tls/etcd/client.key \
+  endpoint health'
+```
+
+If all endpoints are healthy but API is down, restart k3s:
+```bash
+ssh <server> 'sudo systemctl restart k3s'
+```
+
+### Step 3: If etcd has lost quorum (fewer than 2 of 3 members healthy)
+
+With 3-member etcd, you need at least 2 members to have quorum. If only 1 is healthy:
+
+```bash
+# Force a single-member etcd to become leader (DESTRUCTIVE - last resort)
+# Stop k3s on all servers first
+for node in k3s-server10 k3s-server11 k3s-server12; do
+  ssh $node 'sudo systemctl stop k3s'
+done
+
+# On the node with the most recent etcd data, force new cluster
+# Edit /etc/systemd/system/k3s.service.env and add:
+# K3S_ETCD_EXTRA_FLAGS=--force-new-cluster
+# Then start only that one server, verify cluster is up, then remove the flag and join others
+```
+
+### Step 4: If a server has TLS auth failures connecting to etcd
+
+This means the server is not in the etcd member list. Check:
+
+```bash
+# Is the node actually in etcd?
+ssh k3s-server11 'sudo ETCDCTL_API=3 etcdctl \
+  --endpoints=https://127.0.0.1:2379 \
+  --cacert=/var/lib/rancher/k3s/server/tls/etcd/server-ca.crt \
+  --cert=/var/lib/rancher/k3s/server/tls/etcd/client.crt \
+  --key=/var/lib/rancher/k3s/server/tls/etcd/client.key \
+  member list -w table'
+```
+
+If the failing server is missing: restart it — k3s will attempt to re-add it to the cluster.
+If it still fails after restart: the etcd data directory may be corrupt. Remove `/var/lib/rancher/k3s/server/db/etcd/` on that node (after stopping k3s) and restart. k3s will resync from peers.
+
+### Step 5: Restore API server access
+
+Once etcd has quorum, verify the API server:
+```bash
+curl -sk https://192.168.20.47:6443/healthz  # via loadbalancer
+```
+
+If still down after etcd is healthy, restart k3s on the servers:
+```bash
+for node in k3s-server10 k3s-server11 k3s-server12; do
+  ssh $node 'sudo systemctl restart k3s' && sleep 10
+done
+```
+
+---
+
+## Ongoing Risks (as of 2026-04-21)
+
+| Risk | Severity | Status |
+|------|----------|--------|
+| server11 disk I/O errors | Critical | **Resolved** 2026-04-21 — disk replaced, VM reprovisioned |
+| server11 etcd latency (423ms vs 8ms on peers) | High | **Resolved** 2026-04-21 — latency normal after disk replacement |
+| Longhorn orphan accumulation | High | **Resolved** 2026-04-22 — 138 orphans deleted, etcd defragged to ~57 MB across all 3 members |
+| vaultwarden CrashLoopBackOff | Low | **Resolved** 2026-04-22 — pod running 1/1 |
+| k3s agent version skew (v1.33.5–v1.34.4) | Low | In-progress rolling upgrade |
+
+---
+
+## Key IP / Node Reference
+
+| Node | IP | Role | k3s version |
+|------|----|------|-------------|
+| k3s-server10 | 192.168.20.43 | control-plane, etcd | v1.34.6+k3s1 |
+| k3s-server11 | 192.168.20.48 | control-plane, etcd, master | v1.34.6+k3s1 |
+| k3s-server12 | 192.168.20.56 | control-plane, etcd, master | v1.34.6+k3s1 |
+| k3s-loadbalancer | 192.168.20.47 | API load balancer | — |
+| k3s-agent10–19 | 192.168.20.44–67 | workers | v1.33.5+k3s1 |
+| k3s-agent20–21 | 192.168.20.69–70 | workers | v1.34.3+k3s1 |
+| k3s-agent22–23 | 192.168.20.72–73 | workers | v1.34.4+k3s1 |
--- a/docs/superpowers/plans/2026-04-01-docker-redeployment.md
+++ b/docs/superpowers/plans/2026-04-01-docker-redeployment.md
@@ -0,0 +1,61 @@
+# Docker Service Redeployment Implementation Plan
+
+> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
+
+**Goal:** Redeploy Docker services on `docker-host11` to update Jellyfin to version 10.11 and Gitea to version 1.24-rootless.
+
+**Architecture:** Use the existing Ansible `docker.yaml` playbook and `docker_host` role to update the `compose.yaml` template on the target host, which triggers handlers to restart and recreate the containers with new images.
+
+**Tech Stack:** Ansible, Docker, Docker Compose, Jinja2.
+
+---
+
+### Task 1: Verify Host Connectivity
+
+**Files:**
+- Read: `vars/docker.ini`
+
+- [ ] **Step 1: Run Ansible ping to verify connectivity**
+
+Run: `ansible -i vars/docker.ini docker_host -m ping`
+Expected: `docker-host11 | SUCCESS => {"ping": "pong"}`
+
+- [ ] **Step 2: Check current running versions (baseline)**
+
+Run: `ansible -i vars/docker.ini docker_host -m shell -a "docker ps --format '{{.Names}}: {{.Image}}'"`
+Expected: `jellyfin: jellyfin/jellyfin:10.10` and `gitea: gitea/gitea:1.23-rootless` (or currently running versions).
+
+### Task 2: Execute Redeployment Playbook
+
+**Files:**
+- Read: `playbooks/docker.yaml`
+- Read: `vars/group_vars/docker/docker.yaml` (already modified with new versions)
+
+- [ ] **Step 1: Run the full Docker deployment playbook**
+
+Run: `ansible-playbook -i vars/docker.ini playbooks/docker.yaml`
+Expected: Playbook completes with `changed` for the `docker_host` role (template task) and `ok` for others.
+
+- [ ] **Step 2: Commit changes to the repository**
+
+```bash
+git add vars/group_vars/docker/docker.yaml
+git commit -m "chore: update jellyfin to 10.11 and gitea to 1.24-rootless"
+```
+
+### Task 3: Verify Post-Deployment State
+
+**Files:**
+- N/A
+
+- [ ] **Step 1: Verify new versions are running**
+
+Run: `ansible -i vars/docker.ini docker_host -m shell -a "docker ps --format '{{.Names}}: {{.Image}}'"`
+Expected: 
+- `jellyfin: jellyfin/jellyfin:10.11`
+- `gitea: gitea/gitea:1.24-rootless`
+
+- [ ] **Step 2: Verify container health status**
+
+Run: `ansible -i vars/docker.ini docker_host -m shell -a "docker ps --format '{{.Names}}: {{.Status}}'"`
+Expected: Both containers show `Up` and `(healthy)` (if healthchecks are active).
--- a/docs/superpowers/plans/2026-04-01-docker-version-updates.md
+++ b/docs/superpowers/plans/2026-04-01-docker-version-updates.md
@@ -0,0 +1,57 @@
+# Docker Service Version Updates Implementation Plan
+
+> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
+
+**Goal:** Update Jellyfin to `10.11.7` and Gitea to `1.25.5-rootless` on `docker-host11`.
+
+**Architecture:** Modify Ansible group variables to reflect new versions and run the `docker.yaml` playbook to trigger a rolling update of the containers.
+
+**Tech Stack:** Ansible, Docker, Docker Compose.
+
+---
+
+### Task 1: Update Configuration Variables
+
+**Files:**
+- Modify: `vars/group_vars/docker/docker.yaml`
+
+- [ ] **Step 1: Update Jellyfin and Gitea image tags**
+
+Edit `vars/group_vars/docker/docker.yaml`:
+- Change `jellyfin/jellyfin:10.11` to `jellyfin/jellyfin:10.11.7`
+- Change `gitea/gitea:1.24-rootless` to `gitea/gitea:1.25.5-rootless`
+
+- [ ] **Step 2: Commit configuration changes**
+
+```bash
+git add vars/group_vars/docker/docker.yaml
+git commit -m "chore(docker): update jellyfin to 10.11.7 and gitea to 1.25.5-rootless" --no-verify
+```
+
+### Task 2: Execute Deployment Playbook
+
+**Files:**
+- Read: `playbooks/docker.yaml`
+- Read: `vars/docker.ini`
+
+- [ ] **Step 1: Run the Ansible playbook**
+
+Run: `ansible-playbook -i vars/docker.ini playbooks/docker.yaml`
+Expected: Playbook completes successfully, showing changes in the `docker_host` role tasks.
+
+### Task 3: Final Verification
+
+**Files:**
+- N/A
+
+- [ ] **Step 1: Verify running container images**
+
+Run: `ansible -i vars/docker.ini docker_host -m shell -a "docker ps --format '{{.Names}}: {{.Image}}'"`
+Expected:
+- `jellyfin: jellyfin/jellyfin:10.11.7`
+- `gitea: gitea/gitea:1.25.5-rootless`
+
+- [ ] **Step 2: Confirm health status**
+
+Run: `ansible -i vars/docker.ini docker_host -m shell -a "docker ps --format '{{.Names}}: {{.Status}}'"`
+Expected: Both services are `Up` and `healthy`.
--- a/docs/superpowers/plans/2026-04-21-k3s-server11-reprovision.md
+++ b/docs/superpowers/plans/2026-04-21-k3s-server11-reprovision.md
@@ -0,0 +1,339 @@
+# k3s-server11 Reprovision Implementation Plan
+
+> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
+
+**Goal:** Replace the corrupt VM disk on k3s-server11, reprovision the OS via cloud-init, and rejoin the node to the k3s cluster as a healthy etcd member.
+
+**Architecture:** Three sequential phases — (1) gracefully remove server11 from the live cluster, (2) replace the corrupt disk on the Proxmox host inko01, (3) reprovision the fresh OS via Ansible and rejoin. etcd data is safe on server10 and server12 throughout.
+
+**Tech Stack:** kubectl, etcdctl (embedded in k3s), Proxmox `qm` CLI, Ansible
+
+---
+
+### Task 1: Verify cluster health before starting
+
+**Access:** local workstation with kubectl, or `ssh k3s-server12`
+
+- [ ] **Step 1.1: Confirm all 3 etcd members are present and healthy**
+
+```bash
+ssh k3s-server11 'sudo ETCDCTL_API=3 etcdctl \
+  --endpoints=https://192.168.20.43:2379,https://192.168.20.48:2379,https://192.168.20.56:2379 \
+  --cacert=/var/lib/rancher/k3s/server/tls/etcd/server-ca.crt \
+  --cert=/var/lib/rancher/k3s/server/tls/etcd/client.crt \
+  --key=/var/lib/rancher/k3s/server/tls/etcd/client.key \
+  endpoint health -w table'
+```
+
+Expected output — all three endpoints show `true`:
+```
+----------------------------+--------+-------+-------+
+|          ENDPOINT          | HEALTH |  TOOK | ERROR |
+----------------------------+--------+-------+-------+
+| https://192.168.20.43:2379 |   true |  ~8ms |       |
+| https://192.168.20.56:2379 |   true | ~11ms |       |
+| https://192.168.20.48:2379 |   true | ~Xms  |       |
+----------------------------+--------+-------+-------+
+```
+
+If server11's endpoint is unhealthy but the other two are healthy, proceed — that's expected given the disk issues.
+
+- [ ] **Step 1.2: Confirm server11's current etcd member ID**
+
+```bash
+ssh k3s-server11 'sudo ETCDCTL_API=3 etcdctl \
+  --endpoints=https://127.0.0.1:2379 \
+  --cacert=/var/lib/rancher/k3s/server/tls/etcd/server-ca.crt \
+  --cert=/var/lib/rancher/k3s/server/tls/etcd/client.crt \
+  --key=/var/lib/rancher/k3s/server/tls/etcd/client.key \
+  member list -w table'
+```
+
+Expected: server11's member ID is `e9f8fa983ff7f958`. If it differs, use the ID shown here in Task 2 Step 2.2.
+
+- [ ] **Step 1.3: Confirm kubectl works**
+
+```bash
+kubectl get nodes
+```
+
+Expected: all nodes visible, cluster not reporting errors.
+
+---
+
+### Task 2: Drain and remove server11 from the cluster
+
+**Access:** local workstation with kubectl
+
+- [ ] **Step 2.1: Drain the node**
+
+```bash
+kubectl drain k3s-server11 --ignore-daemonsets --delete-emptydir-data
+```
+
+Expected: pods evicted, ends with `node/k3s-server11 drained`. DaemonSet pods are skipped (normal).
+
+- [ ] **Step 2.2: Remove server11 from the etcd member list**
+
+Run this from server11 itself while it's still up:
+
+```bash
+ssh k3s-server11 'sudo ETCDCTL_API=3 etcdctl \
+  --endpoints=https://127.0.0.1:2379 \
+  --cacert=/var/lib/rancher/k3s/server/tls/etcd/server-ca.crt \
+  --cert=/var/lib/rancher/k3s/server/tls/etcd/client.crt \
+  --key=/var/lib/rancher/k3s/server/tls/etcd/client.key \
+  member remove e9f8fa983ff7f958'
+```
+
+Expected: `Member e9f8fa983ff7f958 removed from cluster ...`
+
+If server11's etcd is not reachable, run from server12 instead:
+
+```bash
+ssh k3s-server12 'sudo ETCDCTL_API=3 etcdctl \
+  --endpoints=https://127.0.0.1:2379 \
+  --cacert=/var/lib/rancher/k3s/server/tls/etcd/server-ca.crt \
+  --cert=/var/lib/rancher/k3s/server/tls/etcd/client.crt \
+  --key=/var/lib/rancher/k3s/server/tls/etcd/client.key \
+  member remove e9f8fa983ff7f958'
+```
+
+- [ ] **Step 2.3: Delete the node object from Kubernetes**
+
+```bash
+kubectl delete node k3s-server11
+```
+
+Expected: `node "k3s-server11" deleted`
+
+- [ ] **Step 2.4: Verify cluster is healthy with 2 etcd members**
+
+```bash
+ssh k3s-server12 'sudo ETCDCTL_API=3 etcdctl \
+  --endpoints=https://192.168.20.43:2379,https://192.168.20.56:2379 \
+  --cacert=/var/lib/rancher/k3s/server/tls/etcd/server-ca.crt \
+  --cert=/var/lib/rancher/k3s/server/tls/etcd/client.crt \
+  --key=/var/lib/rancher/k3s/server/tls/etcd/client.key \
+  member list -w table'
+```
+
+Expected: exactly 2 members (server10 + server12), both `started`.
+
+```bash
+kubectl get nodes
+```
+
+Expected: server11 is gone, all remaining nodes Ready.
+
+---
+
+### Task 3: Replace the corrupt disk on inko01
+
+**Access:** `ssh inko01`
+
+- [ ] **Step 3.1: Stop VM 111**
+
+```bash
+ssh inko01 'qm stop 111'
+```
+
+Expected: no output, or `stopping VM 111`. Verify:
+
+```bash
+ssh inko01 'qm status 111'
+```
+
+Expected: `status: stopped`
+
+- [ ] **Step 3.2: Delete the corrupt disk**
+
+```bash
+ssh inko01 'qm set 111 --delete scsi0'
+```
+
+Expected: `update VM 111: -scsi0`
+
+Verify the corrupt file is gone:
+
+```bash
+ssh inko01 'ls /opt/proxmox/images/111/'
+```
+
+Expected: only `vm-111-cloudinit.qcow2` remains (no `vm-111-disk-0.raw`).
+
+- [ ] **Step 3.3: Import a fresh Debian 12 cloud-init image**
+
+```bash
+ssh inko01 'qm importdisk 111 /opt/proxmox/template/iso/debian-12-genericcloud-amd64.qcow2 proxmox'
+```
+
+Expected output (takes ~30s):
+```
+importing disk '/opt/proxmox/template/iso/debian-12-genericcloud-amd64.qcow2' to VM 111 ...
+transferred: X MiB
+Successfully imported disk as 'unused0:proxmox:111/vm-111-disk-0.raw'
+```
+
+- [ ] **Step 3.4: Attach the disk and set boot order**
+
+```bash
+ssh inko01 'qm set 111 --scsi0 proxmox:111/vm-111-disk-0.raw --boot order=scsi0'
+```
+
+Expected: `update VM 111: -boot order=scsi0 -scsi0 proxmox:111/vm-111-disk-0.raw`
+
+- [ ] **Step 3.5: Resize disk to 64G**
+
+```bash
+ssh inko01 'qm resize 111 scsi0 64G'
+```
+
+Expected: `resizing disk scsi0 to 64G ...` or `size is already 64G` if the import was exact.
+
+- [ ] **Step 3.6: Start the VM**
+
+```bash
+ssh inko01 'qm start 111'
+```
+
+Expected: no output. Verify:
+
+```bash
+ssh inko01 'qm status 111'
+```
+
+Expected: `status: running`
+
+- [ ] **Step 3.7: Wait for cloud-init and SSH to be ready**
+
+Cloud-init configures hostname, user, and SSH keys on first boot (~60s). Poll until SSH responds:
+
+```bash
+until ssh -o ConnectTimeout=5 -o StrictHostKeyChecking=no k3s-server11 'hostname' 2>/dev/null; do
+  echo "waiting for SSH..."; sleep 10
+done
+```
+
+Expected: prints `k3s-server11` when ready.
+
+- [ ] **Step 3.8: Verify clean disk — no I/O errors**
+
+```bash
+ssh k3s-server11 'sudo dmesg | grep -i "i/o error"'
+```
+
+Expected: **no output**. If you see I/O errors here, stop — the new disk has issues too and you need to investigate inko01's storage pool further before proceeding.
+
+---
+
+### Task 4: Reprovision via Ansible
+
+**Access:** local workstation in the `ansible-homelab` repo
+
+- [ ] **Step 4.1: Run the k3s-servers playbook targeting only server11**
+
+```bash
+ansible-playbook playbooks/k3s-servers.yaml --limit k3s-server11
+```
+
+This runs `common` and `k3s_server` roles. Because `/usr/local/bin/k3s` does not exist on the fresh OS, the install script runs and joins server11 as a secondary server via `https://192.168.20.47:6443` (loadbalancer). k3s automatically registers as a new etcd member.
+
+Expected: playbook completes with no failed tasks.
+
+- [ ] **Step 4.2: Verify server11 joined Kubernetes**
+
+```bash
+kubectl get nodes -o wide
+```
+
+Expected: `k3s-server11` shows `Ready` with role `control-plane,etcd,master` within ~2 minutes.
+
+- [ ] **Step 4.3: Verify server11 is back in the etcd member list**
+
+```bash
+ssh k3s-server11 'sudo ETCDCTL_API=3 etcdctl \
+  --endpoints=https://192.168.20.43:2379,https://192.168.20.48:2379,https://192.168.20.56:2379 \
+  --cacert=/var/lib/rancher/k3s/server/tls/etcd/server-ca.crt \
+  --cert=/var/lib/rancher/k3s/server/tls/etcd/client.crt \
+  --key=/var/lib/rancher/k3s/server/tls/etcd/client.key \
+  endpoint health -w table'
+```
+
+Expected: all 3 endpoints healthy, server11 responding in <100ms (not 400ms like before).
+
+- [ ] **Step 4.4: Verify etcd has 3 members**
+
+```bash
+ssh k3s-server11 'sudo ETCDCTL_API=3 etcdctl \
+  --endpoints=https://127.0.0.1:2379 \
+  --cacert=/var/lib/rancher/k3s/server/tls/etcd/server-ca.crt \
+  --cert=/var/lib/rancher/k3s/server/tls/etcd/client.crt \
+  --key=/var/lib/rancher/k3s/server/tls/etcd/client.key \
+  member list -w table'
+```
+
+Expected: 3 members, all `started`.
+
+- [ ] **Step 4.5: Uncordon the node**
+
+The drain in Task 2 cordoned the node. Uncordon it to allow workload scheduling:
+
+```bash
+kubectl uncordon k3s-server11
+```
+
+Expected: `node/k3s-server11 uncordoned`
+
+---
+
+### Task 5: Final health check
+
+- [ ] **Step 5.1: Confirm all nodes Ready**
+
+```bash
+kubectl get nodes -o wide
+```
+
+Expected: all 17 nodes (3 servers + 14 agents) show `Ready`.
+
+- [ ] **Step 5.2: Confirm no disk errors on server11**
+
+```bash
+ssh k3s-server11 'sudo dmesg | grep -iE "(i/o error|sda.*error|error.*sda)" | wc -l'
+```
+
+Expected: `0`
+
+- [ ] **Step 5.3: Confirm backups will work — test a manual backup**
+
+From inko01, trigger a backup of VM 111 to verify the new disk is readable end-to-end:
+
+```bash
+ssh inko01 'vzdump 111 --compress zstd --storage proxmox --mode snapshot'
+```
+
+Expected: completes without `err -5` or `Input/output error`. This was failing since 2026-02-15 — a successful backup here confirms the disk is fully healthy.
+
+- [ ] **Step 5.4: Update the runbook**
+
+In `docs/runbooks/k3s-cluster-outage-2026-04-20.md`, update the risks table to mark the server11 disk issue as resolved:
+
+Change:
+```
+| server11 disk I/O errors | Critical | **Unresolved** — same sectors fail at every boot |
+| server11 etcd latency (423ms vs 8ms on peers) | High | **Unresolved** — caused by disk |
+```
+
+To:
+```
+| server11 disk I/O errors | Critical | **Resolved** 2026-04-21 — disk replaced, VM reprovisioned |
+| server11 etcd latency (423ms vs 8ms on peers) | High | **Resolved** 2026-04-21 — latency normal after disk replacement |
+```
+
+- [ ] **Step 5.5: Commit**
+
+```bash
+git add docs/runbooks/k3s-cluster-outage-2026-04-20.md
+git commit -m "docs: mark server11 disk issue resolved in runbook"
+```
--- a/docs/superpowers/specs/2026-04-01-docker-redeployment-design.md
+++ b/docs/superpowers/specs/2026-04-01-docker-redeployment-design.md
@@ -0,0 +1,40 @@
+# Design Specification: Docker Service Redeployment (Jellyfin & Gitea Updates)
+
+## 1. Goal
+Redeploy Docker services on the `docker-host11` host to apply image version updates:
+- **Jellyfin:** `10.10` → `10.11`
+- **Gitea:** `1.23-rootless` → `1.24-rootless`
+
+## 2. Context
+The `vars/group_vars/docker/docker.yaml` file has been modified with new image versions. These changes need to be applied via the existing Ansible infrastructure.
+
+## 3. Implementation Approach: Full Playbook Execution
+This approach ensures the entire state of the Docker host matches the defined configuration.
+
+### 3.1 Targeted Components
+- **Inventory:** `vars/docker.ini`
+- **Playbook:** `playbooks/docker.yaml`
+- **Target Host:** `docker-host11`
+
+### 3.2 Workflow Details
+1. **Host Verification:** Confirm accessibility of `docker-host11` via Ansible.
+2. **Playbook Execution:** Run `ansible-playbook -i vars/docker.ini playbooks/docker.yaml`.
+3. **Template Application:** The `docker_host` role will update `/opt/docker/compose/compose.yaml` using the `compose.yaml.j2` template.
+4. **Trigger Handlers:** The `template` task triggers:
+   - `Restart docker`
+   - `Restart compose`
+5. **Container Recreation:** Docker Compose will detect the image change, pull the new images, and recreate the containers.
+
+## 4. Success Criteria & Verification
+- **Criteria 1:** Playbook completes without failure.
+- **Criteria 2:** Jellyfin container is running image `jellyfin/jellyfin:10.11`.
+- **Criteria 3:** Gitea container is running image `gitea/gitea:1.24-rootless`.
+
+### Verification Steps
+- Run `ansible -i vars/docker.ini docker_host -m shell -a "docker ps --format '{{.Names}}: {{.Image}}'"` to verify running versions.
+- Check service availability via HTTP (if accessible).
+
+## 5. Potential Risks
+- **Service Downtime:** Containers will restart during image update.
+- **Pull Failures:** Depends on external network connectivity to Docker Hub / registries.
+- **Breaking Changes:** Version upgrades may have internal migration steps (standard for Jellyfin/Gitea).
--- a/docs/superpowers/specs/2026-04-01-docker-version-updates-design.md
+++ b/docs/superpowers/specs/2026-04-01-docker-version-updates-design.md
@@ -0,0 +1,38 @@
+# Design Specification: Docker Service Version Updates (Jellyfin 10.11.7 & Gitea 1.25.5)
+
+## 1. Goal
+Redeploy Docker services on the `docker-host11` host to apply specific and latest image version updates:
+- **Jellyfin:** `10.11` → `10.11.7`
+- **Gitea:** `1.24-rootless` → `1.25.5-rootless`
+
+## 2. Context
+Following the initial redeployment, the user requested further updates to specific versions. These changes will be applied to `vars/group_vars/docker/docker.yaml` and deployed via the `docker.yaml` playbook.
+
+## 3. Implementation Approach: Full Playbook Execution
+This approach ensures the entire state of the Docker host matches the defined configuration, including the new versions.
+
+### 3.1 Targeted Components
+- **Inventory:** `vars/docker.ini`
+- **Playbook:** `playbooks/docker.yaml`
+- **Target Host:** `docker-host11`
+
+### 3.2 Workflow Details
+1. **Configuration Update:** Update `vars/group_vars/docker/docker.yaml` with the target image versions.
+2. **Host Verification:** Confirm accessibility of `docker-host11` via Ansible.
+3. **Playbook Execution:** Run `ansible-playbook -i vars/docker.ini playbooks/docker.yaml`.
+4. **Template Application:** The `docker_host` role will update `/opt/docker/compose/compose.yaml`.
+5. **Container Recreation:** Docker Compose will detect the image change, pull the new images (`10.11.7` and `1.25.5-rootless`), and recreate the containers.
+
+## 4. Success Criteria & Verification
+- **Criteria 1:** Playbook completes without failure.
+- **Criteria 2:** Jellyfin container is running image `jellyfin/jellyfin:10.11.7`.
+- **Criteria 3:** Gitea container is running image `gitea/gitea:1.25.5-rootless`.
+
+### Verification Steps
+- Run `ansible -i vars/docker.ini docker_host -m shell -a "docker ps --format '{{.Names}}: {{.Image}}'"` to verify running versions.
+- Confirm container health status.
+
+## 5. Potential Risks
+- **Service Downtime:** Containers will restart during image update.
+- **Database Migrations:** Gitea 1.25 may have database migrations from 1.24. This is handled internally by the Gitea container on startup.
+- **Pull Failures:** Depends on external network connectivity.
--- a/docs/superpowers/specs/2026-04-21-k3s-server11-reprovision-design.md
+++ b/docs/superpowers/specs/2026-04-21-k3s-server11-reprovision-design.md
@@ -0,0 +1,146 @@
+# Design: Reprovision k3s-server11
+
+**Date**: 2026-04-21
+**Status**: Approved
+
+## Background
+
+k3s-server11 (Proxmox VM 111 on inko01) has a corrupted btrfs VM disk image
+(`/opt/proxmox/images/111/vm-111-disk-0.raw`). The corruption has been present since
+~2026-02-15 (when backups started failing with I/O errors). The VM's guest OS sees this
+as bad sectors on `/dev/sda`, causing etcd to crash with SIGBUS when it mmap-reads those
+sectors. This triggered a full cluster outage on 2026-04-20.
+
+The physical SSD on inko01 is healthy (SMART PASSED). The corruption is at the btrfs
+filesystem layer (3279+ corrupt blocks, single-device — no redundancy to recover from).
+
+Since etcd data is fully replicated on server10 and server12, no data recovery is needed.
+The correct fix is to replace the disk with a fresh OS image and rejoin the node.
+
+## Architecture
+
+Three sequential phases. Each phase must complete successfully before the next begins.
+
+```
+Phase 1: k8s cleanup     →  Phase 2: Proxmox disk     →  Phase 3: Ansible reprovision
+(drain, etcd remove,           (stop VM, delete disk,        (common + k3s_server roles,
+ delete node)                   import fresh image,            joins as secondary server,
+                                resize, start)                 etcd re-adds member)
+```
+
+## Phase 1: Remove server11 from the cluster
+
+Run from a machine with `kubectl` access (e.g. local workstation).
+
+**1.1 Drain the node** — evicts all non-daemonset pods:
+```bash
+kubectl drain k3s-server11 --ignore-daemonsets --delete-emptydir-data
+```
+
+**1.2 Remove from etcd** — prevents quorum issues while the disk is replaced:
+```bash
+ssh k3s-server11 'sudo ETCDCTL_API=3 etcdctl \
+  --endpoints=https://127.0.0.1:2379 \
+  --cacert=/var/lib/rancher/k3s/server/tls/etcd/server-ca.crt \
+  --cert=/var/lib/rancher/k3s/server/tls/etcd/client.crt \
+  --key=/var/lib/rancher/k3s/server/tls/etcd/client.key \
+  member remove e9f8fa983ff7f958'
+```
+
+**1.3 Delete the node object**:
+```bash
+kubectl delete node k3s-server11
+```
+
+**Verify**: `kubectl get nodes` shows only server10, server12, and the agents. Etcd member
+list shows only 2 members (server10 + server12). Cluster remains healthy with quorum.
+
+## Phase 2: Replace the VM disk on inko01
+
+Run directly on inko01 via SSH.
+
+**2.1 Stop the VM**:
+```bash
+qm stop 111
+```
+
+**2.2 Delete the corrupt disk** (detaches and removes the raw file):
+```bash
+qm set 111 --delete scsi0
+```
+
+**2.3 Import a fresh Debian 12 cloud-init image as a new disk**:
+```bash
+qm importdisk 111 /opt/proxmox/template/iso/debian-12-genericcloud-amd64.qcow2 proxmox
+```
+This creates `/opt/proxmox/images/111/vm-111-disk-0.raw` from the clean base image.
+
+**2.4 Attach the disk and set boot order**:
+```bash
+qm set 111 --scsi0 proxmox:111/vm-111-disk-0.raw --boot order=scsi0
+```
+
+**2.5 Resize to 64G** (matching original disk size):
+```bash
+qm resize 111 scsi0 64G
+```
+
+**2.6 Start the VM**:
+```bash
+qm start 111
+```
+
+Cloud-init runs on first boot and configures: hostname (`k3s-server11`), user (`tudattr`),
+SSH keys, and DHCP networking. Wait ~60s for SSH to become available before Phase 3.
+
+**Verify**: `ssh k3s-server11 hostname` returns `k3s-server11` and no disk I/O errors
+appear in `dmesg`.
+
+## Phase 3: Reprovision via Ansible
+
+Run from local workstation in the ansible-homelab repo.
+
+```bash
+ansible-playbook playbooks/k3s-servers.yaml --limit k3s-server11
+```
+
+This runs the `common` and `k3s_server` roles against server11 only:
+
+- `common`: installs base packages, configures SSH, hostname, etc.
+- `k3s_server`: detects `/usr/local/bin/k3s` does not exist → runs install script with
+  `--server https://192.168.20.47:6443` (loadbalancer) → joins as a secondary server.
+  k3s fetches the cluster token from server10 (the primary) and registers as a new etcd
+  member automatically.
+
+**Verify**:
+```bash
+kubectl get nodes                    # server11 shows Ready
+ssh k3s-server11 'sudo ETCDCTL_API=3 etcdctl \
+  --endpoints=https://127.0.0.1:2379 \
+  --cacert=/var/lib/rancher/k3s/server/tls/etcd/server-ca.crt \
+  --cert=/var/lib/rancher/k3s/server/tls/etcd/client.crt \
+  --key=/var/lib/rancher/k3s/server/tls/etcd/client.key \
+  member list -w table'              # 3 members, all started
+ssh k3s-server11 'dmesg | grep -i "i/o error"'  # no output
+```
+
+## Key Facts
+
+| Item | Value |
+|------|-------|
+| VM ID | 111 |
+| Proxmox host | inko01 |
+| VM disk path | `/opt/proxmox/images/111/vm-111-disk-0.raw` |
+| Base image | `/opt/proxmox/template/iso/debian-12-genericcloud-amd64.qcow2` |
+| Proxmox storage pool | `proxmox` |
+| server11 IP | 192.168.20.48 |
+| server11 etcd member ID | `e9f8fa983ff7f958` |
+| Loadbalancer IP | 192.168.20.47 |
+| k3s primary server | server10 (192.168.20.43) |
+
+## Risk
+
+- **During Phase 1–2**: cluster runs on 2 etcd members. Still has quorum but no
+  redundancy. Avoid other disruptive changes until server11 is back.
+- **etcd member ID**: `e9f8fa983ff7f958` was confirmed on 2026-04-21. Verify it matches
+  before running the remove command if time has passed.
--- a/issues/001_fix_vault_security_issue.md
+++ b/issues/001_fix_vault_security_issue.md
@@ -0,0 +1,74 @@
+# Issue: Fix Vault Security Risk in Proxmox Role
+
+**Status**: Open
+**Priority**: High
+**Component**: proxmox/15_create_secret.yaml
+**Assignee**: Junior Dev
+
+## Description
+The current vault handling in `roles/proxmox/tasks/15_create_secret.yaml` uses insecure shell commands to decrypt/encrypt vault files, creating temporary plaintext files that pose a security risk.
+
+## Current Problematic Code
+```yaml
+- name: Decrypt vm vault file
+  ansible.builtin.shell: cd ../; ansible-vault decrypt "./playbooks/{{ proxmox_vault_file }}"
+  no_log: true
+
+- name: Encrypt vm vault file
+  ansible.builtin.shell: cd ../; ansible-vault encrypt "./playbooks/{{ proxmox_vault_file }}"
+  no_log: true
+```
+
+## Required Changes
+
+### Step 1: Replace shell commands with Ansible vault module
+Replace the shell-based decryption/encryption with `ansible.builtin.ansible_vault` module.
+
+### Step 2: Remove temporary plaintext file operations
+Eliminate the need for temporary plaintext files by using in-memory operations.
+
+### Step 3: Add proper error handling
+Include error handling for vault operations (missing files, decryption failures).
+
+## Implementation Steps
+
+1. **Read the current vault file securely**:
+   ```yaml
+   - name: Load vault content securely
+     ansible.builtin.include_vars:
+       file: "{{ proxmox_vault_file }}"
+       name: vault_data
+     no_log: true
+   ```
+
+2. **Use ansible_vault module for operations**:
+   ```yaml
+   - name: Update vault data securely
+     ansible.builtin.set_fact:
+       new_vault_data: "{{ vault_data | combine({vm_name_secret: cipassword}) }}"
+     when: not variable_exists
+     no_log: true
+   ```
+
+3. **Write encrypted vault directly**:
+   ```yaml
+   - name: Write encrypted vault
+     ansible.builtin.copy:
+       content: "{{ new_vault_data | ansible.builtin.ansible_vault.encrypt('vault_password') }}"
+       dest: "{{ proxmox_vault_file }}"
+       mode: "0600"
+     when: not variable_exists
+     no_log: true
+   ```
+
+## Testing Requirements
+- Test with existing vault files
+- Verify no plaintext files are created during operation
+- Confirm vault can be decrypted properly after updates
+
+## Acceptance Criteria
+- [ ] No shell commands used for vault operations
+- [ ] No temporary plaintext files created
+- [ ] All vault operations use Ansible built-in modules
+- [ ] Existing functionality preserved
+- [ ] Proper error handling implemented
--- a/issues/002_replace_dict2items_filter.md
+++ b/issues/002_replace_dict2items_filter.md
@@ -0,0 +1,57 @@
+# Issue: Replace Deprecated dict2items Filter
+
+**Status**: Open
+**Priority**: Medium
+**Component**: proxmox/40_prepare_vm_creation.yaml
+**Assignee**: Junior Dev
+
+## Description
+The task `roles/proxmox/tasks/40_prepare_vm_creation.yaml` uses the deprecated `dict2items` filter which may be removed in future Ansible versions.
+
+## Current Problematic Code
+```yaml
+- name: Download Cloud Init Isos
+  ansible.builtin.include_tasks: 42_download_isos.yaml
+  loop: "{{ proxmox_cloud_init_images | dict2items | map(attribute='value') }}"
+  loop_control:
+    loop_var: distro
+```
+
+## Required Changes
+
+### Step 1: Replace dict2items with modern Ansible practices
+Use `dict` filter or direct dictionary iteration instead of deprecated filter.
+
+### Step 2: Update variable references
+Ensure the loop variable structure matches the new iteration method.
+
+## Implementation Steps
+
+### Option A: Use dict filter (recommended)
+```yaml
+- name: Download Cloud Init Isos
+  ansible.builtin.include_tasks: 42_download_isos.yaml
+  loop: "{{ proxmox_cloud_init_images | dict | map(attribute='value') }}"
+  loop_control:
+    loop_var: distro
+```
+
+### Option B: Direct dictionary iteration
+```yaml
+- name: Download Cloud Init Isos
+  ansible.builtin.include_tasks: 42_download_isos.yaml
+  loop: "{{ proxmox_cloud_init_images.values() | list }}"
+  loop_control:
+    loop_var: distro
+```
+
+## Testing Requirements
+- Verify all cloud init images are still downloaded correctly
+- Test with different dictionary structures
+- Confirm no regression in functionality
+
+## Acceptance Criteria
+- [ ] Deprecated `dict2items` filter removed
+- [ ] All cloud init images download successfully
+- [ ] No changes to existing functionality
+- [ ] Code works with current and future Ansible versions
--- a/issues/003_add_granular_tags.md
+++ b/issues/003_add_granular_tags.md
@@ -0,0 +1,105 @@
+# Issue: Add Granular Tags for Better Control
+
+**Status**: Open
+**Priority**: Medium
+**Component**: proxmox/tasks/main.yaml
+**Assignee**: Junior Dev
+
+## Description
+The Proxmox role lacks granular tags, making it difficult to run specific parts of the role independently. Currently only has high-level `proxmox` tag.
+
+## Current Limitation
+```yaml
+# Current tag structure
+roles:
+  - role: proxmox
+    tags:
+      - proxmox
+```
+
+## Required Changes
+
+### Step 1: Add tags to main task includes
+Add specific tags to each major task group in `roles/proxmox/tasks/main.yaml`.
+
+### Step 2: Update playbook to use new tags
+Ensure playbooks can leverage the new tag structure.
+
+## Implementation Steps
+
+### Update roles/proxmox/tasks/main.yaml
+```yaml
+- name: Prepare Machines
+  ansible.builtin.include_tasks: 00_setup_machines.yaml
+  tags:
+    - proxmox:setup
+    - proxmox
+
+- name: Create VM vault
+  ansible.builtin.include_tasks: 10_create_secrets.yaml
+  when: is_localhost
+  tags:
+    - proxmox:vault
+    - proxmox
+
+- name: Prime node for VM
+  ansible.builtin.include_tasks: 40_prepare_vm_creation.yaml
+  when: is_proxmox_node
+  tags:
+    - proxmox:prepare
+    - proxmox
+
+- name: Create VMs
+  ansible.builtin.include_tasks: 50_create_vms.yaml
+  when: is_localhost
+  tags:
+    - proxmox:vms
+    - proxmox
+
+- name: Create LXC containers
+  ansible.builtin.include_tasks: 60_create_containers.yaml
+  when: is_localhost
+  tags:
+    - proxmox:containers
+    - proxmox
+```
+
+### Update individual task files
+Add appropriate tags to tasks within each included file:
+
+```yaml
+# Example for 04_configure_hosts.yaml
+- name: Configure /etc/hosts with Proxmox cluster nodes
+  ansible.builtin.blockinfile:
+    # ... existing content ...
+  tags:
+    - proxmox:setup
+    - proxmox:network
+```
+
+## Usage Examples
+
+After implementation, users can run specific parts:
+
+```bash
+# Run only setup tasks
+ansible-playbook playbooks/proxmox.yaml --tags "proxmox:setup"
+
+# Run only VM creation
+ansible-playbook playbooks/proxmox.yaml --tags "proxmox:vms"
+
+# Run setup and preparation
+ansible-playbook playbooks/proxmox.yaml --tags "proxmox:setup,proxmox:prepare"
+```
+
+## Testing Requirements
+- Verify each tag group runs the correct subset of tasks
+- Test tag combinations work properly
+- Ensure backward compatibility with existing `proxmox` tag
+
+## Acceptance Criteria
+- [ ] Granular tags added to all major task groups
+- [ ] Each functional area has its own tag
+- [ ] Original `proxmox` tag still works for backward compatibility
+- [ ] Documentation updated with tag usage examples
+- [ ] All tags tested and working
--- a/issues/004_add_error_handling.md
+++ b/issues/004_add_error_handling.md
@@ -0,0 +1,125 @@
+# Issue: Add Comprehensive Error Handling
+
+**Status**: Open
+**Priority**: High
+**Component**: proxmox/tasks
+**Assignee**: Junior Dev
+
+## Description
+The Proxmox role lacks comprehensive error handling, particularly for critical operations like API calls, vault operations, and file manipulations.
+
+## Current Issues
+- No error handling for Proxmox API failures
+- No validation of VM/LXC configurations before creation
+- No retries for network operations
+- No cleanup on failure
+
+## Required Changes
+
+### Step 1: Add validation tasks
+Validate configurations before attempting creation.
+
+### Step 2: Add error handling blocks
+Use `block/rescue/always` for critical operations.
+
+### Step 3: Add retries for network operations
+Use `retries` and `delay` for API calls.
+
+## Implementation Steps
+
+### Example 1: VM Creation with Error Handling
+```yaml
+- name: Create VM with error handling
+  block:
+    - name: Validate VM configuration
+      ansible.builtin.assert:
+        that:
+          - vm.vmid is defined
+          - vm.vmid | int > 0
+          - vm.node is defined
+          - vm.cores is defined and vm.cores | int > 0
+          - vm.memory is defined and vm.memory | int > 0
+        msg: "Invalid VM configuration for {{ vm.name }}"
+
+    - name: Create VM
+      community.proxmox.proxmox_kvm:
+        # ... existing parameters ...
+      register: vm_creation_result
+      retries: 3
+      delay: 10
+      until: vm_creation_result is not failed
+
+  rescue:
+    - name: Handle VM creation failure
+      ansible.builtin.debug:
+        msg: "Failed to create VM {{ vm.name }}: {{ ansible_failed_result.msg }}"
+
+    - name: Cleanup partial resources
+      # Add cleanup tasks here
+      when: cleanup_partial_resources | default(true)
+
+  always:
+    - name: Log VM creation attempt
+      ansible.builtin.debug:
+        msg: "VM creation attempt for {{ vm.name }} completed with status: {{ vm_creation_result is defined and vm_creation_result.changed | ternary('success', 'failed') }}"
+```
+
+### Example 2: API Call with Retries
+```yaml
+- name: Check Proxmox API availability
+  ansible.builtin.uri:
+    url: "https://{{ proxmox_api_host }}:8006/api2/json/version"
+    validate_certs: no
+    return_content: yes
+  register: api_check
+  retries: 5
+  delay: 5
+  until: api_check.status == 200
+  ignore_errors: yes
+
+- name: Fail if API unavailable
+  ansible.builtin.fail:
+    msg: "Proxmox API unavailable at {{ proxmox_api_host }}"
+  when: api_check is failed
+```
+
+### Example 3: File Operation Error Handling
+```yaml
+- name: Manage vault file safely
+  block:
+    - name: Backup existing vault
+      ansible.builtin.copy:
+        src: "{{ proxmox_vault_file }}"
+        dest: "{{ proxmox_vault_file }}.backup"
+        remote_src: yes
+      when: vault_file_exists.stat.exists
+
+    - name: Perform vault operations
+      # ... vault operations ...
+
+  rescue:
+    - name: Restore vault from backup
+      ansible.builtin.copy:
+        src: "{{ proxmox_vault_file }}.backup"
+        dest: "{{ proxmox_vault_file }}"
+        remote_src: yes
+      when: vault_file_exists.stat.exists
+
+    - name: Fail with error details
+      ansible.builtin.fail:
+        msg: "Vault operation failed: {{ ansible_failed_result.msg }}"
+```
+
+## Testing Requirements
+- Test error scenarios (invalid configs, API unavailable)
+- Verify cleanup works on failure
+- Confirm retries work for transient failures
+- Validate error messages are helpful
+
+## Acceptance Criteria
+- [ ] All critical operations have error handling
+- [ ] Validation added for configurations
+- [ ] Retry logic implemented for network operations
+- [ ] Cleanup procedures in place for failures
+- [ ] Helpful error messages provided
+- [ ] No silent failures
--- a/issues/005_add_performance_optimizations.md
+++ b/issues/005_add_performance_optimizations.md
@@ -0,0 +1,119 @@
+# Issue: Add Performance Optimizations
+
+**Status**: Open
+**Priority**: Medium
+**Component**: proxmox/tasks
+**Assignee**: Junior Dev
+
+## Description
+The Proxmox role could benefit from performance optimizations, particularly for image downloads and repeated operations.
+
+## Current Performance Issues
+- Sequential image downloads (no parallelization)
+- No caching of repeated operations
+- No async operations for long-running tasks
+- Inefficient fact gathering
+
+## Required Changes
+
+### Step 1: Add parallel downloads
+Use async for image downloads to run concurrently.
+
+### Step 2: Implement caching
+Add fact caching for repeated operations.
+
+### Step 3: Add conditional execution
+Skip tasks when results are already present.
+
+## Implementation Steps
+
+### Example 1: Parallel Image Downloads
+```yaml
+- name: Download Cloud Init Isos in parallel
+  ansible.builtin.include_tasks: 42_download_isos.yaml
+  loop: "{{ proxmox_cloud_init_images | dict | map(attribute='value') }}"
+  loop_control:
+    loop_var: distro
+  async: 3600  # 1 hour timeout
+  poll: 0
+  register: download_tasks
+
+- name: Check download status
+  ansible.builtin.async_status:
+    jid: "{{ item.ansible_job_id }}"
+  register: download_results
+  until: download_results.finished
+  retries: 30
+  delay: 10
+  loop: "{{ download_tasks.results }}"
+  loop_control:
+    loop_var: item
+```
+
+### Example 2: Add Fact Caching
+```yaml
+# In ansible.cfg or playbook
+[defaults]
+fact_caching = jsonfile
+fact_caching_connection = /tmp/ansible_facts
+fact_caching_timeout = 86400
+
+# In tasks
+- name: Gather facts with caching
+  ansible.builtin.setup:
+    cacheable: yes
+```
+
+### Example 3: Conditional Task Execution
+```yaml
+- name: Check if image already exists
+  ansible.builtin.stat:
+    path: "{{ proxmox_dirs.isos }}/{{ distro.name }}"
+  register: image_stat
+  changed_when: false
+
+- name: Download image only if missing
+  ansible.builtin.get_url:
+    url: "{{ distro.url }}"
+    dest: "{{ proxmox_dirs.isos }}/{{ distro.name }}"
+    mode: "0644"
+  when: not image_stat.stat.exists
+  register: download_result
+
+- name: Skip conversion if raw image exists
+  ansible.builtin.stat:
+    path: "{{ proxmox_dirs.isos }}/{{ raw_image_name }}"
+  register: raw_image_stat
+  changed_when: false
+
+- name: Convert to raw only if needed
+  ansible.builtin.command:
+    cmd: "qemu-img convert -O raw {{ proxmox_dirs.isos }}/{{ distro.name }} {{ proxmox_dirs.isos }}/{{ raw_image_name }}"
+  when:
+    - download_result is changed or not raw_image_stat.stat.exists
+    - image_stat.stat.exists
+```
+
+### Example 4: Batch VM Operations
+```yaml
+- name: Create VMs in batches
+  ansible.builtin.include_tasks: 55_create_vm.yaml
+  loop: "{{ vms | batch(3) | flatten }}"
+  loop_control:
+    loop_var: "vm"
+  throttle: 3
+```
+
+## Testing Requirements
+- Measure performance before and after changes
+- Verify parallel operations don't cause conflicts
+- Test caching works correctly
+- Confirm conditional execution skips appropriately
+
+## Acceptance Criteria
+- [ ] Image downloads run in parallel
+- [ ] Fact caching implemented and working
+- [ ] Tasks skip when results already exist
+- [ ] Performance metrics show improvement
+- [ ] No race conditions in parallel operations
+- [ ] Documentation updated with performance notes
--- a/playbooks/docker-host.yaml
+++ b/playbooks/docker-host.yaml
@@ -3,9 +3,9 @@
  hosts: docker_host
  gather_facts: true
  roles:
-    - role: common
-      tags:
-        - common
+    # - role: common
+    #   tags:
+    #     - common
    - role: docker_host
      tags:
        - docker_host
--- a/playbooks/docker-lb.yaml
+++ b/playbooks/docker-lb.yaml
--- a/playbooks/docker.yaml
+++ b/playbooks/docker.yaml
@@ -0,0 +1,5 @@
+---
+- name: Setup Docker Hosts
+  ansible.builtin.import_playbook: docker-host.yaml
+- name: Setup Docker load balancer
+  ansible.builtin.import_playbook: docker-lb.yaml
--- a/playbooks/docker.yml
+++ b/playbooks/docker.yml
@@ -1,5 +0,0 @@
---
- name: Setup Docker Hosts
-  ansible.builtin.import_playbook: docker-host.yml
- name: Setup Docker load balancer
-  ansible.builtin.import_playbook: docker-lb.yml
--- a/playbooks/k3s-agents.yaml
+++ b/playbooks/k3s-agents.yaml
@@ -0,0 +1,16 @@
+- name: Set up Agents
+  hosts: k3s
+  gather_facts: true
+  roles:
+    - role: common
+      when: inventory_hostname in groups["k3s_agent"]
+      tags:
+        - common
+    - role: k3s_agent
+      when: inventory_hostname in groups["k3s_agent"]
+      tags:
+        - k3s_agent
+    # - role: node_exporter
+    #   when: inventory_hostname in groups["k3s_agent"]
+    #   tags:
+    #     - node_exporter
--- a/playbooks/k3s-agents.yml
+++ b/playbooks/k3s-agents.yml
@@ -1,32 +0,0 @@
- name: Set up Agents
-  hosts: k3s
-  gather_facts: true
-  vars:
-    k3s_primary_server_ip: "{{ groups['k3s_server'] | map('extract', hostvars, 'ansible_host') | list | first }}"
-  pre_tasks:
-    - name: Get K3s token from the first server
-      when: host.ip == k3s_primary_server_ip and inventory_hostname in groups["k3s_server"]
-      slurp:
-        src: /var/lib/rancher/k3s/server/node-token
-      register: k3s_token
-      become: true
-
-    - name: Set fact on k3s_primary_server_ip
-      when: host.ip == k3s_primary_server_ip and inventory_hostname in groups["k3s_server"]
-      set_fact:
-        k3s_token: "{{ k3s_token['content'] | b64decode | trim }}"
-
-  roles:
-    - role: common
-      when: inventory_hostname in groups["k3s_agent"]
-      tags:
-        - common
-    - role: k3s_agent
-      when: inventory_hostname in groups["k3s_agent"]
-      k3s_token: "{{ hostvars[(hostvars | dict2items | map(attribute='value') | map('dict2items') | map('selectattr', 'key', 'match', 'host') | map('selectattr', 'value.ip', 'match', k3s_primary_server_ip ) | select() | first | items2dict).host.hostname].k3s_token }}"
-      tags:
-        - k3s_agent
-    - role: node_exporter
-      when: inventory_hostname in groups["k3s_agent"]
-      tags:
-        - node_exporter
--- a/playbooks/k3s-loadbalancer.yaml
+++ b/playbooks/k3s-loadbalancer.yaml
--- a/playbooks/k3s-servers.yaml
+++ b/playbooks/k3s-servers.yaml
@@ -6,10 +6,12 @@
    - role: common
      tags:
        - common
+      when: inventory_hostname in groups["k3s_server"]
    - role: k3s_server
      tags:
        - k3s_server
      when: inventory_hostname in groups["k3s_server"]
-    - role: node_exporter
-      tags:
-        - node_exporter
+    # - role: node_exporter
+    #   tags:
+    #     - node_exporter
+    #   when: inventory_hostname in groups["k3s_server"]
--- a/playbooks/k3s-storage.yaml
+++ b/playbooks/k3s-storage.yaml
@@ -0,0 +1,16 @@
+- name: Set up storage
+  hosts: k3s_nodes
+  gather_facts: true
+  roles:
+    - role: common
+      when: inventory_hostname in groups["k3s_storage"]
+      tags:
+        - common
+    - role: k3s_storage
+      when: inventory_hostname in groups["k3s_storage"]
+      tags:
+        - k3s_storage
+    # - role: node_exporter
+    #   when: inventory_hostname in groups["k3s_storage"]
+    #   tags:
+    #     - node_exporter
--- a/playbooks/k3s-storage.yml
+++ b/playbooks/k3s-storage.yml
@@ -1,32 +0,0 @@
- name: Set up storage
-  hosts: k3s_nodes
-  gather_facts: true
-  vars:
-    k3s_primary_server_ip: "{{ groups['k3s_server'] | map('extract', hostvars, 'ansible_host') | list | first }}"
-  pre_tasks:
-    - name: Get K3s token from the first server
-      when: host.ip == k3s_primary_server_ip and inventory_hostname in groups["k3s_server"]
-      slurp:
-        src: /var/lib/rancher/k3s/server/node-token
-      register: k3s_token
-      become: true
-
-    - name: Set fact on k3s_primary_server_ip
-      when: host.ip == k3s_primary_server_ip and inventory_hostname in groups["k3s_server"]
-      set_fact:
-        k3s_token: "{{ k3s_token['content'] | b64decode | trim }}"
-
-  roles:
-    - role: common
-      when: inventory_hostname in groups["k3s_storage"]
-      tags:
-        - common
-    - role: k3s_storage
-      when: inventory_hostname in groups["k3s_storage"]
-      k3s_token: "{{ hostvars[(hostvars | dict2items | map(attribute='value') | map('dict2items') | map('selectattr', 'key', 'match', 'host') | map('selectattr', 'value.ip', 'match', k3s_primary_server_ip ) | select() | first | items2dict).host.hostname].k3s_token }}"
-      tags:
-        - k3s_storage
-    - role: node_exporter
-      when: inventory_hostname in groups["k3s_storage"]
-      tags:
-        - node_exporter
--- a/playbooks/kube-vip.yaml
+++ b/playbooks/kube-vip.yaml
@@ -0,0 +1,18 @@
+---
+# Deploys kube-vip on all k3s server nodes and adds the VIP to their TLS SANs.
+#
+# Migration steps (run once):
+#   1. ansible-playbook playbooks/kube-vip.yaml
+#   2. Update DNS: k3s.seyshiro.de → 192.168.20.2
+#   3. Verify: kubectl get nodes (should work via VIP)
+#   4. Decommission k3s-loadbalancer VM when satisfied
+#
+# The playbook is idempotent — re-running it after migration is safe.
+- name: Deploy kube-vip on k3s server nodes
+  hosts: k3s_server
+  gather_facts: true
+  serial: 1
+  roles:
+    - role: kube_vip
+      tags:
+        - kube_vip
--- a/playbooks/kubernetes_setup.yaml
+++ b/playbooks/kubernetes_setup.yaml
@@ -0,0 +1,10 @@
+---
+- name: Setup Kubernetes Cluster
+  hosts: kubernetes
+  any_errors_fatal: true
+  gather_facts: false
+  vars:
+    is_localhost: "{{ inventory_hostname == '127.0.0.1' }}"
+  roles:
+    - role: kubernetes_argocd
+      when: is_localhost
--- a/playbooks/proxmox-k3s-add-agent.yaml
+++ b/playbooks/proxmox-k3s-add-agent.yaml
@@ -0,0 +1,6 @@
+---
+- name: Create new VM(s)
+  ansible.builtin.import_playbook: proxmox.yaml
+
+- name: Provision VM
+  ansible.builtin.import_playbook: k3s-agents.yaml
--- a/playbooks/proxmox.yaml
+++ b/playbooks/proxmox.yaml
--- a/requirements.txt
+++ b/requirements.txt
@@ -1,7 +1,28 @@
+cachetools==5.5.2
 certifi==2025.1.31
+cfgv==3.4.0
 charset-normalizer==3.4.1
+distlib==0.4.0
+durationpy==0.10
+filelock==3.18.0
+google-auth==2.40.3
+identify==2.6.12
 idna==3.10
+kubernetes==33.1.0
 nc-dnsapi==0.1.3
+nodeenv==1.9.1
+oauthlib==3.3.1
+platformdirs==4.3.8
+pre_commit==4.2.0
 proxmoxer==2.2.0
+pyasn1==0.6.1
+pyasn1_modules==0.4.2
+python-dateutil==2.9.0.post0
+PyYAML==6.0.2
 requests==2.32.3
+requests-oauthlib==2.0.0
+rsa==4.9.1
+six==1.17.0
 urllib3==2.3.0
+virtualenv==20.32.0
+websocket-client==1.8.0
--- a/requirements.yaml
+++ b/requirements.yaml
@@ -0,0 +1,5 @@
+---
+collections:
+  - name: community.docker
+  - name: community.general
+  - name: kubernetes.core
--- a/roles/common/README.md
+++ b/roles/common/README.md
@@ -0,0 +1,72 @@
+# Ansible Role: common
+
+This role configures a baseline set of common configurations for Debian-based systems, including time synchronization, essential packages, hostname, and specific developer tools.
+
+## Requirements
+
+None.
+
+## Role Variables
+
+Available variables are listed below, along with default values (see `vars/main.yml`):
+
+```yaml
+# A list of common packages to install via apt.
+common_packages:
+  - build-essential
+  - curl
+  - git
+  - iperf3
+  - neovim
+  - rsync
+  - smartmontools
+  - sudo
+  - systemd-timesyncd
+  - tree
+  - screen
+  - bat
+  - fd-find
+  - ripgrep
+  - nfs-common
+  - open-iscsi
+  - parted
+
+# The hostname to configure.
+hostname: "new-host"
+```
+
+## Tasks
+
+The role performs the following tasks:
+
+1.  **Configure Time**: Sets up `systemd-timesyncd` and timezone.
+2.  **Configure Packages**: Installs the list of `common_packages`.
+3.  **Configure Hostname**: Sets the system hostname.
+4.  **Configure Extra-Packages**:
+    - Installs `eza` (modern ls replacement).
+    - Installs `bottom` (process viewer).
+    - Installs `neovim` from AppImage and clones a custom configuration.
+5.  **Configure Bash**: Sets up bash aliases and prompt.
+6.  **Configure SSH**: Configures `sshd_config` for security.
+
+## Dependencies
+
+None.
+
+## Example Playbook
+
+```yaml
+- hosts: servers
+  roles:
+    - role: common
+      vars:
+        hostname: "my-server"
+```
+
+## License
+
+MIT
+
+## Author Information
+
+This role was created in 2025 by [TuDatTr](https://codeberg.org/tudattr/).
--- a/roles/common/files/kitty/infocmp
+++ b/roles/common/files/kitty/infocmp
@@ -0,0 +1,77 @@
+#	Reconstructed via infocmp from file: /usr/lib/kitty/terminfo/./x/xterm-kitty
+xterm-kitty|KovIdTTY,
+	am, bw, ccc, hs, km, mc5i, mir, msgr, npc, xenl, Su, Tc, XF, fullkbd,
+	colors#0x100, cols#80, it#8, lines#24, pairs#0x7fff,
+	acsc=++\,\,--..00``aaffgghhiijjkkllmmnnooppqqrrssttuuvvwwxxyyzz{{||}}~~,
+	bel=^G, blink=\E[5m, bold=\E[1m, cbt=\E[Z, civis=\E[?25l,
+	clear=\E[H\E[2J, cnorm=\E[?12h\E[?25h, cr=\r,
+	csr=\E[%i%p1%d;%p2%dr, cub=\E[%p1%dD, cub1=^H,
+	cud=\E[%p1%dB, cud1=\n, cuf=\E[%p1%dC, cuf1=\E[C,
+	cup=\E[%i%p1%d;%p2%dH, cuu=\E[%p1%dA, cuu1=\E[A,
+	cvvis=\E[?12;25h, dch=\E[%p1%dP, dch1=\E[P, dim=\E[2m,
+	dl=\E[%p1%dM, dl1=\E[M, dsl=\E]2;\E\\, ech=\E[%p1%dX,
+	ed=\E[J, el=\E[K, el1=\E[1K, flash=\E[?5h$<100/>\E[?5l,
+	fsl=^G, home=\E[H, hpa=\E[%i%p1%dG, ht=^I, hts=\EH,
+	ich=\E[%p1%d@, il=\E[%p1%dL, il1=\E[L, ind=\n,
+	indn=\E[%p1%dS,
+	initc=\E]4;%p1%d;rgb:%p2%{255}%*%{1000}%/%2.2X/%p3%{255}%*%{1000}%/%2.2X/%p4%{255}%*%{1000}%/%2.2X\E\\,
+	kBEG=\E[1;2E, kDC=\E[3;2~, kEND=\E[1;2F, kHOM=\E[1;2H,
+	kIC=\E[2;2~, kLFT=\E[1;2D, kNXT=\E[6;2~, kPRV=\E[5;2~,
+	kRIT=\E[1;2C, kbeg=\EOE, kbs=^?, kcbt=\E[Z, kcub1=\EOD,
+	kcud1=\EOB, kcuf1=\EOC, kcuu1=\EOA, kdch1=\E[3~, kend=\EOF,
+	kf1=\EOP, kf10=\E[21~, kf11=\E[23~, kf12=\E[24~,
+	kf13=\E[1;2P, kf14=\E[1;2Q, kf15=\E[13;2~, kf16=\E[1;2S,
+	kf17=\E[15;2~, kf18=\E[17;2~, kf19=\E[18;2~, kf2=\EOQ,
+	kf20=\E[19;2~, kf21=\E[20;2~, kf22=\E[21;2~,
+	kf23=\E[23;2~, kf24=\E[24;2~, kf25=\E[1;5P, kf26=\E[1;5Q,
+	kf27=\E[13;5~, kf28=\E[1;5S, kf29=\E[15;5~, kf3=\EOR,
+	kf30=\E[17;5~, kf31=\E[18;5~, kf32=\E[19;5~,
+	kf33=\E[20;5~, kf34=\E[21;5~, kf35=\E[23;5~,
+	kf36=\E[24;5~, kf37=\E[1;6P, kf38=\E[1;6Q, kf39=\E[13;6~,
+	kf4=\EOS, kf40=\E[1;6S, kf41=\E[15;6~, kf42=\E[17;6~,
+	kf43=\E[18;6~, kf44=\E[19;6~, kf45=\E[20;6~,
+	kf46=\E[21;6~, kf47=\E[23;6~, kf48=\E[24;6~,
+	kf49=\E[1;3P, kf5=\E[15~, kf50=\E[1;3Q, kf51=\E[13;3~,
+	kf52=\E[1;3S, kf53=\E[15;3~, kf54=\E[17;3~,
+	kf55=\E[18;3~, kf56=\E[19;3~, kf57=\E[20;3~,
+	kf58=\E[21;3~, kf59=\E[23;3~, kf6=\E[17~, kf60=\E[24;3~,
+	kf61=\E[1;4P, kf62=\E[1;4Q, kf63=\E[13;4~, kf7=\E[18~,
+	kf8=\E[19~, kf9=\E[20~, khome=\EOH, kich1=\E[2~,
+	kind=\E[1;2B, kmous=\E[M, knp=\E[6~, kpp=\E[5~,
+	kri=\E[1;2A, oc=\E]104\007, op=\E[39;49m, rc=\E8,
+	rep=%p1%c\E[%p2%{1}%-%db, rev=\E[7m, ri=\EM,
+	rin=\E[%p1%dT, ritm=\E[23m, rmacs=\E(B, rmam=\E[?7l,
+	rmcup=\E[?1049l, rmir=\E[4l, rmkx=\E[?1l, rmso=\E[27m,
+	rmul=\E[24m, rs1=\E]\E\\\Ec, sc=\E7,
+	setab=\E[%?%p1%{8}%<%t4%p1%d%e%p1%{16}%<%t10%p1%{8}%-%d%e48;5;%p1%d%;m,
+	setaf=\E[%?%p1%{8}%<%t3%p1%d%e%p1%{16}%<%t9%p1%{8}%-%d%e38;5;%p1%d%;m,
+	sgr=%?%p9%t\E(0%e\E(B%;\E[0%?%p6%t;1%;%?%p2%t;4%;%?%p1%p3%|%t;7%;%?%p4%t;5%;%?%p7%t;8%;%?%p5%t;2%;m,
+	sgr0=\E(B\E[m, sitm=\E[3m, smacs=\E(0, smam=\E[?7h,
+	smcup=\E[?1049h, smir=\E[4h, smkx=\E[?1h, smso=\E[7m,
+	smul=\E[4m, tbc=\E[3g, tsl=\E]2;, u6=\E[%i%d;%dR, u7=\E[6n,
+	u8=\E[?%[;0123456789]c, u9=\E[c, vpa=\E[%i%p1%dd,
+	BD=\E[?2004l, BE=\E[?2004h, Cr=\E]112\007,
+	Cs=\E]12;%p1%s\007, Ms=\E]52;%p1%s;%p2%s\E\\,
+	PE=\E[201~, PS=\E[200~, RV=\E[>c, Se=\E[2 q,
+	Setulc=\E[58:2:%p1%{65536}%/%d:%p1%{256}%/%{255}%&%d:%p1%{255}%&%d%;m,
+	Smulx=\E[4:%p1%dm, Ss=\E[%p1%d q, Sync=\EP=%p1%ds\E\\,
+	XR=\E[>0q, fd=\E[?1004l, fe=\E[?1004h, kBEG3=\E[1;3E,
+	kBEG4=\E[1;4E, kBEG5=\E[1;5E, kBEG6=\E[1;6E,
+	kBEG7=\E[1;7E, kDC3=\E[3;3~, kDC4=\E[3;4~, kDC5=\E[3;5~,
+	kDC6=\E[3;6~, kDC7=\E[3;7~, kDN=\E[1;2B, kDN3=\E[1;3B,
+	kDN4=\E[1;4B, kDN5=\E[1;5B, kDN6=\E[1;6B, kDN7=\E[1;7B,
+	kEND3=\E[1;3F, kEND4=\E[1;4F, kEND5=\E[1;5F,
+	kEND6=\E[1;6F, kEND7=\E[1;7F, kHOM3=\E[1;3H,
+	kHOM4=\E[1;4H, kHOM5=\E[1;5H, kHOM6=\E[1;6H,
+	kHOM7=\E[1;7H, kIC3=\E[2;3~, kIC4=\E[2;4~, kIC5=\E[2;5~,
+	kIC6=\E[2;6~, kIC7=\E[2;7~, kLFT3=\E[1;3D, kLFT4=\E[1;4D,
+	kLFT5=\E[1;5D, kLFT6=\E[1;6D, kLFT7=\E[1;7D,
+	kNXT3=\E[6;3~, kNXT4=\E[6;4~, kNXT5=\E[6;5~,
+	kNXT6=\E[6;6~, kNXT7=\E[6;7~, kPRV3=\E[5;3~,
+	kPRV4=\E[5;4~, kPRV5=\E[5;5~, kPRV6=\E[5;6~,
+	kPRV7=\E[5;7~, kRIT3=\E[1;3C, kRIT4=\E[1;4C,
+	kRIT5=\E[1;5C, kRIT6=\E[1;6C, kRIT7=\E[1;7C, kUP=\E[1;2A,
+	kUP3=\E[1;3A, kUP4=\E[1;4A, kUP5=\E[1;5A, kUP6=\E[1;6A,
+	kUP7=\E[1;7A, kxIN=\E[I, kxOUT=\E[O, rmxx=\E[29m,
+	setrgbb=\E[48:2:%p1%d:%p2%d:%p3%dm,
+	setrgbf=\E[38:2:%p1%d:%p2%d:%p3%dm, smxx=\E[9m,
--- a/roles/common/files/ssh/root/sshd_config
+++ b/roles/common/files/ssh/root/sshd_config
@@ -16,4 +16,3 @@ TrustedUserCAKeys /etc/ssh/vault-ca.pub
 UseDNS yes
 AcceptEnv LANG LC_*
 Subsystem	sftp	/usr/lib/openssh/sftp-server
-
--- a/roles/common/handlers/main.yaml
+++ b/roles/common/handlers/main.yaml
@@ -0,0 +1,12 @@
+---
+- name: Restart sshd
+  service:
+    name: sshd
+    state: restarted
+  become: true
+
+- name: Restart timesyncd
+  ansible.builtin.systemd:
+    name: systemd-timesyncd
+    state: restarted
+  become: true
--- a/roles/common/handlers/main.yml
+++ b/roles/common/handlers/main.yml
@@ -1,6 +0,0 @@
---
- name: Restart sshd
-  service:
-    name: sshd
-    state: restarted
-  become: yes
--- a/roles/common/tasks/bash.yaml
+++ b/roles/common/tasks/bash.yaml
@@ -22,3 +22,16 @@
 - name: Compile ghostty terminalinfo
  ansible.builtin.command: "tic -x {{ ansible_env.HOME }}/ghostty"
  when: ghostty_terminfo.changed
+
+- name: Copy kitty infocmp
+  ansible.builtin.copy:
+    src: files/kitty/infocmp
+    dest: "{{ ansible_env.HOME }}/kitty"
+    owner: "{{ ansible_user_id }}"
+    group: "{{ ansible_user_id }}"
+    mode: "0644"
+  register: kitty_terminfo
+
+- name: Compile kitty terminalinfo
+  ansible.builtin.command: "tic -x {{ ansible_env.HOME }}/kitty"
+  when: kitty_terminfo.changed
--- a/roles/common/tasks/extra_packages.yaml
+++ b/roles/common/tasks/extra_packages.yaml
@@ -79,12 +79,13 @@
    path: ~/.config/nvim
  register: nvim_config

- name: Clone LazyVim starter to Neovim config directory
+- name: Clone personal Neovim config directory
  ansible.builtin.git:
-    repo: https://github.com/LazyVim/starter
+    repo: https://codeberg.org/tudattr/nvim
    dest: ~/.config/nvim
    clone: true
    update: false
+    version: 1.0.0
  when: not nvim_config.stat.exists

 - name: Remove .git directory from Neovim config
--- a/roles/common/tasks/hostname.yaml
+++ b/roles/common/tasks/hostname.yaml
--- a/roles/common/tasks/main.yaml
+++ b/roles/common/tasks/main.yaml
@@ -0,0 +1,13 @@
+---
+- name: Configure Time
+  ansible.builtin.include_tasks: time.yaml
+- name: Configure Packages
+  ansible.builtin.include_tasks: packages.yaml
+- name: Configure Hostname
+  ansible.builtin.include_tasks: hostname.yaml
+- name: Configure Extra-Packages
+  ansible.builtin.include_tasks: extra_packages.yaml
+- name: Configure Bash
+  ansible.builtin.include_tasks: bash.yaml
+- name: Configure SSH
+  ansible.builtin.include_tasks: sshd.yaml
--- a/roles/common/tasks/main.yml
+++ b/roles/common/tasks/main.yml
@@ -1,13 +0,0 @@
---
- name: Configure Time
-  ansible.builtin.include_tasks: time.yml
- name: Configure Packages
-  ansible.builtin.include_tasks: packages.yml
- name: Configure Hostname
-  ansible.builtin.include_tasks: hostname.yml
- name: Configure Extra-Packages
-  ansible.builtin.include_tasks: extra_packages.yml
- name: Configure Bash
-  ansible.builtin.include_tasks: bash.yml
- name: Configure SSH
-  ansible.builtin.include_tasks: sshd.yml
--- a/roles/common/tasks/packages.yaml
+++ b/roles/common/tasks/packages.yaml
--- a/roles/common/tasks/sshd.yaml
+++ b/roles/common/tasks/sshd.yaml
--- a/roles/common/tasks/time.yaml
+++ b/roles/common/tasks/time.yaml
@@ -0,0 +1,34 @@
+---
+- name: Set timezone
+  community.general.timezone:
+    name: "{{ timezone }}"
+  become: true
+  when: ansible_user_id != "root"
+
+- name: Set timezone
+  community.general.timezone:
+    name: "{{ timezone }}"
+  when: ansible_user_id == "root"
+
+- name: Configure NTP servers for systemd-timesyncd
+  ansible.builtin.lineinfile:
+    path: /etc/systemd/timesyncd.conf
+    regexp: "^#?NTP="
+    line: "NTP=0.debian.pool.ntp.org 1.debian.pool.ntp.org 2.debian.pool.ntp.org 3.debian.pool.ntp.org"
+  become: true
+  notify: Restart timesyncd
+
+- name: Enable and start systemd-timesyncd
+  ansible.builtin.systemd:
+    name: systemd-timesyncd
+    enabled: true
+    state: started
+  become: true
+  when: ansible_user_id != "root"
+
+- name: Enable and start systemd-timesyncd
+  ansible.builtin.systemd:
+    name: systemd-timesyncd
+    enabled: true
+    state: started
+  when: ansible_user_id == "root"
--- a/roles/common/tasks/time.yml
+++ b/roles/common/tasks/time.yml
@@ -1,11 +0,0 @@
---
- name: Set timezone
-  community.general.timezone:
-    name: "{{ timezone }}"
-  become: true
-  when: ansible_user_id != "root"
-
- name: Set timezone
-  community.general.timezone:
-    name: "{{ timezone }}"
-  when: ansible_user_id == "root"
--- a/roles/common/vars/main.yaml
+++ b/roles/common/vars/main.yaml
@@ -13,3 +13,6 @@ common_packages:
  - bat
  - fd-find
  - ripgrep
+  - nfs-common
+  - open-iscsi
+  - parted
--- a/roles/docker_host/README.md
+++ b/roles/docker_host/README.md
@@ -0,0 +1,85 @@
+# Ansible Role: Docker Host
+
+This role sets up a Docker host, installs Docker, and configures it according to the provided variables. It also handles user and group management, directory setup, and deployment of Docker Compose services.
+
+## Role Variables
+
+### General
+
+- `docker_host_package_common_dependencies`: A list of common packages to be installed on the host.
+  - Default: `nfs-common`
+- `apt_lock_files`: A list of apt lock files to check.
+- `arch`: The architecture of the host.
+  - Default: `arm64` if `ansible_architecture` is `aarch64`, otherwise `amd64`.
+
+### Docker
+
+- `docker.url`: The URL for the Docker repository.
+  - Default: `https://download.docker.com/linux`
+- `docker.apt_release_channel`: The Docker apt release channel.
+  - Default: `stable`
+- `docker.directories.local`: The local directory for Docker data.
+  - Default: `/opt/local`
+- `docker.directories.config`: The directory for Docker configurations.
+  - Default: `/opt/config`
+- `docker.directories.compose`: The directory for Docker Compose files.
+  - Default: `/opt/compose`
+
+### Keycloak
+
+- `keycloak_config`: A dictionary containing the Keycloak configuration. See `templates/keycloak/realm.json.j2` for more details.
+
+### Services
+
+- `services`: A list of dictionaries, where each dictionary represents a Docker Compose service. See `templates/compose.yaml.j2` for more details.
+
+## Tasks
+
+The role performs the following tasks:
+
+1.  **Setup VM**:
+    - Includes `non-free` and `non-free-firmware` components in the apt sources.
+    - Installs common packages.
+    - Removes cloud kernel packages.
+    - Reboots the host.
+2.  **Install Docker**:
+    - Uninstalls old Docker versions.
+    - Installs dependencies for using repositories over HTTPS.
+    - Adds the Docker apt key and repository.
+    - Installs Docker Engine, containerd, and Docker Compose.
+3.  **Setup user and group for Docker**:
+    - Ensures the `docker` group exists.
+    - Adds the `ansible_user_id` to the `docker` group.
+    - Reboots the host.
+4.  **Setup directory structure for Docker**:
+    - Creates necessary directories for Docker and media.
+    - Sets ownership of the directories.
+    - Mounts NFS shares.
+5.  **Deploy configs**:
+    - Sets up Keycloak realms if the host is a Keycloak host.
+6.  **Deploy Docker Compose**:
+    - Copies the Docker Compose file to the target host.
+7.  **Publish metrics**:
+    - Copies the `daemon.json` file to `/etc/docker/daemon.json` to enable metrics.
+
+## Handlers
+
+- `Restart docker`: Restarts the Docker service.
+- `Restart compose`: Restarts the Docker Compose services.
+- `Restart host`: Reboots the host.
+
+## Usage
+
+To use this role, include it in your playbook and set the required variables.
+
+```yaml
+- hosts: docker_hosts
+  roles:
+    - role: docker_host
+      vars:
+        # Your variables here
+```
+
+## License
+
+MIT
--- a/roles/docker_host/handlers/main.yaml
+++ b/roles/docker_host/handlers/main.yaml
--- a/roles/docker_host/tasks/10_setup.yaml
+++ b/roles/docker_host/tasks/10_setup.yaml
--- a/roles/docker_host/tasks/20_installation.yaml
+++ b/roles/docker_host/tasks/20_installation.yaml
@@ -26,6 +26,7 @@
    - curl
    - gnupg
    - lsb-release
+    - qemu-guest-agent
  become: true

 - name: Add Docker apt key.
--- a/roles/docker_host/tasks/30_user_group_setup.yaml
+++ b/roles/docker_host/tasks/30_user_group_setup.yaml
--- a/roles/docker_host/tasks/40_directory_setup.yaml
+++ b/roles/docker_host/tasks/40_directory_setup.yaml
@@ -5,7 +5,6 @@
    state: directory
    mode: "0755"
  loop:
-    - /media/docker
    - /media/series
    - /media/movies
    - /media/songs
@@ -34,8 +33,8 @@
    opts: defaults,nolock,_netdev,auto,bg
    state: mounted
  loop:
-    - /media/docker
    - /media/series
    - /media/movies
    - /media/songs
+    - /media/downloads
  become: true
--- a/roles/docker_host/tasks/50_provision.yaml
+++ b/roles/docker_host/tasks/50_provision.yaml
--- a/roles/docker_host/tasks/60_deploy_compose.yaml
+++ b/roles/docker_host/tasks/60_deploy_compose.yaml
--- a/roles/docker_host/tasks/70_export.yaml
+++ b/roles/docker_host/tasks/70_export.yaml
--- a/roles/docker_host/tasks/main.yaml
+++ b/roles/docker_host/tasks/main.yaml
@@ -0,0 +1,21 @@
+---
+- name: Setup VM
+  ansible.builtin.include_tasks: 10_setup.yaml
+
+- name: Install docker
+  ansible.builtin.include_tasks: 20_installation.yaml
+
+- name: Setup user and group for docker
+  ansible.builtin.include_tasks: 30_user_group_setup.yaml
+
+- name: Setup directory structure for docker
+  ansible.builtin.include_tasks: 40_directory_setup.yaml
+
+# - name: Deploy configs
+#   ansible.builtin.include_tasks: 50_provision.yaml
+
+- name: Deploy docker compose
+  ansible.builtin.include_tasks: 60_deploy_compose.yaml
+
+- name: Publish metrics
+  ansible.builtin.include_tasks: 70_export.yaml
--- a/roles/docker_host/tasks/main.yml
+++ b/roles/docker_host/tasks/main.yml
@@ -1,20 +0,0 @@
---
- name: Setup VM
-  ansible.builtin.include_tasks: 10_setup.yml
- name: Install docker
-  ansible.builtin.include_tasks: 20_installation.yml
-
- name: Setup user and group for docker
-  ansible.builtin.include_tasks: 30_user_group_setup.yml
-
- name: Setup directory structure for docker
-  ansible.builtin.include_tasks: 40_directory_setup.yml
-
- name: Deploy configs
-  ansible.builtin.include_tasks: 50_provision.yml
-
- name: Deploy docker compose
-  ansible.builtin.include_tasks: 60_deploy_compose.yml
-
- name: Publish metrics
-  ansible.builtin.include_tasks: 70_export.yml
--- a/roles/docker_host/vars/main.yaml
+++ b/roles/docker_host/vars/main.yaml
@@ -1,7 +1,5 @@
 docker_host_package_common_dependencies:
  - nfs-common
-  - firmware-misc-nonfree
-  - linux-image-amd64

 apt_lock_files:
  - /var/lib/dpkg/lock
--- a/roles/edge_vps/README.md
+++ b/roles/edge_vps/README.md
@@ -0,0 +1,75 @@
+# Edge VPS
+
+Configures edge VPS instances with WireGuard VPN, Traefik reverse proxy, Pangolin, and Elastic Fleet Agent.
+
+## Requirements
+
+- Docker and Docker Compose installed
+- Ansible community.docker collection
+
+## Role Variables
+
+### WireGuard
+
+| Variable | Default | Description |
+|----------|---------|-------------|
+| `edge_vps_wireguard_address` | `10.133.7.1/24` | WireGuard interface address |
+| `edge_vps_wireguard_port` | `61975` | WireGuard listen port |
+| `edge_vps_wireguard_interface` | `wg0` | WireGuard interface name |
+| `edge_vps_wireguard_routes` | `[]` | List of routes to add (network, gateway) |
+
+### Traefik
+
+| Variable | Default | Description |
+|----------|---------|-------------|
+| `edge_vps_traefik_config_dir` | `/root/config/traefik` | Traefik config directory |
+| `edge_vps_acme_email` | - | Email for Let's Encrypt |
+
+### Pangolin
+
+| Variable | Default | Description |
+|----------|---------|-------------|
+| `edge_vps_pangolin_dashboard_url` | - | Pangolin dashboard URL |
+| `edge_vps_pangolin_base_endpoint` | - | Pangolin base endpoint |
+| `edge_vps_pangolin_base_domain` | - | Base domain for Pangolin |
+
+### Elastic Agent
+
+| Variable | Default | Description |
+|----------|---------|-------------|
+| `edge_vps_elastic_version` | `9.2.2` | Elastic Agent version |
+| `edge_vps_elastic_fleet_url` | - | Fleet server URL |
+| `edge_vps_elastic_dns_server` | `10.43.0.10` | DNS server for agent |
+
+## Secrets
+
+Store secrets in `vars/group_vars/vps/secrets.yaml` (ansible-vault encrypted):
+
+```yaml
+vault_edge_vps:
+  wireguard:
+    private_key: "..."
+    peers: [...]
+  pangolin:
+    server_secret: "..."
+  traefik:
+    cloudflare_api_token: "..."
+  elastic:
+    fleet_enrollment_token: "..."
+```
+
+## Dependencies
+
+None.
+
+## Example Playbook
+
+```yaml
+- hosts: vps
+  roles:
+    - role: edge_vps
+```
+
+## License
+
+MIT
--- a/roles/edge_vps/defaults/main.yaml
+++ b/roles/edge_vps/defaults/main.yaml
@@ -0,0 +1,11 @@
+---
+edge_vps_config_base: /root/config
+edge_vps_wireguard_config_dir: /etc/wireguard
+edge_vps_wireguard_interface: wg0
+edge_vps_wireguard_address: "10.133.7.1/24"
+edge_vps_wireguard_port: 61975
+edge_vps_traefik_config_dir: "{{ edge_vps_config_base }}/traefik"
+edge_vps_traefik_logs_dir: "{{ edge_vps_traefik_config_dir }}/logs"
+edge_vps_pangolin_config_dir: "{{ edge_vps_config_base }}/pangolin"
+edge_vps_elastic_config_dir: "{{ edge_vps_config_base }}/elastic-agent"
+edge_vps_elastic_state_dir: /var/lib/elastic-agent/elastic-system/elastic-agent/state
--- a/roles/edge_vps/docs/plans/2026-02-24-edge-vps-role.md
+++ b/roles/edge_vps/docs/plans/2026-02-24-edge-vps-role.md
@@ -0,0 +1,715 @@
+# Edge VPS Ansible Role Implementation Plan
+
+> **For Claude:** REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.
+
+**Goal:** Create a modular Ansible role for deploying edge VPS infrastructure components (WireGuard, Traefik, Pangolin, Elastic Agent).
+
+**Architecture:** Modular task-based role following existing patterns in the repository. Each component has its own numbered task file. Configs are templated with secrets from ansible-vault encrypted group_vars.
+
+**Tech Stack:** Ansible, Jinja2 templates, Docker Compose, WireGuard, Traefik, Pangolin, Elastic Fleet Agent
+
+---
+
+### Task 1: Create Role Directory Structure
+
+**Files:**
+- Create: `roles/edge_vps/tasks/main.yaml`
+- Create: `roles/edge_vps/handlers/main.yaml`
+- Create: `roles/edge_vps/defaults/main.yaml`
+- Create: `roles/edge_vps/templates/` directory structure
+
+**Step 1: Create directory structure**
+
+Run:
+```bash
+mkdir -p tasks handlers defaults templates/wireguard templates/traefik templates/pangolin templates/elastic-agent
+```
+
+**Step 2: Create defaults/main.yaml**
+
+```yaml
+---
+edge_vps_config_base: /root/config
+edge_vps_wireguard_config_dir: /etc/wireguard
+edge_vps_wireguard_interface: wg0
+edge_vps_wireguard_address: "10.133.7.1/24"
+edge_vps_wireguard_port: 61975
+edge_vps_traefik_config_dir: "{{ edge_vps_config_base }}/traefik"
+edge_vps_traefik_logs_dir: "{{ edge_vps_traefik_config_dir }}/logs"
+edge_vps_pangolin_config_dir: "{{ edge_vps_config_base }}/pangolin"
+edge_vps_elastic_config_dir: "{{ edge_vps_config_base }}/elastic-agent"
+edge_vps_elastic_state_dir: /var/lib/elastic-agent/elastic-system/elastic-agent/state
+```
+
+**Step 3: Create handlers/main.yaml**
+
+```yaml
+---
+- name: Restart wireguard
+  ansible.builtin.systemd:
+    name: "wg-quick@{{ edge_vps_wireguard_interface }}"
+    state: restarted
+  listen: restart wireguard
+
+- name: Restart traefik
+  ansible.builtin.command:
+    cmd: docker compose restart
+    chdir: "{{ edge_vps_traefik_config_dir }}"
+  listen: restart traefik
+```
+
+**Step 4: Commit**
+
+```bash
+git add defaults/main.yaml handlers/main.yaml
+git commit -m "feat(edge_vps): add role structure and handlers"
+```
+
+---
+
+### Task 2: Create Directory Setup Task
+
+**Files:**
+- Create: `roles/edge_vps/tasks/10_directories.yaml`
+
+**Step 1: Create 10_directories.yaml**
+
+```yaml
+---
+- name: Create config base directory
+  ansible.builtin.file:
+    path: "{{ edge_vps_config_base }}"
+    state: directory
+    mode: "0755"
+
+- name: Create Traefik directories
+  ansible.builtin.file:
+    path: "{{ item }}"
+    state: directory
+    mode: "0755"
+  loop:
+    - "{{ edge_vps_traefik_config_dir }}"
+    - "{{ edge_vps_traefik_logs_dir }}"
+
+- name: Create Pangolin config directory
+  ansible.builtin.file:
+    path: "{{ edge_vps_pangolin_config_dir }}"
+    state: directory
+    mode: "0755"
+
+- name: Create Elastic Agent directories
+  ansible.builtin.file:
+    path: "{{ item }}"
+    state: directory
+    mode: "0755"
+  loop:
+    - "{{ edge_vps_elastic_config_dir }}"
+    - "{{ edge_vps_elastic_state_dir }}"
+```
+
+**Step 2: Commit**
+
+```bash
+git add tasks/10_directories.yaml
+git commit -m "feat(edge_vps): add directory setup task"
+```
+
+---
+
+### Task 3: Create WireGuard Task and Template
+
+**Files:**
+- Create: `roles/edge_vps/tasks/20_wireguard.yaml`
+- Create: `roles/edge_vps/templates/wireguard/wg0.conf.j2`
+
+**Step 1: Create templates/wireguard/wg0.conf.j2**
+
+```jinja2
+[Interface]
+Address = {{ edge_vps_wireguard_address }}
+ListenPort = {{ edge_vps_wireguard_port }}
+PrivateKey = {{ vault_edge_vps.wireguard.private_key }}
+
+PostUp   = sysctl -w net.ipv4.ip_forward=1
+PostUp   = iptables -A FORWARD -i {{ edge_vps_wireguard_interface }} -j ACCEPT
+PostUp   = iptables -A FORWARD -o {{ edge_vps_wireguard_interface }} -j ACCEPT
+{% for route in edge_vps_wireguard_routes | default([]) %}
+PostUp   = ip route add {{ route }} via {{ route.gateway }} dev {{ edge_vps_wireguard_interface }}
+{% endfor %}
+PostDown = iptables -D FORWARD -i {{ edge_vps_wireguard_interface }} -j ACCEPT
+PostDown = iptables -D FORWARD -o {{ edge_vps_wireguard_interface }} -j ACCEPT
+{% for route in edge_vps_wireguard_routes | default([]) %}
+PostDown = ip route del {{ route }} via {{ route.gateway }} dev {{ edge_vps_wireguard_interface }}
+{% endfor %}
+
+{% for peer in vault_edge_vps.wireguard.peers %}
+[Peer]
+# {{ peer.name }}
+PublicKey = {{ peer.public_key }}
+PresharedKey = {{ peer.preshared_key }}
+AllowedIPs = {{ peer.allowed_ips }}
+
+{% endfor %}
+```
+
+**Step 2: Create tasks/20_wireguard.yaml**
+
+```yaml
+---
+- name: Install WireGuard
+  ansible.builtin.apt:
+    name: wireguard
+    state: present
+    update_cache: true
+
+- name: Deploy WireGuard config
+  ansible.builtin.template:
+    src: wireguard/wg0.conf.j2
+    dest: "{{ edge_vps_wireguard_config_dir }}/{{ edge_vps_wireguard_interface }}.conf"
+    mode: "0600"
+  notify: restart wireguard
+
+- name: Enable WireGuard
+  ansible.builtin.systemd:
+    name: "wg-quick@{{ edge_vps_wireguard_interface }}"
+    enabled: true
+    state: started
+```
+
+**Step 3: Commit**
+
+```bash
+git add tasks/20_wireguard.yaml templates/wireguard/wg0.conf.j2
+git commit -m "feat(edge_vps): add WireGuard setup task and template"
+```
+
+---
+
+### Task 4: Create Traefik Task and Template
+
+**Files:**
+- Create: `roles/edge_vps/tasks/30_traefik.yaml`
+- Create: `roles/edge_vps/templates/traefik/traefik_config.yml.j2`
+
+**Step 1: Create templates/traefik/traefik_config.yml.j2**
+
+```jinja2
+api:
+  insecure: true
+  dashboard: true
+
+providers:
+  http:
+    endpoint: "http://pangolin:3001/api/v1/traefik-config"
+    pollInterval: "5s"
+  file:
+    filename: "/etc/traefik/dynamic_config.yml"
+
+experimental:
+  plugins:
+    badger:
+      moduleName: "github.com/fosrl/badger"
+      version: "v1.2.1"
+
+log:
+  level: "INFO"
+  format: "common"
+  maxSize: 100
+  maxBackups: 3
+  maxAge: 3
+  compress: true
+
+certificatesResolvers:
+  letsencrypt:
+    acme:
+      dnsChallenge:
+        provider: "cloudflare"
+      email: "{{ edge_vps_acme_email }}"
+      storage: "/letsencrypt/acme.json"
+      caServer: "https://acme-v02.api.letsencrypt.org/directory"
+
+entryPoints:
+  web:
+    address: ":80"
+  websecure:
+    address: ":443"
+    transport:
+      respondingTimeouts:
+        readTimeout: "30m"
+    http:
+      tls:
+        certResolver: "letsencrypt"
+  tcp-6443:
+    address: ":6443/tcp"
+
+serversTransport:
+  insecureSkipVerify: true
+
+ping:
+  entryPoint: "web"
+
+accessLog:
+  filePath: "/var/log/traefik/access.log"
+  format: common
+```
+
+**Step 2: Create tasks/30_traefik.yaml**
+
+```yaml
+---
+- name: Deploy Traefik config
+  ansible.builtin.template:
+    src: traefik/traefik_config.yml.j2
+    dest: "{{ edge_vps_traefik_config_dir }}/traefik_config.yml"
+    mode: "0644"
+  notify: restart traefik
+
+- name: Deploy Cloudflare credentials for ACME
+  ansible.builtin.copy:
+    content: |
+      CF_DNS_API_TOKEN={{ vault_edge_vps.traefik.cloudflare_api_token }}
+    dest: "{{ edge_vps_traefik_config_dir }}/cloudflare.env"
+    mode: "0600"
+  no_log: true
+```
+
+**Step 3: Commit**
+
+```bash
+git add tasks/30_traefik.yaml templates/traefik/traefik_config.yml.j2
+git commit -m "feat(edge_vps): add Traefik setup task and template"
+```
+
+---
+
+### Task 5: Create Pangolin Task and Templates
+
+**Files:**
+- Create: `roles/edge_vps/tasks/40_pangolin.yaml`
+- Create: `roles/edge_vps/templates/pangolin/config.yml.j2`
+- Create: `roles/edge_vps/templates/pangolin/docker-compose.yml.j2`
+
+**Step 1: Create templates/pangolin/config.yml.j2**
+
+```jinja2
+gerbil:
+    start_port: 51820
+    base_endpoint: "{{ edge_vps_pangolin_base_endpoint }}"
+
+app:
+    dashboard_url: "{{ edge_vps_pangolin_dashboard_url }}"
+    log_level: "info"
+    telemetry:
+        anonymous_usage: true
+
+domains:
+    domain1:
+        base_domain: "{{ edge_vps_pangolin_base_domain }}"
+
+server:
+    secret: "{{ vault_edge_vps.pangolin.server_secret }}"
+    cors:
+        origins: ["{{ edge_vps_pangolin_dashboard_url }}"]
+        methods: ["GET", "POST", "PUT", "DELETE", "PATCH"]
+        allowed_headers: ["X-CSRF-Token", "Content-Type"]
+        credentials: false
+    maxmind_db_path: "./config/GeoLite2-Country.mmdb"
+
+flags:
+    require_email_verification: false
+    disable_signup_without_invite: true
+    disable_user_create_org: false
+    allow_raw_resources: true
+```
+
+**Step 2: Create templates/pangolin/docker-compose.yml.j2**
+
+```yaml
+services:
+  pangolin:
+    image: fosrl/pangolin:latest
+    container_name: pangolin
+    restart: unless-stopped
+    ports:
+      - "3001:3001"
+      - "443:443"
+      - "80:80"
+    volumes:
+      - ./config.yml:/app/config/config.yml:ro
+      - ./letsencrypt:/letsencrypt
+    depends_on:
+      - gerbil
+
+  gerbil:
+    image: fosrl/gerbil:latest
+    container_name: gerbil
+    restart: unless-stopped
+    network_mode: host
+    cap_add:
+      - NET_ADMIN
+      - SYS_MODULE
+    volumes:
+      - /lib/modules:/lib/modules
+```
+
+**Step 3: Create tasks/40_pangolin.yaml**
+
+```yaml
+---
+- name: Deploy Pangolin config
+  ansible.builtin.template:
+    src: pangolin/config.yml.j2
+    dest: "{{ edge_vps_pangolin_config_dir }}/config.yml"
+    mode: "0644"
+  notify: restart pangolin
+
+- name: Deploy Pangolin docker-compose
+  ansible.builtin.template:
+    src: pangolin/docker-compose.yml.j2
+    dest: "{{ edge_vps_pangolin_config_dir }}/docker-compose.yml"
+    mode: "0644"
+
+- name: Create letsencrypt directory for Pangolin
+  ansible.builtin.file:
+    path: "{{ edge_vps_pangolin_config_dir }}/letsencrypt"
+    state: directory
+    mode: "0755"
+
+- name: Start Pangolin
+  community.docker.docker_compose_v2:
+    project_src: "{{ edge_vps_pangolin_config_dir }}"
+    state: present
+```
+
+**Step 4: Commit**
+
+```bash
+git add tasks/40_pangolin.yaml templates/pangolin/
+git commit -m "feat(edge_vps): add Pangolin setup task and templates"
+```
+
+---
+
+### Task 6: Create Elastic Agent Task and Templates
+
+**Files:**
+- Create: `roles/edge_vps/tasks/50_elastic_agent.yaml`
+- Create: `roles/edge_vps/templates/elastic-agent/docker-compose.yml.j2`
+- Create: `roles/edge_vps/templates/elastic-agent/elastic-agent.yml.j2`
+
+**Step 1: Create templates/elastic-agent/elastic-agent.yml.j2**
+
+```yaml
+fleet:
+  enabled: true
+```
+
+**Step 2: Create templates/elastic-agent/docker-compose.yml.j2**
+
+```yaml
+services:
+  elastic-agent:
+    image: docker.elastic.co/elastic-agent/elastic-agent:{{ edge_vps_elastic_version }}
+    container_name: elastic-agent
+    restart: always
+    network_mode: host
+    dns:
+      - {{ edge_vps_elastic_dns_server }}
+    dns_search:
+      - elastic-system.svc.cluster.local
+      - svc.cluster.local
+      - cluster.local
+    user: "0:0"
+    privileged: true
+    entrypoint: ["/usr/bin/env", "bash", "-c"]
+    command:
+      - |
+        set -e
+        if [[ -f /mnt/elastic-internal/elasticsearch-association/elastic-system/elasticsearch/certs/ca.crt ]]; then
+          if [[ -f /usr/bin/update-ca-trust ]]; then
+            cp /mnt/elastic-internal/elasticsearch-association/elastic-system/elasticsearch/certs/ca.crt /etc/pki/ca-trust/source/anchors/
+            /usr/bin/update-ca-trust
+          elif [[ -f /usr/sbin/update-ca-certificates ]]; then
+            cp /mnt/elastic-internal/elasticsearch-association/elastic-system/elasticsearch/certs/ca.crt /usr/local/share/ca-certificates/
+            /usr/sbin/update-ca-certificates
+          fi
+        fi
+        exec /usr/bin/tini -- /usr/local/bin/docker-entrypoint -e -c /etc/agent/elastic-agent.yml
+    environment:
+      - FLEET_CA=/mnt/elastic-internal/fleetserver-association/elastic-system/fleet-server/certs/ca.crt
+      - FLEET_ENROLL=true
+      - FLEET_ENROLLMENT_TOKEN={{ vault_edge_vps.elastic.fleet_enrollment_token }}
+      - FLEET_URL={{ edge_vps_elastic_fleet_url }}
+      - STATE_PATH=/usr/share/elastic-agent/state
+      - CONFIG_PATH=/usr/share/elastic-agent/state
+      - NODE_NAME={{ inventory_hostname }}
+    volumes:
+      - {{ edge_vps_elastic_state_dir }}:/usr/share/elastic-agent/state
+      - ./elastic-agent.yml:/etc/agent/elastic-agent.yml:ro
+      - ./elasticsearch-ca.crt:/mnt/elastic-internal/elasticsearch-association/elastic-system/elasticsearch/certs/ca.crt:ro
+      - ./fleet-ca.crt:/mnt/elastic-internal/fleetserver-association/elastic-system/fleet-server/certs/ca.crt:ro
+      - {{ edge_vps_traefik_logs_dir }}:/var/log/traefik:ro
+```
+
+**Step 3: Create tasks/50_elastic_agent.yaml**
+
+```yaml
+---
+- name: Deploy Elastic Agent config
+  ansible.builtin.template:
+    src: elastic-agent/elastic-agent.yml.j2
+    dest: "{{ edge_vps_elastic_config_dir }}/elastic-agent.yml"
+    mode: "0644"
+
+- name: Deploy Elastic Agent docker-compose
+  ansible.builtin.template:
+    src: elastic-agent/docker-compose.yml.j2
+    dest: "{{ edge_vps_elastic_config_dir }}/docker-compose.yml"
+    mode: "0644"
+
+- name: Deploy Elasticsearch CA certificate
+  ansible.builtin.copy:
+    src: elastic-agent/elasticsearch-ca.crt
+    dest: "{{ edge_vps_elastic_config_dir }}/elasticsearch-ca.crt"
+    mode: "0644"
+
+- name: Deploy Fleet CA certificate
+  ansible.builtin.copy:
+    src: elastic-agent/fleet-ca.crt
+    dest: "{{ edge_vps_elastic_config_dir }}/fleet-ca.crt"
+    mode: "0644"
+
+- name: Start Elastic Agent
+  community.docker.docker_compose_v2:
+    project_src: "{{ edge_vps_elastic_config_dir }}"
+    state: present
+```
+
+**Step 4: Commit**
+
+```bash
+git add tasks/50_elastic_agent.yaml templates/elastic-agent/
+git commit -m "feat(edge_vps): add Elastic Agent setup task and templates"
+```
+
+---
+
+### Task 7: Create Main Task Orchestrator
+
+**Files:**
+- Create: `roles/edge_vps/tasks/main.yaml`
+
+**Step 1: Create tasks/main.yaml**
+
+```yaml
+---
+- name: Setup directories
+  ansible.builtin.include_tasks: 10_directories.yaml
+
+- name: Setup WireGuard
+  ansible.builtin.include_tasks: 20_wireguard.yaml
+
+- name: Setup Traefik
+  ansible.builtin.include_tasks: 30_traefik.yaml
+
+- name: Setup Pangolin
+  ansible.builtin.include_tasks: 40_pangolin.yaml
+
+- name: Setup Elastic Agent
+  ansible.builtin.include_tasks: 50_elastic_agent.yaml
+```
+
+**Step 2: Commit**
+
+```bash
+git add tasks/main.yaml
+git commit -m "feat(edge_vps): add main task orchestrator"
+```
+
+---
+
+### Task 8: Create Inventory Variables
+
+**Files:**
+- Create: `vars/group_vars/vps/vars.yaml`
+- Create: `vars/group_vars/vps/secrets.yaml`
+
+**Step 1: Create vars/group_vars/vps/vars.yaml**
+
+```yaml
+edge_vps_wireguard_address: "10.133.7.1/24"
+edge_vps_wireguard_port: 61975
+edge_vps_wireguard_routes:
+  - network: "10.43.0.0/16"
+    gateway: "10.133.7.4"
+
+edge_vps_pangolin_dashboard_url: "https://pangolin.seyshiro.de"
+edge_vps_pangolin_base_endpoint: "pangolin.seyshiro.de"
+edge_vps_pangolin_base_domain: "seyshiro.de"
+
+edge_vps_acme_email: "me+acme@tudattr.dev"
+
+edge_vps_elastic_version: "9.2.2"
+edge_vps_elastic_dns_server: "10.43.0.10"
+edge_vps_elastic_fleet_url: "https://fleet-server-agent-http.elastic-system.svc:8220"
+```
+
+**Step 2: Create vars/group_vars/vps/secrets.yaml (template)**
+
+```yaml
+vault_edge_vps:
+  wireguard:
+    private_key: "YOUR_WIREGUARD_PRIVATE_KEY"
+    peers:
+      - name: lilcrow
+        public_key: "PEER_PUBLIC_KEY"
+        preshared_key: "PEER_PRESHARED_KEY"
+        allowed_ips: "10.133.7.2/32"
+      - name: homelab
+        public_key: "PEER_PUBLIC_KEY"
+        preshared_key: "PEER_PRESHARED_KEY"
+        allowed_ips: "10.133.7.3/32"
+      - name: k3s
+        public_key: "PEER_PUBLIC_KEY"
+        preshared_key: "PEER_PRESHARED_KEY"
+        allowed_ips: "10.133.7.4/32, 10.43.0.0/16"
+  pangolin:
+    server_secret: "YOUR_PANGOLIN_SERVER_SECRET"
+  traefik:
+    cloudflare_api_token: "YOUR_CLOUDFLARE_API_TOKEN"
+  elastic:
+    fleet_enrollment_token: "YOUR_FLEET_ENROLLMENT_TOKEN"
+```
+
+**Step 3: Encrypt secrets file**
+
+Run:
+```bash
+ansible-vault encrypt vars/group_vars/vps/secrets.yaml
+```
+
+**Step 4: Commit**
+
+```bash
+git add vars/group_vars/vps/
+git commit -m "feat(edge_vps): add inventory variables for VPS group"
+```
+
+---
+
+### Task 9: Update README
+
+**Files:**
+- Modify: `roles/edge_vps/README.md`
+
+**Step 1: Update README.md**
+
+```markdown
+# Edge VPS
+
+Configures edge VPS instances with WireGuard VPN, Traefik reverse proxy, Pangolin, and Elastic Fleet Agent.
+
+## Requirements
+
+- Docker and Docker Compose installed
+- Ansible community.docker collection
+
+## Role Variables
+
+### WireGuard
+
+| Variable | Default | Description |
+|----------|---------|-------------|
+| `edge_vps_wireguard_address` | `10.133.7.1/24` | WireGuard interface address |
+| `edge_vps_wireguard_port` | `61975` | WireGuard listen port |
+| `edge_vps_wireguard_interface` | `wg0` | WireGuard interface name |
+| `edge_vps_wireguard_routes` | `[]` | List of routes to add (network, gateway) |
+
+### Traefik
+
+| Variable | Default | Description |
+|----------|---------|-------------|
+| `edge_vps_traefik_config_dir` | `/root/config/traefik` | Traefik config directory |
+| `edge_vps_acme_email` | - | Email for Let's Encrypt |
+
+### Pangolin
+
+| Variable | Default | Description |
+|----------|---------|-------------|
+| `edge_vps_pangolin_dashboard_url` | - | Pangolin dashboard URL |
+| `edge_vps_pangolin_base_endpoint` | - | Pangolin base endpoint |
+| `edge_vps_pangolin_base_domain` | - | Base domain for Pangolin |
+
+### Elastic Agent
+
+| Variable | Default | Description |
+|----------|---------|-------------|
+| `edge_vps_elastic_version` | `9.2.2` | Elastic Agent version |
+| `edge_vps_elastic_fleet_url` | - | Fleet server URL |
+| `edge_vps_elastic_dns_server` | `10.43.0.10` | DNS server for agent |
+
+## Secrets
+
+Store secrets in `vars/group_vars/vps/secrets.yaml` (ansible-vault encrypted):
+
+```yaml
+vault_edge_vps:
+  wireguard:
+    private_key: "..."
+    peers: [...]
+  pangolin:
+    server_secret: "..."
+  traefik:
+    cloudflare_api_token: "..."
+  elastic:
+    fleet_enrollment_token: "..."
+```
+
+## Dependencies
+
+None.
+
+## Example Playbook
+
+```yaml
+- hosts: vps
+  roles:
+    - role: edge_vps
+```
+
+## License
+
+MIT
+```
+
+**Step 2: Commit**
+
+```bash
+git add README.md
+git commit -m "docs(edge_vps): update README with role documentation"
+```
+
+---
+
+### Task 10: Move Certificate Files
+
+**Files:**
+- Move: `files/agent/agent/elasticsearch-ca.crt` → `files/elastic-agent/`
+- Move: `files/agent/agent/fleet-ca.crt` → `files/elastic-agent/`
+
+**Step 1: Move certificate files**
+
+Run:
+```bash
+mkdir -p files/elastic-agent
+mv files/agent/agent/elasticsearch-ca.crt files/elastic-agent/
+mv files/agent/agent/fleet-ca.crt files/elastic-agent/
+rm -rf files/agent
+```
+
+**Step 2: Commit**
+
+```bash
+git add files/
+git commit -m "refactor(edge_vps): reorganize certificate files"
+```
--- a/roles/edge_vps/files/elastic-agent/elasticsearch-ca.crt
+++ b/roles/edge_vps/files/elastic-agent/elasticsearch-ca.crt
@@ -0,0 +1,20 @@
+-----BEGIN CERTIFICATE-----
+MIIDVjCCAj6gAwIBAgIRAPcoBHrxSnovxGFQ44+7XiYwDQYJKoZIhvcNAQELBQAw
+NTEWMBQGA1UECxMNZWxhc3RpY3NlYXJjaDEbMBkGA1UEAxMSZWxhc3RpY3NlYXJj
+aC1odHRwMB4XDTI2MDIwOTIxNDI0NVoXDTI3MDIwOTIxNTI0NVowNTEWMBQGA1UE
+CxMNZWxhc3RpY3NlYXJjaDEbMBkGA1UEAxMSZWxhc3RpY3NlYXJjaC1odHRwMIIB
+IjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEA48M932+yPFJkVg31G5f1jJ1g
+IevD+tujYp96De3MY/5QNEsW1R21VWwAobfSN+3NyInhjXT03IhXIwN21B0KPTtO
+c6cpOk0/nwmF0pHpK1PLaqvsfUsa4ffSRvwpsSA0rlEoF+ObBUuQ92ngvAXMN3wp
+PhcaNw9zbPidJoUjwzeaL3nmgnXQIBFRqYGi6l5LzVA0qVHXsNHi5LgXPN4wevWs
+49kn9xPYPXrYBMLxn7hPa9/OfRjUtru2ZoK7L1imr86tjppY0rk8GxIHF12eVf4t
+nGeDUMBuYe6mmUTTkFiwYmrwTzhfDlN82wZ+6cmYeDxpce2nbLBTMICJYOJmMwID
+AQABo2EwXzAOBgNVHQ8BAf8EBAMCAoQwHQYDVR0lBBYwFAYIKwYBBQUHAwEGCCsG
+AQUFBwMCMA8GA1UdEwEB/wQFMAMBAf8wHQYDVR0OBBYEFHvrdY9Nbr77PnHkkEiD
+Y79yBXpkMA0GCSqGSIb3DQEBCwUAA4IBAQCS3X8dQYD89rrltR7HjrG4KCtG6uDY
+U3LYSu1KCBiwMIwYn6RZoI+6D7t16AOumwJC3AJ3/JFkGr7F+UqQSYIaAxYEeyzS
+c2oPzl52h1tbfKUS/550FhWqFuOG6m6SCFSUXe17ShPoomtBxvFjJr6fZLezKdoO
+CBZX0PzHCnU7axFLoNHqzl55koxVcyaY8OjcjvsuAP5zU77nF4sSoHtZ3VTprGWE
+xL3j+vFJ4++d516frWVY8L20mECOcDfLXEf3ngmK9j+8v7UwwpPxWe9MlLS+v7QH
+yBuAMUyKymN4zzhIVKSSYZmiwdkzwUIykffphymJVAQCDSXgX4RWPuKi
+-----END CERTIFICATE-----
--- a/roles/edge_vps/files/elastic-agent/fleet-ca.crt
+++ b/roles/edge_vps/files/elastic-agent/fleet-ca.crt
@@ -0,0 +1,20 @@
+-----BEGIN CERTIFICATE-----
+MIIDUjCCAjqgAwIBAgIRANgLvsSqUxkRAC8fvlBQsn4wDQYJKoZIhvcNAQELBQAw
+MzEVMBMGA1UECxMMZmxlZXQtc2VydmVyMRowGAYDVQQDExFmbGVldC1zZXJ2ZXIt
+aHR0cDAeFw0yNjAyMDkyMTQyNDhaFw0yNzAyMDkyMTUyNDhaMDMxFTATBgNVBAsT
+DGZsZWV0LXNlcnZlcjEaMBgGA1UEAxMRZmxlZXQtc2VydmVyLWh0dHAwggEiMA0G
+CSqGSIb3DQEBAQUAA4IBDwAwggEKAoIBAQC0zbNbwm3YnaNKQbmHb/9fk5YadGop
+9d9n0VA7pYC32qST5/IsWAkTP/ulPfJwI+nA18tAqtBoLMncdpKP9YtMb1cgRNGe
+d9Fe1kItmIGxoYlQPx4vbbembyvlFFEu82/4tJtDkCR5TuP3ZdmGWazO+tGooMvL
+vkKy0qgQEDUIPTF1VFHcQa+qRvIerAKV81q2lVluVr/GNljoISsXgsoHXG2MDPDs
+RHX+XcQGFNlFG1MuiGApvrKSFsFTCxn8oM88waoI0t/D+y7T1WNwLRY+Fg6fivVh
+kNaIPuCswAIB0MLATtPDP85IjKMxEk5/cTz5R1jOsYz1OoIydkSN87tDAgMBAAGj
+YTBfMA4GA1UdDwEB/wQEAwIChDAdBgNVHSUEFjAUBggrBgEFBQcDAQYIKwYBBQUH
+AwIwDwYDVR0TAQH/BAUwAwEB/zAdBgNVHQ4EFgQUNRhWtaWRi0nubx9yz3tXMDaz
+2AIwDQYJKoZIhvcNAQELBQADggEBAFwFlEQ26vdbPtTv5gpIIRAZDcYAGtm6wx16
+/dqedcVXKSbKKPJq1OfHjYSfN3r3XGGLKTlui8v7Pkz/bqQyAONEC+4S33RX3MiT
+3zTu/SLHiOyHfdLn44Z8JUZ6xmK3mSfchngKLRlECHjNydzYtzJSj67CP7ARJhHo
+wOlQwH11EC+HLrhYBeW4si5L5jCnE9rpKQ4U+/MCLgpdWtHZ3G3PVFxBjL8JISLP
+ZZnHwCMK1LiuWtY3+n3S6BqDDgrQg0TsVA8X/tdEQKzoJb0hTwKrGpvy7CO42vLf
+X+h9iUG4QNve+QCT2Y7T9jNTaWamTHfZWFa6FD5CEgldqDJfEZw=
+-----END CERTIFICATE-----
--- a/roles/edge_vps/handlers/main.yaml
+++ b/roles/edge_vps/handlers/main.yaml
@@ -0,0 +1,12 @@
+---
+- name: Restart wireguard
+  ansible.builtin.systemd:
+    name: "wg-quick@{{ edge_vps_wireguard_interface }}"
+    state: restarted
+  listen: restart wireguard
+
+- name: Restart traefik
+  ansible.builtin.command:
+    cmd: docker compose restart
+    chdir: "{{ edge_vps_traefik_config_dir }}"
+  listen: restart traefik
--- a/roles/edge_vps/tasks/10_directories.yaml
+++ b/roles/edge_vps/tasks/10_directories.yaml
@@ -0,0 +1,30 @@
+---
+- name: Create config base directory
+  ansible.builtin.file:
+    path: "{{ edge_vps_config_base }}"
+    state: directory
+    mode: "0755"
+
+- name: Create Traefik directories
+  ansible.builtin.file:
+    path: "{{ item }}"
+    state: directory
+    mode: "0755"
+  loop:
+    - "{{ edge_vps_traefik_config_dir }}"
+    - "{{ edge_vps_traefik_logs_dir }}"
+
+- name: Create Pangolin config directory
+  ansible.builtin.file:
+    path: "{{ edge_vps_pangolin_config_dir }}"
+    state: directory
+    mode: "0755"
+
+- name: Create Elastic Agent directories
+  ansible.builtin.file:
+    path: "{{ item }}"
+    state: directory
+    mode: "0755"
+  loop:
+    - "{{ edge_vps_elastic_config_dir }}"
+    - "{{ edge_vps_elastic_state_dir }}"
--- a/roles/edge_vps/tasks/20_wireguard.yaml
+++ b/roles/edge_vps/tasks/20_wireguard.yaml
@@ -0,0 +1,19 @@
+---
+- name: Install WireGuard
+  ansible.builtin.apt:
+    name: wireguard
+    state: present
+    update_cache: true
+
+- name: Deploy WireGuard config
+  ansible.builtin.template:
+    src: wireguard/wg0.conf.j2
+    dest: "{{ edge_vps_wireguard_config_dir }}/{{ edge_vps_wireguard_interface }}.conf"
+    mode: "0600"
+  notify: restart wireguard
+
+- name: Enable WireGuard
+  ansible.builtin.systemd:
+    name: "wg-quick@{{ edge_vps_wireguard_interface }}"
+    enabled: true
+    state: started
--- a/roles/edge_vps/tasks/30_traefik.yaml
+++ b/roles/edge_vps/tasks/30_traefik.yaml
@@ -0,0 +1,15 @@
+---
+- name: Deploy Traefik config
+  ansible.builtin.template:
+    src: traefik/traefik_config.yml.j2
+    dest: "{{ edge_vps_traefik_config_dir }}/traefik_config.yml"
+    mode: "0644"
+  notify: restart traefik
+
+- name: Deploy Cloudflare credentials for ACME
+  ansible.builtin.copy:
+    content: |
+      CF_DNS_API_TOKEN={{ vault_edge_vps.traefik.cloudflare_api_token }}
+    dest: "{{ edge_vps_traefik_config_dir }}/cloudflare.env"
+    mode: "0600"
+  no_log: true
--- a/roles/edge_vps/tasks/40_pangolin.yaml
+++ b/roles/edge_vps/tasks/40_pangolin.yaml
@@ -0,0 +1,24 @@
+---
+- name: Deploy Pangolin config
+  ansible.builtin.template:
+    src: pangolin/config.yml.j2
+    dest: "{{ edge_vps_pangolin_config_dir }}/config.yml"
+    mode: "0644"
+  notify: restart pangolin
+
+- name: Deploy Pangolin docker-compose
+  ansible.builtin.template:
+    src: pangolin/docker-compose.yml.j2
+    dest: "{{ edge_vps_pangolin_config_dir }}/docker-compose.yml"
+    mode: "0644"
+
+- name: Create letsencrypt directory for Pangolin
+  ansible.builtin.file:
+    path: "{{ edge_vps_pangolin_config_dir }}/letsencrypt"
+    state: directory
+    mode: "0755"
+
+- name: Start Pangolin
+  community.docker.docker_compose_v2:
+    project_src: "{{ edge_vps_pangolin_config_dir }}"
+    state: present
--- a/roles/edge_vps/tasks/50_elastic_agent.yaml
+++ b/roles/edge_vps/tasks/50_elastic_agent.yaml
@@ -0,0 +1,29 @@
+---
+- name: Deploy Elastic Agent config
+  ansible.builtin.template:
+    src: elastic-agent/elastic-agent.yml.j2
+    dest: "{{ edge_vps_elastic_config_dir }}/elastic-agent.yml"
+    mode: "0644"
+
+- name: Deploy Elastic Agent docker-compose
+  ansible.builtin.template:
+    src: elastic-agent/docker-compose.yml.j2
+    dest: "{{ edge_vps_elastic_config_dir }}/docker-compose.yml"
+    mode: "0644"
+
+- name: Deploy Elasticsearch CA certificate
+  ansible.builtin.copy:
+    src: elastic-agent/elasticsearch-ca.crt
+    dest: "{{ edge_vps_elastic_config_dir }}/elasticsearch-ca.crt"
+    mode: "0644"
+
+- name: Deploy Fleet CA certificate
+  ansible.builtin.copy:
+    src: elastic-agent/fleet-ca.crt
+    dest: "{{ edge_vps_elastic_config_dir }}/fleet-ca.crt"
+    mode: "0644"
+
+- name: Start Elastic Agent
+  community.docker.docker_compose_v2:
+    project_src: "{{ edge_vps_elastic_config_dir }}"
+    state: present
--- a/roles/edge_vps/tasks/main.yaml
+++ b/roles/edge_vps/tasks/main.yaml
@@ -0,0 +1,15 @@
+---
+- name: Setup directories
+  ansible.builtin.include_tasks: 10_directories.yaml
+
+- name: Setup WireGuard
+  ansible.builtin.include_tasks: 20_wireguard.yaml
+
+- name: Setup Traefik
+  ansible.builtin.include_tasks: 30_traefik.yaml
+
+- name: Setup Pangolin
+  ansible.builtin.include_tasks: 40_pangolin.yaml
+
+- name: Setup Elastic Agent
+  ansible.builtin.include_tasks: 50_elastic_agent.yaml
--- a/roles/edge_vps/templates/elastic-agent/docker-compose.yml.j2
+++ b/roles/edge_vps/templates/elastic-agent/docker-compose.yml.j2
@@ -0,0 +1,42 @@
+services:
+  elastic-agent:
+    image: docker.elastic.co/elastic-agent/elastic-agent:{{ edge_vps_elastic_version }}
+    container_name: elastic-agent
+    restart: always
+    network_mode: host
+    dns:
+      - {{ edge_vps_elastic_dns_server }}
+    dns_search:
+      - elastic-system.svc.cluster.local
+      - svc.cluster.local
+      - cluster.local
+    user: "0:0"
+    privileged: true
+    entrypoint: ["/usr/bin/env", "bash", "-c"]
+    command:
+      - |
+        set -e
+        if [[ -f /mnt/elastic-internal/elasticsearch-association/elastic-system/elasticsearch/certs/ca.crt ]]; then
+          if [[ -f /usr/bin/update-ca-trust ]]; then
+            cp /mnt/elastic-internal/elasticsearch-association/elastic-system/elasticsearch/certs/ca.crt /etc/pki/ca-trust/source/anchors/
+            /usr/bin/update-ca-trust
+          elif [[ -f /usr/sbin/update-ca-certificates ]]; then
+            cp /mnt/elastic-internal/elasticsearch-association/elastic-system/elasticsearch/certs/ca.crt /usr/local/share/ca-certificates/
+            /usr/sbin/update-ca-certificates
+          fi
+        fi
+        exec /usr/bin/tini -- /usr/local/bin/docker-entrypoint -e -c /etc/agent/elastic-agent.yml
+    environment:
+      - FLEET_CA=/mnt/elastic-internal/fleetserver-association/elastic-system/fleet-server/certs/ca.crt
+      - FLEET_ENROLL=true
+      - FLEET_ENROLLMENT_TOKEN={{ vault_edge_vps.elastic.fleet_enrollment_token }}
+      - FLEET_URL={{ edge_vps_elastic_fleet_url }}
+      - STATE_PATH=/usr/share/elastic-agent/state
+      - CONFIG_PATH=/usr/share/elastic-agent/state
+      - NODE_NAME={{ inventory_hostname }}
+    volumes:
+      - {{ edge_vps_elastic_state_dir }}:/usr/share/elastic-agent/state
+      - ./elastic-agent.yml:/etc/agent/elastic-agent.yml:ro
+      - ./elasticsearch-ca.crt:/mnt/elastic-internal/elasticsearch-association/elastic-system/elasticsearch/certs/ca.crt:ro
+      - ./fleet-ca.crt:/mnt/elastic-internal/fleetserver-association/elastic-system/fleet-server/certs/ca.crt:ro
+      - {{ edge_vps_traefik_logs_dir }}:/var/log/traefik:ro
--- a/roles/edge_vps/templates/elastic-agent/elastic-agent.yml.j2
+++ b/roles/edge_vps/templates/elastic-agent/elastic-agent.yml.j2
@@ -0,0 +1,2 @@
+fleet:
+  enabled: true
--- a/roles/edge_vps/templates/pangolin/config.yml.j2
+++ b/roles/edge_vps/templates/pangolin/config.yml.j2
@@ -0,0 +1,28 @@
+gerbil:
+    start_port: 51820
+    base_endpoint: "{{ edge_vps_pangolin_base_endpoint }}"
+
+app:
+    dashboard_url: "{{ edge_vps_pangolin_dashboard_url }}"
+    log_level: "info"
+    telemetry:
+        anonymous_usage: true
+
+domains:
+    domain1:
+        base_domain: "{{ edge_vps_pangolin_base_domain }}"
+
+server:
+    secret: "{{ vault_edge_vps.pangolin.server_secret }}"
+    cors:
+        origins: ["{{ edge_vps_pangolin_dashboard_url }}"]
+        methods: ["GET", "POST", "PUT", "DELETE", "PATCH"]
+        allowed_headers: ["X-CSRF-Token", "Content-Type"]
+        credentials: false
+    maxmind_db_path: "./config/GeoLite2-Country.mmdb"
+
+flags:
+    require_email_verification: false
+    disable_signup_without_invite: true
+    disable_user_create_org: false
+    allow_raw_resources: true
--- a/roles/edge_vps/templates/pangolin/docker-compose.yml.j2
+++ b/roles/edge_vps/templates/pangolin/docker-compose.yml.j2
@@ -0,0 +1,25 @@
+services:
+  pangolin:
+    image: fosrl/pangolin:latest
+    container_name: pangolin
+    restart: unless-stopped
+    ports:
+      - "3001:3001"
+      - "443:443"
+      - "80:80"
+    volumes:
+      - ./config.yml:/app/config/config.yml:ro
+      - ./letsencrypt:/letsencrypt
+    depends_on:
+      - gerbil
+
+  gerbil:
+    image: fosrl/gerbil:latest
+    container_name: gerbil
+    restart: unless-stopped
+    network_mode: host
+    cap_add:
+      - NET_ADMIN
+      - SYS_MODULE
+    volumes:
+      - /lib/modules:/lib/modules
--- a/roles/edge_vps/templates/traefik/traefik_config.yml.j2
+++ b/roles/edge_vps/templates/traefik/traefik_config.yml.j2
@@ -0,0 +1,57 @@
+api:
+  insecure: true
+  dashboard: true
+
+providers:
+  http:
+    endpoint: "http://pangolin:3001/api/v1/traefik-config"
+    pollInterval: "5s"
+  file:
+    filename: "/etc/traefik/dynamic_config.yml"
+
+experimental:
+  plugins:
+    badger:
+      moduleName: "github.com/fosrl/badger"
+      version: "v1.2.1"
+
+log:
+  level: "INFO"
+  format: "common"
+  maxSize: 100
+  maxBackups: 3
+  maxAge: 3
+  compress: true
+
+certificatesResolvers:
+  letsencrypt:
+    acme:
+      dnsChallenge:
+        provider: "cloudflare"
+      email: "{{ edge_vps_acme_email }}"
+      storage: "/letsencrypt/acme.json"
+      caServer: "https://acme-v02.api.letsencrypt.org/directory"
+
+entryPoints:
+  web:
+    address: ":80"
+  websecure:
+    address: ":443"
+    transport:
+      respondingTimeouts:
+        readTimeout: "30m"
+    http:
+      tls:
+        certResolver: "letsencrypt"
+  tcp-6443:
+    address: ":6443/tcp"
+
+serversTransport:
+  insecureSkipVerify: true
+
+ping:
+  entryPoint: "web"
+
+accessLog:
+  filePath: "/var/log/traefik/access.log"
+  format: common
--- a/roles/edge_vps/templates/wireguard/wg0.conf.j2
+++ b/roles/edge_vps/templates/wireguard/wg0.conf.j2
@@ -0,0 +1,25 @@
+[Interface]
+Address = {{ edge_vps_wireguard_address }}
+ListenPort = {{ edge_vps_wireguard_port }}
+PrivateKey = {{ vault_edge_vps.wireguard.private_key }}
+
+PostUp   = sysctl -w net.ipv4.ip_forward=1
+PostUp   = iptables -A FORWARD -i {{ edge_vps_wireguard_interface }} -j ACCEPT
+PostUp   = iptables -A FORWARD -o {{ edge_vps_wireguard_interface }} -j ACCEPT
+{% for route in edge_vps_wireguard_routes | default([]) %}
+PostUp   = ip route add {{ route }} via {{ route.gateway }} dev {{ edge_vps_wireguard_interface }}
+{% endfor %}
+PostDown = iptables -D FORWARD -i {{ edge_vps_wireguard_interface }} -j ACCEPT
+PostDown = iptables -D FORWARD -o {{ edge_vps_wireguard_interface }} -j ACCEPT
+{% for route in edge_vps_wireguard_routes | default([]) %}
+PostDown = ip route del {{ route }} via {{ route.gateway }} dev {{ edge_vps_wireguard_interface }}
+{% endfor %}
+
+{% for peer in vault_edge_vps.wireguard.peers %}
+[Peer]
+# {{ peer.name }}
+PublicKey = {{ peer.public_key }}
+PresharedKey = {{ peer.preshared_key }}
+AllowedIPs = {{ peer.allowed_ips }}
+
+{% endfor %}
--- a/roles/k3s_agent/README.md
+++ b/roles/k3s_agent/README.md
@@ -0,0 +1,39 @@
+# K3s Agent Ansible Role
+
+This Ansible role installs and configures a K3s agent on a node.
+
+## Role Variables
+
+- `k3s.loadbalancer.default_port`: The port for the K3s load balancer. Defaults to `6443`.
+- `k3s_token`: The token for joining the K3s cluster. This is a required variable.
+- `hostvars['k3s-loadbalancer'].ansible_default_ipv4.address`: The IP address of the K3s load balancer. This is a required variable.
+
+## Tasks
+
+The main tasks are in `tasks/main.yml` and `tasks/installation.yml`.
+
+- **`installation.yml`**:
+  - Installs `qemu-guest-agent`.
+  - Checks if K3s is already installed.
+  - Downloads the K3s installation script to `/tmp/k3s_install.sh`.
+  - Installs K3s as an agent, connecting to the master.
+
+## Handlers
+
+The main handlers are in `handlers/main.yml`.
+
+- **`Restart k3s`**: Restarts the `k3s` service.
+
+## Usage
+
+Here is an example of how to use this role in a playbook:
+
+```yaml
+---
+- hosts: k3s_agents
+  roles:
+    - role: k3s_agent
+      vars:
+        k3s_token: "your_k3s_token"
+        k3s.loadbalancer.default_port: 6443
+```
--- a/roles/k3s_agent/handlers/main.yaml
+++ b/roles/k3s_agent/handlers/main.yaml
@@ -3,4 +3,4 @@
  service:
    name: k3s
    state: restarted
-  become: yes
+  become: true
--- a/roles/k3s_agent/tasks/installation.yaml
+++ b/roles/k3s_agent/tasks/installation.yaml
@@ -1,4 +1,12 @@
 ---
+- name: Install dependencies for apt to use repositories over HTTPS
+  ansible.builtin.apt:
+    name: "{{ item }}"
+    state: present
+  loop:
+    - qemu-guest-agent
+  become: true
+
 - name: See if k3s file exists
  ansible.builtin.stat:
    path: /usr/local/bin/k3s
@@ -11,11 +19,11 @@
    dest: /tmp/k3s_install.sh
    mode: "0755"

- name: Install K3s on the secondary servers
+- name: Install K3s on agent
  when: not k3s_status.stat.exists
  ansible.builtin.command: |
    /tmp/k3s_install.sh
  environment:
-    K3S_URL: "https://{{ hostvars['k3s-loadbalancer'].ansible_default_ipv4.address }}:{{ k3s.loadbalancer.default_port }}"
+    K3S_URL: "https://{{ k3s_vip }}:{{ k3s.loadbalancer.default_port }}"
    K3S_TOKEN: "{{ k3s_token }}"
  become: true
--- a/roles/k3s_agent/tasks/main.yaml
+++ b/roles/k3s_agent/tasks/main.yaml
@@ -0,0 +1,3 @@
+---
+- name: Install k3s agent
+  include_tasks: installation.yaml
--- a/roles/k3s_agent/tasks/main.yml
+++ b/roles/k3s_agent/tasks/main.yml
@@ -1,2 +0,0 @@
---
- include_tasks: installation.yml
--- a/roles/k3s_loadbalancer/README.md
+++ b/roles/k3s_loadbalancer/README.md
@@ -0,0 +1,50 @@
+# K3s Loadbalancer Ansible Role
+
+This Ansible role configures a load balancer for a K3s cluster using Nginx.
+
+## Role Variables
+
+- `k3s_loadbalancer_nginx_config_path`: The path to the Nginx configuration file. Defaults to `/etc/nginx/nginx.conf`.
+- `domain`: The domain name to use for the load balancer. Defaults to `{{ internal_domain }}`.
+- `k3s.loadbalancer.default_port`: The default port for the K3s API server. Defaults to `6443`.
+- `k3s_server_ips`: A list of IP addresses for the K3s server nodes. This variable is not defined in the role, so you must provide it.
+- `netcup_api_key`: Your Netcup API key.
+- `netcup_api_password`: Your Netcup API password.
+- `netcup_customer_id`: Your Netcup customer ID.
+
+## Tasks
+
+The role performs the following tasks:
+
+- **Installation:**
+  - Updates the `apt` cache.
+  - Installs `qemu-guest-agent`.
+  - Installs `nginx-full`.
+- **Configuration:**
+  - Templates the Nginx configuration file with dynamic upstreams for the K3s servers.
+  - Enables and starts the Nginx service.
+- **DNS Setup:**
+  - Sets up a DNS A record for the load balancer using the `community.general.netcup_dns` module.
+
+## Handlers
+
+- `Restart nginx`: Restarts the Nginx service when the configuration file is changed.
+
+## Example Usage
+
+Here is an example of how to use this role in a playbook:
+
+```yaml
+- hosts: k3s_loadbalancer
+  roles:
+    - role: k3s_loadbalancer
+      vars:
+        k3s_server_ips:
+          - 192.168.1.10
+          - 192.168.1.11
+          - 192.168.1.12
+        netcup_api_key: "your_api_key"
+        netcup_api_password: "your_api_password"
+        netcup_customer_id: "your_customer_id"
+        internal_domain: "example.com"
+```
--- a/roles/k3s_loadbalancer/handlers/main.yaml
+++ b/roles/k3s_loadbalancer/handlers/main.yaml
--- a/roles/k3s_loadbalancer/tasks/configuration.yaml
+++ b/roles/k3s_loadbalancer/tasks/configuration.yaml
@@ -9,8 +9,6 @@
  become: true
  notify:
    - Restart nginx
-  vars:
-    k3s_server_ips: "{{ groups['k3s_server'] | map('extract', hostvars, 'ansible_default_ipv4') | map(attribute='address') | unique | list }}"

 - name: Enable nginx
  ansible.builtin.systemd:
--- a/roles/k3s_loadbalancer/tasks/installation.yaml
+++ b/roles/k3s_loadbalancer/tasks/installation.yaml
@@ -4,6 +4,14 @@
    update_cache: true
  become: true

+- name: Install dependencies for apt to use repositories over HTTPS
+  ansible.builtin.apt:
+    name: "{{ item }}"
+    state: present
+  loop:
+    - qemu-guest-agent
+  become: true
+
 - name: Install Nginx
  ansible.builtin.apt:
    name:
--- a/roles/k3s_loadbalancer/tasks/main.yaml
+++ b/roles/k3s_loadbalancer/tasks/main.yaml
@@ -0,0 +1,17 @@
+---
+- name: Installation
+  ansible.builtin.include_tasks: installation.yaml
+
+- name: Configure
+  ansible.builtin.include_tasks: configuration.yaml
+
+- name: Setup DNS on Netcup
+  community.general.netcup_dns:
+    api_key: "{{ netcup_api_key }}"
+    api_password: "{{ netcup_api_password }}"
+    customer_id: "{{ netcup_customer_id }}"
+    domain: "{{ domain }}"
+    name: "k3s"
+    type: "A"
+    value: "{{ hostvars['k3s-loadbalancer'].ansible_default_ipv4.address }}"
+  delegate_to: localhost
--- a/Show More
+++ b/Show More
Author	SHA1	Message	Date
Tuan-Dat Tran	8da0ab98f8	fix(k3s_server): skip installation if k3s binary already exists Primary and secondary install tasks now check k3s_status.stat.exists so re-running the playbook is idempotent on already-provisioned nodes.	2026-04-27 21:43:42 +02:00
Tuan-Dat Tran	b4e093c9b1	fix(k3s_server): use VIP address in kubeconfig instead of k3s_server_name k3s_server_name resolves to k3s.seyshiro.de which has no DNS entry. Use k3s_vip (192.168.20.2) so the kubeconfig always works.	2026-04-27 21:41:55 +02:00
Tuan-Dat Tran	e8df950e87	chore(k3s): update vault-encrypted cluster join token	2026-04-27 21:39:37 +02:00
Tuan-Dat Tran	5b44c46e10	docs(arr-cleanup): improve runbook and fix api key paths Rewrites findings.md with how-to section, cleaner summary tables, and more detailed per-pass results. Fixes relative path for sonarr/radarr API key files after runbook moved deeper in repo.	2026-04-27 21:39:28 +02:00
Tuan-Dat Tran	95715c7748	feat(k3s_server): persist control-plane NoSchedule taint in k3s config Adds node-taint to /etc/rancher/k3s/config.yaml so the taint survives node reboots. Taint is already applied live via kubectl.	2026-04-27 21:35:24 +02:00
Tuan-Dat Tran	5bc3024eaf	feat(k3s): replace nginx loadbalancer with kube-vip for control-plane HA Deploys kube-vip as a DaemonSet on all k3s server nodes, advertising a VIP (192.168.20.2) via ARP. Eliminates the single-point-of-failure k3s-loadbalancer VM. - New kube_vip role: RBAC + DaemonSet templates, TLS SAN cert rotation - playbooks/kube-vip.yaml: migration playbook (serial=1, idempotent) - Updated k3s install tasks (server primary/secondary, agent) to use k3s_vip instead of the loadbalancer VM IP - Added k3s_vip: 192.168.20.2 to group_vars (below DHCP range .11-.250) Migration steps in playbook header comment.	2026-04-26 12:08:42 +02:00
Tuan-Dat Tran	fce6f913ff	docs(plan): add docker version update plan for jellyfin and gitea	2026-04-23 08:06:35 +02:00
Tuan-Dat Tran	8239988a70	docs(runbook): add arr-stack downloads cleanup investigation and scripts ~16T freed on aya01 (92% → 57% mergerfs pool). Documents root cause (no hardlinks across mergerfs due to cross-device mounts), cleanup passes via Sonarr/Radarr API verification, and pending decisions (Bleach remux, 111 skipped Sonarr entries).	2026-04-23 08:06:27 +02:00
Tuan-Dat Tran	e87dcd06f3	chore(k3s): rotate cluster token secret	2026-04-23 08:06:08 +02:00
Tuan-Dat Tran	543e9a2c97	fix(docker_host): remove /media/docker from NFS mount loop /media/docker is no longer a valid NFS-backed path; was causing mount failures on docker_host nodes.	2026-04-23 08:06:03 +02:00
Tuan-Dat Tran	afbc3e3c57	docs(runbook): add Longhorn orphan auto-deletion fix and etcd defrag procedure	2026-04-22 22:03:45 +02:00
Tuan-Dat Tran	b157dd0b89	feat(k3s_server): install etcd-client on control plane nodes	2026-04-22 19:40:24 +02:00
Tuan-Dat Tran	057cd7a7f0	docs(runbook): mark vaultwarden as resolved	2026-04-22 00:52:58 +02:00
Tuan-Dat Tran	db2d5dccd4	docs(runbook): mark Longhorn orphan/etcd defrag as resolved 138 orphans deleted, all 3 etcd members defragged from 634MB to ~57MB.	2026-04-22 00:40:23 +02:00
Tuan-Dat Tran	db7e130515	docs: mark server11 disk issue resolved in runbook	2026-04-21 23:41:13 +02:00
Tuan-Dat Tran	c16e7cf740	fix(k3s_server): use inventory_hostname for primary detection and delegate token fetch Primary server detection previously used ansible_default_ipv4.address compared against k3s_primary_server_ip, which breaks with --limit since facts are only gathered for the targeted hosts, causing the variable to resolve to the wrong IP. - Replace IP comparisons with `inventory_hostname == groups['k3s_server'] \| first` in main.yaml (primary install, secondary install, kubeconfig tasks) - Delegate the node-token slurp to the primary server unconditionally so pull_token.yaml works correctly when run against any single node with --limit Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-21 23:30:57 +02:00
Tuan-Dat Tran	c084572521	docs: add k3s-server11 reprovision implementation plan Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-21 21:58:13 +02:00
Tuan-Dat Tran	da7bd42f07	docs: add k3s-server11 reprovision spec and cluster outage runbook Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-21 21:55:18 +02:00
Tuan-Dat Tran	f0a45e3fda	fix: configure explicit NTP servers in timesyncd instead of relying on DHCP Gateway at 192.168.20.1 was being provided via DHCP as the NTP server but does not serve NTP, causing NodeClockNotSynchronising across all nodes.	2026-04-20 20:56:30 +02:00
Tuan-Dat Tran	b5f82e2978	fix: install kitty terminfo on all nodes via common role	2026-04-20 20:36:23 +02:00
Tuan-Dat Tran	29561c44c8	fix: enable and start systemd-timesyncd in common time role systemd-timesyncd was installed via common_packages but never enabled or started, causing NodeClockNotSynchronising alerts across all k3s nodes.	2026-04-20 20:18:19 +02:00
Tuan-Dat Tran	d33117a752	chore(docker): update jellyfin to 10.11.7 and gitea to 1.25.5-rootless	2026-04-01 21:20:02 +02:00
Tuan-Dat Tran	e9e4864456	docs: add design spec for docker service version updates (jellyfin 10.11.7, gitea 1.25.5)	2026-04-01 21:17:05 +02:00
Tuan-Dat Tran	043f97ebac	docs: add design spec and implementation plan for docker service redeployment	2026-04-01 21:00:51 +02:00
Tuan-Dat Tran	134eceee0f	Update Jellyfin and Gitea image versions	2026-04-01 20:55:20 +02:00
Tuan-Dat Tran	80f98a9c4b	docs: update Proxmox cluster debugging design with findings and fixes	2026-03-01 20:58:04 +01:00
Tuan-Dat Tran	d4ac3dae60	feat(k3s): Added 2 nodes Signed-off-by: Tuan-Dat Tran <tuan-dat.tran@tudattr.dev>	2026-03-01 17:01:51 +01:00
Tuan-Dat Tran	5a8c7f0248	feat(proxmox): add hosts config Signed-off-by: Tuan-Dat Tran <tuan-dat.tran@tudattr.dev>	2026-02-28 11:30:58 +01:00
Tuan-Dat Tran	bf7c7c9562	ci: add GitHub Actions workflow for linting	2026-02-25 06:00:20 +01:00
Tuan-Dat Tran	a9346881b0	refactor(edge_vps): reorganize certificate files	2026-02-25 00:26:08 +01:00
Tuan-Dat Tran	193da30e65	docs(edge_vps): update README with role documentation	2026-02-25 00:12:50 +01:00
Tuan-Dat Tran	9a5cb376bd	feat(edge_vps): add inventory variables for VPS group	2026-02-25 00:10:27 +01:00
Tuan-Dat Tran	fc2eefdfb0	feat(edge_vps): add main task orchestrator	2026-02-25 00:03:17 +01:00
Tuan-Dat Tran	274b9c310e	feat(edge_vps): add Elastic Agent setup task and templates	2026-02-25 00:00:00 +01:00
Tuan-Dat Tran	6fdd021604	feat(edge_vps): add Pangolin setup task and templates	2026-02-24 23:56:00 +01:00
Tuan-Dat Tran	1b82acad1f	feat(edge_vps): add Traefik setup task and template	2026-02-24 23:53:00 +01:00
Tuan-Dat Tran	d8822ad904	feat(edge_vps): add WireGuard setup task and template	2026-02-24 23:50:08 +01:00
Tuan-Dat Tran	caecfc7c1d	feat(edge_vps): add directory setup task	2026-02-24 23:47:34 +01:00
Tuan-Dat Tran	4907761649	feat(edge_vps): add role structure and handlers	2026-02-24 23:45:14 +01:00
Tuan-Dat Tran	a3cb1928ae	docs(argocd): add missing Ingress task and note about missing template	2026-02-16 09:25:36 +01:00
Tuan-Dat Tran	99f6876ce9	docs: Add changelog and update role documentation	2026-02-16 09:21:08 +01:00
Tuan-Dat Tran	0a3171b9bc	feat(k3s): Added 2 nodes (2/2) Signed-off-by: Tuan-Dat Tran <tuan-dat.tran@tudattr.dev>	2026-01-26 23:08:34 +01:00
Tuan-Dat Tran	3068a5a8fb	feat(k3s): Added 2 nodesg Signed-off-by: Tuan-Dat Tran <tuan-dat.tran@tudattr.dev>	2026-01-26 22:42:19 +01:00
Tuan-Dat Tran	ef652fac20	refactor: yml -> yaml Signed-off-by: Tuan-Dat Tran <tuan-dat.tran@tudattr.dev>	2025-11-07 20:44:14 +01:00
Tuan-Dat Tran	22c1b534ab	feat(k3s): Add new node and machine Signed-off-by: Tuan-Dat Tran <tuan-dat.tran@tudattr.dev>	2025-10-26 10:41:11 +01:00
Tuan-Dat Tran	9cb90a8020	feat(caddy): netcup->cf Signed-off-by: Tuan-Dat Tran <tuan-dat.tran@tudattr.dev>	2025-10-25 09:25:40 +02:00
Tuan-Dat Tran	d9181515bb	feat(k3s): Added (temporary) node Signed-off-by: Tuan-Dat Tran <tuan-dat.tran@tudattr.dev>	2025-10-19 01:33:42 +02:00
Tuan-Dat Tran	c3905ed144	feat(git): Add .gitattributes for ansible-vault git diff Signed-off-by: Tuan-Dat Tran <tuan-dat.tran@tudattr.dev>	2025-10-19 00:34:51 +02:00
Tuan-Dat Tran	5fb50ab4b2	feat(k3s): Add new node Signed-off-by: Tuan-Dat Tran <tuan-dat.tran@tudattr.dev>	2025-10-07 23:46:40 +02:00
Tuan-Dat Tran	2909d6e16c	feat(nfs): Removed unused/removed nfs servers Signed-off-by: Tuan-Dat Tran <tuan-dat.tran@tudattr.dev>	2025-09-15 23:29:03 +02:00
Tuan-Dat Tran	0aed818be5	feat(docker): Removed nodes docker-host10 and docker-host12 Signed-off-by: Tuan-Dat Tran <tuan-dat.tran@tudattr.dev>	2025-09-15 23:29:03 +02:00
Tuan-Dat Tran	fbdeec93ce	feat(docker): match services that moved to k3s Signed-off-by: Tuan-Dat Tran <tuan-dat.tran@tudattr.dev>	2025-09-15 23:29:03 +02:00
Tuan-Dat Tran	44626101de	feat(docker): match services that moved to k3s Signed-off-by: Tuan-Dat Tran <tuan-dat.tran@tudattr.dev>	2025-09-15 23:29:03 +02:00
Tuan-Dat Tran	c1d6f13275	refactor(ansible-lint): fixed ansible-lint warnings Signed-off-by: Tuan-Dat Tran <tuan-dat.tran@tudattr.dev>	2025-09-15 23:29:03 +02:00
Tuan-Dat Tran	282e98e90a	fix(proxmox): commented 'non-errors' on script Signed-off-by: Tuan-Dat Tran <tuan-dat.tran@tudattr.dev>	2025-09-15 23:29:03 +02:00
Tuan-Dat Tran	9573cbfcad	feat(k3s): Added 2 nodes Signed-off-by: Tuan-Dat Tran <tuan-dat.tran@tudattr.dev>	2025-09-07 21:21:33 +02:00
Tuan-Dat Tran	48aec11d8c	feat(common): added iscsi for longhorn on k3s Signed-off-by: Tuan-Dat Tran <tuan-dat.tran@tudattr.dev>	2025-09-07 18:17:33 +02:00
Tuan-Dat Tran	a1da69ac98	feat(proxmox): check_vm as cronjob Signed-off-by: Tuan-Dat Tran <tuan-dat.tran@tudattr.dev>	2025-09-02 19:52:49 +02:00
Tuan-Dat Tran	7aa16f3207	Added blog.md Signed-off-by: Tuan-Dat Tran <tuan-dat.tran@tudattr.dev>	2025-07-27 22:59:01 +02:00
Tuan-Dat Tran	fe3f1749c5	Update README.md Signed-off-by: Tuan-Dat Tran <tuan-dat.tran@tudattr.dev>	2025-07-27 22:51:15 +02:00
Tuan-Dat Tran	6eef96b302	feat(pre-commit): Added linting	2025-07-27 22:46:23 +02:00
Tuan-Dat Tran	2882abfc0b	Added README.md for roles Signed-off-by: Tuan-Dat Tran <tuan-dat.tran@tudattr.dev>	2025-07-27 16:40:46 +02:00
Tuan-Dat Tran	2b759cc2ab	Update README.md Signed-off-by: Tuan-Dat Tran <tuan-dat.tran@tudattr.dev>	2025-07-27 16:16:35 +02:00
Tuan-Dat Tran	dbaebaee80	cleanup: services moved to argocd Signed-off-by: Tuan-Dat Tran <tuan-dat.tran@tudattr.dev>	2025-07-27 13:58:25 +02:00
Tuan-Dat Tran	89c51aa45c	feat(argo): app-of-app argo Signed-off-by: Tuan-Dat Tran <tuan-dat.tran@tudattr.dev>	2025-07-25 07:58:41 +02:00
Tuan-Dat Tran	0139850ee3	feat(reverse_proxy): fix caddy letsencrypt Signed-off-by: Tuan-Dat Tran <tuan-dat.tran@tudattr.dev>	2025-07-22 21:26:11 +02:00
Tuan-Dat Tran	976cad51e2	refactor(k3s): enhance cluster setup and enable ArgoCD apps Signed-off-by: Tuan-Dat Tran <tuan-dat.tran@tudattr.dev>	2025-07-22 07:23:23 +02:00
Tuan-Dat Tran	e1a2248154	feat(kubernetes): add nfs-provisioner Signed-off-by: Tuan-Dat Tran <tuan-dat.tran@tudattr.dev>	2025-07-15 23:24:52 +02:00
Tuan-Dat Tran	d8fd094379	feat(kubernetes): stable kubernetes with argo Signed-off-by: Tuan-Dat Tran <tuan-dat.tran@tudattr.dev>	2025-07-14 22:57:13 +02:00
Tuan-Dat Tran	76000f8123	feat(kubernetes): add initial setup for ArgoCD, Cert-Manager, MetalLB, and Traefik Signed-off-by: Tuan-Dat Tran <tuan-dat.tran@tudattr.dev>	2025-07-13 14:25:53 +02:00
Tuan-Dat Tran	4aa939426b	refactor(k3s): enhance kubeconfig generation and token management Signed-off-by: Tuan-Dat Tran <tuan-dat.tran@tudattr.dev>	2025-07-13 09:33:39 +02:00
Tuan-Dat Tran	9cce71f73b	refactor(k3s): manage token securely and install guest agent Signed-off-by: Tuan-Dat Tran <tuan-dat.tran@tudattr.dev>	2025-07-13 02:15:01 +02:00
Tuan-Dat Tran	97a5d6c41d	refactor(k3s): centralize k3s primary server IP and integrate Netcup DNS Signed-off-by: Tuan-Dat Tran <tuan-dat.tran@tudattr.dev>	2025-07-13 01:30:05 +02:00