# arr-stack Downloads Cleanup — Investigation Findings

## Storage Layout (aya01)

| Device | FS | Size | Used | Mount |
|--------|----|------|------|-------|
| `/dev/sdc3` | btrfs | 1.9T | 177G (10%) | `/` (system) |
| `/dev/sda1` | btrfs `proxmox` | 2.8T | 1.3T (48%) | `/opt` |
| `/dev/sdd1` | ext4 | 17T | 15T (92%) | `/mnt/hdd0` |
| `/dev/sde1` | ext4 | 17T | 15T (92%) | `/mnt/hdd2` |
| `/dev/sdf1` | ext4 | 17T | 15T (92%) | `/mnt/hdd1` |
| `mergerfs` | fuse | 49T | 43T (92%) | `/media` |

`/media` is a mergerfs union of hdd0 + hdd1 + hdd2. All three HDDs were at ~92% capacity before cleanup.

**After cleanup (2026-04-23):**

| Device | Used | Avail | Use% |
|--------|------|-------|------|
| `/dev/sdd1` (hdd0) | 9.4T | 6.2T | 61% |
| `/dev/sdf1` (hdd1) | 9.3T | 6.3T | 60% |
| `/dev/sde1` (hdd2) | 7.8T | 7.8T | 51% |
| `mergerfs /media` | 27T | 21T | 57% |

**~16T freed total** (92% → 57% on the mergerfs pool).

## /media Breakdown (before cleanup)

| Directory | Size |
|-----------|------|
| `downloads` | **22T** |
| `series` | 16T |
| `movies` | 5T |

## Root Cause: No Hardlinks → All Imports Are Copies

Zero hardlinked files exist anywhere across all three HDDs. Established two ways:

1. Inspecting the Kubernetes manifests in `argocd-homelab/services/arr-stack/`: the mount layout below cannot support hardlinks
2. Inode comparison of 1365 download/media file pairs: **0 shared inodes found** (every file is a distinct copy)

**All three services mount the mergerfs `/media/` path via NFS:**

```
sonarr:  NFS 192.168.20.12:/media/downloads → /downloads
         NFS 192.168.20.12:/media/series    → /tv
radarr:  NFS 192.168.20.12:/media/downloads → /downloads
         NFS 192.168.20.12:/media/movies    → /movies
qbit:    NFS 192.168.20.12:/media/downloads → /downloads
```

mergerfs does not support hardlinks across underlying filesystems. When qBit downloads to `/media/downloads/sonarr/` (landing on, e.g., hdd1) and Sonarr imports to `/media/series/` (landing on, e.g., hdd0), the hardlink attempt crosses a physical disk boundary and falls back to a copy. The mount layout adds a second barrier: inside each arr container, `/downloads` and `/tv` (or `/movies`) are separate NFS mounts, and `link()` between two mount points fails with `EXDEV` even when both paths are on the same disk.
Every import doubles the data. **Estimated wasted space before cleanup: ~21T** (the entire `downloads/sonarr` + `downloads/radarr`).

## How to Run

Prerequisites:

```bash
# Port-forward Sonarr and Radarr APIs
kubectl -n arr-stack port-forward svc/sonarr 8989:8989 &
kubectl -n arr-stack port-forward svc/radarr 7878:7878 &
```

API keys are loaded from `../../../../sonarr.api.env` and `../../../../radarr.api.env` (i.e. `/home/tudattr/workspace/infra/sonarr.api.env` given this repo's location).

Container path mappings used in the scripts:

- Sonarr: `/tv/` → `/media/series/`
- Radarr: `/movies/` → `/media/movies/`

### Step 1 — Verify (generates `/tmp/arr_verified.json`)

```bash
python3 verify.py
```

Cross-references all downloads against the Sonarr/Radarr APIs and verifies over SSH that the reported file paths exist on disk. Classifies each entry as `safe`, `not_imported`, or `path_missing`.

### Step 2 — Delete confirmed-imported downloads

```bash
python3 cleanup.py --dry-run          # preview
python3 cleanup.py --arr sonarr --yes
python3 cleanup.py --arr radarr --yes
```

### Step 3 — Delete orphans (downloads not in Sonarr at all)

```bash
python3 cleanup-orphans.py --dry-run  # preview
python3 cleanup-orphans.py --yes
```

All actions are logged to `cleanup.log` with UTC timestamp, size, title, path, and outcome.

## Cleanup Performed (2026-04-23)

### Pass 1 — Orphans (downloads not in Sonarr)

Script: `cleanup-orphans.py`

Matching logic:

1. Match each download name against the Sonarr API (title, slug, sortTitle, alternate titles, partial match).
2. If there is no API match, check whether a series directory with a similar name exists in `/media/series/`; if it does, skip (needs manual review).
3. Delete the remaining true orphans.

Result: **49 deleted, 461.6G freed, 0 failed**

111 entries SKIPPED (series dir found on disk), including Bleach, House, Lucifer, You, SpongeBob, Detective Conan episodes, What If, etc. See `cleanup.log` for the full list.
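The three matching steps above can be approximated in a few lines. This is a minimal sketch, not the code of `cleanup-orphans.py`: the names `normalize`, `contains_words` and `classify_download` are hypothetical, `sonarr_titles` is assumed to be a flat list of every title variant (title, slug, sortTitle, alternate titles) pulled from the Sonarr API, and "similar name" is assumed to mean a whole-word match on the series directory name with its year and IMDb suffixes stripped.

```python
import re
import unicodedata


def normalize(title: str) -> str:
    """Lowercase, strip accents, and collapse dots/punctuation to single spaces."""
    decomposed = unicodedata.normalize("NFKD", title)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return re.sub(r"[^0-9a-z]+", " ", no_accents.lower()).strip()


def contains_words(haystack: str, needle: str) -> bool:
    """True if the (already normalized) needle appears as whole words in haystack."""
    return bool(needle) and f" {needle} " in f" {haystack} "


def classify_download(name: str, sonarr_titles: list[str], series_dirs: list[str]) -> str:
    """Return 'in_sonarr', 'skip' (series dir on disk, review manually) or 'orphan'."""
    norm = normalize(name)
    # Step 1: any Sonarr title variant matches the download name.
    if any(contains_words(norm, normalize(t)) for t in sonarr_titles):
        return "in_sonarr"
    # Step 2: a series directory such as "Bleach (2004) {imdb-tt0434665}" exists;
    # compare only the part before the first "(" or "{".
    for d in series_dirs:
        base = re.split(r"\s*[({]", d, maxsplit=1)[0]
        if contains_words(norm, normalize(base)):
            return "skip"
    # Step 3: nothing matched, so this is a true orphan.
    return "orphan"
```

Whole-word matching errs on the side of keeping data: a short title such as "You" or "House" matches many unrelated names, so step 1 may keep a download and step 2 may flag it for review, but neither path deletes it. Accent stripping and punctuation folding also make `WALL·E` and `WALL-E` compare equal.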

Notable orphans deleted:

- Game of Thrones S01–S08 (~267G) — removed from Sonarr
- Sex Education S01–S04 (~110G) — removed from Sonarr
- Love Death & Robots (multiple duplicate copies, ~45G)
- Senpai is an Otokonoko, Wind Breaker, Wistoria, Hibike! Euphonium S3 episodes, etc.

### Pass 2 — Confirmed-imported Sonarr downloads

Script: `cleanup.py --arr sonarr --yes`

Deleted downloads where Sonarr confirmed `episodeFileCount > 0` AND the series directory was verified to exist on disk.

Result: **1106 deleted, 0 failed**

### Pass 3 — Confirmed-imported Radarr downloads

Script: `cleanup.py --arr radarr --yes`

Deleted downloads where Radarr confirmed `hasFile=True` AND the file/directory path was verified to exist on disk.

Result: **259 deleted, 0 failed**

### Summary

| Pass | Script | Entries | Space freed |
|------|--------|---------|-------------|
| Orphans | `cleanup-orphans.py` | 49 | ~461G |
| Sonarr imports | `cleanup.py --arr sonarr` | 1106 | ~12T (estimated) |
| Radarr imports | `cleanup.py --arr radarr` | 259 | ~4T (estimated) |
| **Total** | | **1414** | **~16T** |

## Verification Results (from verify.py run before cleanup)

| | Safe to delete | Not imported | Path missing | Orphans (no API match) |
|---|---|---|---|---|
| **Sonarr** (1439 downloads) | 1106 | — | — | 333 |
| **Radarr** (289 downloads) | 265 | — | — | 25 |

Note: `cleanup-orphans.py` uses more aggressive title matching (alternate titles, partial match) than `verify.py`, so its orphan count (160 not in Sonarr out of 1438 downloads, i.e. 49 deleted + 111 skipped) is lower than `verify.py`'s 333.
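The per-entry decision behind `safe`, `not_imported` and `path_missing` can be sketched as below. The function names are hypothetical, the record dicts are assumed to carry the relevant fields of a Sonarr series (`path`, `statistics.episodeFileCount`) or a Radarr movie (`path`, `hasFile`), and the `exists` callback stands in for the SSH existence check that `verify.py` performs; the prefixes are the container path mappings listed under How to Run.

```python
import os
from typing import Callable

# Container prefix -> host prefix, from "Container path mappings used in the scripts".
PATH_MAPPINGS = {
    "sonarr": ("/tv/", "/media/series/"),
    "radarr": ("/movies/", "/media/movies/"),
}


def to_host_path(arr: str, container_path: str) -> str:
    """Translate a path reported by the arr API into the host path under /media."""
    prefix, host_prefix = PATH_MAPPINGS[arr]
    if not container_path.startswith(prefix):
        raise ValueError(f"{container_path!r} is not under {prefix!r}")
    return host_prefix + container_path[len(prefix):]


def classify(arr: str, record: dict, exists: Callable[[str], bool] = os.path.exists) -> str:
    """Return 'safe', 'not_imported' or 'path_missing' for one API-matched download."""
    if arr == "sonarr":
        imported = record.get("statistics", {}).get("episodeFileCount", 0) > 0
    else:
        imported = bool(record.get("hasFile"))
    if not imported:
        return "not_imported"
    # A download is only safe to delete if the imported copy is really on disk.
    return "safe" if exists(to_host_path(arr, record["path"])) else "path_missing"
```

Passing the existence check in as a function keeps the logic testable without SSH; in a real run it would wrap the remote check on the host.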
### Radarr Orphans (25) — not matched, not deleted

- Constantine (2005)
- Cowboy Bebop: Knockin' on Heaven's Door (2001)
- Les Misérables (2012)
- Pokémon Detective Pikachu (2019)
- Code Geass: Fukkatsu no Lelouch (2019)
- Eiga Go-Toubun no Hanayome (2022)
- Gisaengchung / Parasite — Korean title, matching failure
- Dune: Part One (2021) — matching failure, confirmed in Radarr
- Harry Potter older/duplicate copies — matching failure
- Porco Rosso / Kurenai no buta — matching failure
- Castle in the Sky / Laputa — matching failure
- Steins;Gate: The Movie — matching failure
- Project Silence / Talchul — matching failure
- Digimon: Frontier & Savers films
- One Piece films (several)
- Paripi Koumei movie
- Fantastic Four (2025) extra copies (3)
- JJK DCP trailer file

### Path mismatch entries (confirmed safe, deleted anyway)

- Star Wars Episode IV/V/VI/IX — all matched to the Episode IV record; manually confirmed that all 4 dirs exist
- WALL·E — the `·` middle dot (U+00B7) broke string comparison; file confirmed on disk

## Pending Decisions

### Bleach USBD Remux TL (1.8T)

`/media/downloads/sonarr/Bleach USBD Remux TL`: full lossless Blu-ray remux of S00–S16 (-ZR- group).

Currently SKIPPED, because `/media/series/Bleach (2004) {imdb-tt0434665}/` exists (310G imported). Most seasons were imported from lighter x265 Blu-ray packs (`Bleach S0x Bluray EAC3 2.0 1080p x265-iVy`) rather than from this remux. S11 has no imported content; S13 and S14 are only partially imported.

Options:

- **Delete**: free 1.8T; the imported x265 content stays, and the remux can be re-downloaded later if desired
- **Keep**: retain it as a source for Sonarr to import the remaining episodes at lossless quality, now that disk space is freed

Per-season breakdown saved in memory.

### SKIPPED downloads (111 Sonarr entries)

Downloads where a matching series directory exists on disk but the series is not in Sonarr. These are likely intentionally removed series (House, Lucifer, You, Black Clover, etc.) with leftover download copies.
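Needs manual review per series before deleting.

## Permanent Fix (not applied)

Mount per-HDD NFS paths instead of the mergerfs path, so that qBit downloads and arr imports land on the same physical filesystem, enabling hardlinks:

```yaml
# In sonarr/radarr/qtun deployments, change:
path: /media/downloads  →  path: /mnt/hdd0/downloads
path: /media/series     →  path: /mnt/hdd0/series
path: /media/movies     →  path: /mnt/hdd0/movies
```

Landing on the same disk is necessary but not sufficient: each container must also see its downloads and library through a single mount. Two separate NFS volumes (e.g. `/mnt/hdd0/downloads` and `/mnt/hdd0/series`) are still two mount points inside the pod, and `link()` across mount points fails with `EXDEV` even when both are on the same filesystem. Mounting `/mnt/hdd0` once and pointing the apps at subdirectories of it avoids this.

Jellyfin/Plex keep reading from `/media/` (mergerfs union). New imports hardlink within hdd0 and take no extra space.

Tradeoff: all new content lands on hdd0 only, so load balancing across the three disks stops for new downloads. Once hdd0 fills up, a migration strategy is needed.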
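Once a fix like this is in place, the same inode comparison used in the investigation can confirm that new imports really are hardlinks. The sketch below also mimics the hardlink-then-copy fallback that produced the duplicates; it is a generic illustration, not Sonarr's or Radarr's actual import code, and `import_file` / `same_inode` are hypothetical names.

```python
import errno
import os
import shutil


def import_file(src: str, dst: str) -> str:
    """Hardlink src to dst; fall back to a full copy only on a cross-device error."""
    try:
        os.link(src, dst)
        return "hardlink"
    except OSError as exc:
        if exc.errno != errno.EXDEV:  # "Invalid cross-device link": different fs or mount
            raise
        shutil.copy2(src, dst)  # a second, independent copy of the data
        return "copy"


def same_inode(a: str, b: str) -> bool:
    """True if both paths are hardlinks to one file (same device and inode number)."""
    sa, sb = os.stat(a), os.stat(b)
    return (sa.st_dev, sa.st_ino) == (sb.st_dev, sb.st_ino)
```

Run `same_inode` on the host against the `/mnt/hddN` paths of a download and its imported file: `False` means the import was a copy. For a single file, `os.stat(path).st_nlink > 1` is a cheaper check that some other hardlink to it exists.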