diff --git a/docs/runbooks/arr-cleanup/cleanup-orphans.py b/docs/runbooks/arr-cleanup/cleanup-orphans.py index f56ff20..e3928d2 100644 --- a/docs/runbooks/arr-cleanup/cleanup-orphans.py +++ b/docs/runbooks/arr-cleanup/cleanup-orphans.py @@ -32,7 +32,7 @@ SERIES_ROOT = "/media/series" script_dir = os.path.dirname(os.path.abspath(__file__)) LOG_FILE = os.path.join(script_dir, "cleanup.log") -with open(os.path.join(script_dir, '..', 'sonarr.api.env')) as f: +with open(os.path.join(script_dir, '../../../..', 'sonarr.api.env')) as f: SONARR_KEY = f.read().strip() diff --git a/docs/runbooks/arr-cleanup/findings.md b/docs/runbooks/arr-cleanup/findings.md index 0e8f835..e2dafb2 100644 --- a/docs/runbooks/arr-cleanup/findings.md +++ b/docs/runbooks/arr-cleanup/findings.md @@ -34,7 +34,9 @@ ## Root Cause: No Hardlinks → All Imports Are Copies -Zero hardlinked files exist anywhere across all three HDDs. Confirmed by inspecting the Kubernetes manifests in `argocd-homelab/services/arr-stack/` and by inode comparison of 1365 download/media file pairs (0 shared inodes found). +Zero hardlinked files exist anywhere across all three HDDs. Confirmed by two methods: +1. Inspecting the Kubernetes manifests in `argocd-homelab/services/arr-stack/` +2. Inode comparison of 1365 download/media file pairs — **0 shared inodes found** (every file is a distinct copy) **All three services mount the mergerfs `/media/` path via NFS:** @@ -48,63 +50,106 @@ qbit: NFS 192.168.20.12:/media/downloads → /downloads mergerfs does not support hardlinks across underlying filesystems. When qBit downloads to `/media/downloads/sonarr/` (lands on e.g. hdd1) and Sonarr imports to `/media/series/` (lands on e.g. hdd0), the hardlink attempt crosses a physical disk boundary → falls back to copy. Every import doubles the data. +**Estimated wasted space before cleanup: ~21T** (the entire downloads/sonarr + downloads/radarr). 
+ +## How to Run + +Prerequisites: +```bash +# Port-forward Sonarr and Radarr APIs +kubectl -n arr-stack port-forward svc/sonarr 8989:8989 & +kubectl -n arr-stack port-forward svc/radarr 7878:7878 & +``` + +API keys are loaded from `../../../../sonarr.api.env` and `../../../../radarr.api.env` +(i.e. `/home/tudattr/workspace/infra/sonarr.api.env` relative to this repo). + +Container path mappings used in scripts: +- Sonarr: `/tv/` → `/media/series/` +- Radarr: `/movies/` → `/media/movies/` + +### Step 1 — Verify (generates `/tmp/arr_verified.json`) +```bash +python3 verify.py +``` +Cross-references all downloads against Sonarr/Radarr APIs, verifies reported file paths exist on disk via SSH. Classifies each entry as `safe`, `not_imported`, or `path_missing`. + +### Step 2 — Delete confirmed-imported downloads +```bash +python3 cleanup.py --dry-run # preview +python3 cleanup.py --arr sonarr --yes +python3 cleanup.py --arr radarr --yes +``` + +### Step 3 — Delete orphans (downloads not in Sonarr at all) +```bash +python3 cleanup-orphans.py --dry-run # preview +python3 cleanup-orphans.py --yes +``` + +All actions are logged to `cleanup.log` with UTC timestamp, size, title, path, and outcome. + ## Cleanup Performed (2026-04-23) -Three passes using the scripts in this directory: - -### Pass 1 — Orphans (not in Sonarr at all) +### Pass 1 — Orphans (downloads not in Sonarr) Script: `cleanup-orphans.py` -Deleted 49 entries totalling **461.6G** — downloads with no matching Sonarr series and no series directory on disk. Includes Game of Thrones (all 8 seasons), Sex Education (all 4 seasons), Love Death & Robots (multiple duplicate copies), and various anime episode files. +Two-pass logic: +1. Match each download name against Sonarr API (title, slug, sortTitle, alternate titles, partial match) +2. If no API match, check if a series directory with a similar name exists in `/media/series/` — if it does, skip (needs manual review) +3. 
Delete remaining true orphans -111 entries were SKIPPED (series dir found on disk, needs manual review) — includes Bleach, House, Lucifer, You, Detective Conan episodes, What If, etc. See cleanup.log for full list. +Result: **49 deleted, 461.6G freed, 0 failed** + +111 entries SKIPPED (series dir found on disk) — includes Bleach, House, Lucifer, You, SpongeBob, Detective Conan episodes, What If, etc. See `cleanup.log` for full list. + +Notable orphans deleted: +- Game of Thrones S01–S08 (~267G) — removed from Sonarr +- Sex Education S01–S04 (~110G) — removed from Sonarr +- Love Death & Robots (multiple duplicate copies, ~45G) +- Senpai is an Otokonoko, Wind Breaker, Wistoria, Hibike! Euphonium S3 episodes, etc. ### Pass 2 — Confirmed-imported Sonarr downloads -Script: `cleanup.py --arr sonarr` +Script: `cleanup.py --arr sonarr --yes` -Deleted **1106 entries**, 0 failed. These were downloads where Sonarr confirmed `episodeFileCount > 0` AND the series directory was verified to exist on disk at the time of `verify.py` run. +Deleted downloads where Sonarr confirmed `episodeFileCount > 0` AND the series directory was verified to exist on disk. + +Result: **1106 deleted, 0 failed** ### Pass 3 — Confirmed-imported Radarr downloads -Script: `cleanup.py --arr radarr` +Script: `cleanup.py --arr radarr --yes` -Deleted **259 entries**, 0 failed. These were downloads where Radarr confirmed `hasFile=True` AND the file/directory path was verified to exist on disk. +Deleted downloads where Radarr confirmed `hasFile=True` AND the file/directory path was verified to exist on disk. -### Totals -| Pass | Entries | Space | -|------|---------|-------| -| Orphans (cleanup-orphans.py) | 49 | ~461G | -| Sonarr imports (cleanup.py) | 1106 | ~12T (estimated) | -| Radarr imports (cleanup.py) | 259 | ~4T (estimated) | -| **Total** | **1414** | **~16T freed** | +Result: **259 deleted, 0 failed** -All deletions logged to `cleanup.log` with UTC timestamp, size, title, path, outcome. 
+### Summary +| Pass | Script | Entries | Space freed | +|------|--------|---------|-------------| +| Orphans | `cleanup-orphans.py` | 49 | ~461G | +| Sonarr imports | `cleanup.py --arr sonarr` | 1106 | ~12T (estimated) | +| Radarr imports | `cleanup.py --arr radarr` | 259 | ~4T (estimated) | +| **Total** | | **1414** | **~16T** | -## Verification Results (via API + disk path check) +## Verification Results (from verify.py run before cleanup) -API keys stored in `../sonarr.api.env` and `../radarr.api.env`. -Access via `kubectl -n arr-stack port-forward svc/sonarr 8989:8989` and `svc/radarr 7878:7878`. +| | Safe to delete | Not imported | Path missing | Orphans (no API match) | +|---|---|---|---|---| +| **Sonarr** (1439 downloads) | 1106 | — | — | 333 | +| **Radarr** (289 downloads) | 265 | — | — | 25 | -Container path mappings: -- Sonarr `/tv/` → `/media/series/` -- Radarr `/movies/` → `/media/movies/` +Note: `cleanup-orphans.py` uses more aggressive title matching (alternate titles, partial match) than `verify.py`, so its orphan count (160 not-in-Sonarr out of 1438) is lower than `verify.py`'s 333. -| | Safe to delete | Orphans (not in arr) | Keep | -|---|---|---|---| -| **Radarr** (289 items, ~5.2T) | **265** | 25 | 0 | -| **Sonarr** (1439 items, ~17T) | **1106** | 333 | 0 | - -"Safe to delete" = API confirms `hasFile=True` (Radarr) or `episodeFileCount > 0` (Sonarr), AND the reported file/directory path was verified to exist on disk via SSH. 
- -### Radarr Orphans (25) — not matched in Radarr, not deleted +### Radarr Orphans (25) — not matched, not deleted - Constantine (2005) - Cowboy Bebop: Knockin' on Heaven's Door (2001) - Les Misérables (2012) - Pokémon Detective Pikachu (2019) - Code Geass: Fukkatsu no Lelouch (2019) - Eiga Go-Toubun no Hanayome (2022) -- Gisaengchung / Parasite (Korean title — matching failure) -- Dune: Part One (2021) — matching failure, is in Radarr -- Harry Potter (older/duplicate copies — matching failure) +- Gisaengchung / Parasite — Korean title, matching failure +- Dune: Part One (2021) — matching failure, confirmed in Radarr +- Harry Potter older/duplicate copies — matching failure - Porco Rosso / Kurenai no buta — matching failure - Castle in the Sky / Laputa — matching failure - Steins;Gate: The Movie — matching failure @@ -115,32 +160,41 @@ Container path mappings: - Fantastic Four (2025) extra copies (3) - JJK DCP trailer file -### 6 Radarr "path mismatch" entries (all confirmed safe, deleted) -Flagged due to path comparison artifacts, manually verified on disk: -- Star Wars Episode IV/V/VI/IX — each is a separate Radarr entry; all directories exist -- WALL·E — `·` middle-dot character caused comparison failure; file exists +### Path mismatch entries (confirmed safe, deleted anyway) +- Star Wars Episode IV/V/VI/IX — all matched to Episode IV record; manually confirmed all 4 dirs exist +- WALL·E — `·` middle-dot (U+00B7) broke string comparison; file confirmed on disk ## Pending Decisions ### Bleach USBD Remux TL (1.8T) `/media/downloads/sonarr/Bleach USBD Remux TL` — full lossless Bluray remux S00–S16 (-ZR- group). -Currently in SKIPPED (series dir `/media/series/Bleach (2004) {imdb-tt0434665}/` exists, 310G imported). -Most seasons were imported from x265 Bluray packs (-iVy group) rather than from this remux. -S11 has no imported content at all. S13, S14 partially imported. 
-Decision: keep (for quality imports once disk freed) or delete (free 1.8T, accept x265 quality). -See memory file for full per-season breakdown. + +Currently SKIPPED — `/media/series/Bleach (2004) {imdb-tt0434665}/` exists (310G imported). + +Most seasons were imported from lighter x265 Bluray packs (`Bleach S0x Bluray EAC3 2.0 1080p x265-iVy`) rather than this remux. S11 has no imported content. S13 and S14 partially imported. + +Options: +- **Delete** — free 1.8T, imported x265 content stays, re-download at remux quality later if desired +- **Keep** — retain as source for Sonarr to import remaining episodes at lossless quality now that disk space is freed + +Per-season breakdown saved in memory. ### SKIPPED downloads (111 Sonarr entries) -Downloads where the series directory exists on disk but the series is not currently in Sonarr. -Likely removed series (House, Lucifer, You, Black Clover, etc.) or ongoing shows with stale episodes. -These need manual review — series may have been intentionally removed from Sonarr. +Downloads where a matching series directory exists on disk but the series is not in Sonarr. +Likely intentionally removed series (House, Lucifer, You, Black Clover, etc.) with leftover download copies. +Needs manual review per series before deleting. 
+ +## Permanent Fix (not applied) + +Mount per-HDD NFS paths instead of the mergerfs path, so qBit downloads and arr imports land on the same physical filesystem, enabling hardlinks: -## Fix (not applied — future reference) -Mount per-HDD NFS paths instead of the mergerfs path, so downloads and media share the same physical filesystem and hardlinks work: ```yaml -# sonarr/radarr/qtun deployments — change NFS path from: -path: /media/downloads → path: /mnt/hdd0/downloads -path: /media/series → path: /mnt/hdd0/series -path: /media/movies → path: /mnt/hdd0/movies +# In sonarr/radarr/qtun deployments, change: +path: /media/downloads → path: /mnt/hdd0/downloads +path: /media/series → path: /mnt/hdd0/series +path: /media/movies → path: /mnt/hdd0/movies ``` -Jellyfin/Plex continue reading from `/media/` (mergerfs). New imports hardlink within hdd0. + +Jellyfin/Plex keep reading from `/media/` (mergerfs union). New imports hardlink within hdd0, wasting no extra space. + +Tradeoff: all new content lands on hdd0 only. Load balancing across the three disks stops working for new downloads. Once hdd0 fills up a migration strategy is needed. 
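Note on the inode comparison cited in findings.md: the runbook reports that 1365 download/media pairs shared 0 inodes but does not show how the check was done. The following is a minimal sketch of one way to run such a check, assuming it runs on the storage host against paths on the underlying disks. The helper names `is_hardlink_pair` and `count_shared_inodes` are hypothetical and do not appear in the scripts in this directory.

```python
import os


def is_hardlink_pair(path_a, path_b):
    """Return True when both paths refer to the same inode on the same device.

    Two names are hardlinks of one file only if st_dev and st_ino both match.
    A copy gets its own inode, and inode numbers on different filesystems are
    unrelated, so comparing st_ino alone is not enough.
    """
    stat_a = os.stat(path_a)
    stat_b = os.stat(path_b)
    return (stat_a.st_dev, stat_a.st_ino) == (stat_b.st_dev, stat_b.st_ino)


def count_shared_inodes(pairs):
    """Count how many (download_path, media_path) pairs are hardlinked."""
    return sum(1 for download, media in pairs if is_hardlink_pair(download, media))
```

A result of 0 over all pairs matches the finding that every import is a distinct copy. The same `st_dev` comparison also shows why the proposed per-HDD mounts matter: a hardlink can only be created when source and destination are on the same filesystem.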
diff --git a/docs/runbooks/arr-cleanup/verify.py b/docs/runbooks/arr-cleanup/verify.py index fbd7d8b..e675a1b 100644 --- a/docs/runbooks/arr-cleanup/verify.py +++ b/docs/runbooks/arr-cleanup/verify.py @@ -8,7 +8,7 @@ Requirements: kubectl -n arr-stack port-forward svc/sonarr 8989:8989 kubectl -n arr-stack port-forward svc/radarr 7878:7878 - SSH access to aya01 - - API keys in ../sonarr.api.env and ../radarr.api.env + - API keys in ../../../../sonarr.api.env and ../../../../radarr.api.env Output: /tmp/arr_verified.json — full structured results for use by cleanup.py @@ -28,7 +28,7 @@ SSH_HOST = "aya01" script_dir = os.path.dirname(os.path.abspath(__file__)) def load_key(filename): - path = os.path.join(script_dir, '..', filename) + path = os.path.join(script_dir, '../../../..', filename) return open(path).read().strip() SONARR_KEY = load_key('sonarr.api.env')
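Note on the path change in both scripts: `'../../../..'` is joined onto the script directory, so the key files are now expected four levels above `docs/runbooks/arr-cleanup/`, which is the parent of the repository root. The sketch below shows how that resolves and reads the key with a context manager. `resolve_key_path` is a hypothetical helper, and the checkout directory name `repo` in the usage example is an assumption; only the four-level relative path and the `load_key` name come from the diff.

```python
import os


def resolve_key_path(script_dir, filename):
    """Return the normalized path of `filename` four levels above script_dir."""
    return os.path.normpath(os.path.join(script_dir, "../../../..", filename))


def load_key(script_dir, filename):
    """Read an API key file and strip surrounding whitespace.

    Uses a context manager so the file handle is closed right after reading.
    """
    with open(resolve_key_path(script_dir, filename)) as key_file:
        return key_file.read().strip()
```

For a checkout at the assumed location `/home/tudattr/workspace/infra/repo`, `resolve_key_path("/home/tudattr/workspace/infra/repo/docs/runbooks/arr-cleanup", "sonarr.api.env")` yields `/home/tudattr/workspace/infra/sonarr.api.env`, which agrees with the location stated in findings.md.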