docs(arr-cleanup): improve runbook and fix api key paths

Rewrites findings.md with how-to section, cleaner summary tables,
and more detailed per-pass results. Fixes relative path for
sonarr/radarr API key files after runbook moved deeper in repo.
Tuan-Dat Tran
2026-04-27 21:39:28 +02:00
parent 95715c7748
commit 5b44c46e10
3 changed files with 111 additions and 57 deletions


@@ -32,7 +32,7 @@ SERIES_ROOT = "/media/series"
script_dir = os.path.dirname(os.path.abspath(__file__))
LOG_FILE = os.path.join(script_dir, "cleanup.log")
with open(os.path.join(script_dir, '..', 'sonarr.api.env')) as f:
with open(os.path.join(script_dir, '../../../..', 'sonarr.api.env')) as f:
    SONARR_KEY = f.read().strip()


@@ -34,7 +34,9 @@
## Root Cause: No Hardlinks → All Imports Are Copies
Zero hardlinked files exist anywhere across all three HDDs. Confirmed by inspecting the Kubernetes manifests in `argocd-homelab/services/arr-stack/` and by inode comparison of 1365 download/media file pairs (0 shared inodes found).
Zero hardlinked files exist anywhere across all three HDDs. Confirmed by two methods:
1. Inspecting the Kubernetes manifests in `argocd-homelab/services/arr-stack/`
2. Inode comparison of 1365 download/media file pairs — **0 shared inodes found** (every file is a distinct copy)
**All three services mount the mergerfs `/media/` path via NFS:**
@@ -48,63 +50,106 @@ qbit: NFS 192.168.20.12:/media/downloads → /downloads
mergerfs does not support hardlinks across underlying filesystems. When qBit downloads to `/media/downloads/sonarr/` (lands on e.g. hdd1) and Sonarr imports to `/media/series/` (lands on e.g. hdd0), the hardlink attempt crosses a physical disk boundary → falls back to copy. Every import doubles the data.
**Estimated wasted space before cleanup: ~21T** (the entire downloads/sonarr + downloads/radarr).
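A minimal sketch of the inode comparison described above, assuming it is run where both paths are visible on their underlying filesystem (inode numbers seen through NFS or mergerfs may not be directly comparable). The helper name `same_inode` and the temporary demo files are illustrative and not part of the scripts in this directory.

```python
import os
import tempfile


def same_inode(path_a: str, path_b: str) -> bool:
    """Return True if both paths point at the same file (same device and inode)."""
    a, b = os.stat(path_a), os.stat(path_b)
    return (a.st_dev, a.st_ino) == (b.st_dev, b.st_ino)


# Self-contained demo: a hardlink shares the inode, a copy does not.
with tempfile.TemporaryDirectory() as tmp:
    download = os.path.join(tmp, "download.mkv")
    hardlinked = os.path.join(tmp, "hardlinked.mkv")
    copied = os.path.join(tmp, "copied.mkv")

    with open(download, "wb") as f:
        f.write(b"data")
    os.link(download, hardlinked)      # hardlink: second name for the same inode
    with open(copied, "wb") as f:      # copy: new inode with identical bytes
        f.write(b"data")

    link_result = same_inode(download, hardlinked)
    copy_result = same_inode(download, copied)
```

A download/media pair for which `same_inode` returns `False` occupies disk space twice, which is the situation found for all 1365 pairs.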
## How to Run
Prerequisites:
```bash
# Port-forward Sonarr and Radarr APIs
kubectl -n arr-stack port-forward svc/sonarr 8989:8989 &
kubectl -n arr-stack port-forward svc/radarr 7878:7878 &
```
API keys are loaded from `../../../../sonarr.api.env` and `../../../../radarr.api.env`,
resolved relative to the script directory (i.e. `/home/tudattr/workspace/infra/sonarr.api.env`, four levels up).
Container path mappings used in scripts:
- Sonarr: `/tv/` → `/media/series/`
- Radarr: `/movies/` → `/media/movies/`
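The container path mappings above (`/tv/` to `/media/series/`, `/movies/` to `/media/movies/`) can be applied with a simple prefix rewrite. This is a sketch only; `CONTAINER_TO_HOST` and `to_host_path` are hypothetical names, and the actual scripts may translate paths differently.

```python
# Prefix mapping from the paths reported by the arr APIs (inside the container)
# to the paths on the storage host. Taken from the mapping list above.
CONTAINER_TO_HOST = {
    "/tv/": "/media/series/",
    "/movies/": "/media/movies/",
}


def to_host_path(container_path: str) -> str:
    """Rewrite a container path to its host equivalent; unknown prefixes pass through."""
    for prefix, host_prefix in CONTAINER_TO_HOST.items():
        if container_path.startswith(prefix):
            return host_prefix + container_path[len(prefix):]
    return container_path
```

For example, `to_host_path("/tv/Bleach (2004)/Season 01")` yields `/media/series/Bleach (2004)/Season 01`.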
### Step 1 — Verify (generates `/tmp/arr_verified.json`)
```bash
python3 verify.py
```
Cross-references all downloads against Sonarr/Radarr APIs, verifies reported file paths exist on disk via SSH. Classifies each entry as `safe`, `not_imported`, or `path_missing`.
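The three-way classification can be sketched as follows. The function name, its boolean inputs, and the order of the checks are assumptions for illustration; only the three labels come from the description above.

```python
def classify(api_reports_imported: bool, path_exists_on_disk: bool) -> str:
    """Classify one download that was matched to a Sonarr/Radarr entry.

    api_reports_imported: hasFile=True (Radarr) or episodeFileCount > 0 (Sonarr).
    path_exists_on_disk:  the reported path was found on the storage host.
    """
    if not api_reports_imported:
        return "not_imported"   # the arr knows the title but has no file for it
    if not path_exists_on_disk:
        return "path_missing"   # the arr claims a file, but it is not on disk
    return "safe"               # imported and verified, so the download is redundant
```

Only entries classified as `safe` are candidates for deletion in Step 2.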
### Step 2 — Delete confirmed-imported downloads
```bash
python3 cleanup.py --dry-run # preview
python3 cleanup.py --arr sonarr --yes
python3 cleanup.py --arr radarr --yes
```
### Step 3 — Delete orphans (downloads not in Sonarr at all)
```bash
python3 cleanup-orphans.py --dry-run # preview
python3 cleanup-orphans.py --yes
```
All actions are logged to `cleanup.log` with UTC timestamp, size, title, path, and outcome.
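As an illustration of such a log entry, the sketch below formats one line with the five fields listed above. The tab-separated layout and the name `format_log_line` are assumptions; the real format of `cleanup.log` is not shown in this document.

```python
from datetime import datetime, timezone


def format_log_line(size_bytes, title, path, outcome, now=None):
    """Build one tab-separated log line: UTC timestamp, size, title, path, outcome."""
    now = now or datetime.now(timezone.utc)
    stamp = now.strftime("%Y-%m-%dT%H:%M:%SZ")
    return "\t".join([stamp, str(size_bytes), title, path, outcome])


# Fixed timestamp so the example is reproducible; the values are made up.
example = format_log_line(
    1024,
    "Example Title",
    "/media/downloads/sonarr/example",
    "deleted",
    now=datetime(2026, 4, 23, 12, 0, 0, tzinfo=timezone.utc),
)
```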
## Cleanup Performed (2026-04-23)
Three passes using the scripts in this directory:
### Pass 1 — Orphans (not in Sonarr at all)
### Pass 1 — Orphans (downloads not in Sonarr)
Script: `cleanup-orphans.py`
Deleted 49 entries totalling **461.6G** — downloads with no matching Sonarr series and no series directory on disk. Includes Game of Thrones (all 8 seasons), Sex Education (all 4 seasons), Love Death & Robots (multiple duplicate copies), and various anime episode files.
Two-pass matching, then deletion:
1. Match each download name against Sonarr API (title, slug, sortTitle, alternate titles, partial match)
2. If no API match, check if a series directory with a similar name exists in `/media/series/` — if it does, skip (needs manual review)
3. Delete remaining true orphans
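The steps above can be sketched roughly like this. The series records are assumed to carry `title`, `titleSlug`, `sortTitle`, and `alternateTitles` fields as in the Sonarr API series resource; `normalize`, `contains_words`, and `decide` are hypothetical helpers, and the real matching in `cleanup-orphans.py` may be stricter or looser.

```python
import re


def normalize(name: str) -> str:
    """Lowercase and collapse every non-alphanumeric run to a single space."""
    return re.sub(r"[^a-z0-9]+", " ", name.lower()).strip()


def contains_words(needle: str, haystack: str) -> bool:
    """Whole-word containment, so 'you' does not match 'young sheldon'."""
    return bool(needle) and f" {needle} " in f" {haystack} "


def decide(download_name, series_list, series_dirs):
    """Return 'in_sonarr', 'skip' (manual review), or 'orphan' (delete)."""
    name = normalize(download_name)

    # Pass 1: compare against every title variant the API reports.
    for series in series_list:
        candidates = [series.get("title", ""),
                      series.get("titleSlug", ""),
                      series.get("sortTitle", "")]
        candidates += [alt.get("title", "") for alt in series.get("alternateTitles", [])]
        if any(contains_words(normalize(c), name) for c in candidates):
            return "in_sonarr"

    # Pass 2: a similarly named directory on disk means "needs manual review".
    for directory in series_dirs:
        base = re.sub(r"[\(\{].*", "", directory)  # drop "(year) {id}" suffix
        if contains_words(normalize(base), name):
            return "skip"

    return "orphan"
```

Whole-word matching is used here because plain substring matching would misfire on short titles such as "You" or "House".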
111 entries were SKIPPED (series dir found on disk, needs manual review) — includes Bleach, House, Lucifer, You, Detective Conan episodes, What If, etc. See cleanup.log for full list.
Result: **49 deleted, 461.6G freed, 0 failed**
111 entries SKIPPED (series dir found on disk) — includes Bleach, House, Lucifer, You, SpongeBob, Detective Conan episodes, What If, etc. See `cleanup.log` for full list.
Notable orphans deleted:
- Game of Thrones S01–S08 (~267G), removed from Sonarr
- Sex Education S01–S04 (~110G), removed from Sonarr
- Love Death & Robots (multiple duplicate copies, ~45G)
- Senpai is an Otokonoko, Wind Breaker, Wistoria, Hibike! Euphonium S3 episodes, etc.
### Pass 2 — Confirmed-imported Sonarr downloads
Script: `cleanup.py --arr sonarr`
Script: `cleanup.py --arr sonarr --yes`
Deleted **1106 entries**, 0 failed. These were downloads where Sonarr confirmed `episodeFileCount > 0` AND the series directory was verified to exist on disk at the time of `verify.py` run.
Deleted downloads where Sonarr confirmed `episodeFileCount > 0` AND the series directory was verified to exist on disk.
Result: **1106 deleted, 0 failed**
### Pass 3 — Confirmed-imported Radarr downloads
Script: `cleanup.py --arr radarr`
Script: `cleanup.py --arr radarr --yes`
Deleted **259 entries**, 0 failed. These were downloads where Radarr confirmed `hasFile=True` AND the file/directory path was verified to exist on disk.
Deleted downloads where Radarr confirmed `hasFile=True` AND the file/directory path was verified to exist on disk.
### Totals
| Pass | Entries | Space |
|------|---------|-------|
| Orphans (cleanup-orphans.py) | 49 | ~461G |
| Sonarr imports (cleanup.py) | 1106 | ~12T (estimated) |
| Radarr imports (cleanup.py) | 259 | ~4T (estimated) |
| **Total** | **1414** | **~16T freed** |
Result: **259 deleted, 0 failed**
All deletions logged to `cleanup.log` with UTC timestamp, size, title, path, outcome.
### Summary
| Pass | Script | Entries | Space freed |
|------|--------|---------|-------------|
| Orphans | `cleanup-orphans.py` | 49 | ~461G |
| Sonarr imports | `cleanup.py --arr sonarr` | 1106 | ~12T (estimated) |
| Radarr imports | `cleanup.py --arr radarr` | 259 | ~4T (estimated) |
| **Total** | | **1414** | **~16T** |
## Verification Results (via API + disk path check)
## Verification Results (from verify.py run before cleanup)
API keys stored in `../sonarr.api.env` and `../radarr.api.env`.
Access via `kubectl -n arr-stack port-forward svc/sonarr 8989:8989` and `svc/radarr 7878:7878`.
| | Safe to delete | Not imported | Path missing | Orphans (no API match) |
|---|---|---|---|---|
| **Sonarr** (1439 downloads) | 1106 | — | — | 333 |
| **Radarr** (289 downloads) | 265 | — | — | 25 |
Container path mappings:
- Sonarr `/tv/` → `/media/series/`
- Radarr `/movies/` → `/media/movies/`
Note: `cleanup-orphans.py` uses more aggressive title matching (alternate titles, partial match) than `verify.py`, so its orphan count (160 not-in-Sonarr out of 1438) is lower than `verify.py`'s 333.
| | Safe to delete | Orphans (not in arr) | Keep |
|---|---|---|---|
| **Radarr** (289 items, ~5.2T) | **265** | 25 | 0 |
| **Sonarr** (1439 items, ~17T) | **1106** | 333 | 0 |
"Safe to delete" = API confirms `hasFile=True` (Radarr) or `episodeFileCount > 0` (Sonarr), AND the reported file/directory path was verified to exist on disk via SSH.
### Radarr Orphans (25) — not matched in Radarr, not deleted
### Radarr Orphans (25) — not matched, not deleted
- Constantine (2005)
- Cowboy Bebop: Knockin' on Heaven's Door (2001)
- Les Misérables (2012)
- Pokémon Detective Pikachu (2019)
- Code Geass: Fukkatsu no Lelouch (2019)
- Eiga Go-Toubun no Hanayome (2022)
- Gisaengchung / Parasite (Korean title matching failure)
- Dune: Part One (2021) — matching failure, is in Radarr
- Harry Potter (older/duplicate copies — matching failure)
- Gisaengchung / Parasite (Korean title; matching failure)
- Dune: Part One (2021) — matching failure, confirmed in Radarr
- Harry Potter (older/duplicate copies; matching failure)
- Porco Rosso / Kurenai no buta — matching failure
- Castle in the Sky / Laputa — matching failure
- Steins;Gate: The Movie — matching failure
@@ -115,32 +160,41 @@ Container path mappings:
- Fantastic Four (2025) extra copies (3)
- JJK DCP trailer file
### 6 Radarr "path mismatch" entries (all confirmed safe, deleted)
Flagged due to path comparison artifacts, manually verified on disk:
- Star Wars Episode IV/V/VI/IX — each is a separate Radarr entry; all directories exist
- WALL·E — `·` middle-dot character caused comparison failure; file exists
### Path mismatch entries (confirmed safe, deleted anyway)
- Star Wars Episode IV/V/VI/IX — all matched to Episode IV record; manually confirmed all 4 dirs exist
- WALL·E — `·` middle-dot (U+00B7) broke string comparison; file confirmed on disk
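One way to make such comparisons robust against punctuation like the middle dot is to compare only the alphanumeric characters of each name. This is a sketch of that idea, not the fix used in the scripts; `comparable` is a hypothetical helper.

```python
import unicodedata


def comparable(name: str) -> str:
    """Reduce a title to case-folded letters and digits for comparison.

    NFKD splits accented letters into base letter + combining mark; the
    isalnum() filter then drops the marks along with punctuation such as
    U+00B7 (middle dot), hyphens, and spaces.
    """
    decomposed = unicodedata.normalize("NFKD", name)
    return "".join(ch for ch in decomposed if ch.isalnum()).casefold()
```

With this, `comparable("WALL·E")` and `comparable("WALL-E")` both reduce to `walle`, so a directory name and an API title that differ only in punctuation compare equal.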
## Pending Decisions
### Bleach USBD Remux TL (1.8T)
`/media/downloads/sonarr/Bleach USBD Remux TL`: full lossless Bluray remux, S00–S16 (-ZR- group).
Currently in SKIPPED (series dir `/media/series/Bleach (2004) {imdb-tt0434665}/` exists, 310G imported).
Most seasons were imported from x265 Bluray packs (-iVy group) rather than from this remux.
S11 has no imported content at all. S13, S14 partially imported.
Decision: keep (for quality imports once disk freed) or delete (free 1.8T, accept x265 quality).
See memory file for full per-season breakdown.
Currently SKIPPED — `/media/series/Bleach (2004) {imdb-tt0434665}/` exists (310G imported).
Most seasons were imported from lighter x265 Bluray packs (`Bleach S0x Bluray EAC3 2.0 1080p x265-iVy`) rather than this remux. S11 has no imported content. S13 and S14 partially imported.
Options:
- **Delete** — free 1.8T, imported x265 content stays, re-download at remux quality later if desired
- **Keep** — retain as source for Sonarr to import remaining episodes at lossless quality now that disk space is freed
Per-season breakdown is saved in the memory file.
### SKIPPED downloads (111 Sonarr entries)
Downloads where the series directory exists on disk but the series is not currently in Sonarr.
Likely removed series (House, Lucifer, You, Black Clover, etc.) or ongoing shows with stale episodes.
These need manual review; series may have been intentionally removed from Sonarr.
Downloads where a matching series directory exists on disk but the series is not in Sonarr.
Likely intentionally removed series (House, Lucifer, You, Black Clover, etc.) with leftover download copies.
Needs manual review per series before deleting.
## Permanent Fix (not applied)
Mount per-HDD NFS paths instead of the mergerfs path, so qBit downloads and arr imports land on the same physical filesystem, enabling hardlinks:
## Fix (not applied — future reference)
Mount per-HDD NFS paths instead of the mergerfs path, so downloads and media share the same physical filesystem and hardlinks work:
```yaml
# sonarr/radarr/qtun deployments change NFS path from:
path: /media/downloads → path: /mnt/hdd0/downloads
path: /media/series → path: /mnt/hdd0/series
path: /media/movies → path: /mnt/hdd0/movies
# In sonarr/radarr/qtun deployments, change:
path: /media/downloads → path: /mnt/hdd0/downloads
path: /media/series    → path: /mnt/hdd0/series
path: /media/movies    → path: /mnt/hdd0/movies
```
Jellyfin/Plex continue reading from `/media/` (mergerfs). New imports hardlink within hdd0.
Jellyfin/Plex keep reading from `/media/` (mergerfs union). New imports hardlink within hdd0, wasting no extra space.
Tradeoff: all new content lands on hdd0 only. Load balancing across the three disks stops working for new downloads. Once hdd0 fills up, a migration strategy is needed.


@@ -8,7 +8,7 @@ Requirements:
kubectl -n arr-stack port-forward svc/sonarr 8989:8989
kubectl -n arr-stack port-forward svc/radarr 7878:7878
- SSH access to aya01
- API keys in ../sonarr.api.env and ../radarr.api.env
- API keys in ../../../../sonarr.api.env and ../../../../radarr.api.env
Output:
/tmp/arr_verified.json — full structured results for use by cleanup.py
@@ -28,7 +28,7 @@ SSH_HOST = "aya01"
script_dir = os.path.dirname(os.path.abspath(__file__))
def load_key(filename):
    path = os.path.join(script_dir, '..', filename)
    path = os.path.join(script_dir, '../../../..', filename)
    return open(path).read().strip()
SONARR_KEY = load_key('sonarr.api.env')