Rewrites findings.md with how-to section, cleaner summary tables, and more detailed per-pass results. Fixes relative path for sonarr/radarr API key files after runbook moved deeper in repo.
arr-stack Downloads Cleanup — Investigation Findings
Storage Layout (aya01)
| Device | FS | Size | Used | Mount |
|---|---|---|---|---|
| /dev/sdc3 | btrfs | 1.9T | 177G (10%) | / (system) |
| /dev/sda1 | btrfs proxmox | 2.8T | 1.3T (48%) | /opt |
| /dev/sdd1 | ext4 | 17T | 15T (92%) | /mnt/hdd0 |
| /dev/sde1 | ext4 | 17T | 15T (92%) | /mnt/hdd2 |
| /dev/sdf1 | ext4 | 17T | 15T (92%) | /mnt/hdd1 |
| mergerfs | fuse | 49T | 43T (92%) | /media |
/media is a mergerfs union of hdd0 + hdd1 + hdd2. All three HDDs were at ~92% capacity before cleanup.
After cleanup (2026-04-23):
| Device | Used | Avail | Use% |
|---|---|---|---|
| /dev/sdd1 (hdd0) | 9.4T | 6.2T | 61% |
| /dev/sdf1 (hdd1) | 9.3T | 6.3T | 60% |
| /dev/sde1 (hdd2) | 7.8T | 7.8T | 51% |
| mergerfs /media | 27T | 21T | 57% |
~16T freed total (92% → 57% on the mergerfs pool).
/media Breakdown (before cleanup)
| Directory | Size |
|---|---|
| downloads | 22T |
| series | 16T |
| movies | 5T |
Root Cause: No Hardlinks → All Imports Are Copies
Zero hardlinked files exist anywhere across all three HDDs. Confirmed by two methods:
- Inspecting the Kubernetes manifests in argocd-homelab/services/arr-stack/
- Inode comparison of 1365 download/media file pairs: 0 shared inodes found (every file is a distinct copy)
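The inode comparison can be sketched roughly as follows. This is a minimal illustration, not the script that was actually run: the function names are hypothetical, and it assumes the pairs are checked on the underlying /mnt/hddN paths, since inode numbers seen through a FUSE union mount may not reflect the on-disk inodes.

```python
import os


def is_hardlinked(path_a: str, path_b: str) -> bool:
    """Return True when both paths refer to the same inode on the same device."""
    stat_a, stat_b = os.stat(path_a), os.stat(path_b)
    return (stat_a.st_dev, stat_a.st_ino) == (stat_b.st_dev, stat_b.st_ino)


def count_shared_inodes(pairs) -> int:
    """Count the (download_path, media_path) pairs that are hardlinks of each other."""
    return sum(1 for download, media in pairs if is_hardlinked(download, media))
```

A count of 0 over all pairs is what "every file is a distinct copy" means in practice.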
All three services mount the mergerfs /media/ path via NFS:
sonarr: NFS 192.168.20.12:/media/downloads → /downloads
NFS 192.168.20.12:/media/series → /tv
radarr: NFS 192.168.20.12:/media/downloads → /downloads
NFS 192.168.20.12:/media/movies → /movies
qbit: NFS 192.168.20.12:/media/downloads → /downloads
mergerfs cannot create hardlinks across its underlying filesystems. When qBit downloads to /media/downloads/sonarr/ (landing on e.g. hdd1) and Sonarr imports to /media/series/ (landing on e.g. hdd0), the hardlink attempt crosses a physical disk boundary and falls back to a copy. Every import doubles the data.
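The link-then-copy fallback described above looks roughly like this in Python. It is a sketch of the general pattern, not Sonarr's or Radarr's actual import code; `import_file` is a hypothetical name, and it assumes the failed link surfaces as `EXDEV`, the usual errno for a cross-device link.

```python
import errno
import os
import shutil


def import_file(src: str, dst: str, link=os.link) -> str:
    """Hardlink src to dst, falling back to a full copy across filesystems.

    Returns "hardlink" or "copy" so the caller can see which path was taken.
    The `link` parameter exists only so the fallback can be exercised in tests.
    """
    try:
        link(src, dst)
        return "hardlink"
    except OSError as exc:
        if exc.errno != errno.EXDEV:
            raise  # unrelated failure (permissions, missing file, ...)
        shutil.copy2(src, dst)  # second full copy of the data
        return "copy"
```

Every "copy" result is a file that occupies space twice, once under downloads and once in the library.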
Estimated wasted space before cleanup: ~21T (the entire downloads/sonarr + downloads/radarr).
How to Run
Prerequisites:
# Port-forward Sonarr and Radarr APIs
kubectl -n arr-stack port-forward svc/sonarr 8989:8989 &
kubectl -n arr-stack port-forward svc/radarr 7878:7878 &
API keys are loaded from ../../../../sonarr.api.env and ../../../../radarr.api.env, resolved relative to this runbook
(i.e. /home/tudattr/workspace/infra/sonarr.api.env for Sonarr).
Container path mappings used in scripts:
- Sonarr: /tv/ → /media/series/
- Radarr: /movies/ → /media/movies/
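A minimal sketch of how such a mapping might be applied when translating an API-reported path to a host path. The names `PATH_MAPPINGS` and `to_host_path` are hypothetical and not taken from the scripts.

```python
# Container prefix -> host prefix, per the mappings listed above.
PATH_MAPPINGS = {
    "sonarr": ("/tv/", "/media/series/"),
    "radarr": ("/movies/", "/media/movies/"),
}


def to_host_path(arr: str, container_path: str) -> str:
    """Translate a path as seen inside the container to the path on aya01."""
    container_prefix, host_prefix = PATH_MAPPINGS[arr]
    if not container_path.startswith(container_prefix):
        raise ValueError(f"{container_path!r} is not under {container_prefix!r}")
    return host_prefix + container_path[len(container_prefix):]
```

Raising on an unexpected prefix, rather than passing the path through, keeps a mis-mapped path from being checked (or deleted) on the host by accident.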
Step 1 — Verify (generates /tmp/arr_verified.json)
python3 verify.py
Cross-references all downloads against Sonarr/Radarr APIs, verifies reported file paths exist on disk via SSH. Classifies each entry as safe, not_imported, or path_missing.
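The three-way classification can be illustrated as below. This is a sketch under assumptions: the real verify.py may order or name its checks differently, and `classify` is a hypothetical helper. The caller is assumed to derive `imported` from the API (episodeFileCount > 0 for Sonarr, hasFile for Radarr) and `path_exists` from the SSH check.

```python
def classify(imported: bool, path_exists: bool) -> str:
    """Classify one matched download as safe, not_imported, or path_missing."""
    if not imported:
        return "not_imported"  # the arr has no file for it; keep the download
    if not path_exists:
        return "path_missing"  # the arr claims a file, but it is not on disk
    return "safe"              # imported and verified on disk; download is redundant
```

Only entries classified as safe are candidates for deletion in Step 2.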
Step 2 — Delete confirmed-imported downloads
python3 cleanup.py --dry-run # preview
python3 cleanup.py --arr sonarr --yes
python3 cleanup.py --arr radarr --yes
Step 3 — Delete orphans (downloads not in Sonarr at all)
python3 cleanup-orphans.py --dry-run # preview
python3 cleanup-orphans.py --yes
All actions are logged to cleanup.log with UTC timestamp, size, title, path, and outcome.
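For orientation, a log line carrying those five fields could be built like this. The tab-separated layout and the function name are assumptions for illustration; the actual format of cleanup.log may differ.

```python
from datetime import datetime, timezone


def format_log_line(size_bytes, title, path, outcome, now=None):
    """Build one tab-separated log line: UTC timestamp, size, title, path, outcome."""
    moment = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    stamp = moment.strftime("%Y-%m-%dT%H:%M:%SZ")
    return "\t".join([stamp, str(size_bytes), title, path, outcome])
```

Normalising to UTC before formatting keeps entries comparable regardless of the timezone of the machine running the scripts.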
Cleanup Performed (2026-04-23)
Pass 1 — Orphans (downloads not in Sonarr)
Script: cleanup-orphans.py
Two-pass logic:
- Match each download name against the Sonarr API (title, slug, sortTitle, alternate titles, partial match)
- If there is no API match, check whether a series directory with a similar name exists in /media/series/; if it does, skip the entry (needs manual review)
- Delete the remaining true orphans
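The decision logic above can be sketched as follows. This is an approximation, not the code in cleanup-orphans.py: the series dicts use hypothetical keys (`title`, `slug`, `sort_title`, `alternate_titles` as a flat list of strings), and "partial match" is assumed to mean a word-boundary prefix match on normalised names.

```python
import re


def normalize(text: str) -> str:
    """Lowercase and collapse punctuation/separators to single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def name_matches(download_name: str, candidate: str) -> bool:
    """True on an exact or word-boundary prefix match of normalised names."""
    name, key = normalize(download_name), normalize(candidate)
    return bool(key) and (name == key or name.startswith(key + " "))


def decide(download_name, api_series, series_dirs) -> str:
    """Return "in_sonarr", "skip" (series dir exists, review manually) or "delete"."""
    for series in api_series:
        candidates = [series.get("title", ""), series.get("slug", ""),
                      series.get("sort_title", "")]
        candidates += series.get("alternate_titles", [])
        if any(name_matches(download_name, c) for c in candidates):
            return "in_sonarr"
    for directory in series_dirs:
        # Drop a trailing "(year) {imdb-...}" suffix before comparing.
        base = re.split(r"\s*[({]", directory)[0]
        if name_matches(download_name, base):
            return "skip"
    return "delete"
```

Prefix matching is deliberately loose: short titles will match unrelated downloads, which is one reason the directory check leads to a skip rather than a delete.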
Result: 49 deleted, 461.6G freed, 0 failed
111 entries SKIPPED (series dir found on disk) — includes Bleach, House, Lucifer, You, SpongeBob, Detective Conan episodes, What If, etc. See cleanup.log for full list.
Notable orphans deleted:
- Game of Thrones S01–S08 (~267G) — removed from Sonarr
- Sex Education S01–S04 (~110G) — removed from Sonarr
- Love Death & Robots (multiple duplicate copies, ~45G)
- Senpai is an Otokonoko, Wind Breaker, Wistoria, Hibike! Euphonium S3 episodes, etc.
Pass 2 — Confirmed-imported Sonarr downloads
Script: cleanup.py --arr sonarr --yes
Deleted downloads where Sonarr confirmed episodeFileCount > 0 AND the series directory was verified to exist on disk.
Result: 1106 deleted, 0 failed
Pass 3 — Confirmed-imported Radarr downloads
Script: cleanup.py --arr radarr --yes
Deleted downloads where Radarr confirmed hasFile=True AND the file/directory path was verified to exist on disk.
Result: 259 deleted, 0 failed
Summary
| Pass | Script | Entries | Space freed |
|---|---|---|---|
| Orphans | cleanup-orphans.py | 49 | ~461G |
| Sonarr imports | cleanup.py --arr sonarr | 1106 | ~12T (estimated) |
| Radarr imports | cleanup.py --arr radarr | 259 | ~4T (estimated) |
| Total | | 1414 | ~16T |
Verification Results (from verify.py run before cleanup)
| | Safe to delete | Not imported | Path missing | Orphans (no API match) |
|---|---|---|---|---|
| Sonarr (1439 downloads) | 1106 | — | — | 333 |
| Radarr (289 downloads) | 265 | — | — | 25 |
Note: cleanup-orphans.py uses more aggressive title matching (alternate titles, partial match) than verify.py, so its orphan count (160 not-in-Sonarr out of 1438) is lower than verify.py's 333.
Radarr Orphans (25) — not matched, not deleted
- Constantine (2005)
- Cowboy Bebop: Knockin' on Heaven's Door (2001)
- Les Misérables (2012)
- Pokémon Detective Pikachu (2019)
- Code Geass: Fukkatsu no Lelouch (2019)
- Eiga Go-Toubun no Hanayome (2022)
- Gisaengchung / Parasite — Korean title, matching failure
- Dune: Part One (2021) — matching failure, confirmed in Radarr
- Harry Potter older/duplicate copies — matching failure
- Porco Rosso / Kurenai no buta — matching failure
- Castle in the Sky / Laputa — matching failure
- Steins;Gate: The Movie — matching failure
- Project Silence / Talchul — matching failure
- Digimon: Frontier & Savers films
- One Piece films (several)
- Paripi Koumei movie
- Fantastic Four (2025) extra copies (3)
- JJK DCP trailer file
Path mismatch entries (confirmed safe, deleted anyway)
- Star Wars Episode IV/V/VI/IX — all matched to Episode IV record; manually confirmed all 4 dirs exist
- WALL·E: the · middle dot (U+00B7) broke string comparison; file confirmed on disk
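One way to make title comparison robust to characters like the middle dot is to compare on a normalised key. This is a suggested sketch, not what the scripts currently do; `comparison_key` is a hypothetical helper.

```python
import unicodedata


def comparison_key(title: str) -> str:
    """Reduce a title to case-folded letters and digits for comparison.

    NFKD splits accented letters into base letter plus combining mark; the
    filter then drops the marks along with punctuation such as U+00B7.
    """
    decomposed = unicodedata.normalize("NFKD", title)
    return "".join(ch for ch in decomposed if ch.isalnum()).casefold()
```

With such a key, "WALL·E" and "WALL-E" compare equal, and accented titles such as "Les Misérables" match their unaccented spellings.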
Pending Decisions
Bleach USBD Remux TL (1.8T)
/media/downloads/sonarr/Bleach USBD Remux TL — full lossless Bluray remux S00–S16 (-ZR- group).
Currently SKIPPED — /media/series/Bleach (2004) {imdb-tt0434665}/ exists (310G imported).
Most seasons were imported from lighter x265 Bluray packs (Bleach S0x Bluray EAC3 2.0 1080p x265-iVy) rather than this remux. S11 has no imported content. S13 and S14 partially imported.
Options:
- Delete — free 1.8T, imported x265 content stays, re-download at remux quality later if desired
- Keep — retain as source for Sonarr to import remaining episodes at lossless quality now that disk space is freed
Per-season breakdown saved in memory.
SKIPPED downloads (111 Sonarr entries)
Downloads where a matching series directory exists on disk but the series is not in Sonarr. Likely intentionally removed series (House, Lucifer, You, Black Clover, etc.) with leftover download copies. Needs manual review per series before deleting.
Permanent Fix (not applied)
Mount per-HDD NFS paths instead of the mergerfs path, so qBit downloads and arr imports land on the same physical filesystem, enabling hardlinks:
# In sonarr/radarr/qtun deployments, change:
path: /media/downloads → path: /mnt/hdd0/downloads
path: /media/series → path: /mnt/hdd0/series
path: /media/movies → path: /mnt/hdd0/movies
Jellyfin/Plex keep reading from /media/ (mergerfs union). New imports hardlink within hdd0, wasting no extra space.
Tradeoff: all new content lands on hdd0 only. Load balancing across the three disks stops working for new downloads. Once hdd0 fills up, a migration strategy is needed.