ansible/docs/runbooks/arr-cleanup/findings.md
Tuan-Dat Tran 5b44c46e10 docs(arr-cleanup): improve runbook and fix api key paths
Rewrites findings.md with how-to section, cleaner summary tables,
and more detailed per-pass results. Fixes relative path for
sonarr/radarr API key files after runbook moved deeper in repo.
2026-04-27 21:39:28 +02:00

# arr-stack Downloads Cleanup — Investigation Findings
## Storage Layout (aya01)
| Device | FS | Size | Used | Mount |
|--------|----|------|------|-------|
| `/dev/sdc3` | btrfs | 1.9T | 177G (10%) | `/` (system) |
| `/dev/sda1` | btrfs `proxmox` | 2.8T | 1.3T (48%) | `/opt` |
| `/dev/sdd1` | ext4 | 17T | 15T (92%) | `/mnt/hdd0` |
| `/dev/sde1` | ext4 | 17T | 15T (92%) | `/mnt/hdd2` |
| `/dev/sdf1` | ext4 | 17T | 15T (92%) | `/mnt/hdd1` |
| `mergerfs` | fuse | 49T | 43T (92%) | `/media` |
`/media` is a mergerfs union of hdd0 + hdd1 + hdd2. All three HDDs were at ~92% capacity before cleanup.
**After cleanup (2026-04-23):**
| Device | Used | Avail | Use% |
|--------|------|-------|------|
| `/dev/sdd1` (hdd0) | 9.4T | 6.2T | 61% |
| `/dev/sdf1` (hdd1) | 9.3T | 6.3T | 60% |
| `/dev/sde1` (hdd2) | 7.8T | 7.8T | 51% |
| `mergerfs /media` | 27T | 21T | 57% |
**~16T freed total** (92% → 57% on the mergerfs pool).
## /media Breakdown (before cleanup)
| Directory | Size |
|-----------|------|
| `downloads` | **22T** |
| `series` | 16T |
| `movies` | 5T |
## Root Cause: No Hardlinks → All Imports Are Copies
Zero hardlinked files exist anywhere across all three HDDs. Confirmed by two methods:
1. Inspecting the Kubernetes manifests in `argocd-homelab/services/arr-stack/`
2. Inode comparison of 1365 download/media file pairs — **0 shared inodes found** (every file is a distinct copy)
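The inode comparison in method 2 can be sketched roughly as follows. This is a minimal illustration, not the script that was actually run; the function names `is_hardlinked` and `count_shared_inodes` are hypothetical.

```python
import os

def is_hardlinked(path_a: str, path_b: str) -> bool:
    """Return True if both paths refer to the same inode on the same device."""
    a, b = os.stat(path_a), os.stat(path_b)
    # Inode numbers are only unique per filesystem, so compare the device too.
    return (a.st_dev, a.st_ino) == (b.st_dev, b.st_ino)

def count_shared_inodes(pairs) -> int:
    """Count (download, media) path pairs that are hardlinks of each other."""
    return sum(1 for download, media in pairs if is_hardlinked(download, media))
```

A count of 0 over all pairs means every imported file is an independent copy of its download.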
**All three services mount the mergerfs `/media/` path via NFS:**
```
sonarr: NFS 192.168.20.12:/media/downloads → /downloads
        NFS 192.168.20.12:/media/series    → /tv
radarr: NFS 192.168.20.12:/media/downloads → /downloads
        NFS 192.168.20.12:/media/movies    → /movies
qbit:   NFS 192.168.20.12:/media/downloads → /downloads
```
```
mergerfs cannot hardlink across its underlying filesystems. When qBit downloads to `/media/downloads/sonarr/` (landing on, say, hdd1) and Sonarr imports to `/media/series/` (landing on, say, hdd0), the hardlink attempt crosses a physical disk boundary and falls back to a copy. Every such import stores the data twice.
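The generic "try hardlink, fall back to copy" behaviour looks roughly like the sketch below. It is not Sonarr's or Radarr's actual import code; `link_or_copy` and its `link` parameter are hypothetical names used only to show how a cross-filesystem link fails with `EXDEV` and ends in a full copy.

```python
import errno
import os
import shutil

def link_or_copy(src: str, dst: str, link=os.link) -> str:
    """Hardlink src to dst; copy instead if the link crosses filesystems.

    Returns "hardlink" or "copy" so the caller can see which path was taken.
    The `link` argument exists only so the failure case can be simulated.
    """
    try:
        link(src, dst)
        return "hardlink"
    except OSError as exc:
        # EXDEV is the error for a link across different filesystems or mounts.
        if exc.errno != errno.EXDEV:
            raise
        shutil.copy2(src, dst)  # the data now exists twice on disk
        return "copy"
```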
**Estimated wasted space before cleanup: ~21T** (the entire downloads/sonarr + downloads/radarr).
## How to Run
Prerequisites:
```bash
# Port-forward Sonarr and Radarr APIs
kubectl -n arr-stack port-forward svc/sonarr 8989:8989 &
kubectl -n arr-stack port-forward svc/radarr 7878:7878 &
```
API keys are loaded from `../../../../sonarr.api.env` and `../../../../radarr.api.env`,
resolved relative to the runbook directory (for example `/home/tudattr/workspace/infra/sonarr.api.env`).
Container path mappings used in scripts:
- Sonarr: `/tv/` → `/media/series/`
- Radarr: `/movies/` → `/media/movies/`
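A minimal sketch of how such a mapping might be applied to paths returned by the APIs. The names `PATH_MAP` and `to_host_path` are hypothetical and are not taken from the actual scripts.

```python
# Container prefix -> host prefix, per service (values from the list above).
PATH_MAP = {
    "sonarr": ("/tv/", "/media/series/"),
    "radarr": ("/movies/", "/media/movies/"),
}

def to_host_path(arr: str, container_path: str) -> str:
    """Translate a path as seen inside the container to the path on the NFS host."""
    prefix, host_prefix = PATH_MAP[arr]
    if not container_path.startswith(prefix):
        raise ValueError(f"{container_path!r} is not under {prefix!r}")
    return host_prefix + container_path[len(prefix):]
```

For example, `to_host_path("sonarr", "/tv/Bleach (2004) {imdb-tt0434665}/")` yields `/media/series/Bleach (2004) {imdb-tt0434665}/`.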
### Step 1 — Verify (generates `/tmp/arr_verified.json`)
```bash
python3 verify.py
```
Cross-references all downloads against the Sonarr/Radarr APIs and verifies over SSH that the reported file paths exist on disk. Each entry is classified as `safe`, `not_imported`, or `path_missing`.
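The three-way classification could look like the sketch below. It assumes `has_files` is derived from `episodeFileCount > 0` (Sonarr) or `hasFile` (Radarr), and that `existing_paths` is a set of host paths already confirmed over SSH. The function name, its signature, and the order of the checks are assumptions, not the real `verify.py`.

```python
def classify(has_files: bool, reported_path: str, existing_paths: set) -> str:
    """Classify one download entry from the API state and the on-disk check."""
    if not has_files:
        # The service knows the title but reports no imported file yet.
        return "not_imported"
    if reported_path not in existing_paths:
        # The service claims an import, but the path was not found on disk.
        return "path_missing"
    return "safe"
```

Only `safe` entries are candidates for deletion in Step 2.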
### Step 2 — Delete confirmed-imported downloads
```bash
python3 cleanup.py --dry-run # preview
python3 cleanup.py --arr sonarr --yes
python3 cleanup.py --arr radarr --yes
```
### Step 3 — Delete orphans (downloads not in Sonarr at all)
```bash
python3 cleanup-orphans.py --dry-run # preview
python3 cleanup-orphans.py --yes
```
All actions are logged to `cleanup.log` with UTC timestamp, size, title, path, and outcome.
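One possible shape for such a log line is sketched below. The tab-separated layout, the `G` size suffix, and the outcome string are assumptions for illustration; the real `cleanup.log` format may differ.

```python
from datetime import datetime, timezone

def format_log_line(size_bytes: int, title: str, path: str, outcome: str, now=None) -> str:
    """Build one tab-separated log line: UTC timestamp, size, title, path, outcome."""
    now = now or datetime.now(timezone.utc)
    stamp = now.strftime("%Y-%m-%dT%H:%M:%SZ")
    size_gib = size_bytes / 1024**3
    return f"{stamp}\t{size_gib:.1f}G\t{title}\t{path}\t{outcome}"
```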
## Cleanup Performed (2026-04-23)
### Pass 1 — Orphans (downloads not in Sonarr)
Script: `cleanup-orphans.py`
Matching runs in two passes, followed by deletion:
1. Match each download name against Sonarr API (title, slug, sortTitle, alternate titles, partial match)
2. If no API match, check if a series directory with a similar name exists in `/media/series/` — if it does, skip (needs manual review)
3. Delete remaining true orphans
Result: **49 deleted, 461.6G freed, 0 failed**
111 entries SKIPPED (series dir found on disk) — includes Bleach, House, Lucifer, You, SpongeBob, Detective Conan episodes, What If, etc. See `cleanup.log` for full list.
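The decision flow above can be sketched as follows. This is a simplified stand-in for `cleanup-orphans.py`, not its actual code: `normalize`, `loosely_matches`, and `decide` are hypothetical names, `api_titles` is assumed to hold every title variant from the API (title, slug, sortTitle, alternate titles), and the substring test is a deliberately loose approximation of "partial match".

```python
import re

def normalize(name: str) -> str:
    """Drop bracketed tags such as (2004) or {imdb-...}, then keep lowercase alphanumerics."""
    stripped = re.sub(r"\(.*?\)|\{.*?\}|\[.*?\]", "", name)
    return "".join(ch for ch in stripped.casefold() if ch.isalnum())

def loosely_matches(a: str, b: str) -> bool:
    """True if either normalized name contains the other (both must be non-empty)."""
    a, b = normalize(a), normalize(b)
    return bool(a) and bool(b) and (a in b or b in a)

def decide(download_name: str, api_titles, series_dirs) -> str:
    # Pass 1: any title variant known to the API counts as a match.
    if any(loosely_matches(download_name, t) for t in api_titles):
        return "in_sonarr"
    # Pass 2: a similarly named series directory on disk means manual review.
    if any(loosely_matches(download_name, d) for d in series_dirs):
        return "skip"
    # Neither pass matched: treat as a true orphan.
    return "delete"
```

Substring matching is prone to false positives for short titles such as "You" or "House", which is one reason to prefer skipping over deleting when in doubt.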
Notable orphans deleted:
- Game of Thrones S01-S08 (~267G; removed from Sonarr)
- Sex Education S01-S04 (~110G; removed from Sonarr)
- Love Death & Robots (multiple duplicate copies, ~45G)
- Senpai is an Otokonoko, Wind Breaker, Wistoria, Hibike! Euphonium S3 episodes, etc.
### Pass 2 — Confirmed-imported Sonarr downloads
Script: `cleanup.py --arr sonarr --yes`
Deleted downloads where Sonarr confirmed `episodeFileCount > 0` AND the series directory was verified to exist on disk.
Result: **1106 deleted, 0 failed**
### Pass 3 — Confirmed-imported Radarr downloads
Script: `cleanup.py --arr radarr --yes`
Deleted downloads where Radarr confirmed `hasFile=True` AND the file/directory path was verified to exist on disk.
Result: **259 deleted, 0 failed**
### Summary
| Pass | Script | Entries | Space freed |
|------|--------|---------|-------------|
| Orphans | `cleanup-orphans.py` | 49 | ~461G |
| Sonarr imports | `cleanup.py --arr sonarr` | 1106 | ~12T (estimated) |
| Radarr imports | `cleanup.py --arr radarr` | 259 | ~4T (estimated) |
| **Total** | | **1414** | **~16T** |
## Verification Results (from verify.py run before cleanup)
| Service | Safe to delete | Not imported | Path missing | Orphans (no API match) |
|---|---|---|---|---|
| **Sonarr** (1439 downloads) | 1106 | — | — | 333 |
| **Radarr** (289 downloads) | 265 | — | — | 25 |
Note: `cleanup-orphans.py` uses more aggressive title matching (alternate titles, partial match) than `verify.py`, so its orphan count (160 not-in-Sonarr out of 1438) is lower than `verify.py`'s 333.
### Radarr Orphans (25) — not matched, not deleted
- Constantine (2005)
- Cowboy Bebop: Knockin' on Heaven's Door (2001)
- Les Misérables (2012)
- Pokémon Detective Pikachu (2019)
- Code Geass: Fukkatsu no Lelouch (2019)
- Eiga Go-Toubun no Hanayome (2022)
- Gisaengchung / Parasite — Korean title, matching failure
- Dune: Part One (2021) — matching failure, confirmed in Radarr
- Harry Potter older/duplicate copies — matching failure
- Porco Rosso / Kurenai no buta — matching failure
- Castle in the Sky / Laputa — matching failure
- Steins;Gate: The Movie — matching failure
- Project Silence / Talchul — matching failure
- Digimon: Frontier & Savers films
- One Piece films (several)
- Paripi Koumei movie
- Fantastic Four (2025) extra copies (3)
- JJK DCP trailer file
### Path mismatch entries (confirmed safe, deleted anyway)
- Star Wars Episode IV/V/VI/IX — all matched to Episode IV record; manually confirmed all 4 dirs exist
- WALL·E — `·` middle-dot (U+00B7) broke string comparison; file confirmed on disk
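Failures like the WALL·E one, and likely some of the accented or romanized titles in the orphan list, can often be avoided by comparing a normalized key instead of the raw string. The sketch below is one way to do that under the assumption that punctuation and accents carry no meaning for matching; `title_key` is a hypothetical helper, not part of the existing scripts.

```python
import unicodedata

def title_key(title: str) -> str:
    """Reduce a title to lowercase letters and digits for tolerant comparison."""
    # NFKD splits accented letters into a base letter plus a combining mark.
    decomposed = unicodedata.normalize("NFKD", title)
    # Keeping only alphanumerics drops combining marks, the middle dot
    # (U+00B7), hyphens, and whitespace.
    return "".join(ch for ch in decomposed.casefold() if ch.isalnum())
```

With this key, `WALL·E`, `WALL-E`, and `Wall E` all compare equal, as do `Les Misérables` and `Les Miserables`.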
## Pending Decisions
### Bleach USBD Remux TL (1.8T)
`/media/downloads/sonarr/Bleach USBD Remux TL` is a full lossless Blu-ray remux of S00-S16 (-ZR- group).
It is currently SKIPPED because `/media/series/Bleach (2004) {imdb-tt0434665}/` exists (310G imported).
Most seasons were imported from lighter x265 Bluray packs (`Bleach S0x Bluray EAC3 2.0 1080p x265-iVy`) rather than this remux. S11 has no imported content. S13 and S14 partially imported.
Options:
- **Delete** — free 1.8T, imported x265 content stays, re-download at remux quality later if desired
- **Keep** — retain as source for Sonarr to import remaining episodes at lossless quality now that disk space is freed
Per-season breakdown saved in memory.
### SKIPPED downloads (111 Sonarr entries)
Downloads where a matching series directory exists on disk but the series is not in Sonarr.
Likely intentionally removed series (House, Lucifer, You, Black Clover, etc.) with leftover download copies.
Needs manual review per series before deleting.
## Permanent Fix (not applied)
Mount per-HDD NFS paths instead of the mergerfs path, so qBit downloads and arr imports land on the same physical filesystem, enabling hardlinks:
```yaml
# In sonarr/radarr/qtun deployments, change each NFS volume's path
# (one line per volume; the old value is shown in the trailing comment):
path: /mnt/hdd0/downloads  # was: /media/downloads
path: /mnt/hdd0/series     # was: /media/series
path: /mnt/hdd0/movies     # was: /media/movies
```
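Jellyfin/Plex keep reading from `/media/` (mergerfs union). New imports hardlink within hdd0, wasting no extra space. This only works if the downloads and library directories are reachable through a single mount inside each container, because `link()` fails with `EXDEV` across mount points even when both mounts point at the same filesystem.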
Tradeoff: all new content lands on hdd0 only, so load balancing across the three disks stops working for new downloads. Once hdd0 fills up, a migration strategy is needed.