# arr-stack Downloads Cleanup: Investigation Findings

## Storage Layout (aya01)

| Device | FS | Size | Used | Mount |
|--------|----|------|------|-------|
| `/dev/sdc3` | btrfs | 1.9T | 177G (10%) | `/` (system) |
| `/dev/sda1` | btrfs `proxmox` | 2.8T | 1.3T (48%) | `/opt` |
| `/dev/sdd1` | ext4 | 17T | 15T (92%) | `/mnt/hdd0` |
| `/dev/sde1` | ext4 | 17T | 15T (92%) | `/mnt/hdd2` |
| `/dev/sdf1` | ext4 | 17T | 15T (92%) | `/mnt/hdd1` |
| `mergerfs` | fuse | 49T | 43T (92%) | `/media` |

`/media` is a mergerfs union of hdd0 + hdd1 + hdd2. All three HDDs were at ~92% capacity before cleanup.

**After cleanup (2026-04-23):**

| Device | Used | Avail | Use% |
|--------|------|-------|------|
| `/dev/sdd1` (hdd0) | 9.4T | 6.2T | 61% |
| `/dev/sdf1` (hdd1) | 9.3T | 6.3T | 60% |
| `/dev/sde1` (hdd2) | 7.8T | 7.8T | 51% |
| `mergerfs /media` | 27T | 21T | 57% |

**~16T freed total** (43T → 27T used, 92% → 57% on the mergerfs pool).

## /media Breakdown (before cleanup)

| Directory | Size |
|-----------|------|
| `downloads` | **22T** |
| `series` | 16T |
| `movies` | 5T |

## Root Cause: No Hardlinks → All Imports Are Copies

Zero hardlinked files exist anywhere across all three HDDs. Confirmed by two methods:
1. Inspecting the Kubernetes manifests in `argocd-homelab/services/arr-stack/`
2. Inode comparison of 1365 download/media file pairs: **0 shared inodes found** (every file is a distinct copy)

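The inode comparison in step 2 can be reproduced with a few lines of Python. This is a minimal sketch, not the script that produced the 1365-pair result. The function names are hypothetical, and it assumes the paths are checked on aya01 against the underlying disks, since inode numbers reported through NFS or mergerfs may differ from the on-disk ones.

```python
import os
from typing import Iterable, Tuple


def shares_inode(path_a: str, path_b: str) -> bool:
    """Return True if both paths are hardlinks to the same file."""
    a, b = os.stat(path_a), os.stat(path_b)
    # Inode numbers are only unique within one filesystem,
    # so the device ID has to match as well.
    return (a.st_dev, a.st_ino) == (b.st_dev, b.st_ino)


def count_shared(pairs: Iterable[Tuple[str, str]]) -> int:
    """Count (download, media) pairs that point at the same inode."""
    return sum(1 for download, media in pairs if shares_inode(download, media))
```

`os.path.samefile` performs the same device-and-inode comparison and could replace `shares_inode`.
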
**All three services mount the mergerfs `/media/` path via NFS:**

```
sonarr:  NFS 192.168.20.12:/media/downloads → /downloads
         NFS 192.168.20.12:/media/series    → /tv
radarr:  NFS 192.168.20.12:/media/downloads → /downloads
         NFS 192.168.20.12:/media/movies    → /movies
qbit:    NFS 192.168.20.12:/media/downloads → /downloads
```

mergerfs does not support hardlinks across underlying filesystems. When qBit downloads to `/media/downloads/sonarr/` (lands on e.g. hdd1) and Sonarr imports to `/media/series/` (lands on e.g. hdd0), the hardlink attempt crosses a physical disk boundary and the import falls back to a copy. Every import doubles the data.

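The fallback described above can be illustrated with a short Python sketch. It is not Sonarr's or Radarr's actual import code, and `link_or_copy` is a hypothetical name. It only shows the general pattern: `link(2)` fails with `EXDEV` when source and destination are on different filesystems or mount points, and the caller then copies instead.

```python
import errno
import os
import shutil


def link_or_copy(src: str, dst: str) -> str:
    """Try to hardlink src to dst; fall back to a full copy on EXDEV."""
    try:
        os.link(src, dst)
        return "hardlink"
    except OSError as exc:
        # EXDEV means "cross-device link": the two paths are not on the
        # same mounted filesystem. Any other error is re-raised.
        if exc.errno != errno.EXDEV:
            raise
        shutil.copy2(src, dst)  # second full copy of the data
        return "copy"
```

A `"copy"` result for every import would be consistent with the zero shared inodes found above.
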
**Estimated wasted space before cleanup: ~21T** (the entire `downloads/sonarr` + `downloads/radarr`).

## How to Run

Prerequisites:
```bash
# Port-forward Sonarr and Radarr APIs
kubectl -n arr-stack port-forward svc/sonarr 8989:8989 &
kubectl -n arr-stack port-forward svc/radarr 7878:7878 &
```

API keys are loaded from `../../../../sonarr.api.env` and `../../../../radarr.api.env`
(for the Sonarr key, that resolves to `/home/tudattr/workspace/infra/sonarr.api.env`).

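As a rough sketch of how a script might resolve and read those key files: the helper names below are hypothetical, the four `..` components are assumed to be relative to the script's own directory, and the files are assumed to contain simple `KEY=value` lines. None of this is confirmed by the scripts themselves.

```python
from pathlib import Path


def key_file(script_dir: Path, name: str) -> Path:
    """Resolve an env file four levels above script_dir (assumed layout)."""
    return (script_dir / ".." / ".." / ".." / ".." / name).resolve()


def load_env_file(path: Path) -> dict:
    """Parse KEY=value lines; blank lines and # comments are skipped."""
    values = {}
    for raw in path.read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values
```

A call such as `load_env_file(key_file(Path(__file__).parent, "sonarr.api.env"))` would then return the parsed values as a dictionary.
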
Container path mappings used in scripts:
- Sonarr: `/tv/` → `/media/series/`
- Radarr: `/movies/` → `/media/movies/`

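The mappings above amount to a prefix swap. A minimal sketch, assuming the scripts translate API-reported container paths to host paths roughly like this (`PATH_MAP` and `to_host_path` are illustrative names, not taken from the scripts):

```python
# Container prefix -> host prefix, per service (from the list above).
PATH_MAP = {
    "sonarr": ("/tv/", "/media/series/"),
    "radarr": ("/movies/", "/media/movies/"),
}


def to_host_path(arr: str, container_path: str) -> str:
    """Translate a path reported by the Sonarr/Radarr API to the host path."""
    prefix, host_prefix = PATH_MAP[arr]
    if not container_path.startswith(prefix):
        raise ValueError(f"{container_path!r} is not under {prefix!r}")
    return host_prefix + container_path[len(prefix):]
```

For example, `to_host_path("radarr", "/movies/Constantine (2005)")` gives `/media/movies/Constantine (2005)`.
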
### Step 1: Verify (generates `/tmp/arr_verified.json`)
```bash
python3 verify.py
```
Cross-references all downloads against the Sonarr/Radarr APIs and verifies over SSH that the reported file paths exist on disk. Classifies each entry as `safe`, `not_imported`, or `path_missing`.

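The three-way classification can be sketched as follows. The exact rules inside `verify.py` are not shown in this document, so the mapping below is an assumption based on the pass descriptions (API reports files, path exists on disk). `remote_path_exists` is a hypothetical helper that runs `test -e` on the host over SSH.

```python
import shlex
import subprocess


def remote_path_exists(host: str, path: str) -> bool:
    """Check a path on the remote host; requires working SSH access."""
    # The remote shell re-parses the command, so the path must be quoted.
    result = subprocess.run(["ssh", host, "test -e " + shlex.quote(path)])
    return result.returncode == 0


def classify(has_files: bool, path_exists: bool) -> str:
    """Map the two checks onto the labels used by the verify step."""
    if not has_files:
        return "not_imported"  # the API reports no imported files
    if not path_exists:
        return "path_missing"  # the API reports files, but not found on disk
    return "safe"
```

Only `safe` entries would be candidates for deletion in Step 2.
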
### Step 2: Delete confirmed-imported downloads
```bash
python3 cleanup.py --dry-run           # preview
python3 cleanup.py --arr sonarr --yes
python3 cleanup.py --arr radarr --yes
```

### Step 3: Delete orphans (downloads not in Sonarr at all)
```bash
python3 cleanup-orphans.py --dry-run   # preview
python3 cleanup-orphans.py --yes
```

All actions are logged to `cleanup.log` with UTC timestamp, size, title, path, and outcome.

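A log entry with those five fields could be produced as in the sketch below. The tab-separated layout and the name `format_log_line` are assumptions for illustration. The real format of `cleanup.log` is not documented here.

```python
from datetime import datetime, timezone
from typing import Optional


def format_log_line(size_bytes: int, title: str, path: str, outcome: str,
                    now: Optional[datetime] = None) -> str:
    """Build one tab-separated log line: timestamp, size, title, path, outcome."""
    moment = now or datetime.now(timezone.utc)
    stamp = moment.strftime("%Y-%m-%dT%H:%M:%SZ")  # UTC timestamp
    size_gib = size_bytes / 1024 ** 3
    return f"{stamp}\t{size_gib:.1f}G\t{title}\t{path}\t{outcome}"
```

Passing `now` explicitly keeps the function deterministic, which makes it easy to test.
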
## Cleanup Performed (2026-04-23)

### Pass 1: Orphans (downloads not in Sonarr)
Script: `cleanup-orphans.py`

Two matching passes, then deletion:
1. Match each download name against the Sonarr API (title, slug, sortTitle, alternate titles, partial match)
2. If there is no API match, check whether a series directory with a similar name exists in `/media/series/`. If it does, skip the download (needs manual review)
3. Delete the remaining true orphans

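The first matching pass might look roughly like the sketch below. It assumes each series is a dictionary with `title`, `sortTitle`, `titleSlug`, and `alternateTitles` keys, and the helper names are hypothetical. The actual matching rules in `cleanup-orphans.py` may differ.

```python
import re


def normalize(text: str) -> str:
    """Lowercase and collapse every non-alphanumeric run to one space."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def candidate_titles(series: dict) -> set:
    """Collect every known name for a series, normalized."""
    names = [series.get("title", ""), series.get("sortTitle", ""),
             series.get("titleSlug", "")]
    names += [alt.get("title", "") for alt in series.get("alternateTitles", [])]
    return {normalize(name) for name in names} - {""}


def matches(download_name: str, series: dict) -> bool:
    """Partial match: a known title appears as whole words in the name."""
    padded = f" {normalize(download_name)} "
    return any(f" {title} " in padded for title in candidate_titles(series))
```

Matching on whole words rather than raw substrings keeps a short title such as "You" from matching a download named "Your Name".
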
Result: **49 deleted, 461.6G freed, 0 failed**

111 entries SKIPPED (series dir found on disk), including Bleach, House, Lucifer, You, SpongeBob, Detective Conan episodes, What If, etc. See `cleanup.log` for the full list.

Notable orphans deleted:
- Game of Thrones S01–S08 (~267G): series removed from Sonarr
- Sex Education S01–S04 (~110G): series removed from Sonarr
- Love Death & Robots (multiple duplicate copies, ~45G)
- Senpai is an Otokonoko, Wind Breaker, Wistoria, Hibike! Euphonium S3 episodes, etc.

### Pass 2: Confirmed-imported Sonarr downloads
Script: `cleanup.py --arr sonarr --yes`

Deleted downloads where Sonarr confirmed `episodeFileCount > 0` AND the series directory was verified to exist on disk.

Result: **1106 deleted, 0 failed**

### Pass 3: Confirmed-imported Radarr downloads
Script: `cleanup.py --arr radarr --yes`

Deleted downloads where Radarr confirmed `hasFile=True` AND the file/directory path was verified to exist on disk.

Result: **259 deleted, 0 failed**

### Summary
| Pass | Script | Entries | Space freed |
|------|--------|---------|-------------|
| Orphans | `cleanup-orphans.py` | 49 | 461.6G |
| Sonarr imports | `cleanup.py --arr sonarr` | 1106 | ~12T (estimated) |
| Radarr imports | `cleanup.py --arr radarr` | 259 | ~4T (estimated) |
| **Total** | | **1414** | **~16T** |

## Verification Results (from `verify.py` run before cleanup)

| | Safe to delete | Not imported | Path missing | Orphans (no API match) |
|---|---|---|---|---|
| **Sonarr** (1439 downloads) | 1106 | - | - | 333 |
| **Radarr** (289 downloads) | 265 | - | - | 25 |

Note: `cleanup-orphans.py` uses more aggressive title matching (alternate titles, partial match) than `verify.py`, so its orphan count (160 not-in-Sonarr out of 1438) is lower than `verify.py`'s 333.

### Radarr Orphans (25): not matched, not deleted
- Constantine (2005)
- Cowboy Bebop: Knockin' on Heaven's Door (2001)
- Les Misérables (2012)
- Pokémon Detective Pikachu (2019)
- Code Geass: Fukkatsu no Lelouch (2019)
- Eiga Go-Toubun no Hanayome (2022)
- Gisaengchung / Parasite: Korean title, matching failure
- Dune: Part One (2021), matching failure (confirmed in Radarr)
- Harry Potter older/duplicate copies: matching failure
- Porco Rosso / Kurenai no buta: matching failure
- Castle in the Sky / Laputa: matching failure
- Steins;Gate: The Movie, matching failure
- Project Silence / Talchul: matching failure
- Digimon: Frontier & Savers films
- One Piece films (several)
- Paripi Koumei movie
- Fantastic Four (2025) extra copies (3)
- JJK DCP trailer file

### Path mismatch entries (confirmed safe, deleted anyway)
- Star Wars Episode IV/V/VI/IX: all four matched to the Episode IV record; manually confirmed that all 4 dirs exist
- WALL·E: the `·` middle dot (U+00B7) broke string comparison; file confirmed on disk

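One way to make title comparison robust against characters like the middle dot is to compare a normalized key instead of the raw string. This is a hedged sketch of that idea, not a description of what the scripts do, and `title_key` is a hypothetical name. It would not address the Star Wars case, where several downloads matched one record.

```python
import re
import unicodedata


def title_key(title: str) -> str:
    """Reduce a title to lowercase ASCII letters and digits for comparison."""
    # NFKD splits accented letters into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", title)
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Drop everything else that is not a letter or digit (dots, dashes, U+00B7).
    return re.sub(r"[^a-z0-9]+", "", no_marks.casefold())
```

With this key, `WALL·E`, `WALL-E`, and `Wall E` all compare equal.
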
## Pending Decisions

### Bleach USBD Remux TL (1.8T)
`/media/downloads/sonarr/Bleach USBD Remux TL`: full lossless Bluray remux S00–S16 (-ZR- group).

Currently SKIPPED: `/media/series/Bleach (2004) {imdb-tt0434665}/` exists (310G imported).

Most seasons were imported from lighter x265 Bluray packs (`Bleach S0x Bluray EAC3 2.0 1080p x265-iVy`) rather than this remux. S11 has no imported content. S13 and S14 are only partially imported.

Options:
- **Delete**: free 1.8T, imported x265 content stays, re-download at remux quality later if desired
- **Keep**: retain as source for Sonarr to import remaining episodes at lossless quality now that disk space is freed

Per-season breakdown saved in memory.

### SKIPPED downloads (111 Sonarr entries)
Downloads where a matching series directory exists on disk but the series is not in Sonarr.
Likely intentionally removed series (House, Lucifer, You, Black Clover, etc.) with leftover download copies.
Needs manual review per series before deleting.

## Permanent Fix (not applied)

Mount per-HDD NFS paths instead of the mergerfs path, so that qBit downloads and arr imports land on the same physical filesystem, which enables hardlinks:

```yaml
# In sonarr/radarr/qtun deployments, change each NFS volume path:
path: /mnt/hdd0/downloads   # was: /media/downloads
path: /mnt/hdd0/series      # was: /media/series
path: /mnt/hdd0/movies      # was: /media/movies
```

Jellyfin/Plex keep reading from `/media/` (mergerfs union). New imports hardlink within hdd0, wasting no extra space.

Tradeoff: all new content lands on hdd0 only. Load balancing across the three disks stops working for new downloads. Once hdd0 fills up, a migration strategy is needed.

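Before relying on this change, it may be worth probing whether a hardlink actually succeeds between the download and library directories as the container sees them. `link(2)` can also fail with `EXDEV` across separate mount points, even when both are backed by the same filesystem. The sketch below is a hypothetical helper, not part of the existing scripts.

```python
import os
import tempfile


def can_hardlink(src_dir: str, dst_dir: str) -> bool:
    """Return True if a file created in src_dir can be hardlinked into dst_dir."""
    fd, src = tempfile.mkstemp(prefix=".hardlink-probe-", dir=src_dir)
    os.close(fd)
    dst = os.path.join(dst_dir, os.path.basename(src))
    try:
        os.link(src, dst)
    except OSError:
        return False  # e.g. EXDEV (cross-device) or EPERM
    else:
        os.unlink(dst)  # remove the probe link again
        return True
    finally:
        os.unlink(src)  # always remove the probe file
```

Run inside the Sonarr container, `can_hardlink("/downloads", "/tv")` returning `False` would indicate that imports still fall back to copies.
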