feat: smart batch processing with skip logic

- Change --batch to accept directory instead of glob pattern
- Automatically skip already-processed scan dates
- Add --force flag to reprocess all files
- Fix date extraction regex to parse from client info line
- Display helpful tips about skipping/forcing
- Improve user feedback with skip counts and suggestions

Usage:
  python dexa_extract.py --batch data/pdfs --height-in 74 --outdir data/results

This will process only new scans, skipping any dates already in the output.
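
A minimal sketch of how the skip logic described above could work; it is not the code from this commit. It assumes `overall.csv` in the output directory has one row per scan with a date column (named `scan_date` here, a hypothetical name) and that the caller supplies a `get_scan_date` function that returns the scan date for a PDF. The helper names `load_processed_dates` and `select_pdfs` are illustrative as well.

```python
import csv
from pathlib import Path


def load_processed_dates(outdir, date_column="scan_date"):
    """Return the set of scan dates already recorded in overall.csv.

    The column name is an assumption; adjust it to the real CSV header.
    """
    overall = Path(outdir) / "overall.csv"
    if not overall.exists():
        return set()
    with overall.open(newline="") as fh:
        return {row[date_column] for row in csv.DictReader(fh) if row.get(date_column)}


def select_pdfs(batch_dir, outdir, get_scan_date, force=False):
    """Split the PDFs in batch_dir into (to_process, skipped).

    get_scan_date is a caller-supplied function mapping a PDF path to its
    scan date string. With force=True nothing is skipped.
    """
    done = set() if force else load_processed_dates(outdir)
    to_process, skipped = [], []
    for pdf in sorted(Path(batch_dir).glob("*.pdf")):
        if get_scan_date(pdf) in done:
            skipped.append(pdf)
        else:
            to_process.append(pdf)
    return to_process, skipped
```

The length of `skipped` gives the skip count to report to the user, and a non-empty `skipped` list is the natural point to suggest `--force`.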
Mac DeCourcy 2025-10-06 15:33:05 -07:00
parent d6793e2572
commit b046af5d25
3 changed files with 342 additions and 38 deletions


@@ -1,18 +0,0 @@
# Results Directory
Your extracted DEXA data will be saved here by default.
## Output Files
When you run the extraction script with `--outdir data/results`, you'll get:
- `overall.csv` - Time-series data (one row per scan)
- `regional.csv` - Regional body composition
- `muscle_balance.csv` - Left/right limb comparison
- `overall.json` - Structured JSON format
- `summary.md` - Human-readable summary
## Note
⚠️ **Result files are gitignored** - They contain your personal health data and won't be committed to version control.