feat: smart batch processing with skip logic

- Change --batch to accept directory instead of glob pattern
- Automatically skip already-processed scan dates
- Add --force flag to reprocess all files
- Fix date extraction regex to parse from client info line
- Display helpful tips about skipping/forcing
- Better user feedback with skip counts and suggestions

Usage:
  python dexa_extract.py --batch data/pdfs --height-in 74 --outdir data/results

This will process only new scans, skipping any dates already in the output.
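
For reviewers, the skip behaviour described above can be pictured roughly as follows. This is a minimal sketch under stated assumptions, not the code in `dexa_extract.py`: the `scan_date` column name and the helpers `load_processed_dates` and `plan_batch` are hypothetical, and the sketch presumes the scan date of each PDF has already been extracted.

```python
import csv
from pathlib import Path


def load_processed_dates(overall_csv: Path, date_column: str = "scan_date") -> set:
    """Return the scan dates already present in the output CSV.

    `date_column` is an assumed column name; adjust it to the real header.
    """
    if not overall_csv.exists():
        # Nothing has been processed yet, so nothing can be skipped.
        return set()
    with overall_csv.open(newline="") as fh:
        return {
            row[date_column]
            for row in csv.DictReader(fh)
            if row.get(date_column)
        }


def plan_batch(scan_dates: dict, processed: set, force: bool = False):
    """Split PDFs into (to_process, skipped).

    `scan_dates` maps each PDF path to its already-extracted scan date.
    With force=True every file is reprocessed and nothing is skipped.
    """
    to_process, skipped = [], []
    for pdf, date in sorted(scan_dates.items()):
        if force or date not in processed:
            to_process.append(pdf)
        else:
            skipped.append(pdf)
    return to_process, skipped
```

Keeping the decision in a pure function such as `plan_batch` would make the skip count easy to report to the user and easy to test without touching any PDFs.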
parent d6793e2572
commit b046af5d25
3 changed files with 342 additions and 38 deletions
48
README.md
@@ -66,7 +66,16 @@ python dexa_extract.py <PDF_PATH> --height-in <HEIGHT> [--weight-lb <WEIGHT>] [-
python dexa_extract.py data/pdfs/2025-10-06-scan.pdf --height-in 74 --weight-lb 212 --outdir data/results
```

**Process multiple scans** (appends to existing files):
**Batch process multiple scans:**
```bash
# Process all PDFs in a directory (automatically skips already-processed dates)
python dexa_extract.py --batch data/pdfs --height-in 74 --outdir data/results

# Force reprocessing all files
python dexa_extract.py --batch data/pdfs --height-in 74 --outdir data/results --force
```

**Individual scans** (appends to existing files):
```bash
python dexa_extract.py data/pdfs/scan-2025-01.pdf --height-in 74 --outdir data/results
python dexa_extract.py data/pdfs/scan-2025-04.pdf --height-in 74 --outdir data/results
@@ -247,10 +256,35 @@ Higher trunk percentage may indicate good core development, while higher leg per

The script appends data to existing CSV files, making it easy to track changes over time:

1. Place all your DEXA PDFs in `data/pdfs/`
2. Process each one with the same output directory
3. Open `overall.csv` in Excel/Google Sheets to visualize trends
4. Compare `muscle_balance.csv` to track left/right symmetry improvements
### Option 1: Batch Processing (Recommended)
```bash
# Place all your PDFs in one directory
data/pdfs/
├── scan-2025-01-15.pdf
├── scan-2025-04-20.pdf
└── scan-2025-10-06.pdf

# Process all at once (automatically skips already-processed dates)
python dexa_extract.py --batch data/pdfs --height-in 74 --outdir data/results

# Add new scans later - only new ones will be processed
cp ~/Downloads/scan-2025-12-15.pdf data/pdfs/
python dexa_extract.py --batch data/pdfs --height-in 74 --outdir data/results
```

### Option 2: Individual Processing
```bash
# Process scans as you get them
python dexa_extract.py data/pdfs/scan-2025-01.pdf --height-in 74 --outdir data/results
python dexa_extract.py data/pdfs/scan-2025-04.pdf --height-in 74 --outdir data/results
python dexa_extract.py data/pdfs/scan-2025-10.pdf --height-in 74 --outdir data/results
```

### Analyzing Results
1. Open `overall.csv` in Excel/Google Sheets to visualize trends
2. Compare `muscle_balance.csv` to track left/right symmetry improvements
3. Review `summary.md` for readable reports of each scan
4. Use `overall.json` for programmatic analysis

## Privacy & Security

@@ -281,12 +315,12 @@ The script appends data to existing CSV files, making it easy to track changes o

Contributions welcome! Areas for improvement:

- [ ] Enhanced error handling and validation
- [ ] Automatic height detection from PDF
- [ ] Data visualization/plotting features
- [ ] GUI interface for non-technical users
- [ ] Batch processing multiple PDFs at once
- [ ] Export to additional formats (Excel, SQLite, etc.)
- [ ] Support for older BodySpec PDF formats
- [ ] Progress bar for batch processing

## License