- Change --batch to accept a directory instead of a glob pattern
- Automatically skip already-processed scan dates
- Add --force flag to reprocess all files
- Fix the date extraction regex to parse the scan date from the client info line
- Display tips explaining how already-processed scans are skipped and how to force reprocessing
- Improve user feedback with skip counts and suggestions
Usage:
python dexa_extract.py --batch data/pdfs --height-in 74 --outdir data/results
This processes only new scans and skips any scan dates already present in the output.
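A minimal sketch of the directory-based skip logic described above. It assumes the set of already-processed scan dates has been loaded from the output beforehand. `select_pdfs_to_process` and its `date_for_pdf` callback are hypothetical names, not the script's actual API. In dexa_extract.py the date would come from the client info line; the demo below reads it from the filename stem only to stay self-contained.

```python
import tempfile
from pathlib import Path


def select_pdfs_to_process(pdf_dir, processed_dates, date_for_pdf, force=False):
    """Split the PDFs in pdf_dir into (to_process, skipped) lists.

    Hypothetical helper: `processed_dates` is the set of scan dates already in
    the output, and `date_for_pdf` maps a PDF path to its scan date.
    """
    # Only regular files with a .pdf extension (case-insensitive) are considered.
    pdfs = sorted(
        p for p in Path(pdf_dir).iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf"
    )
    to_process, skipped = [], []
    for pdf in pdfs:
        # With force=True every PDF is reprocessed, mirroring the --force flag.
        if not force and date_for_pdf(pdf) in processed_dates:
            skipped.append(pdf)
        else:
            to_process.append(pdf)
    return to_process, skipped


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for name in ("2024-01-05.pdf", "2024-03-10.pdf", "notes.txt"):
            (Path(tmp) / name).touch()
        done = {"2024-01-05"}
        # Demo only: the filename stem stands in for the extracted scan date.
        todo, skipped = select_pdfs_to_process(tmp, done, lambda p: p.stem)
        print(f"Processing {len(todo)} file(s), skipped {len(skipped)}")
        if skipped:
            print("Tip: pass --force to reprocess skipped scans.")
```

Keeping the date lookup as a callback separates the directory scan from PDF parsing, so the skip decision can be checked without real scan files.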
- Add input validation for PDF files, height, and weight
- Validate PDF file exists, is a file, and has .pdf extension
- Check height range (36-96 inches) and weight range (50-500 lbs)
- Add warnings for missing critical data
- Improve user feedback with emojis and clear error messages
- Improve output formatting with file descriptions
- Catch and handle PDF reading errors gracefully
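The validation rules above can be sketched as follows. The function and constant names are hypothetical. Treating the 36-96 inch and 50-500 lb bounds as inclusive is an assumption, since the commit message does not say whether the endpoints are allowed.

```python
from pathlib import Path

# Bounds from the commit message; inclusivity is an assumption.
HEIGHT_RANGE_IN = (36, 96)
WEIGHT_RANGE_LB = (50, 500)


def validate_pdf_path(path):
    """Return a Path if it exists, is a regular file, and ends in .pdf; else raise ValueError."""
    p = Path(path)
    if not p.exists():
        raise ValueError(f"PDF not found: {p}")
    if not p.is_file():
        raise ValueError(f"Not a file: {p}")
    if p.suffix.lower() != ".pdf":
        raise ValueError(f"Expected a .pdf file, got: {p.name}")
    return p


def validate_range(value, bounds, label):
    """Return value unchanged if it lies within bounds (inclusive); else raise ValueError."""
    lo, hi = bounds
    if not lo <= value <= hi:
        raise ValueError(f"{label} {value} is outside the expected range {lo}-{hi}")
    return value


if __name__ == "__main__":
    validate_range(74, HEIGHT_RANGE_IN, "Height (in)")
    try:
        validate_range(30, HEIGHT_RANGE_IN, "Height (in)")
    except ValueError as err:
        print(f"❌ {err}")
```

Raising `ValueError` with a descriptive message lets the command-line entry point decide how to present the error, for example by prefixing it with an emoji as the commit describes.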