mac/bodyspec-insights

Author	SHA1	Message	Date
Mac DeCourcy	130f0ba994	Refactor: implement pattern dictionary for PDF extraction. Major improvements to extraction code. PATTERN DICTIONARY APPROACH: centralized all extraction patterns into EXTRACTION_PATTERNS dict; each field now self-documents with description, required status, cast function; multiple patterns per field with automatic fallback (e.g., summary table → individual field); validation built in: reports missing required vs. optional fields. NEW FUNCTIONS: extract_field() tries multiple patterns with fallback logic; extract_all_fields() extracts all defined fields with validation; comprehensive docstrings explaining the approach. BENEFITS: ✅ Self-documenting (each pattern describes what it extracts); ✅ Maintainable (add new fields by adding one dict entry); ✅ Robust (automatic fallback if primary pattern fails); ✅ Validated (instant feedback on missing required fields); ✅ Type-safe (cast functions ensure correct data types). TESTING: all existing tests pass; single-file mode ✅; batch mode ✅; data extraction ✅ identical to previous version. Code grew by ~160 lines but with significant improvements in readability (clear field definitions), maintainability (centralized patterns), extensibility (easy to add new fields), and debuggability (validation reports).	2025-10-06 17:59:27 -07:00
Mac DeCourcy	2c17d86fe7	Refactor: eliminate duplicated code in PDF processing. Remove ~200 lines of duplicate code between single-file and batch processing; consolidate all PDF processing logic into process_single_pdf() function; add batch_mode parameter to control output formatting; single-file and batch modes now use the same code path; improves maintainability and reduces chance of inconsistencies. Net reduction: 202 lines deleted, 56 lines added (-146 lines total).	2025-10-06 17:47:46 -07:00
Mac DeCourcy	37267fbf34	Add lean percentage to regional data. Add LeanPercent column to regional.csv matching BodySpec reports; calculate lean percentage from lean tissue (excluding BMC) for accuracy; update JSON output to include lean_percent for each region; document new column in README; values now match BodySpec regional reports (e.g., Arms: 73.7%, Legs: 72.7%, Trunk: 64.4%).	2025-10-06 17:44:16 -07:00
Mac DeCourcy	b046af5d25	feat: smart batch processing with skip logic. Change --batch to accept directory instead of glob pattern; automatically skip already-processed scan dates; add --force flag to reprocess all files; fix date extraction regex to parse from client info line; display helpful tips about skipping/forcing; better user feedback with skip counts and suggestions. Usage: python dexa_extract.py --batch data/pdfs --height-in 74 --outdir data/results (this will process only new scans, skipping any dates already in the output).	2025-10-06 15:33:05 -07:00
Mac DeCourcy	d6793e2572	feat: add comprehensive error handling and validation. Add input validation for PDF files, height, and weight; validate PDF file exists, is a file, and has .pdf extension; check height range (36-96 inches) and weight range (50-500 lbs); add warnings for missing critical data; improve user feedback with emojis and clear error messages; better output formatting with file descriptions; catch and handle PDF reading errors gracefully.	2025-10-06 15:24:11 -07:00
Mac DeCourcy	c7d0255f61	Initial commit: BodySpec Insights - comprehensive DEXA analytics tool	2025-10-06 14:32:25 -07:00
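
The pattern-dictionary approach described in commit 130f0ba994 can be sketched roughly as follows. This is a minimal illustration, not the repository's code: only the names EXTRACTION_PATTERNS, extract_field() and extract_all_fields() come from the commit message, while the field names, regexes, dict keys and function signatures below are hypothetical.

```python
import re

# Hypothetical field definitions. The real EXTRACTION_PATTERNS entries are not
# shown in the log; these field names and regexes are assumptions.
EXTRACTION_PATTERNS = {
    "total_body_fat_percent": {
        "description": "Total body fat percentage",
        "required": True,
        "cast": float,
        "patterns": [
            r"Summary\s+Total\s+([\d.]+)%",          # primary: summary table row
            r"Total Body Fat %\s*[:=]?\s*([\d.]+)",  # fallback: individual field
        ],
    },
    "visceral_fat_lbs": {
        "description": "Visceral adipose tissue mass in pounds",
        "required": False,
        "cast": float,
        "patterns": [r"VAT Mass\s*\(lbs\)\s*([\d.]+)"],
    },
}


def extract_field(text, spec):
    """Try each pattern in order; return the first value that matches and casts."""
    for pattern in spec["patterns"]:
        match = re.search(pattern, text)
        if match:
            try:
                return spec["cast"](match.group(1))
            except ValueError:
                continue  # matched text could not be cast; try the next pattern
    return None


def extract_all_fields(text, patterns=EXTRACTION_PATTERNS):
    """Extract every defined field and report which ones are missing."""
    values, missing_required, missing_optional = {}, [], []
    for name, spec in patterns.items():
        value = extract_field(text, spec)
        if value is None:
            (missing_required if spec["required"] else missing_optional).append(name)
        else:
            values[name] = value
    return values, missing_required, missing_optional
```

With this layout, adding a field means adding one dict entry, and a caller can decide whether a non-empty missing_required list should abort processing or only produce a warning.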

6 commits
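
The input checks listed in commit d6793e2572 might look something like the sketch below. The ranges (36-96 inches, 50-500 lbs) and the three PDF checks come from the commit message; the function name validate_inputs(), its signature, the inclusive bounds, the optional weight argument and the message wording are all assumptions made for illustration.

```python
from pathlib import Path

# Ranges taken from the commit message; treating them as inclusive is an assumption.
HEIGHT_RANGE_IN = (36, 96)
WEIGHT_RANGE_LB = (50, 500)


def validate_inputs(pdf_path, height_in, weight_lb=None):
    """Return a list of error messages; an empty list means the inputs look valid."""
    errors = []

    # PDF checks: exists, is a regular file, has a .pdf extension.
    path = Path(pdf_path)
    if not path.exists():
        errors.append(f"PDF not found: {path}")
    elif not path.is_file():
        errors.append(f"Not a file: {path}")
    elif path.suffix.lower() != ".pdf":
        errors.append(f"Expected a .pdf file, got: {path.name}")

    # Height check in inches.
    low, high = HEIGHT_RANGE_IN
    if not low <= height_in <= high:
        errors.append(f"Height {height_in} in is outside {low}-{high} inches")

    # Weight check in pounds (hypothetically optional here).
    if weight_lb is not None:
        low, high = WEIGHT_RANGE_LB
        if not low <= weight_lb <= high:
            errors.append(f"Weight {weight_lb} lb is outside {low}-{high} lbs")

    return errors
```

Collecting all errors before returning, rather than raising on the first one, lets the tool report every problem in a single run.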