Commit graph

7 commits

Author SHA1 Message Date
Mac DeCourcy
c7d2e32d7d Add MeasuredDate column to all CSV files
CHANGES:
- Added MeasuredDate as first column in regional.csv
- Added MeasuredDate as first column in muscle_balance.csv
- Updated README to document new column structure

BENEFITS:
- Track regional changes over time (e.g., Arms fat % across scans)
- Easy time-series analysis with pandas/Excel
- Filter by date range for progress tracking
- Consistent date column across all 3 CSV files
- Enables queries like: 'Show me trunk fat % over last 6 months'

EXAMPLE USAGE:
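  import pandas as pd
  # Parse MeasuredDate as dates so rows sort and filter chronologically
  regional = pd.read_csv('regional.csv', parse_dates=['MeasuredDate'])
  arms = regional[regional['Region'] == 'Arms'].sort_values('MeasuredDate')
  # arms now holds one row per scan, ordered by date, for tracking Arms progress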
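
Without pandas, the date-range filtering listed under BENEFITS could be sketched with the standard library alone. This is a minimal sketch, not the project's code: the sample rows are made up for illustration, the ISO YYYY-MM-DD date format is an assumption, and rows_in_range is a hypothetical helper. Only the MeasuredDate, Region and LeanPercent column names come from the commit messages.

```python
import csv
import io
from datetime import date

# Illustrative rows only; the real regional.csv has more columns.
# Assumption: MeasuredDate is stored as ISO YYYY-MM-DD.
SAMPLE_REGIONAL_CSV = """MeasuredDate,Region,LeanPercent
2025-01-15,Trunk,63.0
2025-01-15,Arms,72.5
2025-06-20,Trunk,64.0
2025-10-06,Trunk,64.4
"""


def rows_in_range(csv_text, region, start, end):
    """Return sorted (date, LeanPercent) pairs for one region within [start, end]."""
    selected = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        measured = date.fromisoformat(row["MeasuredDate"])
        if row["Region"] == region and start <= measured <= end:
            selected.append((measured, float(row["LeanPercent"])))
    return sorted(selected)


# Trunk rows measured between April and October 2025.
trunk = rows_in_range(SAMPLE_REGIONAL_CSV, "Trunk", date(2025, 4, 1), date(2025, 10, 31))
for measured, lean in trunk:
    print(measured.isoformat(), lean)
```

Because MeasuredDate is the first column of every CSV, the same helper shape should carry over to muscle_balance.csv by changing the column names.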

Each scan now adds:
- 1 row to overall.csv
- 6 rows to regional.csv (one per region)
- 6 rows to muscle_balance.csv (one per limb comparison)
2025-10-07 15:24:29 -07:00
Mac DeCourcy
130f0ba994 Refactor: implement pattern dictionary for PDF extraction
Major improvements to extraction code:

PATTERN DICTIONARY APPROACH:
- Centralized all extraction patterns into EXTRACTION_PATTERNS dict
- Each field now self-documents with description, required status, cast function
- Multiple patterns per field with automatic fallback (e.g., summary table → individual field)
- Validation built-in: reports missing required vs optional fields

NEW FUNCTIONS:
- extract_field(): Tries multiple patterns with fallback logic
- extract_all_fields(): Extracts all defined fields with validation
- Comprehensive docstrings explaining the approach
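
The pattern dictionary approach might look roughly like the sketch below. Only the names EXTRACTION_PATTERNS, extract_field() and extract_all_fields() and the description/required/cast attributes come from this commit message. The field names, the regular expressions, the "patterns" key and the function signatures are assumptions made for illustration.

```python
import re

# Hypothetical entries: field names and regexes are illustrative,
# not the patterns used in the actual extraction code.
EXTRACTION_PATTERNS = {
    "total_body_fat_percent": {
        "description": "Total body fat percentage",
        "required": True,
        "cast": float,
        "patterns": [
            r"Total Body Fat %\s+([\d.]+)",  # primary: summary table
            r"Body Fat:\s*([\d.]+)\s*%",     # fallback: individual field
        ],
    },
    "scan_id": {
        "description": "Optional scan identifier",
        "required": False,
        "cast": str,
        "patterns": [r"Scan ID:\s*(\S+)"],
    },
}


def extract_field(text, spec):
    """Try each pattern in order; return the cast value of the first match, else None."""
    for pattern in spec["patterns"]:
        match = re.search(pattern, text)
        if match:
            return spec["cast"](match.group(1))
    return None


def extract_all_fields(text, patterns=EXTRACTION_PATTERNS):
    """Extract every defined field and report missing required vs optional fields."""
    values, missing_required, missing_optional = {}, [], []
    for name, spec in patterns.items():
        value = extract_field(text, spec)
        if value is not None:
            values[name] = value
        elif spec["required"]:
            missing_required.append(name)
        else:
            missing_optional.append(name)
    return values, missing_required, missing_optional


# The primary pattern fails here, so the fallback pattern supplies the value.
values, missing_required, missing_optional = extract_all_fields("Body Fat: 21.5 %")
print(values, missing_required, missing_optional)
```

In this layout, adding a field means adding one dict entry, and the fallback order is simply the order of the pattern list.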

BENEFITS:
- Self-documenting - each pattern describes what it extracts
- Maintainable - add new fields by adding one dict entry
- Robust - automatic fallback if primary pattern fails
- Validated - instant feedback on missing required fields
- Type-safe - cast functions ensure correct data types

TESTING:
- All existing tests pass
- Single-file mode: passes
- Batch mode: passes
- Data extraction: identical to previous version

Code grew by ~160 lines but with significant improvements in:
- Readability (clear field definitions)
- Maintainability (centralized patterns)
- Extensibility (easy to add new fields)
- Debuggability (validation reports)
2025-10-06 17:59:27 -07:00
Mac DeCourcy
2c17d86fe7 Refactor: eliminate duplicated code in PDF processing
- Remove ~200 lines of duplicate code between single-file and batch processing
- Consolidate all PDF processing logic into process_single_pdf() function
- Add batch_mode parameter to control output formatting
- Single-file and batch modes now use the same code path
- Improves maintainability and reduces chance of inconsistencies

Net reduction: 202 lines deleted, 56 lines added (-146 lines total)
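
A shared code path of this kind could be sketched as follows. The process_single_pdf() name and its batch_mode parameter come from this commit message. The function body, the output wording and the process_batch helper are assumptions, and the real extraction step is replaced by a placeholder.

```python
def process_single_pdf(pdf_path, batch_mode=False):
    """Shared code path; batch_mode only changes how the result is reported."""
    # Placeholder for the real extraction step, which this sketch does not reproduce.
    result = {"source": pdf_path}
    if batch_mode:
        # Compact one-line output suits a long batch run.
        print(f"processed {pdf_path}")
    else:
        # Single-file mode can afford a more detailed report.
        print(f"File: {pdf_path}")
        print(f"Fields extracted: {len(result)}")
    return result


def process_batch(pdf_paths):
    """Hypothetical batch driver: a loop over the same function used for one file."""
    return [process_single_pdf(path, batch_mode=True) for path in pdf_paths]
```

Keeping the formatting switch inside one function means a fix to the extraction logic reaches both modes at once.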
2025-10-06 17:47:46 -07:00
Mac DeCourcy
37267fbf34 Add lean percentage to regional data
- Add LeanPercent column to regional.csv matching BodySpec reports
- Calculate lean percentage from lean tissue (excluding BMC) for accuracy
- Update JSON output to include lean_percent for each region
- Document new column in README
- Values now match BodySpec regional reports (e.g., Arms: 73.7%, Legs: 72.7%, Trunk: 64.4%)
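
One plausible reading of the calculation is sketched below. The commit only states that the numerator is lean tissue excluding BMC. The denominator (fat + lean + BMC), the rounding to one decimal place and the function and parameter names are assumptions, so treat this as an illustration and not the tool's formula.

```python
def lean_percent(lean_lbs, fat_lbs, bmc_lbs):
    """Lean tissue (excluding BMC) as a percentage of total regional mass.

    Assumption: total mass = fat + lean + BMC; the commit does not state the denominator.
    """
    total = lean_lbs + fat_lbs + bmc_lbs
    if total <= 0:
        raise ValueError("total regional mass must be positive")
    return round(100.0 * lean_lbs / total, 1)
```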
2025-10-06 17:44:16 -07:00
Mac DeCourcy
b046af5d25 feat: smart batch processing with skip logic
- Change --batch to accept directory instead of glob pattern
- Automatically skip already-processed scan dates
- Add --force flag to reprocess all files
- Fix date extraction regex to parse from client info line
- Display helpful tips about skipping/forcing
- Better user feedback with skip counts and suggestions

Usage:
  python dexa_extract.py --batch data/pdfs --height-in 74 --outdir data/results

This will process only new scans, skipping any dates already in the output.
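
The skip logic could be sketched as a pure function over scan dates. This is an assumption about the structure, not the tool's code: select_scans, the example file names and the date strings are hypothetical, and how dates are read from the PDFs and from existing output is left out.

```python
def select_scans(scan_dates_by_pdf, processed_dates, force=False):
    """Split PDFs into (to_process, skipped) based on already-processed scan dates.

    scan_dates_by_pdf maps a PDF path to its scan date string.
    processed_dates is the set of dates already present in the output.
    """
    to_process, skipped = [], []
    for pdf_path, scan_date in sorted(scan_dates_by_pdf.items()):
        if force or scan_date not in processed_dates:
            to_process.append(pdf_path)
        else:
            skipped.append(pdf_path)
    return to_process, skipped


# Hypothetical inputs: two PDFs, one of which was processed earlier.
scan_dates = {
    "data/pdfs/scan_a.pdf": "2025-01-15",
    "data/pdfs/scan_b.pdf": "2025-10-06",
}
already_done = {"2025-01-15"}

to_process, skipped = select_scans(scan_dates, already_done)
print(f"Processing {len(to_process)} new scan(s), skipped {len(skipped)}")
if skipped:
    print("Tip: use --force to reprocess all files")
```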
2025-10-06 15:33:05 -07:00
Mac DeCourcy
d6793e2572 feat: add comprehensive error handling and validation
- Add input validation for PDF files, height, and weight
- Validate PDF file exists, is a file, and has .pdf extension
- Check height range (36-96 inches) and weight range (50-500 lbs)
- Add warnings for missing critical data
- Improve user feedback with emojis and clear error messages
- Better output formatting with file descriptions
- Catch and handle PDF reading errors gracefully
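
The checks listed above could be sketched like this. The height and weight ranges and the three PDF checks come from this commit message. The validate_inputs name, its signature, the error wording and the case-insensitive extension check are assumptions.

```python
from pathlib import Path

HEIGHT_RANGE_IN = (36, 96)   # inches, from the commit message
WEIGHT_RANGE_LB = (50, 500)  # pounds, from the commit message


def validate_inputs(pdf_path, height_in, weight_lb=None):
    """Return a list of error messages; an empty list means the inputs look valid."""
    errors = []
    path = Path(pdf_path)
    if not path.exists():
        errors.append(f"PDF not found: {path}")
    elif not path.is_file():
        errors.append(f"Not a file: {path}")
    elif path.suffix.lower() != ".pdf":
        # Assumption: the extension check ignores case.
        errors.append(f"Not a .pdf file: {path}")

    low, high = HEIGHT_RANGE_IN
    if not low <= height_in <= high:
        errors.append(f"Height {height_in} in is outside {low}-{high} inches")

    # Weight is treated as optional in this sketch.
    low, high = WEIGHT_RANGE_LB
    if weight_lb is not None and not low <= weight_lb <= high:
        errors.append(f"Weight {weight_lb} lbs is outside {low}-{high} lbs")
    return errors
```

Collecting all messages before reporting lets the user fix every problem in one pass instead of one error per run.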
2025-10-06 15:24:11 -07:00
Mac DeCourcy
c7d0255f61 Initial commit: BodySpec Insights - comprehensive DEXA analytics tool
2025-10-06 14:32:25 -07:00