Initial commit: BodySpec Insights - comprehensive DEXA analytics tool

Mac DeCourcy 2025-10-06 14:32:25 -07:00
commit c7d0255f61
10 changed files with 907 additions and 0 deletions

14
.gitattributes vendored Normal file

@ -0,0 +1,14 @@
# Normalize line endings
* text=auto
# Python files
*.py text eol=lf
# Shell scripts
*.sh text eol=lf
# Data files
*.csv text eol=lf
*.json text eol=lf
*.md text eol=lf

40
.gitignore vendored Normal file

@ -0,0 +1,40 @@
# Python virtual environment
venv/
env/
ENV/
*.pyc
__pycache__/
*.py[cod]
*$py.class
# PDF files (sensitive health data)
*.pdf
# Results and output files (exclude directories but allow README.md)
dexa_out/
data/pdfs/*.pdf
data/results/*.csv
data/results/*.json
*.csv
*.json
# Exclude generated markdown but keep README files
summary.md
!README.md
# IDE and editor files
.vscode/
.idea/
*.swp
*.swo
*~
.DS_Store
# Logs
*.log
# Distribution / packaging
dist/
build/
*.egg-info/

304
README.md Normal file

@ -0,0 +1,304 @@
# BodySpec Insights
**Body composition analytics for BodySpec DEXA scan PDFs**
A Python tool to extract and analyze body composition data from BodySpec DEXA scan reports. Automatically parses measurements, computes 30+ derived metrics, and tracks your progress over time.
> **Note:** This tool is specifically designed for BodySpec PDF reports and may not work with other DEXA providers (DexaFit, Hologic, etc.).
## Features
- 📊 **Comprehensive Data Extraction**: Body fat %, lean mass, bone density, regional composition, and more
- 🧮 **Derived Metrics**: Automatically calculates FFMI, FMI, LSTI, SMI, and other body composition indices
- 📁 **Multiple Output Formats**: CSV (for spreadsheet analysis), JSON (for programmatic use), and Markdown (for readable summaries)
- 📈 **Time-Series Ready**: Append mode allows tracking progress across multiple scans
- 🎯 **Regional Analysis**: Breaks down composition by Arms, Legs, Trunk, Android, and Gynoid regions
- ⚖️ **Muscle Balance**: Tracks left/right limb symmetry
## Installation
### Prerequisites
- Python 3.11 or higher (the pinned `numpy==2.3.3` in `requirements.txt` does not support older versions)
- pip (Python package manager)
### Setup
1. **Clone or download this repository**
2. **Create a virtual environment** (recommended):
```bash
python3 -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
```
3. **Install dependencies**:
```bash
pip install -r requirements.txt
```
The script requires:
- `pdfplumber` - PDF text extraction
- `pandas` - Data manipulation and CSV handling
## Usage
### Basic Command
```bash
python dexa_extract.py <PDF_PATH> --height-in <HEIGHT> [--weight-lb <WEIGHT>] [--outdir <OUTPUT_DIR>]
```
### Required Arguments
- `PDF_PATH` - Path to your DEXA scan PDF report
- `--height-in` - Your height in inches
### Optional Arguments
- `--weight-lb` - Body weight in pounds (used as fallback if PDF doesn't contain total mass)
- `--outdir` - Output directory for results (default: `dexa_out`)
### Examples
**Single scan:**
```bash
python dexa_extract.py data/pdfs/2025-10-06-scan.pdf --height-in 74 --weight-lb 212 --outdir data/results
```
**Process multiple scans** (appends to existing files):
```bash
python dexa_extract.py data/pdfs/scan-2025-01.pdf --height-in 74 --outdir data/results
python dexa_extract.py data/pdfs/scan-2025-04.pdf --height-in 74 --outdir data/results
python dexa_extract.py data/pdfs/scan-2025-10.pdf --height-in 74 --outdir data/results
```
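The repeated commands above can also be generated from the contents of a directory. The following is a minimal sketch, not part of the repository: `build_commands` is a hypothetical helper, and it assumes the script is run from the repository root and that the PDF filenames sort in the order you want rows appended (for example, names starting with an ISO date).

```python
import sys
from pathlib import Path


def build_commands(pdf_dir, height_in, outdir="data/results"):
    """Return one dexa_extract.py command (argv list) per PDF, sorted by filename.

    Hypothetical helper for illustration; it only builds the commands.
    """
    commands = []
    for pdf in sorted(Path(pdf_dir).glob("*.pdf")):
        commands.append([
            sys.executable, "dexa_extract.py", str(pdf),
            "--height-in", str(height_in),
            "--outdir", outdir,
        ])
    return commands


# Usage (runs the script once per PDF):
#   import subprocess
#   for cmd in build_commands("data/pdfs", 74):
#       subprocess.run(cmd, check=True)
```

Because each run appends to the same output files, processing the same PDF twice produces duplicate rows.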
**Height conversion** (for reference):
- 5'8" = 68 inches
- 5'10" = 70 inches
- 6'0" = 72 inches
- 6'2" = 74 inches
- 6'4" = 76 inches
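For heights not in the list, the value for `--height-in` is simply feet times 12 plus the remaining inches. A small sketch (the function name is illustrative, not part of the script):

```python
def feet_inches_to_inches(feet, inches=0):
    """Convert a feet/inches height to total inches for --height-in."""
    return feet * 12 + inches


print(feet_inches_to_inches(6, 2))  # 74
```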
## Directory Structure
```
bodyspec-insights/
├── dexa_extract.py # Main extraction script
├── requirements.txt # Python dependencies
├── README.md # This file
├── .gitignore # Git ignore patterns
├── data/ # Data directory (contents gitignored)
│ ├── pdfs/ # Place your BodySpec PDF reports here
│ └── results/ # Results will be saved here
└── venv/ # Virtual environment (gitignored)
```
## Output Files
The script generates 5 files in the specified output directory:
### 1. `overall.csv`
Time-series data with one row per scan. Includes all primary metrics and derived indices.
**Columns:**
- `MeasuredDate` - Scan date (YYYY-MM-DD)
- `Height_in`, `Height_ft_in` - Height measurements
- `Weight_lb_Input`, `DEXA_TotalMass_lb`, `Adjusted_Body_Weight_lb` - Weight data
- `BodyFat_percent`, `LeanMass_percent` - Body composition percentages
- `FatMass_lb`, `LeanSoftTissue_lb`, `BoneMineralContent_lb`, `FatFreeMass_lb` - Mass measurements
- `BMI`, `FFMI`, `FMI`, `LST_Index`, `SMI`, `BMDI` - Normalized indices
- `ALM_lb` - Appendicular lean mass (arms + legs)
- `VAT_Mass_lb`, `VAT_Volume_in3`, `VAT_Index` - Visceral adipose tissue
- `Android_percent`, `Gynoid_percent`, `AG_Ratio` - Fat distribution
- `Trunk_to_Limb_Fat_Ratio` - Central adiposity indicator
- `Arms_Lean_pct`, `Legs_Lean_pct`, `Trunk_Lean_pct` - Regional lean mass distribution
- `Arm_Symmetry_Index`, `Leg_Symmetry_Index` - Left/right balance (50% = perfect)
- `RMR_cal_per_day` - Resting metabolic rate
### 2. `regional.csv`
Regional body composition breakdown (Arms, Legs, Trunk, Android, Gynoid, Total).
**Columns:** Region, FatPercent, TotalMass_lb, FatTissue_lb, LeanTissue_lb, BMC_lb
### 3. `muscle_balance.csv`
Left/right limb comparison for tracking muscle symmetry.
**Regions:** Arms Total, Right Arm, Left Arm, Legs Total, Right Leg, Left Leg
### 4. `overall.json`
A JSON array with one object per processed scan; each run appends a new entry. Every entry holds all extracted data in a hierarchical format.
**Structure of one entry:**
```json
{
  "measured_date": "2025-10-06",
  "anthropometrics": { ... },
  "composition": { ... },
  "regional": [ ... ],
  "regional_analysis": { ... },
  "muscle_balance": [ ... ],
  "symmetry_indices": { ... },
  "supplemental": { ... },
  "bone_density": { ... }
}
```
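Judging by `write_or_append_json` in `dexa_extract.py`, the file on disk is a list of such objects, one per run. The sketch below shows one way to load it and pick the most recent scan; `load_entries` and `latest_scan` are illustrative names, not functions from the script.

```python
import json


def load_entries(path):
    """Load overall.json and always return a list of scan entries."""
    with open(path) as f:
        data = json.load(f)
    # Tolerate a file that holds a single object instead of a list.
    return [data] if isinstance(data, dict) else data


def latest_scan(entries):
    """Return the entry with the latest measured_date, or None if there is none.

    ISO dates (YYYY-MM-DD) sort correctly as plain strings.
    """
    dated = [e for e in entries if e.get("measured_date")]
    return max(dated, key=lambda e: e["measured_date"]) if dated else None


# Usage:
#   entries = load_entries("data/results/overall.json")
#   print(latest_scan(entries)["composition"]["body_fat_percent"])
```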
### 5. `summary.md`
Human-readable Markdown summary of the scan results.
## Extracted Metrics
### Primary Measurements
- **Body Fat %** - Percentage of body weight that is fat
- **Lean Mass %** - Percentage of body weight that is lean tissue (complement of body fat %)
- **Fat Mass** - Total weight of fat tissue
- **Lean Soft Tissue** - Muscle, organs, and other non-bone lean tissue
- **Bone Mineral Content (BMC)** - Total bone mineral weight
- **Fat-Free Mass** - Total body weight minus fat mass
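### Derived Indices (Height-Normalized)
Each index divides a mass by height squared using the BMI formula `703 × mass_lb / height_in²` (approximately kg/m²).
- **BMI** - Body Mass Index (total mass over height squared)
- **FFMI** - Fat-Free Mass Index (fat-free mass over height squared; a proxy for muscularity)
- **FMI** - Fat Mass Index (fat mass over height squared)
- **LSTI** - Lean Soft Tissue Index (lean soft tissue over height squared)
- **SMI** - Skeletal Muscle Index (appendicular lean mass over height squared)
- **BMDI** - Bone Mineral Density Index (bone mineral content over height squared; despite the name, it is not derived from BMD)
- **VAT Index** - Visceral fat mass over height squared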
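As a worked example, the sketch below applies the same `703 × lb / in²` scaling that `compute_derived` uses to the sample summary row quoted in the script's comments (211.6 lb total mass, 58.8 lb fat mass) at the 74-inch height used in the README examples. The function name `height_index` is illustrative.

```python
def height_index(mass_lb, height_in):
    """Scale a mass in pounds by height squared: 703 * lb / in^2 (about kg/m^2)."""
    return 703.0 * mass_lb / height_in ** 2


total_lb, fat_lb, height_in = 211.6, 58.8, 74
fat_free_lb = total_lb - fat_lb  # fat-free mass = total mass - fat mass

print(round(height_index(total_lb, height_in), 1))     # BMI  -> 27.2
print(round(height_index(fat_free_lb, height_in), 2))  # FFMI -> 19.62
print(round(height_index(fat_lb, height_in), 2))       # FMI  -> 7.55
```

Since fat-free mass and fat mass add up to total mass, FFMI and FMI add up to BMI (up to rounding).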
### Regional Analysis
- **Android** - Abdominal/trunk fat (higher risk area)
- **Gynoid** - Hip/thigh fat (lower risk area)
- **A/G Ratio** - Android-to-Gynoid ratio (cardiovascular risk indicator)
- **Trunk-to-Limb Fat Ratio** - Ratio of trunk fat to limb fat (central adiposity indicator)
- **Lean Mass Distribution** - Percentage of total lean mass in arms, legs, and trunk
### Symmetry & Balance
- **Arm Symmetry Index** - Right-to-left arm lean mass balance (50% = perfect symmetry)
- **Leg Symmetry Index** - Right-to-left leg lean mass balance (50% = perfect symmetry)
### Supplemental
- **VAT (Visceral Adipose Tissue)** - Deep abdominal fat around organs
- **RMR (Resting Metabolic Rate)** - Estimated daily calorie burn at rest
- **Adjusted Body Weight** - Ideal-body-weight-based adjustment used in nutrition and dosing calculations (the script estimates it with a rough unisex approximation)
- **Bone Density** - BMD (g/cm²), T-score, Z-score
## Understanding Your Results
### Body Fat % Ranges (by sex)
**Men:**
- Athletes: 6-13%
- Fitness: 14-17%
- Average: 18-24%
- Above Average: 25%+
**Women:**
- Athletes: 14-20%
- Fitness: 21-24%
- Average: 25-31%
- Above Average: 32%+
### FFMI (Fat-Free Mass Index)
Normalized measure of muscle mass:
- **16-17**: Below average
- **18-20**: Average/athletic
- **21-23**: Above average/very muscular
- **24-25**: Elite natural bodybuilder range
- **26+**: Typically requires enhanced training
### A/G Ratio (Android/Gynoid Ratio)
Fat distribution indicator:
- **< 1.0**: Lower risk (more fat in hips/thighs)
- **1.0-1.5**: Moderate
- **> 1.5**: Higher risk (more abdominal fat)
### Trunk-to-Limb Fat Ratio
Central adiposity indicator:
- **< 1.0**: More peripheral fat distribution (healthier)
- **1.0-1.5**: Moderate central fat
- **> 1.5**: High central fat (increased health risk)
### Symmetry Indices
Muscle balance between left and right sides:
- **50%**: Perfect symmetry
- **48-52%**: Normal range (slight asymmetry is common)
- **< 48% or > 52%**: Notable imbalance (may indicate injury, overuse, or compensation patterns)
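The index is the right side's share of the combined left and right lean mass, which is why 50% means perfect symmetry. A minimal sketch of the calculation and the ranges above follows; the function names and the lean-mass values in the usage line are illustrative, not taken from a real report.

```python
def symmetry_index(right_lean_lb, left_lean_lb):
    """Right side's share of combined lean mass, in percent (50.0 = symmetric)."""
    total = right_lean_lb + left_lean_lb
    if total <= 0:
        return None
    return round(100 * right_lean_lb / total, 1)


def describe_symmetry(index):
    """Map an index onto the README's ranges."""
    if index is None:
        return "unknown"
    if 48 <= index <= 52:
        return "normal range"
    return "right side heavier" if index > 52 else "left side heavier"


# Illustrative values only:
print(describe_symmetry(symmetry_index(10.3, 9.9)))
```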
### VAT Index
Visceral fat mass normalized by height squared:
- **< 0.30**: Low visceral fat
- **0.30-0.50**: Moderate
- **> 0.50**: High (increased metabolic risk)
### Lean Mass Distribution
Typical ranges for lean tissue distribution:
- **Arms**: 13-16% of total lean mass
- **Legs**: 32-38% of total lean mass
- **Trunk**: 46-54% of total lean mass
Higher trunk percentage may indicate good core development, while higher leg percentage suggests strong lower body development.
## Tracking Progress
The script appends data to existing CSV files, making it easy to track changes over time:
1. Place all your DEXA PDFs in `data/pdfs/`
2. Process each one with the same output directory
3. Open `overall.csv` in Excel/Google Sheets to visualize trends
4. Compare `muscle_balance.csv` to track left/right symmetry improvements
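Trends can also be computed without a spreadsheet. The sketch below uses only the standard library and the `MeasuredDate` column described above; `metric_change` is a hypothetical helper, and the two sample rows are invented purely for illustration.

```python
import csv
import io


def metric_change(csv_text, column):
    """Return last-minus-first value of `column`, ordered by MeasuredDate.

    Returns None when fewer than two scans have a value for the column.
    """
    rows = [r for r in csv.DictReader(io.StringIO(csv_text))
            if r.get(column) not in (None, "")]
    rows.sort(key=lambda r: r["MeasuredDate"])  # ISO dates sort as strings
    if len(rows) < 2:
        return None
    return round(float(rows[-1][column]) - float(rows[0][column]), 2)


# Invented sample data; a real overall.csv has many more columns.
SAMPLE = """MeasuredDate,BodyFat_percent
2025-10-06,27.8
2025-04-20,29.1
"""
print(metric_change(SAMPLE, "BodyFat_percent"))
```

Sorting by `MeasuredDate` matters because rows are appended in the order the PDFs were processed, which is not necessarily chronological.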
## Privacy & Security
⚠️ **Important:** DEXA reports contain protected health information (PHI).
- All PDF files and results are excluded from git via `.gitignore`
- Keep your `data/` directory private
- Don't commit PDFs or output files to version control
- Consider encrypting your data directory if sharing the repository
## Troubleshooting
### "Total mass is missing" error
- Ensure your PDF contains a SUMMARY RESULTS table
- Provide `--weight-lb` as a fallback
### No data extracted or null values
- **Verify your PDF is from BodySpec** - This tool only works with BodySpec reports
- Ensure the PDF is text-based, not a scanned image
- Check that your BodySpec report includes the "SUMMARY RESULTS" table
- Open an issue with a sample (redacted) PDF for support
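To narrow down which of these cases applies, you can inspect the text that `pdfplumber` extracts. The sketch below works on that already-extracted text and reuses the summary-row pattern from `parse_dexa_pdf`. The `diagnose` helper is hypothetical, and it assumes the heading appears literally as `SUMMARY RESULTS` in the extracted text.

```python
import re

# Same pattern parse_dexa_pdf uses for the summary row, e.g.
# "10/6/2025 27.8% 211.6 58.8 145.4 7.4"
SUMMARY_ROW = re.compile(
    r"(\d{1,2}/\d{1,2}/\d{4})\s+([\d\.]+)%\s+([\d\.]+)\s+([\d\.]+)\s+([\d\.]+)\s+([\d\.]+)"
)


def diagnose(text):
    """Give a rough reason why extraction might return nulls (illustrative only)."""
    if not text.strip():
        return "no extractable text (the PDF may be a scanned image)"
    if "SUMMARY RESULTS" not in text:  # assumed heading spelling
        return "no SUMMARY RESULTS heading found"
    if not SUMMARY_ROW.search(text):
        return "heading found, but no summary data row matched"
    return "ok"


# Usage, with text taken from read_pdf_text() in dexa_extract.py:
#   print(diagnose(read_pdf_text("data/pdfs/2025-10-06-scan.pdf")))
```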
### Import errors
- Ensure virtual environment is activated: `source venv/bin/activate`
- Reinstall dependencies: `pip install -r requirements.txt`
## Contributing
Contributions welcome! Areas for improvement:
- [ ] Enhanced error handling and validation
- [ ] Automatic height detection from PDF
- [ ] Data visualization/plotting features
- [ ] GUI for non-technical users
- [ ] Batch processing multiple PDFs at once
- [ ] Export to additional formats (Excel, SQLite, etc.)
## License
MIT License - feel free to use and modify for personal or commercial use.
## Acknowledgments
Built for personal body composition tracking with BodySpec scans. Thanks to BodySpec for providing detailed, consistent DEXA scan reports that make automated analysis possible.
**Disclaimer:** This is an unofficial, independent tool and is not affiliated with or endorsed by BodySpec.
---
**Questions or issues?** Open an issue on GitHub or contact the maintainer.

3
data/.gitkeep Normal file

@ -0,0 +1,3 @@
# This file ensures the data directory structure is preserved in git
# while keeping the actual PDF and results files private (see .gitignore)

0
data/pdfs/.gitkeep Normal file

17
data/pdfs/README.md Normal file

@ -0,0 +1,17 @@
# PDFs Directory
Place your BodySpec DEXA scan PDF reports in this directory.
## Example
```
data/pdfs/
├── 2025-01-15-scan.pdf
├── 2025-04-20-scan.pdf
└── 2025-10-06-scan.pdf
```
## Note
⚠️ **PDF files are gitignored** - They won't be committed to version control to protect your personal health information.

0
data/results/.gitkeep Normal file

18
data/results/README.md Normal file

@ -0,0 +1,18 @@
# Results Directory
Your extracted DEXA data is saved here when you pass `--outdir data/results` (the script's default output directory is `dexa_out`).
## Output Files
When you run the extraction script with `--outdir data/results`, you'll get:
- `overall.csv` - Time-series data (one row per scan)
- `regional.csv` - Regional body composition
- `muscle_balance.csv` - Left/right limb comparison
- `overall.json` - Structured JSON format
- `summary.md` - Human-readable summary
## Note
⚠️ **Result files are gitignored** - They contain your personal health data and won't be committed to version control.

497
dexa_extract.py Normal file

@ -0,0 +1,497 @@
#!/usr/bin/env python3
"""
BodySpec Insights - Body composition analytics for BodySpec DEXA scan PDFs

Extract measurements from BodySpec DEXA reports, compute 30+ derived metrics,
and output structured data for progress tracking.

Usage:
    python dexa_extract.py /path/to/bodyspec-report.pdf --height-in 74 --weight-lb 212 --outdir ./data/results

Note: This script is specifically designed for BodySpec PDF reports.

Requires:
    pip install pdfplumber pandas
"""
import argparse
import json
import math
import os
import re
from datetime import datetime

import pdfplumber
import pandas as pd


def read_pdf_text(pdf_path):
    with pdfplumber.open(pdf_path) as pdf:
        pages_text = [page.extract_text() or "" for page in pdf.pages]
    return "\n".join(pages_text)


def find_one(pattern, text, cast=float, flags=re.IGNORECASE):
    m = re.search(pattern, text, flags)
    if not m:
        return None
    val = m.group(1).replace(",", "").strip()
    return cast(val) if cast else val


def convert_date_to_iso(date_str):
    """Convert MM/DD/YYYY to YYYY-MM-DD"""
    if not date_str:
        return None
    try:
        dt = datetime.strptime(date_str, "%m/%d/%Y")
        return dt.strftime("%Y-%m-%d")
    except ValueError:
        # Not in MM/DD/YYYY form; return the string unchanged
        return date_str


def inches_to_ft_in(inches):
    """Convert inches to feet'inches" format"""
    if inches is None:
        return None
    feet = int(inches // 12)
    remaining_inches = int(inches % 12)
    return f"{feet}'{remaining_inches}\""


def parse_regional_table(text):
    regions = ["Arms", "Legs", "Trunk", "Android", "Gynoid", "Total"]
    out = {}
    for r in regions:
        # Example line: Arms 22.1% 27.4 6.0 20.2 1.1
        pattern = rf"{r}\s+([\d\.]+)%\s+([\d\.]+)\s+([\d\.]+)\s+([\d\.]+)\s+([\d\.]+)"
        m = re.search(pattern, text)
        if m:
            out[r] = {
                "fat_percent": float(m.group(1)),
                "total_mass_lb": float(m.group(2)),
                "fat_tissue_lb": float(m.group(3)),
                "lean_tissue_lb": float(m.group(4)),
                "bmc_lb": float(m.group(5)),
            }
    return out


def parse_muscle_balance(text):
    names = ["Arms Total", "Right Arm", "Left Arm", "Legs Total", "Right Leg", "Left Leg"]
    out = {}
    for n in names:
        # Example: Right Arm 20.4 13.7 2.8 10.3 0.6
        pattern = rf"{n}\s+([\d\.]+)\s+([\d\.]+)\s+([\d\.]+)\s+([\d\.]+)\s+([\d\.]+)"
        m = re.search(pattern, text)
        if m:
            out[n] = {
                "fat_percent": float(m.group(1)),
                "total_mass_lb": float(m.group(2)),
                "fat_mass_lb": float(m.group(3)),
                "lean_mass_lb": float(m.group(4)),
                "bmc_lb": float(m.group(5)),
            }
    return out


def parse_bone_density_total(text):
    # Example: Total 1.280 0.8 0.8
    m = re.search(r"Total\s+([\d\.]+)\s+([-\d\.]+)\s+([-\d\.]+)", text)
    if m:
        return {
            "total_bmd_g_per_cm2": float(m.group(1)),
            "young_adult_t_score": float(m.group(2)),
            "age_matched_z_score": float(m.group(3)),
        }
    return {}


def parse_dexa_pdf(pdf_path):
    text = read_pdf_text(pdf_path)
    data = {}
    data["measured_date"] = find_one(r"Measured Date\s+([\d/]+)", text, cast=str)

    # First try to extract from SUMMARY RESULTS table (more reliable)
    # Pattern: 10/6/2025 27.8% 211.6 58.8 145.4 7.4
    summary_pattern = r"(\d{1,2}/\d{1,2}/\d{4})\s+([\d\.]+)%\s+([\d\.]+)\s+([\d\.]+)\s+([\d\.]+)\s+([\d\.]+)"
    summary_match = re.search(summary_pattern, text)
    if summary_match:
        data["body_fat_percent"] = float(summary_match.group(2))
        data["total_mass_lb"] = float(summary_match.group(3))
        data["fat_mass_lb"] = float(summary_match.group(4))
        data["lean_soft_tissue_lb"] = float(summary_match.group(5))
        data["bmc_lb"] = float(summary_match.group(6))
    else:
        # Fallback to individual patterns
        data["body_fat_percent"] = find_one(r"Total Body Fat %\s+([\d\.]+)", text)
        data["total_mass_lb"] = find_one(r"Total Mass.*?\(lbs\)\s+([\d\.]+)", text)
        data["fat_mass_lb"] = find_one(r"Fat Tissue \(lbs\)\s+([\d\.]+)", text)
        data["lean_soft_tissue_lb"] = find_one(r"Lean Tissue \(lbs\)\s+([\d\.]+)", text)
        data["bmc_lb"] = find_one(r"Bone Mineral\s+Content \(BMC\)\s+([\d\.]+)", text)

    # Supplemental
    data["android_percent"] = find_one(r"Android.*?([\d\.]+)%", text)
    data["gynoid_percent"] = find_one(r"Gynoid.*?([\d\.]+)%", text)
    data["rmr_cal_per_day"] = find_one(r"([\d,]+)\s*cal/day", text, cast=lambda s: int(s.replace(",", "")))

    # A/G Ratio appears after RMR, Android%, Gynoid% on same line: "1,778 cal/day 36.5% 27.8% 1.31"
    ag_match = re.search(r"[\d,]+\s*cal/day\s+([\d\.]+)%\s+([\d\.]+)%\s+([\d\.]+)", text)
    if ag_match:
        data["ag_ratio"] = float(ag_match.group(3))
    else:
        data["ag_ratio"] = find_one(r"A/G Ratio\s+([\d\.]+)", text)
    data["vat_mass_lb"] = find_one(r"Mass \(lbs\)\s+([\d\.]+)", text)
    data["vat_volume_in3"] = find_one(r"Volume \(in3\)\s+([\d\.]+)", text)

    # Tables
    data["regional"] = parse_regional_table(text)
    data["muscle_balance"] = parse_muscle_balance(text)
    data["bone_density"] = parse_bone_density_total(text)
    return data


def compute_derived(d, height_in, weight_lb=None):
    # Prefer DEXA total mass if available
    total_mass = d.get("total_mass_lb") or weight_lb
    if total_mass is None:
        raise ValueError("Total mass is missing; pass --weight-lb if the PDF lacks it.")
    fm = d.get("fat_mass_lb")
    lst = d.get("lean_soft_tissue_lb")
    bmc = d.get("bmc_lb")
    bf_pct = d.get("body_fat_percent")
    ffm = None
    if fm is not None:
        ffm = total_mass - fm
    elif lst is not None and bmc is not None:
        ffm = lst + bmc

    def idx(value_lb):
        # 703 * lb / in^2 is the imperial BMI scaling (approximately kg/m^2)
        return round(703.0 * value_lb / (height_in ** 2), 2)

    derived = {
        "height_in": height_in,
        "height_ft_in": inches_to_ft_in(height_in),
        "weight_input_lb": weight_lb,
        "bmi": round(703.0 * total_mass / (height_in ** 2), 1),
        "fat_free_mass_lb": round(ffm, 1) if ffm is not None else None,
        "ffmi": idx(ffm) if ffm is not None else None,
        "fmi": idx(fm) if fm is not None else None,
        "lsti": idx(lst) if lst is not None else None,
        "alm_lb": None,
        "smi": None,
    }

    # Lean mass percentage (complement of body fat %)
    if bf_pct is not None:
        derived["lean_mass_percent"] = round(100 - bf_pct, 1)
    else:
        derived["lean_mass_percent"] = None

    # ALM from regional lean masses
    arms_lean = d.get("regional", {}).get("Arms", {}).get("lean_tissue_lb")
    legs_lean = d.get("regional", {}).get("Legs", {}).get("lean_tissue_lb")
    trunk_lean = d.get("regional", {}).get("Trunk", {}).get("lean_tissue_lb")
    if arms_lean is not None and legs_lean is not None:
        alm = arms_lean + legs_lean
        derived["alm_lb"] = round(alm, 1)
        derived["smi"] = idx(alm)

    # Regional lean mass distribution
    if lst is not None and arms_lean is not None and legs_lean is not None and trunk_lean is not None:
        derived["arms_lean_pct"] = round(100 * arms_lean / lst, 1)
        derived["legs_lean_pct"] = round(100 * legs_lean / lst, 1)
        derived["trunk_lean_pct"] = round(100 * trunk_lean / lst, 1)
    else:
        derived["arms_lean_pct"] = None
        derived["legs_lean_pct"] = None
        derived["trunk_lean_pct"] = None

    # Trunk-to-limb fat ratio (health risk indicator)
    trunk_fat = d.get("regional", {}).get("Trunk", {}).get("fat_tissue_lb")
    arms_fat = d.get("regional", {}).get("Arms", {}).get("fat_tissue_lb")
    legs_fat = d.get("regional", {}).get("Legs", {}).get("fat_tissue_lb")
    if trunk_fat is not None and arms_fat is not None and legs_fat is not None:
        limb_fat = arms_fat + legs_fat
        if limb_fat > 0:
            derived["trunk_to_limb_fat_ratio"] = round(trunk_fat / limb_fat, 2)
        else:
            derived["trunk_to_limb_fat_ratio"] = None
    else:
        derived["trunk_to_limb_fat_ratio"] = None

    # Limb symmetry indices (balance indicators)
    mb = d.get("muscle_balance", {})
    right_arm = mb.get("Right Arm", {}).get("lean_mass_lb")
    left_arm = mb.get("Left Arm", {}).get("lean_mass_lb")
    right_leg = mb.get("Right Leg", {}).get("lean_mass_lb")
    left_leg = mb.get("Left Leg", {}).get("lean_mass_lb")
    if right_arm is not None and left_arm is not None and right_arm + left_arm > 0:
        # Right side's share of combined lean mass:
        # 50 = perfect symmetry, <50 = left side heavier, >50 = right side heavier
        derived["arm_symmetry_index"] = round(100 * right_arm / (right_arm + left_arm), 1)
    else:
        derived["arm_symmetry_index"] = None
    if right_leg is not None and left_leg is not None and right_leg + left_leg > 0:
        derived["leg_symmetry_index"] = round(100 * right_leg / (right_leg + left_leg), 1)
    else:
        derived["leg_symmetry_index"] = None

    # VAT Index (normalized by height squared, like BMI)
    vat_mass = d.get("vat_mass_lb")
    if vat_mass is not None:
        derived["vat_index"] = idx(vat_mass)
    else:
        derived["vat_index"] = None

    # Bone Mineral Density Index (BMC normalized by height squared)
    if bmc is not None:
        derived["bmdi"] = idx(bmc)
    else:
        derived["bmdi"] = None

    # Adjusted Body Weight (used in nutrition/health calculations)
    # ABW = IBW + 0.4 * (actual weight - IBW), where IBW differs by sex
    # For simplicity, using a unisex approximation: IBW_kg ≈ height_in * 2.3 - 100 (rough estimate).
    # The 2.3-per-inch coefficient comes from kilogram-based IBW formulas (e.g. Devine),
    # so the estimate is converted to pounds before comparing it with total_mass (lb).
    lb_per_kg = 2.20462
    ibw_estimate = (height_in * 2.3 - 100) * lb_per_kg
    if total_mass > ibw_estimate:
        derived["adjusted_body_weight_lb"] = round(ibw_estimate + 0.4 * (total_mass - ibw_estimate), 1)
    else:
        derived["adjusted_body_weight_lb"] = round(total_mass, 1)
    return total_mass, derived


def ensure_outdir(outdir):
    os.makedirs(outdir, exist_ok=True)


def write_or_append_csv(path, row_dict, columns):
    df_row = pd.DataFrame([{k: row_dict.get(k) for k in columns}])
    if os.path.exists(path):
        df_row.to_csv(path, mode="a", header=False, index=False)
    else:
        df_row.to_csv(path, index=False)


def write_or_append_json(path, obj):
    if os.path.exists(path):
        with open(path, "r") as f:
            try:
                data = json.load(f)
            except json.JSONDecodeError:
                data = []
    else:
        data = []
    if isinstance(data, dict):
        # convert to list of entries if previous file was a single dict
        data = [data]
    data.append(obj)
    with open(path, "w") as f:
        json.dump(data, f, indent=2)


def append_markdown(path, md_text):
    mode = "a" if os.path.exists(path) else "w"
    with open(path, mode) as f:
        f.write(md_text.strip() + "\n\n")


def make_markdown(measured_date, d, derived, total_mass):
    lines = []
    lines.append(f"# DEXA Summary - {measured_date}")
    lines.append("")
    lines.append(f"- Height: {derived['height_in']} in")
    lines.append(f"- Weight: {round(total_mass, 1)} lb")
    if d.get("body_fat_percent") is not None and d.get("fat_mass_lb") is not None:
        lines.append(f"- Body fat: {d['body_fat_percent']}% ({d['fat_mass_lb']} lb)")
    if d.get("lean_soft_tissue_lb") is not None:
        lines.append(f"- Lean soft tissue: {d['lean_soft_tissue_lb']} lb")
    if d.get("bmc_lb") is not None:
        lines.append(f"- Bone mineral content: {d['bmc_lb']} lb")
    lines.append(f"- Fat-free mass: {derived.get('fat_free_mass_lb')}")
    lines.append(f"- BMI: {derived['bmi']}")
    lines.append(f"- FFMI: {derived.get('ffmi')}; FMI: {derived.get('fmi')}; Lean Soft Tissue Index: {derived.get('lsti')}")
    if derived.get("alm_lb") is not None:
        lines.append(f"- Appendicular Lean Mass: {derived['alm_lb']} lb; Skeletal Muscle Index: {derived['smi']}")
    if d.get("android_percent") is not None and d.get("gynoid_percent") is not None and d.get("ag_ratio") is not None:
        lines.append(f"- Android: {d['android_percent']}%; Gynoid: {d['gynoid_percent']}%; A/G ratio: {d['ag_ratio']}")
    if d.get("vat_mass_lb") is not None and d.get("vat_volume_in3") is not None:
        lines.append(f"- VAT: {d['vat_mass_lb']} lb ({d['vat_volume_in3']} in³)")
    if d.get("rmr_cal_per_day") is not None:
        lines.append(f"- RMR: {d['rmr_cal_per_day']} cal/day")
    lines.append("")
    lines.append("## Regional")
    for name, r in d.get("regional", {}).items():
        lines.append(f"- {name}: {r['fat_percent']}% fat; {r['total_mass_lb']} lb total; {r['fat_tissue_lb']} lb fat; {r['lean_tissue_lb']} lb lean; {r['bmc_lb']} lb BMC")
    return "\n".join(lines)


def main():
    ap = argparse.ArgumentParser()
    ap.add_argument("pdf", help="Path to DEXA report PDF")
    ap.add_argument("--height-in", type=float, required=True, help="Height in inches (Imperial)")
    ap.add_argument("--weight-lb", type=float, help="Body weight in lb (optional; used if DEXA total mass missing)")
    ap.add_argument("--outdir", default="dexa_out", help="Output directory")
    args = ap.parse_args()
    ensure_outdir(args.outdir)
    d = parse_dexa_pdf(args.pdf)
    measured_date_raw = d.get("measured_date") or datetime.now().strftime("%m/%d/%Y")
    measured_date = convert_date_to_iso(measured_date_raw)
    total_mass, derived = compute_derived(d, height_in=args.height_in, weight_lb=args.weight_lb)

    # Overall CSV row
    overall_cols = [
        "MeasuredDate", "Height_in", "Height_ft_in", "Weight_lb_Input", "DEXA_TotalMass_lb", "BodyFat_percent",
        "LeanMass_percent", "FatMass_lb", "LeanSoftTissue_lb", "BoneMineralContent_lb", "FatFreeMass_lb",
        "BMI", "FFMI", "FMI", "LST_Index", "ALM_lb", "SMI", "VAT_Mass_lb", "VAT_Volume_in3", "VAT_Index",
        "BMDI", "Android_percent", "Gynoid_percent", "AG_Ratio", "Trunk_to_Limb_Fat_Ratio",
        "Arms_Lean_pct", "Legs_Lean_pct", "Trunk_Lean_pct", "Arm_Symmetry_Index", "Leg_Symmetry_Index",
        "Adjusted_Body_Weight_lb", "RMR_cal_per_day",
    ]
    overall_row = {
        "MeasuredDate": measured_date,
        "Height_in": derived["height_in"],
        "Height_ft_in": derived["height_ft_in"],
        "Weight_lb_Input": derived["weight_input_lb"],
        "DEXA_TotalMass_lb": round(total_mass, 1),
        "BodyFat_percent": d.get("body_fat_percent"),
        "LeanMass_percent": derived.get("lean_mass_percent"),
        "FatMass_lb": d.get("fat_mass_lb"),
        "LeanSoftTissue_lb": d.get("lean_soft_tissue_lb"),
        "BoneMineralContent_lb": d.get("bmc_lb"),
        "FatFreeMass_lb": derived.get("fat_free_mass_lb"),
        "BMI": derived["bmi"],
        "FFMI": derived.get("ffmi"),
        "FMI": derived.get("fmi"),
        "LST_Index": derived.get("lsti"),
        "ALM_lb": derived.get("alm_lb"),
        "SMI": derived.get("smi"),
        "VAT_Mass_lb": d.get("vat_mass_lb"),
        "VAT_Volume_in3": d.get("vat_volume_in3"),
        "VAT_Index": derived.get("vat_index"),
        "BMDI": derived.get("bmdi"),
        "Android_percent": d.get("android_percent"),
        "Gynoid_percent": d.get("gynoid_percent"),
        "AG_Ratio": d.get("ag_ratio"),
        "Trunk_to_Limb_Fat_Ratio": derived.get("trunk_to_limb_fat_ratio"),
        "Arms_Lean_pct": derived.get("arms_lean_pct"),
        "Legs_Lean_pct": derived.get("legs_lean_pct"),
        "Trunk_Lean_pct": derived.get("trunk_lean_pct"),
        "Arm_Symmetry_Index": derived.get("arm_symmetry_index"),
        "Leg_Symmetry_Index": derived.get("leg_symmetry_index"),
        "Adjusted_Body_Weight_lb": derived.get("adjusted_body_weight_lb"),
        "RMR_cal_per_day": d.get("rmr_cal_per_day"),
    }
    write_or_append_csv(os.path.join(args.outdir, "overall.csv"), overall_row, overall_cols)

    # Regional table
    regional_cols = ["Region", "FatPercent", "TotalMass_lb", "FatTissue_lb", "LeanTissue_lb", "BMC_lb"]
    reg_rows = []
    for name, r in d.get("regional", {}).items():
        reg_rows.append({
            "Region": name,
            "FatPercent": r["fat_percent"],
            "TotalMass_lb": r["total_mass_lb"],
            "FatTissue_lb": r["fat_tissue_lb"],
            "LeanTissue_lb": r["lean_tissue_lb"],
            "BMC_lb": r["bmc_lb"],
        })
    regional_path = os.path.join(args.outdir, "regional.csv")
    # Pass the column list so the header is written even when no rows were parsed
    regional_df = pd.DataFrame(reg_rows, columns=regional_cols)
    if os.path.exists(regional_path):
        regional_df.to_csv(regional_path, mode="a", header=False, index=False)
    else:
        regional_df.to_csv(regional_path, index=False)

    # Muscle balance
    mb_cols = ["Region", "FatPercent", "TotalMass_lb", "FatMass_lb", "LeanMass_lb", "BMC_lb"]
    mb_rows = []
    for name, r in d.get("muscle_balance", {}).items():
        mb_rows.append({
            "Region": name,
            "FatPercent": r["fat_percent"],
            "TotalMass_lb": r["total_mass_lb"],
            "FatMass_lb": r["fat_mass_lb"],
            "LeanMass_lb": r["lean_mass_lb"],
            "BMC_lb": r["bmc_lb"],
        })
    mb_path = os.path.join(args.outdir, "muscle_balance.csv")
    mb_df = pd.DataFrame(mb_rows, columns=mb_cols)
    if os.path.exists(mb_path):
        mb_df.to_csv(mb_path, mode="a", header=False, index=False)
    else:
        mb_df.to_csv(mb_path, index=False)

    # JSON (overall structured object)
    # Convert regional and muscle_balance dicts to arrays
    regional_array = [
        {"region": name, **data}
        for name, data in d.get("regional", {}).items()
    ]
    muscle_balance_array = [
        {"region": name, **data}
        for name, data in d.get("muscle_balance", {}).items()
    ]
    overall_json = {
        "measured_date": measured_date,
        "anthropometrics": {
            "height_in": derived["height_in"],
            "height_ft_in": derived["height_ft_in"],
            "weight_input_lb": derived["weight_input_lb"],
            "dexa_total_mass_lb": round(total_mass, 1),
            "adjusted_body_weight_lb": derived.get("adjusted_body_weight_lb"),
            "bmi": derived["bmi"],
        },
        "composition": {
            "body_fat_percent": d.get("body_fat_percent"),
            "lean_mass_percent": derived.get("lean_mass_percent"),
            "fat_mass_lb": d.get("fat_mass_lb"),
            "lean_soft_tissue_lb": d.get("lean_soft_tissue_lb"),
            "bone_mineral_content_lb": d.get("bmc_lb"),
            "fat_free_mass_lb": derived.get("fat_free_mass_lb"),
            "derived_indices": {
                "ffmi": derived.get("ffmi"),
                "fmi": derived.get("fmi"),
                "lsti": derived.get("lsti"),
                "alm_lb": derived.get("alm_lb"),
                "smi": derived.get("smi"),
                "bmdi": derived.get("bmdi"),
            },
        },
        "regional": regional_array,
        "regional_analysis": {
            "trunk_to_limb_fat_ratio": derived.get("trunk_to_limb_fat_ratio"),
            "lean_mass_distribution": {
                "arms_percent": derived.get("arms_lean_pct"),
                "legs_percent": derived.get("legs_lean_pct"),
                "trunk_percent": derived.get("trunk_lean_pct"),
            },
        },
        "muscle_balance": muscle_balance_array,
        "symmetry_indices": {
            "arm_symmetry_index": derived.get("arm_symmetry_index"),
            "leg_symmetry_index": derived.get("leg_symmetry_index"),
        },
        "supplemental": {
            "android_percent": d.get("android_percent"),
            "gynoid_percent": d.get("gynoid_percent"),
            "ag_ratio": d.get("ag_ratio"),
            "vat": {
                "mass_lb": d.get("vat_mass_lb"),
                "volume_in3": d.get("vat_volume_in3"),
                "vat_index": derived.get("vat_index"),
            },
            "rmr_cal_per_day": d.get("rmr_cal_per_day"),
        },
        "bone_density": d.get("bone_density", {}),
    }
    write_or_append_json(os.path.join(args.outdir, "overall.json"), overall_json)

    # Markdown summary (append)
    md_text = make_markdown(measured_date, d, derived, total_mass)
    append_markdown(os.path.join(args.outdir, "summary.md"), md_text)
    print(f"Wrote files to: {args.outdir}")
    print("Files: overall.csv, regional.csv, muscle_balance.csv, overall.json, summary.md")


if __name__ == "__main__":
    main()

14
requirements.txt Normal file

@ -0,0 +1,14 @@
cffi==2.0.0
charset-normalizer==3.4.3
cryptography==46.0.2
numpy==2.3.3
pandas==2.3.3
pdfminer.six==20250506
pdfplumber==0.11.7
pillow==11.3.0
pycparser==2.23
pypdfium2==4.30.0
python-dateutil==2.9.0.post0
pytz==2025.2
six==1.17.0
tzdata==2025.2