Eylea Data Analysis Module
Analysis of intravitreal Eylea injection treatment patterns and outcomes.
This module provides tools for analyzing real-world data from intravitreal Eylea injections to derive parameters for simulation models. The analysis includes:
Data loading, cleaning and validation
Patient cohort characterization
Injection interval analysis
Visual acuity trajectory analysis
Treatment course identification
Data visualization and export
Key Features
Robust data validation with flexible column name mapping
Comprehensive data quality reporting
Detailed analysis of treatment intervals and patterns
Visual acuity trajectory modeling
Automated visualization generation
Multiple export formats (CSV, SQLite)
Classes
EyleaDataAnalyzer : Main analysis class implementing the full analysis pipeline
Examples
>>> analyzer = EyleaDataAnalyzer('input_data.csv')
>>> results = analyzer.run_analysis()
>>> print(f"Analyzed {results['patient_count']} patients")
- class analysis.eylea_data_analysis.EyleaDataAnalyzer(data_path, output_dir=None)[source]
Bases: object
Analyze Eylea treatment data to derive simulation parameters.
This class implements a comprehensive analysis pipeline for real-world Eylea treatment data, including data loading, cleaning, analysis, visualization and export.
- Parameters:
data_path (str) – Path to CSV file containing Eylea treatment records
output_dir (str, optional) – Directory to save analysis outputs (default creates ‘output’ directory)
- Variables:
data (pandas.DataFrame) – The loaded and processed treatment data
patient_data (pandas.DataFrame) – Patient-level analysis results
injection_intervals (pandas.DataFrame) – Injection interval analysis results
va_trajectories (pandas.DataFrame) – Visual acuity trajectory analysis results
treatment_courses (pandas.DataFrame) – Treatment course analysis results
data_quality_report (dict) – Comprehensive data quality assessment
Examples
>>> analyzer = EyleaDataAnalyzer('treatment_data.csv')
>>> analyzer.load_data()
>>> analyzer.analyze_injection_intervals()
>>> analyzer.plot_injection_intervals()
- COLUMN_MAPPINGS = {
    'Age at Death': ['Age at Death', 'Death Age', 'Age When Deceased', 'Deceased Age'],
    'Baseline CRT': ['Baseline CRT', 'BaselineCRT', 'Initial CRT', 'Starting CRT'],
    'Baseline VA Letter Score': ['Baseline VA Letter Score', 'Baseline VA', 'BaselineVA', 'Initial VA', 'Starting VA'],
    'CRT at Injection': ['CRT at Injection', 'CRT', 'Central Retinal Thickness'],
    'Current Age': ['Current Age', 'Age', 'Patient Age'],
    'Date of 1st Injection': ['Date of 1st Injection', 'First Injection Date', 'Initial Treatment Date', 'First Treatment Date'],
    'Days Since Last Injection': ['Days Since Last Injection', 'Interval', 'Treatment Interval', 'Days_Since_Last', 'Injection Interval'],
    'Deceased': ['Deceased', 'Death', 'Mortality'],
    'Eye': ['Eye', 'Treated Eye'],
    'Gender': ['Gender', 'Sex'],
    'Injection Date': ['Injection Date', 'InjectionDate', 'Date of Injection', 'Treatment Date'],
    'UUID': ['UUID', 'Patient ID', 'PatientID', 'Patient_ID', 'ID'],
    'VA Letter Score at Injection': ['VA Letter Score at Injection', 'VA Score', 'ETDRS Score', 'Visual Acuity', 'VA_Score', 'Letter Score']}
- DATA_VALIDATION = {
    'Age at Death': {'max': 120, 'min': 0, 'required': False, 'type': float},
    'Baseline CRT': {'max': 1000, 'min': 0, 'required': False, 'type': float},
    'Baseline VA Letter Score': {'max': 100, 'min': 0, 'required': False, 'type': float},
    'CRT at Injection': {'max': 1000, 'min': 0, 'required': False, 'type': float},
    'Current Age': {'max': 120, 'min': 0, 'required': False, 'type': float},
    'Date of 1st Injection': {'required': False, 'type': 'datetime'},
    'Days Since Last Injection': {'max': 365, 'min': 0, 'required': False, 'type': float},
    'Deceased': {'max': 1, 'min': 0, 'required': False, 'type': int},
    'Injection Date': {'required': True, 'type': 'datetime'},
    'UUID': {'required': True, 'type': str},
    'VA Letter Score at Injection': {'max': 100, 'min': 0, 'required': True, 'type': float}}
- __init__(data_path, output_dir=None)[source]
Initialize analyzer with data path and output directory.
- Parameters:
data_path (str) – Path to CSV file containing Eylea treatment records. Expected columns:
- Patient identifiers (UUID or similar)
- Injection dates
- Visual acuity measurements
- Other treatment parameters
output_dir (str, optional) – Directory to save analysis outputs. If None, creates ‘output’ directory in current working directory.
Notes
The analyzer is initialized but no data is loaded until load_data() is called. For a complete analysis pipeline, use run_analysis() which handles all steps.
- load_data()[source]
Load, validate and clean Eylea treatment data.
- Returns:
The loaded, validated and cleaned data
- Return type:
pandas.DataFrame
- Raises:
ValueError – If required columns are missing or data validation fails
IOError – If the data file cannot be read
Notes
Processing steps:
1. Load CSV file from data_path
2. Map column names to standardized format
3. Validate data types and ranges
4. Clean missing values and outliers
5. Generate comprehensive data quality report
Examples
>>> analyzer = EyleaDataAnalyzer('data.csv')
>>> data = analyzer.load_data()
>>> print(f"Loaded {len(data)} records")
- map_column_names()[source]
Map variant column names to standardized names.
- Returns:
Modifies the data attribute in-place with standardized column names
- Return type:
None
Notes
Uses both exact and fuzzy matching to handle common column name variations. Mappings are defined in the COLUMN_MAPPINGS class attribute.
The mapping process:
1. Attempts exact matches for each standard column name
2. Falls back to fuzzy matching (case/space insensitive)
3. Preserves unmapped columns unchanged
Results are stored in:
- column_mapping_used attribute
- data_quality_report['column_mapping']
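The exact-then-fuzzy mapping described above can be sketched as follows. This is a minimal illustration, not the module's actual implementation; the `normalize` helper and the two-entry mapping subset are assumptions for the example.

```python
import pandas as pd

# Illustrative subset of COLUMN_MAPPINGS: standard name -> accepted variants
COLUMN_MAPPINGS = {
    'UUID': ['UUID', 'Patient ID', 'PatientID', 'Patient_ID', 'ID'],
    'Injection Date': ['Injection Date', 'InjectionDate', 'Date of Injection'],
}

def normalize(name):
    """Case/space/underscore-insensitive key for fuzzy matching."""
    return name.lower().replace(' ', '').replace('_', '')

def map_column_names(df, mappings):
    """Rename variant columns to standard names; unmapped columns pass through."""
    rename = {}
    for standard, variants in mappings.items():
        fuzzy = {normalize(v): standard for v in variants}
        for col in df.columns:
            if col == standard:
                break  # exact match already present; nothing to rename
            if normalize(col) in fuzzy:
                rename[col] = standard
                break
    return df.rename(columns=rename), rename

df = pd.DataFrame({'patient_id': ['a1'], 'Date of Injection': ['2021-01-05']})
mapped, used = map_column_names(df, COLUMN_MAPPINGS)
print(list(mapped.columns))  # ['UUID', 'Injection Date']
```

The returned `rename` dictionary plays the role of the `column_mapping_used` record mentioned above.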
- validate_data_structure()[source]
Validate data structure, types and integrity.
- Returns:
Modifies data in-place with validated/converted values
- Return type:
None
- Raises:
ValueError – If required columns are missing or critical validation fails
Notes
Validation checks:
1. Required columns (per DATA_VALIDATION)
2. Date format conversion
3. Numeric value ranges
4. Temporal sequence integrity
5. Deceased status consistency
6. Duplicate records
Stores results in data_quality_report including:
- validation_errors: Critical issues
- validation_warnings: Non-critical issues
- temporal_anomalies: Sequence errors
- outliers: Values outside expected ranges
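A minimal sketch of the required-column and range checks, using a two-entry subset of the DATA_VALIDATION rules. The `validate` function and its error/warning split are illustrative assumptions, not the class's actual method.

```python
import pandas as pd

# Illustrative subset of DATA_VALIDATION: per-column type/range rules
RULES = {
    'VA Letter Score at Injection': {'type': float, 'min': 0, 'max': 100, 'required': True},
    'Current Age': {'type': float, 'min': 0, 'max': 120, 'required': False},
}

def validate(df, rules):
    """Missing required columns are errors; out-of-range values are warnings."""
    errors, warnings = [], []
    for col, rule in rules.items():
        if col not in df.columns:
            if rule['required']:
                errors.append(f"missing required column: {col}")
            continue
        values = pd.to_numeric(df[col], errors='coerce')
        out_of_range = values[(values < rule['min']) | (values > rule['max'])]
        for idx, v in out_of_range.items():
            warnings.append(f"{col} row {idx}: {v} outside [{rule['min']}, {rule['max']}]")
    return errors, warnings

df = pd.DataFrame({'VA Letter Score at Injection': [55, 130, 70]})
errors, warnings = validate(df, RULES)
```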
- clean_data()[source]
Clean and preprocess the Eylea treatment data.
- Returns:
Modifies the data attribute in-place with cleaned/preprocessed values
- Return type:
None
Notes
Performs the following cleaning operations:
1. Handles missing values in critical fields
2. Cleans Visual Acuity measurements (clipping, outlier detection)
3. Handles temporal anomalies (out-of-sequence dates, long gaps)
4. Creates unique patient and eye identifiers
5. Calculates derived fields (adjusted age, days since last injection)
Results are tracked in:
- data_quality_report['missing_values']
- data_quality_report['outliers']
- data_quality_report['temporal_anomalies']
- handle_missing_values()[source]
Handle missing values in the dataset.
- Returns:
Modifies data in-place with imputed values where appropriate
- Return type:
None
Notes
Missing value handling strategies:
1. Baseline VA: Uses first available VA measurement if missing
2. Age data: Different handling for deceased vs living patients
3. Current age: Adds 0.5 years to account for temporal alignment
4. Injection intervals: Calculates from dates if missing
Tracks missing values in:
- data_quality_report['missing_values_before']
- data_quality_report['missing_values_after']
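Strategy 1 (baseline VA from the first available measurement) can be sketched with a pandas groupby-transform. The column names follow the standardized names above; the tiny DataFrame is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'eye_key': ['p1_L', 'p1_L', 'p1_L'],
    'VA Letter Score at Injection': [58.0, 60.0, 62.0],
    'Baseline VA Letter Score': [None, None, None],
})

# Impute missing baseline VA with the first available VA measurement per eye
first_va = df.groupby('eye_key')['VA Letter Score at Injection'].transform('first')
df['Baseline VA Letter Score'] = df['Baseline VA Letter Score'].fillna(first_va)
```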
- clean_va_measurements()[source]
Clean and validate Visual Acuity measurements.
- Returns:
Modifies VA measurements in-place with cleaned values
- Return type:
None
Notes
Cleaning steps:
1. Clips VA values to valid range [0, 100]
2. Identifies implausible changes (>30 letters between consecutive measurements)
Tracks cleaning results in:
- data_quality_report['va_outliers_before']
- data_quality_report['va_implausible_changes']
Saves details of implausible changes to:
- output/implausible_va_changes.csv
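Both cleaning steps reduce to a clip and a grouped diff. A minimal sketch, assuming a `va` column name for brevity:

```python
import pandas as pd

df = pd.DataFrame({
    'eye_key': ['p1_R'] * 4,
    'va': [105.0, 70.0, 35.0, 68.0],
})

# Step 1: clip VA to the valid ETDRS letter range [0, 100]
df['va'] = df['va'].clip(0, 100)

# Step 2: flag implausible jumps (>30 letters between consecutive measurements)
df['va_change'] = df.groupby('eye_key')['va'].diff()
df['implausible'] = df['va_change'].abs() > 30
```

Note that a change of exactly 30 letters is not flagged; only strictly larger jumps are.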
- handle_temporal_anomalies()[source]
Handle temporal anomalies in the data.
- Returns:
Modifies data in-place with corrected temporal sequences
- Return type:
None
Notes
Handles these temporal anomalies:
1. Out-of-sequence injection dates (fixes by sorting)
2. Long treatment gaps (>180 days)
3. Single injection patients
Tracks anomalies in:
- data_quality_report['single_injection_patients']
- data_quality_report['sequence_fixes']
- data_quality_report['long_treatment_gaps']
Saves details of sequence fixes to:
- output/sequence_fixes.csv
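The sort-based sequence fix and the >180-day gap flag can be sketched as follows (illustrative data, standardized column names assumed):

```python
import pandas as pd

df = pd.DataFrame({
    'eye_key': ['p1_L'] * 3,
    'Injection Date': pd.to_datetime(['2021-03-01', '2021-01-01', '2021-09-01']),
})

# Fix out-of-sequence dates by sorting within each eye
df = df.sort_values(['eye_key', 'Injection Date']).reset_index(drop=True)

# Flag long treatment gaps (>180 days between consecutive injections)
gap = df.groupby('eye_key')['Injection Date'].diff().dt.days
df['long_gap'] = gap > 180
```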
- create_patient_id()[source]
Create unique patient and eye identifiers.
- Returns:
Modifies data in-place by adding:
- patient_id
- eye_key
- eye_standardized
- Return type:
None
Notes
Identifier creation logic:
1. Uses existing UUID if available
2. Creates composite ID from available fields if UUID missing
3. Creates eye-specific key (patient_id + eye)
4. Standardizes eye values (uppercase, no spaces)
Finally sorts data by eye_key and injection date.
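The UUID path of the identifier logic can be sketched as below; the underscore separator in `eye_key` is an assumption for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'UUID': ['abc-123', 'abc-123'],
    'Eye': ['left ', 'Left'],
})

# Standardize eye values (uppercase, no surrounding spaces)
df['eye_standardized'] = df['Eye'].str.strip().str.upper()

# patient_id from the existing UUID; eye_key combines patient and eye
df['patient_id'] = df['UUID']
df['eye_key'] = df['patient_id'] + '_' + df['eye_standardized']
```

Both rows resolve to the same `eye_key`, so the two inconsistent `Eye` spellings are treated as one eye.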
- generate_data_quality_report()[source]
Generate a comprehensive data quality report.
This method calculates various data quality metrics and saves them to a text file in the output directory.
- Returns:
The data quality report as a dictionary
- Return type:
dict
Notes
The report includes:
1. Summary metrics (rows, columns, missing data percentage)
2. Column mapping information
3. Validation errors and warnings
4. Missing values by column
5. Age data processing details
6. Temporal anomalies
7. VA measurement anomalies
The report is saved to ‘data_quality_report.txt’ in the output directory.
- analyze_patient_cohort()[source]
Analyze patient cohort demographics and treatment characteristics.
- Returns:
DataFrame with one row per patient containing:
- Demographics (age, gender)
- Eye information
- Baseline measurements (VA, CRT)
- Treatment information (injection count, dates)
- Mortality information (deceased status, age at death)
- Return type:
pandas.DataFrame
Notes
Key processing steps:
1. Groups data by patient_id
2. Extracts first row for each patient to get baseline characteristics
3. Calculates treatment duration from first to last injection
4. Handles missing values in baseline measurements
Examples
>>> analyzer = EyleaDataAnalyzer('data.csv')
>>> patient_data = analyzer.analyze_patient_cohort()
>>> print(patient_data[['patient_id', 'injection_count']].head())
- analyze_injection_intervals()[source]
Analyze time intervals between consecutive injections by eye.
- Returns:
DataFrame with interval information containing:
- Patient and eye identifiers
- Injection sequence numbers
- Dates of consecutive injections
- Interval in days between injections
- VA measurements at each injection
- Flags for long (>180d) and very long (>365d) gaps
- Return type:
pandas.DataFrame
Notes
Processing steps:
1. Groups data by eye_key (patient + eye)
2. Sorts injections by date
3. Calculates days between consecutive injections
4. Flags clinically significant gaps
5. Tracks VA changes between injections
Examples
>>> analyzer = EyleaDataAnalyzer('data.csv')
>>> intervals = analyzer.analyze_injection_intervals()
>>> print(intervals[['eye_key', 'interval_days']].describe())
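The core interval computation reduces to a sorted groupby-diff plus gap flags. A minimal sketch on illustrative data:

```python
import pandas as pd

df = pd.DataFrame({
    'eye_key': ['p1_L'] * 3,
    'Injection Date': pd.to_datetime(['2021-01-01', '2021-02-26', '2021-06-01']),
})

# Sort within each eye, then take day differences between consecutive injections
df = df.sort_values(['eye_key', 'Injection Date'])
df['interval_days'] = df.groupby('eye_key')['Injection Date'].diff().dt.days

# Flag clinically significant gaps
df['long_gap'] = df['interval_days'] > 180
df['very_long_gap'] = df['interval_days'] > 365

# Per-eye summary of intervals (first injection has no interval, so NaN is skipped)
summary = df.groupby('eye_key')['interval_days'].agg(['mean', 'median'])
```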
- analyze_va_trajectories()[source]
Analyze visual acuity trajectories over time by eye.
- Returns:
DataFrame with VA trajectory information containing:
- Patient and eye identifiers
- Injection sequence numbers
- Days from first injection
- VA score at each injection
- Baseline VA
- VA change from baseline
- Return type:
pandas.DataFrame
Notes
Processing steps:
1. Groups data by eye_key (patient + eye)
2. Uses first available VA as baseline if missing
3. Calculates days from first injection
4. Computes VA change from baseline
5. Applies smoothing for population average
Examples
>>> analyzer = EyleaDataAnalyzer('data.csv')
>>> va_traj = analyzer.analyze_va_trajectories()
>>> print(va_traj[['eye_key', 'va_change']].describe())
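Steps 3 and 4 above (days from first injection, change from baseline) can be sketched with grouped transforms. The `va` column name is an assumption for brevity:

```python
import pandas as pd

df = pd.DataFrame({
    'eye_key': ['p1_L'] * 3,
    'Injection Date': pd.to_datetime(['2021-01-01', '2021-03-01', '2021-05-01']),
    'va': [55.0, 60.0, 63.0],
})

g = df.groupby('eye_key')

# Days from each eye's first injection
df['days_from_first'] = (df['Injection Date'] - g['Injection Date'].transform('first')).dt.days

# Baseline VA = first available VA per eye; change is measured against it
df['baseline_va'] = g['va'].transform('first')
df['va_change'] = df['va'] - df['baseline_va']
```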
- plot_injection_intervals()[source]
Plot distribution of injection intervals and intervals by sequence.
- Returns:
Saves two plots to output directory:
1. 'injection_intervals.png' - Histogram of intervals with reference lines
2. 'injection_intervals_by_sequence.png' - Mean/median intervals by sequence
- Return type:
None
Notes
Plot 1 (Histogram):
- Shows distribution of all injection intervals
- Includes reference lines at:
  - 28 days (monthly)
  - 56 days (bi-monthly)
  - 84 days (quarterly)
Plot 2 (Sequence):
- Shows mean ± SD and median intervals by injection number
- Helps identify interval patterns over treatment course
Automatically calls analyze_injection_intervals() if needed.
- plot_va_trajectories()[source]
Plot visual acuity trajectories over time and by injection number.
- Returns:
Saves two plots to output directory:
1. 'va_trajectories.png' - Individual trajectories + population average
2. 'va_by_injection_number.png' - Mean VA by injection number
- Return type:
None
Notes
Plot 1 (Trajectories):
- Shows VA over time for a sample of 20 eyes
- Includes LOESS-smoothed population average line
- Falls back to a simple average if statsmodels is not available
Plot 2 (Injection Number):
- Shows mean ± SD VA by injection sequence
- Includes sample size annotations
- Helps identify VA patterns over treatment course
Automatically calls analyze_va_trajectories() if needed.
- plot_va_change_distribution()[source]
Plot distribution of VA changes from baseline and outcome categories.
- Returns:
Saves two plots to output directory:
1. 'va_change_distribution.png' - Histogram of VA changes
2. 'va_outcome_categories.png' - Categorical outcomes
- Return type:
None
Notes
Plot 1 (Histogram):
- Shows distribution of final VA changes from baseline
- Includes reference lines at:
  - 0 (no change)
  - ±5 letters (gain/loss)
  - ±15 letters (significant gain/loss)
Plot 2 (Categories):
- Groups outcomes into clinically relevant categories
- Shows counts and percentages for each category
- Categories:
  - ≥15 letter gain
  - 5-14 letter gain
  - Stable (-4 to +4)
  - 5-14 letter loss
  - ≥15 letter loss
  - Unknown
Automatically calls analyze_va_trajectories() if needed.
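The six outcome categories above map naturally onto `pandas.cut` with bin edges at ±5 and ±15 letters. A minimal sketch, not the module's actual binning code; the exact edge handling (e.g. where a change of exactly -15 falls) is an assumption here:

```python
import pandas as pd

va_change = pd.Series([18.0, 7.0, -2.0, -9.0, -20.0, None])

# Bin edges chosen so Stable covers -4..+4 and ±15 marks significant change
bins = [-float('inf'), -15, -5, 4, 14, float('inf')]
labels = ['>=15 letter loss', '5-14 letter loss', 'Stable (-4 to +4)',
          '5-14 letter gain', '>=15 letter gain']
category = pd.cut(va_change, bins=bins, labels=labels)

# Missing final VA -> Unknown
category = category.cat.add_categories('Unknown').fillna('Unknown')
```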
- analyze_treatment_courses()[source]
Analyze treatment courses by identifying potential breaks.
- Returns:
DataFrame with treatment course information containing:
- Patient and eye identifiers
- Course start/end dates
- Duration in days
- Injection count
- Flags for long pauses (>365d)
- Potential separate courses
- Return type:
pandas.DataFrame
Notes
Key processing steps:
1. Groups data by eye_key (patient + eye)
2. Identifies very long gaps (>365d) as potential course breaks
3. Calculates duration from first to last injection
4. Tracks injection counts per course
Examples
>>> analyzer = EyleaDataAnalyzer('data.csv')
>>> courses = analyzer.analyze_treatment_courses()
>>> print(courses[['eye_key', 'duration_days']].describe())
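Course-break detection can be sketched with a cumulative sum over the >365-day gap flag: each very long gap starts a new course number within an eye. An illustrative sketch, not the module's actual implementation:

```python
import pandas as pd

df = pd.DataFrame({
    'eye_key': ['p1_L'] * 4,
    'Injection Date': pd.to_datetime(['2020-01-01', '2020-03-01', '2021-06-01', '2021-08-01']),
})

df = df.sort_values(['eye_key', 'Injection Date'])
gap = df.groupby('eye_key')['Injection Date'].diff().dt.days

# A very long gap (>365 days) starts a new course; cumsum numbers the courses
df['course'] = (gap > 365).groupby(df['eye_key']).cumsum()

# Per-course start/end dates, injection counts, and durations
courses = df.groupby(['eye_key', 'course'])['Injection Date'].agg(
    start='min', end='max', injections='count')
courses['duration_days'] = (courses['end'] - courses['start']).dt.days
```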
- plot_treatment_courses()[source]
Plot treatment course durations and injection counts per course.
- Returns:
Saves two plots to output directory:
1. 'treatment_course_durations.png' - Histogram of durations
2. 'injections_per_course.png' - Histogram of injection counts
- Return type:
None
Notes
Plot 1 (Durations):
- Shows distribution of treatment course durations in days
- Helps identify typical treatment persistence patterns
Plot 2 (Injections):
- Shows distribution of injection counts per course
- Uses discrete bins (1-20 injections)
- Helps identify typical treatment intensity
Automatically calls analyze_treatment_courses() if needed.
- export_interval_va_data(format='csv', db_path=None)[source]
Export interval and VA data to CSV and/or SQLite format.
- Parameters:
format (str, optional) – Output format (‘csv’, ‘sqlite’, or ‘both’). Default ‘csv’.
db_path (str, optional) – Custom path for SQLite database. Default uses ‘eylea_intervals.db’ in output directory.
- Returns:
Dictionary containing paths to exported files with keys:
- 'csv': Path to detailed CSV file
- 'summary_csv': Path to summary CSV file
- 'sqlite': Path to SQLite database (if exported)
- Return type:
dict
Notes
Exports two data types:
1. Detailed data (per-injection intervals and VA measurements)
2. Summary data (per-patient interval lists and VA changes)
CSV outputs:
- 'interval_va_data.csv': Detailed injection-level data
- 'interval_va_summary.csv': Patient-level summary
SQLite outputs:
- 'interval_va_data' table: Detailed data
- 'interval_summary' table: Summary data
Automatically calls analyze_injection_intervals() if needed.
Examples
>>> analyzer = EyleaDataAnalyzer('data.csv')
>>> paths = analyzer.export_interval_va_data(format='both')
>>> print(paths['csv'])  # Prints path to detailed CSV
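The dual CSV/SQLite export pattern can be sketched with `DataFrame.to_csv` and `DataFrame.to_sql`. This sketch uses an in-memory database and a tiny illustrative frame; the analyzer itself writes 'interval_va_data.csv' and 'eylea_intervals.db' into its output directory.

```python
import sqlite3
import pandas as pd

intervals = pd.DataFrame({
    'eye_key': ['p1_L', 'p1_L'],
    'interval_days': [56, 63],
    'va': [60.0, 62.0],
})

# CSV export (to_csv with no path returns the CSV text)
csv_text = intervals.to_csv(index=False)

# SQLite export via DataFrame.to_sql (in-memory here; the analyzer uses a .db file)
conn = sqlite3.connect(':memory:')
intervals.to_sql('interval_va_data', conn, if_exists='replace', index=False)
back = pd.read_sql('SELECT COUNT(*) AS n FROM interval_va_data', conn)
conn.close()
```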
- run_analysis()[source]
Execute complete analysis pipeline from data loading to export.
- Returns:
Dictionary with analysis summary containing:
- patient_count: Number of unique patients
- eye_count: Number of treated eyes
- injection_count: Total injections analyzed
- course_count: Number of treatment courses
- mean_injection_interval: Average interval between injections
- median_injection_interval: Median interval between injections
- output_dir: Path to output directory
- data_quality_report: Summary of data quality metrics
- export_paths: Paths to exported files
- Return type:
dict
Notes
Analysis steps:
1. Data loading and cleaning
2. Patient cohort analysis
3. Injection interval analysis
4. VA trajectory analysis
5. Treatment course analysis
6. Visualization generation
7. Data export
Examples
>>> analyzer = EyleaDataAnalyzer('data.csv')
>>> results = analyzer.run_analysis()
>>> print(f"Analyzed {results['patient_count']} patients")
- analysis.eylea_data_analysis.main()[source]
Command line interface for running Eylea data analysis.
- Returns:
Prints analysis summary to stdout
- Return type:
None
Notes
Command line arguments:
--data : Path to input CSV file (default: 'input_data/sample_raw.csv')
--output : Output directory (default: 'output')
--debug : Enable debug logging
--validation-strictness : Set validation level ('strict', 'moderate', 'lenient')
Example
python eylea_data_analysis.py --data treatment_data.csv --output results