Eylea Data Analysis Module
Analysis of intravitreal Eylea injection treatment patterns and outcomes.
This module provides tools for analyzing real-world data from intravitreal Eylea injections to derive parameters for simulation models. The analysis includes:
Data loading, cleaning and validation
Patient cohort characterization
Injection interval analysis
Visual acuity trajectory analysis
Treatment course identification
Data visualization and export
Key Features
Robust data validation with flexible column name mapping
Comprehensive data quality reporting
Detailed analysis of treatment intervals and patterns
Visual acuity trajectory modeling
Automated visualization generation
Multiple export formats (CSV, SQLite)
Classes
EyleaDataAnalyzer : Main analysis class implementing the full analysis pipeline
Examples
>>> analyzer = EyleaDataAnalyzer('input_data.csv')
>>> results = analyzer.run_analysis()
>>> print(f"Analyzed {results['patient_count']} patients")
- class analysis.eylea_data_analysis.EyleaDataAnalyzer(data_path, output_dir=None)[source]
Bases: object
Analyze Eylea treatment data to derive simulation parameters.
This class implements a comprehensive analysis pipeline for real-world Eylea treatment data, including data loading, cleaning, analysis, visualization and export.
- Parameters:
data_path (str) – Path to CSV file containing Eylea treatment records
output_dir (str, optional) – Directory to save analysis outputs (default creates ‘output’ directory)
- Variables:
data (pandas.DataFrame) – The loaded and processed treatment data
patient_data (pandas.DataFrame) – Patient-level analysis results
injection_intervals (pandas.DataFrame) – Injection interval analysis results
va_trajectories (pandas.DataFrame) – Visual acuity trajectory analysis results
treatment_courses (pandas.DataFrame) – Treatment course analysis results
data_quality_report (dict) – Comprehensive data quality assessment
Examples
>>> analyzer = EyleaDataAnalyzer('treatment_data.csv')
>>> analyzer.load_data()
>>> analyzer.analyze_injection_intervals()
>>> analyzer.plot_injection_intervals()
- COLUMN_MAPPINGS = {
    'Age at Death': ['Age at Death', 'Death Age', 'Age When Deceased', 'Deceased Age'],
    'Baseline CRT': ['Baseline CRT', 'BaselineCRT', 'Initial CRT', 'Starting CRT'],
    'Baseline VA Letter Score': ['Baseline VA Letter Score', 'Baseline VA', 'BaselineVA', 'Initial VA', 'Starting VA'],
    'CRT at Injection': ['CRT at Injection', 'CRT', 'Central Retinal Thickness'],
    'Current Age': ['Current Age', 'Age', 'Patient Age'],
    'Date of 1st Injection': ['Date of 1st Injection', 'First Injection Date', 'Initial Treatment Date', 'First Treatment Date'],
    'Days Since Last Injection': ['Days Since Last Injection', 'Interval', 'Treatment Interval', 'Days_Since_Last', 'Injection Interval'],
    'Deceased': ['Deceased', 'Death', 'Mortality'],
    'Eye': ['Eye', 'Treated Eye'],
    'Gender': ['Gender', 'Sex'],
    'Injection Date': ['Injection Date', 'InjectionDate', 'Date of Injection', 'Treatment Date'],
    'UUID': ['UUID', 'Patient ID', 'PatientID', 'Patient_ID', 'ID'],
    'VA Letter Score at Injection': ['VA Letter Score at Injection', 'VA Score', 'ETDRS Score', 'Visual Acuity', 'VA_Score', 'Letter Score']}
- DATA_VALIDATION = {
    'Age at Death': {'max': 120, 'min': 0, 'required': False, 'type': float},
    'Baseline CRT': {'max': 1000, 'min': 0, 'required': False, 'type': float},
    'Baseline VA Letter Score': {'max': 100, 'min': 0, 'required': False, 'type': float},
    'CRT at Injection': {'max': 1000, 'min': 0, 'required': False, 'type': float},
    'Current Age': {'max': 120, 'min': 0, 'required': False, 'type': float},
    'Date of 1st Injection': {'required': False, 'type': 'datetime'},
    'Days Since Last Injection': {'max': 365, 'min': 0, 'required': False, 'type': float},
    'Deceased': {'max': 1, 'min': 0, 'required': False, 'type': int},
    'Injection Date': {'required': True, 'type': 'datetime'},
    'UUID': {'required': True, 'type': str},
    'VA Letter Score at Injection': {'max': 100, 'min': 0, 'required': True, 'type': float}}
- __init__(data_path, output_dir=None)[source]
Initialize analyzer with data path and output directory.
- Parameters:
data_path (str) – Path to CSV file containing Eylea treatment records. Expected columns:
- Patient identifiers (UUID or similar)
- Injection dates
- Visual acuity measurements
- Other treatment parameters
output_dir (str, optional) – Directory to save analysis outputs. If None, creates ‘output’ directory in current working directory.
Notes
The analyzer is initialized but no data is loaded until load_data() is called. For a complete analysis pipeline, use run_analysis() which handles all steps.
- load_data()[source]
Load, validate and clean Eylea treatment data.
- Returns:
The loaded, validated and cleaned data
- Return type:
pandas.DataFrame
- Raises:
ValueError – If required columns are missing or data validation fails
IOError – If the data file cannot be read
Notes
Processing steps:
1. Load CSV file from data_path
2. Map column names to standardized format
3. Validate data types and ranges
4. Clean missing values and outliers
5. Generate comprehensive data quality report
Examples
>>> analyzer = EyleaDataAnalyzer('data.csv')
>>> data = analyzer.load_data()
>>> print(f"Loaded {len(data)} records")
- map_column_names()[source]
Map variant column names to standardized names.
- Returns:
Modifies the data attribute in-place with standardized column names
- Return type:
None
Notes
Uses both exact and fuzzy matching to handle common column name variations. Mappings are defined in the COLUMN_MAPPINGS class attribute.
The mapping process:
1. Attempts exact matches for each standard column name
2. Falls back to fuzzy matching (case/space insensitive)
3. Preserves unmapped columns unchanged
Results are stored in:
- column_mapping_used attribute
- data_quality_report['column_mapping']
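The exact-then-fuzzy mapping described above can be sketched as follows. This is a minimal illustration, not the module's actual implementation; the `normalize` helper and the two-entry mapping subset are assumptions for the example.

```python
import pandas as pd

# Illustrative subset of COLUMN_MAPPINGS: standard name -> accepted variants
COLUMN_MAPPINGS = {
    'UUID': ['UUID', 'Patient ID', 'PatientID', 'Patient_ID', 'ID'],
    'Injection Date': ['Injection Date', 'InjectionDate', 'Date of Injection'],
}

def normalize(name):
    """Case/space/underscore-insensitive key for fuzzy matching."""
    return name.lower().replace(' ', '').replace('_', '')

def map_column_names(df, mappings):
    """Rename variant columns to standard names; unmapped columns pass through."""
    rename = {}
    for standard, variants in mappings.items():
        fuzzy = {normalize(v): standard for v in variants}
        for col in df.columns:
            if col == standard:
                break  # exact match already present; nothing to rename
            if normalize(col) in fuzzy:
                rename[col] = standard
                break
    return df.rename(columns=rename), rename

df = pd.DataFrame({'patient_id': ['a1'], 'Date of Injection': ['2021-01-05']})
mapped, used = map_column_names(df, COLUMN_MAPPINGS)
print(list(mapped.columns))  # ['UUID', 'Injection Date']
```

The returned `rename` dictionary plays the role of the `column_mapping_used` record mentioned above.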
- validate_data_structure()[source]
Validate data structure, types and integrity.
- Returns:
Modifies data in-place with validated/converted values
- Return type:
None
- Raises:
ValueError – If required columns are missing or critical validation fails
Notes
Validation checks:
1. Required columns (per DATA_VALIDATION)
2. Date format conversion
3. Numeric value ranges
4. Temporal sequence integrity
5. Deceased status consistency
6. Duplicate records
Stores results in data_quality_report including:
- validation_errors: Critical issues
- validation_warnings: Non-critical issues
- temporal_anomalies: Sequence errors
- outliers: Values outside expected ranges
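A minimal sketch of the required-column and range checks, using a two-entry subset of the DATA_VALIDATION rules. The `validate` function and its error/warning split are illustrative assumptions, not the class's actual method.

```python
import pandas as pd

# Illustrative subset of DATA_VALIDATION: per-column type/range rules
RULES = {
    'VA Letter Score at Injection': {'type': float, 'min': 0, 'max': 100, 'required': True},
    'Current Age': {'type': float, 'min': 0, 'max': 120, 'required': False},
}

def validate(df, rules):
    """Missing required columns are errors; out-of-range values are warnings."""
    errors, warnings = [], []
    for col, rule in rules.items():
        if col not in df.columns:
            if rule['required']:
                errors.append(f"missing required column: {col}")
            continue
        values = pd.to_numeric(df[col], errors='coerce')
        out_of_range = values[(values < rule['min']) | (values > rule['max'])]
        for idx, v in out_of_range.items():
            warnings.append(f"{col} row {idx}: {v} outside [{rule['min']}, {rule['max']}]")
    return errors, warnings

df = pd.DataFrame({'VA Letter Score at Injection': [55, 130, 70]})
errors, warnings = validate(df, RULES)
```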
- clean_data()[source]
Clean and preprocess the Eylea treatment data.
- Returns:
Modifies the data attribute in-place with cleaned/preprocessed values
- Return type:
None
Notes
Performs the following cleaning operations:
1. Handles missing values in critical fields
2. Cleans Visual Acuity measurements (clipping, outlier detection)
3. Handles temporal anomalies (out-of-sequence dates, long gaps)
4. Creates unique patient and eye identifiers
5. Calculates derived fields (adjusted age, days since last injection)
Results are tracked in:
- data_quality_report['missing_values']
- data_quality_report['outliers']
- data_quality_report['temporal_anomalies']
- handle_missing_values()[source]
Handle missing values in the dataset.
- Returns:
Modifies data in-place with imputed values where appropriate
- Return type:
None
Notes
Missing value handling strategies:
1. Baseline VA: Uses first available VA measurement if missing
2. Age data: Different handling for deceased vs living patients
3. Current age: Adds 0.5 years to account for temporal alignment
4. Injection intervals: Calculates from dates if missing
Tracks missing values in:
- data_quality_report['missing_values_before']
- data_quality_report['missing_values_after']
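Strategy 1 (baseline VA from the first available measurement) can be sketched with a pandas groupby-transform. The column names follow the standardized names above; the tiny DataFrame is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'eye_key': ['p1_L', 'p1_L', 'p1_L'],
    'VA Letter Score at Injection': [58.0, 60.0, 62.0],
    'Baseline VA Letter Score': [None, None, None],
})

# Impute missing baseline VA with the first available VA measurement per eye
first_va = df.groupby('eye_key')['VA Letter Score at Injection'].transform('first')
df['Baseline VA Letter Score'] = df['Baseline VA Letter Score'].fillna(first_va)
```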
- clean_va_measurements()[source]
Clean and validate Visual Acuity measurements.
- Returns:
Modifies VA measurements in-place with cleaned values
- Return type:
None
Notes
Cleaning steps:
1. Clips VA values to valid range [0, 100]
2. Identifies implausible changes (>30 letters between consecutive measurements)
Tracks cleaning results in:
- data_quality_report['va_outliers_before']
- data_quality_report['va_implausible_changes']
Saves details of implausible changes to:
- output/implausible_va_changes.csv
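Both cleaning steps reduce to a clip and a grouped diff. A minimal sketch, assuming a `va` column name for brevity:

```python
import pandas as pd

df = pd.DataFrame({
    'eye_key': ['p1_R'] * 4,
    'va': [105.0, 70.0, 35.0, 68.0],
})

# Step 1: clip VA to the valid ETDRS letter range [0, 100]
df['va'] = df['va'].clip(0, 100)

# Step 2: flag implausible jumps (>30 letters between consecutive measurements)
df['va_change'] = df.groupby('eye_key')['va'].diff()
df['implausible'] = df['va_change'].abs() > 30
```

Note that a change of exactly 30 letters is not flagged; only strictly larger jumps are.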
- handle_temporal_anomalies()[source]
Handle temporal anomalies in the data.
- Returns:
Modifies data in-place with corrected temporal sequences
- Return type:
None
Notes
Handles these temporal anomalies:
1. Out-of-sequence injection dates (fixes by sorting)
2. Long treatment gaps (>180 days)
3. Single injection patients
Tracks anomalies in:
- data_quality_report['single_injection_patients']
- data_quality_report['sequence_fixes']
- data_quality_report['long_treatment_gaps']
Saves details of sequence fixes to:
- output/sequence_fixes.csv
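The sort-based sequence fix and the >180-day gap flag can be sketched as follows (illustrative data, standardized column names assumed):

```python
import pandas as pd

df = pd.DataFrame({
    'eye_key': ['p1_L'] * 3,
    'Injection Date': pd.to_datetime(['2021-03-01', '2021-01-01', '2021-09-01']),
})

# Fix out-of-sequence dates by sorting within each eye
df = df.sort_values(['eye_key', 'Injection Date']).reset_index(drop=True)

# Flag long treatment gaps (>180 days between consecutive injections)
gap = df.groupby('eye_key')['Injection Date'].diff().dt.days
df['long_gap'] = gap > 180
```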
- create_patient_id()[source]
Create unique patient and eye identifiers.
- Returns:
Modifies data in-place by adding:
- patient_id
- eye_key
- eye_standardized
- Return type:
None
Notes
Identifier creation logic:
1. Uses existing UUID if available
2. Creates composite ID from available fields if UUID missing
3. Creates eye-specific key (patient_id + eye)
4. Standardizes eye values (uppercase, no spaces)
Finally sorts data by eye_key and injection date.
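The UUID path of the identifier logic can be sketched as below; the underscore separator in `eye_key` is an assumption for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'UUID': ['abc-123', 'abc-123'],
    'Eye': ['left ', 'Left'],
})

# Standardize eye values (uppercase, no surrounding spaces)
df['eye_standardized'] = df['Eye'].str.strip().str.upper()

# patient_id from the existing UUID; eye_key combines patient and eye
df['patient_id'] = df['UUID']
df['eye_key'] = df['patient_id'] + '_' + df['eye_standardized']
```

Both rows resolve to the same `eye_key`, so the two inconsistent `Eye` spellings are treated as one eye.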
- generate_data_quality_report()[source]
Generate a comprehensive data quality report.
This method calculates various data quality metrics and saves them to a text file in the output directory.
- Returns:
The data quality report as a dictionary
- Return type:
dict
Notes
The report includes:
1. Summary metrics (rows, columns, missing data percentage)
2. Column mapping information
3. Validation errors and warnings
4. Missing values by column
5. Age data processing details
6. Temporal anomalies
7. VA measurement anomalies
The report is saved to ‘data_quality_report.txt’ in the output directory.
- analyze_patient_cohort()[source]
Analyze patient cohort demographics and treatment characteristics.
- Returns:
DataFrame with one row per patient containing:
- Demographics (age, gender)
- Eye information
- Baseline measurements (VA, CRT)
- Treatment information (injection count, dates)
- Mortality information (deceased status, age at death)
- Return type:
pandas.DataFrame
Notes
Key processing steps:
1. Groups data by patient_id
2. Extracts first row for each patient to get baseline characteristics
3. Calculates treatment duration from first to last injection
4. Handles missing values in baseline measurements
Examples
>>> analyzer = EyleaDataAnalyzer('data.csv')
>>> patient_data = analyzer.analyze_patient_cohort()
>>> print(patient_data[['patient_id', 'injection_count']].head())
- analyze_injection_intervals()[source]
Analyze time intervals between consecutive injections by eye.
- Returns:
DataFrame with interval information containing:
- Patient and eye identifiers
- Injection sequence numbers
- Dates of consecutive injections
- Interval in days between injections
- VA measurements at each injection
- Flags for long (>180d) and very long (>365d) gaps
- Return type:
pandas.DataFrame
Notes
Processing steps:
1. Groups data by eye_key (patient + eye)
2. Sorts injections by date
3. Calculates days between consecutive injections
4. Flags clinically significant gaps
5. Tracks VA changes between injections
Examples
>>> analyzer = EyleaDataAnalyzer('data.csv')
>>> intervals = analyzer.analyze_injection_intervals()
>>> print(intervals[['eye_key', 'interval_days']].describe())
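The core interval computation reduces to a sorted groupby-diff plus gap flags. A minimal sketch on illustrative data:

```python
import pandas as pd

df = pd.DataFrame({
    'eye_key': ['p1_L'] * 3,
    'Injection Date': pd.to_datetime(['2021-01-01', '2021-02-26', '2021-06-01']),
})

# Sort within each eye, then take day differences between consecutive injections
df = df.sort_values(['eye_key', 'Injection Date'])
df['interval_days'] = df.groupby('eye_key')['Injection Date'].diff().dt.days

# Flag clinically significant gaps
df['long_gap'] = df['interval_days'] > 180
df['very_long_gap'] = df['interval_days'] > 365

# Per-eye summary of intervals (first injection has no interval, so NaN is skipped)
summary = df.groupby('eye_key')['interval_days'].agg(['mean', 'median'])
```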
- analyze_va_trajectories()[source]
Analyze visual acuity trajectories over time by eye.
- Returns:
DataFrame with VA trajectory information containing:
- Patient and eye identifiers
- Injection sequence numbers
- Days from first injection
- VA score at each injection
- Baseline VA
- VA change from baseline
- Return type:
pandas.DataFrame
Notes
Processing steps:
1. Groups data by eye_key (patient + eye)
2. Uses first available VA as baseline if missing
3. Calculates days from first injection
4. Computes VA change from baseline
5. Applies smoothing for population average
Examples
>>> analyzer = EyleaDataAnalyzer('data.csv')
>>> va_traj = analyzer.analyze_va_trajectories()
>>> print(va_traj[['eye_key', 'va_change']].describe())
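Steps 3 and 4 above (days from first injection, change from baseline) can be sketched with grouped transforms. The `va` column name is an assumption for brevity:

```python
import pandas as pd

df = pd.DataFrame({
    'eye_key': ['p1_L'] * 3,
    'Injection Date': pd.to_datetime(['2021-01-01', '2021-03-01', '2021-05-01']),
    'va': [55.0, 60.0, 63.0],
})

g = df.groupby('eye_key')

# Days from each eye's first injection
df['days_from_first'] = (df['Injection Date'] - g['Injection Date'].transform('first')).dt.days

# Baseline VA = first available VA per eye; change is measured against it
df['baseline_va'] = g['va'].transform('first')
df['va_change'] = df['va'] - df['baseline_va']
```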
- plot_injection_intervals()[source]
Plot distribution of injection intervals and intervals by sequence.
- Returns:
Saves two plots to output directory:
1. 'injection_intervals.png' - Histogram of intervals with reference lines
2. 'injection_intervals_by_sequence.png' - Mean/median intervals by sequence
- Return type:
None
Notes
Plot 1 (Histogram):
- Shows distribution of all injection intervals
- Includes reference lines at:
  - 28 days (monthly)
  - 56 days (bi-monthly)
  - 84 days (quarterly)
Plot 2 (Sequence):
- Shows mean ± SD and median intervals by injection number
- Helps identify interval patterns over treatment course
Automatically calls analyze_injection_intervals() if needed.
- plot_va_trajectories()[source]
Plot visual acuity trajectories over time and by injection number.
- Returns:
Saves two plots to output directory:
1. 'va_trajectories.png' - Individual trajectories + population average
2. 'va_by_injection_number.png' - Mean VA by injection number
- Return type:
None
Notes
Plot 1 (Trajectories):
- Shows VA over time for a sample of 20 eyes
- Includes LOESS-smoothed population average line
- Falls back to a simple average if statsmodels is not available
Plot 2 (Injection Number):
- Shows mean ± SD VA by injection sequence
- Includes sample size annotations
- Helps identify VA patterns over treatment course
Automatically calls analyze_va_trajectories() if needed.
- plot_va_change_distribution()[source]
Plot distribution of VA changes from baseline and outcome categories.
- Returns:
Saves two plots to output directory:
1. 'va_change_distribution.png' - Histogram of VA changes
2. 'va_outcome_categories.png' - Categorical outcomes
- Return type:
None
Notes
Plot 1 (Histogram):
- Shows distribution of final VA changes from baseline
- Includes reference lines at:
  - 0 (no change)
  - ±5 letters (gain/loss)
  - ±15 letters (significant gain/loss)
Plot 2 (Categories):
- Groups outcomes into clinically relevant categories
- Shows counts and percentages for each category
- Categories:
  - ≥15 letter gain
  - 5-14 letter gain
  - Stable (-4 to +4)
  - 5-14 letter loss
  - ≥15 letter loss
  - Unknown
Automatically calls analyze_va_trajectories() if needed.
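The six outcome categories above map naturally onto `pandas.cut` with bin edges at ±5 and ±15 letters. A minimal sketch, not the module's actual binning code; the exact edge handling (e.g. where a change of exactly -15 falls) is an assumption here:

```python
import pandas as pd

va_change = pd.Series([18.0, 7.0, -2.0, -9.0, -20.0, None])

# Bin edges chosen so Stable covers -4..+4 and ±15 marks significant change
bins = [-float('inf'), -15, -5, 4, 14, float('inf')]
labels = ['>=15 letter loss', '5-14 letter loss', 'Stable (-4 to +4)',
          '5-14 letter gain', '>=15 letter gain']
category = pd.cut(va_change, bins=bins, labels=labels)

# Missing final VA -> Unknown
category = category.cat.add_categories('Unknown').fillna('Unknown')
```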
- analyze_treatment_courses()[source]
Analyze treatment courses by identifying potential breaks.
- Returns:
DataFrame with treatment course information containing:
- Patient and eye identifiers
- Course start/end dates
- Duration in days
- Injection count
- Flags for long pauses (>365d)
- Potential separate courses
- Return type:
pandas.DataFrame
Notes
Key processing steps:
1. Groups data by eye_key (patient + eye)
2. Identifies very long gaps (>365d) as potential course breaks
3. Calculates duration from first to last injection
4. Tracks injection counts per course
Examples
>>> analyzer = EyleaDataAnalyzer('data.csv')
>>> courses = analyzer.analyze_treatment_courses()
>>> print(courses[['eye_key', 'duration_days']].describe())
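Course-break detection can be sketched with a cumulative sum over the >365-day gap flag: each very long gap starts a new course number within an eye. An illustrative sketch, not the module's actual implementation:

```python
import pandas as pd

df = pd.DataFrame({
    'eye_key': ['p1_L'] * 4,
    'Injection Date': pd.to_datetime(['2020-01-01', '2020-03-01', '2021-06-01', '2021-08-01']),
})

df = df.sort_values(['eye_key', 'Injection Date'])
gap = df.groupby('eye_key')['Injection Date'].diff().dt.days

# A very long gap (>365 days) starts a new course; cumsum numbers the courses
df['course'] = (gap > 365).groupby(df['eye_key']).cumsum()

# Per-course start/end dates, injection counts, and durations
courses = df.groupby(['eye_key', 'course'])['Injection Date'].agg(
    start='min', end='max', injections='count')
courses['duration_days'] = (courses['end'] - courses['start']).dt.days
```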
- plot_treatment_courses()[source]
Plot treatment course durations and injection counts per course.
- Returns:
Saves two plots to output directory:
1. 'treatment_course_durations.png' - Histogram of durations
2. 'injections_per_course.png' - Histogram of injection counts
- Return type:
None
Notes
Plot 1 (Durations):
- Shows distribution of treatment course durations in days
- Helps identify typical treatment persistence patterns
Plot 2 (Injections):
- Shows distribution of injection counts per course
- Uses discrete bins (1-20 injections)
- Helps identify typical treatment intensity
Automatically calls analyze_treatment_courses() if needed.
- export_interval_va_data(format='csv', db_path=None)[source]
Export interval and VA data to CSV and/or SQLite format.
- Parameters:
format (str, optional) – Output format (‘csv’, ‘sqlite’, or ‘both’). Default ‘csv’.
db_path (str, optional) – Custom path for SQLite database. Default uses ‘eylea_intervals.db’ in output directory.
- Returns:
Dictionary containing paths to exported files with keys:
- 'csv': Path to detailed CSV file
- 'summary_csv': Path to summary CSV file
- 'sqlite': Path to SQLite database (if exported)
- Return type:
dict
Notes
Exports two data types:
1. Detailed data (per-injection intervals and VA measurements)
2. Summary data (per-patient interval lists and VA changes)
CSV outputs:
- 'interval_va_data.csv': Detailed injection-level data
- 'interval_va_summary.csv': Patient-level summary
SQLite outputs:
- 'interval_va_data' table: Detailed data
- 'interval_summary' table: Summary data
Automatically calls analyze_injection_intervals() if needed.
Examples
>>> analyzer = EyleaDataAnalyzer('data.csv')
>>> paths = analyzer.export_interval_va_data(format='both')
>>> print(paths['csv'])  # Prints path to detailed CSV
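The dual CSV/SQLite export pattern can be sketched with `DataFrame.to_csv` and `DataFrame.to_sql`. This sketch uses an in-memory database and a tiny illustrative frame; the analyzer itself writes 'interval_va_data.csv' and 'eylea_intervals.db' into its output directory.

```python
import sqlite3
import pandas as pd

intervals = pd.DataFrame({
    'eye_key': ['p1_L', 'p1_L'],
    'interval_days': [56, 63],
    'va': [60.0, 62.0],
})

# CSV export (to_csv with no path returns the CSV text)
csv_text = intervals.to_csv(index=False)

# SQLite export via DataFrame.to_sql (in-memory here; the analyzer uses a .db file)
conn = sqlite3.connect(':memory:')
intervals.to_sql('interval_va_data', conn, if_exists='replace', index=False)
back = pd.read_sql('SELECT COUNT(*) AS n FROM interval_va_data', conn)
conn.close()
```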
- run_analysis()[source]
Execute complete analysis pipeline from data loading to export.
- Returns:
Dictionary with analysis summary containing:
- patient_count: Number of unique patients
- eye_count: Number of treated eyes
- injection_count: Total injections analyzed
- course_count: Number of treatment courses
- mean_injection_interval: Average interval between injections
- median_injection_interval: Median interval between injections
- output_dir: Path to output directory
- data_quality_report: Summary of data quality metrics
- export_paths: Paths to exported files
- Return type:
dict
Notes
Analysis steps:
1. Data loading and cleaning
2. Patient cohort analysis
3. Injection interval analysis
4. VA trajectory analysis
5. Treatment course analysis
6. Visualization generation
7. Data export
Examples
>>> analyzer = EyleaDataAnalyzer('data.csv')
>>> results = analyzer.run_analysis()
>>> print(f"Analyzed {results['patient_count']} patients")
- analysis.eylea_data_analysis.main()[source]
Command line interface for running Eylea data analysis.
- Returns:
Prints analysis summary to stdout
- Return type:
None
Notes
Command line arguments:
--data : Path to input CSV file (default: 'input_data/sample_raw.csv')
--output : Output directory (default: 'output')
--debug : Enable debug logging
--validation-strictness : Set validation level ('strict', 'moderate', 'lenient')
Example
python eylea_data_analysis.py --data treatment_data.csv --output results