Analysis Modules

Analysis of intravitreal Eylea injection treatment patterns and outcomes.

This module provides tools for analyzing real-world data from intravitreal Eylea injections to derive parameters for simulation models. The analysis includes:

  • Data loading, cleaning and validation

  • Patient cohort characterization

  • Injection interval analysis

  • Visual acuity trajectory analysis

  • Treatment course identification

  • Data visualization and export

Key Features

  • Robust data validation with flexible column name mapping

  • Comprehensive data quality reporting

  • Detailed analysis of treatment intervals and patterns

  • Visual acuity trajectory modeling

  • Automated visualization generation

  • Multiple export formats (CSV, SQLite)

Classes

EyleaDataAnalyzer : Main analysis class implementing the full analysis pipeline

Examples

>>> analyzer = EyleaDataAnalyzer('input_data.csv')
>>> results = analyzer.run_analysis()
>>> print(f"Analyzed {results['patient_count']} patients")
class analysis.eylea_data_analysis.EyleaDataAnalyzer(data_path, output_dir=None)[source]

Bases: object

Analyze Eylea treatment data to derive simulation parameters.

This class implements a comprehensive analysis pipeline for real-world Eylea treatment data, including data loading, cleaning, analysis, visualization and export.

Parameters:
  • data_path (str) – Path to CSV file containing Eylea treatment records

  • output_dir (str, optional) – Directory to save analysis outputs (default creates ‘output’ directory)

Variables:
  • data (pandas.DataFrame) – The loaded and processed treatment data

  • patient_data (pandas.DataFrame) – Patient-level analysis results

  • injection_intervals (pandas.DataFrame) – Injection interval analysis results

  • va_trajectories (pandas.DataFrame) – Visual acuity trajectory analysis results

  • treatment_courses (pandas.DataFrame) – Treatment course analysis results

  • data_quality_report (dict) – Comprehensive data quality assessment

Examples

>>> analyzer = EyleaDataAnalyzer('treatment_data.csv')
>>> analyzer.load_data()
>>> analyzer.analyze_injection_intervals()
>>> analyzer.plot_injection_intervals()
COLUMN_MAPPINGS = {'Age at Death': ['Age at Death', 'Death Age', 'Age When Deceased', 'Deceased Age'], 'Baseline CRT': ['Baseline CRT', 'BaselineCRT', 'Initial CRT', 'Starting CRT'], 'Baseline VA Letter Score': ['Baseline VA Letter Score', 'Baseline VA', 'BaselineVA', 'Initial VA', 'Starting VA'], 'CRT at Injection': ['CRT at Injection', 'CRT', 'Central Retinal Thickness'], 'Current Age': ['Current Age', 'Age', 'Patient Age'], 'Date of 1st Injection': ['Date of 1st Injection', 'First Injection Date', 'Initial Treatment Date', 'First Treatment Date'], 'Days Since Last Injection': ['Days Since Last Injection', 'Interval', 'Treatment Interval', 'Days_Since_Last', 'Injection Interval'], 'Deceased': ['Deceased', 'Death', 'Mortality'], 'Eye': ['Eye', 'Treated Eye'], 'Gender': ['Gender', 'Sex'], 'Injection Date': ['Injection Date', 'InjectionDate', 'Date of Injection', 'Treatment Date'], 'UUID': ['UUID', 'Patient ID', 'PatientID', 'Patient_ID', 'ID'], 'VA Letter Score at Injection': ['VA Letter Score at Injection', 'VA Score', 'ETDRS Score', 'Visual Acuity', 'VA_Score', 'Letter Score']}
DATA_VALIDATION = {'Age at Death': {'max': 120, 'min': 0, 'required': False, 'type': <class 'float'>}, 'Baseline CRT': {'max': 1000, 'min': 0, 'required': False, 'type': <class 'float'>}, 'Baseline VA Letter Score': {'max': 100, 'min': 0, 'required': False, 'type': <class 'float'>}, 'CRT at Injection': {'max': 1000, 'min': 0, 'required': False, 'type': <class 'float'>}, 'Current Age': {'max': 120, 'min': 0, 'required': False, 'type': <class 'float'>}, 'Date of 1st Injection': {'required': False, 'type': 'datetime'}, 'Days Since Last Injection': {'max': 365, 'min': 0, 'required': False, 'type': <class 'float'>}, 'Deceased': {'max': 1, 'min': 0, 'required': False, 'type': <class 'int'>}, 'Injection Date': {'required': True, 'type': 'datetime'}, 'UUID': {'required': True, 'type': <class 'str'>}, 'VA Letter Score at Injection': {'max': 100, 'min': 0, 'required': True, 'type': <class 'float'>}}
__init__(data_path, output_dir=None)[source]

Initialize analyzer with data path and output directory.

Parameters:
  • data_path (str) – Path to CSV file containing Eylea treatment records. Expected columns: - Patient identifiers (UUID or similar) - Injection dates - Visual acuity measurements - Other treatment parameters

  • output_dir (str, optional) – Directory to save analysis outputs. If None, creates ‘output’ directory in current working directory.

Notes

The analyzer is initialized but no data is loaded until load_data() is called. For a complete analysis pipeline, use run_analysis() which handles all steps.

load_data()[source]

Load, validate and clean Eylea treatment data.

Returns:

The loaded, validated and cleaned data

Return type:

pandas.DataFrame

Raises:
  • ValueError – If required columns are missing or data validation fails

  • IOError – If the data file cannot be read

Notes

Processing steps: 1. Load CSV file from data_path 2. Map column names to standardized format 3. Validate data types and ranges 4. Clean missing values and outliers 5. Generate comprehensive data quality report

Examples

>>> analyzer = EyleaDataAnalyzer('data.csv')
>>> data = analyzer.load_data()
>>> print(f"Loaded {len(data)} records")
map_column_names()[source]

Map variant column names to standardized names.

Returns:

Modifies the data attribute in-place with standardized column names

Return type:

None

Notes

Uses both exact and fuzzy matching to handle common column name variations. Mappings are defined in the COLUMN_MAPPINGS class attribute.

The mapping process: 1. Attempts exact matches for each standard column name 2. Falls back to fuzzy matching (case/space insensitive) 3. Preserves unmapped columns unchanged

Results are stored in: - column_mapping_used attribute - data_quality_report[‘column_mapping’]
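The mapping process above can be sketched roughly as follows. The normalisation rule and the trimmed COLUMN_MAPPINGS excerpt are illustrative, not the module's exact implementation.

```python
# Hypothetical sketch of the fuzzy column-mapping step: normalise names by
# lowercasing and stripping spaces/underscores, then look each variant up.
# COLUMN_MAPPINGS here is a trimmed excerpt of the class attribute.
COLUMN_MAPPINGS = {
    'UUID': ['UUID', 'Patient ID', 'PatientID', 'Patient_ID', 'ID'],
    'Injection Date': ['Injection Date', 'InjectionDate', 'Date of Injection'],
}

def normalise(name):
    """Case/space/underscore-insensitive key for fuzzy matching."""
    return name.lower().replace(' ', '').replace('_', '')

def map_columns(columns):
    """Return {original_name: standard_name}; unmapped columns map to themselves."""
    lookup = {normalise(v): std
              for std, variants in COLUMN_MAPPINGS.items()
              for v in variants}
    return {col: lookup.get(normalise(col), col) for col in columns}

print(map_columns(['patient id', 'InjectionDate', 'Notes']))
# {'patient id': 'UUID', 'InjectionDate': 'Injection Date', 'Notes': 'Notes'}
```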

validate_data_structure()[source]

Validate data structure, types and integrity.

Returns:

Modifies data in-place with validated/converted values

Return type:

None

Raises:

ValueError – If required columns are missing or critical validation fails

Notes

Validation checks: 1. Required columns (per DATA_VALIDATION) 2. Date format conversion 3. Numeric value ranges 4. Temporal sequence integrity 5. Deceased status consistency 6. Duplicate records

Stores results in data_quality_report including: - validation_errors: Critical issues - validation_warnings: Non-critical issues - temporal_anomalies: Sequence errors - outliers: Values outside expected ranges
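The range checks can be illustrated with a small sketch against DATA_VALIDATION-style rules (trimmed here); the exact error/warning structure in the module's report may differ.

```python
# Illustrative range validation. RULES mirrors the shape of the
# DATA_VALIDATION class attribute but contains only two entries.
RULES = {
    'VA Letter Score at Injection': {'min': 0, 'max': 100, 'required': True},
    'Days Since Last Injection': {'min': 0, 'max': 365, 'required': False},
}

def validate(record):
    """Return (errors, warnings) for one record dict."""
    errors, warnings = [], []
    for col, rule in RULES.items():
        value = record.get(col)
        if value is None:
            if rule['required']:
                errors.append(f'missing required column: {col}')
            continue
        if not (rule['min'] <= value <= rule['max']):
            # Out-of-range values are reported, not silently dropped
            warnings.append(f'{col}={value} outside [{rule["min"]}, {rule["max"]}]')
    return errors, warnings

print(validate({'VA Letter Score at Injection': 130}))
```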

clean_data()[source]

Clean and preprocess the Eylea treatment data.

Returns:

Modifies the data attribute in-place with cleaned/preprocessed values

Return type:

None

Notes

Performs the following cleaning operations: 1. Handles missing values in critical fields 2. Cleans Visual Acuity measurements (clipping, outlier detection) 3. Handles temporal anomalies (out-of-sequence dates, long gaps) 4. Creates unique patient and eye identifiers 5. Calculates derived fields (adjusted age, days since last injection)

Results are tracked in: - data_quality_report[‘missing_values’] - data_quality_report[‘outliers’] - data_quality_report[‘temporal_anomalies’]

handle_missing_values()[source]

Handle missing values in the dataset.

Returns:

Modifies data in-place with imputed values where appropriate

Return type:

None

Notes

Missing value handling strategies: 1. Baseline VA: Uses first available VA measurement if missing 2. Age data: Different handling for deceased vs living patients 3. Current age: Adds 0.5 years to account for temporal alignment 4. Injection intervals: Calculates from dates if missing

Tracks missing values in: - data_quality_report[‘missing_values_before’] - data_quality_report[‘missing_values_after’]
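Strategy 1 (baseline VA from the first available measurement) might look like this in pandas; the column names are assumptions for illustration.

```python
import pandas as pd

# Sketch: fill a missing baseline VA with the first available VA
# measurement for that eye. Column names are illustrative.
df = pd.DataFrame({
    'eye_key': ['p1_L', 'p1_L', 'p2_R', 'p2_R'],
    'baseline_va': [None, None, 60.0, 60.0],
    'va_score': [55.0, 58.0, 60.0, 62.0],
})
first_va = df.groupby('eye_key')['va_score'].transform('first')
df['baseline_va'] = df['baseline_va'].fillna(first_va)
print(df['baseline_va'].tolist())  # [55.0, 55.0, 60.0, 60.0]
```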

clean_va_measurements()[source]

Clean and validate Visual Acuity measurements.

Returns:

Modifies VA measurements in-place with cleaned values

Return type:

None

Notes

Cleaning steps: 1. Clips VA values to valid range [0, 100] 2. Identifies implausible changes (>30 letters between consecutive measurements)

Tracks cleaning results in: - data_quality_report[‘va_outliers_before’] - data_quality_report[‘va_implausible_changes’]

Saves details of implausible changes to: - output/implausible_va_changes.csv
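A minimal sketch of the two cleaning steps, assuming per-eye sorted data and illustrative column names:

```python
import pandas as pd

# Step 1: clip VA to the valid letter-score range [0, 100].
# Step 2: flag consecutive changes of more than 30 letters within an eye.
df = pd.DataFrame({
    'eye_key': ['e1', 'e1', 'e1'],
    'va_score': [105.0, 70.0, 35.0],
})
df['va_score'] = df['va_score'].clip(0, 100)
change = df.groupby('eye_key')['va_score'].diff().abs()
df['implausible_change'] = change > 30
print(df['implausible_change'].tolist())  # [False, False, True]
```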

handle_temporal_anomalies()[source]

Handle temporal anomalies in the data.

Returns:

Modifies data in-place with corrected temporal sequences

Return type:

None

Notes

Handles these temporal anomalies: 1. Out-of-sequence injection dates (fixes by sorting) 2. Long treatment gaps (>180 days) 3. Single injection patients

Tracks anomalies in: - data_quality_report[‘single_injection_patients’] - data_quality_report[‘sequence_fixes’] - data_quality_report[‘long_treatment_gaps’]

Saves details of sequence fixes to: - output/sequence_fixes.csv
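Anomalies 1 and 2 can be sketched as a per-eye sort followed by a gap flag; column names are illustrative:

```python
import pandas as pd

# Sort injections within each eye by date (fixing out-of-sequence
# records), then flag gaps longer than 180 days.
df = pd.DataFrame({
    'eye_key': ['e1', 'e1', 'e1'],
    'injection_date': pd.to_datetime(['2020-03-01', '2020-01-01', '2020-09-01']),
})
df = df.sort_values(['eye_key', 'injection_date']).reset_index(drop=True)
gap = df.groupby('eye_key')['injection_date'].diff().dt.days
df['long_gap'] = gap > 180
print(df['long_gap'].tolist())  # [False, False, True]
```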

create_patient_id()[source]

Create unique patient and eye identifiers.

Returns:

Modifies data in-place by adding: - patient_id - eye_key - eye_standardized

Return type:

None

Notes

Identifier creation logic: 1. Uses existing UUID if available 2. Creates composite ID from available fields if UUID missing 3. Creates eye-specific key (patient_id + eye) 4. Standardizes eye values (uppercase, no spaces)

Finally sorts data by eye_key and injection date.
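Steps 3 and 4 might be sketched like this (the composite-ID fallback of step 2 is omitted; column names are assumptions):

```python
import pandas as pd

# Build an eye-specific key from patient_id plus a standardised eye
# value (uppercase, spaces removed).
df = pd.DataFrame({
    'patient_id': ['p1', 'p1'],
    'Eye': ['left eye', 'Right Eye'],
})
df['eye_standardized'] = df['Eye'].str.upper().str.replace(' ', '', regex=False)
df['eye_key'] = df['patient_id'] + '_' + df['eye_standardized']
print(df['eye_key'].tolist())  # ['p1_LEFTEYE', 'p1_RIGHTEYE']
```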

generate_data_quality_report()[source]

Generate a comprehensive data quality report.

This method calculates various data quality metrics and saves them to a text file in the output directory.

Returns:

The data quality report as a dictionary

Return type:

dict

Notes

The report includes: 1. Summary metrics (rows, columns, missing data percentage) 2. Column mapping information 3. Validation errors and warnings 4. Missing values by column 5. Age data processing details 6. Temporal anomalies 7. VA measurement anomalies

The report is saved to ‘data_quality_report.txt’ in the output directory.

analyze_patient_cohort()[source]

Analyze patient cohort demographics and treatment characteristics.

Returns:

DataFrame with one row per patient containing: - Demographics (age, gender) - Eye information - Baseline measurements (VA, CRT) - Treatment information (injection count, dates) - Mortality information (deceased status, age at death)

Return type:

pandas.DataFrame

Notes

Key processing steps: 1. Groups data by patient_id 2. Extracts first row for each patient to get baseline characteristics 3. Calculates treatment duration from first to last injection 4. Handles missing values in baseline measurements

Examples

>>> analyzer = EyleaDataAnalyzer('data.csv')
>>> patient_data = analyzer.analyze_patient_cohort()
>>> print(patient_data[['patient_id', 'injection_count']].head())
analyze_injection_intervals()[source]

Analyze time intervals between consecutive injections by eye.

Returns:

DataFrame with interval information containing: - Patient and eye identifiers - Injection sequence numbers - Dates of consecutive injections - Interval in days between injections - VA measurements at each injection - Flags for long (>180d) and very long (>365d) gaps

Return type:

pandas.DataFrame

Notes

Processing steps: 1. Groups data by eye_key (patient + eye) 2. Sorts injections by date 3. Calculates days between consecutive injections 4. Flags clinically significant gaps 5. Tracks VA changes between injections

Examples

>>> analyzer = EyleaDataAnalyzer('data.csv')
>>> intervals = analyzer.analyze_injection_intervals()
>>> print(intervals[['eye_key', 'interval_days']].describe())
analyze_va_trajectories()[source]

Analyze visual acuity trajectories over time by eye.

Returns:

DataFrame with VA trajectory information containing: - Patient and eye identifiers - Injection sequence numbers - Days from first injection - VA score at each injection - Baseline VA - VA change from baseline

Return type:

pandas.DataFrame

Notes

Processing steps: 1. Groups data by eye_key (patient + eye) 2. Uses first available VA as baseline if missing 3. Calculates days from first injection 4. Computes VA change from baseline 5. Applies smoothing for population average

Examples

>>> analyzer = EyleaDataAnalyzer('data.csv')
>>> va_traj = analyzer.analyze_va_trajectories()
>>> print(va_traj[['eye_key', 'va_change']].describe())
plot_injection_intervals()[source]

Plot distribution of injection intervals and intervals by sequence.

Returns:

Saves two plots to output directory: 1. ‘injection_intervals.png’ - Histogram of intervals with reference lines 2. ‘injection_intervals_by_sequence.png’ - Mean/median intervals by sequence

Return type:

None

Notes

Plot 1 (Histogram):
  • Shows distribution of all injection intervals

  • Includes reference lines at:
    • 28 days (monthly)

    • 56 days (bi-monthly)

    • 84 days (quarterly)

Plot 2 (Sequence): - Shows mean ± SD and median intervals by injection number - Helps identify interval patterns over treatment course

Automatically calls analyze_injection_intervals() if needed.

plot_va_trajectories()[source]

Plot visual acuity trajectories over time and by injection number.

Returns:

Saves two plots to output directory: 1. ‘va_trajectories.png’ - Individual trajectories + population average 2. ‘va_by_injection_number.png’ - Mean VA by injection number

Return type:

None

Notes

Plot 1 (Trajectories): - Shows VA over time for sample of 20 eyes - Includes LOESS-smoothed population average line - Falls back to simple average if statsmodels not available

Plot 2 (Injection Number): - Shows mean ± SD VA by injection sequence - Includes sample size annotations - Helps identify VA patterns over treatment course

Automatically calls analyze_va_trajectories() if needed.

plot_va_change_distribution()[source]

Plot distribution of VA changes from baseline and outcome categories.

Returns:

Saves two plots to output directory: 1. ‘va_change_distribution.png’ - Histogram of VA changes 2. ‘va_outcome_categories.png’ - Categorical outcomes

Return type:

None

Notes

Plot 1 (Histogram):
  • Shows distribution of final VA changes from baseline

  • Includes reference lines at:
    • 0 (no change)

    • ±5 letters (gain/loss)

    • ±15 letters (significant gain/loss)

Plot 2 (Categories):
  • Groups outcomes into clinically relevant categories

  • Shows counts and percentages for each category

  • Categories:
    • ≥15 letter gain

    • 5-14 letter gain

    • Stable (-4 to +4)

    • 5-14 letter loss

    • ≥15 letter loss

    • Unknown

Automatically calls analyze_va_trajectories() if needed.

analyze_treatment_courses()[source]

Analyze treatment courses by identifying potential breaks.

Returns:

DataFrame with treatment course information containing: - Patient and eye identifiers - Course start/end dates - Duration in days - Injection count - Flags for long pauses (>365d) - Potential separate courses

Return type:

pandas.DataFrame

Notes

Key processing steps: 1. Groups data by eye_key (patient + eye) 2. Identifies very long gaps (>365d) as potential course breaks 3. Calculates duration from first to last injection 4. Tracks injection counts per course

Examples

>>> analyzer = EyleaDataAnalyzer('data.csv')
>>> courses = analyzer.analyze_treatment_courses()
>>> print(courses[['eye_key', 'duration_days']].describe())
plot_treatment_courses()[source]

Plot treatment course durations and injection counts per course.

Returns:

Saves two plots to output directory: 1. ‘treatment_course_durations.png’ - Histogram of durations 2. ‘injections_per_course.png’ - Histogram of injection counts

Return type:

None

Notes

Plot 1 (Durations): - Shows distribution of treatment course durations in days - Helps identify typical treatment persistence patterns

Plot 2 (Injections): - Shows distribution of injection counts per course - Uses discrete bins (1-20 injections) - Helps identify typical treatment intensity

Automatically calls analyze_treatment_courses() if needed.

export_interval_va_data(format='csv', db_path=None)[source]

Export interval and VA data to CSV and/or SQLite format.

Parameters:
  • format (str, optional) – Output format (‘csv’, ‘sqlite’, or ‘both’). Default ‘csv’.

  • db_path (str, optional) – Custom path for SQLite database. Default uses ‘eylea_intervals.db’ in output directory.

Returns:

Dictionary containing paths to exported files with keys: - ‘csv’: Path to detailed CSV file - ‘summary_csv’: Path to summary CSV file - ‘sqlite’: Path to SQLite database (if exported)

Return type:

dict

Notes

Exports two data types: 1. Detailed data (per-injection intervals and VA measurements) 2. Summary data (per-patient interval lists and VA changes)

CSV outputs: - ‘interval_va_data.csv’: Detailed injection-level data - ‘interval_va_summary.csv’: Patient-level summary

SQLite outputs: - ‘interval_va_data’ table: Detailed data - ‘interval_summary’ table: Summary data

Automatically calls analyze_injection_intervals() if needed.

Examples

>>> analyzer = EyleaDataAnalyzer('data.csv')
>>> paths = analyzer.export_interval_va_data(format='both')
>>> print(paths['csv'])  # Prints path to detailed CSV
run_analysis()[source]

Execute complete analysis pipeline from data loading to export.

Returns:

Dictionary with analysis summary containing: - patient_count: Number of unique patients - eye_count: Number of treated eyes - injection_count: Total injections analyzed - course_count: Number of treatment courses - mean_injection_interval: Average interval between injections - median_injection_interval: Median interval between injections - output_dir: Path to output directory - data_quality_report: Summary of data quality metrics - export_paths: Paths to exported files

Return type:

dict

Notes

Analysis steps: 1. Data loading and cleaning 2. Patient cohort analysis 3. Injection interval analysis 4. VA trajectory analysis 5. Treatment course analysis 6. Visualization generation 7. Data export

Examples

>>> analyzer = EyleaDataAnalyzer('data.csv')
>>> results = analyzer.run_analysis()
>>> print(f"Analyzed {results['patient_count']} patients")
analysis.eylea_data_analysis.main()[source]

Command line interface for running Eylea data analysis.

Returns:

Prints analysis summary to stdout

Return type:

None

Notes

Command line arguments:
  • --data : Path to input CSV file (default: ‘input_data/sample_raw.csv’)

  • --output : Output directory (default: ‘output’)

  • --debug : Enable debug logging

  • --validation-strictness : Set validation level (‘strict’, ‘moderate’, ‘lenient’)

Example

python eylea_data_analysis.py --data treatment_data.csv --output results

Eylea Injection Intervals Analysis

This script analyzes the injection intervals data from the SQLite database to identify patterns in treatment, specifically looking for two groups: - Group LH: 7 injections in first year, then continuing with injections every ~2 months - Group MR: 7 injections in first year, then a pause before resumption of treatment

The script also performs Principal Component Analysis (PCA) to identify patterns in treatment intervals and visual acuity measures (previous VA, current VA, next VA).

analysis.eylea_intervals_analysis.connect_to_db()[source]

Connect to the SQLite database.

Return type:

Connection

analysis.eylea_intervals_analysis.load_interval_data()[source]

Load the interval_va_data table into a Polars DataFrame.

Return type:

DataFrame

analysis.eylea_intervals_analysis.load_interval_summary()[source]

Load the interval_summary table into a Polars DataFrame.

Return type:

DataFrame

analysis.eylea_intervals_analysis.analyze_first_year_injections(df)[source]

Analyze the first year of injections for each patient.

Parameters:

df (DataFrame) – DataFrame with interval_va_data

Returns:

DataFrame with first year injection analysis

Return type:

DataFrame

analysis.eylea_intervals_analysis.identify_treatment_groups(df)[source]

Identify the two treatment groups: - Group LH: 7 injections in first year, then continuing with injections every ~2 months - Group MR: 7 injections in first year, then a pause before resumption of treatment

Parameters:

df (DataFrame) – DataFrame with first year injection analysis

Returns:

DataFrame with group assignments

Return type:

DataFrame
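A hypothetical sketch of the rule: both groups reach ~7 injections in year one, and the presence of a long gap after year one separates MR from LH. The 180-day threshold and the feature names are illustrative assumptions, not the module's values.

```python
import pandas as pd

# Illustrative group assignment. Column names and the 180-day pause
# threshold are assumptions for the sketch.
df = pd.DataFrame({
    'eye_key': ['e1', 'e2'],
    'first_year_injections': [7, 7],
    'max_interval_after_year1': [70, 300],  # days
})

def assign_group(row):
    if row['first_year_injections'] >= 7:
        # A long maximum gap after year one suggests a pause (Group MR);
        # otherwise treatment continued at ~2-month intervals (Group LH).
        return 'MR' if row['max_interval_after_year1'] > 180 else 'LH'
    return 'other'

df['group'] = df.apply(assign_group, axis=1)
print(df['group'].tolist())  # ['LH', 'MR']
```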

analysis.eylea_intervals_analysis.cluster_treatment_patterns(df)[source]

Use K-means clustering to identify treatment pattern groups.

Parameters:

df (DataFrame) – DataFrame with first year injection analysis

Returns:

DataFrame with cluster assignments

Return type:

DataFrame
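The clustering step can be illustrated with a minimal Lloyd's-algorithm K-means on two assumed features (first-year injection count and mean interval); the module presumably uses a library implementation.

```python
import numpy as np

# Minimal K-means (Lloyd's algorithm) sketch. Features and data are
# illustrative, not taken from the module.
rng = np.random.default_rng(0)
X = np.array([[7, 60.0], [7, 62.0], [3, 150.0], [4, 160.0]])
k = 2
centers = X[rng.choice(len(X), k, replace=False)]
for _ in range(10):
    # assign each point to its nearest centre, then recompute centres
    labels = np.argmin(((X[:, None] - centers) ** 2).sum(axis=2), axis=1)
    centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
print(labels)
```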

analysis.eylea_intervals_analysis.analyze_intervals_by_group(df, interval_data)[source]

Analyze and visualize injection intervals by treatment group.

Parameters:
  • df (DataFrame) – DataFrame with group assignments

  • interval_data (DataFrame) – Raw interval data

Return type:

None

analysis.eylea_intervals_analysis.prepare_va_interval_data_for_pca(interval_data)[source]

Prepare data for PCA analysis by calculating next VA for each record.

This function processes the interval data to create a dataset with: - treatment_interval (interval_days) - previous_va (prev_va) - current_va - next_va (calculated by joining with next record)

Parameters:

interval_data (DataFrame) – Raw interval data from the database

Returns:

DataFrame with prepared features for PCA analysis

Return type:

DataFrame
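The next-VA construction is essentially a per-eye shift: each record's next VA is the current VA of the following record. The module works in Polars; the equivalent pandas operation is shown here for illustration.

```python
import pandas as pd

# Within each eye, shift current_va up by one row to obtain next_va.
# Column names follow the description above; exact names may differ.
df = pd.DataFrame({
    'eye': ['L', 'L', 'L'],
    'interval_days': [56, 63, 70],
    'prev_va': [50.0, 52.0, 55.0],
    'current_va': [52.0, 55.0, 54.0],
})
df['next_va'] = df.groupby('eye')['current_va'].shift(-1)
print(df['next_va'].tolist())  # last record has no next visit -> NaN
```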

analysis.eylea_intervals_analysis.perform_va_interval_pca(interval_data)[source]

Perform PCA analysis on treatment intervals and visual acuity measures.

This function identifies patterns in: - Treatment interval - Previous VA - Current VA - Next VA

Parameters:

interval_data (DataFrame) – Raw interval data from the database

Return type:

None
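The PCA itself amounts to standardising the four features and taking singular vectors; a numpy sketch follows (the module may use a library PCA, and the data here are invented for illustration).

```python
import numpy as np

# Columns: treatment interval, previous VA, current VA, next VA.
X = np.array([
    [56, 50.0, 52.0, 55.0],
    [63, 52.0, 55.0, 54.0],
    [120, 60.0, 48.0, 40.0],
    [84, 55.0, 53.0, 52.0],
])
Z = (X - X.mean(axis=0)) / X.std(axis=0)          # standardise columns
U, S, Vt = np.linalg.svd(Z, full_matrices=False)  # rows of Vt = components
explained = S**2 / (S**2).sum()                   # variance ratio per PC
scores = Z @ Vt.T                                 # project onto components
print(explained.round(2))
```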

analysis.eylea_intervals_analysis.analyze_va_by_group(df, interval_data)[source]

Analyze and visualize visual acuity by treatment group.

Parameters:
  • df (DataFrame) – DataFrame with group assignments

  • interval_data (DataFrame) – Raw interval data

Return type:

None

analysis.eylea_intervals_analysis.main()[source]

Main analysis function.

Visualize Visual Acuity Trajectories by PCA Cluster.

This module creates visualizations of visual acuity (VA) trajectories over time for patients grouped by clusters identified through PCA analysis.

The visualizations include: - Individual VA trajectories for sampled patients from each cluster - Average VA trajectories with standard deviation bands - VA change from baseline plots

Notes

The analysis expects: 1. A SQLite database with interval VA data 2. A CSV file with cluster assignments from PCA analysis 3. Output directory for saving plots

Examples

$ python visualize_va_by_pca_cluster.py

Generates plots in the output/analysis_results directory:
- va_trajectories_by_pca_cluster.png
- va_change_by_pca_cluster.png
analysis.visualize_va_by_pca_cluster.connect_to_db()[source]

Establish connection to the SQLite database containing VA data.

Returns:

Active database connection object

Return type:

sqlite3.Connection

Notes

The database path is defined by the DB_PATH constant. Connection should be closed by the caller when done.

analysis.visualize_va_by_pca_cluster.load_interval_data()[source]

Load interval VA data from SQLite database into Polars DataFrame.

Returns:

DataFrame containing: - uuid: Patient identifier - eye: Eye (left/right) - previous_date: Previous visit date - current_date: Current visit date - prev_va: Previous visual acuity - current_va: Current visual acuity - interval_days: Days between visits

Return type:

pl.DataFrame

Notes

  • Automatically converts date strings to datetime objects

  • Closes database connection when done

analysis.visualize_va_by_pca_cluster.load_cluster_assignments()[source]

Load patient cluster assignments from PCA analysis results.

Returns:

DataFrame containing: - uuid: Patient identifier - eye: Eye (left/right) - cluster: PCA cluster assignment (0-3)

Return type:

pl.DataFrame

Notes

  • Reads from output/analysis_results/va_interval_clusters_4.csv

  • Cluster meanings:
    • 0: Moderate VA, Moderate Interval

    • 1: Low VA, Moderate Interval

    • 2: High VA, Short Interval

    • 3: Long Gap Patients

analysis.visualize_va_by_pca_cluster.visualize_va_trajectories_by_cluster(interval_data, cluster_df)[source]

Generate visualizations of VA trajectories grouped by PCA clusters.

Creates two plots: 1. Individual VA trajectories with cluster averages 2. VA change from baseline with standard deviation bands

Parameters:
  • interval_data (pl.DataFrame) – DataFrame containing interval VA data with columns: - uuid: Patient identifier - eye: Eye (left/right) - previous_date: Previous visit date - current_date: Current visit date - prev_va: Previous visual acuity - current_va: Current visual acuity - interval_days: Days between visits

  • cluster_df (pl.DataFrame) – DataFrame containing cluster assignments with columns: - uuid: Patient identifier - eye: Eye (left/right) - cluster: PCA cluster assignment (0-3)

Returns:

Saves plots to output/analysis_results directory: - va_trajectories_by_pca_cluster.png - va_change_by_pca_cluster.png

Return type:

None

Notes

  • Samples up to 10 patients per cluster for individual trajectories

  • Uses 60-day bins for calculating averages

  • Limits visualization to first 1000 days for clarity
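The 60-day binning over the first 1000 days can be sketched as follows (plotting omitted; column names assumed):

```python
import pandas as pd

# Assign each visit to a 60-day bin, then average VA per (cluster, bin).
df = pd.DataFrame({
    'cluster': [0, 0, 0, 1],
    'days_from_first': [10, 70, 130, 30],
    'current_va': [50.0, 55.0, 60.0, 40.0],
})
df = df[df['days_from_first'] <= 1000]      # limit to first 1000 days
df['bin'] = (df['days_from_first'] // 60) * 60
avg = df.groupby(['cluster', 'bin'])['current_va'].mean()
print(avg.loc[(0, 60)])  # 55.0
```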

analysis.visualize_va_by_pca_cluster.main()[source]

Main function.

Visualize Long Gap Patients (Cluster 4)

This script creates visualizations specifically focused on the long-gap patients (Cluster 4) identified in the PCA analysis, to better illustrate what happens to visual acuity before, during, and after the long treatment gaps.

analysis.visualize_long_gap_patients.connect_to_db()[source]

Connect to the SQLite database.

Return type:

Connection

analysis.visualize_long_gap_patients.load_interval_data()[source]

Load the interval_va_data table into a Polars DataFrame.

Return type:

DataFrame

analysis.visualize_long_gap_patients.load_cluster_assignments()[source]

Load the cluster assignments from the PCA analysis.

Return type:

DataFrame

analysis.visualize_long_gap_patients.get_long_gap_patients(interval_data, cluster_df)[source]

Identify and prepare data for long-gap patients (Cluster 4).

Parameters:
  • interval_data (DataFrame) – Raw interval data

  • cluster_df (DataFrame) – DataFrame with cluster assignments

Returns:

DataFrame with long-gap patient data

Return type:

DataFrame

analysis.visualize_long_gap_patients.visualize_long_gap_patients(long_gap_data)[source]

Create visualizations focused on long-gap patients.

Parameters:

long_gap_data (DataFrame) – DataFrame with long-gap patient data

Return type:

None

analysis.visualize_long_gap_patients.main()[source]

Main function.

Visualize VA Change by Interval with PCA Cluster Information

This script creates an enhanced visualization of VA change by interval length, using different marker shapes to indicate which PCA cluster each data point belongs to.

analysis.visualize_va_change_by_cluster.connect_to_db()[source]

Connect to the SQLite database.

Return type:

Connection

analysis.visualize_va_change_by_cluster.load_interval_data()[source]

Load the interval_va_data table into a Polars DataFrame.

Return type:

DataFrame

analysis.visualize_va_change_by_cluster.load_cluster_assignments()[source]

Load the cluster assignments from the PCA analysis.

Return type:

DataFrame

analysis.visualize_va_change_by_cluster.visualize_va_change_by_cluster(interval_data, cluster_df)[source]

Create an enhanced visualization of VA change by interval with cluster information.

Parameters:
  • interval_data (DataFrame) – Raw interval data

  • cluster_df (DataFrame) – DataFrame with cluster assignments

Return type:

None

analysis.visualize_va_change_by_cluster.main()[source]

Main function.