Analysis Modules
Analysis of intravitreal Eylea injection treatment patterns and outcomes.
This module provides tools for analyzing real-world data from intravitreal Eylea injections to derive parameters for simulation models. The analysis includes:
Data loading, cleaning and validation
Patient cohort characterization
Injection interval analysis
Visual acuity trajectory analysis
Treatment course identification
Data visualization and export
Key Features
Robust data validation with flexible column name mapping
Comprehensive data quality reporting
Detailed analysis of treatment intervals and patterns
Visual acuity trajectory modeling
Automated visualization generation
Multiple export formats (CSV, SQLite)
Classes
EyleaDataAnalyzer : Main analysis class implementing the full analysis pipeline
Examples
>>> analyzer = EyleaDataAnalyzer('input_data.csv')
>>> results = analyzer.run_analysis()
>>> print(f"Analyzed {results['patient_count']} patients")
- class analysis.eylea_data_analysis.EyleaDataAnalyzer(data_path, output_dir=None)[source]
Bases: object
Analyze Eylea treatment data to derive simulation parameters.
This class implements a comprehensive analysis pipeline for real-world Eylea treatment data, including data loading, cleaning, analysis, visualization and export.
- Parameters:
data_path (str) – Path to CSV file containing Eylea treatment records
output_dir (str, optional) – Directory to save analysis outputs. If omitted, an ‘output’ directory is created.
- Variables:
data (pandas.DataFrame) – The loaded and processed treatment data
patient_data (pandas.DataFrame) – Patient-level analysis results
injection_intervals (pandas.DataFrame) – Injection interval analysis results
va_trajectories (pandas.DataFrame) – Visual acuity trajectory analysis results
treatment_courses (pandas.DataFrame) – Treatment course analysis results
data_quality_report (dict) – Comprehensive data quality assessment
Examples
>>> analyzer = EyleaDataAnalyzer('treatment_data.csv')
>>> analyzer.load_data()
>>> analyzer.analyze_injection_intervals()
>>> analyzer.plot_injection_intervals()
- COLUMN_MAPPINGS = {'Age at Death': ['Age at Death', 'Death Age', 'Age When Deceased', 'Deceased Age'], 'Baseline CRT': ['Baseline CRT', 'BaselineCRT', 'Initial CRT', 'Starting CRT'], 'Baseline VA Letter Score': ['Baseline VA Letter Score', 'Baseline VA', 'BaselineVA', 'Initial VA', 'Starting VA'], 'CRT at Injection': ['CRT at Injection', 'CRT', 'Central Retinal Thickness'], 'Current Age': ['Current Age', 'Age', 'Patient Age'], 'Date of 1st Injection': ['Date of 1st Injection', 'First Injection Date', 'Initial Treatment Date', 'First Treatment Date'], 'Days Since Last Injection': ['Days Since Last Injection', 'Interval', 'Treatment Interval', 'Days_Since_Last', 'Injection Interval'], 'Deceased': ['Deceased', 'Death', 'Mortality'], 'Eye': ['Eye', 'Treated Eye'], 'Gender': ['Gender', 'Sex'], 'Injection Date': ['Injection Date', 'InjectionDate', 'Date of Injection', 'Treatment Date'], 'UUID': ['UUID', 'Patient ID', 'PatientID', 'Patient_ID', 'ID'], 'VA Letter Score at Injection': ['VA Letter Score at Injection', 'VA Score', 'ETDRS Score', 'Visual Acuity', 'VA_Score', 'Letter Score']}
- DATA_VALIDATION = {'Age at Death': {'max': 120, 'min': 0, 'required': False, 'type': <class 'float'>}, 'Baseline CRT': {'max': 1000, 'min': 0, 'required': False, 'type': <class 'float'>}, 'Baseline VA Letter Score': {'max': 100, 'min': 0, 'required': False, 'type': <class 'float'>}, 'CRT at Injection': {'max': 1000, 'min': 0, 'required': False, 'type': <class 'float'>}, 'Current Age': {'max': 120, 'min': 0, 'required': False, 'type': <class 'float'>}, 'Date of 1st Injection': {'required': False, 'type': 'datetime'}, 'Days Since Last Injection': {'max': 365, 'min': 0, 'required': False, 'type': <class 'float'>}, 'Deceased': {'max': 1, 'min': 0, 'required': False, 'type': <class 'int'>}, 'Injection Date': {'required': True, 'type': 'datetime'}, 'UUID': {'required': True, 'type': <class 'str'>}, 'VA Letter Score at Injection': {'max': 100, 'min': 0, 'required': True, 'type': <class 'float'>}}
- __init__(data_path, output_dir=None)[source]
Initialize analyzer with data path and output directory.
- Parameters:
data_path (str) – Path to CSV file containing Eylea treatment records. Expected columns:
- Patient identifiers (UUID or similar)
- Injection dates
- Visual acuity measurements
- Other treatment parameters
output_dir (str, optional) – Directory to save analysis outputs. If None, creates an ‘output’ directory in the current working directory.
Notes
The analyzer is initialized but no data is loaded until load_data() is called. For a complete analysis pipeline, use run_analysis() which handles all steps.
- load_data()[source]
Load, validate and clean Eylea treatment data.
- Returns:
The loaded, validated and cleaned data
- Return type:
pandas.DataFrame
- Raises:
ValueError – If required columns are missing or data validation fails
IOError – If the data file cannot be read
Notes
Processing steps:
1. Load CSV file from data_path
2. Map column names to standardized format
3. Validate data types and ranges
4. Clean missing values and outliers
5. Generate comprehensive data quality report
Examples
>>> analyzer = EyleaDataAnalyzer('data.csv')
>>> data = analyzer.load_data()
>>> print(f"Loaded {len(data)} records")
- map_column_names()[source]
Map variant column names to standardized names.
- Returns:
Modifies the data attribute in-place with standardized column names
- Return type:
None
Notes
Uses both exact and fuzzy matching to handle common column name variations. Mappings are defined in the COLUMN_MAPPINGS class attribute.
The mapping process:
1. Attempts exact matches for each standard column name
2. Falls back to fuzzy matching (case/space insensitive)
3. Preserves unmapped columns unchanged
Results are stored in:
- column_mapping_used attribute
- data_quality_report[‘column_mapping’]
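The two-pass lookup described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the class's actual code: `map_columns`, `_normalize`, and the three-entry mapping table are hypothetical stand-ins, and the sketch works on a list of column names rather than on a DataFrame.

```python
# Hypothetical subset of COLUMN_MAPPINGS, for illustration only.
COLUMN_MAPPINGS = {
    "UUID": ["UUID", "Patient ID", "PatientID", "Patient_ID", "ID"],
    "Injection Date": ["Injection Date", "InjectionDate",
                       "Date of Injection", "Treatment Date"],
    "Eye": ["Eye", "Treated Eye"],
}


def _normalize(name):
    """Lower-case a name and drop spaces (the 'fuzzy' comparison key)."""
    return name.lower().replace(" ", "")


def map_columns(columns, mappings=COLUMN_MAPPINGS):
    """Return {original_name: standard_name} for every column that matches."""
    used = {}
    for standard, variants in mappings.items():
        # Pass 1: exact match against any listed variant.
        match = next((c for c in columns if c in variants), None)
        if match is None:
            # Pass 2: case- and space-insensitive match.
            keys = {_normalize(v) for v in variants}
            match = next((c for c in columns if _normalize(c) in keys), None)
        if match is not None:
            used[match] = standard
    return used


raw_columns = ["patient id", "InjectionDate", "Notes"]
mapping = map_columns(raw_columns)
# Columns without a mapping pass through unchanged.
renamed = [mapping.get(c, c) for c in raw_columns]
print(renamed)
```

Here `patient id` is only found by the second pass, `InjectionDate` by the first, and `Notes` is left as it is.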
- validate_data_structure()[source]
Validate data structure, types and integrity.
- Returns:
Modifies data in-place with validated/converted values
- Return type:
None
- Raises:
ValueError – If required columns are missing or critical validation fails
Notes
Validation checks:
1. Required columns (per DATA_VALIDATION)
2. Date format conversion
3. Numeric value ranges
4. Temporal sequence integrity
5. Deceased status consistency
6. Duplicate records
Stores results in data_quality_report including:
- validation_errors: Critical issues
- validation_warnings: Non-critical issues
- temporal_anomalies: Sequence errors
- outliers: Values outside expected ranges
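As a rough sketch of the first and third checks (required columns and numeric ranges), the following plain-Python function applies rules shaped like the DATA_VALIDATION entries to a list of records. The names `RULES` and `validate` are hypothetical, only three rules are shown, and the real method operates on a DataFrame and performs further checks that are not reproduced here.

```python
# Hypothetical subset of DATA_VALIDATION-style rules.
RULES = {
    "UUID": {"required": True},
    "VA Letter Score at Injection": {"required": True, "min": 0, "max": 100},
    "Days Since Last Injection": {"required": False, "min": 0, "max": 365},
}


def validate(rows, rules=RULES):
    """Return (errors, outliers) for a list of dict records."""
    errors, outliers = [], []
    # Every column name that appears in at least one record.
    present = set().union(*(row.keys() for row in rows))
    for column, rule in rules.items():
        if column not in present:
            if rule.get("required"):
                errors.append(f"missing required column: {column}")
            continue
        if "min" not in rule:
            continue  # no numeric range defined for this column
        for index, row in enumerate(rows):
            value = row.get(column)
            if value is not None and not rule["min"] <= value <= rule["max"]:
                outliers.append((index, column, value))
    return errors, outliers


rows = [
    {"UUID": "a", "VA Letter Score at Injection": 70,
     "Days Since Last Injection": 56},
    {"UUID": "a", "VA Letter Score at Injection": 104,
     "Days Since Last Injection": 400},
]
errors, outliers = validate(rows)
missing_errors, _ = validate([{"UUID": "b"}])
```

A missing required column is reported as an error, while an out-of-range value is only recorded as an outlier; that split mirrors the critical/non-critical distinction above but is an assumption about how the two are separated.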
- clean_data()[source]
Clean and preprocess the Eylea treatment data.
- Returns:
Modifies the data attribute in-place with cleaned/preprocessed values
- Return type:
None
Notes
Performs the following cleaning operations:
1. Handles missing values in critical fields
2. Cleans Visual Acuity measurements (clipping, outlier detection)
3. Handles temporal anomalies (out-of-sequence dates, long gaps)
4. Creates unique patient and eye identifiers
5. Calculates derived fields (adjusted age, days since last injection)
Results are tracked in:
- data_quality_report[‘missing_values’]
- data_quality_report[‘outliers’]
- data_quality_report[‘temporal_anomalies’]
- handle_missing_values()[source]
Handle missing values in the dataset.
- Returns:
Modifies data in-place with imputed values where appropriate
- Return type:
None
Notes
Missing value handling strategies:
1. Baseline VA: Uses first available VA measurement if missing
2. Age data: Different handling for deceased vs living patients
3. Current age: Adds 0.5 years to account for temporal alignment
4. Injection intervals: Calculates from dates if missing
Tracks missing values in:
- data_quality_report[‘missing_values_before’]
- data_quality_report[‘missing_values_after’]
- clean_va_measurements()[source]
Clean and validate Visual Acuity measurements.
- Returns:
Modifies VA measurements in-place with cleaned values
- Return type:
None
Notes
Cleaning steps:
1. Clips VA values to valid range [0, 100]
2. Identifies implausible changes (>30 letters between consecutive measurements)
Tracks cleaning results in:
- data_quality_report[‘va_outliers_before’]
- data_quality_report[‘va_implausible_changes’]
Saves details of implausible changes to:
- output/implausible_va_changes.csv
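The two cleaning steps can be illustrated for a single eye's measurements as follows. This is a sketch only: `clean_va` is a hypothetical helper, and comparing the values after clipping (rather than before) is an assumption about the order of operations.

```python
def clean_va(scores, low=0, high=100, max_jump=30):
    """Clip VA letter scores and flag large consecutive changes.

    `scores` holds one eye's VA letter scores in injection order.
    Returns (clipped, flagged), where `flagged` lists the indices whose
    change from the previous measurement exceeds `max_jump` letters.
    """
    clipped = [min(max(score, low), high) for score in scores]
    flagged = [
        i for i in range(1, len(clipped))
        if abs(clipped[i] - clipped[i - 1]) > max_jump
    ]
    return clipped, flagged


clipped, flagged = clean_va([55, 60, 105, 62, 20, -3])
print(clipped)  # values forced into [0, 100]
print(flagged)  # positions of implausible jumps
```

With this input, 105 and -3 are clipped to 100 and 0, and the jumps at positions 2, 3 and 4 (40, 38 and 42 letters) are flagged.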
- handle_temporal_anomalies()[source]
Handle temporal anomalies in the data.
- Returns:
Modifies data in-place with corrected temporal sequences
- Return type:
None
Notes
Handles these temporal anomalies:
1. Out-of-sequence injection dates (fixes by sorting)
2. Long treatment gaps (>180 days)
3. Single injection patients
Tracks anomalies in:
- data_quality_report[‘single_injection_patients’]
- data_quality_report[‘sequence_fixes’]
- data_quality_report[‘long_treatment_gaps’]
Saves details of sequence fixes to:
- output/sequence_fixes.csv
- create_patient_id()[source]
Create unique patient and eye identifiers.
- Returns:
Modifies data in-place by adding:
- patient_id
- eye_key
- eye_standardized
- Return type:
None
Notes
Identifier creation logic:
1. Uses existing UUID if available
2. Creates composite ID from available fields if UUID missing
3. Creates eye-specific key (patient_id + eye)
4. Standardizes eye values (uppercase, no spaces)
Finally sorts data by eye_key and injection date.
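A minimal sketch of the eye standardization, key construction and final sort is shown below. The helper names, the underscore separator in the key, and the record layout are all assumptions made for illustration; the composite-ID fallback for a missing UUID is not covered.

```python
from datetime import date


def standardize_eye(value):
    """Upper-case an eye label and remove spaces."""
    return str(value).upper().replace(" ", "")


def make_eye_key(patient_id, eye):
    """Combine patient ID and standardized eye (separator is an assumption)."""
    return f"{patient_id}_{standardize_eye(eye)}"


records = [
    {"UUID": "p2", "Eye": "left eye", "Injection Date": date(2021, 2, 1)},
    {"UUID": "p1", "Eye": " Right", "Injection Date": date(2021, 3, 1)},
    {"UUID": "p1", "Eye": "right", "Injection Date": date(2021, 1, 4)},
]
for record in records:
    record["patient_id"] = record["UUID"]
    record["eye_standardized"] = standardize_eye(record["Eye"])
    record["eye_key"] = make_eye_key(record["patient_id"], record["Eye"])

# Sort by eye key, then by injection date within each eye.
records.sort(key=lambda r: (r["eye_key"], r["Injection Date"]))
eye_keys = [r["eye_key"] for r in records]
```

After sorting, both `p1_RIGHT` rows come first in date order, followed by `p2_LEFTEYE`.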
- generate_data_quality_report()[source]
Generate a comprehensive data quality report.
This method calculates various data quality metrics and saves them to a text file in the output directory.
- Returns:
The data quality report as a dictionary
- Return type:
dict
Notes
The report includes:
1. Summary metrics (rows, columns, missing data percentage)
2. Column mapping information
3. Validation errors and warnings
4. Missing values by column
5. Age data processing details
6. Temporal anomalies
7. VA measurement anomalies
The report is saved to ‘data_quality_report.txt’ in the output directory.
- analyze_patient_cohort()[source]
Analyze patient cohort demographics and treatment characteristics.
- Returns:
DataFrame with one row per patient containing:
- Demographics (age, gender)
- Eye information
- Baseline measurements (VA, CRT)
- Treatment information (injection count, dates)
- Mortality information (deceased status, age at death)
- Return type:
pandas.DataFrame
Notes
Key processing steps:
1. Groups data by patient_id
2. Extracts first row for each patient to get baseline characteristics
3. Calculates treatment duration from first to last injection
4. Handles missing values in baseline measurements
Examples
>>> analyzer = EyleaDataAnalyzer('data.csv')
>>> patient_data = analyzer.analyze_patient_cohort()
>>> print(patient_data[['patient_id', 'injection_count']].head())
- analyze_injection_intervals()[source]
Analyze time intervals between consecutive injections by eye.
- Returns:
DataFrame with interval information containing:
- Patient and eye identifiers
- Injection sequence numbers
- Dates of consecutive injections
- Interval in days between injections
- VA measurements at each injection
- Flags for long (>180d) and very long (>365d) gaps
- Return type:
pandas.DataFrame
Notes
Processing steps:
1. Groups data by eye_key (patient + eye)
2. Sorts injections by date
3. Calculates days between consecutive injections
4. Flags clinically significant gaps
5. Tracks VA changes between injections
Examples
>>> analyzer = EyleaDataAnalyzer('data.csv')
>>> intervals = analyzer.analyze_injection_intervals()
>>> print(intervals[['eye_key', 'interval_days']].describe())
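For a single eye, the interval calculation and gap flags can be sketched as below. The function `injection_intervals` and its output keys are hypothetical; only the 180-day and 365-day thresholds come from the description above, and VA tracking is left out.

```python
from datetime import date


def injection_intervals(dates, long_gap=180, very_long_gap=365):
    """Return one row per pair of consecutive injections for one eye."""
    ordered = sorted(dates)
    rows = []
    # The first interval ends at injection number 2.
    for number, (previous, current) in enumerate(
        zip(ordered, ordered[1:]), start=2
    ):
        days = (current - previous).days
        rows.append({
            "injection_number": number,
            "previous_date": previous,
            "current_date": current,
            "interval_days": days,
            "long_gap": days > long_gap,
            "very_long_gap": days > very_long_gap,
        })
    return rows


interval_rows = injection_intervals([
    date(2020, 1, 6), date(2020, 2, 3), date(2020, 3, 2), date(2021, 4, 5),
])
interval_days = [row["interval_days"] for row in interval_rows]
```

The first two intervals are 28 days each; the last one spans 399 days and is flagged as both long and very long.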
- analyze_va_trajectories()[source]
Analyze visual acuity trajectories over time by eye.
- Returns:
DataFrame with VA trajectory information containing:
- Patient and eye identifiers
- Injection sequence numbers
- Days from first injection
- VA score at each injection
- Baseline VA
- VA change from baseline
- Return type:
pandas.DataFrame
Notes
Processing steps:
1. Groups data by eye_key (patient + eye)
2. Uses first available VA as baseline if missing
3. Calculates days from first injection
4. Computes VA change from baseline
5. Applies smoothing for population average
Examples
>>> analyzer = EyleaDataAnalyzer('data.csv')
>>> va_traj = analyzer.analyze_va_trajectories()
>>> print(va_traj[['eye_key', 'va_change']].describe())
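Steps 2 to 4 (baseline fallback, days from first injection, change from baseline) might look like this for one eye. `va_trajectory` and its output keys are hypothetical names, and smoothing is not included.

```python
from datetime import date


def va_trajectory(visits, baseline_va=None):
    """Build trajectory rows from (injection_date, va_score) pairs of one eye."""
    ordered = sorted(visits)
    first_date, first_va = ordered[0]
    if baseline_va is None:
        # Fall back to the first available VA measurement.
        baseline_va = first_va
    return [
        {
            "injection_number": number,
            "days_from_first": (when - first_date).days,
            "va_score": va,
            "baseline_va": baseline_va,
            "va_change": va - baseline_va,
        }
        for number, (when, va) in enumerate(ordered, start=1)
    ]


trajectory = va_trajectory([
    (date(2021, 1, 4), 60), (date(2021, 2, 1), 66), (date(2021, 3, 1), 58),
])
```

With no recorded baseline, the first score (60) becomes the baseline, giving changes of 0, +6 and -2 letters at days 0, 28 and 56.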
- plot_injection_intervals()[source]
Plot distribution of injection intervals and intervals by sequence.
- Returns:
Saves two plots to output directory:
1. ‘injection_intervals.png’ - Histogram of intervals with reference lines
2. ‘injection_intervals_by_sequence.png’ - Mean/median intervals by sequence
- Return type:
None
Notes
- Plot 1 (Histogram):
Shows distribution of all injection intervals
- Includes reference lines at:
28 days (monthly)
56 days (bi-monthly)
84 days (quarterly)
- Plot 2 (Sequence):
Shows mean ± SD and median intervals by injection number
Helps identify interval patterns over treatment course
Automatically calls analyze_injection_intervals() if needed.
- plot_va_trajectories()[source]
Plot visual acuity trajectories over time and by injection number.
- Returns:
Saves two plots to output directory:
1. ‘va_trajectories.png’ - Individual trajectories + population average
2. ‘va_by_injection_number.png’ - Mean VA by injection number
- Return type:
None
Notes
- Plot 1 (Trajectories):
Shows VA over time for sample of 20 eyes
Includes LOESS-smoothed population average line
Falls back to simple average if statsmodels not available
- Plot 2 (Injection Number):
Shows mean ± SD VA by injection sequence
Includes sample size annotations
Helps identify VA patterns over treatment course
Automatically calls analyze_va_trajectories() if needed.
- plot_va_change_distribution()[source]
Plot distribution of VA changes from baseline and outcome categories.
- Returns:
Saves two plots to output directory:
1. ‘va_change_distribution.png’ - Histogram of VA changes
2. ‘va_outcome_categories.png’ - Categorical outcomes
- Return type:
None
Notes
- Plot 1 (Histogram):
Shows distribution of final VA changes from baseline
- Includes reference lines at:
0 (no change)
±5 letters (gain/loss)
±15 letters (significant gain/loss)
- Plot 2 (Categories):
Groups outcomes into clinically relevant categories
Shows counts and percentages for each category
- Categories:
≥15 letter gain
5-14 letter gain
Stable (-4 to +4)
5-14 letter loss
≥15 letter loss
Unknown
Automatically calls analyze_va_trajectories() if needed.
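The outcome categories listed above can be expressed as a small classification function. This is a sketch with a hypothetical name (`va_outcome_category`); it assumes whole-letter changes, so how fractional values near the ±5 boundaries are treated in the real code is not known.

```python
from collections import Counter


def va_outcome_category(change):
    """Map a final VA change from baseline (in letters) to a category."""
    if change is None:
        return "Unknown"
    if change >= 15:
        return "≥15 letter gain"
    if change >= 5:
        return "5-14 letter gain"
    if change > -5:
        return "Stable (-4 to +4)"
    if change > -15:
        return "5-14 letter loss"
    return "≥15 letter loss"


changes = [17, 6, 0, -4, -9, -20, None, 5]
counts = Counter(va_outcome_category(c) for c in changes)
# Percentage of eyes in each category.
percentages = {
    name: 100 * count / len(changes) for name, count in counts.items()
}
```

For these eight example values, two fall in the stable category and two in the 5-14 letter gain category, so each of those accounts for 25% of the total.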
- analyze_treatment_courses()[source]
Analyze treatment courses by identifying potential breaks.
- Returns:
DataFrame with treatment course information containing:
- Patient and eye identifiers
- Course start/end dates
- Duration in days
- Injection count
- Flags for long pauses (>365d)
- Potential separate courses
- Return type:
pandas.DataFrame
Notes
Key processing steps:
1. Groups data by eye_key (patient + eye)
2. Identifies very long gaps (>365d) as potential course breaks
3. Calculates duration from first to last injection
4. Tracks injection counts per course
Examples
>>> analyzer = EyleaDataAnalyzer('data.csv')
>>> courses = analyzer.analyze_treatment_courses()
>>> print(courses[['eye_key', 'duration_days']].describe())
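One way to read "potential course breaks" is to start a new course whenever the gap between two injections exceeds 365 days. The sketch below does exactly that for one eye; whether the method actually splits courses or only flags the gaps is not stated, so `split_courses` should be taken as a hypothetical illustration.

```python
from datetime import date


def split_courses(dates, break_days=365):
    """Split one eye's injection dates into courses at very long gaps."""
    ordered = sorted(dates)
    courses = [[ordered[0]]]
    for previous, current in zip(ordered, ordered[1:]):
        if (current - previous).days > break_days:
            courses.append([current])    # gap too long: start a new course
        else:
            courses[-1].append(current)  # continue the current course
    return [
        {
            "start": course[0],
            "end": course[-1],
            "duration_days": (course[-1] - course[0]).days,
            "injection_count": len(course),
        }
        for course in courses
    ]


course_rows = split_courses([
    date(2019, 1, 7), date(2019, 2, 4), date(2019, 4, 1),
    date(2020, 6, 1), date(2020, 7, 27),
])
```

The 427-day gap between April 2019 and June 2020 separates the example into a first course of three injections over 84 days and a second course of two injections over 56 days.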
- plot_treatment_courses()[source]
Plot treatment course durations and injection counts per course.
- Returns:
Saves two plots to output directory:
1. ‘treatment_course_durations.png’ - Histogram of durations
2. ‘injections_per_course.png’ - Histogram of injection counts
- Return type:
None
Notes
- Plot 1 (Durations):
Shows distribution of treatment course durations in days
Helps identify typical treatment persistence patterns
- Plot 2 (Injections):
Shows distribution of injection counts per course
Uses discrete bins (1-20 injections)
Helps identify typical treatment intensity
Automatically calls analyze_treatment_courses() if needed.
- export_interval_va_data(format='csv', db_path=None)[source]
Export interval and VA data to CSV and/or SQLite format.
- Parameters:
format (str, optional) – Output format (‘csv’, ‘sqlite’, or ‘both’). Default ‘csv’.
db_path (str, optional) – Custom path for SQLite database. Default uses ‘eylea_intervals.db’ in output directory.
- Returns:
Dictionary containing paths to exported files with keys:
- ‘csv’: Path to detailed CSV file
- ‘summary_csv’: Path to summary CSV file
- ‘sqlite’: Path to SQLite database (if exported)
- Return type:
dict
Notes
Exports two data types:
1. Detailed data (per-injection intervals and VA measurements)
2. Summary data (per-patient interval lists and VA changes)
CSV outputs:
- ‘interval_va_data.csv’: Detailed injection-level data
- ‘interval_va_summary.csv’: Patient-level summary
SQLite outputs:
- ‘interval_va_data’ table: Detailed data
- ‘interval_summary’ table: Summary data
Automatically calls analyze_injection_intervals() if needed.
Examples
>>> analyzer = EyleaDataAnalyzer('data.csv')
>>> paths = analyzer.export_interval_va_data(format='both')
>>> print(paths['csv'])  # Prints path to detailed CSV
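The format switch and the returned path dictionary can be sketched with the standard library alone. `export_rows` is a hypothetical function that writes only the detailed data (no summary outputs); the file and table names follow the description above, while the untyped table schema is an assumption.

```python
import csv
import sqlite3
import tempfile
from pathlib import Path


def export_rows(rows, output_dir, fmt="csv", db_path=None):
    """Write detailed rows to CSV and/or SQLite and return the paths used."""
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    fields = list(rows[0].keys())
    paths = {}
    if fmt in ("csv", "both"):
        csv_path = output_dir / "interval_va_data.csv"
        with open(csv_path, "w", newline="") as handle:
            writer = csv.DictWriter(handle, fieldnames=fields)
            writer.writeheader()
            writer.writerows(rows)
        paths["csv"] = str(csv_path)
    if fmt in ("sqlite", "both"):
        db_file = Path(db_path) if db_path else output_dir / "eylea_intervals.db"
        conn = sqlite3.connect(db_file)
        try:
            columns = ", ".join(f'"{name}"' for name in fields)
            marks = ", ".join("?" for _ in fields)
            conn.execute("DROP TABLE IF EXISTS interval_va_data")
            conn.execute(f"CREATE TABLE interval_va_data ({columns})")
            conn.executemany(
                f"INSERT INTO interval_va_data VALUES ({marks})",
                [tuple(row[name] for name in fields) for row in rows],
            )
            conn.commit()
        finally:
            conn.close()
        paths["sqlite"] = str(db_file)
    return paths


example_rows = [
    {"uuid": "p1", "eye": "R", "interval_days": 28, "current_va": 62},
    {"uuid": "p1", "eye": "R", "interval_days": 56, "current_va": 65},
]
with tempfile.TemporaryDirectory() as tmp:
    export_paths = export_rows(example_rows, tmp, fmt="both")
    conn = sqlite3.connect(export_paths["sqlite"])
    row_count = conn.execute("SELECT COUNT(*) FROM interval_va_data").fetchone()[0]
    conn.close()
```

The usage at the bottom writes to a temporary directory and reads the table back to confirm that both rows were stored.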
- run_analysis()[source]
Execute complete analysis pipeline from data loading to export.
- Returns:
Dictionary with analysis summary containing:
- patient_count: Number of unique patients
- eye_count: Number of treated eyes
- injection_count: Total injections analyzed
- course_count: Number of treatment courses
- mean_injection_interval: Average interval between injections
- median_injection_interval: Median interval between injections
- output_dir: Path to output directory
- data_quality_report: Summary of data quality metrics
- export_paths: Paths to exported files
- Return type:
dict
Notes
Analysis steps:
1. Data loading and cleaning
2. Patient cohort analysis
3. Injection interval analysis
4. VA trajectory analysis
5. Treatment course analysis
6. Visualization generation
7. Data export
Examples
>>> analyzer = EyleaDataAnalyzer('data.csv')
>>> results = analyzer.run_analysis()
>>> print(f"Analyzed {results['patient_count']} patients")
- analysis.eylea_data_analysis.main()[source]
Command line interface for running Eylea data analysis.
- Returns:
Prints analysis summary to stdout
- Return type:
None
Notes
Command line arguments:
--data : Path to input CSV file (default: ‘input_data/sample_raw.csv’)
--output : Output directory (default: ‘output’)
--debug : Enable debug logging
--validation-strictness : Set validation level (‘strict’, ‘moderate’, ‘lenient’)
Example
python eylea_data_analysis.py --data treatment_data.csv --output results
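A command-line parser accepting these four options could be declared with argparse roughly as follows. This is a sketch, not the module's own parser: `build_parser` is a hypothetical name, and leaving the validation strictness unset by default is an assumption, since no default is documented.

```python
import argparse


def build_parser():
    """Build a parser for the documented command line options."""
    parser = argparse.ArgumentParser(description="Run Eylea data analysis")
    parser.add_argument("--data", default="input_data/sample_raw.csv",
                        help="Path to input CSV file")
    parser.add_argument("--output", default="output",
                        help="Output directory")
    parser.add_argument("--debug", action="store_true",
                        help="Enable debug logging")
    # Default of None is an assumption; the documented default is not given.
    parser.add_argument("--validation-strictness",
                        choices=["strict", "moderate", "lenient"],
                        default=None, help="Set validation level")
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_parser().parse_args(
    ["--data", "treatment_data.csv", "--output", "results"]
)
print(args.data, args.output, args.debug)
```

argparse exposes `--validation-strictness` as `args.validation_strictness`, replacing the hyphen with an underscore.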
Eylea Injection Intervals Analysis
This script analyzes the injection intervals data from the SQLite database to identify patterns in treatment, specifically looking for two groups:
- Group LH: 7 injections in first year, then continuing with injections every ~2 months
- Group MR: 7 injections in first year, then a pause before resumption of treatment
The script also performs Principal Component Analysis (PCA) to identify patterns in treatment intervals and visual acuity measures (previous VA, current VA, next VA).
- analysis.eylea_intervals_analysis.connect_to_db()[source]
Connect to the SQLite database.
- Return type:
Connection
- analysis.eylea_intervals_analysis.load_interval_data()[source]
Load the interval_va_data table into a Polars DataFrame.
- Return type:
DataFrame
- analysis.eylea_intervals_analysis.load_interval_summary()[source]
Load the interval_summary table into a Polars DataFrame.
- Return type:
DataFrame
- analysis.eylea_intervals_analysis.analyze_first_year_injections(df)[source]
Analyze the first year of injections for each patient.
- Return type:
DataFrame
- Parameters:
df (DataFrame)
- Args:
df: DataFrame with interval_va_data
- Returns:
DataFrame with first year injection analysis
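Counting first-year injections for one eye can be sketched as below. The function name and the definition of "first year" as the 365 days starting at the first injection are assumptions; the module itself works on a Polars DataFrame, which this plain-Python sketch does not attempt to reproduce.

```python
from datetime import date, timedelta


def first_year_injection_count(dates, window_days=365):
    """Count injections within `window_days` of the first injection."""
    ordered = sorted(dates)
    cutoff = ordered[0] + timedelta(days=window_days)
    return sum(1 for when in ordered if when < cutoff)


injection_dates = [
    date(2020, 1, 6), date(2020, 2, 3), date(2020, 3, 2), date(2020, 5, 4),
    date(2020, 7, 6), date(2020, 9, 7), date(2020, 11, 9), date(2021, 1, 11),
]
first_year_count = first_year_injection_count(injection_dates)
print(first_year_count)
```

In this example, three monthly injections followed by four at two-month spacing fall inside the window, and the eighth injection in January 2021 falls outside it.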
- analysis.eylea_intervals_analysis.identify_treatment_groups(df)[source]
Identify the two treatment groups:
- Group LH: 7 injections in first year, then continuing with injections every ~2 months
- Group MR: 7 injections in first year, then a pause before resumption of treatment
- Return type:
DataFrame
- Parameters:
df (DataFrame)
- Args:
df: DataFrame with first year injection analysis
- Returns:
DataFrame with group assignments
- analysis.eylea_intervals_analysis.cluster_treatment_patterns(df)[source]
Use K-means clustering to identify treatment pattern groups.
- Return type:
DataFrame
- Parameters:
df (DataFrame)
- Args:
df: DataFrame with first year injection analysis
- Returns:
DataFrame with cluster assignments
- analysis.eylea_intervals_analysis.analyze_intervals_by_group(df, interval_data)[source]
Analyze and visualize injection intervals by treatment group.
- Return type:
None
- Parameters:
df (DataFrame)
interval_data (DataFrame)
- Args:
df: DataFrame with group assignments
interval_data: Raw interval data
- analysis.eylea_intervals_analysis.prepare_va_interval_data_for_pca(interval_data)[source]
Prepare data for PCA analysis by calculating next VA for each record.
This function processes the interval data to create a dataset with:
- treatment_interval (interval_days)
- previous_va (prev_va)
- current_va
- next_va (calculated by joining with next record)
- Return type:
DataFrame
- Parameters:
interval_data (DataFrame)
- Args:
interval_data: Raw interval data from the database
- Returns:
DataFrame with prepared features for PCA analysis
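The "join with next record" step amounts to pairing each interval row with the row that follows it for the same eye. A plain-Python sketch, with a hypothetical function name and under the assumption that the last record of each eye is dropped because it has no successor:

```python
def add_next_va(records):
    """Pair each record of one eye (sorted by date) with the next one's VA."""
    prepared = []
    for row, following in zip(records, records[1:]):
        prepared.append({
            "treatment_interval": row["interval_days"],
            "previous_va": row["prev_va"],
            "current_va": row["current_va"],
            "next_va": following["current_va"],
        })
    return prepared


eye_records = [
    {"interval_days": 28, "prev_va": 60, "current_va": 64},
    {"interval_days": 56, "prev_va": 64, "current_va": 66},
    {"interval_days": 84, "prev_va": 66, "current_va": 63},
]
features = add_next_va(eye_records)
```

Three input records yield two feature rows, each carrying the four values that the PCA is described as using.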
- analysis.eylea_intervals_analysis.perform_va_interval_pca(interval_data)[source]
Perform PCA analysis on treatment intervals and visual acuity measures.
This function identifies patterns in:
- Treatment interval
- Previous VA
- Current VA
- Next VA
- Return type:
None
- Parameters:
interval_data (DataFrame)
- Args:
interval_data: Raw interval data from the database
- analysis.eylea_intervals_analysis.analyze_va_by_group(df, interval_data)[source]
Analyze and visualize visual acuity by treatment group.
- Return type:
None
- Parameters:
df (DataFrame)
interval_data (DataFrame)
- Args:
df: DataFrame with group assignments
interval_data: Raw interval data
Visualize Visual Acuity Trajectories by PCA Cluster.
This module creates visualizations of visual acuity (VA) trajectories over time for patients grouped by clusters identified through PCA analysis.
The visualizations include:
- Individual VA trajectories for sampled patients from each cluster
- Average VA trajectories with standard deviation bands
- VA change from baseline plots
Notes
The analysis expects:
1. A SQLite database with interval VA data
2. A CSV file with cluster assignments from PCA analysis
3. Output directory for saving plots
Examples
python visualize_va_by_pca_cluster.py
Generates plots in output/analysis_results directory:
- va_trajectories_by_pca_cluster.png
- va_change_by_pca_cluster.png
- analysis.visualize_va_by_pca_cluster.connect_to_db()[source]
Establish connection to the SQLite database containing VA data.
- Returns:
Active database connection object
- Return type:
sqlite3.Connection
Notes
The database path is defined by the DB_PATH constant. Connection should be closed by the caller when done.
- analysis.visualize_va_by_pca_cluster.load_interval_data()[source]
Load interval VA data from SQLite database into Polars DataFrame.
- Returns:
DataFrame containing:
- uuid: Patient identifier
- eye: Eye (left/right)
- previous_date: Previous visit date
- current_date: Current visit date
- prev_va: Previous visual acuity
- current_va: Current visual acuity
- interval_days: Days between visits
- Return type:
pl.DataFrame
Notes
Automatically converts date strings to datetime objects
Closes database connection when done
- analysis.visualize_va_by_pca_cluster.load_cluster_assignments()[source]
Load patient cluster assignments from PCA analysis results.
- Returns:
DataFrame containing:
- uuid: Patient identifier
- eye: Eye (left/right)
- cluster: PCA cluster assignment (0-3)
- Return type:
pl.DataFrame
Notes
Reads from output/analysis_results/va_interval_clusters_4.csv
Cluster meanings:
0: Moderate VA, Moderate Interval
1: Low VA, Moderate Interval
2: High VA, Short Interval
3: Long Gap Patients
- analysis.visualize_va_by_pca_cluster.visualize_va_trajectories_by_cluster(interval_data, cluster_df)[source]
Generate visualizations of VA trajectories grouped by PCA clusters.
Creates two plots:
1. Individual VA trajectories with cluster averages
2. VA change from baseline with standard deviation bands
- Parameters:
interval_data (pl.DataFrame) – DataFrame containing interval VA data with columns:
- uuid: Patient identifier
- eye: Eye (left/right)
- previous_date: Previous visit date
- current_date: Current visit date
- prev_va: Previous visual acuity
- current_va: Current visual acuity
- interval_days: Days between visits
cluster_df (pl.DataFrame) – DataFrame containing cluster assignments with columns:
- uuid: Patient identifier
- eye: Eye (left/right)
- cluster: PCA cluster assignment (0-3)
- Returns:
Saves plots to output/analysis_results directory:
- va_trajectories_by_pca_cluster.png
- va_change_by_pca_cluster.png
- Return type:
None
Notes
Samples up to 10 patients per cluster for individual trajectories
Uses 60-day bins for calculating averages
Limits visualization to first 1000 days for clarity
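The 60-day binning and 1000-day cut-off mentioned in these notes can be sketched as follows. `binned_mean_va` is a hypothetical helper working on (days from first injection, VA) pairs; labelling each bin by its start day and excluding points beyond day 1000 (rather than at it) are assumptions.

```python
from collections import defaultdict
from statistics import mean


def binned_mean_va(points, bin_days=60, max_days=1000):
    """Average VA per time bin from (days_from_first, va) pairs."""
    bins = defaultdict(list)
    for days, va in points:
        if days > max_days:
            continue  # outside the plotted range
        # Label each bin by its first day: 0, 60, 120, ...
        bins[(days // bin_days) * bin_days].append(va)
    return {start: mean(values) for start, values in sorted(bins.items())}


va_points = [(0, 60), (28, 64), (56, 66), (70, 70), (118, 68), (1200, 40)]
bin_means = binned_mean_va(va_points)
```

The first three points share the bin starting at day 0, the next two share the bin starting at day 60, and the point at day 1200 is discarded.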
Visualize Long Gap Patients (Cluster 4)
This script creates visualizations specifically focused on the long-gap patients (Cluster 4) identified in the PCA analysis, to better illustrate what happens to visual acuity before, during, and after the long treatment gaps.
- analysis.visualize_long_gap_patients.connect_to_db()[source]
Connect to the SQLite database.
- Return type:
Connection
- analysis.visualize_long_gap_patients.load_interval_data()[source]
Load the interval_va_data table into a Polars DataFrame.
- Return type:
DataFrame
- analysis.visualize_long_gap_patients.load_cluster_assignments()[source]
Load the cluster assignments from the PCA analysis.
- Return type:
DataFrame
- analysis.visualize_long_gap_patients.get_long_gap_patients(interval_data, cluster_df)[source]
Identify and prepare data for long-gap patients (Cluster 4).
- Return type:
DataFrame
- Parameters:
interval_data (DataFrame)
cluster_df (DataFrame)
- Args:
interval_data: Raw interval data
cluster_df: DataFrame with cluster assignments
- Returns:
DataFrame with long-gap patient data
- analysis.visualize_long_gap_patients.visualize_long_gap_patients(long_gap_data)[source]
Create visualizations focused on long-gap patients.
- Return type:
None
- Parameters:
long_gap_data (DataFrame)
- Args:
long_gap_data: DataFrame with long-gap patient data
Visualize VA Change by Interval with PCA Cluster Information
This script creates an enhanced visualization of VA change by interval length, using different marker shapes to indicate which PCA cluster each data point belongs to.
- analysis.visualize_va_change_by_cluster.connect_to_db()[source]
Connect to the SQLite database.
- Return type:
Connection
- analysis.visualize_va_change_by_cluster.load_interval_data()[source]
Load the interval_va_data table into a Polars DataFrame.
- Return type:
DataFrame
- analysis.visualize_va_change_by_cluster.load_cluster_assignments()[source]
Load the cluster assignments from the PCA analysis.
- Return type:
DataFrame
- analysis.visualize_va_change_by_cluster.visualize_va_change_by_cluster(interval_data, cluster_df)[source]
Create an enhanced visualization of VA change by interval with cluster information.
- Return type:
None
- Parameters:
interval_data (DataFrame)
cluster_df (DataFrame)
- Args:
interval_data: Raw interval data
cluster_df: DataFrame with cluster assignments