A machine learning system for detecting COVID-19 in chest X-ray images using ensemble neural networks with uncertainty-based weighting. This implementation follows the methodology from the Scientific Reports paper "Generalizable disease detection using model ensemble on chest X-ray images" (2024).
Implements the robust ensemble approach combining three pre-trained CNN architectures:
- ResNet50: 50-layer residual network for robust feature extraction
- DenseNet121: 121-layer densely connected network for detailed pattern recognition
- Inception-ResNet-v2: Hybrid architecture with inception modules and residual connections
The system uses a novel uncertainty-based ensemble weighting scheme: each model's contribution is scaled by its prediction confidence, reducing the influence of uncertain predictions and improving COVID-19 detection across different chest X-ray datasets.
- Multi-Model Ensemble: Three complementary CNN architectures with transfer learning
- Uncertainty Quantification: Entropy-based weighting for robust predictions
- Evaluation: COVID-19 detection metrics, statistical testing, and visualization
- Statistical Analysis: McNemar's test, bootstrap testing, confidence intervals (see the sketch after this list)
- Plots: ROC curves, confusion matrices, performance comparisons
- Modular Architecture: Clean separation of data processing, models, and evaluation
- Config Management: YAML-based parameter management
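A minimal sketch of the paired comparison behind the McNemar's test bullet, using statsmodels on two models' per-image correctness (the arrays and counts here are illustrative, not the project's API):

```python
import numpy as np
from statsmodels.stats.contingency_tables import mcnemar

# Per-image correctness of two models on the same test set (illustrative values).
model_a_correct = np.array([1, 1, 0, 1, 0, 1, 1, 0], dtype=bool)
model_b_correct = np.array([1, 0, 0, 1, 1, 1, 1, 1], dtype=bool)

# 2x2 contingency table: rows index model A correct/incorrect, columns model B.
table = np.array([
    [np.sum(model_a_correct & model_b_correct), np.sum(model_a_correct & ~model_b_correct)],
    [np.sum(~model_a_correct & model_b_correct), np.sum(~model_a_correct & ~model_b_correct)],
])

result = mcnemar(table, exact=True)  # exact binomial version for small samples
print(f"McNemar statistic={result.statistic}, p-value={result.pvalue:.4f}")
```

Only the off-diagonal disagreement cells drive the test, which is why it suits comparing two classifiers evaluated on the same images.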
src/
├── data/ # Data preprocessing and loading pipelines
├── models/ # Individual CNN model implementations
├── ensemble/ # Entropy-based ensemble methods
├── evaluation/ # Evaluation framework
└── utils/ # Configuration and utility functions
data/raw/ # Original datasets (COVIDx CXR-2)
saved_models/ # Trained model checkpoints and weights
configs/ # YAML configuration files
tests/ # Test suite (90%+ coverage)
scripts/ # Training and evaluation scripts
# Clone the repository
git clone git@github.com:Sunsvea/Mira_Covid.git
cd Mira_Covid
# Create and activate virtual environment
python3 -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
The system uses the COVIDx CXR-2 dataset from Kaggle. A data preparation script is provided to download, extract, and organize the dataset automatically.
Dataset: COVIDx CXR-2
# Full automated setup (requires Kaggle API setup)
python scripts/prepare_covidx_data.py --download
# Or extract from the manually downloaded zip
python scripts/prepare_covidx_data.py --extract-only ~/Downloads/covidx-cxr2.zip
# 1. Download from Kaggle manually:
https://www.kaggle.com/datasets/andyczhao/covidx-cxr2
# 2. Extract using our script
python scripts/prepare_covidx_data.py --extract-only path/to/covidx-cxr2.zip
# 3. Validate the prepared dataset
python scripts/prepare_covidx_data.py --validate-only
# Install Kaggle CLI
pip install kaggle
# Setup API credentials (choose one):
# Method 1: Download kaggle.json from your Kaggle account settings
mkdir -p ~/.kaggle
cp kaggle.json ~/.kaggle/
chmod 600 ~/.kaggle/kaggle.json
# Method 2: Set environment variables
export KAGGLE_USERNAME=your_username
export KAGGLE_KEY=your_api_key
The script organizes data into the expected structure:
data/raw/
├── train/
│ └── [~55,000 chest X-ray images]
├── test/
│ └── [~10,000 chest X-ray images]
├── train.txt # Training labels (patient_id filename label source)
├── test.txt # Test labels
└── dataset_summary.json # Dataset statistics and preparation info
Dataset Information:
- Size: ~65,000 chest X-ray images (2.5GB download)
- Classes: COVID-19 positive/negative
- Sources: Multiple medical institutions for generalizability
- Format: PNG images with corresponding label files (see the loader sketch after this list)
- Preparation Time: ~5-10 minutes (depending on download speed)
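The train.txt and test.txt files are space-separated with one image per line (patient_id filename label source). A minimal loading sketch, assuming pandas and 'positive'/'negative' class labels:

```python
import pandas as pd

def load_labels(txt_path: str) -> pd.DataFrame:
    """Parse a COVIDx-style label file into a DataFrame."""
    df = pd.read_csv(
        txt_path,
        sep=" ",
        header=None,
        names=["patient_id", "filename", "label", "source"],
    )
    # Binary target: 1 for COVID-19 positive, 0 otherwise (label names assumed).
    df["target"] = (df["label"] == "positive").astype(int)
    return df

train_df = load_labels("data/raw/train.txt")
print(train_df["target"].value_counts())
```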
# 1. Analyze your dataset (optional but recommended)
python scripts/train_models.py analyze-data
# 2. Train all individual models (ResNet50, DenseNet121, Inception-ResNet-v2)
python scripts/train_models.py train-individual
# 3. Create ensemble from trained models and evaluate performance
python scripts/train_models.py train-ensemble --checkpoint-dir saved_models
# Train specific models only
python scripts/train_models.py train-individual --models resnet50 densenet121
# Custom save directory for models
python scripts/train_models.py train-individual --save-dir my_models
# Force CPU training (sometimes faster than MPS on M3)
python scripts/train_models.py train-individual --device cpu
# Compare different ensemble strategies
python scripts/train_models.py compare-strategies --checkpoint-dir saved_models
# Get help for any command
python scripts/train_models.py --help
python scripts/train_models.py train-individual --help
Expected Training Time:
- Apple M3 MacBook Air (24GB): 2-3+ hours per model (optimized for MPS)
- NVIDIA RTX 5090 (Vast.ai): 20-45+ minutes per model (see configs/rtx5090.yaml)
- NVIDIA RTX 4090: 45-90+ minutes per model
- CPU-only: 5-7+ hours per model
Hardware-Specific Configurations:
- Default config optimized for M3 MacBook (batch_size=32, no multiprocessing)
- configs/rtx5090.yaml for high-end GPU training on rented hardware from Vast.ai
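The --device flag above picks the compute backend; a sketch of the selection logic it implies, using PyTorch's standard availability checks (the function name is illustrative):

```python
import torch

def resolve_device(requested: str = "auto") -> torch.device:
    """Pick a device: explicit request wins, otherwise prefer CUDA, then MPS, then CPU."""
    if requested != "auto":
        return torch.device(requested)  # e.g. "cpu" to avoid slow MPS ops on M-series
    if torch.cuda.is_available():
        return torch.device("cuda")
    if torch.backends.mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")

print(resolve_device())       # auto-detected backend
print(resolve_device("cpu"))  # forced CPU, as with --device cpu
```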
# Evaluate individual trained models
python scripts/evaluate_models.py individual
# Evaluate ensemble and compare with individual models
python scripts/evaluate_models.py ensemble
# External validation (test generalization)
python scripts/evaluate_models.py external --external-data path/to/external/data
# Run tests to verify implementation
pytest tests/ -v --cov=src
Model Architecture:
- Pre-trained ImageNet weights for feature extraction
- Frozen convolutional layers with custom classifier heads
- Global average pooling → FC layers (128→64→16) → Binary output
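A sketch of this head on a frozen ResNet50 backbone with torchvision (the dropout placement and single-logit output are assumptions; only the 128→64→16 sizes come from the description above):

```python
import torch.nn as nn
from torchvision import models

def build_resnet50_classifier(dropout_rate: float = 0.3) -> nn.Module:
    """Frozen ImageNet backbone with a small fully connected head for binary output."""
    backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
    for param in backbone.parameters():
        param.requires_grad = False  # freeze convolutional layers

    # ResNet50 already ends in global average pooling; replace its fc layer
    # with the 128 -> 64 -> 16 -> 1 classifier head.
    backbone.fc = nn.Sequential(
        nn.Linear(backbone.fc.in_features, 128), nn.ReLU(), nn.Dropout(dropout_rate),
        nn.Linear(128, 64), nn.ReLU(), nn.Dropout(dropout_rate),
        nn.Linear(64, 16), nn.ReLU(),
        nn.Linear(16, 1),  # single logit for COVID-19 positive vs. negative
    )
    return backbone
```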
Uncertainty-Based Ensemble Weighting:
Weight(i) = exp(-Entropy(Model_i)) / Σ(exp(-Entropy(Model_k)))
Final_Output = Σ(Weight(i) × Prediction(i))
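A NumPy sketch of these two equations: a binary entropy per model, the negated entropies normalized into weights, and the weighted average as the final output. Whether weights are computed per sample or per model is an implementation detail; this sketch weights per sample.

```python
import numpy as np

def entropy_weighted_ensemble(probs: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Combine per-model COVID-19 probabilities, shape (n_models, n_samples)."""
    p = np.clip(probs, eps, 1 - eps)
    # Binary prediction entropy for each model and sample.
    entropy = -(p * np.log(p) + (1 - p) * np.log(1 - p))
    # Weight(i) = exp(-Entropy_i) / sum_k exp(-Entropy_k), computed per sample.
    weights = np.exp(-entropy)
    weights /= weights.sum(axis=0, keepdims=True)
    # Final_Output = sum_i Weight(i) * Prediction(i)
    return (weights * p).sum(axis=0)

# Three models' probabilities for two images: confident models get more weight.
probs = np.array([[0.95, 0.10],
                  [0.60, 0.55],
                  [0.90, 0.20]])
print(entropy_weighted_ensemble(probs))
```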
- Image standardization: 256×256×3 pixels
- Normalization: [0,1] pixel values
- Augmentation: rotation, flip, zoom, brightness/contrast
- Balanced train/validation/test splits with source separation
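A sketch of this preprocessing pipeline with torchvision transforms (the augmentation magnitudes are illustrative, not taken from the configs):

```python
from torchvision import transforms

IMAGE_SIZE = 256  # 256x256x3 standardization described above (configs/default.yaml uses 224x224)

train_transforms = transforms.Compose([
    transforms.Resize((IMAGE_SIZE, IMAGE_SIZE)),
    transforms.RandomRotation(degrees=10),                       # rotation
    transforms.RandomHorizontalFlip(p=0.5),                      # flip
    transforms.RandomResizedCrop(IMAGE_SIZE, scale=(0.9, 1.0)),  # zoom
    transforms.ColorJitter(brightness=0.2, contrast=0.2),        # brightness/contrast
    transforms.ToTensor(),                                       # scales pixels to [0, 1]
])

eval_transforms = transforms.Compose([
    transforms.Resize((IMAGE_SIZE, IMAGE_SIZE)),
    transforms.ToTensor(),
])
```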
Expected results, based on the reference paper's methodology:
- Internal Validation: 95-97% accuracy
- External Validation: 75-85% accuracy (cross-institutional)
- Ensemble Improvement: 5-15% boost over individual models
- Key Strength: Superior generalization across different data sources
The project emphasizes comprehensive testing:
# Run all tests
pytest tests/
# Run with coverage
pytest --cov=src tests/
# Run specific test categories
pytest tests/test_models.py
pytest tests/test_ensemble.py
pytest tests/test_data.py
Test Coverage Requirements:
- Minimum 90% coverage for all modules
- 100% coverage for critical paths (training, ensemble weighting)
- Mocked external dependencies for fast, reliable testing
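A sketch of the mocking pattern this implies, stubbing model predictions with unittest.mock so tests run without checkpoints or a GPU (the names and the stand-in averaging are illustrative):

```python
from unittest.mock import MagicMock
import numpy as np

def test_mocked_models_are_combined_without_loading_weights():
    """Unit test sketch: stub model outputs so no checkpoints or GPUs are needed."""
    models = [MagicMock(name=f"model_{i}") for i in range(3)]
    for model, prob in zip(models, [0.95, 0.60, 0.90]):
        model.predict_proba.return_value = np.array([prob])

    probs = np.stack([m.predict_proba() for m in models])

    # Simple averaging stands in for the ensemble logic under test.
    combined = probs.mean(axis=0)

    assert probs.shape == (3, 1)
    assert 0.0 <= combined[0] <= 1.0
    for model in models:
        model.predict_proba.assert_called_once()
```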
Key parameters in configs/default.yaml:
data:
  image_size: [224, 224]    # Standard ImageNet size
  batch_size: 32            # Optimized for M3 MacBook performance
  num_workers: 0            # No multiprocessing for MPS compatibility
models:
  resnet50:
    classifier_layers: [128, 64, 16]
    dropout_rate: 0.3
    freeze_backbone: true   # Faster training with transfer learning
training:
  epochs: 10                # Full training epochs
  learning_rate: 0.002      # Higher LR for faster convergence
  early_stopping:
    enabled: true
    patience: 3             # Early stopping for development
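A minimal sketch of reading these parameters with PyYAML (the project's own loader lives in src/utils; the helper here is illustrative):

```python
import yaml

def load_config(path: str = "configs/default.yaml") -> dict:
    """Read a YAML configuration file into a plain dictionary."""
    with open(path, "r") as handle:
        return yaml.safe_load(handle)

config = load_config()
print(config["data"]["batch_size"])         # 32 in the default M3-oriented config
print(config["training"]["learning_rate"])  # 0.002
```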
Contributing:
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Write tests
- Ensure 90%+ test coverage
- Run linting and formatting
- Commit changes
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
If you use this work in your research, please cite:
@article{abad2024generalizable,
title={Generalizable disease detection using model ensemble on chest X-ray images},
author={Abad, Zelalem Dagne and others},
journal={Scientific Reports},
year={2024}
}
Dean Coulstock
- 📍 Dublin, Ireland
- 💼 DeanJCoulstock@gmail.com
- 📧 Chickenstock02@gmail.com (personal)
- Original research paper authors for the methodological foundation
- Medical imaging community for dataset contributions
- Open-source ML community for the underlying frameworks