This project implements a Random Forest Classifier to predict whether an individual is anemic or non-anemic based on their blood-related attributes. The project includes:
- A Flask-based web application to interact with the model
- Pre-trained model for predictions
- A detailed implementation of training, class balancing, and feature importance analysis
- NEW FEATURE: Upload a blood report image to predict anemia, with Google Gemini 2.0 Flash used for text extraction
- Random Forest Classifier for high-accuracy predictions
- SMOTE (Synthetic Minority Oversampling Technique) for handling class imbalance
- Feature Importance Analysis to identify the most significant contributors to predictions
- Simple frontend interface for user input
- NEW: Image-Based Prediction. Upload a blood test report image to automatically extract the relevant values and get a prediction
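Once text has been extracted from a report image, the relevant values still have to be pulled out of it. The sketch below shows one way that step could look. The function `parse_report_text`, the label patterns, and the sample report are all hypothetical, and the Gemini call itself is not shown. Real reports vary widely in layout, so treat this as a starting point rather than the project's actual parser.

```python
import re

# Hypothetical label patterns, checked from most to least specific so that
# "Mean Corpuscular Hemoglobin" is not mistaken for plain hemoglobin.
FIELD_PATTERNS = [
    ("mchc", re.compile(r"\bmchc\b|corpuscular ha?emoglobin concentration", re.I)),
    ("mch", re.compile(r"\bmch\b|corpuscular ha?emoglobin", re.I)),
    ("mcv", re.compile(r"\bmcv\b|corpuscular volume", re.I)),
    ("hemoglobin", re.compile(r"\bha?emoglobin\b|\bhgb\b|\bhb\b", re.I)),
]
VALUE = re.compile(r"(\d+(?:\.\d+)?)")
GENDER = re.compile(r"\b(?:sex|gender)\b\W*(male|female|m|f)\b", re.I)


def parse_report_text(text):
    """Return a dict of the fields found in the extracted report text."""
    values = {}
    for line in text.splitlines():
        gender = GENDER.search(line)
        if gender:
            is_female = gender.group(1).lower().startswith("f")
            values["gender"] = "female" if is_female else "male"
            continue
        for name, label in FIELD_PATTERNS:
            match = label.search(line)
            if match:
                # Take the first number that appears after the label.
                number = VALUE.search(line, match.end())
                if number and name not in values:
                    values[name] = float(number.group(1))
                break  # one field per line
    return values


# Illustrative report text, not taken from the project's data.
sample = """Patient Sex: Female
Hemoglobin (Hb): 11.2 g/dL
MCV: 82.0 fL
MCH: 27.5 pg
MCHC: 32.1 g/dL"""

print(parse_report_text(sample))
```

Fields that are missing from the text are simply absent from the returned dict, so the caller can decide whether to ask the user for them.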
Model Comparison Analysis
| Algorithm | Accuracy | AUC |
|---|---|---|
| Random Forest | 99% | 99% |
| Logistic Regression | 98% | 98% |
| SVM | 90% | 90% |
| KNN | 87% | 87% |
The Random Forest classifier scores highest on both metrics, with 99% accuracy and 99% AUC.
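For readers unfamiliar with the two metrics in the table, here is a minimal pure-Python sketch of how accuracy and AUC are computed. The labels and scores are made up for illustration and are unrelated to the project's results. The README does not say how the reported numbers were produced.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def roc_auc(y_true, scores):
    """AUC as the chance that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one sample of each class")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Illustrative values only: 1 = anemic, 0 = not anemic.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]           # predicted probability of class 1
y_pred = [1 if s >= 0.5 else 0 for s in scores]   # threshold at 0.5

print(f"accuracy = {accuracy(y_true, y_pred):.3f}")
print(f"AUC      = {roc_auc(y_true, scores):.3f}")
```

Accuracy depends on the chosen threshold, while AUC depends only on how the scores rank the samples, which is why the two are reported separately.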
The system follows a structured pipeline from dataset handling to model predictions.
- Hemoglobin Levels
- Mean Corpuscular Volume (MCV)
- Mean Corpuscular Hemoglobin (MCH)
- Mean Corpuscular Hemoglobin Concentration (MCHC)
- Gender
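The last step of such a pipeline, turning these five inputs into a prediction, might look like the sketch below. The column order, the gender encoding, and the label mapping are assumptions and must match whatever the model was actually trained with. `HemoglobinThresholdStub` is a stand-in that exists only to make the example runnable. It is not the project's model.

```python
# Assumed column order, gender encoding, and label mapping (not confirmed by the README).
FEATURE_ORDER = ("gender", "hemoglobin", "mch", "mchc", "mcv")
GENDER_CODES = {"male": 0, "female": 1}
LABELS = {0: "Not Anemic", 1: "Anemic"}


def build_feature_row(values):
    """Convert a dict of inputs into a numeric row in FEATURE_ORDER."""
    missing = [name for name in FEATURE_ORDER if name not in values]
    if missing:
        raise ValueError(f"missing inputs: {', '.join(missing)}")
    row = []
    for name in FEATURE_ORDER:
        if name == "gender":
            row.append(GENDER_CODES[str(values[name]).lower()])
        else:
            row.append(float(values[name]))
    return row


def predict_label(model, values):
    """Run any object with a scikit-learn style predict() on one sample."""
    prediction = model.predict([build_feature_row(values)])[0]
    return LABELS[int(prediction)]


class HemoglobinThresholdStub:
    """Stand-in model; the 12.0 cut-off is arbitrary, not a clinical rule."""

    def predict(self, rows):
        hb_index = FEATURE_ORDER.index("hemoglobin")
        return [1 if row[hb_index] < 12.0 else 0 for row in rows]


# With the real model, one would instead unpickle
# model/random_forest_classifier.pkl and pass that object in.
example = {"gender": "female", "hemoglobin": 9.5, "mch": 22.0, "mchc": 30.0, "mcv": 75.0}
print(predict_label(HemoglobinThresholdStub(), example))
```

Keeping the row-building logic in one function makes it harder for the form input and the image-upload path to drift apart in column order.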
SMOTE balanced the dataset at 801 samples each for the anemic and non-anemic classes.
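The idea behind SMOTE can be shown in a few lines: each synthetic sample is placed at a random point on the line segment between a minority sample and one of its nearest minority neighbours. The sketch below is a from-scratch illustration of that interpolation step. It is not the library implementation the project may have used. The function name `smote_oversample` and the toy data are hypothetical.

```python
import math
import random


def smote_oversample(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point, excluding the point itself.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy data: 6 majority samples and 3 minority samples with two features each.
majority = [[5.0, 5.0], [6.0, 5.5], [5.5, 6.0], [6.5, 6.5], [7.0, 5.0], [5.0, 7.0]]
minority = [[1.0, 1.0], [2.0, 1.5], [1.5, 2.5]]

new_points = smote_oversample(minority, n_new=len(majority) - len(minority), k=2, seed=42)
balanced_minority = minority + new_points
print(len(majority), len(balanced_minority))
```

Oversampling of this kind should be applied to the training split only, so that synthetic points never leak into the evaluation data.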
Key Contributors:
- Hemoglobin: 87.0% contribution
- Gender: 9.1% contribution
- MCH: 2.7% contribution
- Others: 1.2% contribution
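A breakdown like this can be derived from raw importance scores by normalising them to percentages and folding the small ones into an "Others" bucket. In scikit-learn, a fitted `RandomForestClassifier` exposes such scores as `feature_importances_`. Whether the project used that attribute is an assumption. The raw values and the 5% cut-off below are illustrative and deliberately differ from the figures above.

```python
def importance_table(names, importances, min_share=0.05):
    """Rank features by share of total importance; group small ones as 'Others'."""
    total = sum(importances)
    shares = [(name, value / total) for name, value in zip(names, importances)]
    ranked = sorted(shares, key=lambda item: item[1], reverse=True)
    table = [(name, share) for name, share in ranked if share >= min_share]
    others = sum(share for _, share in ranked if share < min_share)
    if others > 0:
        table.append(("Others", others))
    # Report percentages rounded to one decimal place.
    return [(name, round(100 * share, 1)) for name, share in table]


# Illustrative raw scores, not the project's measured importances.
names = ["Gender", "Hemoglobin", "MCH", "MCHC", "MCV"]
raw = [10, 80, 6, 1.5, 2.5]

for name, percent in importance_table(names, raw):
    print(f"{name}: {percent}%")
```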
Directory structure:

    └── yogeshwaran10-anemia_detection/
        ├── README.md
        ├── Procfile
        ├── app.py
        ├── process_image.py
        ├── requirements.txt
        ├── runtime.txt
        ├── utils.py
        ├── images/
        ├── model/
        │   └── random_forest_classifier.pkl
        ├── static/
        │   └── style.css
        └── templates/
            └── index.html

    git clone <repository_url>
    cd anemia-detection-using-machine-learning
    pip install -r requirements.txt
    python app.py

Then open the app in your browser at http://127.0.0.1:5000/
- High Accuracy: Achieved through class balancing and Random Forest optimization
- Precise Predictions: Driven by significant features like hemoglobin levels
- New Image Upload Feature: Extracts blood test attributes automatically for prediction
- Expand dataset to include more diverse features
- Implement advanced models like XGBoost or LightGBM
- Address feature scaling and the balance of importance across the other features
The outcome variable in the dataset indicates the final diagnosis or classification for each patient. The outcome is binary, with two possible values:
- Not Anemic: The patient is not anemic, based on clinical criteria and test results.
- Anemic: The patient is anemic, suggesting a deficiency of red blood cells or hemoglobin in the blood.
Your contributions are welcome! Feel free to:
- Report bugs
- Suggest features
- Submit pull requests
This project is licensed under the MIT License. See the LICENSE file for details.



