This project predicts whether a bank customer will churn (leave) or stay using cutting-edge machine learning techniques. It includes comprehensive data preprocessing, advanced model training, and a sleek Streamlit web application for real-time predictions.
- Overview
- Project Structure
- Setup and Installation
- How to Run
- Features
- Results
- Technologies Used
- Contributing
Customer churn prediction is crucial for financial institutions to enhance customer retention and profitability. This project leverages a robust machine learning pipeline featuring XGBoost, LightGBM, and Neural Networks for high-accuracy predictions. It also provides an intuitive Streamlit web app for live customer churn predictions.

Prerequisites:
Python 3.8 or higher
Jupyter Notebook
pip or conda for package management
Install Dependencies: pip install -r requirements.txt
✅ Run the Notebook for Analysis & Model Training
Navigate to Bank_Churn_Prediction_Analysis.ipynb and run all cells for EDA, feature engineering, and model training.
Ensure the trained model (final_churn_model.pkl) is in the models/ directory.
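The `final_churn_model.pkl` file referenced above is produced with joblib. A minimal sketch of the save/load round trip, using a RandomForestClassifier on synthetic data as a stand-in for the actual trained model (the real file lives at `models/final_churn_model.pkl`):

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the real training data
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 10))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

model = RandomForestClassifier(n_estimators=100, random_state=42).fit(X, y)

# Serialize the fitted model (in the project: models/final_churn_model.pkl)
path = os.path.join(tempfile.mkdtemp(), "final_churn_model.pkl")
joblib.dump(model, path)

# Later, inside the app, reload and predict
loaded = joblib.load(path)
pred = loaded.predict(X[:1])
print(pred)
```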
Launch the app:
streamlit run app/streamlit_app.py
Enter customer details in the web form and click Predict for instant feedback:
🚨 Exit: Customer likely to churn.
✅ No Exit: Customer likely to stay.
Data preprocessing: Handling missing values, outlier removal, and feature scaling.
Feature engineering with PCA for dimensionality reduction.
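The preprocessing steps above can be sketched as a scikit-learn pipeline. Column counts and parameters here are illustrative, not taken from the notebook:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

preprocess = Pipeline([
    ("impute", SimpleImputer(strategy="median")),  # handle missing values
    ("scale", StandardScaler()),                   # feature scaling
    ("pca", PCA(n_components=0.95)),               # keep 95% of the variance
])

# Synthetic numeric features with ~5% missing values
# (outlier removal, e.g. an IQR filter, would typically run before this)
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 12))
X[rng.random(X.shape) < 0.05] = np.nan

X_reduced = preprocess.fit_transform(X)
print(X_reduced.shape)
```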
Multiple ML models tested:
XGBoost
LightGBM
Random Forest
Neural Networks (Keras)
Random Forest selected as the best model with 87% accuracy.
Developed using Streamlit for real-time, web-based predictions.
Clean UI with dynamic feedback:
🟥 Exit (Red): High churn risk
🟩 No Exit (Green): Low churn risk
📈 Results
F1 Score: 0.86
AUC-ROC Score: 0.91
Real-time prediction capability via Streamlit.
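The F1 and AUC-ROC scores above come from the notebook; the metrics themselves can be computed with scikit-learn's scoring functions. A sketch on synthetic data (numbers will differ from the reported results):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Imbalanced synthetic data, mirroring the churn class ratio
X, y = make_classification(n_samples=1000, weights=[0.8, 0.2], random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=42)

clf = RandomForestClassifier(random_state=42).fit(X_tr, y_tr)

f1 = f1_score(y_te, clf.predict(X_te))
auc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])  # AUC uses probabilities
print(f"F1: {f1:.2f}, AUC-ROC: {auc:.2f}")
```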
Languages & Frameworks: Python (Pandas, NumPy, Scikit-learn, XGBoost, LightGBM, Keras)
Visualization: Matplotlib, Seaborn, Plotly
Deployment: Streamlit
Model Serialization: joblib
Other Tools: Git, Jupyter Notebook
Contributions are welcome! Feel free to fork the repository and submit pull requests.
I'm excited to present my latest addition to the Machine Learning Series: Bank Customer Churn Prediction 🏦📊. This project was a remarkable experience, exploring data preprocessing, feature engineering, and advanced machine learning methods to address a crucial business challenge—predicting whether a customer will stay with the bank or churn.
📌 Dataset:
The dataset comprised customer demographics, account information, and behavioral attributes.
Dropped identifier columns (RowNumber, CustomerId, Surname) that carry no predictive signal.
Performed one-hot encoding on categorical features (Geography and Gender), ensuring the dummy variable trap was avoided.
Scaled numerical features such as CreditScore, Balance, and EstimatedSalary for consistency across the dataset.
Identified data imbalance in churn distribution, with fewer customers leaving compared to those staying (SMOTE was used to balance the dataset).
Key predictors included Age, Tenure, and Balance, as identified through a correlation matrix analysis.
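The data-preparation steps above can be sketched with pandas and scikit-learn. The toy frame below mirrors the dataset's column names; for class balancing it uses simple minority upsampling as a stand-in, since the project itself used SMOTE (from imbalanced-learn):

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.utils import resample

# Tiny synthetic frame mirroring the dataset's schema
df = pd.DataFrame({
    "RowNumber": range(6),
    "CustomerId": [101, 102, 103, 104, 105, 106],
    "Surname": list("ABCDEF"),
    "CreditScore": [600, 700, 650, 580, 720, 690],
    "Geography": ["France", "Spain", "Germany", "France", "Spain", "Germany"],
    "Gender": ["Male", "Female", "Male", "Female", "Male", "Female"],
    "Balance": [0.0, 50000.0, 120000.0, 0.0, 80000.0, 60000.0],
    "EstimatedSalary": [40000.0, 90000.0, 30000.0, 55000.0, 75000.0, 62000.0],
    "Exited": [0, 0, 1, 0, 0, 1],
})

# Drop identifier columns with no predictive signal
df = df.drop(columns=["RowNumber", "CustomerId", "Surname"])

# One-hot encode categoricals; drop_first avoids the dummy variable trap
df = pd.get_dummies(df, columns=["Geography", "Gender"], drop_first=True)

# Scale numeric features
num_cols = ["CreditScore", "Balance", "EstimatedSalary"]
df[num_cols] = StandardScaler().fit_transform(df[num_cols])

# Balance classes by upsampling the minority class (stand-in for SMOTE)
minority = df[df["Exited"] == 1]
majority = df[df["Exited"] == 0]
upsampled = resample(minority, replace=True,
                     n_samples=len(majority), random_state=42)
balanced = pd.concat([majority, upsampled])
print(balanced["Exited"].value_counts())
```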
Trained and evaluated multiple classification models:
Logistic Regression
Random Forest
Gradient Boosting
🏆 Random Forest delivered the best performance with 87% accuracy, balancing precision and recall effectively.
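All three models listed above are available in scikit-learn, so the comparison can be sketched directly with cross-validation. Data and hyperparameters here are illustrative, not the project's actual settings:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic binary-classification data as a stand-in for the churn dataset
X, y = make_classification(n_samples=800, n_informative=6, random_state=0)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(random_state=0),
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
}

# 5-fold cross-validated accuracy for each candidate
scores = {}
for name, model in models.items():
    scores[name] = cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    print(f"{name}: {scores[name]:.3f}")
```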
Built an interactive GUI using Tkinter for user-friendly predictions.
Users can input customer details and receive instant predictions on churn risk.
Visual feedback integrated with clear labels:
🔴 Exit: Customer likely to churn.
🟢 Stay: Customer likely to remain.
Proactively identifies customers at risk of leaving.
Supports targeted retention strategies; retaining an existing customer is typically far cheaper than acquiring a new one.
📊 Older customers and those with high balances but low engagement are more likely to churn.
📊 Geography and gender significantly influence churn probability, emphasizing the need for localized strategies.
This project sharpened my technical skills and enhanced my understanding of customer behavior in real-world scenarios. I gained practical experience in:
Managing imbalanced datasets
Applying feature scaling techniques
Developing user-friendly ML applications


