This project performs customer segmentation for a mall using K-Means clustering, an unsupervised machine learning algorithm.
The goal is to group customers based on Annual Income and Spending Score to help businesses design targeted marketing strategies.
- Source: Mall Customer Dataset
- Features:
  - CustomerID
  - Gender
  - Age
  - Annual Income (k$)
  - Spending Score (1-100)
- Target: None (unsupervised learning problem)
Data Loading & Inspection
- Loaded CSV into a pandas DataFrame.
- Checked for missing values and data types.
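A minimal sketch of the loading and inspection step, assuming the column names listed in the dataset description. The inline sample rows are made up for illustration and `load_and_inspect` is a hypothetical helper name; with the real data you would pass the path to the CSV file instead of the in-memory buffer.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real CSV: three made-up rows that use
# the column names from the dataset description. One income is blank
# so the missing-value check has something to report.
SAMPLE_CSV = io.StringIO(
    "CustomerID,Gender,Age,Annual Income (k$),Spending Score (1-100)\n"
    "1,Male,19,15,39\n"
    "2,Female,21,15,81\n"
    "3,Female,35,,6\n"
)


def load_and_inspect(source):
    """Load a CSV into a DataFrame and report dtypes and missing values."""
    df = pd.read_csv(source)
    missing = df.isnull().sum()  # missing-value count per column
    print(df.dtypes)
    print(missing)
    return df, missing


df, missing = load_and_inspect(SAMPLE_CSV)
```

Any column with a non-zero count in `missing` would need to be imputed or dropped before clustering, since K-Means cannot handle NaN values.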
Feature Selection
- Chose `Annual Income (k$)` and `Spending Score (1-100)` for clustering.
Feature Scaling
- Standardized both features with `StandardScaler` so that income and spending score contribute equally to the distances K-Means computes.
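Feature selection and scaling can be sketched together as follows. The DataFrame holds made-up values for illustration only, and the names `FEATURES`, `X` and `X_scaled` are assumptions, not taken from the project code.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# The two columns used for clustering
FEATURES = ["Annual Income (k$)", "Spending Score (1-100)"]

# Made-up rows for illustration only
df = pd.DataFrame({
    "Annual Income (k$)": [15, 40, 60, 90, 120],
    "Spending Score (1-100)": [39, 81, 50, 10, 75],
})

# Select the two features as a float array
X = df[FEATURES].to_numpy(dtype=float)

# Rescale each column to zero mean and unit variance
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

print(X_scaled.mean(axis=0))  # approximately 0 for each column
print(X_scaled.std(axis=0))   # approximately 1 for each column
```

Keeping the fitted `scaler` around is useful later: `scaler.inverse_transform` maps centroids back to the original income and score units.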
Finding Optimal Clusters
- Used the Elbow Method to choose the number of clusters: computed WCSS (Within-Cluster Sum of Squares) for a range of cluster counts and picked the point where the decrease levels off.
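A rough sketch of the WCSS computation behind the Elbow Method. It runs on synthetic two-dimensional points rather than the mall data, and the range of 1 to 10 clusters is an assumption; in scikit-learn, a fitted `KMeans` model exposes WCSS as its `inertia_` attribute.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in for the scaled features: three loose groups of points
rng = np.random.default_rng(0)
centers = [[-2.0, -2.0], [0.0, 2.0], [2.0, -1.0]]
X = np.vstack([rng.normal(loc=c, scale=0.4, size=(50, 2)) for c in centers])

# Fit K-Means for each candidate k and record WCSS (inertia_)
k_values = list(range(1, 11))  # assumed search range
wcss = []
for k in k_values:
    model = KMeans(n_clusters=k, n_init=10, random_state=42)
    model.fit(X)
    wcss.append(model.inertia_)

for k, value in zip(k_values, wcss):
    print(f"k={k}: WCSS={value:.1f}")
```

Plotting `wcss` against `k_values` gives the elbow curve; the bend where extra clusters stop reducing WCSS by much is the usual choice for the number of clusters.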
K-Means Clustering
- Trained K-Means with the optimal number of clusters.
- Assigned cluster labels to each customer.
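Training and label assignment might look like the sketch below. It uses synthetic points in place of the scaled mall data, and `n_clusters=3` is a placeholder that matches the synthetic data, not the value the project settled on; the `Cluster` column name is also an assumption.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Synthetic stand-in for the scaled features (three groups of 50 points)
rng = np.random.default_rng(0)
centers = [[-2.0, -2.0], [0.0, 2.0], [2.0, -1.0]]
X_scaled = np.vstack(
    [rng.normal(loc=c, scale=0.4, size=(50, 2)) for c in centers]
)
df = pd.DataFrame(X_scaled, columns=["income_scaled", "score_scaled"])

# Placeholder cluster count; use the value suggested by the elbow curve
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)

# fit_predict trains the model and returns one label per row
labels = kmeans.fit_predict(X_scaled)
df["Cluster"] = labels

print(df["Cluster"].value_counts())
print(kmeans.cluster_centers_)
```

Fixing `random_state` makes the label numbering reproducible between runs, which matters when the cluster IDs are referenced in later analysis.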
Visualization
- Visualized clusters with different colors.
- Marked centroids for clarity.
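One possible way to draw the clusters and centroids with matplotlib, shown on synthetic points. The colormap, marker style and axis labels are illustrative choices, not necessarily the ones used in the project.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in for the scaled features
rng = np.random.default_rng(0)
blob_centers = [[-2.0, -2.0], [0.0, 2.0], [2.0, -1.0]]
X = np.vstack(
    [rng.normal(loc=c, scale=0.4, size=(50, 2)) for c in blob_centers]
)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = kmeans.fit_predict(X)
centroids = kmeans.cluster_centers_

fig, ax = plt.subplots(figsize=(6, 4))
# One color per cluster label
ax.scatter(X[:, 0], X[:, 1], c=labels, cmap="viridis", s=20)
# Centroids drawn as large red crosses on top of the points
ax.scatter(centroids[:, 0], centroids[:, 1],
           c="red", marker="X", s=200, label="Centroids")
ax.set_xlabel("Annual Income (standardized)")
ax.set_ylabel("Spending Score (standardized)")
ax.set_title("Customer segments")
ax.legend()
```

Call `plt.show()` in an interactive session, or `fig.savefig(...)` with a path of your choice, to view the result.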
Insights
- Summarized characteristics of each cluster (average income, spending score, etc.).
- These summaries help explain customer behavior and guide targeted marketing.
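A per-cluster summary can be sketched with a pandas `groupby`. The rows and cluster labels below are made up for illustration, and the `Cluster` and `Customers` column names are assumptions.

```python
import pandas as pd

# Made-up customers with already-assigned cluster labels
df = pd.DataFrame({
    "Age": [20, 24, 45, 50],
    "Annual Income (k$)": [20, 30, 80, 100],
    "Spending Score (1-100)": [80, 70, 20, 10],
    "Cluster": [0, 0, 1, 1],
})

# Average of each numeric column within each cluster
summary = df.groupby("Cluster").mean().round(1)

# Number of customers in each cluster
summary["Customers"] = df.groupby("Cluster").size()

print(summary)
```

In this toy example cluster 0 would read as younger, lower-income, high-spending customers and cluster 1 as older, higher-income, low-spending ones; the real segments depend on the actual data.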
- `pandas` → Data handling
- `numpy` → Numerical operations
- `matplotlib` & `seaborn` → Visualization
- `sklearn` → K-Means clustering and scaling
- Clone the repository:
git clone <your-repo-url>