A comprehensive collection of machine learning algorithms implemented from scratch in Python, along with educational Jupyter notebooks demonstrating core ML concepts.
## Table of Contents

- [Overview](#overview)
- [Features](#features)
- [Repository Structure](#repository-structure)
- [Installation](#installation)
- [Usage](#usage)
- [Algorithms Implemented](#algorithms-implemented)
- [Jupyter Notebooks](#jupyter-notebooks)
- [Results](#results)
- [Contributing](#contributing)
- [License](#license)
## Overview

This repository contains educational implementations of fundamental machine learning algorithms built from scratch using Python and NumPy. The goal is to provide clear, well-documented code that helps understand the mathematical foundations and inner workings of these algorithms.
## Features

- **Pure Python implementations** - No ML libraries are used for the core algorithm logic
- **Educational focus** - Clear code with detailed comments
- **Comprehensive collection** - 8+ algorithms covering classification and regression
- **Interactive notebooks** - Jupyter notebooks with step-by-step explanations
- **Visualization** - Plots and graphs showing algorithm behavior
- **Ready-to-run examples** - Each algorithm includes working test cases
## Repository Structure

```
machine-learning/
├── ml-algorithms-scratch/                    # Core algorithm implementations
│   ├── adaboost.py                           # AdaBoost ensemble method
│   ├── decision_tree.py                      # Decision Tree classifier
│   ├── knn.py                                # K-Nearest Neighbors
│   ├── logistic_regression.py                # Logistic Regression
│   ├── naive_bayes.py                        # Naive Bayes classifier
│   ├── perceptron.py                         # Single-layer Perceptron
│   ├── random_forest.py                      # Random Forest ensemble
│   └── svm.py                                # Support Vector Machine
├── Gradient_Descent.ipynb                    # Gradient descent optimization
├── Linear_Regression_in_One_Variable.ipynb   # Linear regression tutorial
├── Logistic_Regression.ipynb                 # Logistic regression from scratch
├── Polynomial_Regression.ipynb               # Polynomial feature engineering
├── results/                                  # Output visualizations
│   ├── knn.png                               # KNN classification results
│   └── svm.png                               # SVM decision boundary
├── requirements.txt                          # Project dependencies
└── README.md                                 # This file
```
## Installation

### Prerequisites

- Python 3.7 or higher
- pip package manager

### Setup

1. Clone the repository:

   ```bash
   git clone https://github.com/abdulhakkeempa/machine-learning.git
   cd machine-learning
   ```

2. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

   Or install manually:

   ```bash
   pip install numpy matplotlib scikit-learn pandas
   ```
## Usage

### Running the Algorithms

Each algorithm file can be executed directly to see a demonstration:

```bash
cd ml-algorithms-scratch

# Run Logistic Regression example
python logistic_regression.py

# Run Support Vector Machine example
python svm.py

# Run K-Nearest Neighbors example
python knn.py
```

### Using an Algorithm in Your Code

```python
# Example: Using the custom Logistic Regression
import sys
sys.path.append('ml-algorithms-scratch')

import numpy as np
from logistic_regression import LogisticRegression

# Create sample data
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])

# Train the model
lr = LogisticRegression(alpha=1, epochs=10)
lr.fit(X, y)

# Make predictions
predictions = lr.predict(X)
print(f"Predictions: {predictions}")
```

### Exploring the Notebooks

Launch Jupyter to explore the educational notebooks:

```bash
jupyter notebook
```

Then open any of the `.ipynb` files to see detailed explanations and visualizations.
## Algorithms Implemented

| Algorithm | File | Description |
|---|---|---|
| Logistic Regression | `logistic_regression.py` | Binary classification using sigmoid function |
| Support Vector Machine | `svm.py` | Maximum margin classifier with regularization |
| K-Nearest Neighbors | `knn.py` | Instance-based learning algorithm |
| Decision Tree | `decision_tree.py` | Tree-based classifier using information gain |
| Random Forest | `random_forest.py` | Ensemble of decision trees |
| Naive Bayes | `naive_bayes.py` | Probabilistic classifier using Bayes' theorem |
| Perceptron | `perceptron.py` | Single-layer neural network |
| AdaBoost | `adaboost.py` | Adaptive boosting ensemble method |
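As a flavor of what these files contain, here is a minimal sketch of the hinge-loss subgradient update at the heart of a from-scratch linear SVM; the function and parameter names (`lr`, `lambda_param`, `epochs`) are illustrative assumptions, not necessarily what `svm.py` uses.

```python
import numpy as np

def train_linear_svm(X, y, lr=0.001, lambda_param=0.01, epochs=1000):
    """Linear SVM via subgradient descent on the hinge loss (illustrative).

    Expects labels y in {-1, +1}; names are assumptions, not copied from svm.py.
    """
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (np.dot(w, xi) + b) >= 1:
                # Outside the margin: only the L2 regularization term acts
                w -= lr * (2 * lambda_param * w)
            else:
                # Inside the margin or misclassified: hinge-loss subgradient
                w -= lr * (2 * lambda_param * w - yi * xi)
                b += lr * yi
    return w, b
```

Predictions then come from the sign of `w @ x + b`, which is roughly what the decision boundary in `results/svm.png` visualizes.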
The regression algorithms are implemented as notebooks:

| Algorithm | Notebook | Description |
|---|---|---|
| Linear Regression | `Linear_Regression_in_One_Variable.ipynb` | Simple linear regression implementation |
| Polynomial Regression | `Polynomial_Regression.ipynb` | Feature engineering with polynomial terms (see the sketch below) |
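To illustrate the feature-engineering idea behind the polynomial regression notebook, the sketch below expands a single input variable into polynomial terms and fits them by least squares; the degree and data here are made up for the example, not taken from the notebook.

```python
import numpy as np

def polynomial_features(x, degree):
    """Expand a 1-D array x into columns [1, x, x^2, ..., x^degree]."""
    return np.vander(x, N=degree + 1, increasing=True)

# Synthetic data from a known quadratic, plus a little noise
x = np.linspace(-1, 1, 50)
y = 1.0 + 2.0 * x - 3.0 * x**2 + np.random.normal(scale=0.1, size=x.shape)

# Fit the expanded features with ordinary least squares
X_poly = polynomial_features(x, degree=2)
coeffs, *_ = np.linalg.lstsq(X_poly, y, rcond=None)
print(coeffs)  # close to the true coefficients [1, 2, -3]
```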
Key features of each implementation:

- **Logistic Regression**: Gradient descent optimization, sigmoid activation (see the sketch after this list)
- **SVM**: Hinge loss, L2 regularization, decision boundary visualization
- **KNN**: Euclidean distance, majority voting, configurable k value
- **Decision Tree**: Information gain splitting, configurable max depth
- **Random Forest**: Bootstrap aggregating, feature randomness
- **Naive Bayes**: Gaussian distribution assumption, Laplace smoothing
- **Perceptron**: Binary classification, linear activation
- **AdaBoost**: Weak learner combination, adaptive weights
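To connect this list back to the usage example above, here is a hedged re-creation of what the `LogisticRegression` core loop plausibly looks like: a sigmoid activation trained by batch gradient descent. The `alpha`/`epochs` constructor signature is taken from the usage example; everything inside the class body is an assumption, not copied from `logistic_regression.py`.

```python
import numpy as np

class LogisticRegressionSketch:
    """Illustrative re-creation; the repository's class may differ internally."""

    def __init__(self, alpha=0.1, epochs=1000):
        self.alpha = alpha    # learning rate
        self.epochs = epochs  # number of full gradient-descent passes

    @staticmethod
    def _sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def fit(self, X, y):
        n_samples, n_features = X.shape
        self.w = np.zeros(n_features)
        self.b = 0.0
        for _ in range(self.epochs):
            y_hat = self._sigmoid(X @ self.w + self.b)
            # Gradient of the average binary cross-entropy loss
            dw = X.T @ (y_hat - y) / n_samples
            db = np.mean(y_hat - y)
            self.w -= self.alpha * dw
            self.b -= self.alpha * db
        return self

    def predict(self, X):
        # Threshold the predicted probability at 0.5
        return (self._sigmoid(X @ self.w + self.b) >= 0.5).astype(int)
```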
## Jupyter Notebooks

Interactive notebooks with detailed explanations:

- `Gradient_Descent.ipynb` - Understanding optimization fundamentals (see the sketch after this section)
- `Linear_Regression_in_One_Variable.ipynb` - Simple linear regression walkthrough
- `Logistic_Regression.ipynb` - Binary classification from first principles
- `Polynomial_Regression.ipynb` - Feature engineering and overfitting
Each notebook includes:
- Mathematical foundations
- Step-by-step implementation
- Visualizations and plots
- Real-world examples
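As a taste of what `Gradient_Descent.ipynb` builds toward, the loop below minimizes a simple one-dimensional quadratic; the function and step size are chosen for the example, not taken from the notebook.

```python
# Gradient descent on f(x) = (x - 3)^2, whose gradient is 2 * (x - 3)
x = 0.0   # starting point
lr = 0.1  # learning rate (step size)

for step in range(50):
    grad = 2 * (x - 3)  # analytic gradient at the current point
    x -= lr * grad      # step in the direction of steepest descent

print(x)  # converges toward the minimizer x = 3
```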
## Results

The `results/` folder contains visualizations generated by the algorithms:

- `knn.png` - K-Nearest Neighbors classification boundaries
- `svm.png` - Support Vector Machine decision boundaries and support vectors
Example outputs show:
- Decision boundaries for classification problems
- Learning curves and convergence behavior
- Algorithm performance on test datasets
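Plots like `knn.png` and `svm.png` are typically produced by evaluating the trained model on a dense grid and colouring each region by its predicted class. Below is a minimal sketch of that pattern, assuming a model with a scikit-learn-style `predict` method; the repository's actual plotting code may differ.

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_decision_boundary(model, X, y, step=0.02):
    """Colour the plane by model.predict over a dense grid (illustrative)."""
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, step),
                         np.arange(y_min, y_max, step))
    # Predict a class for every grid point, then reshape back to the grid
    Z = model.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
    plt.contourf(xx, yy, Z, alpha=0.3)                 # predicted regions
    plt.scatter(X[:, 0], X[:, 1], c=y, edgecolor="k")  # training points
    plt.show()
```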
## Contributing

Contributions are welcome! Here's how you can help:

1. Fork the repository
2. Create a feature branch (`git checkout -b feature/new-algorithm`)
3. Add your algorithm with proper documentation
4. Include test cases and examples
5. Add visualizations if applicable
6. Submit a pull request

### Guidelines

- Follow the existing code style and structure
- Include docstrings and comments
- Add test cases in the `if __name__ == "__main__":` block (see the example below)
- Update the README if adding new algorithms
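For example, a new algorithm file's demo section might follow this pattern; `MyNewAlgorithm` and the tiny dataset are hypothetical placeholders:

```python
# Illustrative pattern for a self-contained demo in a new algorithm file
if __name__ == "__main__":
    import numpy as np

    # Hypothetical toy dataset with AND-style labels
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
    y = np.array([0, 0, 0, 1])

    model = MyNewAlgorithm()  # hypothetical class defined earlier in the file
    model.fit(X, y)
    print(f"Predictions: {model.predict(X)}")
```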
## License

This project is open source and available under the MIT License.
## Author

Abdul Hakkeem PA

- GitHub: [@abdulhakkeempa](https://github.com/abdulhakkeempa)
This repository is designed for educational purposes to help students and practitioners understand machine learning algorithms from the ground up. The implementations prioritize clarity and understanding over performance optimization.
⭐ Star this repository if you find it helpful!