A machine learning project that uses historical tennis match data to build ELO rating systems and predict match outcomes.
This project analyzes professional tennis matches to:
- Calculate standard and weighted ELO ratings for players
- Visualize player performance over time
- Predict match outcomes using neural networks
- Compare different rating systems and their predictive power
The tennis match data used in this project is sourced from Tennis-Data.co.uk, which provides detailed historical match statistics for ATP and WTA tournaments. The dataset includes match results, player rankings, and score information dating back multiple years.
- Standard ELO: Traditional implementation of the ELO rating system
- Weighted ELO: A variant of ELO that also accounts for match dominance, measured from set scores
- Historical player performance tracking
- Interactive visualizations of rating changes over time
- Comparative analysis between rating systems
- Neural network model for predicting match outcomes
- Feature importance analysis
- Interactive prediction interface for hypothetical matchups
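The standard ELO update fits in a few lines, and a weighted variant only needs to change how far ratings move. The sketch below is a minimal illustration, not this project's implementation. The K-factor of 32 and the 400-point scale are the usual textbook defaults. The set-share weighting in `update_weighted_elo` is a hypothetical choice made for the example, and the project's weighted ELO may measure dominance differently.

```python
from typing import Tuple


def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that player A beats player B under the standard ELO model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(winner: float, loser: float, k: float = 32.0) -> Tuple[float, float]:
    """Standard ELO update: both ratings move by K times how unexpected the win was."""
    delta = k * (1.0 - expected_score(winner, loser))
    return winner + delta, loser - delta


def update_weighted_elo(winner: float, loser: float, sets_won: int, sets_lost: int,
                        k: float = 32.0) -> Tuple[float, float]:
    """Hypothetical weighted variant: scale K by the winner's share of sets played."""
    set_share = sets_won / (sets_won + sets_lost)  # 1.0 for a straight-sets win
    k_effective = k * (0.5 + set_share)            # 1.5*K for 2-0, ~1.17*K for 2-1
    return update_elo(winner, loser, k_effective)
```

For two players rated 1500, `update_elo(1500, 1500)` returns `(1516.0, 1484.0)`. The hypothetical weighted version moves both ratings by 24 points after a 2-0 win and by about 18.7 points after a 2-1 win. Either way the update is zero-sum, so the pool's average rating stays constant.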
- Clone the repository:
```bash
git clone https://github.com/neenza/tnnp.git
cd tnnp
```
- Install required packages:
```bash
pip install pandas numpy matplotlib seaborn scikit-learn tensorflow ipywidgets jupyterlab
```
- Download tennis match data files (Excel format) from Tennis-Data.co.uk and place them in the project directory. pandas needs an Excel engine to read them, for example `openpyxl` for `.xlsx` files.
Launch Jupyter Lab or Notebook:
```bash
jupyter lab
```
Then open the `v2.ipynb` notebook to run the full analysis pipeline.
The system expects Excel files with tennis match data in the following format:
- Match date in a 'Date' column
- Winner and loser names in the 'Winner' and 'Loser' columns
- Player rankings in 'WRank' and 'LRank' columns
- Set scores (games won per set) in 'W1'/'L1', 'W2'/'L2', and so on, up to 'W5'/'L5' for best-of-five matches
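The set columns are what a dominance measure can be built from. The helper below counts the sets won by each player. It is a sketch that assumes each `W{i}`/`L{i}` cell holds the games won in set *i*, and that unplayed sets are left empty (NaN once loaded by pandas). `count_sets` is a hypothetical name, not a function from this repository.

```python
import math
from typing import Mapping, Optional, Tuple


def _as_games(value) -> Optional[int]:
    """Return the games count in a cell, or None if the cell is empty, NaN or non-numeric."""
    if isinstance(value, str):
        value = value.strip()
        if not value:
            return None
    try:
        number = float(value)
    except (TypeError, ValueError):
        return None
    return None if math.isnan(number) else int(number)


def count_sets(row: Mapping, max_sets: int = 5) -> Tuple[int, int]:
    """Count sets won by the winner and the loser from the W1/L1 ... W5/L5 columns."""
    winner_sets = loser_sets = 0
    for i in range(1, max_sets + 1):
        w = _as_games(row.get(f"W{i}"))
        l = _as_games(row.get(f"L{i}"))
        if w is None or l is None:
            continue  # set not played (best-of-three match, retirement, missing data)
        if w > l:
            winner_sets += 1
        elif l > w:
            loser_sets += 1
    return winner_sets, loser_sets
```

A pandas row `Series` supports `.get` like a dict, so the helper can be applied to a loaded sheet with `df.apply(count_sets, axis=1)`.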
The feature_engineering.py module processes raw match data into features suitable for machine learning:
- ELO rating differences
- Recent form indicators
- Head-to-head statistics
- Surface-specific performance metrics
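Recent form and head-to-head features can be computed incrementally while walking through matches in date order. That way each match only sees information that was available before it was played. The class below is a hypothetical sketch: the names `FormTracker` and `window` and the 1500 default rating are assumptions. Surface-specific metrics could be tracked the same way by keying on `(player, surface)`.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Tuple


class FormTracker:
    """Hypothetical helper: rolling form and head-to-head counts, updated in date order."""

    def __init__(self, window: int = 10):
        # Last `window` results per player: 1 = win, 0 = loss
        self.recent: Dict[str, Deque[int]] = defaultdict(lambda: deque(maxlen=window))
        # (a, b) -> number of times a has beaten b
        self.h2h: Dict[Tuple[str, str], int] = defaultdict(int)

    def features(self, p1: str, p2: str, elo: Dict[str, float]) -> Dict[str, float]:
        """Features for p1 vs p2 using only matches recorded so far."""
        def form(player: str) -> float:
            results = self.recent[player]
            return sum(results) / len(results) if results else 0.5  # neutral if unseen

        wins_1, wins_2 = self.h2h[(p1, p2)], self.h2h[(p2, p1)]
        meetings = wins_1 + wins_2
        return {
            "elo_diff": elo.get(p1, 1500.0) - elo.get(p2, 1500.0),
            "form_diff": form(p1) - form(p2),
            "h2h_share": wins_1 / meetings if meetings else 0.5,
        }

    def record(self, winner: str, loser: str) -> None:
        """Add a finished match to the rolling history."""
        self.recent[winner].append(1)
        self.recent[loser].append(0)
        self.h2h[(winner, loser)] += 1
```

For each match, call `features` before `record`. Reversing the order would leak the result into that match's own features.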
The match_predictor.py module implements:
- Neural network architecture for binary classification
- Model training and evaluation functions
- Visualization tools for model performance
- Feature importance analysis
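Binary classification needs care here because match data always lists the winner first. Labelling every row as it stands would give a target that is always 1. A common remedy is to show each match at random from either player's perspective and negate the difference features to match. The NumPy sketch below illustrates this. It is an assumption about how the data could be prepared, not a description of `match_predictor.py`. It is only valid for antisymmetric features such as rating differences: a share-type feature like a head-to-head win rate would need `1 - x` instead of `-x`.

```python
import numpy as np


def make_symmetric_dataset(features: np.ndarray, seed: int = 0):
    """Build (X, y) for binary classification from winner-minus-loser features.

    Each row of `features` is assumed to hold differences oriented winner - loser
    (e.g. elo_diff, form_diff). Label 1 keeps the row (player 1 is the winner);
    label 0 negates it (player 1 is the loser).
    """
    features = np.asarray(features, dtype=float)
    rng = np.random.default_rng(seed)
    y = rng.integers(0, 2, size=len(features))      # 1 => player 1 won
    signs = np.where(y == 1, 1.0, -1.0)[:, None]    # flip rows where player 1 lost
    X = features * signs
    return X, y
```

Because the orientation is random, roughly half the labels are 0. A classifier trained on `X` therefore cannot score well just by always predicting the first-listed player.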
Example usage:
```python
# Standard ELO rating history for one player
plot_player_elo_history('Roger Federer', rating_history)
# Standard vs weighted ELO for one player
compare_elo_ratings('Rafael Nadal', rating_history, weighted_rating_history)
# Hypothetical matchup; fe, predictor and scaler are assumed to come from earlier notebook steps
predict_match('Novak Djokovic', 'Andy Murray', fe, predictor, scaler)
```
The model achieves approximately 65-70% accuracy in predicting match outcomes, and the weighted ELO features add predictive power beyond standard ELO ratings alone.
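Accuracy figures like these are easier to judge against a rating-only baseline: predict that the higher-rated player wins and measure how often that is right. The sketch below assumes you have each match's pre-match ratings from one system as `(winner_rating, loser_rating)` pairs. Running it once with standard ratings and once with weighted ratings gives a simple side-by-side comparison of the two systems. `baseline_accuracy` is a hypothetical helper, not part of the repository.

```python
from typing import Sequence, Tuple


def baseline_accuracy(pre_match: Sequence[Tuple[float, float]]) -> float:
    """Share of matches where the player with the higher pre-match rating won.

    Each pair is (winner_rating, loser_rating) as it stood *before* the match;
    equal ratings count as half a correct prediction.
    """
    if not pre_match:
        raise ValueError("no matches to evaluate")
    score = 0.0
    for winner_rating, loser_rating in pre_match:
        if winner_rating > loser_rating:
            score += 1.0
        elif winner_rating == loser_rating:
            score += 0.5
    return score / len(pre_match)
```

For example, `baseline_accuracy([(1600, 1500), (1500, 1550), (1500, 1500), (1700, 1400)])` returns `0.625`.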
MIT License - see LICENSE file for details.
Contributions are welcome! Please feel free to submit a Pull Request.