Welcome to the CollegeFootballData.com Starter Pack: a curated bundle of structured college football data, custom advanced metrics, and real-world Jupyter notebooks to help you build models, explore trends, and launch your own analytics projects faster.

Historical Data
- Game results (1869–present)
- Play-by-play, drives, season stats (2003–present)
- Advanced team-level metrics (EPA, success rate, explosiveness, etc.)

12 Jupyter Notebooks
- Code walkthroughs for ranking, predictions, dashboards, and more

PDF Guides
- Full data dictionary
- Overview of included notebooks

Metadata
- Teams, conferences, venues
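Success rate is one of the advanced metrics listed above. The pack's own definition lives in its data dictionary; this sketch assumes the commonly used thresholds (a play is successful if it gains at least 50% of the yards to go on 1st down, 70% on 2nd down, or 100% on 3rd and 4th down). The function names are hypothetical and not part of the pack.

```python
# Assumed success thresholds by down (fraction of yards to go that must be gained).
THRESHOLDS = {1: 0.5, 2: 0.7, 3: 1.0, 4: 1.0}


def is_successful_play(down: int, distance: int, yards_gained: int) -> bool:
    """Return True if the play meets the assumed success threshold for its down."""
    if down not in THRESHOLDS:
        raise ValueError(f"invalid down: {down}")
    if distance <= 0:
        raise ValueError(f"distance must be positive, got {distance}")
    return yards_gained >= THRESHOLDS[down] * distance


def success_rate(plays: list[tuple[int, int, int]]) -> float:
    """Share of (down, distance, yards_gained) plays that were successful."""
    if not plays:
        return 0.0
    successes = sum(is_successful_play(d, dist, gained) for d, dist, gained in plays)
    return successes / len(plays)


# 1st & 10 for 5 yards is the only success here, so the rate is 1 / 4.
print(success_rate([(1, 10, 5), (2, 5, 3), (3, 4, 2), (1, 10, 4)]))  # 0.25
```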
This pack is designed for:
- Data scientists & analysts
- CFB modelers and pick'em competitors
- Academic researchers
- Hobbyists and fans who love working with real data
Package contents:

```text
data/
    advanced_game_stats/
    advanced_season_stats/
    drives/
    game_stats/
    plays/
    season_stats/
    conferences.csv
    games.csv
    teams.csv
00_data_dictionary.ipynb
01_intro_to_data.ipynb
02_build_simple_rankings.ipynb
03_metrics_comparison.ipynb
04_team_similarity.ipynb
05_matchup_predictor.ipynb
06_custom_rankings_by_metric.ipynb
07_drive_efficiency.ipynb
08_offense_vs_defense_comparison.ipynb
09_opponent_adjustments.ipynb
10_srs_adjusted_metrics.ipynb
11_metric_distribution_explorer.ipynb
12_efficiency_dashboards.ipynb
CFBD Starter Pack - Data Files Guide.pdf
CFBD Starter Pack - Notebooks Guide.pdf
headers.md
LICENSE.txt
README.md
```
To run the Jupyter notebooks, install:
A production-ready machine learning model for predicting college football game outcomes using data from the College Football Data API.
- Robust API Client with automatic retry logic and timeout handling
- Machine Learning Models (Random Forest and Gradient Boosting) for game outcome prediction
- Feature Engineering from team statistics, talent ratings, and historical data
- Comprehensive Logging for debugging and monitoring
- Input Validation with detailed error messages
- Unit Tests for core functionality
- Configurable Parameters via config.py
- Confidence Scores for each prediction
- CI/CD Pipeline with automated testing on multiple Python versions
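The retry and timeout behavior described above can be illustrated with a small wrapper. This is a minimal sketch assuming exponential backoff; `call_with_retry` is a hypothetical name, and data_fetcher.py may implement retries differently. In practice `func` would wrap an HTTP call made with a timeout, such as `requests.get(url, timeout=10)`; because requests raises its own exception classes, you would pass those types via `retry_on` rather than rely on the built-in defaults used here.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(
    func: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    retry_on: tuple[type[BaseException], ...] = (ConnectionError, TimeoutError),
) -> T:
    """Call func(), retrying with exponential backoff on transient errors (hypothetical helper)."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error to the caller
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            time.sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

Exponential backoff spaces out retries so that a briefly overloaded API is not hammered with immediate repeat requests.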
- Python 3.7 or higher
- College Football Data API key (get one at https://collegefootballdata.com/)
- Clone this repository:

```bash
git clone https://github.com/zachringnight/cfbmodel.git
cd cfbmodel
```

- Install required dependencies:

```bash
pip install pandas numpy matplotlib seaborn scikit-learn jupyter
```

- Set up your API key (optional; it can also be passed via the command line):

```bash
cp .env.example .env
# Edit .env and add your API key
```

Alternatively, install as a Python package:

```bash
git clone https://github.com/zachringnight/cfbmodel.git
cd cfbmodel
pip install -e .
```

This installs the package in development mode and makes the cfbmodel command available in your terminal.
The model can be used either by running the main script directly or through the installed package.
The model can run automatically via GitHub Actions:
- Scheduled: Runs every Saturday at 8 AM UTC
- Manual: Trigger from GitHub Actions tab with custom parameters
- Outputs: JSON and CSV predictions, trained models, and logs
See .github/WORKFLOW_DOCUMENTATION.md for complete workflow documentation.
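The Saturday 8 AM UTC schedule corresponds to a cron expression such as `0 8 * * 6` (GitHub Actions evaluates cron schedules in UTC); check the workflow file for the exact expression. The sketch below is a hypothetical helper, not part of the repository, that computes when the next scheduled run should occur, which can help when checking whether a new weekly artifact is due.

```python
from datetime import datetime, timedelta, timezone

SATURDAY = 5  # datetime.weekday(): Monday == 0 ... Sunday == 6


def next_scheduled_run(now: datetime, hour: int = 8) -> datetime:
    """Return the next Saturday at hour:00 UTC that is strictly after now."""
    if now.tzinfo is None:
        raise ValueError("now must be timezone-aware")
    now = now.astimezone(timezone.utc)
    days_ahead = (SATURDAY - now.weekday()) % 7
    candidate = (now + timedelta(days=days_ahead)).replace(
        hour=hour, minute=0, second=0, microsecond=0
    )
    # If today is Saturday and the run time has already passed, jump a week.
    if candidate <= now:
        candidate += timedelta(days=7)
    return candidate


print(next_scheduled_run(datetime.now(timezone.utc)))
```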
Quick Setup:
- Add your API key as a GitHub secret named CFB_API_KEY (see the Security Guide)
- The workflow will then run automatically every week
- Download predictions from the Actions artifacts
Security Notice: Never hardcode API keys in code or commit them to the repository. The workflow is designed to read the key from GitHub Secrets. See .github/SECURITY.md for best practices.
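Following the notice above, scripts should read the key at runtime instead of embedding it. A minimal sketch, assuming an explicit `--api-key` value takes precedence over the `CFB_API_KEY` environment variable; the precedence rule and the helper name `resolve_api_key` are assumptions, not taken from the repository.

```python
import os
from typing import Optional

ENV_VAR = "CFB_API_KEY"


def resolve_api_key(cli_value: Optional[str] = None) -> str:
    """Prefer an explicit --api-key value, else fall back to the CFB_API_KEY variable."""
    key = (cli_value or os.environ.get(ENV_VAR, "")).strip()
    if not key:
        # Fail fast with a clear message instead of sending unauthenticated requests.
        raise RuntimeError(
            f"No API key provided: pass --api-key or set the {ENV_VAR} environment variable"
        )
    return key
```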
Run predictions for the current week's games with automatic week detection:

```bash
# Set your API key
export CFB_API_KEY="YOUR_API_KEY"
# Run predictions with structured outputs (JSON + CSV)
python run_predictions_with_outputs.py --train --train-year 2024
# Or use the simpler script (text output only)
python run_weekly_predictions.py --train --train-year 2024
```

For detailed instructions, see WEEKLY_PREDICTIONS_GUIDE.md.
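Automatic week detection can be sketched as a lookup against the season calendar. This assumes the calendar has already been fetched as (week, first game date, last game date) rows; the row format and the `detect_week` name are hypothetical, and the repository's scripts may determine the week differently.

```python
from datetime import date
from typing import Optional

# Hypothetical calendar row: (week number, first game date, last game date).
Calendar = list[tuple[int, date, date]]


def detect_week(calendar: Calendar, today: date) -> Optional[int]:
    """Return the first week (by start date) that has not ended yet, or None after the season.

    This covers both "today is inside a week's window" and
    "today falls between weeks", where the next week is returned.
    """
    for week, _start, end in sorted(calendar, key=lambda row: row[1]):
        if today <= end:
            return week
    return None
```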
Train a model using data from a specific season:

```bash
# Using the script directly
python main.py --api-key YOUR_API_KEY --year 2023 --train
# Or if installed as a package
cfbmodel --api-key YOUR_API_KEY --year 2023 --train
# Or with executable permission
./main.py --api-key YOUR_API_KEY --year 2023 --train
```

This will:
- Fetch game data, team statistics, and talent ratings for the specified year
- Prepare features for machine learning
- Train a Random Forest classifier
- Display training metrics and feature importance
- Save the trained model to cfb_model.pkl
Make predictions for games in a specific week:

```bash
python main.py --api-key YOUR_API_KEY --year 2024 --predict --week 5
```

This will:
- Load the trained model
- Fetch upcoming games for the specified week
- Generate predictions with confidence scores
- Display results for each matchup
You can train and predict in one command:

```bash
python main.py --api-key YOUR_API_KEY --year 2023 --train --predict --week 10
```

Run the unit tests to validate the installation:

```bash
python -m pytest test_cfb_model.py -v
```

All 10 tests should pass, covering:
- Model initialization and training
- Input validation and error handling
- Prediction functionality
- Data preprocessing
This project includes a GitHub Actions CI workflow that automatically tests the model on every push and pull request.
The CI workflow:
- Tests on Multiple Python Versions: Runs tests on Python 3.9, 3.10, 3.11, and 3.12
- Validates Module Imports: Ensures all core modules can be imported
- Tests Model Functionality: Trains and validates the model with synthetic data
- Runs Unit Tests: Executes the full test suite automatically
The CI workflow is triggered by:
- Pushes to `main`, `master`, `develop`, or any `copilot/**` branch
- Pull requests to `main`, `master`, or `develop`
- Manual workflow dispatch
Check the Actions tab in the GitHub repository to see the status of the CI workflow.
A manual workflow (model-demo.yml) is also available for demonstrating the model:
- Can be triggered manually from the Actions tab
- Option to use synthetic test data or real API data
- Shows complete model training and prediction pipeline
- Useful for demonstrations and validating model functionality
To run the demo:
- Go to the Actions tab
- Select "Model Demo (Manual Trigger)"
- Click "Run workflow"
- Choose synthetic or real data (real data requires the CFB_API_KEY secret)
Modify config.py to customize model parameters:
- Model Type: Random Forest or Gradient Boosting
- Model Hyperparameters: n_estimators, max_depth, learning_rate, etc.
- API Settings: Timeout, retry attempts
- Logging Level: DEBUG, INFO, WARNING, ERROR
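As an illustration of how such settings might fit together, here is a hypothetical config.py plus a factory that builds the chosen model. The constant names and default values are assumptions; check the repository's config.py for the real ones. The two estimators are scikit-learn's `RandomForestClassifier` and `GradientBoostingClassifier`.

```python
import logging

from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier

# Hypothetical config values; the real config.py may use different names and defaults.
MODEL_TYPE = "random_forest"  # or "gradient_boosting"
MODEL_PARAMS = {
    "random_forest": {"n_estimators": 200, "max_depth": 10, "random_state": 42},
    "gradient_boosting": {
        "n_estimators": 200,
        "max_depth": 3,
        "learning_rate": 0.1,
        "random_state": 42,
    },
}
API_TIMEOUT_SECONDS = 30
API_MAX_RETRIES = 3
LOG_LEVEL = "INFO"  # DEBUG, INFO, WARNING, or ERROR


def configure_logging() -> None:
    """Apply LOG_LEVEL to the root logger (logging accepts level names as strings)."""
    logging.basicConfig(level=LOG_LEVEL)


def build_model(model_type: str = MODEL_TYPE):
    """Instantiate the classifier named in the config with its hyperparameters."""
    classes = {
        "random_forest": RandomForestClassifier,
        "gradient_boosting": GradientBoostingClassifier,
    }
    if model_type not in classes:
        raise ValueError(f"unknown model type {model_type!r}; expected one of {sorted(classes)}")
    return classes[model_type](**MODEL_PARAMS[model_type])
```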
```text
cfbmodel/
├── .github/
│   └── workflows/
│       ├── ci.yml                  # CI workflow for automated testing
│       └── model-demo.yml          # Manual demo workflow
├── data_fetcher.py                 # API client with retry logic and validation
├── preprocessor.py                 # Data preprocessing and feature engineering
├── model.py                        # ML model definitions with logging
├── main.py                         # CLI interface
├── run_weekly_predictions.py       # NEW: Automatic weekly predictions script
├── test_weekly_predictions.py      # NEW: Test script for weekly predictions
├── config.py                       # Configuration parameters
├── test_cfb_model.py               # Unit tests
├── example.py                      # Usage examples
├── requirements.txt                # Python dependencies
├── TESTING_SUMMARY.md              # Test results and validation
├── WEEKLY_PREDICTIONS_GUIDE.md     # NEW: Guide for weekly predictions
└── README.md                       # This file
```
- Automatic retry logic for API requests
- Request timeout handling
- Comprehensive input validation
- Detailed error messages
- Structured logging throughout the codebase
- Training progress tracking
- API call monitoring
- Type hints for better IDE support
- Docstrings with parameter descriptions
- Unit tests with pytest
- Configuration file for easy customization
- Validates API keys before use
- Checks data frame emptiness
- Validates year and week ranges
- Handles missing files gracefully
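The validation checks listed above can be sketched as small guard functions that fail fast with descriptive messages. The bounds are assumptions (1869 is the earliest season in the Starter Pack's game results; the week limit of 16 is a placeholder), and the function names are hypothetical rather than the ones in data_fetcher.py.

```python
from datetime import date

import pandas as pd

MIN_YEAR = 1869  # assumed lower bound: earliest season in the Starter Pack game results
MAX_WEEK = 16    # placeholder upper bound; season length varies by year


def _is_int(value) -> bool:
    # bool is a subclass of int, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


def validate_year(year: int) -> int:
    latest = date.today().year + 1  # allow lookups for the upcoming season
    if not _is_int(year) or not MIN_YEAR <= year <= latest:
        raise ValueError(f"year must be an integer from {MIN_YEAR} to {latest}, got {year!r}")
    return year


def validate_week(week: int) -> int:
    if not _is_int(week) or not 1 <= week <= MAX_WEEK:
        raise ValueError(f"week must be an integer from 1 to {MAX_WEEK}, got {week!r}")
    return week


def validate_api_key(key) -> str:
    if not isinstance(key, str) or not key.strip():
        raise ValueError("API key must be a non-empty string")
    return key.strip()


def require_non_empty(df: pd.DataFrame, name: str) -> pd.DataFrame:
    if df is None or df.empty:
        raise ValueError(f"{name} is empty; check the year/week filters and the API response")
    return df
```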
The model uses the following features for predictions:
- Offensive Statistics: Total yards, passing yards, rushing yards
- Team Talent Ratings: Recruiting and talent composite scores
- Differential Features: Calculated differences between home and away team stats
- Historical Performance: Season-long averages and trends
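Differential features are home-minus-away subtractions. A minimal pandas sketch, assuming per-game columns named like `home_off_total_yards` and `away_off_total_yards` (the naming follows the feature-importance results reported for this model). Treating `yards_diff` as the total-yards difference is an assumption, and the other output column names are hypothetical.

```python
import pandas as pd

# Output column -> underlying per-team stat (assumed mapping).
DIFF_FEATURES = {
    "yards_diff": "off_total_yards",
    "passing_yards_diff": "off_passing_yards",
    "rushing_yards_diff": "off_rushing_yards",
}


def add_differential_features(games: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of games with home-minus-away difference columns added."""
    missing = [
        f"{side}_{stat}"
        for stat in DIFF_FEATURES.values()
        for side in ("home", "away")
        if f"{side}_{stat}" not in games.columns
    ]
    if missing:
        raise KeyError(f"missing columns: {missing}")
    out = games.copy()
    for name, stat in DIFF_FEATURES.items():
        out[name] = out[f"home_{stat}"] - out[f"away_{stat}"]
    return out
```

A positive value favors the home team, so a single column captures the matchup gap that two separate team columns would otherwise have to express jointly.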
Typical results on 2023 season data:
- Training Accuracy: 63-65%
- Test Accuracy: 59-60%
- Cross-Validation Accuracy: 59% (±1.6%)
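A cross-validation figure like the one above can be computed with scikit-learn's `cross_val_score`. This sketch assumes the ± value is the standard deviation of accuracy across folds, which the source does not state. The synthetic data in the `__main__` block is for illustration only and will not reproduce the reported numbers.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score


def cross_validated_accuracy(X: np.ndarray, y: np.ndarray, folds: int = 5) -> tuple[float, float]:
    """Return the mean and standard deviation of k-fold accuracy."""
    model = RandomForestClassifier(n_estimators=100, random_state=42)
    scores = cross_val_score(model, X, y, cv=folds, scoring="accuracy")
    return float(scores.mean()), float(scores.std())


if __name__ == "__main__":
    rng = np.random.default_rng(0)  # synthetic data, not real game results
    X = rng.normal(size=(400, 3))
    y = (X[:, 0] + rng.normal(scale=1.5, size=400) > 0).astype(int)
    mean, std = cross_validated_accuracy(X, y)
    print(f"Cross-validation accuracy: {mean:.1%} (±{std:.1%})")
```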
Feature Importance Analysis:
- yards_diff (28%) - Most predictive feature
- home_off_total_yards (15%)
- away_off_total_yards (14%)
- home_off_passing_yards (12%)
- home_off_rushing_yards (12%)
- Extract the ZIP package
- Open the folder in Jupyter Lab, Jupyter Notebook, or VS Code
- Start with 01_intro_to_data.ipynb
- Explore and modify the notebooks for your own analysis.
This Starter Pack is provided for personal, non-commercial use only by the original purchaser or Patreon subscriber.
By downloading or using this product, you agree to the following:
- You may use the data, code, and examples for personal projects, academic research, or internal use.
- You may not redistribute, resell, republish, or repackage this content, in whole or in part, without written permission.
- You may not share access to the ZIP file, notebooks, or datasets publicly or with teams.
- Attribution is appreciated but not required.
If you're interested in licensing this pack for organizational use, educational programs, or media coverage, please contact me.
For questions, feedback, or support:
- Website: CollegeFootballData.com
- Email: admin@collegefootballdata.com
- Twitter / X: @CFB_Data
- Bluesky: @collegefootballdata.com
Thanks for supporting independent sports data!