zachringnight/cfbmodel

🏈 College Football Data Starter Pack

Welcome to the CollegeFootballData.com Starter Pack, a curated bundle of structured college football data, custom advanced metrics, and real-world Jupyter notebooks to help you build models, explore trends, and launch your own analytics projects faster.


📦 What's Included

  • ✅ Historical Data

    • Game results (1869–present)
    • Play-by-play, drives, season stats (2003–present)
    • Advanced team-level metrics (EPA, success rate, explosiveness, etc.)
  • ✅ 12 Jupyter Notebooks

    • Code walkthroughs for ranking, predictions, dashboards, and more
  • ✅ PDF Guides

    • Full data dictionary
    • Overview of included notebooks
  • ✅ Metadata

    • Teams, conferences, venues

🧠 Who It's For

This pack is designed for:

  • Data scientists & analysts
  • CFB modelers and pick'em competitors
  • Academic researchers
  • Hobbyists and fans who love working with real data

📚 Folder Structure

📂 data
    📂 advanced_game_stats/
    📂 advanced_season_stats/
    📂 drives/
    📂 game_stats/
    📂 plays/
    📂 season_stats/
    📄 conferences.csv
    📄 games.csv
    📄 teams.csv
📄 00_data_dictionary.ipynb
📄 01_intro_to_data.ipynb
📄 02_build_simple_rankings.ipynb
📄 03_metrics_comparison.ipynb
📄 04_team_similarity.ipynb
📄 05_matchup_predictor.ipynb
📄 06_custom_rankings_by_metric.ipynb
📄 07_drive_efficiency.ipynb
📄 08_offense_vs_defense_comparison.ipynb
📄 09_opponent_adjustments.ipynb
📄 10_srs_adjusted_metrics.ipynb
📄 11_metric_distribution_explorer.ipynb
📄 CFBD Starter Pack - Data Files Guide.pdf
📄 CFBD Starter Pack - Notebooks Guide.pdf
📄 headers.md
📄 12_efficiency_dashboards.ipynb
📄 LICENSE.txt
📄 README.md


⚙️ Requirements

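To run the Jupyter notebooks, install the Python packages listed under Installation below (pandas, numpy, matplotlib, seaborn, scikit-learn, and jupyter).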

A production-ready machine learning model for predicting college football game outcomes using data from the College Football Data API.

Features

  • 🔄 Robust API Client with automatic retry logic and timeout handling
  • 🤖 Machine Learning Models (Random Forest and Gradient Boosting) for game outcome prediction
  • 📊 Feature Engineering from team statistics, talent ratings, and historical data
  • 🔍 Comprehensive Logging for debugging and monitoring
  • ✅ Input Validation with detailed error messages
  • 🧪 Unit Tests for core functionality
  • ⚙️ Configurable Parameters via config.py
  • 📈 Confidence Scores for each prediction
  • 🚀 CI/CD Pipeline with automated testing on multiple Python versions
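
As a rough illustration of the retry behavior mentioned in the feature list, here is a minimal sketch using only the standard library. It is not the code in data_fetcher.py: the function name `call_with_retry`, its parameters, and the exponential backoff schedule are all assumptions made for this example.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retry(
    func: Callable[[], T],
    max_attempts: int = 3,          # hypothetical default
    base_delay: float = 0.5,        # hypothetical default, in seconds
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying with exponential backoff on transient errors."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Wait base_delay, then 2x, then 4x, ... between attempts.
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

In a real client, `func` would typically wrap a single HTTP request that carries its own per-request timeout, so that a hung connection surfaces as an error the wrapper can retry.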

Prerequisites

Installation

Option 1: Direct Installation

  1. Clone this repository:
git clone https://github.com/zachringnight/cfbmodel.git
cd cfbmodel
  2. Install required dependencies:
pip install pandas numpy matplotlib seaborn scikit-learn jupyter
  3. Set up your API key (optional - can also pass via command line):
cp .env.example .env
# Edit .env and add your API key

Option 2: Package Installation

Install as a Python package:

git clone https://github.com/zachringnight/cfbmodel.git
cd cfbmodel
pip install -e .

This will install the package in development mode and make the cfbmodel command available in your terminal.

Usage

The model can be used either by running the main script directly or through the installed package.

🤖 Automated Workflow (NEW!)

The model can run automatically via GitHub Actions:

  • Scheduled: Runs every Saturday at 8 AM UTC
  • Manual: Trigger from GitHub Actions tab with custom parameters
  • Outputs: JSON and CSV predictions, trained models, and logs

See .github/WORKFLOW_DOCUMENTATION.md for complete workflow documentation.

Quick Setup:

  1. Add your API key as a GitHub secret named CFB_API_KEY (Security Guide)
  2. The workflow will automatically run weekly
  3. Download predictions from the Actions artifacts

⚠️ Security Notice: Never hardcode API keys in code or commit them to the repository. The workflow is designed to use GitHub Secrets securely. See .github/SECURITY.md for best practices.
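
One way to keep the key out of source code is to read it at runtime from the CFB_API_KEY environment variable, falling back to it when no command-line value is given. The sketch below is illustrative only: `resolve_api_key` is a hypothetical helper, and the precedence (command line first, then environment) is an assumption rather than documented behavior of this repository.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(
    cli_value: Optional[str] = None,
    env: Mapping[str, str] = os.environ,
) -> str:
    """Return the API key from the CLI argument or the CFB_API_KEY variable."""
    # Assumed precedence: an explicit CLI value wins over the environment.
    key = (cli_value or env.get("CFB_API_KEY", "")).strip()
    if not key:
        raise ValueError("No API key found: pass --api-key or set CFB_API_KEY.")
    return key
```

Passing `env` explicitly keeps the helper easy to test without touching the real environment.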

🚀 Quick Start: Weekly Predictions

Run predictions for the current week's games with automatic week detection:

# Set your API key
export CFB_API_KEY="YOUR_API_KEY"

# Run predictions with structured outputs (JSON + CSV)
python run_predictions_with_outputs.py --train --train-year 2024

# Or use the simpler script (text output only)
python run_weekly_predictions.py --train --train-year 2024

For detailed instructions, see WEEKLY_PREDICTIONS_GUIDE.md.
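
To give a sense of what "structured outputs (JSON + CSV)" can mean in practice, the sketch below writes the same prediction records to both formats with the standard library. The function name `write_predictions`, the file names, and the record fields are assumptions for illustration; the actual layout produced by run_predictions_with_outputs.py may differ.

```python
import csv
import json
from pathlib import Path
from typing import Dict, List


def write_predictions(predictions: List[Dict[str, object]], out_dir: str) -> None:
    """Write the same records to predictions.json and predictions.csv."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    # JSON keeps value types (numbers stay numbers).
    with open(out / "predictions.json", "w", encoding="utf-8") as fh:
        json.dump(predictions, fh, indent=2)

    # CSV columns follow the keys of the first record (hypothetical schema).
    fieldnames = list(predictions[0]) if predictions else []
    with open(out / "predictions.csv", "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(predictions)
```

A record here might look like `{"home_team": "Home U", "away_team": "Away State", "predicted_winner": "Home U", "confidence": 0.62}`, where every field name and value is made up for the example.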

Training a Model

Train a model using data from a specific season:

# Using the script directly
python main.py --api-key YOUR_API_KEY --year 2023 --train

# Or if installed as a package
cfbmodel --api-key YOUR_API_KEY --year 2023 --train

# Or with executable permission
./main.py --api-key YOUR_API_KEY --year 2023 --train

This will:

  • Fetch game data, team statistics, and talent ratings for the specified year
  • Prepare features for machine learning
  • Train a Random Forest classifier
  • Display training metrics and feature importance
  • Save the trained model to cfb_model.pkl
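
The last step, saving the model to cfb_model.pkl, can be sketched with Python's pickle module. Whether the repository uses pickle directly or a wrapper such as joblib is not stated in this README, so treat the helpers below (`save_model`, `load_model`) as hypothetical.

```python
import pickle
from pathlib import Path
from typing import Any


def save_model(model: Any, path: str = "cfb_model.pkl") -> None:
    """Serialize a trained model object to disk."""
    with open(path, "wb") as fh:
        pickle.dump(model, fh)


def load_model(path: str = "cfb_model.pkl") -> Any:
    """Load a previously saved model, failing clearly if it is missing."""
    if not Path(path).is_file():
        raise FileNotFoundError(f"No trained model at {path!r}; train one first.")
    with open(path, "rb") as fh:
        return pickle.load(fh)
```

Only unpickle files you created yourself, since loading a pickle can execute arbitrary code.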

Making Predictions

Make predictions for games in a specific week:

python main.py --api-key YOUR_API_KEY --year 2024 --predict --week 5

This will:

  • Load the trained model
  • Fetch upcoming games for the specified week
  • Generate predictions with confidence scores
  • Display results for each matchup
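
A confidence score for a binary home-win classifier is often just the predicted probability of the chosen side. The sketch below assumes that definition; the README does not say how the repository computes confidence, and `pick_with_confidence` is a hypothetical helper.

```python
from typing import Tuple


def pick_with_confidence(home_win_prob: float, home: str, away: str) -> Tuple[str, float]:
    """Return the predicted winner and the probability assigned to that pick."""
    if not 0.0 <= home_win_prob <= 1.0:
        raise ValueError("home_win_prob must be between 0 and 1")
    if home_win_prob >= 0.5:
        return home, home_win_prob
    # The away pick's confidence is the complement of the home probability.
    return away, 1.0 - home_win_prob
```

With a scikit-learn classifier, `home_win_prob` would typically come from `predict_proba`.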

Combined Training and Prediction

You can train and predict in one command:

python main.py --api-key YOUR_API_KEY --year 2023 --train --predict --week 10
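
For readers curious how such a command line could be wired up, here is a minimal argparse sketch that accepts the flags used in the examples above. It is not the contents of main.py; which flags are required and how conflicts are reported are assumptions.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="cfbmodel")
    parser.add_argument("--api-key", help="College Football Data API key")
    parser.add_argument("--year", type=int, required=True, help="Season year")
    parser.add_argument("--train", action="store_true", help="Train a model")
    parser.add_argument("--predict", action="store_true", help="Predict games")
    parser.add_argument("--week", type=int, help="Week to predict")
    return parser


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = build_parser()
    args = parser.parse_args(argv)
    # Assumed rules: at least one action, and --predict needs a week.
    if not (args.train or args.predict):
        parser.error("choose at least one of --train or --predict")
    if args.predict and args.week is None:
        parser.error("--predict requires --week")
    return args
```

Because `--train` and `--predict` are independent switches, a single invocation can request both, which matches the combined command shown above.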

Testing

Run the unit tests to validate the installation:

python -m pytest test_cfb_model.py -v

All 10 tests should pass, covering:

  • Model initialization and training
  • Input validation and error handling
  • Prediction functionality
  • Data preprocessing

Continuous Integration

This project includes a GitHub Actions CI workflow that automatically tests the model on every push and pull request.

The CI workflow:

  • Tests on Multiple Python Versions: Runs tests on Python 3.9, 3.10, 3.11, and 3.12
  • Validates Module Imports: Ensures all core modules can be imported
  • Tests Model Functionality: Trains and validates the model with synthetic data
  • Runs Unit Tests: Executes the full test suite automatically

Workflow Triggers

  • Push to main, master, develop, or any copilot/** branch
  • Pull requests to main, master, or develop
  • Manual workflow dispatch

View Workflow Status

Check the Actions tab in the GitHub repository to see the status of the CI workflow.

Manual Model Demo

A manual workflow (model-demo.yml) is also available for demonstrating the model:

  • Can be triggered manually from the Actions tab
  • Option to use synthetic test data or real API data
  • Shows complete model training and prediction pipeline
  • Useful for demonstrations and validating model functionality

To run the demo:

  1. Go to the Actions tab
  2. Select "Model Demo (Manual Trigger)"
  3. Click "Run workflow"
  4. Choose synthetic or real data (real data requires CFB_API_KEY secret)

Configuration

Modify config.py to customize model parameters:

  • Model Type: Random Forest or Gradient Boosting
  • Model Hyperparameters: n_estimators, max_depth, learning_rate, etc.
  • API Settings: Timeout, retry attempts
  • Logging Level: DEBUG, INFO, WARNING, ERROR
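
The settings listed above could be grouped in a small configuration object. The sketch below is hypothetical: the class name, field names, and every default value are placeholders chosen for illustration and are not taken from the repository's config.py.

```python
from dataclasses import dataclass, field
from typing import Any, Dict

VALID_MODEL_TYPES = {"random_forest", "gradient_boosting"}
VALID_LOG_LEVELS = {"DEBUG", "INFO", "WARNING", "ERROR"}


@dataclass
class ModelConfig:
    model_type: str = "random_forest"            # placeholder default
    hyperparameters: Dict[str, Any] = field(
        default_factory=lambda: {"n_estimators": 100, "max_depth": 10}
    )                                            # placeholder values
    api_timeout_seconds: float = 30.0            # placeholder default
    api_retry_attempts: int = 3                  # placeholder default
    log_level: str = "INFO"

    def __post_init__(self) -> None:
        # Reject typos early instead of failing deep inside training.
        if self.model_type not in VALID_MODEL_TYPES:
            raise ValueError(f"unknown model_type: {self.model_type!r}")
        if self.log_level not in VALID_LOG_LEVELS:
            raise ValueError(f"unknown log_level: {self.log_level!r}")
```

For example, `ModelConfig(model_type="gradient_boosting", hyperparameters={"n_estimators": 200, "learning_rate": 0.1})` would select the second model family with its own settings.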

Project Structure

cfbmodel/
├── .github/
│   └── workflows/
│       ├── ci.yml                 # CI workflow for automated testing
│       └── model-demo.yml         # Manual demo workflow
├── data_fetcher.py                # API client with retry logic and validation
├── preprocessor.py                # Data preprocessing and feature engineering
├── model.py                       # ML model definitions with logging
├── main.py                        # CLI interface
├── run_weekly_predictions.py      # NEW: Automatic weekly predictions script
├── test_weekly_predictions.py     # NEW: Test script for weekly predictions
├── config.py                      # Configuration parameters
├── test_cfb_model.py              # Unit tests
├── example.py                     # Usage examples
├── requirements.txt               # Python dependencies
├── TESTING_SUMMARY.md             # Test results and validation
├── WEEKLY_PREDICTIONS_GUIDE.md    # NEW: Guide for weekly predictions
└── README.md                      # This file

Improvements in This Version

Robustness

  • ✅ Automatic retry logic for API requests
  • ✅ Request timeout handling
  • ✅ Comprehensive input validation
  • ✅ Detailed error messages

Observability

  • ✅ Structured logging throughout the codebase
  • ✅ Training progress tracking
  • ✅ API call monitoring

Code Quality

  • ✅ Type hints for better IDE support
  • ✅ Docstrings with parameter descriptions
  • ✅ Unit tests with pytest
  • ✅ Configuration file for easy customization

Error Handling

  • ✅ Validates API keys before use
  • ✅ Checks data frame emptiness
  • ✅ Validates year and week ranges
  • ✅ Handles missing files gracefully
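
The year and week range checks might look roughly like the sketch below. The bounds are assumptions: 1869 is the earliest season this README mentions for game results, the upper bound is the current calendar year, and the maximum week of 20 is a guess rather than a value taken from the repository.

```python
from datetime import date
from typing import Optional

FIRST_SEASON = 1869  # earliest season mentioned in this README
MAX_WEEK = 20        # assumed upper bound, not taken from the repository


def validate_year_and_week(year: int, week: Optional[int] = None) -> None:
    """Raise ValueError with a descriptive message if year or week is out of range."""
    latest = date.today().year
    if not isinstance(year, int) or not FIRST_SEASON <= year <= latest:
        raise ValueError(
            f"year must be an integer between {FIRST_SEASON} and {latest}, got {year!r}"
        )
    if week is not None and (not isinstance(week, int) or not 1 <= week <= MAX_WEEK):
        raise ValueError(
            f"week must be an integer between 1 and {MAX_WEEK}, got {week!r}"
        )
```

Failing before any API call is made keeps error messages close to the mistake that caused them.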

Model Features

The model uses the following features for predictions:

  • Offensive Statistics: Total yards, passing yards, rushing yards
  • Team Talent Ratings: Recruiting and talent composite scores
  • Differential Features: Calculated differences between home and away team stats
  • Historical Performance: Season-long averages and trends
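
As a sketch of how differential features can be derived, the helper below prefixes each team's stats with `home_` or `away_` and adds a `<stat>_diff` column for every stat both teams share. The function name and the generic `_diff` naming are assumptions; the README's `yards_diff` presumably corresponds to a difference of this kind, but the exact definition is not given here.

```python
from typing import Dict, Mapping


def build_matchup_features(
    home: Mapping[str, float], away: Mapping[str, float]
) -> Dict[str, float]:
    """Combine per-team stats into one feature row for a matchup."""
    features: Dict[str, float] = {}
    for stat, value in home.items():
        features[f"home_{stat}"] = value
    for stat, value in away.items():
        features[f"away_{stat}"] = value
    # Differential features: home minus away, for stats both teams have.
    for stat in sorted(home.keys() & away.keys()):
        features[f"{stat}_diff"] = home[stat] - away[stat]
    return features
```

For instance, with made-up inputs `{"off_total_yards": 450.0}` for the home team and `{"off_total_yards": 400.0}` for the away team, the row contains `home_off_total_yards`, `away_off_total_yards`, and `off_total_yards_diff`.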

Model Performance

Typical results on 2023 season data:

  • Training Accuracy: 63-65%
  • Test Accuracy: 59-60%
  • Cross-Validation Accuracy: 59% (±1.6%)

Feature Importance Analysis:

  1. yards_diff (28%) - Most predictive feature
  2. home_off_total_yards (15%)
  3. away_off_total_yards (14%)
  4. home_off_passing_yards (12%)
  5. home_off_rushing_yards (12%)

🚀 Getting Started

  1. Extract the ZIP package
  2. Open the folder in Jupyter Lab, Jupyter Notebook, or VS Code
  3. Start with 01_intro_to_data.ipynb
  4. Explore and modify notebooks for your own analysis.

📜 License & Terms of Use

This Starter Pack is provided for personal, non-commercial use only by the original purchaser or Patreon subscriber.

By downloading or using this product, you agree to the following:

  • ✅ You may use the data, code, and examples for personal projects, academic research, or internal use.
  • 🚫 You may not redistribute, resell, republish, or repackage this content, in whole or in part, without written permission.
  • 🚫 You may not share access to the ZIP file, notebooks, or datasets publicly or with teams.
  • 🧠 Attribution is appreciated but not required.

If you're interested in licensing this pack for organizational use, educational programs, or media coverage, please contact me.


📬 Contact

For questions, feedback, or support:

  • 🏠 CollegeFootballData.com
  • 💌 Email: admin@collegefootballdata.com
  • 🧵 Twitter / X: @CFB_Data
  • 🌐 Bluesky: @collegefootballdata.com


Thanks for supporting independent sports data!
