esosetrov/discrete_choice_models

discrete_choice_models

Extended discrete choice modeling notebook with Fair's affair data & STAR98 education analysis. Covers Logit, Probit, GLM, diagnostics, model comparison, marginal effects, and advanced topics like censored regression and count models. Complete with visualizations and validation techniques.

Overview

Features

  • Multiple Model Types: Logit, Probit, Ordered Logit, Negative Binomial, and Binomial GLM
  • Real Datasets: Fair's Affair Data (1974) and STAR98 Education Data
  • Complete Analysis Pipeline: EDA → Model Estimation → Diagnostics → Validation
  • Advanced Topics: Censored regression, count models, link function comparison
  • Visualizations: ROC curves, residual diagnostics, influence analysis
  • Model Validation: Cross-validation, multicollinearity checks, information criteria

Datasets

1. Fair's Affair Data (1974)

  • Observations: 6,366 women
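  • Variables: 9 in the copy bundled with statsmodels: 8 predictors (including marriage rating, age, years married, religiosity) plus the affairs measure
  • Outcome: Binary indicator of extramarital affairs, derived from the continuous affairs measure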
  • Source: Fair, Ray. 1978. "A Theory of Extramarital Affairs," Journal of Political Economy

2. STAR98 Education Data

  • Observations: 303 California school districts
  • Variables: 13 main + 8 interaction terms
  • Outcome: Number of 9th graders scoring above national median in math
  • Source: California Department of Education standardized testing

Technical Implementation

Core Models

  • Binary Choice: Logit vs Probit comparison with marginal effects
  • Ordered Response: Ordered logit for rating scale data
  • Count Data: Negative binomial for frequency analysis
  • Proportions: Binomial GLM with multiple link functions

Advanced Features

  • Model Diagnostics: Residual analysis, influence measures, multicollinearity checks
  • Validation: 5-fold cross-validation, ROC analysis, prediction tables
  • Comparison Metrics: AIC, BIC, pseudo-R², classification reports
  • Visualization: CDF/PDF comparisons, residual plots, Cook's distance

Findings

Fair's Affair Data

  • Most significant predictors: Marriage satisfaction (strong negative), religiosity (negative), years married (positive)
  • Model performance: 72% accuracy, only modestly above the 68% majority-class baseline, with good calibration but poor minority class detection
  • Logit vs Probit: Functionally equivalent results with near-identical marginal effects

STAR98 Education Data

  • Best predictors: Low-income percentage (strong negative), Asian student percentage (positive)
  • Exceptional fit: Pseudo-R² approaching 1.000 (potential separation issues)
  • Optimal link: Complementary log-log outperforms standard logit/probit

Theoretical Foundations

The notebook covers essential discrete choice theory:

  • Random utility maximization framework
  • Logistic vs normal error distributions
  • Marginal effects interpretation
  • Odds ratios and probability calculations
  • Independence of irrelevant alternatives (IIA) assumption and model limitations
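
To make the probability, marginal effect, and odds ratio calculations concrete, here is a small standard-library sketch. The coefficients `beta0` and `beta1` are made-up values for illustration only:

```python
import math

def logistic_cdf(z):
    return 1.0 / (1.0 + math.exp(-z))

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def logistic_pdf(z):
    p = logistic_cdf(z)
    return p * (1.0 - p)

def normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def odds(p):
    return p / (1.0 - p)

# Hypothetical coefficients and covariate value (not estimates from the data).
beta0, beta1, x = -1.0, 0.5, 2.0
z = beta0 + beta1 * x  # linear index x'beta

# Choice probability: P(y = 1 | x) = F(x'beta), with F logistic or normal.
p_logit = logistic_cdf(z)
p_probit = normal_cdf(z)

# Marginal effect of x on the probability: f(x'beta) * beta1.
me_logit = logistic_pdf(z) * beta1
me_probit = normal_pdf(z) * beta1

# In the logit model a one-unit increase in x multiplies the odds by exp(beta1),
# whatever the starting value of x.
odds_ratio = odds(logistic_cdf(beta0 + beta1 * (x + 1))) / odds(p_logit)

print(p_logit, p_probit, me_logit, me_probit, odds_ratio)
```

The marginal effects differ between the two models at the same index value because the densities differ, which is why fitted Logit coefficients come out larger in magnitude than Probit ones for similar implied probabilities.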

Practical Applications

  1. Policy Analysis: Understanding determinants of social behaviors
  2. Educational Research: Identifying factors affecting student achievement
  3. Model Selection: Comparative evaluation of different specifications
  4. Diagnostic Testing: Comprehensive validation of model assumptions

Dependencies

statsmodels>=0.14.0
pandas>=1.5.0
numpy>=1.24.0
matplotlib>=3.7.0
scipy>=1.10.0
scikit-learn>=1.2.0  # For cross-validation only

Quick Start

  1. Clone the repository
  2. Install requirements: pip install -r requirements.txt
  3. Open the Jupyter notebook: jupyter notebook discrete_choice_models.ipynb
  4. Run cells sequentially or explore specific sections

Notebook Structure

  1. Introduction & Theory - Discrete choice model foundations
  2. Fair's Affair Data - Binary choice modeling with diagnostics
  3. Alternative Specifications - Ordered, count, and censored models
  4. STAR98 GLM Analysis - Binomial regression with interactions
  5. Model Diagnostics - Validation and influence analysis
  6. Advanced Topics - Multicollinearity, model selection, best practices

Limitations & Considerations

  • Class imbalance: Affair data has 68% negative cases
  • Possible separation: The near-1.000 pseudo-R² on STAR98 may indicate quasi-complete separation
  • Model complexity: Some specifications include high-dimensional interactions
  • Historical data: 1974 social norms may differ from contemporary patterns

Citation

If using this code for research, please cite:

  • Fair, R. C. (1978). A Theory of Extramarital Affairs. Journal of Political Economy, 86(1), 45–61.
  • McFadden, D. (1974). Conditional Logit Analysis of Qualitative Choice Behavior. In P. Zarembka (Ed.), Frontiers in Econometrics (pp. 105–142). Academic Press.

Contributing

Contributions welcome! Please:

  1. Fork the repository
  2. Create a feature branch
  3. Add tests for new functionality
  4. Submit a pull request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Note

This notebook is designed for educational and research purposes. Real-world applications may require additional considerations and validation.
