Extended discrete choice modeling notebook with Fair's affair data & STAR98 education analysis. Covers Logit, Probit, GLM, diagnostics, model comparison, marginal effects, and advanced topics like censored regression and count models. Complete with visualizations and validation techniques.
- Multiple Model Types: Logit, Probit, Ordered Logit, Negative Binomial, and Binomial GLM
- Real Datasets: Fair's Affair Data (1974) and STAR98 Education Data
- Complete Analysis Pipeline: EDA → Model Estimation → Diagnostics → Validation
- Advanced Topics: Censored regression, count models, link function comparison
- Visualizations: ROC curves, residual diagnostics, influence analysis
- Model Validation: Cross-validation, multicollinearity checks, information criteria
- Observations: 6,366 women (Fair's affair data)
- Variables: 9 in total (8 predictors plus the affairs measure), including marriage rating, age, years married, religiosity
- Outcome: Binary indicator of extramarital affairs
- Source: Fair, Ray. 1978. "A Theory of Extramarital Affairs," Journal of Political Economy
- Observations: 303 California school districts (STAR98 data)
- Variables: 13 main + 8 interaction terms
- Outcome: Number of 9th graders scoring above national median in math
- Source: California Department of Education standardized testing
- Binary Choice: Logit vs Probit comparison with marginal effects
- Ordered Response: Ordered logit for rating scale data
- Count Data: Negative binomial for frequency analysis
- Proportions: Binomial GLM with multiple link functions
- Model Diagnostics: Residual analysis, influence measures, multicollinearity checks
- Validation: 5-fold cross-validation, ROC analysis, prediction tables
- Comparison Metrics: AIC, BIC, pseudo-R², classification reports
- Visualization: CDF/PDF comparisons, residual plots, Cook's distance
- Most significant predictors (affair data): Marriage satisfaction (strong negative), religiosity (negative), years married (positive)
- Model performance: 72% accuracy, only modestly above the 68% majority-class baseline; calibration is good but minority class detection is poor
- Logit vs Probit: Functionally equivalent results with near-identical marginal effects
- Best predictors (STAR98 data): Low-income percentage (strong negative), Asian student percentage (positive)
- Exceptional fit: Pseudo-R² approaching 1.000 (potential separation issues)
- Optimal link: Complementary log-log outperforms standard logit/probit
The notebook covers essential discrete choice theory:
- Random utility maximization framework
- Logistic vs normal error distributions
- Marginal effects interpretation
- Odds ratios and probability calculations
- IIA assumption and model limitations
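The relationship between these ideas can be illustrated with a small standard-library sketch. Under random utility maximization the choice probability is P(y = 1 | x) = F(x'β), where F is the logistic CDF for Logit and the standard normal CDF for Probit. The coefficient value below is hypothetical and chosen only for illustration.

```python
import math


def logistic_cdf(z: float) -> float:
    """Choice probability under logistic errors (Logit)."""
    return 1.0 / (1.0 + math.exp(-z))


def normal_cdf(z: float) -> float:
    """Choice probability under standard normal errors (Probit)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def logistic_pdf(z: float) -> float:
    p = logistic_cdf(z)
    return p * (1.0 - p)


def normal_pdf(z: float) -> float:
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def marginal_effect(beta: float, index: float, pdf) -> float:
    """dP/dx for a continuous regressor: beta * f(x'beta)."""
    return beta * pdf(index)


def odds_ratio(beta: float) -> float:
    """Multiplicative change in the odds per unit increase in x (Logit only)."""
    return math.exp(beta)


beta = -0.7   # hypothetical Logit coefficient
index = 0.0   # linear index x'beta at which effects are evaluated

print("Logit  P(y=1):", logistic_cdf(index))
print("Probit P(y=1):", normal_cdf(index))
print("Logit marginal effect:", marginal_effect(beta, index, logistic_pdf))
print("Odds ratio:", odds_ratio(beta))

# At index 0 the normal density is about 1.6 times the logistic density,
# which is why Logit coefficients tend to be about 1.6 times Probit ones
# while the implied marginal effects stay close.
print("pdf ratio at 0:", normal_pdf(0.0) / logistic_pdf(0.0))
```

The sketch also shows why raw coefficients are not comparable across the two models but marginal effects are: the scale difference in the error distributions is absorbed into β.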
- Policy Analysis: Understanding determinants of social behaviors
- Educational Research: Identifying factors affecting student achievement
- Model Selection: Comparative evaluation of different specifications
- Diagnostic Testing: Comprehensive validation of model assumptions
statsmodels>=0.14.0
pandas>=1.5.0
numpy>=1.24.0
matplotlib>=3.7.0
scipy>=1.10.0
scikit-learn>=1.2.0  # For cross-validation only
- Clone the repository
- Install requirements:
    pip install -r requirements.txt
- Open the Jupyter notebook:
    jupyter notebook discrete_choice_models.ipynb
- Run cells sequentially or explore specific sections
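After installing, the environment can be checked against the minimum versions listed above with a short script. This helper is not part of the repository; `check_requirements` and `parse_version` are hypothetical names, and the version parsing is deliberately simple (it compares only the leading numeric parts).

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Minimum versions copied from the requirements list.
MINIMUM_VERSIONS = {
    "statsmodels": "0.14.0",
    "pandas": "1.5.0",
    "numpy": "1.24.0",
    "matplotlib": "3.7.0",
    "scipy": "1.10.0",
    "scikit-learn": "1.2.0",
}


def parse_version(text: str) -> tuple:
    """Turn '1.24.0' or '2.0.0rc1' into a comparable (major, minor, patch)."""
    numbers = []
    for piece in text.split(".")[:3]:
        match = re.match(r"\d+", piece)
        numbers.append(int(match.group()) if match else 0)
    while len(numbers) < 3:
        numbers.append(0)
    return tuple(numbers)


def check_requirements(minimums: dict) -> dict:
    """Return a status string per package: 'ok', 'missing', or 'too old'."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            report[name] = "missing"
            continue
        if parse_version(installed) >= parse_version(minimum):
            report[name] = f"ok ({installed})"
        else:
            report[name] = f"too old ({installed} < {minimum})"
    return report


for package, status in check_requirements(MINIMUM_VERSIONS).items():
    print(f"{package:14s} {status}")
```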
- Introduction & Theory - Discrete choice model foundations
- Fair's Affair Data - Binary choice modeling with diagnostics
- Alternative Specifications - Ordered, count, and censored models
- STAR98 GLM Analysis - Binomial regression with interactions
- Model Diagnostics - Validation and influence analysis
- Advanced Topics - Multicollinearity, model selection, best practices
- Class imbalance: Affair data has 68% negative cases
- Separation: The STAR98 fit shows signs of quasi-complete separation (pseudo-R² close to 1), so its estimates should be interpreted with caution
- Model complexity: Some specifications with high-dimensional interactions
- Historical data: 1974 social norms may differ from contemporary patterns
If using this code for research, please cite:
- Fair, R. (1978). A Theory of Extramarital Affairs. Journal of Political Economy, 86(1), 45-61.
- McFadden, D. (1974). Conditional Logit Analysis of Qualitative Choice Behavior. In P. Zarembka (Ed.), Frontiers in Econometrics (pp. 105-142). Academic Press.
Contributions welcome! Please:
- Fork the repository
- Create a feature branch
- Add tests for new functionality
- Submit a pull request
This project is licensed under the MIT License - see the LICENSE file for details.
This notebook is designed for educational and research purposes. Real-world applications may require additional considerations and validation.