Transform messy police crash report data into clean, standardized categories for Power BI, Tableau & ArcGIS.
- MMUCC Standards - 15 categories based on NHTSA Model Minimum Uniform Crash Criteria
- Smart Matching - Regex → Fuzzy with input preprocessing (handles typos, punctuation, suffixes)
- Power BI Ready - Exports fact tables + dimension tables
- Confidence Scores - Flag uncertain matches for human review
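To make the matching and confidence features concrete, here is a minimal sketch of a regex-then-fuzzy matcher. It is not the code in matching_engine.py. The names `preprocess`, `match_value`, and `SYNONYMS`, the placeholder codes `X1` and `X2`, and the review threshold of 80 are all assumptions made for illustration, and the standard library's `difflib` stands in for whatever fuzzy matcher the app actually uses.

```python
import re
from difflib import SequenceMatcher

# Hypothetical synonym table: code -> accepted phrases.
# "X1" and "X2" are placeholder codes, not real MMUCC codes.
SYNONYMS = {
    "3": ["rear end", "rear-end"],
    "X1": ["head on"],
    "X2": ["sideswipe"],
}


def preprocess(text: str) -> str:
    """Lowercase, turn punctuation into spaces, and collapse whitespace."""
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    return re.sub(r"\s+", " ", text).strip()


def match_value(raw: str, synonyms: dict, review_below: int = 80) -> dict:
    """Try an exact-phrase regex match first, then fall back to fuzzy scoring."""
    cleaned = preprocess(raw)
    candidates = [
        (preprocess(phrase), code)
        for code, phrases in synonyms.items()
        for phrase in phrases
    ]

    # Pass 1: whole-phrase regex match counts as full confidence.
    for phrase, code in candidates:
        if re.search(rf"\b{re.escape(phrase)}\b", cleaned):
            return {"code": code, "confidence": 100, "needs_review": False}

    # Pass 2: fuzzy similarity on a 0-100 scale; keep the best candidate.
    best_code, best_score = None, 0
    for phrase, code in candidates:
        score = round(SequenceMatcher(None, cleaned, phrase).ratio() * 100)
        if score > best_score:
            best_code, best_score = code, score

    # Assumed rule: anything under the threshold is flagged for a human.
    return {
        "code": best_code,
        "confidence": best_score,
        "needs_review": best_score < review_below,
    }
```

With this sketch, `match_value("Rear-End.", SYNONYMS)` resolves on the regex pass, a typo such as `"rear ennd"` falls through to the fuzzy pass, and an unrelated value such as `"unknown"` comes back flagged for review.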
# Install dependencies
pip install -r requirements.txt
# Run the app
streamlit run app.py

App opens at http://localhost:8501
- Upload - CSV or Excel crash data
- Map Columns - Select which MMUCC category each column should match
- Review - Check flagged items, edit as needed
- Export - Download Excel (with dimension tables) or CSV
| File | Purpose |
|---|---|
| app.py | Main Streamlit application |
| mmucc_loader.py | Dictionary loader |
| mmucc_dictionaries.json | MMUCC categories, codes, and synonyms |
| matching_engine.py | Regex/fuzzy matching with preprocessing |
- Manner of Collision
- Injury Severity (KABCO)
- Weather Condition
- Light Condition
- Road Surface Condition
- First Harmful Event
- Contributing Factors (Driver)
- Contributing Factors (Environment/Road)
- Distracted By
- Condition at Time
- Junction Type
- Vehicle Body Type
- Trafficway Type
- Traffic Control Device
- Pre-Crash Maneuver
Edit mmucc_dictionaries.json to add new synonyms:
"manner_of_collision": {
  "synonyms": {
    "3": ["rear end", "rear-end", "YOUR NEW SYNONYM HERE"],
    ...
  }
}

- Crash_Data sheet: Original data + standardized columns
- Dim_* sheets: Lookup tables for each category (for Power BI relationships)
- Needs_Review sheet: Flagged items requiring human review
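The three kinds of sheet listed above can be pictured as a simple split of the matched rows. The sketch below uses plain Python structures rather than the app's actual export code. `build_sheets`, the `Crash_ID` field, the sample labels, the Dim table's column names, and the threshold of 80 are assumptions for illustration only.

```python
def build_sheets(rows, labels, column, review_below=80):
    """Group matched rows into Crash_Data, Dim_<column>, and Needs_Review."""
    # Dimension table: one row per code, usable as a lookup in Power BI.
    dim = [
        {f"{column}_Code": code, f"{column}_Standardized": label}
        for code, label in sorted(labels.items())
    ]
    # Rows whose confidence falls below the (assumed) threshold need review.
    needs_review = [
        row for row in rows if row[f"{column}_Confidence"] < review_below
    ]
    return {
        "Crash_Data": rows,
        f"Dim_{column}": dim,
        "Needs_Review": needs_review,
    }


# Illustrative data only; codes and labels are placeholders.
labels = {"1": "Clear", "2": "Cloudy"}
rows = [
    {"Crash_ID": 101, "Weather": "clear skies", "Weather_Code": "1",
     "Weather_Standardized": "Clear", "Weather_Confidence": 95},
    {"Crash_ID": 102, "Weather": "cldy??", "Weather_Code": "2",
     "Weather_Standardized": "Cloudy", "Weather_Confidence": 62},
]
sheets = build_sheets(rows, labels, "Weather")
```

Here `sheets["Needs_Review"]` holds only the second row, while `sheets["Dim_Weather"]` carries one lookup row per code.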
Standardized columns appear immediately after their source column for easy review:
Weather | Weather_Code | Weather_Standardized | Weather_Confidence | Light | Light_Code | ...
- {Column}_Code - Numeric MMUCC code
- {Column}_Standardized - Standard label
- {Column}_Confidence - Match confidence (0-100)
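As a rough illustration of that column layout, the sketch below computes an output column order in which each mapped source column is followed by its three standardized columns. `place_standardized_columns` and the `Crash_ID` column are hypothetical names, not part of the app.

```python
def place_standardized_columns(columns, mapped):
    """Return the column order with the three derived columns after each mapped source column."""
    suffixes = ("Code", "Standardized", "Confidence")
    ordered = []
    for name in columns:
        ordered.append(name)
        if name in mapped:
            # Derived columns sit directly after their source column.
            ordered.extend(f"{name}_{suffix}" for suffix in suffixes)
    return ordered


layout = place_standardized_columns(
    ["Crash_ID", "Weather", "Light"],  # original columns (Crash_ID is made up)
    {"Weather", "Light"},              # columns mapped to an MMUCC category
)
```

Unmapped columns such as `Crash_ID` pass through unchanged, so the original data stays intact alongside the standardized values.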
- Python 3.10+
- ~50MB disk space
Free to use and adapt for internal business, personal, or educational use. Please don’t sell it or turn it into a paid product.
Licensed under Creative Commons Attribution-NonCommercial 4.0 (CC BY-NC 4.0).
Found a bug or have a suggestion? Send feedback to contact@alexengineered.com