Analytics pipeline that processes GitHub Archive data to surface insights about developer activity and repository health.
GitHub Archive (BigQuery public dataset)
↓
dbt models
↓
Looker Studio
Pulls 7 days of GitHub events (~25M rows) and runs daily via GitHub Actions.
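The 7-day window can be pictured as a set of daily tables. GitHub Archive's public BigQuery dataset publishes one table per day under `githubarchive.day`, named `YYYYMMDD`, so a run touches seven of them. A minimal sketch, assuming the window covers the seven complete days before the run date; the function name `window_tables` and that end-of-window choice are illustrative and not taken from this project's models:

```python
from datetime import date, timedelta


def window_tables(run_date: date, days: int = 7) -> list[str]:
    """Return the daily GitHub Archive table names for the `days` full days before `run_date`."""
    # Assumption: only complete days are included, so the window ends yesterday.
    start = run_date - timedelta(days=days)
    return [
        f"githubarchive.day.{(start + timedelta(days=i)):%Y%m%d}"
        for i in range(days)
    ]
```

For example, `window_tables(date(2024, 3, 8))` starts at `githubarchive.day.20240301` and ends at `githubarchive.day.20240307`.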
Staging
- stg_github_events - Raw events from GitHub Archive
- stg_github_pushes - Push events with bots filtered out
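The rule used to filter bots in stg_github_pushes is not spelled out here. One common heuristic, shown as a sketch rather than the model's actual logic, relies on GitHub App accounts having logins that end in `[bot]`. The extra `-bot` suffix check, the flattened `actor_login` field, and both function names are assumptions for illustration:

```python
def is_bot(login: str) -> bool:
    """Heuristic bot check on an actor login (illustrative, not the project's rule)."""
    lowered = login.lower()
    # GitHub App accounts use a "[bot]" suffix, e.g. "dependabot[bot]".
    # The "-bot" suffix is an extra assumption that catches some service accounts.
    return lowered.endswith("[bot]") or lowered.endswith("-bot")


def human_pushes(events: list[dict]) -> list[dict]:
    """Keep only PushEvent rows whose actor does not look like a bot."""
    return [
        e for e in events
        if e["type"] == "PushEvent" and not is_bot(e["actor_login"])
    ]
```

A suffix check like this misses bots with ordinary-looking logins, so a real filter may also need an explicit deny list.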
Analytics
- developer_activity_metrics - Developer stats: push counts, streaks, activity tiers
- repository_health_scores - Repo scoring: bus factor, retention, health tiers
- contribution_patterns - Activity heatmap by hour/day
- event_type_trends - Event composition over time
- daily_activity / event_breakdown - Summary tables
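The exact formulas for these metrics live in the dbt models and are not given here. As a rough illustration of two of them, the sketch below uses one common definition of bus factor (the fewest contributors who together account for at least half of a repository's pushes) and a longest-run definition of streak (consecutive calendar days with at least one push). The 50% threshold and both function names are assumptions:

```python
from collections import Counter
from datetime import date, timedelta


def bus_factor(push_authors: list[str], threshold: float = 0.5) -> int:
    """Fewest contributors whose pushes cover at least `threshold` of the total."""
    counts = Counter(push_authors)
    total = sum(counts.values())
    covered = 0
    # Walk contributors from most to least active until the threshold is met.
    for rank, (_, n) in enumerate(counts.most_common(), start=1):
        covered += n
        if covered >= threshold * total:
            return rank
    return 0  # no pushes at all


def longest_streak(push_days: set[date]) -> int:
    """Length of the longest run of consecutive days that each have a push."""
    best = 0
    for day in push_days:
        # Only start counting at the first day of a run.
        if day - timedelta(days=1) in push_days:
            continue
        length = 1
        while day + timedelta(days=length) in push_days:
            length += 1
        best = max(best, length)
    return best
```

Under this definition, a low bus factor means pushes are concentrated in very few people, which is why it is a plausible input to a repository health score.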
cd codepulse_analytics
export GCP_PROJECT_ID=your-project
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/key.json
dbt run --profiles-dir .

- dbt + BigQuery
- GitHub Actions (scheduled runs)
- Looker Studio (dashboards)