GPU concurrency #399
Pull request overview
This PR implements dynamic GPU runner selection to enable concurrent GitHub Actions workflow runs across multiple GPUs. Instead of using a single fixed runner, the workflow now distributes jobs across 5 different GPU runners using a round-robin selection based on the GitHub run number.
Changes:
- Added a `select-runner` job that picks one of 5 available GPU runners using modulo-based distribution
- Updated the main `run` job to use the dynamically selected runner and added concurrency controls per runner
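The modulo-based distribution described above can be sketched in bash as follows. This is a minimal illustration, not the PR's exact step: the runner labels are taken from the diff in this review, while the `select_runner` helper name and the sample run numbers are assumptions made for the example (in the workflow, the number comes from `${{ github.run_number }}`).

```shell
#!/usr/bin/env bash
# Runner labels as listed in this PR's diff.
runners=("b200-02-gpu0" "b200-02-gpu2" "b200-02-gpu4" "b200-02-gpu5" "b200-02-gpu7")

# select_runner is a hypothetical helper; the workflow inlines this logic in a step.
# It maps a run number onto one of the 5 runners by taking it modulo 5.
select_runner() {
  local run_number=$1
  echo "${runners[run_number % 5]}"
}

# Five consecutive (made-up) run numbers land on five different runners,
# which is what lets back-to-back workflow runs execute concurrently.
for n in 100 101 102 103 104; do
  echo "run $n -> $(select_runner "$n")"
done
```

Because the run number increases by one per workflow run, consecutive runs cycle through the list in order. Two runs only share a runner when their run numbers differ by a multiple of 5.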
runner: ${{ steps.pick.outputs.runner }}
steps:
- id: pick
  run: |
Copilot AI, Jan 14, 2026
The runner list skips gpu1, gpu3, and gpu6 without explanation. Consider adding a comment explaining why these specific GPUs are selected, or use a more consistent naming pattern if possible.
Suggested change:
- run: |
+ run: |
+   # Only schedule jobs on GPUs that are available for CI use.
+   # gpu1, gpu3, and gpu6 on b200-02 are reserved/unavailable, so they are intentionally excluded.
- id: pick
  run: |
    runners=("b200-02-gpu0" "b200-02-gpu2" "b200-02-gpu4" "b200-02-gpu5" "b200-02-gpu7")
    index=$(( ${{ github.run_number }} % 5 ))
Copilot AI, Jan 14, 2026
The modulo value is hardcoded as 5. If runners are added or removed from the array, this must be manually updated. Consider using ${#runners[@]} to automatically match the array length.
Suggested change:
- index=$(( ${{ github.run_number }} % 5 ))
+ index=$(( ${{ github.run_number }} % ${#runners[@]} ))
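To illustrate why deriving the modulus from the array length is more robust, here is a small standalone sketch. The `pick_runner` helper and the extra `example-extra-runner` label are hypothetical and exist only for the demonstration; the first five labels are the ones from the diff.

```shell
#!/usr/bin/env bash
# Runner labels as listed in this PR's diff.
runners=("b200-02-gpu0" "b200-02-gpu2" "b200-02-gpu4" "b200-02-gpu5" "b200-02-gpu7")

# pick_runner is a hypothetical helper name used for this sketch.
# The modulus is ${#runners[@]}, the current number of array elements,
# so it never has to be edited by hand when the list changes.
pick_runner() {
  local run_number=$1
  local index=$(( run_number % ${#runners[@]} ))
  echo "${runners[index]}"
}

pick_runner 7    # 5 runners: index is 7 % 5 = 2

# Appending a hypothetical sixth runner changes the modulus automatically.
runners+=("example-extra-runner")
pick_runner 11   # 6 runners: index is 11 % 6 = 5
```

With a hardcoded `% 5`, the appended runner would never be selected. If a runner were removed instead, an index of 4 would point past the end of the array and expand to an empty string.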
No description provided.