Torch integration #692
Conversation
Pull request overview
This PR adds PyTorch integration to MSCCL++ through a comprehensive refactoring that introduces a new algorithm abstraction layer. The changes enable users to define custom collectives using both DSL (domain-specific language) and native CUDA/HIP kernels, with flexible algorithm selection at runtime.
Key changes include:
- New algorithm abstraction with `Algorithm`, `NativeAlgorithm`, and `DslAlgorithm` classes replacing the old `ExecutionPlanRegistry`
- Python bindings for algorithm execution and compilation
- Multiple built-in allreduce and allgather algorithm implementations
- Enhanced error handling with better diagnostic messages
- Support for FP8 data types and various reduce operations
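
As a concrete illustration of the workflow described above, the sketch below shows how one of the built-in allreduce algorithms might be invoked on a PyTorch tensor from Python. Only the `torch` calls are standard; the MSCCL++-specific steps are left as commented placeholders because the binding names used there (`get_algorithm`, `execute`, etc.) are assumptions for illustration, not the API verified from this PR.

```python
# Hypothetical usage sketch; the commented-out MSCCL++ calls use placeholder
# names, not the exact bindings added by this PR.
import torch
import torch.distributed as dist

# Standard PyTorch setup: one process per GPU, launched e.g. with torchrun.
dist.init_process_group(backend="nccl")
rank = dist.get_rank()
torch.cuda.set_device(rank)

x = torch.ones(1 << 20, dtype=torch.float16, device="cuda")

# --- Placeholder MSCCL++ steps (assumed API, for illustration only) ---
# comm = build_mscclpp_comm_from(dist.group.WORLD)        # bootstrap a communicator
# algo = comm.get_algorithm("allreduce", dtype=x.dtype)   # pick a built-in algorithm
# algo.execute(x.data_ptr(), x.data_ptr(), x.numel())     # in-place allreduce on x

dist.destroy_process_group()
```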
Reviewed changes
Copilot reviewed 58 out of 61 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| include/mscclpp/algorithm.hpp | Major refactor: new Algorithm interface with NativeAlgorithm and DslAlgorithm implementations |
| src/algorithms/algorithm.cc | Implementation of algorithm framework with default algorithm builders |
| src/include/algorithms/utils.hpp | New utility header for algorithm setup (missing license) |
| src/include/algorithms/allreduce/common.hpp | Comprehensive allreduce operations with FP8 support |
| src/algorithms/allreduce/*.cu | Multiple allreduce algorithm implementations |
| src/algorithms/allgather/*.cu | Allgather algorithm implementations |
| python/mscclpp/_algorithm.py | Python wrapper for algorithm execution |
| python/mscclpp/_compiler.py | DSL and native code compilation support |
| python/csrc/algorithm.cpp | Python C++ bindings for algorithms |
| examples/torch-integration/*.py | Example integration with PyTorch |
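
The example scripts under `examples/torch-integration/` presumably wrap such algorithm calls behind a thin helper. The generic pattern below is a sketch of that idea, not the PR's actual example code: the pluggable backend callable would be backed by an MSCCL++ algorithm (native or DSL-compiled), with `torch.distributed` as a fallback.

```python
# Generic wrapper pattern (illustrative, not taken from the PR's examples):
# route a tensor through a pluggable in-place allreduce backend.
from typing import Callable, Optional
import torch
import torch.distributed as dist

# A backend performs an in-place allreduce on the tensor; in this PR it would
# be an MSCCL++ algorithm selected at runtime.
AllreduceBackend = Callable[[torch.Tensor], None]

def allreduce(tensor: torch.Tensor, backend: Optional[AllreduceBackend] = None) -> torch.Tensor:
    if backend is not None:
        backend(tensor)          # custom MSCCL++ path
    else:
        dist.all_reduce(tensor)  # stock PyTorch/NCCL path
    return tensor
```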
/azp run

Azure Pipelines successfully started running 3 pipeline(s).

/azp run

Azure Pipelines successfully started running 3 pipeline(s).

/azp run

Azure Pipelines successfully started running 3 pipeline(s).
- Reorganize the current native algorithm and DSL algorithm implementations.
- Provide a unified API for DSL and native algorithms, along with an interface for tuning algorithm selection.
- Provide an interface for PyTorch integration through both the native API and the DSL.
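
In spirit, the tuning interface could resemble a size-based selector like the hypothetical sketch below; the algorithm names and thresholds are illustrative only, not those shipped in this PR.

```python
# Hypothetical tuning hook: choose an algorithm per collective and message size.
def select_algorithm(collective: str, nbytes: int) -> str:
    if collective == "allreduce":
        # Small messages are latency-bound; large messages are bandwidth-bound.
        return "allreduce_small" if nbytes <= (1 << 20) else "allreduce_ring"
    if collective == "allgather":
        return "allgather_ring"
    raise ValueError(f"no algorithm registered for {collective}")

assert select_algorithm("allreduce", 4096) == "allreduce_small"
```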