Contributor

@Binyang2014 Binyang2014 commented Nov 19, 2025

Reorganize the current native and DSL algorithm implementations.
Provide a unified API for DSL and native algorithms, along with an interface for tuning them.
Provide an interface for PyTorch integration with both the native API and the DSL.

Contributor

Copilot AI left a comment

Pull request overview

This PR adds PyTorch integration to MSCCL++ through a comprehensive refactoring that introduces a new algorithm abstraction layer. The changes enable users to define custom collectives using both DSL (domain-specific language) and native CUDA/HIP kernels, with flexible algorithm selection at runtime.

Key changes include:

  • New algorithm abstraction with Algorithm, NativeAlgorithm, and DslAlgorithm classes replacing the old ExecutionPlanRegistry
  • Python bindings for algorithm execution and compilation
  • Multiple built-in allreduce and allgather algorithm implementations
  • Enhanced error handling with better diagnostic messages
  • Support for FP8 data types and various reduce operations
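The abstraction described in the first bullet can be pictured with a small sketch. This is an illustration only, not the MSCCL++ API: the class names mirror the ones mentioned in the review, but every constructor argument, method name (`supports`, `execute`), the `select_algorithm` helper, and the size thresholds are assumptions made for this example. Real algorithms launch GPU kernels or run compiled execution plans; here both are replaced by pure-Python stand-ins operating on per-rank lists.

```python
from abc import ABC, abstractmethod
from typing import Callable, List

Buffers = List[List[float]]  # one list of values per rank (stand-in for GPU buffers)


class Algorithm(ABC):
    """Hypothetical common interface; not the actual MSCCL++ class."""

    def __init__(self, name: str, collective: str, min_bytes: int, max_bytes: int):
        self.name = name
        self.collective = collective
        self.min_bytes = min_bytes
        self.max_bytes = max_bytes

    def supports(self, collective: str, nbytes: int) -> bool:
        # An algorithm applies to one collective and a half-open size range.
        return collective == self.collective and self.min_bytes <= nbytes < self.max_bytes

    @abstractmethod
    def execute(self, buffers: Buffers) -> Buffers:
        ...


class NativeAlgorithm(Algorithm):
    """Wraps a callable that stands in for a native kernel launcher."""

    def __init__(self, name, collective, min_bytes, max_bytes,
                 kernel: Callable[[Buffers], Buffers]):
        super().__init__(name, collective, min_bytes, max_bytes)
        self._kernel = kernel

    def execute(self, buffers: Buffers) -> Buffers:
        return self._kernel(buffers)


class DslAlgorithm(Algorithm):
    """Interprets a list of step names that stands in for a DSL execution plan."""

    def __init__(self, name, collective, min_bytes, max_bytes, plan: List[str]):
        super().__init__(name, collective, min_bytes, max_bytes)
        self._plan = plan

    def execute(self, buffers: Buffers) -> Buffers:
        state = [list(b) for b in buffers]
        for step in self._plan:
            if step == "reduce_to_rank0":
                state[0] = [sum(col) for col in zip(*state)]
            elif step == "broadcast_from_rank0":
                state = [list(state[0]) for _ in state]
            else:
                raise ValueError(f"unknown plan step: {step}")
        return state


def sum_allreduce(buffers: Buffers) -> Buffers:
    # Element-wise sum across ranks, replicated to every rank.
    total = [sum(col) for col in zip(*buffers)]
    return [list(total) for _ in buffers]


def select_algorithm(algorithms: List[Algorithm], collective: str, nbytes: int) -> Algorithm:
    # Runtime selection: the first registered algorithm that supports the request wins.
    for algo in algorithms:
        if algo.supports(collective, nbytes):
            return algo
    raise LookupError(f"no algorithm for {collective} with {nbytes} bytes")


# Illustrative registration; the 1024-byte threshold is arbitrary.
ALGORITHMS = [
    NativeAlgorithm("native_small", "allreduce", 0, 1024, sum_allreduce),
    DslAlgorithm("dsl_large", "allreduce", 1024, 1 << 40,
                 ["reduce_to_rank0", "broadcast_from_rank0"]),
]
```

The point of the sketch is that callers go through `select_algorithm` and `execute` without knowing whether the chosen algorithm is backed by native code or by a DSL plan. That is the kind of uniformity a shared `Algorithm` interface is meant to provide.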

Reviewed changes

Copilot reviewed 58 out of 61 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| include/mscclpp/algorithm.hpp | Major refactor: new Algorithm interface with NativeAlgorithm and DslAlgorithm implementations |
| src/algorithms/algorithm.cc | Implementation of algorithm framework with default algorithm builders |
| src/include/algorithms/utils.hpp | New utility header for algorithm setup (missing license) |
| src/include/algorithms/allreduce/common.hpp | Comprehensive allreduce operations with FP8 support |
| src/algorithms/allreduce/*.cu | Multiple allreduce algorithm implementations |
| src/algorithms/allgather/*.cu | Allgather algorithm implementations |
| python/mscclpp/_algorithm.py | Python wrapper for algorithm execution |
| python/mscclpp/_compiler.py | DSL and native code compilation support |
| python/csrc/algorithm.cpp | Python C++ bindings for algorithms |
| examples/torch-integration/*.py | Example integration with PyTorch |


Binyang2014 and others added 2 commits December 3, 2025 05:38
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@Binyang2014 Binyang2014 marked this pull request as ready for review December 3, 2025 18:55
@Binyang2014
Contributor Author

/azp run

@azure-pipelines

Azure Pipelines successfully started running 3 pipeline(s).

@Binyang2014
Contributor Author

/azp run

@azure-pipelines

Azure Pipelines successfully started running 3 pipeline(s).

@Binyang2014
Contributor Author

/azp run

@azure-pipelines

Azure Pipelines successfully started running 3 pipeline(s).
