Torch integration #692
Conversation
Pull request overview
This PR adds PyTorch integration to MSCCL++ through a comprehensive refactoring that introduces a new algorithm abstraction layer. The changes enable users to define custom collectives using both DSL (domain-specific language) and native CUDA/HIP kernels, with flexible algorithm selection at runtime.
Key changes include:
- New algorithm abstraction with `Algorithm`, `NativeAlgorithm`, and `DslAlgorithm` classes replacing the old `ExecutionPlanRegistry`
- Python bindings for algorithm execution and compilation
- Multiple built-in allreduce and allgather algorithm implementations
- Enhanced error handling with better diagnostic messages
- Support for FP8 data types and various reduce operations
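
As a concrete illustration of the workflow described above, the sketch below shows how one of the built-in allreduce algorithms might be invoked on a PyTorch tensor from Python. Only the `torch` calls are standard; the MSCCL++-specific steps are left as commented placeholders because the binding names used there (`get_algorithm`, `execute`, etc.) are assumptions for illustration, not the API verified from this PR.

```python
# Hypothetical usage sketch; the commented-out MSCCL++ calls use placeholder
# names, not the exact bindings added by this PR.
import torch
import torch.distributed as dist

# Standard PyTorch setup: one process per GPU, launched e.g. with torchrun.
dist.init_process_group(backend="nccl")
rank = dist.get_rank()
torch.cuda.set_device(rank)

x = torch.ones(1 << 20, dtype=torch.float16, device="cuda")

# --- Placeholder MSCCL++ steps (assumed API, for illustration only) ---
# comm = build_mscclpp_comm_from(dist.group.WORLD)        # bootstrap a communicator
# algo = comm.get_algorithm("allreduce", dtype=x.dtype)   # pick a built-in algorithm
# algo.execute(x.data_ptr(), x.data_ptr(), x.numel())     # in-place allreduce on x

dist.destroy_process_group()
```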
Reviewed changes
Copilot reviewed 58 out of 61 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| include/mscclpp/algorithm.hpp | Major refactor: new Algorithm interface with NativeAlgorithm and DslAlgorithm implementations |
| src/algorithms/algorithm.cc | Implementation of algorithm framework with default algorithm builders |
| src/include/algorithms/utils.hpp | New utility header for algorithm setup (missing license) |
| src/include/algorithms/allreduce/common.hpp | Comprehensive allreduce operations with FP8 support |
| src/algorithms/allreduce/*.cu | Multiple allreduce algorithm implementations |
| src/algorithms/allgather/*.cu | Allgather algorithm implementations |
| python/mscclpp/_algorithm.py | Python wrapper for algorithm execution |
| python/mscclpp/_compiler.py | DSL and native code compilation support |
| python/csrc/algorithm.cpp | Python C++ bindings for algorithms |
| examples/torch-integration/*.py | Example integration with PyTorch |
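
The example scripts under `examples/torch-integration/` presumably wrap such algorithm calls behind a thin helper. The generic pattern below is a sketch of that idea, not the PR's actual example code: the pluggable backend callable would be backed by an MSCCL++ algorithm (native or DSL-compiled), with `torch.distributed` as a fallback.

```python
# Generic wrapper pattern (illustrative, not taken from the PR's examples):
# route a tensor through a pluggable in-place allreduce backend.
from typing import Callable, Optional
import torch
import torch.distributed as dist

# A backend performs an in-place allreduce on the tensor; in this PR it would
# be an MSCCL++ algorithm selected at runtime.
AllreduceBackend = Callable[[torch.Tensor], None]

def allreduce(tensor: torch.Tensor, backend: Optional[AllreduceBackend] = None) -> torch.Tensor:
    if backend is not None:
        backend(tensor)          # custom MSCCL++ path
    else:
        dist.all_reduce(tensor)  # stock PyTorch/NCCL path
    return tensor
```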
/azp run

Azure Pipelines successfully started running 3 pipeline(s).

/azp run

Azure Pipelines successfully started running 3 pipeline(s).

/azp run

Azure Pipelines successfully started running 3 pipeline(s).
- Reorganize the current native algorithm and DSL algorithm implementations.
- Provide a unified API for DSL and native algorithms, along with an interface for tuning algorithm selection.
- Provide an interface for PyTorch integration through both the native API and the DSL.
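
In spirit, the tuning interface could resemble a size-based selector like the hypothetical sketch below; the algorithm names and thresholds are illustrative only, not those shipped in this PR.

```python
# Hypothetical tuning hook: choose an algorithm per collective and message size.
def select_algorithm(collective: str, nbytes: int) -> str:
    if collective == "allreduce":
        # Small messages are latency-bound; large messages are bandwidth-bound.
        return "allreduce_small" if nbytes <= (1 << 20) else "allreduce_ring"
    if collective == "allgather":
        return "allgather_ring"
    raise ValueError(f"no algorithm registered for {collective}")

assert select_algorithm("allreduce", 4096) == "allreduce_small"
```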