alibaba / rtp-llm Public

Fork 128
Star 944

Pull requests: alibaba/rtp-llm


59 Open 274 Closed

Pull requests list

feat: support trtllm_fp4_block_scale_routed_moe

#446 opened Dec 11, 2025 by CrimsonDump

Feature/fix flexlb rolling issue

#445 opened Dec 11, 2025 by sunmiaozju

feat: support python-xqa with CUDA 12.9 and compatible with CUDA 12.6

#444 opened Dec 11, 2025 by qqbbiu

hotfix: fix try catch

#442 opened Dec 10, 2025 by wanglining97

hotfix: reuse cache hit bug

#441 opened Dec 10, 2025 by MMadhatter

refactor(rtp_llm): fix frontend loop stuck issue

#438 opened Dec 10, 2025 by sunmiaozju

fix - fix auto config deepep

#437 opened Dec 9, 2025 by alibaba-miji

feat: add process-isolated logging with rank_id and server_id

#436 opened Dec 5, 2025 by sunmiaozju

fix: modify pre_decoder_residual under multimodalEmbedding input

#433 opened Dec 5, 2025 by junna2016

[draft] master 0.0.2

#427 opened Dec 4, 2025 by jianglan89

fix: wrong residual when pre decoder layernorm + mm embedding + quant

#426 opened Dec 4, 2025 by LLLLKKKK

feat: support pure tp + ep for cuda graph

#425 opened Dec 3, 2025 by JackTan25

fix - backend server not shutdown graceful in mulit rank case

#424 opened Dec 3, 2025 by jianglan89

support fp8 fmha for rocm pymodel

#422 opened Dec 3, 2025 by liaocz

feat: support return raw output and output ids in debug info

#421 opened Dec 2, 2025 by soaringk

refactor: optimize token reorder impl

#419 opened Dec 2, 2025 by MMadhatter

feature - add viztracer for inference api

#417 opened Dec 2, 2025 by jianglan89

Support DeepSeek v3.2 encoding module

#415 opened Dec 2, 2025 by soaringk

refactor gemm

#414 opened Dec 1, 2025 by fff-2013

fix: fix max context batch size

#412 opened Nov 28, 2025 by JackTan25

Performance Optimization for Beam Search

#411 opened Nov 28, 2025 by zhangjianning-zjn

fix: hold host buffer util next forward

#410 opened Nov 28, 2025 by Vinkle-hzt

feat: handle cuda oom error for py model

#406 opened Nov 26, 2025 by MMadhatter

fix: fix custom_ar bug for rocm

#402 opened Nov 25, 2025 by liaocz

feat: update ar & fuse mla reuse cache

#399 opened Nov 25, 2025 by Nancheng-11
