Pull requests: alibaba/rtp-llm

Pull requests list

feat: support trtllm_fp4_block_scale_routed_moe
#446 opened Dec 11, 2025 by CrimsonDump
Feature/fix flexlb rolling issue
#445 opened Dec 11, 2025 by sunmiaozju
hotfix: fix try catch
#442 opened Dec 10, 2025 by wanglining97
hotfix: reuse cache hit bug
#441 opened Dec 10, 2025 by MMadhatter
refactor(rtp_llm): fix frontend loop stuck issue
#438 opened Dec 10, 2025 by sunmiaozju
fix - fix auto config deepep
#437 opened Dec 9, 2025 by alibaba-miji
[draft] master 0.0.2
#427 opened Dec 4, 2025 by jianglan89
feat: support pure tp + ep for cuda graph
#425 opened Dec 3, 2025 by JackTan25
support fp8 fmha for rocm pymodel
#422 opened Dec 3, 2025 by liaocz
refactor: optimize token reorder impl
#419 opened Dec 2, 2025 by MMadhatter
feature - add viztracer for inference api
#417 opened Dec 2, 2025 by jianglan89
Support DeepSeek v3.2 encoding module
#415 opened Dec 2, 2025 by soaringk
refactor gemm
#414 opened Dec 1, 2025 by fff-2013
fix: fix max context batch size
#412 opened Nov 28, 2025 by JackTan25
Performance Optimization for Beam Search
#411 opened Nov 28, 2025 by zhangjianning-zjn
fix: hold host buffer util next forward
#410 opened Nov 28, 2025 by Vinkle-hzt
feat: handle cuda oom error for py model
#406 opened Nov 26, 2025 by MMadhatter
fix: fix custom_ar bug for rocm
#402 opened Nov 25, 2025 by liaocz
feat: update ar & fuse mla reuse cache
#399 opened Nov 25, 2025 by Nancheng-11