@lizhenyun01 lizhenyun01 commented Dec 25, 2025

Motivation

  • Attention optimization and refactoring, part 1:
    • Refactor attention: merge the speculative-decoding and non-speculative-decoding branches to eliminate redundant logic
    • Split decoder_write_cache_with_rope into a standalone operator for easier maintenance
    • Add a decode attention backend; it currently supports only the D node under PD disaggregation
    • Optimize decode attention C8 kernel performance; after optimization, single-step speculative decoding with group_size=14 improves by 5%-113%
  • TODO:
    • Refactor RoPE and write_cache, and fuse the speculative-decoding and related branches
    • C16 and C4 support
    • Complete the unit tests
    • Gradually replace append_attention with the new backend
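The write-cache split above can be sketched roughly as follows. This is a minimal, hypothetical NumPy stand-in for the real CUDA operator (the function and argument names here are illustrative, not the actual FastDeploy API): RoPE is applied to the new key, then the key/value pair for one decode step is written into the cache, entirely independent of the attention kernel.

```python
import numpy as np

def rope(x, pos, base=10000.0):
    """Rotary position embedding on the last dim (must be even)."""
    d = x.shape[-1]
    inv_freq = base ** (-np.arange(0, d, 2) / d)
    angle = pos * inv_freq                      # (d/2,)
    cos, sin = np.cos(angle), np.sin(angle)
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = np.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

def decoder_write_cache_with_rope(k, v, pos, k_cache, v_cache):
    """Standalone write-cache op (hypothetical sketch): rotate the new
    key with RoPE, then write this decode step's key/value into the
    cache at position `pos`. Kept separate from attention so it can be
    maintained (and later fused or refactored) on its own."""
    k_cache[pos] = rope(k, pos)
    v_cache[pos] = v
```

Factoring the cache write out this way is what makes the TODO items (RoPE/write_cache refactor and fusion) local changes instead of edits inside a monolithic attention kernel.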
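Merging the speculative and non-speculative branches works because normal decode is just the q_len == 1 special case of speculative verification: with a causal mask over the trailing query positions, the q_len == 1 mask degenerates to a no-op, so no separate code path is needed. A minimal NumPy sketch of this idea (illustrative only; not the repository's kernel):

```python
import numpy as np

def unified_decode_attention(q, k_cache, v_cache):
    """One attention path for both normal and speculative decode.

    q:                (q_len, d) -- q_len == 1 for normal decode,
                      q_len > 1 when verifying draft tokens.
    k_cache, v_cache: (ctx_len, d) -- entries already written to cache.
    """
    q_len, d = q.shape
    ctx_len = k_cache.shape[0]
    scores = q @ k_cache.T / np.sqrt(d)          # (q_len, ctx_len)
    # Causal mask: the i-th trailing query may attend to the first
    # ctx_len - q_len + 1 + i cached positions.  For q_len == 1 the
    # limit equals ctx_len, so the mask keeps everything.
    pos = np.arange(ctx_len)[None, :]
    limit = (ctx_len - q_len + 1 + np.arange(q_len))[:, None]
    scores = np.where(pos < limit, scores, -np.inf)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v_cache                     # (q_len, d)
```

In the q_len == 1 case this reduces exactly to plain single-token decode attention, which is the property that lets the two branches collapse into one.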

Checklist

  • Add at least one tag in the PR title.
    • Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
    • You can add new tags based on the PR content, but the semantics must be clear.
  • Format your code; run pre-commit before committing.
  • Add unit tests. If no unit tests are added, please explain the reason in this PR.
  • Provide accuracy results.
  • If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.


paddle-bot bot commented Dec 25, 2025

Thanks for your contribution!
