[Benchmark] Add dummy prefill support for DeepSeek V3 MLA profiling #7776
chang-wenbin wants to merge 6 commits
Conversation
Thanks for your contribution!
chang-wenbin does not appear to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account. Already signed the CLA but the status is still pending? Let us recheck it.
CI report generated from the code below (updated every 30 minutes):
1 Task overview
2 Task status summary
2.1 Required tasks: 7/10 passed
2.2 Optional tasks: 23/26 passed
3 Failure details (required only)
Approval: code review approval (confidence: high)
Fix summary: one of jiangjiajun / liuyuanle / chenjian26 / wanglongzhi needs to approve this PR.
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## develop #7776 +/- ##
==========================================
Coverage ? 71.48%
==========================================
Files ? 396
Lines ? 55862
Branches ? 8737
==========================================
Hits ? 39931
Misses ? 13173
Partials ? 2758
PaddlePaddle-bot left a comment
🤖 Paddle-CI-Agent | pr_review
2026-05-11 20:09:06
📋 Review Summary
PR overview: adds `FD_RUN_DUMMY_FOR_PROFILE` environment-variable support for running MoE models in dummy inference mode, implementing random gate replacement and end-to-end latency measurement
Scope of changes: fastdeploy/envs.py, model_executor/layers/moe/ (3 backends), worker/gpu_model_runner.py, worker/worker_process.py
Impact tags: [Benchmark] [OP]
📝 PR Convention Check
Two issues found:
- The `## Modifications` section describes files such as `linear.py` and `deepseek_v3.py` that do not appear in the diff; this is clearly leftover or mispasted content from an earlier version.
- `## Motivation` and `## Usage or Command` still use the old environment-variable name `RUN_DUMMY_FOR_PROFILE`, which is inconsistent with the `FD_RUN_DUMMY_FOR_PROFILE` the code actually uses.
Suggested title (copy-paste ready):
[Benchmark] Add FD_RUN_DUMMY_FOR_PROFILE support for MoE dummy profiling
Suggested PR description (copy-paste ready):
## Motivation
To support running MoE models (including DeepSeek V3) through a full forward pass in dummy mode (without loading real weights), for profiling and benchmarking. The environment variable `FD_RUN_DUMMY_FOR_PROFILE=1` enables end-to-end latency measurement and replaces the gate output with random data, avoiding the routing degeneration that all-zero weights would cause. This PR also removes the 32-token cap in EP profiling mode and migrates the old environment variable `RUN_DUMMY_FOR_PROFILE` to `FD_RUN_DUMMY_FOR_PROFILE`.
## Modifications
- `fastdeploy/envs.py`: add the `FD_RUN_DUMMY_FOR_PROFILE` environment variable (default 0) as the single switch for dummy profiling
- `fastdeploy/model_executor/layers/moe/fused_moe_blackwell_backend.py`: in `apply_ep_prefill`, `apply_ep_decode`, and `apply_tp`, replace the gate output with random data when `FD_RUN_DUMMY_FOR_PROFILE=1`
- `fastdeploy/model_executor/layers/moe/fused_moe_cutlass_backend.py`: add the dummy random replacement after the 4 `gate_out` computations
- `fastdeploy/model_executor/layers/moe/fused_moe_deepgemm_backend.py`: add the dummy random replacement after the 3 `gate_out` computations
- `fastdeploy/worker/gpu_model_runner.py`: apply the EP-mode `input_length <= 32` limit only when not in profiling mode; add a distributed barrier and an end-to-end latency print when `FD_RUN_DUMMY_FOR_PROFILE=1`
- `fastdeploy/worker/worker_process.py`: replace the old `os.getenv("RUN_DUMMY_FOR_PROFILE", "0")` with `envs.FD_RUN_DUMMY_FOR_PROFILE`
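The gate-replacement pattern described in the backend changes above can be sketched as follows. This is a minimal illustration, not the actual FastDeploy code: numpy stands in for paddle tensors, and the helper name `maybe_replace_gate_out` is hypothetical.

```python
import os
import numpy as np

def maybe_replace_gate_out(gate_out: np.ndarray) -> np.ndarray:
    """Replace the gate output with random logits when dummy profiling
    is enabled, so all-zero dummy weights do not collapse expert routing."""
    if os.getenv("FD_RUN_DUMMY_FOR_PROFILE", "0") == "1":
        return np.random.rand(*gate_out.shape).astype(gate_out.dtype)
    return gate_out

# With the flag off, the gate output passes through untouched.
os.environ["FD_RUN_DUMMY_FOR_PROFILE"] = "0"
gate = np.zeros((4, 8), dtype=np.float32)  # [num_tokens, num_experts]
assert maybe_replace_gate_out(gate) is gate

# With the flag on, random logits keep top-k expert selection
# well distributed even though the dummy gate weights are all zero.
os.environ["FD_RUN_DUMMY_FOR_PROFILE"] = "1"
out = maybe_replace_gate_out(gate)
assert out.shape == gate.shape and out.dtype == gate.dtype and out.any()
```

With zero-initialized dummy weights, every expert would otherwise receive an identical gate score, so top-k routing would send all tokens to the same experts and distort the measured latency.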
## Usage or Command
```shell
FD_RUN_DUMMY_FOR_PROFILE=1 python -m fastdeploy.worker.worker_process ...
```
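Under this flag, the runner change described above adds a barrier before timing so every rank starts the measurement together. A minimal sketch of that barrier-then-time pattern, with hypothetical `run_forward`/`barrier` hooks standing in for the real model-runner internals:

```python
import os
import time

def timed_dummy_forward(run_forward, barrier=None):
    """When FD_RUN_DUMMY_FOR_PROFILE=1, align all ranks at a barrier,
    then time one end-to-end forward pass and print the latency."""
    if os.getenv("FD_RUN_DUMMY_FOR_PROFILE", "0") != "1":
        return run_forward()
    if barrier is not None:
        barrier()  # start the measurement on every rank together
    start = time.perf_counter()
    result = run_forward()
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    print(f"dummy forward e2e: {elapsed_ms:.2f} ms")
    return result

os.environ["FD_RUN_DUMMY_FOR_PROFILE"] = "1"
calls = []
result = timed_dummy_forward(lambda: "logits", barrier=lambda: calls.append("sync"))
assert result == "logits" and calls == ["sync"]
```

Without the barrier, a slow rank would inflate the measured time on fast ranks that were already waiting inside the first collective of the forward pass.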
## Accuracy Tests
N/A (this PR adds a profiling/benchmarking tool and does not affect model inference accuracy)
## Checklist
- [ ] Add at least a tag in the PR title.
- Tag list: [`[FDConfig]`,`[APIServer]`,`[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`]
- You can add new tags based on the PR content, but the semantics must be clear.
- [ ] Format your code, run `pre-commit` before commit.
- [ ] Add unit tests. Please write the reason in this PR if no unit tests.
- [ ] Provide accuracy results.
- [ ] If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.
Issues
| Level | File | Summary |
|---|---|---|
| 📝 PR convention | N/A | The Modifications section describes files not in the diff; Motivation/Usage still reference the old environment-variable name RUN_DUMMY_FOR_PROFILE |
| 🟡 Suggestion | N/A | The fused_moe_triton/marlin/wint2_backend.py MoE backends do not add FD_RUN_DUMMY_FOR_PROFILE support, so dummy profiling behaves inconsistently when those backends are used |
| ❓ Question | worker_process.py:1336 | The comment above this line still references the old environment-variable name RUN_DUMMY_FOR_PROFILE |
Overall Assessment
The change is well structured overall: dummy profiling mode is correctly gated by an environment variable, and relaxing the EP limit is reasonable. The main issues are that the PR description does not match the actual changed files and needs updating, and that the triton/marlin/wint2 MoE backends have not received the same dummy gate support; they should be brought in line for consistency.
Motivation
To support running the DeepSeek V3 MLA model through a full forward pass in dummy mode (without loading real weights), for profiling and benchmarking. The environment variable `RUN_DUMMY_FOR_PROFILE=1` enables end-to-end latency measurement; the condition guarding value padding is also fixed.
Modifications
- `fastdeploy/model_executor/layers/linear.py`: `KVBatchLinear.__init__` directly creates zero-initialized `k_b_proj_weight`/`v_b_proj_weight` parameters when `load_choices == "dummy"`, so real weights need not be loaded
- `fastdeploy/model_executor/models/deepseek_v3.py`: guard the value padding in `forward` so the pad only runs when `qk_head_dim != v_head_dim`, avoiding useless computation
- `fastdeploy/worker/gpu_model_runner.py`: apply the EP-mode `input_length <= 32` limit only when not in profiling mode; add a distributed barrier and an end-to-end latency print when `RUN_DUMMY_FOR_PROFILE=1`
Usage or Command
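The conditional value padding described in the deepseek_v3.py change can be sketched as follows. This is an illustrative stand-in, not the model code: numpy replaces paddle tensors and the helper name `maybe_pad_value` is hypothetical.

```python
import numpy as np

def maybe_pad_value(v: np.ndarray, qk_head_dim: int, v_head_dim: int) -> np.ndarray:
    """Pad the value tensor's last dim up to qk_head_dim only when the
    two head dims differ; equal dims skip the (previously no-op) pad."""
    if qk_head_dim != v_head_dim:
        pad_width = [(0, 0)] * (v.ndim - 1) + [(0, qk_head_dim - v_head_dim)]
        return np.pad(v, pad_width)  # zero-pad the trailing dimension
    return v

v = np.ones((2, 4, 128), dtype=np.float32)  # [tokens, heads, v_head_dim]
assert maybe_pad_value(v, 192, 128).shape == (2, 4, 192)
assert maybe_pad_value(v, 128, 128) is v  # no copy when dims already match
```

The guard matters because an unconditional pad of width zero still allocates and copies a new tensor on every forward step.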
Accuracy Tests
N/A (this PR adds a profiling/benchmarking tool and does not affect model inference accuracy)
Checklist
- Tag list: [`[FDConfig]`, `[APIServer]`, `[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`]
- Format your code, run `pre-commit` before commit.
- If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.