[Benchmark] Add dummy prefill support for DeepSeek V3 MLA profiling #7776

Open: chang-wenbin wants to merge 6 commits into PaddlePaddle:develop from chang-wenbin:dummy_prefill

Conversation

@chang-wenbin (Collaborator) commented May 11, 2026

Motivation

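Allow the DeepSeek V3 MLA model to complete a forward pass in dummy mode (without loading real weights), for performance profiling and benchmarking. Setting the environment variable RUN_DUMMY_FOR_PROFILE=1 enables end-to-end latency measurement. This PR also corrects the condition that guards value padding.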

Modifications

  • fastdeploy/model_executor/layers/linear.py: in KVBatchLinear.__init__, when load_choices == "dummy", create zero-initialized k_b_proj_weight / v_b_proj_weight parameters directly, so no real weights need to be loaded
  • fastdeploy/model_executor/models/deepseek_v3.py: in forward, guard the value padding with a condition so that the pad runs only when qk_head_dim != v_head_dim, avoiding wasted computation
  • fastdeploy/worker/gpu_model_runner.py: apply the EP-mode input_length <= 32 limit only outside profiling mode; when RUN_DUMMY_FOR_PROFILE=1, add a distributed barrier and print the end-to-end elapsed time

Usage or Command

RUN_DUMMY_FOR_PROFILE=1 python -m fastdeploy.worker.worker_process ...
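As a rough illustration of what the flag switches on, the sketch below times one step between two barriers when the variable is set to 1. The helper names (`profile_enabled`, `run_with_timing`), the `barrier` callable, and the log format are assumptions for illustration, not code from this PR; on a GPU a device synchronization would also be needed before reading the clock.

```python
import os
import time
from typing import Any, Callable, Mapping, Optional, Tuple


def profile_enabled(env: Optional[Mapping[str, str]] = None,
                    var_name: str = "RUN_DUMMY_FOR_PROFILE") -> bool:
    """Return True when the profiling switch is set to "1"."""
    env = os.environ if env is None else env
    return env.get(var_name, "0") == "1"


def run_with_timing(step: Callable[[], Any],
                    barrier: Optional[Callable[[], None]] = None,
                    env: Optional[Mapping[str, str]] = None) -> Tuple[Any, Optional[float]]:
    """Run `step`; in profiling mode, bracket it with barriers and time it."""
    if not profile_enabled(env):
        return step(), None
    if barrier is not None:
        barrier()  # line all ranks up before starting the clock
    start = time.perf_counter()
    result = step()
    if barrier is not None:
        barrier()  # wait for the slowest rank before stopping the clock
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    print(f"[dummy profile] end-to-end step time: {elapsed_ms:.3f} ms")
    return result, elapsed_ms


barrier_calls = []
out, ms = run_with_timing(lambda: 42,
                          barrier=lambda: barrier_calls.append(1),
                          env={"RUN_DUMMY_FOR_PROFILE": "1"})
```

Placing a barrier on both sides means the printed figure reflects the slowest rank rather than whichever rank happens to finish first.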

Accuracy Tests

N/A (this PR is a profiling/benchmarking tool and does not affect model inference accuracy)

Checklist

  • Add at least a tag in the PR title.
    • Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
    • You can add new tags based on the PR content, but the semantics must be clear.
  • Format your code, run pre-commit before commit.
  • Add unit tests. Please write the reason in this PR if no unit tests.
  • Provide accuracy results.
  • If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

@paddle-bot (bot) commented May 11, 2026

Thanks for your contribution!

@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.


chang-wenbin seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

PaddlePaddle-bot

This comment was marked as outdated.

@PaddlePaddle-bot

PaddlePaddle-bot commented May 11, 2026

🤖 Paddle-CI-Agent | ci_status_monitor | 2026-05-11 20:44:34

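This CI report was generated from the following code (refreshed every 30 minutes):


1 Task overview

⚠️ 1 required task has failed and 2 more required tasks are still running; watch for their results.

| Total runs (reruns) | Total tasks | ✅ Passed | ❌ Failed | ⏳ Running | ⏸️ Waiting | Skipped |
|---|---|---|---|---|---|---|
| 36 (0) | 36 | 30 | 2 | 3 | 1 | 0 |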

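2 Task status summary

2.1 Required tasks: 7/10 passed

Required tasks block the merge; failures must be handled first.

| Status | Task | Duration | Root cause | Suggested fix | Log | Rerun |
|---|---|---|---|---|---|---|
| ❌ | Approval | 7s | PR issue: fastdeploy/envs.py was modified without RD approval | Ask jiangjiajun or another RD member to approve | Job | - |
| ⏳ | run_tests_with_coverage | - | Running | - | Job | - |
| ⏳ | xpu_4cards_case_test / run_xpu_4cards_cases | - | Running | - | Job | - |
| | The other 7 required tasks passed | - | - | - | - | - |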

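2.2 Optional tasks: 23/26 passed

Optional tasks do not block the merge; failures are for reference only.

| Status | Task | Duration | Log | Rerun |
|---|---|---|---|---|
| | Trigger Jenkins for PR | 21m9s | Job | - |
| | Run iluvatar Tests / run_iluvatar_cases | - | Job | - |
| ⏸️ | CI_HPU | - | - | - |
| | The other 23 optional tasks passed | - | - | - |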

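3 Failure details (required only)

Approval: code approval (confidence: high)

  • Status: ❌ Failed
  • Error type: code approval
  • Confidence: high
  • Root-cause summary: the PR modifies fastdeploy/envs.py without approval from a FastDeploy RD
  • Analyzer: generic analysis (fallback)

Root-cause details:
The approval script scripts/check_approval.sh detected that this PR modifies fastdeploy/envs.py. That file is protected, so this CI check passes only after at least one FastDeploy RD member (one of jiangjiajun, liuyuanle, chenjian26, wanglongzhi) has reviewed and approved the PR.

Key log: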

0. You must have one FastDeploy RD (Jiang-Jia-Jun(jiangjiajun), yuanlehome(liuyuanle), rainyfly(chenjian26), Wanglongzhi2001(wanglongzhi)) approval for modifying [fastdeploy/envs.py].
There are 1 approved errors.
Process completed with exit code 6.

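Suggested fix:

  1. Ask one of the following FastDeploy RD members to click "Approve" on the PR page: jiangjiajun, liuyuanle, chenjian26, wanglongzhi
  2. If the change to fastdeploy/envs.py is necessary, state the reason for it in the PR description to make the RD review easier

Fix summary: ask one of jiangjiajun/liuyuanle/chenjian26/wanglongzhi to approve the PR

Related change: the PR modifies fastdeploy/envs.py (which triggered the approval requirement)
Link: view log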
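The rule this check enforces can be sketched in a few lines. This is a hypothetical model of the policy described in the report, not the contents of scripts/check_approval.sh: the `PROTECTED` mapping and the function `missing_approvals` are illustrative names, and only the file path and reviewer IDs come from the report.

```python
from typing import Dict, Iterable, List, Set

# Protected path -> reviewers allowed to approve changes to it.
# The path and IDs are copied from the CI report; the structure is assumed.
PROTECTED: Dict[str, Set[str]] = {
    "fastdeploy/envs.py": {"jiangjiajun", "liuyuanle", "chenjian26", "wanglongzhi"},
}


def missing_approvals(changed_files: Iterable[str],
                      approvers: Iterable[str],
                      protected: Dict[str, Set[str]] = PROTECTED) -> List[str]:
    """Return the protected files that no authorized reviewer has approved yet."""
    approved_by = set(approvers)
    return [path for path in changed_files
            if path in protected and not (protected[path] & approved_by)]


blocked = missing_approvals(
    ["fastdeploy/envs.py", "fastdeploy/worker/worker_process.py"], approvers=[])
unblocked = missing_approvals(["fastdeploy/envs.py"], approvers=["liuyuanle"])
```

Under this model a single approval from any listed reviewer clears the check, which matches the "one FastDeploy RD" wording in the log.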

@chang-wenbin changed the title from "Dummy prefill" to "[Benchmark] Add dummy prefill support for DeepSeek V3 MLA profiling" May 11, 2026
@codecov-commenter

codecov-commenter commented May 11, 2026

Codecov Report

❌ Patch coverage is 5.40541% with 35 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (develop@173d6cd). Learn more about missing BASE report.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| fastdeploy/worker/gpu_model_runner.py | 0.00% | 9 Missing and 4 partials ⚠️ |
| ...l_executor/layers/moe/fused_moe_cutlass_backend.py | 11.11% | 4 Missing and 4 partials ⚠️ |
| ...executor/layers/moe/fused_moe_blackwell_backend.py | 0.00% | 7 Missing ⚠️ |
| ..._executor/layers/moe/fused_moe_deepgemm_backend.py | 14.28% | 3 Missing and 3 partials ⚠️ |
| fastdeploy/worker/worker_process.py | 0.00% | 0 Missing and 1 partial ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             develop    #7776   +/-   ##
==========================================
  Coverage           ?   71.48%           
==========================================
  Files              ?      396           
  Lines              ?    55862           
  Branches           ?     8737           
==========================================
  Hits               ?    39931           
  Misses             ?    13173           
  Partials           ?     2758           
| Flag | Coverage Δ |
|---|---|
| GPU | 71.48% <5.40%> (?) |

Flags with carried forward coverage won't be shown. Click here to find out more.
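The figures in this report can be cross-checked with a little arithmetic. The sketch below assumes that partially covered lines count as not covered in the patch percentage; that assumption is consistent with the numbers reported, but it is inferred here rather than stated in the report.

```python
# (missing, partials) per file, copied from the "Files with missing lines" table.
files = {
    "gpu_model_runner.py": (9, 4),
    "fused_moe_cutlass_backend.py": (4, 4),
    "fused_moe_blackwell_backend.py": (7, 0),
    "fused_moe_deepgemm_backend.py": (3, 3),
    "worker_process.py": (0, 1),
}

# Assumption: missing + partial lines together make up the "35 lines missing coverage".
uncovered = sum(missing + partial for missing, partial in files.values())

# Find the patch size whose coverage rounds to the reported 5.40541%.
patch_lines = next(
    n for n in range(uncovered + 1, 1000)
    if round(100.0 * (n - uncovered) / n, 5) == 5.40541
)
covered = patch_lines - uncovered

# Project-level totals from the coverage diff.
hits, misses, partials, lines = 39931, 13173, 2758, 55862
project_pct = round(100.0 * hits / lines, 2)

print(uncovered, patch_lines, covered, project_pct)
```

Read this way, the patch touches 37 measured lines of which 2 are exercised by tests, and the project totals add up to the reported 71.48%.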


PaddlePaddle-bot

This comment was marked as outdated.

@PaddlePaddle-bot left a comment

🤖 Paddle-CI-Agent | pr_review | 2026-05-11 20:09:06

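📋 Review summary

PR overview: adds support for the FD_RUN_DUMMY_FOR_PROFILE environment variable so that MoE models can run in dummy inference mode, with random gate replacement and end-to-end latency measurement
Scope of changes: fastdeploy/envs.py, model_executor/layers/moe/ (3 backends), worker/gpu_model_runner.py, worker/worker_process.py
Impact tags: [Benchmark] [OP]

📝 PR convention check

Two problems were found:

  1. ## Modifications describes linear.py and deepseek_v3.py, files that do not appear in the diff; this is clearly leftover or mis-pasted content;
  2. ## Motivation and ## Usage or Command still use the old environment variable name RUN_DUMMY_FOR_PROFILE, which does not match FD_RUN_DUMMY_FOR_PROFILE, the name the code actually uses.

Suggested title (ready to copy):

  • [Benchmark] Add FD_RUN_DUMMY_FOR_PROFILE support for MoE dummy profiling

Suggested PR description (ready to copy):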

## Motivation
Allow MoE models (including DeepSeek V3) to complete a forward pass in dummy mode (without loading real weights), for performance profiling and benchmarking. Setting the environment variable `FD_RUN_DUMMY_FOR_PROFILE=1` enables end-to-end latency measurement and replaces the gate output with random data so that zero weights do not cause degenerate routing. This PR also removes the token-count cap (32) in EP profiling mode and migrates the old environment variable `RUN_DUMMY_FOR_PROFILE` to `FD_RUN_DUMMY_FOR_PROFILE`.

## Modifications
- `fastdeploy/envs.py`: add the environment variable `FD_RUN_DUMMY_FOR_PROFILE` (default 0) as the single switch for dummy profiling
- `fastdeploy/model_executor/layers/moe/fused_moe_blackwell_backend.py`: in `apply_ep_prefill`, `apply_ep_decode`, and `apply_tp`, replace the gate output with random data when `FD_RUN_DUMMY_FOR_PROFILE=1`
- `fastdeploy/model_executor/layers/moe/fused_moe_cutlass_backend.py`: add the dummy random replacement after each of the 4 `gate_out` computations
- `fastdeploy/model_executor/layers/moe/fused_moe_deepgemm_backend.py`: add the dummy random replacement after each of the 3 `gate_out` computations
- `fastdeploy/worker/gpu_model_runner.py`: apply the EP-mode `input_length <= 32` limit only outside profiling mode; when `FD_RUN_DUMMY_FOR_PROFILE=1`, add a distributed barrier and print the end-to-end elapsed time
- `fastdeploy/worker/worker_process.py`: replace the old `os.getenv("RUN_DUMMY_FOR_PROFILE", "0")` with `envs.FD_RUN_DUMMY_FOR_PROFILE`

## Usage or Command

    FD_RUN_DUMMY_FOR_PROFILE=1 python -m fastdeploy.worker.worker_process ...

## Accuracy Tests
N/A (this PR is a profiling/benchmarking tool and does not affect model inference accuracy)

## Checklist

- [ ] Add at least a tag in the PR title.
  - Tag list: [`[FDConfig]`,`[APIServer]`,`[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`]
  - You can add new tags based on the PR content, but the semantics must be clear.
- [ ] Format your code, run `pre-commit` before commit.
- [ ] Add unit tests. Please write the reason in this PR if no unit tests.
- [ ] Provide accuracy results.
- [ ] If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.

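Issues

| Level | File | Summary |
|---|---|---|
| 📝 PR convention | N/A | The Modifications section describes files that are not in the diff; Motivation/Usage reference the old environment variable name RUN_DUMMY_FOR_PROFILE |
| 🟡 Suggestion | N/A | The three MoE backends fused_moe_triton/marlin/wint2_backend.py do not add FD_RUN_DUMMY_FOR_PROFILE support, so dummy profiling behaves inconsistently when these backends are used |
| ❓ Question | worker_process.py:1336 | The comment above this line still references the old environment variable name RUN_DUMMY_FOR_PROFILE |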
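The migration from an ad-hoc `os.getenv("RUN_DUMMY_FOR_PROFILE", "0")` call to `envs.FD_RUN_DUMMY_FOR_PROFILE` can be pictured with a small central registry. The layout of the real fastdeploy/envs.py is not shown in this conversation, so the registry dict and the `_Envs` class below are assumptions; only the variable name and its default of 0 come from the review.

```python
import os
from typing import Any, Callable, Dict

# Hypothetical registry: one entry per supported variable, each with its
# parsing logic and default kept in a single place.
_ENV_VARS: Dict[str, Callable[[], Any]] = {
    "FD_RUN_DUMMY_FOR_PROFILE": lambda: int(os.getenv("FD_RUN_DUMMY_FOR_PROFILE", "0")),
}


class _Envs:
    """Attribute-style access that re-reads the environment on every lookup."""

    def __getattr__(self, name: str) -> Any:
        try:
            return _ENV_VARS[name]()
        except KeyError:
            raise AttributeError(f"unknown environment variable: {name}") from None


envs = _Envs()

os.environ.pop("FD_RUN_DUMMY_FOR_PROFILE", None)
default_value = envs.FD_RUN_DUMMY_FOR_PROFILE   # falls back to the default
os.environ["FD_RUN_DUMMY_FOR_PROFILE"] = "1"
enabled_value = envs.FD_RUN_DUMMY_FOR_PROFILE   # picks up the new setting
```

With a registry like this, a stale name such as `RUN_DUMMY_FOR_PROFILE` fails loudly with an AttributeError instead of silently falling back to a default, which is one practical argument for routing every read through one module.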

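Overall assessment

The overall approach is clear: dummy profiling mode is correctly controlled through an environment variable, and the logic that lifts the EP limit is reasonable. The main problems are that the PR description does not match the files actually changed and needs updating, and that the triton/marlin/wint2 MoE backends did not receive the same dummy gate support; adding it there is recommended for consistency.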
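Why random gate output matters in dummy mode can be shown with a toy router. This is a pure-Python sketch under stated assumptions, not FastDeploy code: `top_k_experts` and `route` are hypothetical, and ties are assumed to resolve to the lowest expert index, whereas a real top-k kernel may break ties differently.

```python
import random
from collections import Counter
from typing import List


def top_k_experts(logits: List[float], k: int) -> List[int]:
    """Indices of the k largest logits; ties go to the lowest index (assumption)."""
    return sorted(range(len(logits)), key=lambda e: -logits[e])[:k]


def route(num_tokens: int, num_experts: int, k: int,
          dummy_random_gate: bool, seed: int = 0) -> Counter:
    """Count how many tokens each expert receives."""
    rng = random.Random(seed)
    load: Counter = Counter()
    for _ in range(num_tokens):
        if dummy_random_gate:
            # Profiling mode: stand-in for replacing the gate output with random data.
            logits = [rng.random() for _ in range(num_experts)]
        else:
            # Zero-initialized gate weights yield identical logits for every expert.
            logits = [0.0] * num_experts
        load.update(top_k_experts(logits, k))
    return load


zero_gate_load = route(num_tokens=64, num_experts=8, k=2, dummy_random_gate=False)
random_gate_load = route(num_tokens=64, num_experts=8, k=2, dummy_random_gate=True)
print(dict(zero_gate_load), dict(random_gate_load))
```

With all-zero logits every token lands on the same two experts, so a profile would measure a badly skewed load; random logits spread tokens across experts, which is closer to what a benchmark is meant to exercise.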

Comment thread fastdeploy/worker/worker_process.py

5 participants