[XPU] add build_sampling_params op. #7738
Conversation
Thanks for your contribution!
Pull request overview
This PR adds a build_sampling_params custom op to the XPU backend, replacing the previous Python-side sampling-parameter padding logic with an XPU kernel, and moves the infer_seed update into the op so the seed stepping matches the GPU strategy (especially in speculative-decoding scenarios).
Changes:
- Add an XPU `build_sampling_params` kernel + plugin wrapper + Paddle static op, and wire it into the XPU speculative verify (TARGET_MATCH) path.
- Introduce `increment_value` on the XPU ModelRunner side (aligned with GPU: 4 in non-speculative mode, (num_speculative_tokens + 1) * 4 in speculative mode; see the sketch below) and adjust when `infer_seed` is updated.
- Add the `custom_ops/xpu_ops/test/test_build_sampling_params.py` unit test, which compares against a Python reference implementation and covers multiple batch shapes plus seed wrap-around.
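For reference, a minimal Python sketch of the GPU-aligned seed stepping described above is shown here; the names `speculative_decoding`, `num_speculative_tokens`, and `max_infer_seed` are illustrative assumptions, not the exact identifiers used in this PR.

```python
# Minimal sketch of the seed-stepping rule the PR aligns with (names are assumptions).
def compute_increment_value(speculative_decoding: bool, num_speculative_tokens: int) -> int:
    # 4 per step in normal decoding; (num_speculative_tokens + 1) * 4 when speculating.
    return 4 if not speculative_decoding else (num_speculative_tokens + 1) * 4

def step_infer_seed(infer_seed: int, increment_value: int, max_infer_seed: int) -> int:
    # infer_seed advances modulo max_infer_seed so it stays within the valid seed range.
    return (infer_seed + increment_value) % max_infer_seed
```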
PR metadata check (needs follow-up)
- The title already contains the `[XPU]` tag and matches the required format.
- The "Modifications / Usage or Command / Accuracy Tests" sections of the description are not filled in. If this op can affect sampling results or reproducibility, please add an accuracy comparison plus the run command/environment; if no unit test is added or the XPU CI cannot be run, state the reason (this PR does add a unit-test file, but the description should still explain how to run it).
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| fastdeploy/worker/xpu_model_runner.py | Computes and passes down increment_value, and adjusts the infer_seed update logic in the speculative-decoding case |
| fastdeploy/model_executor/layers/sample/sampler.py | The XPU verify (TARGET_MATCH) path now uses build_sampling_params and forwards increment_value |
| custom_ops/xpu_ops/test/test_build_sampling_params.py | New XPU op unit test, validated against a Python reference implementation |
| custom_ops/xpu_ops/src/plugin/src/wrapper/mtp_wrapper/build_sampling_params.cpp | New plugin wrapper (CPU + XPU3 dispatch) |
| custom_ops/xpu_ops/src/plugin/src/kernel/kunlun3cpp/mtp_kernel/build_sampling_params.xpu | New Kunlun3 XPU kernel implementation |
| custom_ops/xpu_ops/src/plugin/include/xpu/plugin.h | Exports the build_sampling_params declaration |
| custom_ops/xpu_ops/src/ops/mtp/build_sampling_params.cc | New Paddle static op registration and call bridging |
```
# 7. Updata 'infer_seed' and step_paddle()
self.share_inputs["infer_seed"].add_(self.infer_seed_increment)
self.share_inputs["infer_seed"][:] %= self.MAX_INFER_SEED
if not self.speculative_decoding:
```
| share_inputs["seq_lens_this_time"], | ||
| share_inputs["seq_lens_encoder"], | ||
| token_num_output_cpu=int(share_inputs["cu_seqlens_q_output"][-1]), | ||
| increment_value=increment_value, |
```
api::Context* ctx = xpu_ctx->x_context();
if (top_p.is_cpu()) {
  ctx = new api::Context(api::kCPU);
```
The CI report is generated from the code below (refreshed every 30 minutes):
1 Task overview: CI still has 1 Required task failing, 1 Required task running, and 1 Required task pending, so merging is blocked for now; please follow up.
2 Task status summary
2.1 Required tasks: 7/10 passed
2.2 Optional tasks — 22/26 passed
3 Failure details (required only): Approval — approval workflow (confidence: high): Approval
Root cause details: Key logs: Suggested fix:
Fix summary: one FD RD and one Paddle RD each need to click Approve on the PR page. Link: view logs
Codecov Report
❌ Patch coverage is

Additional details and impacted files

```
@@           Coverage Diff            @@
##            develop    #7738   +/- ##
=========================================
  Coverage          ?   71.61%
=========================================
  Files             ?      396
  Lines             ?    55702
  Branches          ?     8709
=========================================
  Hits              ?    39891
  Misses            ?    13070
  Partials          ?     2741
```

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
```
RequestFuncOutput(no=2347, request_id='None', generated_text='', reasoning_content='', success=False, latency=0.0, end_timestamp=0.0, output_tokens=0, ttft=0.0, arrival_time=[], itl=[], tpot=0.0, prompt_len=0, prompt_tokens=0, reasoning_tokens=0, res_ttft=0, error='{"error":{"message":"request[chatcmpl-814e8d96-3da8-46b0-b4da-31925c313041] generator error: Input text is too long, input_ids_len (8191) + min_tokens(1) >= max_model_len(8192), Traceback (most recent call last):\\n File \\"/home/paddle_test/works/fd/FastDeploy/fastdeploy/entrypoints/openai/serving_chat.py\\", line 168, in create_chat_completion\\n prompt_token_ids = await self.engine_client.format_and_add_data(current_req_dict)\\n File \\"/home/paddle_test/works/fd/FastDeploy/fastdeploy/entrypoints/engine_client.py\\", line 300, in format_and_add_data\\n await self.add_requests(request)\\n File \\"/home/paddle_test/works/fd/FastDeploy/fastdeploy/entrypoints/engine_client.py\\", line 390, in add_requests\\n raise EngineError(error_msg, error_code=400)\\nfastdeploy.utils.EngineError: Input text is too long, input_ids_len (8191) + min_tokens(1) >= max_model_len(8192)\\n","type":"invalid_request_error","param":null,"code":null}}', metrics={}, tool_calls=[], output_ids=[])
RequestFuncOutput(no=2347, request_id='None', generated_text='', reasoning_content='', success=False, latency=0.0, end_timestamp=0.0, output_tokens=0, ttft=0.0, arrival_time=[], itl=[], tpot=0.0, prompt_len=0, prompt_tokens=0, reasoning_tokens=0, res_ttft=0, error='{"error":{"message":"request[chatcmpl-799cdf97-ab7e-4823-80e4-1833bf5f7d90] generator error: Input text is too long, input_ids_len (8191) + min_tokens(1) >= max_model_len(8192), Traceback (most recent call last):\\n File \\"/home/paddle_test/works/fd/FastDeploy/fastdeploy/entrypoints/openai/serving_chat.py\\", line 168, in create_chat_completion\\n prompt_token_ids = await self.engine_client.format_and_add_data(current_req_dict)\\n File \\"/home/paddle_test/works/fd/FastDeploy/fastdeploy/entrypoints/engine_client.py\\", line 300, in format_and_add_data\\n await self.add_requests(request)\\n File \\"/home/paddle_test/works/fd/FastDeploy/fastdeploy/entrypoints/engine_client.py\\", line 390, in add_requests\\n raise EngineError(error_msg, error_code=400)\\nfastdeploy.utils.EngineError: Input text is too long, input_ids_len (8191) + min_tokens(1) >= max_model_len(8192)\\n","type":"invalid_request_error","param":null,"code":null}}', metrics={}, tool_calls=[], output_ids=[])
```
Force-pushed from 651d7cb to cfc5936 (Compare)
```
_, next_tokens = top_k_top_p_sampling(
    probs,
    top_p=top_p,
    top_k=top_k,
    top_p=sampling_metadata.top_p,
    top_k=sampling_metadata.top_k,
    top_k_list=sampling_metadata.top_k_list,
    topp_seed=topp_seed,
    topp_seed=sampling_metadata.topp_seed,
)
```
```
sampling_metadata.seed,
paddle.reshape(share_inputs["seq_lens_this_time"], shape=[-1]),
paddle.reshape(share_inputs["seq_lens_encoder"], shape=[-1]),
share_inputs["seq_lens_this_time"],
share_inputs["seq_lens_encoder"],
token_num_output_cpu=int(share_inputs["cu_seqlens_q_output"][-1]),
increment_value=increment_value,
)
```
| self.share_inputs["infer_seed"][:] %= self.MAX_INFER_SEED | ||
| if not self.speculative_decoding: | ||
| self.share_inputs["infer_seed"].add_(self.infer_seed_increment) | ||
| self.share_inputs["infer_seed"][:] %= self.MAX_INFER_SEED |
```cpp
int64_t pad_idx = 0;
for (int bi = 0; bi < bs; bi++) {
  bool is_decoder = (seq_lens_encoder[bi] == 0);
  int repeat = is_decoder ? seq_lens_this_time[bi] : 1;
  int64_t bi_seed = infer_seed[bi];
  for (int local_pos = 0; local_pos < repeat; local_pos++) {
    int64_t offset = is_decoder ? static_cast<int64_t>(local_pos) * 4 : 0LL;
    top_p_padding[pad_idx] = top_p[bi];
    top_k_padding[pad_idx] = top_k[bi];
    topp_seed[pad_idx] = (bi_seed + offset) % BUILD_SAMPLING_MAX_INFER_SEED;
    pad_idx++;
  }
  infer_seed[bi] =
      (infer_seed[bi] + increment_value) % BUILD_SAMPLING_MAX_INFER_SEED;
}
```
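The unit test is described as comparing against a Python reference implementation; the sketch below shows what such a reference could look like, mirroring the CPU loop above. The function name `build_sampling_params_ref` and the use of NumPy arrays are assumptions for illustration, not necessarily what test_build_sampling_params.py does.

```python
import numpy as np

# Illustrative Python reference mirroring the CPU loop above (names are assumptions).
def build_sampling_params_ref(top_p, top_k, infer_seed, seq_lens_this_time,
                              seq_lens_encoder, increment_value, max_infer_seed):
    top_p_pad, top_k_pad, topp_seed = [], [], []
    infer_seed = list(infer_seed)
    for bi in range(len(seq_lens_this_time)):
        is_decoder = seq_lens_encoder[bi] == 0
        repeat = int(seq_lens_this_time[bi]) if is_decoder else 1
        for local_pos in range(repeat):
            offset = local_pos * 4 if is_decoder else 0
            top_p_pad.append(top_p[bi])
            top_k_pad.append(top_k[bi])
            topp_seed.append((int(infer_seed[bi]) + offset) % max_infer_seed)
        # The seed is stepped once per batch item, in place, exactly as in the kernel.
        infer_seed[bi] = (int(infer_seed[bi]) + increment_value) % max_infer_seed
    return (np.array(top_p_pad, dtype=np.float32),
            np.array(top_k_pad, dtype=np.int64),
            np.array(topp_seed, dtype=np.int64),
            np.array(infer_seed, dtype=np.int64))
```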
```cpp
// pad_start computation: a shared prefix-sum scratch buffer is not available
// here, and clusters run concurrently, so we cannot share a global accumulator.
// Instead, core 0 of each cluster independently sums the token counts of the
// first `bi` batches via a sequential scan over seq_lens_this_time /
// seq_lens_encoder. This is O(bs) per cluster, but bs is typically small (<=512).

for (int bi = clusterid; bi < bs; bi += nclusters) {
  if (cid == 0) {
    // Read per-batch parameters from global memory.
    float lm_top_p;
    int64_t lm_top_k;
    int64_t lm_seed;
    int lm_slt;  // seq_lens_this_time[bi]
    int lm_sle;  // seq_lens_encoder[bi]

    GM2LM_ASYNC(top_p + bi, &lm_top_p, sizeof(float));
    GM2LM_ASYNC(top_k + bi, &lm_top_k, sizeof(int64_t));
    GM2LM_ASYNC(infer_seed + bi, &lm_seed, sizeof(int64_t));
    GM2LM_ASYNC(seq_lens_this_time + bi, &lm_slt, sizeof(int));
    GM2LM(seq_lens_encoder + bi, &lm_sle, sizeof(int));  // sync barrier

    bool is_decoder = (lm_sle == 0);
    int repeat = is_decoder ? lm_slt : 1;

    // Compute pad_start = sum of token counts for batches [0, bi).
    int pad_start = 0;
    for (int k = 0; k < bi; k++) {
      int slt_k, sle_k;
      GM2LM_ASYNC(seq_lens_this_time + k, &slt_k, sizeof(int));
      GM2LM(seq_lens_encoder + k, &sle_k, sizeof(int));
      pad_start += (sle_k == 0) ? slt_k : 1;
    }
```
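To make the pad_start rule described in the kernel comment concrete, here is a small worked example; the sequence lengths are made-up numbers for illustration only.

```python
# Hypothetical lengths, for illustration only.
seq_lens_this_time = [3, 5, 2]
seq_lens_encoder = [0, 7, 0]  # nonzero => prefill/encoder step, which contributes 1 slot

# pad_start for batch 2 = sum of slot counts of batches 0 and 1:
# batch 0 is a decoder step -> 3 slots; batch 1 is a prefill step -> 1 slot.
pad_start_for_batch_2 = sum(slt if sle == 0 else 1
                            for slt, sle in zip(seq_lens_this_time[:2], seq_lens_encoder[:2]))
assert pad_start_for_batch_2 == 4
```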
PaddlePaddle-bot left a comment
🤖 Paddle-CI-Agent | pr_review | 2026-05-11 15:46:02
📋 Review summary
PR overview: replaces the Python padding_sampling_params implementation on XPU with an XPU kernel (build_sampling_params), moves the infer_seed update logic into the kernel, and aligns the increment_value stepping with the GPU strategy.
Scope of changes: custom_ops/xpu_ops/, fastdeploy/model_executor/layers/sample/sampler.py, fastdeploy/worker/xpu_model_runner.py
Impact tags: [XPU] [OP]
📝 PR convention check
The ## Modifications and ## Usage or Command sections are empty (template comments only), and none of the Checklist items are checked.
Suggested title (ready to copy):
[XPU][OP] Add build_sampling_params XPU kernel to replace Python padding_sampling_params
Suggested PR description (ready to copy; it must reproduce the full structure of the checklist §D2 template):
## Motivation
Replace the Python implementation of padding_sampling_params on XPU with an XPU kernel implementation, build_sampling_params. In addition, move the infer_seed update into build_sampling_params and align the infer_seed increment_value stepping with the GPU implementation.
## Modifications
- Add `custom_ops/xpu_ops/src/ops/mtp/build_sampling_params.cc`: Paddle custom op registration (`PD_BUILD_STATIC_OP`), declaring inputs/outputs/attributes
- Add `custom_ops/xpu_ops/src/plugin/src/kernel/kunlun3cpp/mtp_kernel/build_sampling_params.xpu`: XPU3 kernel; each cluster handles one batch item, filling top_p/top_k/topp_seed in parallel and updating infer_seed in place
- Add `custom_ops/xpu_ops/src/plugin/src/wrapper/mtp_wrapper/build_sampling_params.cpp`: dual CPU/XPU3 wrapper, including a CPU reference implementation
- Update `custom_ops/xpu_ops/src/plugin/include/xpu/plugin.h`: add the `build_sampling_params` function declaration
- Update `fastdeploy/model_executor/layers/sample/sampler.py`: `_verify_and_sample_xpu` switches to the XPU kernel; `_normal_sample_xpu` drops `padding_sampling_params`; `forward_xpu` gains an `increment_value` parameter
- Update `fastdeploy/worker/xpu_model_runner.py`: compute `increment_value` dynamically; in speculative-decoding mode the seed update is handed over to the kernel
- Add `custom_ops/xpu_ops/test/test_build_sampling_params.py`: 6 unit tests covering decoder-only, encoder-only, mixed, single-item, seed-overflow, and single-token-per-batch scenarios
## Usage or Command
N/A (internal XPU kernel replacement; no external interface changes)
## Accuracy Tests
Verified that the INT64 modulo inside the XPU kernel works correctly (see the image in the PR).
## Checklist
- [x] Add at least a tag in the PR title.
- Tag list: [`[FDConfig]`,`[APIServer]`,`[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`]
- You can add new tags based on the PR content, but the semantics must be clear.
- [ ] Format your code, run `pre-commit` before commit.
- [x] Add unit tests. Please write the reason in this PR if no unit tests.
- [x] Provide accuracy results.
- [ ] If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.

Issues
| Level | File | Summary |
|---|---|---|
| 📝 PR convention | — | The ## Modifications and ## Usage or Command sections are empty, and no Checklist items are checked |
| 🟡 Suggestion | fastdeploy/worker/xpu_model_runner.py:171 | In speculative-decoding mode infer_seed_increment is no longer used, yet it is still filled with the speculative increment_value, which is easy to misread |
Overall assessment
The implementation design is sound overall: the XPU3 kernel and the CPU wrapper are semantically consistent, and the in-place infer_seed update and seed-offset logic correctly align with the GPU path. The unit tests cover 6 typical scenarios (decoder-only/encoder-only, mixed, seed overflow, etc.). The build system discovers new files automatically via GLOB_RECURSE/os.walk, so no manual registration is needed. It is recommended to fill in the Modifications and Usage sections of the PR description.
```python
self.increment_value = (
    4 if not self.speculative_decoding else (self.speculative_config.num_speculative_tokens + 1) * 4
)
```
🟡 Suggestion: infer_seed_increment is no longer actually used in speculative-decoding mode (the seed update has moved into the build_sampling_params kernel), but this tensor is still filled with the speculative increment_value, which is easy to misread.
It is recommended either to keep the tensor's fill_value at 4 in speculative-decoding mode (the non-speculative step), or to add a comment stating that it only takes effect in non-speculative mode:
```python
# infer_seed_increment is only used in non-speculative mode;
# in speculative mode, the seed update is handled inside build_sampling_params kernel.
self.infer_seed_increment = paddle.full(
    shape=[self.scheduler_config.max_num_seqs, 1],
    fill_value=4,  # always 4; speculative mode updates seed in-kernel
    dtype="int64",
).cpu()
```
Motivation
Replace the Python implementation of padding_sampling_params on XPU with an XPU kernel implementation, build_sampling_params. In addition, move the infer_seed update into build_sampling_params and align the infer_seed increment_value stepping with the GPU implementation.
Modifications
Usage or Command
Accuracy Tests
Verified that the INT64 modulo inside the XPU kernel works correctly:

Checklist
- [ ] Add at least a tag in the PR title.
  - Tag list: [`[FDConfig]`, `[APIServer]`, `[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`]
- [ ] Format your code, run `pre-commit` before commit.
- [ ] Add unit tests. Please write the reason in this PR if no unit tests.
- [ ] Provide accuracy results.
- [ ] If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.