
Update version #13

Merged

BingooYang merged 1 commit into PFCCLab:0.6 from BingooYang:update_last on May 13, 2026

Conversation


@BingooYang BingooYang commented May 8, 2026

📌 Description

Upgrade the version to 0.6.11.
Currently adapted and passing:

| # | Test | Status |
|---|------|--------|
| 1 | tests/attention/test_attention_sink_blackwell.py -k test_blackwell_trtllm_gen_context_attention_sink | PASS 72/72 |
| 2 | tests/attention/test_attention_sink_blackwell.py -k test_blackwell_trtllm_gen_decode_attention_sink | PASS 72/72 |
| 3 | tests/moe/test_trtllm_gen_fused_moe.py::test_fp8_block_scale_routed_activation_type_relu2_smoke | PASS |
| 4 | tests/comm/test_trtllm_allreduce_fusion.py::test_trtllm_allreduce_fusion[True-1024-dtype0-2] | PASS |
| 5 | test_trtllm_gen_fused_moe.py::test_renormalize_routing[...FP8_Block_DeepSeek-1024-1024-8-RandomHiddenStates] | PASS |
| 6 | test_trtllm_gen_fused_moe.py::test_sigmoid_routing[...FP8_Block_DeepSeek-1024-1024-8] | PASS |
| 7 | test_trtllm_gen_fused_moe.py::test_dyn_block_kernel_routing[...FP8_Block_DeepSeek...] | PASS |
| 8 | test_trtllm_gen_fused_moe.py::test_tier_1024_experts_routing[...FP8_Block_DeepSeek...] | PASS |
| 9 | test_trtllm_gen_fused_moe.py::test_deepseek_ngroup1_block_per_token_routing[...FP8_Block_DeepSeek...] | PASS |
| 10 | test_trtllm_gen_fused_moe.py::test_routing_dtype_flexibility[...FP8_Block_DeepSeek...] | PASS |
| 11 | test_trtllm_gen_fused_moe.py::test_mxfp8_block_scale_moe_relu2_non_gated[...Shuffled E32_K4] | PASS |
| 12 | test_trtllm_gen_fused_moe.py::test_mxfp8_block_scale_moe_relu2_deepseekv3_topk22 | PASS |
| 13 | test_trtllm_gen_fused_moe.py::test_fp8_block_scale_autotune_valid_configs[...MxFp8_Relu2] | PASS |
| 14 | test_trtllm_gen_fused_moe.py::test_fp8_per_tensor_autotune_valid_configs_nonefp8[...PerTensor_Swiglu] | PASS |
| 15 | test_trtllm_gen_fused_moe.py::test_llama4_routing[...FP8_Tensor-1024-1024-8] | FAIL (kernel not available on SM100; not a paddle issue) |
| 16 | test_trtllm_gen_fused_moe.py::test_deepseekv3_routing | SKIP (guarded inside the test) |
| 17 | test_trtllm_gen_fused_moe.py::test_nvfp4_moe_gemm_bias | SKIP (guarded inside the test) |
| 18 | tests/norm/test_fused_rmsnorm_silu.py | PASS 102/152 (50 skipped: torch.float4_e2m1fn_x2 not exposed; unrelated to the paddle adaptation) |

🔍 Related Issues

🚀 Pull Request Checklist

Thank you for contributing to FlashInfer! Before we review your pull request, please make sure the following items are complete.

✅ Pre-commit Checks

  • I have installed pre-commit by running pip install pre-commit (or used your preferred method).
  • I have installed the hooks with pre-commit install.
  • I have run the hooks manually with pre-commit run --all-files and fixed any reported issues.

If you are unsure about how to set up pre-commit, see the pre-commit documentation.

🧪 Tests

  • Tests have been added or updated as needed.
  • All tests are passing (unittest, etc.).

Reviewer Notes

@BingooYang BingooYang force-pushed the update_last branch 2 times, most recently from 9c5aeab to 9dc4281 on May 13, 2026 11:27
- enable paddle torch proxy in conftest via paddle.enable_compat(scope={"flashinfer"})
- in tests/attention/test_attention_sink_blackwell.py: prepend paddle.enable_compat(),
  replace torch.manual_seed with paddle.seed, replace torch.testing.assert_close with
  numpy.testing.assert_allclose, parametrize to a minimal shape for quick verification
- flashinfer/utils.py: access TorchVersion via torch.torch_version proxy with fallback
  for paddle compat where paddle.torch_version is not exposed
- flashinfer/cute_dsl/fp4_common.py: add "from __future__ import annotations" to
  defer evaluation of "int | torch.device | str | None" annotation which fails under
  paddle proxy (torch.device is a CallableProxyModule, not a type)
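A minimal sketch of the annotation fix above (the function and its body are illustrative, not the actual fp4_common.py code): with deferred evaluation the union annotation is never evaluated at import time, so the proxy's CallableProxyModule standing in for torch.device cannot break it.

```python
# Illustrative sketch, not the actual fp4_common.py code.
from __future__ import annotations  # defer evaluation of annotations (PEP 563)

import torch

# Without the future import, evaluating this union at import time fails under
# the paddle proxy, where torch.device is a CallableProxyModule, not a type.
def resolve_device(device: int | torch.device | str | None = None) -> torch.device:
    return torch.device("cuda" if device is None else device)
```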

adapt prefill trtllm paged attention for paddle compat

- flashinfer/prefill.py: convert workspace_size (tensor scalar from numel()*element_size())
  to Python int via .item() before passing to the tvm_ffi C++ kernel, which expects int
  but receives ffi.Tensor under paddle (doc item PFCCLab#11)
- tests/conftest.py: revert paddle.enable_compat() to global scope so that `import torch`
  at conftest module level (outside flashinfer scope) also resolves via the proxy
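For the workspace_size item above, a hedged sketch (the buffer name and size are illustrative, not the exact prefill.py code):

```python
import torch

# Illustrative stand-in for the float workspace buffer.
float_workspace_buffer = torch.empty(1024, dtype=torch.float32)

workspace_size = float_workspace_buffer.numel() * float_workspace_buffer.element_size()
if hasattr(workspace_size, "item"):
    # Under the paddle proxy this product can come back as a 0-d tensor, but
    # the tvm_ffi C++ kernel expects a plain Python int.
    workspace_size = int(workspace_size.item())
```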

paddle compat: decode workspace_size .item(), moe fp8 index via int8 view, autotuner shape tuple, moe test

support allreduce fusion

dist.group.WORLD compat

modify readme

modify format

fix env issue

fix some issue

paddle compat: fix dtype.itemsize + expand trtllm_allreduce_fusion test

- flashinfer/comm/trtllm_ar.py: paddle.dtype has no `itemsize`; add
  _DTYPE_SIZE_MAP + _dtype_itemsize() fallback used in _should_use_oneshot
  (fixes AttributeError when use_oneshot=None triggers the heuristic).
- tests/comm/test_trtllm_allreduce_fusion.py: restore the full parametrize
  scope (patterns/layouts/pdls/oneshots/trigger/fp32_acc); drop leftover
  [DBG] prints; guard the `if __name__ == "__main__"` block so mp-spawn
  children do not re-enter it under pytest (it was double-initializing the
  paddle TCPStore and triggering a SIGABRT in libuv).
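A sketch of the itemsize fallback named in the first item (the map contents here are a small illustrative subset):

```python
import torch

# Illustrative subset; the real map would cover every dtype the heuristic sees.
_DTYPE_SIZE_MAP = {
    torch.float32: 4,
    torch.float16: 2,
    torch.bfloat16: 2,
    torch.int8: 1,
}

def _dtype_itemsize(dtype) -> int:
    # paddle.dtype has no `itemsize`, so fall back to a lookup table.
    itemsize = getattr(dtype, "itemsize", None)
    return itemsize if itemsize is not None else _DTYPE_SIZE_MAP[dtype]
```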

Verified: pytest tests/comm/test_trtllm_allreduce_fusion.py::test_trtllm_allreduce_fusion[True-1024-dtype0-2] and [False-1024-dtype0-2] both pass on 2 GPUs.

add adaptation-paddle skill

paddle compat: revert over-adaptation in test_trtllm_gen_fused_moe

`torch.cuda.get_device_capability`, `tensor.device`, and `tensor.to(device)`
are fully aligned under `paddle.enable_compat()`. Revert the earlier
paddle-specific detours (`torch.device.cuda.get_device_capability`,
`paddle.device(x.place)`, `paddle.get_device()`) back to plain torch APIs.

Also record the finding in adaptation-paddle skill (§10, items 31-34) as a
"do-not-over-adapt" reference for future MoE test reviews.

Verified: `pytest tests/moe/test_trtllm_gen_fused_moe.py -k test_moe_quantization_classes`
passes (1 passed).

paddle compat: restore test_trtllm_gen_fused_moe to upstream + minimal patches

The previous adaptation commented out / trimmed ~1800 lines from upstream,
making future rebases painful and dropping valid test coverage. Reset the
file to exact upstream content (github.com/flashinfer-ai/flashinfer main)
and keep only the minimum compat patches needed to run on paddle:

test file patches:
- add `import paddle; paddle.enable_compat()` at top
- `block.aminmax()` -> `block.float().aminmax()`       (paddle missing bf16 kernel)
- fp8 slice assign via `.view(torch.int8)` on both sides (paddle missing fp8 set_value kernel)
- `expertLogits.cpu()` -> `.cpu().float()`             (paddle missing cpu-bf16 topk)
- `torch.random.manual_seed` -> `torch.manual_seed`     (paddle.random lacks manual_seed)
- `torch.device(device="cuda")` -> `torch.device("cuda")` (paddle Device rejects kwarg)

same `torch.device(...)` kwarg fix in tests/moe/utils.py.
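A hedged sketch of the fp8 slice-assign workaround from the list above (shapes and names are illustrative): paddle has no fp8 set_value kernel, so the write goes through an int8 view on both sides.

```python
import torch

dst = torch.zeros(8, 16, dtype=torch.float8_e4m3fn, device="cuda")
src = torch.randn(16, device="cuda").to(torch.float8_e4m3fn)

# Instead of `dst[0] = src`, reinterpret both sides as int8 for the assignment.
dst.view(torch.int8)[0] = src.view(torch.int8)
```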

library patch (flashinfer/autotuner.py):
- `torch.cuda.OutOfMemoryError` missing under paddle. Use a sentinel placeholder
  class (NOT `RuntimeError` - that would silently swallow real kernel errors).
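A sketch of the sentinel approach (names are illustrative, not the exact autotuner.py code): a distinct placeholder type keeps the `except` clause valid without silently swallowing real kernel RuntimeErrors.

```python
import torch

try:
    OutOfMemoryError = torch.cuda.OutOfMemoryError
except AttributeError:
    # Paddle proxy: never raised, only referenced so `except` clauses stay valid.
    class OutOfMemoryError(Exception):
        pass

def try_config(run_kernel):
    try:
        return run_kernel()
    except OutOfMemoryError:
        return None  # skip this config; real RuntimeErrors still propagate
```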

Verified: `pytest test_trtllm_gen_fused_moe.py::test_fp8_block_scale_routed_activation_type_relu2_smoke`
passes. Larger parametrized cases still need library-side fixes (e.g.
`core.py::_init_packed_topk_ids` bitwise_or dtype mismatch).

Docs (skills/adaptation-paddle): record new patches 31-36 and the
"do-not-trim-upstream" lesson.

paddle compat: fix bitwise_or dtype mismatch in _init_packed_topk_ids

torch implicitly promotes int16->int32 in `(expert_ids << 16) | expert_weights`.
Paddle's bitwise_or does not, so it raises

  ValueError: The type of data we are trying to retrieve (int16) does not
  match the type of data (int32)

Fix: explicitly `.to(torch.int32)` after the `.view(torch.int16)`; this works on both backends.
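A minimal sketch of the cast (values are illustrative): the explicit `.to(torch.int32)` reproduces the promotion torch would otherwise do implicitly, so the packed result is identical on both backends.

```python
import torch

expert_ids = torch.tensor([3, 7], dtype=torch.int32)
expert_weights = torch.tensor([0.5, 1.0], dtype=torch.bfloat16)

# torch promotes the int16 view to int32 inside `|`; paddle's bitwise_or does
# not, so make the promotion explicit.
packed = (expert_ids << 16) | expert_weights.view(torch.int16).to(torch.int32)
```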

With this fix, routing-family tests (renormalize/sigmoid/deepseekv3/topk/
llama4/dyn_block/tier_1024/deepseek_ngroup1/routing_dtype_flexibility) all
progress past the dtype check. Remaining failures on this machine are
infrastructure (cubin artifactory unreachable), not paddle-compat.

modify skill

fix some issues

paddle compat: test_fused_rmsnorm_silu zero-patch adaptation

tests/norm/test_fused_rmsnorm_silu.py runs under paddle.enable_compat()
with no source changes (conftest.py already enables compat). Full run:
102 passed, 50 skipped (all skips due to torch.float4_e2m1fn_x2 missing
from paddle torch-proxy, not a kernel adaptation issue).

- adp_test.md: add row 18 recording PASS 102/152
- adaptation_exp.md: add section XI (flashinfer-ai#37-39) documenting zero-patch
  result, rationale, reproduction command, and the methodology
  recommendation (bare-run first, consult adaptation table only on
  failure).

fix format

fix some issue

@BingooYang BingooYang merged commit 2c119af into PFCCLab:0.6 on May 13, 2026
3 checks passed
