
Support PaddlePaddle with compatible API and tvm-ffi#2

Closed
SigureMo wants to merge 10 commits into main from support-paddlepaddle-with-compatible-api-and-tvmffi
Closed

Conversation

@SigureMo

@SigureMo SigureMo commented Oct 2, 2025

A new approach to replace #1. This PR needs PaddlePaddle/Paddle#75651 and PaddlePaddle/Paddle#75650.

@SigureMo SigureMo marked this pull request as draft October 2, 2025 07:19
@SigureMo SigureMo force-pushed the support-paddlepaddle-with-compatible-api-and-tvmffi branch from fd12761 to 6299602 Compare October 13, 2025 02:32
@SigureMo SigureMo force-pushed the support-paddlepaddle-with-compatible-api-and-tvmffi branch from a038d38 to 955aedf Compare October 24, 2025 11:47
@SigureMo SigureMo closed this Dec 13, 2025
@SigureMo SigureMo deleted the support-paddlepaddle-with-compatible-api-and-tvmffi branch December 13, 2025 19:44
BingooYang pushed a commit that referenced this pull request May 8, 2026
<!-- .github/pull_request_template.md -->

## 📌 Description

This PR fixes the following bug:
When the CuteDSL MoE kernels were ported from TensorRT-LLM to
FlashInfer, the `mPtrPermutedIdxToExpandedIdx` field was accidentally
dropped from the routing kernel's `DataBase` struct in `RoutingKernel.h`.
TRT-LLM's routing kernel produces three reverse-mapping outputs:

1. `mPtrExpandedIdxToPermutedIdx[expandedIdx] = permutedIdx` — forward
mapping
2. `mPtrPermutedIdxToExpandedIdx[permutedIdx] = expandedIdx` — reverse to
expanded index (`token_idx * topk + k`)
3. `mPtrPermutedIdxToTokenIdx[permutedIdx] = tokenIdx` — reverse to token
index only
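The relationship between the three mappings can be sketched with a toy example. Everything below is illustrative: `topk`, the token count, and the permutation are made up, and the Python names simply mirror the TRT-LLM field names described above.

```python
# Toy illustration of the three routing mappings (hypothetical values).
topk = 2
num_tokens = 3

# expandedIdx enumerates (token, k) pairs: expandedIdx = tokenIdx * topk + k
# A made-up permutation, as routing might produce when grouping by expert:
permutation = [3, 0, 4, 1, 5, 2]  # permutedIdx -> expandedIdx

# Mapping #2: permutedIdx -> expandedIdx (the field the port dropped)
permuted_idx_to_expanded_idx = permutation

# Mapping #1: expandedIdx -> permutedIdx (the inverse of #2)
expanded_idx_to_permuted_idx = [0] * len(permutation)
for permuted_idx, expanded_idx in enumerate(permutation):
    expanded_idx_to_permuted_idx[expanded_idx] = permuted_idx

# Mapping #3: permutedIdx -> tokenIdx, which discards the top-k slot k
permuted_idx_to_token_idx = [e // topk for e in permutation]
```

Note that #3 is lossy: two permuted slots that belong to the same token collapse to the same value, so #2 cannot be reconstructed from it.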

FlashInfer's port kept only #1 and #3, dropping #2. The binding in
`moe_utils_binding.cu` then had to wire the Python buffer
`permuted_idx_to_expanded_idx` to the only available reverse-mapping field,
`mPtrPermutedIdxToTokenIdx`, which writes plain `tokenIdx` instead of
`expandedIdx`.

### The Impact

The CuteDSL kernels (GEMM1 gather, `moe_output_memset`, GEMM2 finalize)
all expect expanded indices and derive the token index via
`expanded_idx // topk`. When they received plain `tokenIdx` instead, they
computed `tokenIdx // topk` — yielding the wrong A row for gather, the
wrong zero-init for memset, and the wrong scatter position plus the wrong
routing scale for finalize.
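The failure mode above can be shown in a few lines. This is a minimal sketch with assumed values (`topk = 2`, one arbitrary expanded index), not the kernel code itself:

```python
# Sketch of the bug: consumers divide by topk expecting an expandedIdx.
topk = 2

# Token 2, top-k slot k = 1, so expandedIdx = 2 * topk + 1 = 5
expanded_idx = 5
token_idx = expanded_idx // topk  # correct consumer behavior: token 2

# Buggy path: the binding wired tokenIdx into the expandedIdx buffer,
# so the consumer receives 2 where it expects 5 ...
wrongly_received = token_idx
# ... and divides again, landing on the wrong token entirely.
wrong_token = wrongly_received // topk  # token 1, not token 2
```

Because the double division lands on a different (but still valid) token index, the kernels read and write real rows, just the wrong ones, which explains why this surfaced as accuracy degradation rather than a crash.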

<!-- What does this PR do? Briefly describe the changes and why they’re
needed. -->

## 🔍 Related Issues

<!-- Link any related issues here -->

## 🚀 Pull Request Checklist

Thank you for contributing to FlashInfer! Before we review your pull
request, please make sure the following items are complete.

### ✅ Pre-commit Checks

- [ ] I have installed `pre-commit` by running `pip install pre-commit`
(or used your preferred method).
- [ ] I have installed the hooks with `pre-commit install`.
- [ ] I have run the hooks manually with `pre-commit run --all-files`
and fixed any reported issues.

> If you are unsure about how to set up `pre-commit`, see [the
pre-commit documentation](https://pre-commit.com/).

## 🧪 Tests

- [ ] Tests have been added or updated as needed.
- [ ] All tests are passing (`unittest`, etc.).

## Reviewer Notes

<!-- Optional: anything you'd like reviewers to focus on, concerns, etc.
-->


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **Refactor**
* Refined MOE (Mixture of Experts) routing infrastructure by extending
index mapping capabilities across multiple kernel implementations to
improve internal data flow consistency.

* **Tests**
* Strengthened accuracy validation thresholds from 0.925 to 0.97 with
adjusted error tolerance parameters, ensuring more rigorous testing of
MOE operations under FP4 quantization conditions.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
