[Do Not Merge] [WIP till AMD releases MXFP4 of MiniMax M2.5] Add MiniMax M2.1 MXFP4 benchmark for MI355x vLLM (TP=2,4) #827
Open
functionstackx wants to merge 6 commits into main
Add MiniMax M2.1 MXFP4 benchmark config for MI355x with vLLM v0.16.0.
- Model: amd/MiniMax-M2.1-MXFP4
- TP=2 and TP=4 (matching MiniMax M2.5 FP8 pattern)
- Only VLLM_ROCM_USE_AITER=1 env var (per Andy Luo recipe)
- Seq lengths: 1k1k, 1k8k, 8k1k (conc 4-64)

Closes #826

Co-authored-by: functionstackx <functionstackx@users.noreply.github.com>
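The search space described in this commit can be pictured as a small grid. The sketch below is illustrative only: the function name `build_search_space`, the dictionary keys, the reading of 1k/8k as 1024/8192 tokens, and the power-of-two concurrency steps between 4 and 64 are all assumptions, not the repository's actual config schema.

```python
from itertools import product

MODEL = "amd/MiniMax-M2.1-MXFP4"
ENV = {"VLLM_ROCM_USE_AITER": "1"}  # the only env var the commit sets

# Assumption: "1k" means 1024 tokens and "8k" means 8192 tokens,
# written as (input length, output length).
SEQ_LENS = {"1k1k": (1024, 1024), "1k8k": (1024, 8192), "8k1k": (8192, 1024)}

# Assumption: concurrency doubles from 4 up to 64.
CONCURRENCIES = [4, 8, 16, 32, 64]


def build_search_space(tp_sizes=(2, 4)):
    """Return one entry per (TP size, sequence-length pair, concurrency)."""
    return [
        {
            "model": MODEL,
            "tp": tp,
            "seq": name,
            "isl": isl,
            "osl": osl,
            "conc": conc,
            "env": dict(ENV),
        }
        for tp, (name, (isl, osl)), conc in product(
            tp_sizes, SEQ_LENS.items(), CONCURRENCIES
        )
    ]
```

Under these assumptions `build_search_space()` yields 30 entries, and passing `tp_sizes=(2,)` restricts the grid to the 15 TP=2 entries.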
functionstackx (Contributor, Author):
@claude can you comment out TP=4 and only do TP=2 for now, since TP=4 is bugged? See vllm-project/vllm#35637
Claude (Collaborator):
Claude finished @functionstackx's task in 53s.
Commented out TP=4 for MiniMax M2.1 MXFP4
Due to vllm-project/vllm#35637, TP=4 has been disabled. Only TP=2 will run.
Changes made:
TP=4 is bugged for this model per vllm-project/vllm#35637. Comment out TP=4 search-space entries, keeping only TP=2.

Co-authored-by: functionstackx <functionstackx@users.noreply.github.com>
functionstackx (Contributor, Author):
@chunfangamd I am seeing good MXFP4 performance compared to FP8. Just waiting for AMD to release an MXFP4 version of MiniMax M2.5 now; currently AMD only has an MXFP4 version of MiniMax M2.1.
Author: hongxiayang
- Keep AITER for attention but disable it specifically for MoE, so the fused MoE falls back to Triton kernels that can handle N=384 when TP=4 and N=192 when TP=8.
- Install the amd-quark library to fix the crash when TP=4 with VLLM_ROCM_USE_AITER_MOE=0.
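The N values quoted above are consistent with a MoE dimension split evenly across tensor-parallel ranks: 384 × 4 and 192 × 8 both equal 1536. The sketch below makes that arithmetic explicit and collects the two environment variables named in this PR. The value 1536 is inferred from those two figures rather than stated anywhere in the PR, and the helper names `per_rank_n` and `rocm_env` are hypothetical.

```python
# Assumption: inferred from the commit message (384 * 4 == 192 * 8 == 1536);
# the PR itself does not state this dimension.
MOE_INTERMEDIATE_SIZE = 1536


def per_rank_n(tp: int, intermediate_size: int = MOE_INTERMEDIATE_SIZE) -> int:
    """Per-rank width N when the dimension is split evenly across TP ranks."""
    if intermediate_size % tp != 0:
        raise ValueError(f"{intermediate_size} is not divisible by TP={tp}")
    return intermediate_size // tp


def rocm_env() -> dict[str, str]:
    """Env vars named in the PR: AITER enabled, but disabled for MoE only."""
    return {
        "VLLM_ROCM_USE_AITER": "1",
        "VLLM_ROCM_USE_AITER_MOE": "0",
    }


for tp in (2, 4, 8):
    print(f"TP={tp}: N={per_rank_n(tp)}")
```

Under that assumption TP=2 gives N=768, while TP=4 and TP=8 give the 384 and 192 that the commit message says the Triton fallback can handle.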

[WIP for path clearing] While waiting for AMD to release the MXFP4 version of MiniMax M2.5: https://huggingface.co/amd/models?search=mxfp4
Add MiniMax M2.1 MXFP4 benchmark config for MI355x with vLLM v0.16.0.
Closes #826
Generated with Claude Code