[WIP] H200 MINIMAX vLLM extend configs #869

Open
hshrivastava-droid wants to merge 9 commits into main from nv/h200-minimax

Conversation

@hshrivastava-droid
Collaborator

@hshrivastava-droid hshrivastava-droid commented Mar 5, 2026

Summary

Extends the MiniMax M2.5 FP8 H200 vLLM benchmark configuration with updated parallelism and concurrency settings:

  • Config (nvidia-master.yaml): Changed TP from 4 → 8 and extended max concurrency from 64 → 128 across all sequence length configs (1k1k, 1k8k, 8k1k)
  • Benchmark script (minimaxm2.5_fp8_h200.sh): Added EP_SIZE env var check and conditional --enable-expert-parallel flag for vLLM, required for MoE models like MiniMax M2.5
  • Perf changelog: Added entry documenting the config update

Changes

| File | Change |
| --- | --- |
| `.github/configs/nvidia-master.yaml` | TP 4 → 8, concurrency 64 → 128 for `minimaxm2.5-fp8-h200-vllm` |
| `benchmarks/single_node/minimaxm2.5_fp8_h200.sh` | Add expert-parallelism support via `EP_SIZE` env var |
| `perf-changelog.yaml` | Add changelog entry for this PR |
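For illustration only, the updated entry in `nvidia-master.yaml` might look roughly like the following. The key names mirror the search-space snippet quoted later in the review thread; the exact schema is assumed, not taken from the actual diff:

```yaml
# Hypothetical sketch of the updated config entry; actual schema may differ.
minimaxm2.5-fp8-h200-vllm:
  search-space:
  - { tp: 8, conc-start: 4, conc-end: 128 }   # previously tp: 4, conc-end: 64
```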

Context

MiniMax M2.5 is a Mixture-of-Experts (MoE) model that requires expert parallelism (EP) to run efficiently. Pure TP=8 alone is insufficient — EP must be enabled alongside it. This PR adds the conditional --enable-expert-parallel flag following the vLLM MiniMax recipe.
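A minimal sketch of how such a conditional flag could be wired into the benchmark script, following the reviewer's "if EP > 1 then pass `--enable-expert-parallel`" suggestion. The `EP_SIZE` variable name comes from the PR summary; the model path and other launch arguments are placeholders, not the script's actual contents:

```shell
#!/usr/bin/env bash
# Hypothetical sketch (not the actual benchmark script):
# gate --enable-expert-parallel on EP_SIZE, defaulting EP off.
EP_SIZE="${EP_SIZE:-1}"

EXTRA_ARGS=""
if [ "${EP_SIZE}" -gt 1 ]; then
  # MoE models like MiniMax M2.5 need expert parallelism enabled in vLLM
  EXTRA_ARGS="--enable-expert-parallel"
fi

# The real script would pass ${EXTRA_ARGS} to `vllm serve`;
# echoed here for illustration only.
echo "vllm serve <model> --tensor-parallel-size ${TP_SIZE:-8} ${EXTRA_ARGS}"
```

Running with `EP_SIZE=8` would append the flag; leaving `EP_SIZE` unset (or `1`) omits it, so non-MoE configs are unaffected.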

Test Plan

  • Run minimaxm2.5-fp8-h200-vllm benchmarks on H200 to validate TP=8 + EP configuration
  • Verify server starts successfully with --enable-expert-parallel

```yaml
osl: 1024
search-space:
- { tp: 4, conc-start: 4, conc-end: 64 }
- { tp: 8, conc-start: 4, conc-end: 16 }
```
Contributor

Pure TP=8 won't work on MiniMax. You need to enable EP with it, and then in the benchmark bash script do "if EP > 1, then pass --enable-expert-parallel".

https://docs.vllm.ai/projects/recipes/en/latest/MiniMax/MiniMax-M2.html

https://github.com/SemiAnalysisAI/InferenceX/blob/main/benchmarks/single_node/minimaxm2.5_fp8_h200.sh


@functionstackx functionstackx requested a review from a team March 6, 2026 01:04
@github-actions
Contributor

github-actions bot commented Mar 6, 2026

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook.

If it is not, please create a PR there first before we can merge your PR into the master branch. Let's ensure the documentation is first-class so that the entire ML community can benefit from your hard work. Thank you!

@kedarpotdar-nv
Collaborator

@claude please generate pr description

@Klaud-Cold
Collaborator

Klaud-Cold commented Mar 7, 2026

Claude finished @kedarpotdar-nv's task in 47s


Generating PR Description

  • Gather context from PR changes
  • Generate PR description
  • Update PR

Done — PR description has been updated with a summary of the changes, a diff table, context on why EP is needed for MiniMax M2.5 (MoE model), and a test plan.
