[WIP] H200 MINIMAX vLLM extend configs #869

Open
hshrivastava-droid wants to merge 9 commits into main from nv/h200-minimax

Conversation

@hshrivastava-droid
Collaborator

@hshrivastava-droid hshrivastava-droid commented Mar 5, 2026

Summary

Extends the MiniMax M2.5 FP8 H200 vLLM benchmark configuration with updated parallelism and concurrency settings:

  • Config (nvidia-master.yaml): Changed TP from 4 → 8 and extended max concurrency from 64 → 128 across all sequence length configs (1k1k, 1k8k, 8k1k)
  • Benchmark script (minimaxm2.5_fp8_h200.sh): Added EP_SIZE env var check and conditional --enable-expert-parallel flag for vLLM, required for MoE models like MiniMax M2.5
  • Perf changelog: Added entry documenting the config update

Changes

| File | Change |
| --- | --- |
| `.github/configs/nvidia-master.yaml` | TP 4 → 8, concurrency 64 → 128 for `minimaxm2.5-fp8-h200-vllm` |
| `benchmarks/single_node/minimaxm2.5_fp8_h200.sh` | Add expert-parallelism support via `EP_SIZE` env var |
| `perf-changelog.yaml` | Add changelog entry for this PR |
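For illustration only, the updated entry in `nvidia-master.yaml` might look roughly like the following. The key names mirror the search-space snippet quoted later in the review thread; the exact schema is assumed, not taken from the actual diff:

```yaml
# Hypothetical sketch of the updated config entry; actual schema may differ.
minimaxm2.5-fp8-h200-vllm:
  search-space:
  - { tp: 8, conc-start: 4, conc-end: 128 }   # previously tp: 4, conc-end: 64
```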

Context

MiniMax M2.5 is a Mixture-of-Experts (MoE) model that requires expert parallelism (EP) to run efficiently. Pure TP=8 alone is insufficient — EP must be enabled alongside it. This PR adds the conditional --enable-expert-parallel flag following the vLLM MiniMax recipe.
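A minimal sketch of how such a conditional flag could be wired into the benchmark script, following the reviewer's "if EP > 1 then pass `--enable-expert-parallel`" suggestion. The `EP_SIZE` variable name comes from the PR summary; the model path and other launch arguments are placeholders, not the script's actual contents:

```shell
#!/usr/bin/env bash
# Hypothetical sketch (not the actual benchmark script):
# gate --enable-expert-parallel on EP_SIZE, defaulting EP off.
EP_SIZE="${EP_SIZE:-1}"

EXTRA_ARGS=""
if [ "${EP_SIZE}" -gt 1 ]; then
  # MoE models like MiniMax M2.5 need expert parallelism enabled in vLLM
  EXTRA_ARGS="--enable-expert-parallel"
fi

# The real script would pass ${EXTRA_ARGS} to `vllm serve`;
# echoed here for illustration only.
echo "vllm serve <model> --tensor-parallel-size ${TP_SIZE:-8} ${EXTRA_ARGS}"
```

Running with `EP_SIZE=8` would append the flag; leaving `EP_SIZE` unset (or `1`) omits it, so non-MoE configs are unaffected.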

Test Plan

  • Run minimaxm2.5-fp8-h200-vllm benchmarks on H200 to validate TP=8 + EP configuration
  • Verify server starts successfully with --enable-expert-parallel

```yaml
osl: 1024
search-space:
- { tp: 4, conc-start: 4, conc-end: 64 }
- { tp: 8, conc-start: 4, conc-end: 16 }
```
Contributor

Pure TP=8 won't work on MiniMax. You need to enable EP with it, and then in the benchmark bash script do "if EP > 1, then pass --enable-expert-parallel".

https://docs.vllm.ai/projects/recipes/en/latest/MiniMax/MiniMax-M2.html

https://github.com/SemiAnalysisAI/InferenceX/blob/main/benchmarks/single_node/minimaxm2.5_fp8_h200.sh


@functionstackx functionstackx requested a review from a team March 6, 2026 01:04
@github-actions
Contributor

github-actions bot commented Mar 6, 2026

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook.

If it is not, please create a PR there first before we can merge your PR into the master branch. Let's ensure the documentation is first-class so that the entire ML community can benefit from your hard work. Thank you!

@kedarpotdar-nv
Collaborator

@claude please generate pr description

@Klaud-Cold
Collaborator

Klaud-Cold commented Mar 7, 2026

Claude finished @kedarpotdar-nv's task in 47s


Generating PR Description

  • Gather context from PR changes
  • Generate PR description
  • Update PR

Done — PR description has been updated with a summary of the changes, a diff table, context on why EP is needed for MiniMax M2.5 (MoE model), and a test plan.
