
[NVIDIA] Update NVIDIA GPT-OSS vLLM image from v0.15.1 to v0.16.0#800

Open
cquil11 wants to merge 16 commits into main from claude/issue-798-20260226-0534

Conversation


@cquil11 cquil11 commented Feb 26, 2026

Bump vllm/vllm-openai image tag for all 3 NVIDIA GPT-OSS configs (B200, H100, H200). All existing BKC flags preserved — no config changes beyond the image tag.

v0.16.0 notable changes for GPT-OSS/MXFP4:

  • Async scheduling + pipeline parallelism (30.8% throughput improvement)
  • New MXFP4 backends: SM90 FlashInfer BF16, SM100 CUTLASS
  • MoE cold start optimization
  • Triton backend now default non-FlashInfer fallback on SM90/SM100

Closes #798
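
In config terms, the change described above is a one-line image tag bump per SKU. A sketch in diff form (the `image:` key and file layout are illustrative; the actual InferenceX config paths and schema are not shown in this PR):

```diff
-image: vllm/vllm-openai:v0.15.1
+image: vllm/vllm-openai:v0.16.0
```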


cquil11 commented Feb 26, 2026

Completed sweep: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/22429605694

Normal variance +/- 2%

[Screenshots: four CleanShot captures of sweep results, 2026-02-26]
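
The "normal variance +/- 2%" band above can be checked with a trivial helper (hypothetical, not part of InferenceX):

```python
def within_variance(baseline: float, measured: float, band: float = 0.02) -> bool:
    """Return True if `measured` deviates from `baseline` by no more than ±band."""
    return abs(measured - baseline) / baseline <= band

# A run 1.5% below baseline is inside the normal ±2% band
print(within_variance(100.0, 98.5))   # True
# A run 3.1% below baseline is outside it
print(within_variance(100.0, 96.9))   # False
```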

@functionstackx functionstackx left a comment


LGTM

@functionstackx

gonna merge this soon

@kedarpotdar-nv

Looks like a small perf regression on B200 1k/1k; @ankursingh-nv is investigating


functionstackx commented Mar 1, 2026

v0.17 is coming out Wednesday; probably going to merge this v0.16 in soon before then since we're doing best effort on GPT-OSS


jgangani commented Mar 2, 2026

@functionstackx @ankursingh-nv, Should we then just wait for 0.17 to land and update this PR before merging?

@ankursingh-nv

In general, we should have the version that results in the best performance today.
We are investigating it, but in the meantime, if v0.17 is released and the out-of-the-box performance is good, we can skip v0.16


cquil11 commented Mar 5, 2026

@ankursingh-nv in general though, we think it's useful to update images as they are released (even if perf is not improved) for posterity and to track perf across all images publicly

fwiw, it appears the "regression" in this PR is just natural variance

github-actions bot and others added 11 commits March 6, 2026 16:59
Bump vllm/vllm-openai image tag for all 3 NVIDIA GPT-OSS configs
(B200, H100, H200). All existing BKC flags preserved — no config
changes beyond the image tag.

v0.16.0 notable changes for GPT-OSS/MXFP4:
- Async scheduling + pipeline parallelism (30.8% throughput improvement)
- New MXFP4 backends: SM90 FlashInfer BF16, SM100 CUTLASS
- MoE cold start optimization
- Triton backend now default non-FlashInfer fallback on SM90/SM100

Closes #798

Co-authored-by: Cameron Quilici <cquil11@users.noreply.github.com>
Removed outdated configuration entries and added new vLLM image update details for NVIDIA GPT-OSS. Updated pull request links for changes.
@cquil11 cquil11 force-pushed the claude/issue-798-20260226-0534 branch from e7264f5 to 4ca13fc Compare March 6, 2026 22:59


Development

Successfully merging this pull request may close these issues.

[NVIDIA] update H100, H200, B200 GPT OSS vLLM image to latest 0.16.0

5 participants