[NVIDIA] Update NVIDIA GPT-OSS vLLM image from v0.15.1 to v0.16.0#800
Conversation
Completed sweep: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/22429605694. Normal variance is ±2%.
Gonna merge this soon.
Looks like a small perf regression on B200 1k/1k; @ankursingh-nv is investigating.
v0.17 is coming out Wednesday; probably gonna merge this v0.16 bump in before then, since we're doing best effort on GPT-OSS.
@functionstackx @ankursingh-nv, should we then just wait for v0.17 to land and update this PR before merging?
In general, we should run the version that results in the best performance today.
@ankursingh-nv in general though, we think it's useful to update images as they are released (even if perf is not improved), both for posterity and to track perf across all images publicly. FWIW, it appears the "regression" in this PR is just natural variance.
Bump the vllm/vllm-openai image tag for all 3 NVIDIA GPT-OSS configs (B200, H100, H200). All existing BKC flags preserved — no config changes beyond the image tag.

v0.16.0 notable changes for GPT-OSS/MXFP4:
- Async scheduling + pipeline parallelism (30.8% throughput improvement)
- New MXFP4 backends: SM90 FlashInfer BF16, SM100 CUTLASS
- MoE cold start optimization
- Triton backend now default non-FlashInfer fallback on SM90/SM100

Closes #798

Co-authored-by: Cameron Quilici <cquil11@users.noreply.github.com>
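Since the PR touches nothing but the image tag, the per-config diff is a one-line change. A hypothetical sketch (the actual InferenceX config layout and the `image` key name are assumptions, not shown in this PR):

```diff
 # NVIDIA GPT-OSS config (B200/H100/H200); all BKC flags unchanged
-image: vllm/vllm-openai:v0.15.1
+image: vllm/vllm-openai:v0.16.0
```

The same one-line bump is applied to each of the three hardware configs.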
Removed outdated configuration entries and added new vLLM image update details for NVIDIA GPT-OSS. Updated pull request links for changes.
Force-pushed from e7264f5 to 4ca13fc.