vllm 0.17.0 gpt-oss updates by Rohan138 · Pull Request #867 · SemiAnalysisAI/InferenceX

Rohan138 · 2026-03-05T07:31:04Z

Opening as draft until vllm v0.17.0 is released. This release will include multiple optimizations for gpt-oss:

Enable https://huggingface.co/amd/gpt-oss-120b-w-mxfp4-a-fp8 MoE quantization: [ROCm][Quantization] GPT OSS Upstream MoE wmxfp4_afp8 with static scales vllm-project/vllm#30357 (10-15%)
RoPE+KVCache fusion: [ROCm] AITER fused RoPE+KVCache vllm-project/vllm#33443 (3-4%)
Enable AITER RoPE: [ROCm]: Enable customop and rope+kvcache fusion for AITER RoPE vllm-project/vllm#35180 (1%)

Some other updates:

gpt-oss CK moe backend: [ROCm][Quantization] Add Composable Kernel (CK) backend support for M… vllm-project/vllm#34301 (Not used in this PR since Triton W4A8 is better, but was previously the default MOE on MI355 before [AMD] Update AMD MI300X, MI325X, MI355X GPT-OSS vLLM images to v0.16.0 #806)
Attention backends can no longer be passed in as env vars in vLLM; use --attention-backend instead: [Deprecation] Remove deprecated environment variables vllm-project/vllm#32812
FULL_AND_PIECEWISE is already the default cudagraph_mode, so there is no need to specify it
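
Since the attention backend now has to be chosen with --attention-backend, a stale export left in a launch environment is easy to miss. A minimal sketch of a pre-launch check, assuming the two variables this PR removes from the benchmark script are the ones to look for. The function name stale_attn_env_vars is hypothetical, and how vLLM 0.17.0 reacts to a leftover variable is not stated in this PR.

```shell
#!/bin/sh
# Hypothetical pre-launch check: print any exported env var that this PR
# removes from the benchmark script in favor of --attention-backend.
stale_attn_env_vars() {
  for name in VLLM_ROCM_USE_AITER_UNIFIED_ATTENTION VLLM_ROCM_USE_AITER_MHA; do
    # `env` lists only exported variables, which is what `vllm serve` would inherit.
    if env | grep -q "^${name}="; then
      echo "$name"
    fi
  done
}
```

Calling stale_attn_env_vars before `vllm serve` prints nothing when neither variable is exported, so an empty result means the environment matches the updated script.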

Rohan138 · 2026-03-05T07:31:27Z

cc @chunfangamd

functionstackx · 2026-03-05T08:03:30Z

benchmarks/single_node/gptoss_fp4_mi355x.sh

+export AMDGCN_USE_BUFFER_OPS=0
 export VLLM_ROCM_USE_AITER=1
-export VLLM_ROCM_USE_AITER_UNIFIED_ATTENTION=1
-export VLLM_ROCM_USE_AITER_MHA=0
+export VLLM_ROCM_USE_AITER_TRITON_ROPE=1
+export VLLM_ROCM_QUICK_REDUCE_QUANTIZATION=INT4
+ATTN_BACKEND="--attention-backend ROCM_AITER_UNIFIED_ATTN"
+FUSE_ROPE_KVCACHE="-cc.pass_config.fuse_rope_kvcache=True -cc.use_inductor_graph_partition=True"

 SERVER_LOG=/workspace/server.log
 PORT=${PORT:-8888}

 set -x
 vllm serve $MODEL --port $PORT \
---tensor-parallel-size=$TP \
---gpu-memory-utilization 0.95 \
---max-model-len $MAX_MODEL_LEN \
---compilation-config  '{"cudagraph_mode": "FULL_AND_PIECEWISE"}' \
---block-size=64 \
---no-enable-prefix-caching \
---disable-log-requests > $SERVER_LOG 2>&1 &
+  $ATTN_BACKEND $FUSE_ROPE_KVCACHE \
+  --tensor-parallel-size=$TP \
+  --gpu-memory-utilization 0.95 \
+  --max-model-len $MAX_MODEL_LEN \
+  --block-size=64 \
+  --no-enable-prefix-caching \
+  --disable-log-requests > $SERVER_LOG 2>&1 &


Thanks for this PR. Can you open a PR in the vllm recipes repo to update it with these new flags? Let's ensure that the documentation is first class so that the entire ML community can benefit from your hard work!

https://github.com/vllm-project/recipes/blob/main/OpenAI/GPT-OSS.md#mi355xgfx950

Done: vllm-project/recipes#268
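
Read as applied, the diff in this review thread leaves a launch command along the following lines. This is a dry-run sketch that only prints the argument list, one argument per line, and does not start a server. The helper name print_vllm_args is hypothetical; every flag is copied from the diff, and the exports (AMDGCN_USE_BUFFER_OPS, VLLM_ROCM_USE_AITER, and so on) are assumed to be set separately as in the script.

```shell
#!/bin/sh
# Dry-run sketch: print the post-diff `vllm serve` argument list, one per line.
# Nothing is launched; print_vllm_args is a hypothetical helper name.
print_vllm_args() {
  # Build the list in the positional parameters so each flag stays one word
  # without relying on unquoted word splitting of string variables.
  set -- serve "$MODEL" --port "${PORT:-8888}" \
    --attention-backend ROCM_AITER_UNIFIED_ATTN \
    -cc.pass_config.fuse_rope_kvcache=True \
    -cc.use_inductor_graph_partition=True \
    --tensor-parallel-size="$TP" \
    --gpu-memory-utilization 0.95 \
    --max-model-len "$MAX_MODEL_LEN" \
    --block-size=64 \
    --no-enable-prefix-caching \
    --disable-log-requests
  printf '%s\n' "$@"
}
```

The output contains no --compilation-config entry, which matches the PR description's note that FULL_AND_PIECEWISE is already the default.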

cquil11 · 2026-03-05T15:48:49Z

Wow, awesome that y'all have this queued up and ready to go! We will test and ship as soon as vLLM 0.17.0 is out.

cquil11 · 2026-03-07T19:38:37Z

@Rohan138 #889

Rohan138 added 2 commits March 5, 2026 01:19

vllm 0.17.0 updates

0476859

use aiter rope on mi355

0e91a71

github-project-automation bot added this to InferenceMAX Board Mar 5, 2026

functionstackx reviewed Mar 5, 2026

Rohan138 mentioned this pull request Mar 5, 2026

Update gpt-oss mi300,mi325,mi355 for vllm 0.17.0 vllm-project/recipes#268

Draft

cquil11 marked this pull request as ready for review March 7, 2026 19:21

cquil11 requested a review from a team March 7, 2026 19:21

cquil11 requested review from billishyahao and chunfangamd as code owners March 7, 2026 19:21

cquil11 approved these changes Mar 7, 2026

cquil11 mentioned this pull request Mar 7, 2026

[AMD] GPT-OSS vLLM 0.17.0 AMD update #889

Open

cquil11 closed this Mar 7, 2026

github-project-automation bot moved this to Done in InferenceMAX Board Mar 7, 2026

Rohan138 wants to merge 2 commits into SemiAnalysisAI:main from ROCm:amd/update_gptoss_vllm_0.17.0
