
vllm 0.17.0 gpt-oss updates #867

Closed
Rohan138 wants to merge 2 commits into SemiAnalysisAI:main from ROCm:amd/update_gptoss_vllm_0.17.0

Conversation

@Rohan138
Contributor

@Rohan138 Rohan138 commented Mar 5, 2026

Opening as a draft until vLLM v0.17.0 is released. This release will include multiple optimizations for gpt-oss, along with some other updates.

@Rohan138
Contributor Author

Rohan138 commented Mar 5, 2026

cc @chunfangamd

Comment on lines +36 to +54
export AMDGCN_USE_BUFFER_OPS=0
export VLLM_ROCM_USE_AITER=1
export VLLM_ROCM_USE_AITER_UNIFIED_ATTENTION=1
export VLLM_ROCM_USE_AITER_MHA=0
export VLLM_ROCM_USE_AITER_TRITON_ROPE=1
export VLLM_ROCM_QUICK_REDUCE_QUANTIZATION=INT4
ATTN_BACKEND="--attention-backend ROCM_AITER_UNIFIED_ATTN"
FUSE_ROPE_KVCACHE="-cc.pass_config.fuse_rope_kvcache=True -cc.use_inductor_graph_partition=True"

SERVER_LOG=/workspace/server.log
PORT=${PORT:-8888}

set -x
vllm serve $MODEL --port $PORT \
$ATTN_BACKEND $FUSE_ROPE_KVCACHE \
--tensor-parallel-size=$TP \
--gpu-memory-utilization 0.95 \
--max-model-len $MAX_MODEL_LEN \
--compilation-config '{"cudagraph_mode": "FULL_AND_PIECEWISE"}' \
--block-size=64 \
--no-enable-prefix-caching \
--disable-log-requests > $SERVER_LOG 2>&1 &
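As a quick sanity check before actually launching the server, the flag variables above can be assembled in a dry run that just prints the final invocation. This is a sketch, not part of the PR; the `MODEL`, `TP`, and `MAX_MODEL_LEN` values below are placeholder assumptions, so substitute whatever your deployment uses:

```shell
# Dry-run sketch: build the serve command from the same flag variables
# as the script above and print it, without starting a server.
# MODEL, TP, and MAX_MODEL_LEN are placeholder values (assumptions).
MODEL=openai/gpt-oss-120b
TP=8
MAX_MODEL_LEN=8192
PORT=${PORT:-8888}
ATTN_BACKEND="--attention-backend ROCM_AITER_UNIFIED_ATTN"
FUSE_ROPE_KVCACHE="-cc.pass_config.fuse_rope_kvcache=True -cc.use_inductor_graph_partition=True"

# Print the fully expanded command line for review.
echo vllm serve "$MODEL" --port "$PORT" \
  $ATTN_BACKEND $FUSE_ROPE_KVCACHE \
  --tensor-parallel-size="$TP" \
  --max-model-len "$MAX_MODEL_LEN"
```

Dropping the `echo` (and adding back the memory, block-size, and logging flags from the script above) turns this into the real launch command.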
Contributor


Thanks for this PR! Can you open a PR in the vLLM recipes repo to update it with these new flags? Let's ensure the documentation is first class so that the entire ML community can benefit from your hard work!

https://github.com/vllm-project/recipes/blob/main/OpenAI/GPT-OSS.md#mi355xgfx950


@cquil11
Copy link
Collaborator

cquil11 commented Mar 5, 2026

Wow, awesome that y'all have this queued up and ready to go! We will test and ship as soon as vLLM 0.17.0 is out.

@cquil11 cquil11 marked this pull request as ready for review March 7, 2026 19:21
@cquil11 cquil11 requested a review from a team March 7, 2026 19:21
@cquil11
Copy link
Collaborator

cquil11 commented Mar 7, 2026

@Rohan138 #889


3 participants