Unleashing Computational Power: Ultimate Latency Optimization of Qwen3 and Qwen3-VL on AMD MI300X Series #313

wuhuikx · 2026-02-12T02:15:48Z

Over recent months, the Qwen C-end Infrastructure Engineering Team and the AMD AIFW Team have worked together on aggressive latency optimizations for Qwen3 235B and Qwen3 VL 235B on the AMD MI300 series GPU platform, built on the SGLang framework.

The work delivers gains in performance, precision, and stability. Measured against the baseline:
• Qwen3 235B: Time to First Token (TTFT) improved by 1.67× and Time Per Output Token (TPOT) improved by 2.12×.
• Qwen3 VL 235B: TTFT improved by 1.62× and TPOT improved by 1.90×.
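The description does not say how these ratios were computed. As a rough sketch, assuming the common convention that an N× improvement means baseline latency divided by optimized latency, TTFT and TPOT can be derived from per-token arrival timestamps as shown below. The helper names and the timestamps are hypothetical and purely illustrative; they are not measurements from this work.

```python
from typing import Sequence


def ttft(request_start: float, token_times: Sequence[float]) -> float:
    """Time to First Token: delay from sending the request to the first output token."""
    if not token_times:
        raise ValueError("need at least one output token")
    return token_times[0] - request_start


def tpot(token_times: Sequence[float]) -> float:
    """Time Per Output Token: mean gap between consecutive tokens after the first."""
    if len(token_times) < 2:
        raise ValueError("need at least two output tokens")
    return (token_times[-1] - token_times[0]) / (len(token_times) - 1)


def speedup(baseline_latency: float, optimized_latency: float) -> float:
    """Assumed convention: ratio of baseline latency to optimized latency."""
    if optimized_latency <= 0:
        raise ValueError("optimized latency must be positive")
    return baseline_latency / optimized_latency


# Illustrative timestamps in seconds (not real measurements).
baseline_tokens = [1.0, 1.5, 2.0, 2.5]
optimized_tokens = [0.5, 0.75, 1.0, 1.25]

ttft_gain = speedup(ttft(0.0, baseline_tokens), ttft(0.0, optimized_tokens))
tpot_gain = speedup(tpot(baseline_tokens), tpot(optimized_tokens))
print(f"TTFT speedup: {ttft_gain:.2f}x, TPOT speedup: {tpot_gain:.2f}x")
```

Under this convention, a 2.12× TPOT improvement would mean each output token takes a little under half the baseline time.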

This blog post describes the performance optimization techniques that the two teams jointly explored and implemented, with a focus on ultra-low-latency inference.

wuhuikx · 2026-02-12T02:25:21Z

@sunway513

wuhuikx and others added 6 commits February 11, 2026 14:25

Add Qwen latency optimization blog

93f0cc0

Reformat the image

fa66437

Fix some sentences

940c654

Update 2026-02-11-Qwen-latency.md

4150c91

update the preview image

2c5d7cf

Change the images path

2517b8f

wuhuikx and others added 3 commits February 12, 2026 10:41

Update

63e22d8

Update 2026-02-11-Qwen-latency.md

05a702a

Update 2026-02-11-Qwen-latency.md

82409c2

wuhuikx changed the title ~~Unleashing Computational Power: Ultimate Latency Optimization of Qwen3 and Qwen3-VL on AMD MI300 Series~~ Unleashing Computational Power: Ultimate Latency Optimization of Qwen3 and Qwen3-VL on AMD MI300X Series Feb 12, 2026
