Performance Improvements for MI300X with GEMM and FP8 Enhancements #811

Open
chunfangamd wants to merge 3 commits into main from chun_zhentao/dsr1_fp8_mi300x_20260225

@chunfangamd
Collaborator

This patch pins the aiter and sgl-kernel versions used to run DeepSeek R1 (DSR1) in FP8 on MI300X with SGLang. The pinned versions bring in three improvements:

  1. Include configuration files for three GEMM operations: "[gfx942]Add new GEMM configuration files for DSKR1" (ROCm/aiter#2024)
  2. Improve TPOT by using fp8 bmm in MLA on MI300X for DSR1/V3: "[AMD] DSR1/V3 use fp8 bmm in MLA for MI300X" (sgl-project/sglang#18624)
  3. Broaden the optimized paths to all HIP platforms and add tuned FP8 GEMM configs: "[ROCm] Optimize Deepseek R1 on MI300X" (sgl-project/sglang#18242)

End-to-end tests: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/22400655865

- Pin aiter and sgl-kernel to the specific commits required by the
v0.5.8-rocm700-mi30x image.
- This patch is expected to work only with the image
    lmsysorg/sglang:v0.5.8-rocm700-mi30x
- Joint work with Zhentao Chen.

The previous aiter ref (9046b6f) changed get_mla_metadata_v1 to expect
a Tensor for kv_last_page_lens, but the sglang in the image still passed
an int, which caused a crash during CUDA graph capture.

Fix this by fresh-cloning aiter at d2ca5a89, pinning sgl-kernel to 8bd6447
(now at sglang/sgl-kernel), and uninstalling the stale packages before
rebuilding to avoid conflicts with leftover C extensions.
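
The sequence described in the commit message above (uninstall stale packages, fresh-clone, check out the pinned refs) could be sketched roughly as follows. This is a minimal illustration and not the script in this PR: the refs and repository names come from the description, while the clone URLs, the pip package names, the working directory, the use of submodules, and the helper names `pin_commands` and `run` are all assumptions. The rebuild commands are deliberately left out because the description does not state them.

```python
import subprocess

# Refs and repo names are taken from the PR description.
# The URL form and the pip package names below are assumptions.
AITER_URL = "https://github.com/ROCm/aiter.git"
AITER_REF = "d2ca5a89"
SGLANG_URL = "https://github.com/sgl-project/sglang.git"
SGL_KERNEL_REF = "8bd6447"  # sgl-kernel lives under sglang/sgl-kernel


def pin_commands(workdir="/tmp/pins"):
    """Return the ordered commands for the uninstall / clone / checkout steps."""
    aiter_dir = f"{workdir}/aiter"
    sglang_dir = f"{workdir}/sglang"
    return [
        # Remove stale installs first so leftover C extensions cannot
        # shadow the rebuilt ones (package names are assumed).
        ["pip", "uninstall", "-y", "aiter", "sgl-kernel"],
        # Fresh clone instead of reusing a checkout already in the image.
        ["git", "clone", AITER_URL, aiter_dir],
        ["git", "-C", aiter_dir, "checkout", AITER_REF],
        # Assumption: the repo uses submodules that must match the pinned ref.
        ["git", "-C", aiter_dir, "submodule", "update", "--init", "--recursive"],
        ["git", "clone", SGLANG_URL, sglang_dir],
        ["git", "-C", sglang_dir, "checkout", SGL_KERNEL_REF],
        # The rebuild of aiter and sgl-kernel would follow here; it is omitted
        # because the build commands are not given in the PR description.
    ]


def run(commands, dry_run=True):
    """Print each command in dry-run mode, otherwise execute it and stop on failure."""
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run(pin_commands())  # dry run: only prints the planned commands
```

Running the file as is only prints the planned commands; passing `dry_run=False` to `run` would execute them and needs network access. Uninstalling before cloning mirrors the order in the commit message, so a failed build cannot silently fall back to the stale packages.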
Contributor

@functionstackx functionstackx left a comment

Can y'all update sglang with the latest aiter, and we wait until the next sgl release image before updating inferenceMax? Ideally we wanna track actual images and not patchwork.

@cquil11
Collaborator

cquil11 commented Feb 26, 2026

@chunfangamd So the idea is that this will land in SGLang 0.10.0?
