@ltqin
Collaborator

@ltqin ltqin commented Dec 1, 2025

Proposed changes

Implement FP8 block-scale quantization for the FMHA forward pass (fmha fwd).
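For readers unfamiliar with the term, block-scale quantization generally means that each fixed-size block of a tensor gets its own scale factor, so an outlier in one block does not cost precision in the others. The following is a minimal host-side sketch of that idea in Python, not the kernel code from this PR. The function names, the 1-D block layout, and the use of 448 as the FP8 maximum (the OCP E4M3 value; other FP8 variants use a different maximum) are all assumptions made for illustration. How this PR assigns blocks to the Q, K, and V tensors is not described in this conversation.

```python
import math

# Assumption: OCP E4M3 max finite value. Other FP8 variants use a different maximum.
FP8_E4M3_MAX = 448.0


def round_to_e4m3(x: float) -> float:
    """Round x to the nearest value on an E4M3-like grid (3 mantissa bits), saturating."""
    if x == 0.0:
        return 0.0
    mag = min(abs(x), FP8_E4M3_MAX)
    # floor(log2(mag)), clamped to the smallest normal exponent (-6)
    exp = max(math.frexp(mag)[1] - 1, -6)
    step = 2.0 ** (exp - 3)  # spacing between neighbouring values at this exponent
    q = min(round(mag / step) * step, FP8_E4M3_MAX)
    return math.copysign(q, x)


def quantize_block_scale(values, block_size):
    """Hypothetical helper: quantize a flat list with one scale per block."""
    quantized, scales = [], []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        amax = max(abs(v) for v in block)
        # Map the block's largest magnitude onto the FP8 maximum.
        scale = amax / FP8_E4M3_MAX if amax > 0.0 else 1.0
        scales.append(scale)
        quantized.extend(round_to_e4m3(v / scale) for v in block)
    return quantized, scales


def dequantize_block_scale(quantized, scales, block_size):
    """Hypothetical helper: multiply each value by the scale of its block."""
    return [q * scales[i // block_size] for i, q in enumerate(quantized)]
```

In this sketch, a list of eight values with `block_size=4` yields two scales, and each dequantized value stays within 1/16 of the original in relative terms, which is the rounding bound for three mantissa bits.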

Checklist

./build/bin/tile_example_fmha_fwd -init=3 -b=16 -s=256 -s_k=1024 -h=16 -h_k=1 -d=128 -prec=fp8bf16 -vlayout=r -qscale=2 -kname=1 -mode=0 -v=1 -operm=1 -iperm=1

@ltqin ltqin requested a review from rocking5566 December 3, 2025 00:26
@poyenc
Contributor

poyenc commented Jan 19, 2026

LGTM. Please make sure this PR won't introduce a performance regression before merging it.

poyenc previously approved these changes Jan 19, 2026
@ltqin ltqin enabled auto-merge (squash) January 20, 2026 06:21
@ltqin ltqin disabled auto-merge January 20, 2026 06:22
@ltqin ltqin enabled auto-merge (squash) January 20, 2026 14:09
@ltqin ltqin disabled auto-merge January 20, 2026 14:09
@illsilin
Collaborator

Let's please make sure we don't break AITER with these changes!

@poyenc
Contributor

poyenc commented Jan 21, 2026

@ltqin could you update the CHANGELOG.md as well?

@ltqin ltqin enabled auto-merge (squash) January 22, 2026 03:48
@ltqin ltqin disabled auto-merge January 22, 2026 03:49
@illsilin illsilin merged commit dd0b429 into develop Jan 22, 2026
12 checks passed
@illsilin illsilin deleted the ck_tile/fmha_fwd_block_scale branch January 22, 2026 04:58
poyenc added a commit that referenced this pull request Jan 23, 2026
illsilin pushed a commit that referenced this pull request Jan 23, 2026
ltqin added a commit that referenced this pull request Jan 23, 2026
illsilin added a commit that referenced this pull request Jan 23, 2026
…3633)" (#3635)

This reverts commit de5a1d7.

Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>