poyenc (Contributor) commented Jan 2, 2026

Problem

When mask_info::decode() parses "0" (no_mask), it sets only the type field and leaves left, right, and sink uninitialized. As a result:

  1. The uninitialized sink can hold an arbitrary garbage value
  2. traits.has_sink = (mask.sink > 0) in fmha_fwd_runner.hpp:882 might evaluate to true
  3. Even without masking, the code could incorrectly dispatch to kHasSink=true instantiations
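
To make the hazard concrete, here is a small Python model of it. This is not the C++ code from mask.hpp: the class MaskInfo, the constant NO_MASK, and both decode helpers are hypothetical, and the garbage argument merely simulates whatever an uninitialized field might contain. Only the predicate sink > 0 and the defaults left=-1, right=-1, sink=0 come from this PR description.

```python
from dataclasses import dataclass

NO_MASK = 0  # hypothetical stand-in for the no_mask enum value


@dataclass
class MaskInfo:
    """Toy model of mask_info; field names follow the PR description."""
    type: int
    left: int
    right: int
    sink: int


def decode_no_mask_buggy(garbage: int) -> MaskInfo:
    # Models the old behavior: only `type` is set, the other fields keep
    # whatever happened to be in memory (simulated here by `garbage`).
    return MaskInfo(type=NO_MASK, left=garbage, right=garbage, sink=garbage)


def decode_no_mask_fixed() -> MaskInfo:
    # Models the fix: explicit defaults (left=-1, right=-1, sink=0).
    return MaskInfo(type=NO_MASK, left=-1, right=-1, sink=0)


def has_sink(mask: MaskInfo) -> bool:
    # Same predicate as the one quoted from fmha_fwd_runner.hpp.
    return mask.sink > 0
```

With a positive garbage value, has_sink(decode_no_mask_buggy(...)) is true even though no mask was requested, while has_sink(decode_no_mask_fixed()) is always false.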

Solution

  1. Runtime fix: Initialize left=-1, right=-1, sink=0 when decoding no_mask in mask.hpp
  2. Compile-time check: Add static_assert(FmhaMask::IsMasking || !kHasSink) to pipeline problems
  3. Reduce instantiations: Filter out F_mask=no_mask + F_sink=true combinations in codegen scripts:
    • fmha_fwd.py
    • fmha_fwd_splitkv.py
    • fmha_pagedkv_prefill.py
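
The codegen filter in step 3 can be sketched roughly as follows. This is a simplified illustration, not the code in the scripts above: the key sets NO_MASK_KEYS, MASK_KEYS, and SINK_KEYS and the functions is_compatible and enumerate_variants are assumptions made for the example. Only the "t" encoding for F_sink and the idea of a no-mask key set are taken from the diff excerpts in this PR.

```python
from itertools import product

# Hypothetical key sets; the real scripts define their own mask keys.
NO_MASK_KEYS = {"no"}
MASK_KEYS = ["no", "causal", "generic"]
SINK_KEYS = ["t", "f"]


def is_compatible(f_mask: str, f_sink: str) -> bool:
    """Return False for no-mask + sink=true, which the PR treats as invalid."""
    return not (f_mask in NO_MASK_KEYS and f_sink == "t")


def enumerate_variants() -> list[tuple[str, str]]:
    # Walk the full (mask, sink) cross product and keep only valid pairs.
    return [
        (mask, sink)
        for mask, sink in product(MASK_KEYS, SINK_KEYS)
        if is_compatible(mask, sink)
    ]
```

In this toy setup the cross product has six (mask, sink) pairs and the filter drops exactly one of them, ("no", "t"). The actual reduction in the real scripts depends on how many other parameters each pair is multiplied against.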

Impact

  • ✅ Fixes potential runtime dispatch bug
  • ✅ Prevents invalid template combinations at compile-time
  • ✅ Reduces binary size by eliminating redundant kernel instantiations
  • ✅ Improves build time

Testing

  • Existing tests should pass
  • Invalid combinations now caught at compile-time

poyenc marked this pull request as ready for review January 2, 2026 14:21
poyenc changed the title from "Fix mask_info initialization for no_mask cases" to "[CK_TILE][FMHA] Fix uninitialized sink_size in mask_info::decode() and filter redundant no-mask+sink instances" Jan 2, 2026
Copilot AI (Contributor) left a comment

Pull request overview

Fixes an FMHA runtime dispatch hazard caused by uninitialized sink_size for no_mask, adds a compile-time guard against invalid sink+no-mask template combinations, and reduces redundant kernel instantiations in codegen.

Changes:

  • Initialize left/right/sink when decoding no_mask in mask_info::decode().
  • Add static_assert(FmhaMask::IsMasking || !kHasSink) to prevent invalid pipeline instantiations.
  • Filter out no_mask + sink=true combinations in FMHA fwd-related codegen scripts.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| include/ck_tile/ops/fmha/pipeline/block_fmha_pipeline_problem.hpp | Adds compile-time validation to prevent kHasSink=true when masking is disabled. |
| example/ck_tile/01_fmha/mask.hpp | Fixes uninitialized fields for no_mask decoding (prevents bogus runtime has_sink). |
| example/ck_tile/01_fmha/codegen/ops/fmha_pagedkv_prefill.py | Skips generating redundant/invalid no_mask + sink kernel variants. |
| example/ck_tile/01_fmha/codegen/ops/fmha_fwd_splitkv.py | Skips generating redundant/invalid no_mask + sink kernel variants. |
| example/ck_tile/01_fmha/codegen/ops/fmha_fwd.py | Adds compatibility filtering to avoid no_mask + sink kernels in fwd generation. |

static constexpr auto QScaleEnum = Traits::QScaleEnum;
static constexpr index_t kBlockPerCu = Traits::kBlockPerCu;
static constexpr bool kHasSink = Traits::kHasSink;
static_assert(FmhaMask::IsMasking || !kHasSink);
Copilot AI Jan 21, 2026

The new static_assert has no diagnostic message, while other static_asserts in this file provide one (e.g., lines 108–123). Adding a short message (e.g., that sink requires masking) would make template instantiation failures much easier to understand.

static constexpr bool kIsPagedKV = Traits::kIsPagedKV;
static constexpr index_t kBlockPerCu = Traits::kBlockPerCu;
static constexpr bool kHasSink = Traits::kHasSink;
static_assert(FmhaMask::IsMasking || !kHasSink);
Copilot AI Jan 21, 2026

The new static_assert has no diagnostic message, while other static_asserts in this file provide one (e.g., lines 108–123). Adding a short message (e.g., that sink requires masking) would make template instantiation failures much easier to understand.

static constexpr bool kMergeNumHeadGroupsSeqLenQ = Traits::kMergeNumHeadGroupsSeqLenQ;
static constexpr index_t kBlockPerCu = Traits::kBlockPerCu;
static constexpr bool kHasSink = Traits::kHasSink;
static_assert(FmhaMask::IsMasking || !kHasSink);
Copilot AI Jan 21, 2026

The new static_assert has no diagnostic message, while other static_asserts in this file provide one (e.g., lines 108–123). Adding a short message (e.g., that sink requires masking) would make template instantiation failures much easier to understand.

Comment on lines +829 to +834
# sink_size is only meaningful when mask is applied
if (
    kernel_ctx.pipeline.F_mask in no_mask_keys
    and kernel_ctx.pipeline.F_sink == "t"
):
    return False
Copilot AI Jan 21, 2026

This comment says sink_size is only meaningful when no masking is applied, but the condition directly below filters out the no-mask + sink=true combination. Please update the comment to match the logic (i.e., sink is only meaningful when masking is enabled).

or pipeline.F_logits == "f"
):
continue
# sink_size is only meaningful when mask is applied
Copilot AI Jan 21, 2026

This comment says sink_size is only meaningful when no masking is applied, but the condition directly below filters out the no-mask + sink=true combination. Please update the comment to match the logic (i.e., sink is only meaningful when masking is enabled).

Suggested change
- # sink_size is only meaningful when mask is applied
+ # sink_size is only meaningful when masking is enabled, so disallow sink when no mask is applied

or pipeline.F_logits == "f"
):
continue
# sink_size is only meaningful when mask is applied
Copilot AI Jan 21, 2026

This comment says sink_size is only meaningful when no masking is applied, but the condition directly below filters out the no-mask + sink=true combination. Please update the comment to match the logic (i.e., sink is only meaningful when masking is enabled).

Suggested change
- # sink_size is only meaningful when mask is applied
+ # sink_size is only meaningful when masking is enabled; disallow sink when no mask is used

LJ-underdog (Contributor) commented Jan 21, 2026

LGTM. @asleepzzz, please approve it.
