@@ -1090,6 +1090,19 @@ def _is_fa3_supported(num_heads, num_gqa_groups, head_dim_qk, head_dim_v, qkv_dt
logger.debug("Disabling FusedAttention for determinism reasons with post_scale_bias")
use_fused_attention = False
fused_attention_backend = None
if (
fused_attention_backend == FusedAttnBackend["F16_arbitrary_seqlen"]
and is_training
and cudnn_version >= (9, 7, 0)
and cudnn_version < (9, 18, 1)
and device_compute_capability >= (10, 0)
):
logger.debug(
"Disabling FusedAttention because determinism is not supported on Blackwell for "
"FP16/BF16 with 9.7 <= cuDNN < 9.18.1"
)
use_fused_attention = False
fused_attention_backend = None
Comment on lines +1093 to +1105
Contributor

P1 Missing is_training guard — may incorrectly disable FusedAttention for inference

Every other determinism filter in this same if use_fused_attention and deterministic: block guards against is_training (see lines 1070–1080 for FP8 and 1081–1092 for F16_arbitrary_seqlen), conveying that those non-determinism issues are backward-pass-specific. The new Blackwell / cuDNN-version filter does not include and is_training, so it will also disable FusedAttention during deterministic inference on Blackwell GPUs with 9.7 <= cuDNN < 9.18.1.

If the cuDNN bug only manifests during training (backward pass), the filter is overly broad and will unnecessarily fall back to a slower backend during inference. If it truly affects the forward pass as well, a comment explaining that would help reviewers and future maintainers understand the deviation from the existing pattern.

Consider either:

        if (
            fused_attention_backend == FusedAttnBackend["F16_arbitrary_seqlen"]
            and is_training
            and (cudnn_version >= (9, 7) and cudnn_version < (9, 18, 1))
            and device_compute_capability >= (10, 0)
        ):

or, if inference is also affected, add a comment explaining why is_training is deliberately omitted.
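
As an aside on the range check itself: these cuDNN gates rely on Python's lexicographic tuple comparison, so a 2-tuple lower bound like (9, 7) and a 3-tuple one like (9, 7, 0) behave slightly differently at the exact 9.7 boundary. A minimal sketch of the guard's logic (the helper name is illustrative, not from the PR):

```python
def needs_determinism_fallback(cudnn_version, device_compute_capability, is_training):
    """Illustrative restatement of the proposed guard: True when the
    deterministic F16 fused-attention path should fall back on Blackwell."""
    return (
        is_training
        and (9, 7) <= cudnn_version < (9, 18, 1)
        and device_compute_capability >= (10, 0)
    )

# Tuples compare lexicographically, and a shorter tuple that is a prefix of a
# longer one sorts first, so the choice of bound matters at exactly (9, 7):
assert (9, 7) <= (9, 7)          # a 2-tuple bound admits the bare 9.7 release
assert not (9, 7, 0) <= (9, 7)   # a 3-tuple bound (9, 7, 0) would exclude it
```

With a 3-tuple cudnn_version such as (9, 10, 0), both bound styles agree; the distinction only surfaces if the version ever arrives as a bare 2-tuple.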

Collaborator Author


updated, check again


# use_flash_attention may have been set above
use_flash_attention_2 = use_flash_attention and use_flash_attention_2