
[release/2.10] Add torch.backends.cuda.math_sdp.fp32_precision#2941

Open
anatoliylitv wants to merge 1 commit into release/2.10 from
anatoliylitv/math_sdp_ieee_upstream_manual_merge_2.10

Conversation

@anatoliylitv

Overview
This PR adds a new float32 precision API,
torch.backends.cuda.math_sdp.fp32_precision, to configure the fp32 precision
behavior of SDPBackend.MATH.

Rationale
The test/test_transformers.py test suite calculates numerical tolerance by
comparing output tensors computed at the same precision ("reference") and at
higher precision ("golden"), both produced by SDPBackend.MATH. However, the
golden output is computed with TF32 rather than FP32, which is in fact less
accurate than the FA/ME backends would be if they used IEEE rather than TF32
for their accumulation.

The loss of precision causes false negatives in SDPA tests such as
TestSDPACudaOnlyCUDA.test_flash_attention_vs_math_ref_grads_batch_size_8_seq_len_q_143_seq_len_k_4_head_dim_203_is_causal_False_dropout_p_0_22_float16_scale_l1_enable_gqa_True_n_heads1_cuda_float16,
at least on the ROCm platform. The false negative disappears after forcing
higher_precision_dtype = torch.float64.
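To illustrate the precision gap, here is a small stdlib-only sketch (the `tf32_round` helper is hypothetical, not part of this PR) that emulates TF32 by keeping only the 10 explicit mantissa bits of a float32 value and compares the resulting relative error against full FP32:

```python
import struct

def tf32_round(x: float) -> float:
    """Emulate TF32 by truncating a float32 mantissa to 10 explicit bits."""
    # Pack to IEEE-754 binary32, then zero the low 13 mantissa bits
    # (23 - 10 = 13), which is how TF32 truncates the significand.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits &= ~((1 << 13) - 1)
    return struct.unpack("<f", struct.pack("<I", bits))[0]

x = 1.0 + 1e-4  # a value whose low mantissa bits matter
fp32 = struct.unpack("<f", struct.pack("<f", x))[0]  # round-trip via float32
tf32 = tf32_round(x)
print(f"fp32: {fp32:.10f}  rel. err: {abs(fp32 - x) / x:.2e}")
print(f"tf32: {tf32:.10f}  rel. err: {abs(tf32 - x) / x:.2e}")
```

With a 10-bit mantissa the value 1.0001 collapses to exactly 1.0 (a relative error of about 1e-4), while the float32 round-trip error stays near 1e-8, which is why TF32 golden tensors can be less accurate than the backends they are meant to check.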

Major Changes
To restore the precision of the golden output, a new API,
torch.backends.cuda.math_sdp.fp32_precision, is introduced, which allows
configuration of the "matmul" precision used by SDPBackend.MATH, and a new
decorator, @math_sdp_precision("ieee"), is added to all tests that use
check_out_and_grad. Finally, an assert is added to the innermost function
_check_equal as a sanity check to ensure math_sdp has the right precision
configured for torch.float32 golden tensors.
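The decorator's expected behavior can be sketched without torch. The following is a hypothetical stand-in (the `_MathSDPConfig` class and this `math_sdp_precision` implementation are illustrative, not the PR's actual code): set `fp32_precision` for the duration of the wrapped test, then restore the previous value even if the test raises.

```python
import functools

# Hypothetical stand-in for the backend object exposing fp32_precision;
# in the PR this lives at torch.backends.cuda.math_sdp.
class _MathSDPConfig:
    fp32_precision = "tf32"

math_sdp = _MathSDPConfig()

def math_sdp_precision(precision: str):
    """Sketch of the test decorator: temporarily override fp32_precision."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            prev = math_sdp.fp32_precision
            math_sdp.fp32_precision = precision
            try:
                return fn(*args, **kwargs)
            finally:
                # Restore the old setting even on test failure.
                math_sdp.fp32_precision = prev
        return wrapper
    return decorator

@math_sdp_precision("ieee")
def golden_reference():
    # Inside the decorated test, golden tensors see IEEE fp32.
    return math_sdp.fp32_precision

print(golden_reference())       # -> "ieee" inside the test
print(math_sdp.fp32_precision)  # -> "tf32" restored afterwards
```

The try/finally restore matters because a failing assertion inside one decorated test must not leak the "ieee" setting into subsequent tests.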

Known Issues
The backward pass honors the configuration in effect when backward() is called,
regardless of the configuration in effect when the graph was created.

This is a copy of PR pytorch#167157, whose owner is on vacation.

@rocm-repo-management-api

rocm-repo-management-api Bot commented Jan 26, 2026

Jenkins build for 9cc2bd01cb88d4c927b4aeaafceb3412cdd5dcd5 commit finished as FAILURE
Links: Pipeline Overview / Build artifacts / Test Results

@anatoliylitv anatoliylitv changed the title Add torch.backends.cuda.math_sdp.fp32_precision [release/2.10] Add torch.backends.cuda.math_sdp.fp32_precision Jan 26, 2026
@pruthvistony
Collaborator

The PR pytorch#169694 is still open.
