
Fix cuDNN convolution precision on Ampere+ GPUs #3127

Merged: davisking merged 1 commit into davisking:master from joelnn:fix-cudnn9-tf32-precision on Dec 28, 2025


@joelnn (Contributor) commented Dec 28, 2025

On Ampere and later GPUs (SM 8.0+), cuDNN's default math mode permits TF32 Tensor Core operations, which round FP32 inputs to a 10-bit mantissa (FP32 has 23 bits). This causes numerical differences between CUDA and CPU convolution results, most noticeably in cudnnConvolutionBackwardFilter().

Explicitly set CUDNN_FMA_MATH to force true FP32 computation for consistent numerical results across all GPU architectures.
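To make the precision gap concrete, here is a small CPU-only C++ sketch that emulates TF32 by rounding FP32 inputs to 10 mantissa bits before an FP32-accumulated dot product (the core operation of a convolution). It is an illustration, not dlib or cuDNN code: the helper names `to_tf32`, `dot`, `dot_reference` and `relative_error` are hypothetical, and the round-to-nearest-even conversion is an assumed approximation of what Tensor Cores do with their inputs.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Round an FP32 value to TF32 precision: keep the sign, the 8 exponent bits
// and the top 10 of the 23 mantissa bits. The 13 dropped bits are rounded to
// nearest, ties to even (an assumed emulation, not cuDNN's actual code).
float to_tf32(float x) {
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits);
    if ((bits & 0x7f800000u) == 0x7f800000u) {
        return x;  // leave infinities and NaNs untouched
    }
    const std::uint32_t lowest_kept = (bits >> 13) & 1u;
    bits += 0x0fffu + lowest_kept;  // carries into the kept bits when rounding up
    bits &= 0xffffe000u;            // clear the 13 dropped mantissa bits
    std::memcpy(&x, &bits, sizeof x);
    return x;
}

// Dot product accumulated in FP32. With emulate_tf32 set, each input is first
// rounded to TF32; the product of two TF32 values fits exactly in FP32, so the
// extra error comes from the input rounding, approximating TF32 Tensor Core math.
float dot(const std::vector<float>& a, const std::vector<float>& b, bool emulate_tf32) {
    assert(a.size() == b.size());
    float acc = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const float x = emulate_tf32 ? to_tf32(a[i]) : a[i];
        const float y = emulate_tf32 ? to_tf32(b[i]) : b[i];
        acc += x * y;
    }
    return acc;
}

// Double-precision reference used to measure the error of either FP32 path.
double dot_reference(const std::vector<float>& a, const std::vector<float>& b) {
    double acc = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        acc += static_cast<double>(a[i]) * static_cast<double>(b[i]);
    }
    return acc;
}

// Relative error of an FP32 result against a double-precision reference.
double relative_error(float approx, double reference) {
    return std::fabs(static_cast<double>(approx) - reference) / std::fabs(reference);
}
```

For example, filling two 4096-element vectors with values drawn uniformly from [-1, 1] and comparing `relative_error(dot(a, b, true), dot_reference(a, b))` against the FP32 path will usually show a noticeably larger error for the TF32 emulation, which is the same kind of drift a CUDA vs CPU comparison such as `test_conv()` can detect.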

@davisking (Owner) commented:

Sweet, thanks for another PR :D

@davisking merged commit 07c1e73 into davisking:master on Dec 28, 2025
10 of 11 checks passed
@Cydral (Contributor) commented Feb 2, 2026

@joelnn, would it be worth using CUDNN_TENSOR_OP_MATH_ALLOW_CONVERSION instead of CUDNN_FMA_MATH to keep Tensor Core performance on Ampere+ GPUs? This would only require relaxing one or two test tolerances in test_conv() from 1e-3 to 2e-3, which seems acceptable given the significant performance benefit.

@joelnn (Contributor, Author) commented Feb 2, 2026

@Cydral that's alright with me; I mainly wanted the tests to pass. I would have thought that reducing precision should be opt-in, but following cuDNN's default policy would also be reasonable.



3 participants