
[ROCm] Include ROCM support for CUDA extensions#4180

Open
amd-sriram wants to merge 23 commits into pytorch:main from ROCm:main

Conversation


@amd-sriram amd-sriram commented Feb 23, 2026

Motivation

Port the following CUDA extensions to ROCm:

  • RNNTLoss
  • lfilter (iir)
  • forced align
  • CU CTC
  • rir (removed)

Technical Details

Changes to tools/setup_helpers/extension.py

CUDA source files are now also added when the _USE_ROCM flag is set, e.g.:

    if _USE_CUDA or _USE_ROCM:
        sources.append("iir_cuda.cu")
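As a minimal sketch of this gating pattern (the flag values and file names below are illustrative, not the exact contents of extension.py):

```python
# Minimal sketch of the source-gating pattern in extension.py.
# The flag values and file names here are illustrative only.
_USE_CUDA = False  # normally derived from the build environment
_USE_ROCM = True   # e.g. set when torch is built with HIP support

sources = ["lfilter.cpp", "rnnt.cpp"]

# Under ROCm the .cu files are hipified and compiled by the HIP
# toolchain, so they are appended for either GPU backend.
if _USE_CUDA or _USE_ROCM:
    sources.append("iir_cuda.cu")
```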

Fixing compilation issues

The following errors were encountered and fixed:

1. TORCH_HIP_VERSION is not defined

/skishore/github/audio/src/libtorchaudio/utils_hip.cpp:20:10: error: ‘TORCH_HIP_VERSION’ was not declared in this scope; did you mean ‘TORCH_ABI_VERSION’? 

TORCH_HIP_VERSION is now defined in tools/setup_helpers/extension.py, similar to https://github.com/ROCm/pytorch/blob/develop/cmake/public/LoadHIP.cmake#L166:

    math(EXPR TORCH_HIP_VERSION "(${HIP_VERSION_MAJOR} * 100) + ${HIP_VERSION_MINOR}")
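The same arithmetic, sketched in Python (the helper name is hypothetical; in practice the computed value would be passed to the compiler as a -D define):

```python
# Hypothetical helper mirroring LoadHIP.cmake's arithmetic:
# TORCH_HIP_VERSION = (HIP_VERSION_MAJOR * 100) + HIP_VERSION_MINOR
def torch_hip_version(hip_version: str) -> int:
    major, minor = (int(p) for p in hip_version.split(".")[:2])
    return major * 100 + minor

# For HIP 6.2.x this yields 602; the value is then handed to the
# compiler, e.g. as an extra compile argument:
extra_compile_args = [f"-DTORCH_HIP_VERSION={torch_hip_version('6.2.41134')}"]
```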

2. Incorrect kernel launch parameters

/skishore/github/audio/src/libtorchaudio/iir_hip.hip:75:8: error: too few arguments provided to function-like macro invocation 

         75 |        hipLaunchKernelGGL(( (iir_cu_kernel<scalar_t>), dim3(blocks), dim3(threads), 0, 0, 

The parameters in the THO_DISPATCH_V2 invocation were corrected, following https://github.com/ROCm/pytorch/blob/develop/test/cpp_extensions/libtorch_agn_2_9_extension/csrc/kernel.cpp#L361. Wrapping the lambda in AT_WRAP keeps the commas inside its body from being treated as argument separators by the function-like macro.

  THO_DISPATCH_V2(m.scalar_type(), "mv_tensor_accessor_cpu",
                  AT_WRAP(([&]() {
                    auto resa = Accessor_cpu<scalar_t, 1>(reinterpret_cast<scalar_t*>(res.data_ptr()), res.sizes().data(), res.strides().data());
                    auto ma = Accessor_cpu<scalar_t, 2>(reinterpret_cast<scalar_t*>(m.data_ptr()), m.sizes().data(), m.strides().data());
                    auto va = Accessor_cpu<scalar_t, 1>(reinterpret_cast<scalar_t*>(v.data_ptr()), v.sizes().data(), v.strides().data());
                    mv_tensor_accessor_kernel<Accessor_cpu, scalar_t>(resa, ma, va);
                  })),
                  AT_FLOATING_TYPES);

Test Plan

Run this branch on both an Nvidia machine and an AMD machine, check that it installs, and run the unit tests for the CUDA extensions:

python -m pip install . --no-build-isolation

pytest test/torchaudio_unittest/functional/functional_cuda_test.py -k test_rnnt  
pytest test/torchaudio_unittest/functional/torchscript_consistency_cuda_test.py -k test_rnnt 
pytest test/torchaudio_unittest/functional/autograd_cuda_test.py -k test_rnnt
pytest test/torchaudio_unittest/transforms/autograd_cuda_test.py -k test_rnnt
pytest test/torchaudio_unittest/transforms/torchscript_consistency_cuda_test.py -k test_rnnt 

pytest test/torchaudio_unittest/functional/functional_cuda_test.py -k test_lfilter
pytest test/torchaudio_unittest/functional/autograd_cuda_test.py -k test_lfilter
pytest test/torchaudio_unittest/functional/batch_consistency_test.py -k test_lfilter
pytest test/torchaudio_unittest/functional/torchscript_consistency_cuda_test.py -k test_lfilter

pytest test/torchaudio_unittest/functional/functional_cuda_test.py -k test_forced_align

pytest test/torchaudio_unittest/models/decoder/cuda_ctc_decoder_test.py

Test Result

Number of passed unit tests:

    Extension      Unit tests passing for each test run
    RNNT loss      18, 1, 3, 3, 1
    lfilter        19, 6, 2, 1
    forced_align   120
    cu ctc         3

Attached log for torch 2.11
torch211_log.txt

This branch does not support torch 2.10.

@amd-sriram amd-sriram requested a review from a team as a code owner February 23, 2026 19:45

pytorch-bot Bot commented Feb 23, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/audio/4180

Note: Links to docs will display an error until the docs builds have been completed.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla Bot added the CLA Signed label Feb 23, 2026
@amd-sriram amd-sriram marked this pull request as draft February 25, 2026 14:04
@amd-sriram amd-sriram marked this pull request as ready for review March 6, 2026 21:35
amd-sriram (Author) commented

@NicolasHug Could you please review this PR? Thanks.

