
Add cuda_buffer_backend and torch_buffer_backend for rosidl::Buffer#1

Open
yuanknv wants to merge 16 commits into ros2:main from yuanknv:native_buffer_backends

Conversation


@yuanknv yuanknv commented Apr 7, 2026

Description

This pull request adds CUDA and PyTorch buffer backend implementations for rosidl::Buffer, enabling zero-copy GPU memory sharing between ROS 2 publishers and subscribers.

CUDA buffer backend: Enables fully asynchronous, zero-copy GPU data transport; data can stay on the GPU across ROS nodes. allocate_msg allocates from a CUDA Virtual Memory Management (VMM) based IPC memory pool; each block carries a pre-exported POSIX FD for zero-overhead IPC reuse. from_buffer returns a WriteHandle/ReadHandle that manages GPU stream ordering via CUDA events (no cudaStreamSynchronize in the pipeline). On transmit, the plugin checks locality via a shared-memory endpoint registry: for same-host, same-GPU peers, it sends the block's FD over a Unix socket along with an IPC event handle for cross-process GPU sync; otherwise it falls back to CPU serialization. On receive, the block is imported and mapped (cached per source block), with a shared-memory refcount and UID validation to prevent stale reuse. A background recycler thread handles event synchronization and block reclamation off the callback thread.
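
The same-host FD handoff can be illustrated with a small Python sketch (illustrative only; the actual plugin does this in C++ against the CUDA driver API, and the helper names below are hypothetical). A block's pre-exported POSIX FD is passed to the peer process as SCM_RIGHTS ancillary data over a Unix socket:

```python
import os
import socket
import tempfile

def send_block_fd(sock: socket.socket, fd: int) -> None:
    # Pass a POSIX FD as SCM_RIGHTS ancillary data (Unix only, Python 3.9+).
    socket.send_fds(sock, [b"blk"], [fd])

def recv_block_fd(sock: socket.socket) -> int:
    # The kernel installs a duplicated descriptor in the receiving process
    # that refers to the same underlying memory object.
    _msg, fds, _flags, _addr = socket.recv_fds(sock, 16, 1)
    return fds[0]

# Demo: a socketpair stands in for the sender/receiver endpoints, and a
# temp file stands in for the exported VMM allocation.
sender, receiver = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
with tempfile.TemporaryFile() as block:
    block.write(b"gpu-block")
    block.flush()
    send_block_fd(sender, block.fileno())
    peer_fd = recv_block_fd(receiver)
    os.lseek(peer_fd, 0, os.SEEK_SET)
    received = os.read(peer_fd, 16)
    os.close(peer_fd)
sender.close()
receiver.close()
```

Because the kernel duplicates the descriptor into the receiver, the importing process can map the same underlying allocation without the payload ever being copied.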

Torch Buffer Backend: A device-agnostic layer on top of the device buffer backends (e.g. cuda_buffer_backend) that lets users work with torch::Tensor directly. allocate_msg creates a TorchBufferImpl wrapping a buffer with tensor metadata (shape, strides, dtype); the device is auto-detected at compile time, falling back to CPU if no accelerated buffer backend is installed. from_buffer returns a torch::Tensor view backed by the device buffer's handle (write or read, captured in the tensor deleter for event lifetime safety). to_buffer copies a pre-existing torch::Tensor into the allocated buffer. On transmit, the TorchBufferDescriptor carries the tensor metadata alongside a nested device_data field that the RMW serializes via whichever device backend plugin is registered.
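
To make the descriptor idea concrete, here is a hypothetical Python sketch of the metadata a TorchBufferDescriptor carries (field names follow the PR description; the wire layout and packing code are illustrative, not the actual ROS 2 message definition):

```python
import struct
from dataclasses import dataclass

@dataclass
class TorchBufferDescriptor:
    # Fields per the PR description: tensor metadata plus a nested blob
    # that the registered device backend serializes/deserializes.
    dtype: int          # e.g. an enum code such as 6 for float32
    shape: tuple
    strides: tuple
    device_data: bytes  # opaque device-backend descriptor

    def pack(self) -> bytes:
        ndim = len(self.shape)
        out = struct.pack("<BB", self.dtype, ndim)
        out += struct.pack(f"<{ndim}q", *self.shape)
        out += struct.pack(f"<{ndim}q", *self.strides)
        out += struct.pack("<I", len(self.device_data)) + self.device_data
        return out

    @classmethod
    def unpack(cls, data: bytes) -> "TorchBufferDescriptor":
        dtype, ndim = struct.unpack_from("<BB", data, 0)
        off = 2
        shape = struct.unpack_from(f"<{ndim}q", data, off); off += 8 * ndim
        strides = struct.unpack_from(f"<{ndim}q", data, off); off += 8 * ndim
        (blob_len,) = struct.unpack_from("<I", data, off); off += 4
        return cls(dtype, shape, strides, data[off:off + blob_len])

desc = TorchBufferDescriptor(dtype=6, shape=(480, 640, 3),
                             strides=(1920, 3, 1), device_data=b"\x01\x02")
roundtrip = TorchBufferDescriptor.unpack(desc.pack())
```

The key design point is that the torch layer only ever touches the metadata; the nested device_data blob stays opaque and is delegated to the device backend plugin.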

This pull request consists of the following key components:

cuda_buffer: Core CUDA buffer library providing a VMM-backed CUDA IPC memory pool, a host endpoint manager for locality discovery over shared memory, and user-facing allocate_msg/from_buffer/to_buffer APIs with RAII CUDA event based GPU synchronization (ReadHandle/WriteHandle).
cuda_buffer_backend: BufferBackend plugin registered via pluginlib. Handles endpoint discovery, CudaBufferDescriptor serialization with VMM IPC handles, IPC refcount lifecycle, and automatic CPU fallback when CUDA IPC is unavailable.
cuda_buffer_backend_msgs: ROS 2 message definition for CudaBufferDescriptor.
torch_buffer: PyTorch buffer library wrapping device buffers with tensor metadata (shape, strides, dtype). Provides allocate_msg/from_buffer/to_buffer APIs that auto-detect device backend at compile time.
torch_buffer_backend: BufferBackend plugin for PyTorch tensors. Handles TorchBufferDescriptor serialization with nested device buffer delegation.
torch_buffer_backend_msgs: ROS 2 message definition for TorchBufferDescriptor.

Is this user-facing behavior change?

No.

Did you use Generative AI?

Yes. Claude (claude-4.6-opus) via Cursor was used to assist with creating an initial prototype version of the changes contained in this PR.

Additional Information

This PR is part of the broader ROS 2 native buffer feature introduced in this post.

Comment thread cuda_buffer_backend/cuda_buffer/CMakeLists.txt Outdated
Comment thread torch_buffer_backend/torch_buffer_backend_msgs/CMakeLists.txt Outdated
Comment thread torch_buffer_backend/torch_buffer_backend_msgs/package.xml Outdated
Comment thread cuda_buffer_backend/cuda_buffer/package.xml Outdated
Comment thread cuda_buffer_backend/cuda_buffer/package.xml
Comment thread torch_buffer_backend/torch_buffer/include/torch_buffer/torch_buffer_api.hpp Outdated
{
(void)endpoint_info;
(void)existing_endpoints;
(void)endpoint_supported_backends;

check if backend exists in endpoint_supported_backends?

yuanknv (Author) replied:

Thanks for the catch. on_discovering_endpoint now checks whether torch is present in endpoint_supported_backends and returns false if not, which lets the RMW fall back to the default CPU serialization path for that endpoint.
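
In sketch form (Python for brevity; the actual callback is C++ and also receives endpoint_info and existing_endpoints, as in the diff above), the check amounts to:

```python
def on_discovering_endpoint(endpoint_supported_backends):
    # Simplified sketch: accept the endpoint for the torch backend only
    # if the peer also advertises "torch"; returning False makes the RMW
    # fall back to the default CPU serialization path for this endpoint.
    return "torch" in endpoint_supported_backends
```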

Comment thread torch_buffer_backend/torch_buffer/include/torch_buffer/torch_buffer_impl.hpp Outdated
find_package(cuda_buffer_backend_msgs REQUIRED)
find_package(rmw REQUIRED)
find_package(rcutils REQUIRED)
find_package(CUDAToolkit REQUIRED)

We need to include this dependency in the package.xml; the requirement is probably nvcc?
Is this key enough: https://github.com/ros/rosdistro/blob/master/rosdep/base.yaml#L8367C1-L8367C12 ?

yuanknv (Author) replied:

Thanks for the pointer -- added nvidia-cuda to the dependency list, which should provide everything find_package(CUDAToolkit) needs.
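
For reference, the rosdep key lands in the package manifest roughly like this (a sketch; exact placement in cuda_buffer's package.xml may differ):

```xml
<!-- resolves to the system CUDA toolkit via the rosdep base.yaml key -->
<depend>nvidia-cuda</depend>
```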

Comment thread cuda_buffer_backend/cuda_buffer/package.xml Outdated
set(CMAKE_CUDA_HOST_COMPILER "${CMAKE_CXX_COMPILER}")
endif()

find_package(Torch REQUIRED)

Not sure about this one. Is there an Ubuntu package for this?

wget https://download.pytorch.org/libtorch/nightly/cpu/libtorch-shared-with-deps-latest.zip
unzip libtorch-shared-with-deps-latest.zip

then I used -DCMAKE_PREFIX_PATH=<path to pytorch>


I finally used version 11.8, but nvcc is 12.0 in Ubuntu:

https://download.pytorch.org/libtorch/cu118/libtorch-cxx11-abi-shared-with-deps-2.7.1%2Bcu118.zip

Maybe we should consider whether it's worth adding a vendor package to install libtorch.

@mjcarroll

yuanknv (Author) replied:

Agreed, that would be very helpful. A libtorch_vendor package would simplify the setup and remove the CMAKE_PREFIX_PATH hack.
Using the cu118 build with a CUDA 12.x runtime should be fine; I think a newer driver is always backward-compatible with older CUDA applications.


@ahcorde ahcorde left a comment


I also detected some linter failures

and this is not passing

test_cuda_image_cpu_fallback_fastrtps_launch with this error:

6: FAIL: test_cpu_fallback_paths (cuda_buffer_backend.TestCudaImageCpuFallbackFastRTPS.test_cpu_fallback_paths)
6: Test all CPU fallback paths and normal IPC simultaneously over FastRTPS.
6: ----------------------------------------------------------------------
6: Traceback (most recent call last):
6:   File "/tmp/ws/src/rosidl_buffer_backends/cuda_buffer_backend/cuda_buffer_backend/test/test_cuda_image_cpu_fallback_fastrtps_launch.py", line 203, in test_cpu_fallback_paths
6:     self.assertTrue(
6: AssertionError: False is not true : Cross-device fallback validation failed (expected backend="cpu")

yuanknv and others added 4 commits April 10, 2026 11:02
Co-authored-by: Alejandro Hernández Cordero <ahcorde@gmail.com>
Signed-off-by: yuanknv <113960800+yuanknv@users.noreply.github.com>
Co-authored-by: Alejandro Hernández Cordero <ahcorde@gmail.com>
Signed-off-by: yuanknv <113960800+yuanknv@users.noreply.github.com>
…uffer_api.hpp

Co-authored-by: Alejandro Hernández Cordero <ahcorde@gmail.com>
Signed-off-by: yuanknv <113960800+yuanknv@users.noreply.github.com>
Author

yuanknv commented Apr 10, 2026

I also detected some linter failures

and this is not passing

test_cuda_image_cpu_fallback_fastrtps_launch with this error:

6: FAIL: test_cpu_fallback_paths (cuda_buffer_backend.TestCudaImageCpuFallbackFastRTPS.test_cpu_fallback_paths)
6: Test all CPU fallback paths and normal IPC simultaneously over FastRTPS.
6: ----------------------------------------------------------------------
6: Traceback (most recent call last):
6:   File "/tmp/ws/src/rosidl_buffer_backends/cuda_buffer_backend/cuda_buffer_backend/test/test_cuda_image_cpu_fallback_fastrtps_launch.py", line 203, in test_cpu_fallback_paths
6:     self.assertTrue(
6: AssertionError: False is not true : Cross-device fallback validation failed (expected backend="cpu")

Thanks for the review and feedback! I've fixed the linter errors in torch_buffer (which were missed in the local run). However, I can't reproduce the test_cuda_image_cpu_fallback_fastrtps_launch failure on my end. Would you mind sharing your test setup?


@ahcorde ahcorde left a comment


I'm getting some errors when compiling the code

  • I had to pin the g++ and gcc versions for torch_buffer:
set(CMAKE_C_COMPILER gcc-12)
set(CMAKE_CXX_COMPILER g++-12)
  • I'm getting this link error:
/usr/bin/ld: /home/ahcorde/buffer_backends/install/opt/libtorch_vendor/lib/libtorch_cuda.so: undefined reference to `cudaGraphAddDependencies_v2@libcudart.so.12'
/usr/bin/ld: /home/ahcorde/buffer_backends/install/opt/libtorch_vendor/lib/libtorch_cuda.so: undefined reference to `cudaStreamGetCaptureInfo_v3@libcudart.so.12'
/usr/bin/ld: /home/ahcorde/buffer_backends/install/opt/libtorch_vendor/lib/libtorch_cuda.so: undefined reference to `cudaStreamUpdateCaptureDependencies_v2@libcudart.so.12'

Author

yuanknv commented Apr 17, 2026

I'm getting some errors when compiling the code

  • I had to pin the g++ and gcc versions for torch_buffer:
set(CMAKE_C_COMPILER gcc-12)
set(CMAKE_CXX_COMPILER g++-12)
  • I'm getting this link error:
/usr/bin/ld: /home/ahcorde/buffer_backends/install/opt/libtorch_vendor/lib/libtorch_cuda.so: undefined reference to `cudaGraphAddDependencies_v2@libcudart.so.12'
/usr/bin/ld: /home/ahcorde/buffer_backends/install/opt/libtorch_vendor/lib/libtorch_cuda.so: undefined reference to `cudaStreamGetCaptureInfo_v3@libcudart.so.12'
/usr/bin/ld: /home/ahcorde/buffer_backends/install/opt/libtorch_vendor/lib/libtorch_cuda.so: undefined reference to `cudaStreamUpdateCaptureDependencies_v2@libcudart.so.12'

Thanks for testing! Both issues are fixed.

For the undefined references: libtorch_vendor always downloaded 2.11.0+cu126 regardless of the host CUDA version, so older toolkits got mismatched symbols. It now auto-detects the CUDA toolkit version and picks the matching LibTorch variant, using the latest version published for it (overridable via -DLIBTORCH_CUDA_VERSION= / -DLIBTORCH_VERSION=).
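
The auto-detection might look roughly like this (a hypothetical CMake sketch; only the documented -DLIBTORCH_CUDA_VERSION / -DLIBTORCH_VERSION overrides come from the PR, the variable wiring is illustrative):

```cmake
# Hypothetical sketch of the LibTorch variant selection in libtorch_vendor.
find_package(CUDAToolkit QUIET)
if(NOT DEFINED LIBTORCH_CUDA_VERSION)
  if(CUDAToolkit_FOUND)
    # e.g. CUDA 12.6 -> "cu126", matching a published LibTorch build tag.
    set(LIBTORCH_CUDA_VERSION
        "cu${CUDAToolkit_VERSION_MAJOR}${CUDAToolkit_VERSION_MINOR}")
  else()
    # No toolkit found: fall back to the CPU-only build.
    set(LIBTORCH_CUDA_VERSION "cpu")
  endif()
endif()
message(STATUS "Selected LibTorch variant: ${LIBTORCH_CUDA_VERSION}")
```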

For the g++-12 pin: that was the CUDA toolkit's host_config.h rejecting a newer host GCC (CUDA 11.8 allows up to GCC 11; CUDA 12.0–12.3 up to GCC 12). Since torch_buffer compiles no CUDA device code, we now define __NV_NO_HOST_COMPILER_CHECK=1 to bypass the check, so no compiler pin is needed anymore.
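
For reference, the bypass can be applied from CMake along these lines (a sketch; exact placement in torch_buffer's CMakeLists.txt may differ):

```cmake
# torch_buffer consumes CUDA headers but builds no device code, so the
# GCC version gate in the toolkit's host_config.h can be bypassed.
add_compile_definitions(__NV_NO_HOST_COMPILER_CHECK=1)
```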
