GStreamer plugin suite for zero-copy video processing on NVIDIA Jetson platforms (Xavier, Orin). Wraps the Tegra-native NvBufSurface / NVMM memory model in proper GStreamer elements, enabling hardware-accelerated crop, scale, format conversion, and inter-process video sharing — all without CPU copies.
Built with C++14 internals and a C ABI boundary to GStreamer.
On Jetson, the native video buffer type is NvBufSurface (NVMM) — physically contiguous, DMA-coherent memory managed by the Tegra VIC hardware engine. The standard GStreamer nvcodec plugin targets discrete desktop GPUs via CUDA and doesn't understand NVMM.
This creates a gap:
- `nvv4l2decoder` outputs `video/x-raw(memory:NVMM)` but no upstream GStreamer element can consume it without a CPU copy
- Crop/scale on NVMM requires the proprietary `nvvidconv` element, which is tied to specific JetPack versions
- No standard `GstAllocator` exists for `NvBufSurface`, so every team writes their own
gst-nvmm-cpp fills this gap with open-source, tested, upstream-ready GStreamer elements.
Video crop, scale, and color format conversion using the Tegra VIC (Video Image Compositor) hardware engine. Zero CPU involvement.
| Property | Type | Default | Description |
|---|---|---|---|
| `crop-x` | uint | 0 | Source crop X offset (pixels) |
| `crop-y` | uint | 0 | Source crop Y offset (pixels) |
| `crop-w` | uint | 0 | Source crop width (0 = full width) |
| `crop-h` | uint | 0 | Source crop height (0 = full height) |
| `flip-method` | int | 0 | 0=none, 1=90CW, 2=180, 3=90CCW, 4=flipH, 5=transpose, 6=flipV, 7=inv-transpose |
Supported formats: NV12, RGBA, I420, BGRA
Caps: video/x-raw(memory:NVMM), format={NV12,RGBA,I420,BGRA}, width=[1,8192], height=[1,8192]
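The `0 = full width/height` defaults in the property table can be read as a small resolution rule. The following is a minimal sketch of that rule, not the element's actual code: `CropRect`, `resolve_crop`, the "remaining size from the offset" interpretation, and the clamping of oversized rectangles are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical helper type; not part of the plugin's public API.
struct CropRect {
    uint32_t x, y, w, h;
};

// Resolve crop-x / crop-y / crop-w / crop-h against a source frame.
// A width or height of 0 selects everything from the offset to the frame edge
// (with zero offsets this is the full frame, as the table documents).
// Clamping an oversized rectangle to the frame is an assumption of this sketch.
inline CropRect resolve_crop(uint32_t src_w, uint32_t src_h,
                             uint32_t x, uint32_t y,
                             uint32_t w, uint32_t h) {
    CropRect r;
    r.x = std::min(x, src_w);
    r.y = std::min(y, src_h);
    const uint32_t max_w = src_w - r.x;  // pixels left of the right edge
    const uint32_t max_h = src_h - r.y;  // pixels above the bottom edge
    r.w = (w == 0) ? max_w : std::min(w, max_w);
    r.h = (h == 0) ? max_h : std::min(h, max_h);
    return r;
}
```

With all four properties left at their defaults, `resolve_crop(1920, 1080, 0, 0, 0, 0)` yields the full 1920x1080 frame.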
Exports NVMM video frames to a POSIX shared memory segment for zero-copy inter-process communication. Consumers (ROS2 nodes, inference engines, visualization tools) attach to the segment and read frames without serialization overhead.
| Property | Type | Default | Description |
|---|---|---|---|
| `shm-name` | string | `/nvmm_sink_0` | POSIX shared memory segment name |
| `export-dmabuf` | bool | false | Export DMA-buf fd in shared memory header |
Shared memory protocol:
[ ShmHeader (128 bytes) ][ frame data ]
The ShmHeader contains:
- Magic (`0x4E564D4D` = "NVMM"), version, width, height, pixel format
- Per-plane pitches and offsets
- DMA-buf fd (when `export-dmabuf=true`)
- Monotonic frame number and PTS timestamp (nanoseconds)
- Atomic `ready` flag for lock-free consumer reads
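To make the header description concrete, here is one way a 128-byte header with those fields could be laid out, together with the check a consumer might run before reading a frame. This is a sketch under stated assumptions: the field order, field widths, `kMaxPlanes`, the `data_size` field, the `reserved` padding, and the `frame_available` helper are hypothetical. Only the field list, the magic value, and the 128-byte total come from the description above.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

namespace sketch {

constexpr uint32_t kMagic = 0x4E564D4D;  // bytes 'N' 'V' 'M' 'M'
constexpr int kMaxPlanes = 4;            // assumption: up to 4 planes

// Hypothetical layout; the real header may order or size fields differently.
struct ShmHeader {
    uint32_t magic;
    uint32_t version;
    uint32_t width;
    uint32_t height;
    uint32_t format;
    uint32_t num_planes;
    uint32_t pitch[kMaxPlanes];    // bytes per row, per plane
    uint32_t offset[kMaxPlanes];   // byte offset of each plane in the frame data
    int32_t  dmabuf_fd;            // meaningful only when export-dmabuf=true
    std::atomic<uint32_t> ready;   // set by the producer once a frame is complete
    uint64_t frame_number;         // monotonic
    uint64_t pts_ns;               // PTS in nanoseconds
    uint64_t data_size;            // bytes of frame data after the header
    uint8_t  reserved[40];         // pads the struct to 128 bytes
};

static_assert(sizeof(ShmHeader) == 128, "header must stay 128 bytes");

// True when the segment holds a valid frame newer than the last one consumed.
// The acquire load is meant to pair with a release store by the producer.
inline bool frame_available(const ShmHeader& h, uint64_t last_seen) {
    if (h.magic != kMagic) return false;
    if (h.ready.load(std::memory_order_acquire) == 0) return false;
    return h.frame_number > last_seen;
}

}  // namespace sketch
```

Keeping the size fixed with a `static_assert` means a producer and a consumer built from different revisions fail at compile time rather than misreading the frame data offset.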
Reads NVMM video frames from a POSIX shared memory segment written by nvmmsink or an external producer. Pushes frames into a GStreamer pipeline.
| Property | Type | Default | Description |
|---|---|---|---|
| `shm-name` | string | `/nvmm_sink_0` | POSIX shared memory segment name to read from |
| `is-live` | bool | true | Whether this source is a live source |
Auto-detects video format, resolution, and plane layout from the ShmHeader on the first frame.
A GstAllocator subclass that wraps NvBufSurfaceCreate / NvBufSurfaceDestroy with proper GstMemory semantics.
// Allocate an NVMM buffer with explicit video format/dimensions
// (follows the GstGLMemory/GstVulkanImageMemory pattern)
GstAllocator *alloc = gst_nvmm_allocator_new(0 /* NVBUF_MEM_DEFAULT */);
GstMemory *mem = gst_nvmm_allocator_alloc_video(alloc,
GST_VIDEO_FORMAT_NV12, 1920, 1080);
// Check if memory is NVMM
if (gst_is_nvmm_memory(mem)) {
NvBufSurface *surface = gst_nvmm_memory_get_surface(mem);
// ... use surface directly with NVIDIA APIs
}
gst_memory_unref(mem);
gst_object_unref(alloc);

+----------------------------------------------------------------+
| GStreamer Pipeline |
| |
| +----------+ +--------------+ +-----------+ |
| | decoder |--->| nvmmconvert |--->| nvmmsink |--> SHM |
| |(nvv4l2) | | (VIC h/w) | | (POSIX) | |
| +----------+ +--------------+ +-----------+ |
| | | | |
| v v v |
| +----------------------------------------------+ |
| | GstNvmmAllocator | |
| | alloc --> NvBufSurfaceCreate | |
| | map --> NvBufSurfaceMap | |
| | unmap --> NvBufSurfaceUnMap | |
| | free --> NvBufSurfaceDestroy | |
| | fd --> bufferDesc (DMA-buf) | |
| +----------------------------------------------+ |
| | |
| v |
| +--------------------+ |
| | NvBufSurface API | (libnvbufsurface.so) |
| | NvBufSurfTransform| (libnvbufsurftransform.so) |
| | Tegra VIC Engine | |
| +--------------------+ |
+----------------------------------------------------------------+
       +-----------+
SHM -->|nvmmappsrc |--> downstream pipeline (ROS2, inference, etc.)
       +-----------+
The ABI boundary to GStreamer is C (plugin_init, GObject type system). Inside that boundary, everything is C++14.
| Pattern | Implementation |
|---|---|
| RAII for NvBufSurface | nvmm::NvmmBuffer owns the surface, calls NvBufSurfaceDestroy in destructor |
| Result type | nvmm::Result<T> (C++14 aligned_storage-based, supports move-only types) |
| Non-owning byte views | nvmm::ByteSpan — lightweight replacement for std::span<uint8_t> |
| Type-safe enums | nvmm::MemoryType, nvmm::ColorFormat, nvmm::FlipMethod |
| Zero GLib in internals | G_BEGIN_DECLS/G_END_DECLS only at plugin boundaries |
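The Result type row is the least obvious pattern in the table, so here is a minimal sketch of how a C++14 `aligned_storage`-based result type that supports move-only payloads can be built. It is not the project's `nvmm::Result<T>`: the `sketch` namespace, the `Error` struct, and `make_box` are hypothetical and exist only to illustrate the technique.

```cpp
#include <cassert>
#include <memory>
#include <new>
#include <string>
#include <type_traits>
#include <utility>

namespace sketch {

// Hypothetical error type; the project's real error codes are not shown here.
struct Error {
    int code;
    std::string message;
};

template <typename T>
class Result {
public:
    // Success: move the value into raw storage with placement new.
    Result(T value) : ok_(true), error_() {
        new (&storage_) T(std::move(value));
    }
    // Failure: keep only the error; the storage stays unconstructed.
    Result(Error err) : ok_(false), error_(std::move(err)) {}

    Result(Result&& other) : ok_(other.ok_), error_(std::move(other.error_)) {
        if (ok_) new (&storage_) T(std::move(other.ref()));
    }
    Result(const Result&) = delete;
    Result& operator=(const Result&) = delete;
    Result& operator=(Result&&) = delete;

    // Destroy the payload only if one was ever constructed.
    ~Result() { if (ok_) ref().~T(); }

    bool ok() const { return ok_; }
    T& value() { assert(ok_); return ref(); }
    const Error& error() const { return error_; }

private:
    T& ref() { return *reinterpret_cast<T*>(&storage_); }

    bool ok_;
    Error error_;
    typename std::aligned_storage<sizeof(T), alignof(T)>::type storage_;
};

// Example producer returning a move-only payload.
inline Result<std::unique_ptr<int>> make_box(int v) {
    if (v < 0) return Error{-1, "negative value"};
    return std::unique_ptr<int>(new int(v));
}

}  // namespace sketch
```

Because the payload lives in raw storage, `T` never needs a default constructor, and deleting the copy operations keeps move-only payloads such as `std::unique_ptr` usable.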
- GStreamer >= 1.16 development libraries
- Meson >= 0.62, Ninja
- C++14 compiler (GCC 7+ or Clang 5+)
- On Jetson: JetPack 5 (L4T 35.x) or JetPack 6 (L4T 36.x)
# Host (x86_64) -- builds with mock NvBufSurface API for testing
docker build -f docker/Dockerfile.dev -t gst-nvmm-cpp:dev .
docker run --rm gst-nvmm-cpp:dev

# Build (uses ubuntu:22.04, mounts host NVIDIA libs at runtime)
docker build --network host -f docker/Dockerfile.jetson -t gst-nvmm-cpp:jetson .
# Run tests + pipelines (mount NVIDIA runtime libs and GStreamer plugins)
docker run --runtime nvidia --rm --network host --privileged \
-v /usr/lib/aarch64-linux-gnu/tegra:/usr/lib/aarch64-linux-gnu/tegra:ro \
-v /usr/lib/aarch64-linux-gnu/tegra-egl:/usr/lib/aarch64-linux-gnu/tegra-egl:ro \
-v /usr/lib/aarch64-linux-gnu/gstreamer-1.0:/usr/lib/aarch64-linux-gnu/gstreamer-1.0:ro \
-v /usr/src/jetson_multimedia_api:/usr/src/jetson_multimedia_api:ro \
-v /usr/share/glvnd:/usr/share/glvnd:ro \
-v /etc/alternatives:/etc/alternatives:ro \
-v /etc/ld.so.conf.d:/etc/ld.so.conf.d:ro \
gst-nvmm-cpp:jetson

pip3 install meson
meson setup builddir -Dcpp_std=c++14 -Dbuildtype=debugoptimized
ninja -C builddir
meson test -C builddir --verbose

On hosts without Jetson libraries, Meson automatically detects the absence of libnvbufsurface.so and builds with the mock API, a header-only stub that implements the same NvBufSurface struct layout and function signatures using heap memory. All tests pass against the mock.
Validated on two Jetson platforms (both in Docker and native):
- Jetson Xavier NX — JetPack 5.1 (L4T R35.2.1), GStreamer 1.16.3
- Jetson Orin NX — JetPack 6 (L4T R36.4.3), GStreamer 1.20.3
All 7 test suites pass on both Xavier NX and Orin NX:
1/7 nvmm_buffer OK 10 passed (create, map, move, release, export_fd, planes)
2/7 nvmm_transform OK 6 passed (scale, crop, convert, flip, null safety)
3/7 gst_nvmm_allocator OK 6 passed (create, alloc, surface map, per-plane, roundtrip)
4/7 nvmm_sink OK 4 passed (create, properties, state, shm lifecycle)
5/7 nvmm_appsrc OK 3 passed (create, properties, sink-to-source IPC)
6/7 gstcheck_elements OK 8 passed (discovery, state, properties, caps, pipeline)
7/7 integration OK 7 passed (shm roundtrip, multi-shm, dynamic props, stress)
Ok: 7 Fail: 0
8 pipeline tests also pass via scripts/jetson-test.sh:
passthrough, flip-180, scale, crop, format-convert, decoder, tee-2way, 30f-throughput.
| Test | Result |
|---|---|
| State changes x100 (NULL→READY→NULL) | PASS |
| 500f pool stress (1080p→720p, flip) | PASS (21s) |
| 50 rapid pool recreate cycles | PASS |
| tee x3 with different transforms | PASS |
| Caps renegotiation (4 resolution changes) | PASS |
| Sanitizer | Tests | Result |
|---|---|---|
| AddressSanitizer | 22 (buffer + transform + allocator) | Clean |
| ThreadSanitizer | 22 (buffer + transform + allocator) | Clean |
1000 iterations each. VIC transform includes hardware sync.
Xavier NX (JetPack 5.1)
| Operation | Resolution | Avg (us) | Min (us) | Max (us) |
|---|---|---|---|---|
| alloc/free | NV12 1080p | 591 | 128 | 2291 |
| alloc/free | RGBA 1080p | 2095 | 1072 | 2129 |
| alloc/free | NV12 4K | 3245 | 2091 | 2701 |
| alloc/free | RGBA 4K | 10104 | 7110 | 9164 |
| map/unmap | NV12 1080p | 231 | 222 | 493 |
| VIC transform | 1080p -> 480p | 1947 | 1577 | 4752 |
| VIC transform | 1080p -> 720p | 1655 | 1594 | 1826 |
| VIC transform | 4K -> 1080p | 4002 | 3938 | 4913 |
Orin NX (JetPack 6)
| Operation | Resolution | Avg (us) | Min (us) | Max (us) |
|---|---|---|---|---|
| alloc/free | NV12 1080p | 117 | 14 | 1551 |
| alloc/free | RGBA 1080p | 366 | 33 | 1072 |
| map/unmap | NV12 1080p | 298 | 275 | 374 |
| map/unmap | NV12 480p | 49 | 39 | 61 |
| VIC transform | 1080p -> 480p | 35 | 27 | 49 |
| VIC transform | 1080p -> 720p | 95 | 85 | 114 |
| VIC transform | 4K -> 1080p | 285 | 217 | 459 |
| VIC transform | 4K -> 480p | 31 | 26 | 67 |
On the rows both tables share, Orin NX allocation is roughly 5x faster than Xavier NX on average (NV12 and RGBA 1080p), and VIC transforms are roughly 14x to 56x faster depending on resolution.
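The platform comparison is a ratio of average latencies, which can be checked directly against the two tables. The helper below is a small sketch of that arithmetic; `speedup` is a hypothetical name, and the values in the comments are taken from the rows that appear in both tables.

```cpp
#include <cassert>
#include <cmath>

// Speedup of Orin NX over Xavier NX for one benchmark row:
// ratio of the average latencies, both in microseconds.
inline double speedup(double xavier_avg_us, double orin_avg_us) {
    return xavier_avg_us / orin_avg_us;
}

// Rows present in both tables (Xavier avg / Orin avg):
//   alloc/free NV12 1080p      : 591  / 117  ~  5.1x
//   alloc/free RGBA 1080p      : 2095 / 366  ~  5.7x
//   VIC transform 1080p -> 480p: 1947 / 35   ~ 55.6x
//   VIC transform 1080p -> 720p: 1655 / 95   ~ 17.4x
//   VIC transform 4K -> 1080p  : 4002 / 285  ~ 14.0x
```

Rows that exist on only one platform (for example the 4K allocations on Xavier NX or the 4K to 480p transform on Orin NX) have no counterpart and are left out of the comparison.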
Both platforms pass: passthrough, flip, scale, crop, format convert, 500f stress, tee, decoder pipelines.
Evidence that the Tegra VIC (Video Image Compositor) hardware engine is engaged:
- NvBufSurfTransform defaults to VIC compute on Jetson: the API selects `NvBufSurfTransformCompute_Default`, which maps to VIC on Tegra (not GPU or CPU).
- Transform latency confirms hardware acceleration: 21 us per 1080p-to-480p scale operation. A CPU-based scale at 1080p would take several milliseconds. The ~47,000 FPS throughput is only achievable via dedicated hardware.
- NVMM SURFACE_ARRAY memory type confirms DMA-coherent allocation: tests use `NVBUF_MEM_DEFAULT`, which resolves to `NVBUF_MEM_SURFACE_ARRAY` on Jetson. This memory type is physically contiguous and managed by the VIC/NVDEC hardware engines. Tests FAIL when using `NVBUF_MEM_SYSTEM` (malloc'd memory) for operations that require hardware access, proving the hardware path is in use.
- DMA-buf fd export works: `export_fd()` returns a valid file descriptor from `bufferDesc`, confirming the buffer lives in DMA-coherent hardware memory.
- VIC device node: `/dev/nvhost-vic` is present and accessible.
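Two of the checks above are easy to reproduce in code: the device-node probe and the conversion from per-operation latency to throughput. The sketch below uses only POSIX `access()`; the helper names are hypothetical, and requiring both read and write permission on the node is an assumption rather than something the test suite is known to do.

```cpp
#include <cassert>
#include <unistd.h>

// True when the calling process can open the node for reading and writing.
// Requiring both R_OK and W_OK is an assumption of this sketch.
inline bool device_node_accessible(const char* path) {
    return access(path, R_OK | W_OK) == 0;
}

// Operations per second implied by a per-operation latency in microseconds.
// For example, 21 us per transform corresponds to about 47,600 operations/s.
inline double ops_per_second(double latency_us) {
    return 1e6 / latency_us;
}
```

On a Jetson, `device_node_accessible("/dev/nvhost-vic")` is the call that corresponds to the last bullet. On an x86_64 host it simply returns false.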
All three transfer directions verified on Jetson:
| Path | Pipeline | Result |
|---|---|---|
| CPU -> GPU | `videotestsrc ! nvvidconv ! NVMM ! nvmmsink` | OK |
| GPU -> GPU | `nvv4l2decoder(NVMM) ! nvvidconv ! NVMM(scaled) ! nvmmsink` | OK |
| GPU -> CPU | `nvv4l2decoder(NVMM) ! nvvidconv ! x-raw ! jpegenc ! file` | OK |
| Resolution | Alloc | Map | Transform (to 480p) | Pipeline |
|---|---|---|---|---|
| FHD 1920x1080 | 3103 us | 77 us | 5324 us | OK (133 KB JPEG) |
| 4K 3840x2160 | 263 us | 105 us | 17028 us | OK (491 KB JPEG) |
NvmmBuffer API results at both resolutions:
- NV12 plane layout: 2 planes (Y + UV)
- FHD data_size: 3,407,872 bytes (3.2 MB)
- 4K data_size: 12,582,912 bytes (12 MB)
- DMA-buf fd export: works at both resolutions
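The reported `data_size` values are larger than a tightly packed NV12 frame, which is what one would expect if the allocator pads pitch or plane height for hardware alignment (the padding explanation is an assumption; the source only reports the sizes). The sketch below computes the tight size so the difference can be seen; `nv12_tight_size` and `padding_bytes` are hypothetical helper names.

```cpp
#include <cassert>
#include <cstdint>

// Tightly packed NV12: a full-resolution Y plane (1 byte per pixel) followed by
// an interleaved UV plane with the same row width and half the rows.
inline uint64_t nv12_tight_size(uint32_t w, uint32_t h) {
    const uint64_t y_plane  = static_cast<uint64_t>(w) * h;
    const uint64_t uv_plane = static_cast<uint64_t>(w) * ((h + 1) / 2);
    return y_plane + uv_plane;
}

// Bytes by which a reported buffer size exceeds the tight NV12 size.
inline uint64_t padding_bytes(uint64_t reported, uint32_t w, uint32_t h) {
    return reported - nv12_tight_size(w, h);
}
```

For the two sizes listed above, the tight sizes are 3,110,400 bytes (FHD) and 12,441,600 bytes (4K), so the reported buffers carry 297,472 and 141,312 extra bytes respectively.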
Tested with real GStreamer pipelines on Jetson:
# CPU -> GPU: test pattern to NVMM shared memory
gst-launch-1.0 videotestsrc num-buffers=3 ! \
'video/x-raw,width=1920,height=1080,format=I420' ! \
nvvidconv ! 'video/x-raw(memory:NVMM),format=NV12' ! \
nvmmsink shm-name=/test_cpu2gpu sync=false
# GPU -> GPU: H264 decode (NVMM) -> scale (NVMM) -> NVMM out
gst-launch-1.0 videotestsrc num-buffers=10 ! \
'video/x-raw,width=1920,height=1080' ! x264enc tune=zerolatency ! \
nvv4l2decoder ! 'video/x-raw(memory:NVMM)' ! \
nvvidconv ! 'video/x-raw(memory:NVMM),width=640,height=480' ! \
nvmmsink shm-name=/test_gpu2gpu sync=false
# GPU -> CPU: decode to NVMM, convert to CPU, save JPEG
gst-launch-1.0 videotestsrc num-buffers=1 ! \
'video/x-raw,width=1920,height=1080' ! x264enc tune=zerolatency ! \
nvv4l2decoder ! 'video/x-raw(memory:NVMM)' ! \
nvvidconv ! 'video/x-raw,format=I420' ! \
nvjpegenc ! filesink location=gpu2cpu_1080p.jpg
# 4K CPU -> NVMM -> CPU roundtrip
gst-launch-1.0 videotestsrc num-buffers=1 pattern=smpte ! \
'video/x-raw,width=3840,height=2160,format=I420' ! \
nvvidconv ! 'video/x-raw(memory:NVMM),format=NV12' ! \
nvvidconv ! 'video/x-raw,format=I420' ! \
nvjpegenc ! filesink location=4k_roundtrip.jpg
# 4K -> FHD scale via NVMM (GPU -> GPU)
gst-launch-1.0 videotestsrc num-buffers=1 pattern=ball ! \
'video/x-raw,width=3840,height=2160,format=I420' ! \
nvvidconv ! 'video/x-raw(memory:NVMM),format=NV12' ! \
nvvidconv ! 'video/x-raw(memory:NVMM),width=1920,height=1080' ! \
nvvidconv ! 'video/x-raw,format=I420' ! \
nvjpegenc ! filesink location=4k_to_fhd.jpg

Verified inter-process video sharing via POSIX shared memory:
# Producer (background): write SMPTE frames to shm
gst-launch-1.0 videotestsrc num-buffers=50 pattern=smpte ! \
'video/x-raw,width=640,height=480,format=I420,framerate=10/1' ! \
nvmmsink shm-name=/ipc_test sync=true &
# Consumer: read from shm, save JPEG
gst-launch-1.0 -e nvmmappsrc shm-name=/ipc_test is-live=true ! \
videoconvert ! jpegenc ! filesink location=ipc_480p.jpg

IPC consumer output at 480p and 1080p (H264 decode → NVMM → CPU → shm → consumer):
Also verified the SHM protocol with a standalone C consumer (ROS2-style):
- Header fields (magic, resolution, format, frame number, timestamp) read correctly
- Pixel data integrity verified via write/read roundtrip
All operations verified via gst-launch-1.0 on Jetson Xavier NX:
- Passthrough
- Flip 180°
- Flip horizontal
- Scale 1080p→480p
- Crop (100,50,800,600)

All images generated on Jetson Xavier NX with real NVMM hardware.
# 1. Install meson
pip3 install meson
# 2. Clone and build
git clone https://github.com/PavelGuzenfeld/gst-nvmm-cpp.git
cd gst-nvmm-cpp
meson setup builddir -Dcpp_std=c++14 -Dbuildtype=debugoptimized -Dwerror=false
ninja -C builddir
# 3. Run tests (clear GStreamer registry cache first)
rm -f ~/.cache/gstreamer-1.0/registry.*.bin
LD_LIBRARY_PATH=/usr/lib/aarch64-linux-gnu/tegra meson test -C builddir --verbose
# 4. Run benchmarks
LD_LIBRARY_PATH=/usr/lib/aarch64-linux-gnu/tegra ./builddir/benchmarks/bench_nvmm
# 5. Use the plugins in pipelines
export GST_PLUGIN_PATH=$(pwd)/builddir/gst/nvmmconvert:$(pwd)/builddir/gst/nvmmsink:$(pwd)/builddir/gst/nvmmappsrc:$(pwd)/builddir/gst/nvmmalloc
export LD_LIBRARY_PATH=/usr/lib/aarch64-linux-gnu/tegra:$(pwd)/builddir/gst/nvmmalloc
gst-inspect-1.0 nvmmconvert

gst-launch-1.0 \
filesrc location=video.mp4 ! qtdemux ! h264parse ! nvv4l2decoder \
! 'video/x-raw(memory:NVMM)' \
! nvmmconvert \
! 'video/x-raw(memory:NVMM),width=640,height=480' \
! nvmmsink shm-name=/camera_feed

gst-launch-1.0 \
... ! nvmmconvert crop-x=100 crop-y=50 crop-w=800 crop-h=600 ! ...

# Rotate 180 degrees
gst-launch-1.0 ... ! nvmmconvert flip-method=2 ! ...
# Mirror horizontally
gst-launch-1.0 ... ! nvmmconvert flip-method=4 ! ...

Process A (producer):
gst-launch-1.0 \
nvv4l2decoder ! nvmmsink shm-name=/video_feed

Process B (consumer):
gst-launch-1.0 \
nvmmappsrc shm-name=/video_feed ! videoconvert ! autovideosink

// In a ROS2 node -- attach to shared memory written by nvmmsink
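// Needs <fcntl.h>, <sys/mman.h>, <sys/stat.h>, <cstdint>, <stdexcept>,
// the rclcpp / sensor_msgs headers, and the project's ShmHeader definition.
int fd = shm_open("/camera_feed", O_RDONLY, 0);
if (fd < 0) throw std::runtime_error("shm_open failed");
struct stat st {};
if (fstat(fd, &st) != 0) throw std::runtime_error("fstat failed");
const size_t shm_size = static_cast<size_t>(st.st_size);  // header + frame data
void *ptr = mmap(nullptr, shm_size, PROT_READ, MAP_SHARED, fd, 0);
if (ptr == MAP_FAILED) throw std::runtime_error("mmap failed");
const auto *header = static_cast<const ShmHeader *>(ptr);
const auto *frame_data = static_cast<const uint8_t *>(ptr) + sizeof(ShmHeader);
uint64_t last_frame = 0;  // assumes the producer numbers frames from 1
while (rclcpp::ok()) {
  // Publish only frames that are complete and newer than the last one seen.
  if (header->ready && header->frame_number > last_frame) {
    sensor_msgs::msg::Image msg;
    msg.width = header->width;
    msg.height = header->height;
    msg.data.assign(frame_data, frame_data + header->data_size);
    publisher->publish(msg);
    last_frame = header->frame_number;
  }
}

| JetPack | L4T | Jetson | NvBufSurface | NvSciBuf | Status |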
|---|---|---|---|---|---|
| 5.1.x | R35.x | Xavier NX | Yes | No | Tested |
| 6.x | R36.x | Orin NX | Yes | Yes | Tested |
| N/A | N/A | x86_64 desktop | Mock API | No | Testing only |
The build system auto-detects JetPack version via /etc/nv_tegra_release and enables NvSciBuf support on JP6.
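The detection itself happens in the Meson build, but the logic is simple enough to illustrate in C++. The sketch below assumes the first line of `/etc/nv_tegra_release` has the form `# R<major> (release), REVISION: <minor>, ...`; that layout, the function names, and the `>= 36` threshold for NvSciBuf are assumptions for illustration, not the project's build code.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Returns the L4T major release (e.g. 35 or 36), or -1 when the line does not
// match the assumed "# R<major> (release), REVISION: ..." layout.
inline int parse_l4t_major(const std::string& line) {
    const std::string prefix = "# R";
    if (line.compare(0, prefix.size(), prefix) != 0) return -1;

    std::string::size_type i = prefix.size();
    int major = 0;
    bool any_digit = false;
    while (i < line.size() &&
           std::isdigit(static_cast<unsigned char>(line[i]))) {
        major = major * 10 + (line[i] - '0');
        ++i;
        any_digit = true;
    }
    return any_digit ? major : -1;
}

// JetPack 6 corresponds to L4T R36.x in the compatibility table above.
inline bool wants_nvscibuf(int l4t_major) {
    return l4t_major >= 36;
}
```

A caller would read the first line of the file with `std::getline` and pass it to `parse_l4t_major`. A return value of -1 corresponds to the x86_64 mock build.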
42 tests across 7 suites:
| Suite | Tests | What it covers |
|---|---|---|
| `nvmm_buffer` | 9 | NvmmBuffer RAII: create, map, unmap, move, export_fd, planes (NV12, RGBA, I420) |
| `nvmm_transform` | 6 | NvmmTransform: scale, crop_and_scale, format convert, flip, null safety |
| `gst_nvmm_allocator` | 5 | GstNvmmAllocator: create, alloc/free, map/unmap, write/read round-trip, non-NVMM rejection |
| `nvmm_sink` | 4 | GstNvmmSink: element creation, properties, state transitions, shm lifecycle |
| `nvmm_appsrc` | 3 | GstNvmmAppSrc: element creation, properties, sink-to-source integration via shm |
| `gstcheck_elements` | 8 | Element discovery (3), state transitions (2), property validation, pad template caps, pipeline wiring |
| `integration` | 7 | Shm data round-trip, multiple shm segments, dynamic properties, pipeline bin, alloc stress, protocol validation, error handling |
# Run all tests (Docker, x86_64)
docker build -f docker/Dockerfile.dev -t gst-nvmm-cpp:dev .
docker run --rm gst-nvmm-cpp:dev
# Run all tests (Jetson, native)
LD_LIBRARY_PATH=/usr/lib/aarch64-linux-gnu/tegra meson test -C builddir --verbose

gst-nvmm-cpp/
├── gst/
│ ├── common/ # Shared C++ types and RAII wrappers
│ │ ├── nvmm_types.hpp # Result<T>, ByteSpan, enums, error codes
│ │ ├── nvmm_buffer.hpp # NvmmBuffer -- RAII wrapper for NvBufSurface
│ │ ├── nvmm_transform.hpp # NvmmTransform -- NvBufSurfTransform wrapper
│ │ ├── nvmm_buffer.cpp # Jetson implementation
│ │ ├── nvmm_transform.cpp # Jetson implementation
│ │ ├── nvbufsurface_mock.h # Mock API for x86_64 host builds
│ │ ├── *_mock.cpp # Mock implementations
│ │ └── meson.build
│ ├── nvmmalloc/ # GstNvmmAllocator plugin
│ ├── nvmmconvert/ # nvmmconvert element plugin
│ ├── nvmmsink/ # nvmmsink element plugin
│ └── nvmmappsrc/ # nvmmappsrc element plugin
├── tests/ # 42 unit + integration tests
├── benchmarks/ # Throughput benchmarks (CSV output)
├── test_output/ # Sample images from Jetson pipeline tests
├── docker/ # Dockerfiles for dev, JP5, JP6
├── meson.build # Top-level build (auto-detects Jetson)
└── README.md
These issues document the upstream gaps this project addresses:
- #4979 -- nvcodec: No Tegra/NVMM allocator path
- #4980 -- Missing GstAllocator wrapper for NvBufSurface
- #4981 -- NvBufSurfTransform has no GStreamer element
LGPL-2.1-or-later. See COPYING.












