# [CVPR 2025] Spatial-Temporal Graph Diffusion Policy with Kinematic Modeling for Bimanual Robotic Manipulation

Repository-aligned implementation notes and usage guide for the paper.
Qi Lv, Hao Li, Xiang Deng, Rui Shao, Yinchuan Li, Jianye Hao, Longxiang Gao, Michael Yu Wang, Liqiang Nie
- Updates
- Introduction
- Highlights
- Method
- Project Structure
- Installation
- Dataset / Benchmark
- Usage
- Results
- Citation
- Acknowledgement
- License
## Updates

- [03/2025] KStar Diffuser was released on arXiv and accepted by CVPR 2025
## Introduction

Bimanual robotic manipulation is harder than single-arm control because the policy must model synchronization, collision-aware structure, and kinematic feasibility across two arms at the same time. The paper proposes Kinematics enhanced Spatial-TemporAl gRaph Diffuser (KStar Diffuser), which improves diffusion-based imitation learning with:
- A dynamic spatial-temporal robot graph that encodes dual-arm structure and motion history
- A differentiable kinematics regularizer that aligns predicted end-effector poses with feasible joint-space behavior
- A diffusion policy conditioned on language, multi-view RGB-D observations, and robot state
For the planned open-source release, we expose the method as two plugin-style components:
- A graph encoding plugin for dual-arm structural and temporal reasoning
- A kinematic regularizer plugin for turning joint predictions into kinematically meaningful end-effector constraints
They can be attached to different downstream policy backbones rather than being tied to one fixed end-to-end architecture.
## Highlights

- Dynamic spatial-temporal graph over both Panda arms and multiple history steps
- Differentiable forward kinematics regularization with `pytorch_kinematics` and Panda URDFs
- Multiple graph encoders are supported: `GCN`, `GAT`, `MPNN`, `GraphSAGE`, and `EGNN`
- Plugin-style design that can be attached to different policy heads
- The public release focuses on the reusable graph and regularizer plugins
## Method

The paper proposes KStar Diffuser as a diffusion policy for bimanual manipulation that explicitly reasons about robot structure and kinematics. In this repository, the public-facing part can be understood as two reusable modules:
- Graph Encoding in `src/utils/data_utils.py` and `src/models/gnn_models.py`
- Kinematic Regularizer in `src/utils/model_utils.py`
The original full research code plugs these two modules into a downstream policy, but the open-source core is centered on the plugins themselves.
### Graph Encoding

The graph branch is the most distinctive part of this repository. It is enabled by `enable_graph=True` and is built from dual-arm joint coordinates over multiple history steps.
Implemented in `src/utils/data_utils.py::build_node_features`.
For each joint at each history step, the node feature concatenates:
- Joint 3D coordinate
- Arm identity one-hot label: right arm `[1, 0]`, left arm `[0, 1]`
- Distances from the current joint to all other joints at the same timestep
If there are 14 joints in total, the default node feature dimension is:
3 + 2 + 14 = 19
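As a sanity check, this layout can be reproduced with a small self-contained sketch. `sketch_node_features` below is a hypothetical re-derivation of the described feature layout, not the repository's `build_node_features`; it assumes the right arm occupies the first half of the joint indices.

```python
import numpy as np

def sketch_node_features(joint_xyz):
    """Hypothetical node-feature layout: [coord(3) | arm one-hot(2) | dists(num_joints)].

    joint_xyz: array of shape [history, num_joints, 3].
    Returns an array of shape [history * num_joints, 3 + 2 + num_joints].
    """
    history, num_joints, _ = joint_xyz.shape
    feats = []
    for t in range(history):
        for j in range(num_joints):
            coord = joint_xyz[t, j]                                  # joint 3D coordinate
            one_hot = np.array([1.0, 0.0]) if j < num_joints // 2 \
                else np.array([0.0, 1.0])                            # assumed arm split
            dists = np.linalg.norm(joint_xyz[t] - coord, axis=-1)    # distances to all joints
            feats.append(np.concatenate([coord, one_hot, dists]))
    return np.stack(feats)

x = sketch_node_features(np.random.randn(3, 14, 3))
# With 14 joints, each node feature has 3 + 2 + 14 = 19 dimensions.
```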
This matches the default config value `gnn_input_dim: 19`.

Edge construction is implemented in `src/utils/data_utils.py::build_edge_features`.
The graph contains two edge types:
- Spatial edges inside each timestep
  - right arm chain: `(0,1) (1,2) ... (5,6)`
  - left arm chain: `(7,8) (8,9) ... (12,13)`
- Temporal edges across history
  - the same joint index is connected between adjacent timesteps
The edge attribute is a scalar distance:
- spatial edge: Euclidean distance between connected joints in the same timestep
- temporal edge: Euclidean displacement of the same joint between neighboring timesteps
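The edge layout described above can be sketched with plain index arithmetic. This is an illustrative reconstruction, not the repository's `build_edge_features`; it assumes node `t * num_joints + j` is joint `j` at timestep `t`.

```python
import numpy as np

history, num_joints = 3, 14

# Spatial edges: serial chains inside each timestep,
# mirroring the (0,1)...(5,6) and (7,8)...(12,13) chains above.
spatial = []
for t in range(history):
    base = t * num_joints
    chain = [(i, i + 1) for i in range(6)] + [(i, i + 1) for i in range(7, 13)]
    spatial += [(base + a, base + b) for a, b in chain]

# Temporal edges: the same joint index linked between adjacent timesteps.
temporal = [
    (t * num_joints + j, (t + 1) * num_joints + j)
    for t in range(history - 1)
    for j in range(num_joints)
]

edge_index = np.array(spatial + temporal).T  # shape [2, num_edges]
# 12 spatial edges x 3 steps + 14 temporal edges x 2 gaps = 64 edges
```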
Implemented in `src/models/gnn_models.py`.
Supported encoder keys: `gcn`, `gat`, `mpnn`, `sage`, `egnn`.
The graph encoder output can be pooled and attached to any downstream policy, planner, or decoder that needs robot-structure-aware features.
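For instance, a mean pool over all nodes yields one robot-structure feature vector. The dimensions below follow the defaults described in this README; in a batched `torch_geometric` setting, `torch_geometric.nn.global_mean_pool` plays the same role.

```python
import torch

# Toy per-node embeddings from a graph encoder:
# 3 history steps x 14 joints = 42 nodes, 128-dim each (assumed defaults).
node_embeddings = torch.randn(42, 128)

# Mean-pool across nodes into a single conditioning vector.
graph_feature = node_embeddings.mean(dim=0)  # shape [128]
```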
### Kinematic Regularizer

The second exposed module is the kinematic regularizer. Its role is to convert future joint predictions into end-effector-space constraints through differentiable forward kinematics.
Implemented across:
- `src/utils/model_utils.py`
- a Panda-compatible URDF file used by `pytorch_kinematics`
In the current implementation, the regularizer works as follows:
- Predict future joint positions for the right and left arms
- Run differentiable forward kinematics with `pytorch_kinematics`
- Convert transformation matrices to RLBench-style pose representation
- Use the resulting poses as an additional conditioning signal
- Optionally optimize an auxiliary joint prediction loss together with the downstream task loss
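Step 3 (matrix-to-pose conversion) can be illustrated with a hedged stand-in. `matrix_to_pos_quat` below is an assumption-laden sketch; the repository's `matrix_to_rlb_pose` / `proc_quaternion` may use a different quaternion convention or normalization.

```python
import torch

def matrix_to_pos_quat(T):
    """Sketch: homogeneous transform [..., 4, 4] -> pose (x, y, z, qx, qy, qz, qw)."""
    pos = T[..., :3, 3]
    R = T[..., :3, :3]
    # Trace-based quaternion extraction (well-conditioned when trace > -1).
    w = torch.sqrt(torch.clamp(1.0 + R[..., 0, 0] + R[..., 1, 1] + R[..., 2, 2],
                               min=1e-8)) / 2.0
    x = (R[..., 2, 1] - R[..., 1, 2]) / (4.0 * w)
    y = (R[..., 0, 2] - R[..., 2, 0]) / (4.0 * w)
    z = (R[..., 1, 0] - R[..., 0, 1]) / (4.0 * w)
    return torch.cat([pos, torch.stack([x, y, z, w], dim=-1)], dim=-1)

# Identity transform maps to the origin with the identity quaternion (0, 0, 0, 1).
pose = matrix_to_pos_quat(torch.eye(4).unsqueeze(0))
```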
In the original full integration, the resulting end-effector pose is projected back into a conditioning feature and combined with graph-enhanced observation features.
One example objective is:

```python
loss = loss_lambda * ee_loss + (1 - loss_lambda) * joint_loss
```
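A hedged sketch of how these two terms could be combined in training follows; all shapes and the value of `loss_lambda` are illustrative assumptions, not the repository's settings.

```python
import torch
import torch.nn.functional as F

loss_lambda = 0.8  # hypothetical weighting, not the released config value

# Toy tensors following the [batch, horizon, 14] joint convention; the
# end-effector target is an assumed 7-dim pose (position + quaternion).
pred_joints = torch.randn(8, 16, 14, requires_grad=True)
gt_joints = torch.randn(8, 16, 14)
pred_ee = torch.randn(8, 16, 7)  # in the real pipeline this comes from FK on pred_joints
gt_ee = torch.randn(8, 16, 7)

ee_loss = F.mse_loss(pred_ee, gt_ee)             # end-effector-space constraint
joint_loss = F.mse_loss(pred_joints, gt_joints)  # auxiliary joint prediction loss
loss = loss_lambda * ee_loss + (1 - loss_lambda) * joint_loss
loss.backward()  # here only the joint term propagates gradients to pred_joints
```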
From a plugin perspective, this module does not require a specific decoder. It only assumes that your downstream network can produce future joint predictions that can be fed into the kinematics chain.
## Project Structure

```
.
├── src/
│   ├── utils/data_utils.py    # Graph node / edge construction
│   ├── models/gnn_models.py   # GCN, GAT, MPNN, GraphSAGE, EGNN graph encoders
│   ├── models/egnn.py         # EGNN layer used by the graph encoder
│   ├── utils/model_utils.py   # Pose conversion and kinematics helpers
│   └── utils/CONSTANT.py      # Task constants
├── requirements.txt
├── LICENSE
├── stard.png
└── README.md
```
## Installation

The dependency comments in `requirements.txt` suggest a Python 3.8 setup.

```shell
conda create -n krgb python=3.8 -y
conda activate krgb
```

The two public modules mainly depend on `torch`, `torch_geometric`, `pytorch_kinematics`, and `numpy`.

To stay consistent with the repository environment, you can install the full requirements:

```shell
pip install -r requirements.txt
```

The rest of the repository contains an end-to-end research implementation, but that is not the main public API of the open-source release.
## Dataset / Benchmark

The paper evaluates KStar Diffuser on RLBench2 simulated bimanual tasks.
| Setting | Tasks |
|---|---|
| Simulated RLBench2 | `push_box`, `lift_ball`, `handover_item_easy`, `pick_laptop`, `sweep_to_dustpan` |
The paper reports both 20-demo and 100-demo training settings in simulation.
## Usage

The graph plugin expects a joint-coordinate tensor with shape `[history, num_joints, 3]`.
In the current implementation:
- `history` is typically `3`
- `num_joints` is `14` for two 7-DoF Panda arms
- each joint is represented by its 3D coordinate
The regularizer plugin expects predicted future joint positions that can be fed into a Panda forward-kinematics chain. In the example implementation, that prediction has shape `[batch, horizon, 14]`.
### Graph Encoding Example

```python
import numpy as np
from torch_geometric.data import Data

from src.utils.data_utils import build_node_features, build_edge_features
from src.models.gnn_models import GCNGraph

history = 3
num_joints = 14
joint_coordinations = np.random.randn(history, num_joints, 3)

node_features = build_node_features(joint_coordinations, history)
edge_index, edge_attr = build_edge_features(joint_coordinations, history)

graph_data = Data(
    x=node_features.view(history * num_joints, -1),
    edge_index=edge_index,
    edge_attr=edge_attr.unsqueeze(-1),
)

gnn = GCNGraph(
    input_dim=19,
    hidden_dim=128,
    output_dim=128,
    num_layers=4,
    activation="silu",
    norm="layer",
)
graph_output = gnn(graph_data.x, graph_data.edge_index)
```

If you want edge-aware message passing, replace `GCNGraph` with `GATGraph`, `MPNNGraph`, or `EqGraph` and pass `edge_attr`.
### Kinematic Regularizer Example

The minimal example below shows forward kinematics for one arm. In the full bimanual setting, the same procedure is applied to the right and left arms separately.

```python
import torch
import pytorch_kinematics as pk

from src.utils.model_utils import matrix_to_rlb_pose, proc_quaternion

chain = pk.build_serial_chain_from_urdf(
    open("path/to/panda.urdf").read(),
    "Pandatip",
)

joint_prediction = torch.randn(8, 7)
fk_output = chain.forward_kinematics(joint_prediction)
ee_pose = proc_quaternion(matrix_to_rlb_pose(fk_output.get_matrix()))
```

This regularizer can be attached to any downstream model that predicts future joint positions. In the original full implementation, the resulting end-effector pose is projected back into a conditioning feature and combined with the graph-enhanced observation representation.
The intended usage is:
- Use the graph encoding plugin to inject structural and temporal robot bias
- Use the kinematic regularizer plugin to constrain future predictions in a physically meaningful way
- Attach one or both plugins to your own policy backbone, action head, or diffusion decoder
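A minimal sketch of attaching both plugins via simple feature concatenation follows; all names and dimensions are illustrative assumptions, not the repository's actual API.

```python
import torch

# Hypothetical plugin outputs and observation embedding for a batch of 8.
obs_feature = torch.randn(8, 512)    # e.g. pooled multi-view RGB-D features
graph_feature = torch.randn(8, 128)  # pooled graph-encoder output
ee_feature = torch.randn(8, 64)      # projected end-effector pose conditioning

# Concatenate into one conditioning vector for a policy head or diffusion decoder.
condition = torch.cat([obs_feature, graph_feature, ee_feature], dim=-1)
```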
## Results

The following numbers come from the paper, not from a fresh re-run inside this repository.
| Training demos | Push Box | Lift Ball | Handover Item (easy) | Pick Laptop | Sweep Dustpan | Overall |
|---|---|---|---|---|---|---|
| 20 demos | 79.3 | 87.0 | 23.7 | 17.0 | 83.0 | 58.0 |
| 100 demos | 83.0 | 98.7 | 27.0 | 43.7 | 89.0 | 68.2 |
According to the paper, KStar Diffuser outperforms prior transformer-based and diffusion-based baselines by more than 10 percentage points in overall success rate, and the ablation study confirms that both the Spatial-Temporal Graph and the Kinematics Regularizer are important.
## Citation

If you use this repository or the KStar Diffuser paper in your research, please cite:
```bibtex
@InProceedings{Lv_2025_CVPR,
    author    = {Lv, Qi and Li, Hao and Deng, Xiang and Shao, Rui and Li, Yinchuan and Hao, Jianye and Gao, Longxiang and Wang, Michael Yu and Nie, Liqiang},
    title     = {Spatial-Temporal Graph Diffusion Policy with Kinematic Modeling for Bimanual Robotic Manipulation},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2025},
    pages     = {17394-17404}
}
```

## Acknowledgement

- The method description and reported results are based on the KStar Diffuser paper
- The simulator stack builds on RLBench and PyRep
- The implementation uses PyTorch, Hugging Face Transformers, Diffusers, and PyTorch Geometric
## License

This repository is released under the Apache License 2.0. See the top-level LICENSE file for the full text.
External third-party dependencies keep their own original licenses.
