
KStar Diffuser Plugins for Bimanual Robotic Manipulation

[CVPR 2025] Repository-aligned implementation notes and usage guide for the paper "Spatial-Temporal Graph Diffusion Policy with Kinematic Modeling for Bimanual Robotic Manipulation".

Authors

Qi Lv, Hao Li, Xiang Deng, Rui Shao, Yinchuan Li, Jianye Hao, Longxiang Gao, Michael Yu Wang, Liqiang Nie

Updates

  • [03/2025] KStar Diffuser was released on arXiv and accepted to CVPR 2025.

Introduction

Bimanual robotic manipulation is harder than single-arm control because the policy must model synchronization, collision-aware structure, and kinematic feasibility across two arms at the same time. The paper proposes Kinematics enhanced Spatial-TemporAl gRaph Diffuser (KStar Diffuser), which improves diffusion-based imitation learning with:

  1. A dynamic spatial-temporal robot graph that encodes dual-arm structure and motion history
  2. A differentiable kinematics regularizer that aligns predicted end-effector poses with feasible joint-space behavior
  3. A diffusion policy conditioned on language, multi-view RGB-D observations, and robot state

For the planned open-source release, we expose the method as two plugin-style components:

  1. A graph encoding plugin for dual-arm structural and temporal reasoning
  2. A kinematic regularizer plugin for turning joint predictions into kinematically meaningful end-effector constraints

They can be attached to different downstream policy backbones rather than being tied to one fixed end-to-end architecture.


Highlights

  • Dynamic spatial-temporal graph over both Panda arms and multiple history steps
  • Differentiable forward kinematics regularization with pytorch_kinematics and Panda URDFs
  • Multiple graph encoders are supported: GCN, GAT, MPNN, GraphSAGE, and EGNN
  • Plugin-style design that can be attached to different policy heads
  • The public release focuses on the reusable graph and regularizer plugins

Method

The paper proposes KStar Diffuser as a diffusion policy for bimanual manipulation that explicitly reasons about robot structure and kinematics. In this repository, the public-facing part can be understood as two reusable modules:

  1. Graph Encoding in src/utils/data_utils.py and src/models/gnn_models.py
  2. Kinematic Regularizer in src/utils/model_utils.py

The original full research code plugs these two modules into a downstream policy, but the open-source core is centered on the plugins themselves.

KStar Diffuser plugin overview

Graph Encoding

The graph branch is the most distinctive part of this repository. It is enabled by enable_graph=True and is built from dual-arm joint coordinates over multiple history steps.

Node features

Implemented in src/utils/data_utils.py::build_node_features.

For each joint at each history step, the node feature concatenates:

  1. Joint 3D coordinate
  2. Arm identity one-hot label: right arm [1, 0], left arm [0, 1]
  3. Distances from the current joint to all other joints at the same timestep

If there are 14 joints in total, the default node feature dimension is:

3 + 2 + 14 = 19

This matches the default config:

gnn_input_dim: 19
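As a sanity check, the per-node layout described above can be sketched in NumPy. `sketch_node_features` and the arm-split convention (first 7 joints belong to the right arm) are illustrative assumptions for this sketch, not the repository API:

```python
import numpy as np

# Hedged sketch of the node-feature layout: 3D coordinate + arm one-hot
# + distances to all joints at the same timestep. Names and the
# right-arm-first ordering are assumptions, not the repository code.
def sketch_node_features(joints):  # joints: (num_joints, 3) for one timestep
    num_joints = joints.shape[0]
    feats = []
    for i in range(num_joints):
        # First half of the joints -> right arm [1, 0], rest -> left arm [0, 1]
        one_hot = [1.0, 0.0] if i < num_joints // 2 else [0.0, 1.0]
        # Euclidean distance from joint i to every joint (including itself)
        dists = np.linalg.norm(joints - joints[i], axis=-1)  # (num_joints,)
        feats.append(np.concatenate([joints[i], one_hot, dists]))
    return np.stack(feats)  # (num_joints, 3 + 2 + num_joints)

features = sketch_node_features(np.random.randn(14, 3))
print(features.shape)  # (14, 19)
```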

Edge construction

Implemented in src/utils/data_utils.py::build_edge_features.

The graph contains two edge types:

  1. Spatial edges inside each timestep
    • right arm chain: (0,1) (1,2) ... (5,6)
    • left arm chain: (7,8) (8,9) ... (12,13)
  2. Temporal edges across history
    • the same joint index is connected between adjacent timesteps

The edge attribute is a scalar distance:

  • spatial edge: Euclidean distance between connected joints in the same timestep
  • temporal edge: Euclidean displacement of the same joint between neighboring timesteps
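A minimal sketch of this edge layout, assuming nodes are flattened timestep-major (node id `t * num_joints + j`); `sketch_edges` is an illustrative helper, not the repository function:

```python
import numpy as np

# Hedged sketch of spatial (per-timestep chain) + temporal (same joint,
# adjacent timesteps) edges with scalar distance attributes. The
# timestep-major node indexing is an assumption of this sketch.
def sketch_edges(joints):  # joints: (history, num_joints, 3)
    history, num_joints, _ = joints.shape
    chains = [range(0, 7), range(7, 14)]  # right / left arm kinematic chains
    edges, attrs = [], []
    # Spatial edges: consecutive joints within each arm, per timestep
    for t in range(history):
        base = t * num_joints
        for chain in chains:
            for a, b in zip(chain, list(chain)[1:]):  # (0,1)...(5,6), (7,8)...(12,13)
                edges.append((base + a, base + b))
                attrs.append(np.linalg.norm(joints[t, a] - joints[t, b]))
    # Temporal edges: same joint index between adjacent timesteps
    for t in range(history - 1):
        for j in range(num_joints):
            edges.append((t * num_joints + j, (t + 1) * num_joints + j))
            attrs.append(np.linalg.norm(joints[t + 1, j] - joints[t, j]))
    return np.array(edges).T, np.array(attrs)  # (2, E), (E,)

# For history=3 and 14 joints: 3*12 spatial + 2*14 temporal = 64 edges
edge_index, edge_attr = sketch_edges(np.random.randn(3, 14, 3))
```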

Supported graph backbones

Implemented in src/models/gnn_models.py.

  • gcn
  • gat
  • mpnn
  • sage
  • egnn

The graph encoder output can be pooled and attached to any downstream policy, planner, or decoder that needs robot-structure-aware features.

Kinematic Regularizer

The second exposed module is the kinematic regularizer. Its role is to convert future joint predictions into end-effector-space constraints through differentiable forward kinematics.

Implemented across:

  • src/utils/model_utils.py
  • a Panda-compatible URDF file used by pytorch_kinematics

In the current implementation, the regularizer works as follows:

  1. Predict future joint positions for the right and left arms
  2. Run differentiable forward kinematics with pytorch_kinematics
  3. Convert transformation matrices to RLBench-style pose representation
  4. Use the resulting poses as an additional conditioning signal
  5. Optionally optimize an auxiliary joint prediction loss together with the downstream task loss

In the original full integration, the resulting end-effector pose is projected back into a conditioning feature and combined with graph-enhanced observation features.

One example objective is:

loss = loss_lambda * ee_loss + (1 - loss_lambda) * joint_loss

From a plugin perspective, this module does not require a specific decoder. It only assumes that your downstream network can produce future joint predictions that can be fed into the kinematics chain.
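The example objective above can be written out numerically. This is a hedged NumPy sketch with placeholder predictions and targets, not the repository's training code:

```python
import numpy as np

# Illustrative combined objective: a convex mix of an end-effector-space
# loss and a joint-space loss. loss_lambda and all arrays are placeholders.
rng = np.random.default_rng(0)
loss_lambda = 0.7  # assumed weighting, not a value from the repository

ee_pred, ee_target = rng.normal(size=(8, 7)), rng.normal(size=(8, 7))          # EE poses
joint_pred, joint_target = rng.normal(size=(8, 14)), rng.normal(size=(8, 14))  # joints

ee_loss = np.mean((ee_pred - ee_target) ** 2)
joint_loss = np.mean((joint_pred - joint_target) ** 2)
loss = loss_lambda * ee_loss + (1 - loss_lambda) * joint_loss
```

Because `loss_lambda` lies in [0, 1], the combined loss is always bounded by the two individual terms.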


Project Structure

```
.
├── src/
│   ├── utils/data_utils.py                # Graph node / edge construction
│   ├── models/gnn_models.py               # GCN, GAT, MPNN, GraphSAGE, EGNN graph encoders
│   ├── models/egnn.py                     # EGNN layer used by the graph encoder
│   ├── utils/model_utils.py               # Pose conversion and kinematics helpers
│   └── utils/CONSTANT.py                  # Task constants
├── requirements.txt
├── LICENSE
├── stard.png
└── README.md
```

Installation

1. Create a Python environment

The dependency comments in requirements.txt suggest a Python 3.8 setup.

```shell
conda create -n krgb python=3.8 -y
conda activate krgb
```

2. Install plugin dependencies

The two public modules mainly depend on:

  • torch
  • torch_geometric
  • pytorch_kinematics
  • numpy

To stay consistent with the repository environment, you can install the full requirements:

```shell
pip install -r requirements.txt
```

The rest of the repository contains an end-to-end research implementation, but that is not the main public API of the open-source release.


Dataset / Benchmark

Benchmark in the paper

The paper evaluates KStar Diffuser on RLBench2 simulated bimanual tasks.

| Setting | Tasks |
| --- | --- |
| Simulated (RLBench2) | push_box, lift_ball, handover_item_easy, pick_laptop, sweep_to_dustpan |

The paper reports both 20-demo and 100-demo training settings in simulation.

Plugin input assumptions

The graph plugin expects a joint-coordinate tensor with shape:

[history, num_joints, 3]

In the current implementation:

  • history is typically 3
  • num_joints is 14 for two 7-DoF Panda arms
  • each joint is represented by its 3D coordinate

The regularizer plugin expects predicted future joint positions that can be fed into a Panda forward-kinematics chain. In the example implementation, that prediction has shape:

[batch, horizon, 14]
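Under those shape conventions, feeding the bimanual prediction into a per-arm 7-DoF chain might look like the following sketch; the right-arm-first split order is an assumption of this sketch, not a guarantee of the repository:

```python
import numpy as np

# Hedged sketch: split a [batch, horizon, 14] bimanual joint prediction
# into two 7-DoF arms and flatten for a batched FK call. The ordering
# (first 7 = right arm) is an assumption for illustration.
joint_pred = np.random.randn(4, 10, 14)                    # [batch, horizon, 14]
right_arm, left_arm = joint_pred[..., :7], joint_pred[..., 7:]
flat_right = right_arm.reshape(-1, 7)                      # (batch * horizon, 7)
print(flat_right.shape)  # (40, 7)
```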

Usage

Plugin 1: Graph Encoding

```python
import numpy as np
from torch_geometric.data import Data
from src.utils.data_utils import build_node_features, build_edge_features
from src.models.gnn_models import GCNGraph

history = 3
num_joints = 14
joint_coordinations = np.random.randn(history, num_joints, 3)

node_features = build_node_features(joint_coordinations, history)
edge_index, edge_attr = build_edge_features(joint_coordinations, history)

graph_data = Data(
    x=node_features.view(history * num_joints, -1),
    edge_index=edge_index,
    edge_attr=edge_attr.unsqueeze(-1),
)

gnn = GCNGraph(
    input_dim=19,
    hidden_dim=128,
    output_dim=128,
    num_layers=4,
    activation="silu",
    norm="layer",
)

graph_output = gnn(graph_data.x, graph_data.edge_index)
```

If you want edge-aware message passing, replace GCNGraph with GATGraph, MPNNGraph, or EqGraph and pass edge_attr.

Plugin 2: Kinematic Regularizer

The minimal example below shows forward kinematics for one arm. In the full bimanual setting, the same procedure is applied to the right and left arms separately.

```python
import torch
import pytorch_kinematics as pk
from src.utils.model_utils import matrix_to_rlb_pose, proc_quaternion

chain = pk.build_serial_chain_from_urdf(
    open("path/to/panda.urdf").read(),
    "Pandatip",
)

joint_prediction = torch.randn(8, 7)
fk_output = chain.forward_kinematics(joint_prediction)
ee_pose = proc_quaternion(matrix_to_rlb_pose(fk_output.get_matrix()))
```

This regularizer can be attached to any downstream model that predicts future joint positions. In the original full implementation, the resulting end-effector pose is projected back into a conditioning feature and combined with the graph-enhanced observation representation.

Plugin-style integration

The intended usage is:

  1. Use the graph encoding plugin to inject structural and temporal robot bias
  2. Use the kinematic regularizer plugin to constrain future predictions in a physically meaningful way
  3. Attach one or both plugins to your own policy backbone, action head, or diffusion decoder
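As a concrete (hypothetical) fusion pattern for step 3, the pooled graph feature can simply be concatenated with observation features before a policy head; all names and dimensions here are illustrative assumptions:

```python
import numpy as np

# Hedged integration sketch: fuse pooled graph-encoder output with
# observation features to form the input of a downstream action head.
# The feature dimensions (256, 128) are placeholders, not repo values.
obs_feat = np.random.randn(8, 256)     # batch of observation features
graph_feat = np.random.randn(8, 128)   # pooled graph-encoder output
policy_input = np.concatenate([obs_feat, graph_feat], axis=-1)
print(policy_input.shape)  # (8, 384)
```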

Results

The following numbers come from the paper, not from a fresh re-run inside this repository.

Simulated RLBench2 Results

| Training demos | Push Box | Lift Ball | Handover Item (easy) | Pick Laptop | Sweep Dustpan | Overall |
| --- | --- | --- | --- | --- | --- | --- |
| 20 | 79.3 | 87.0 | 23.7 | 17.0 | 83.0 | 58.0 |
| 100 | 83.0 | 98.7 | 27.0 | 43.7 | 89.0 | 68.2 |

According to the paper, KStar Diffuser outperforms prior transformer-based and diffusion-based baselines by more than 10 percentage points in overall success rate, and the ablation study confirms that both the Spatial-Temporal Graph and the Kinematics Regularizer are important.


Citation

If you use this repository or the KStar Diffuser paper in your research, please cite:

@InProceedings{Lv_2025_CVPR,
    author    = {Lv, Qi and Li, Hao and Deng, Xiang and Shao, Rui and Li, Yinchuan and Hao, Jianye and Gao, Longxiang and Wang, Michael Yu and Nie, Liqiang},
    title     = {Spatial-Temporal Graph Diffusion Policy with Kinematic Modeling for Bimanual Robotic Manipulation},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2025},
    pages     = {17394-17404}
}

Acknowledgement

  • The method description and reported results are based on the KStar Diffuser paper
  • The simulator stack builds on RLBench and PyRep
  • The implementation uses PyTorch, Hugging Face Transformers, Diffusers, and PyTorch Geometric

License

This repository is released under the Apache License 2.0. See the top-level LICENSE file for the full text.

External third-party dependencies keep their own original licenses.
