
Add assemble_trocar task for G129-Dex3 and fix reset rendering warmup for RLinf extension#5082

Open
mingxueg-nv wants to merge 4 commits into isaac-sim:develop from mingxueg-nv:Adds_Assemble_Trocar_task_Based_RLinf

Conversation

@mingxueg-nv
Contributor

Description

Adds the Assemble Trocar manipulation task for the Unitree G1 (29-DoF + Dex3), with RLinf support.

Key additions:

  • Task MDP: observations (body + Dex3 joint states), reward functions (4-stage sparse), termination conditions (timeout, success, object drop), and reset events (scene reset, task stage reset, random tray rotation).
  • Camera presets: front camera and left/right wrist cameras (TiledCamera, 224×224) configured for GR00T visual input.
  • Robot presets: G1 29-DoF + Dex3 articulation configuration.
  • GR00T data config: IsaacLabDataConfig defining video/state/action modality keys, transforms (SinCos state encoding, min-max action normalization, color jitter), and model-specific settings.
  • RLinf extension update: minor update to isaaclab_contrib/rl/rlinf/extension.py to support the new task registration.

Type of change

  • New feature (non-breaking change which adds functionality)

Checklist

  • I have read and understood the contribution guidelines
  • I have run the pre-commit checks with ./isaaclab.sh --format
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • I have updated the changelog and the corresponding version in the extension's config/extension.toml file
  • I have added my name to the CONTRIBUTORS.md or my name already exists there

@mingxueg-nv mingxueg-nv requested a review from Mayankm96 as a code owner March 23, 2026 04:12
@github-actions github-actions bot added the isaac-lab Related to Isaac Lab team label Mar 23, 2026
@greptile-apps
Contributor

greptile-apps bot commented Mar 23, 2026

Greptile Summary

This PR adds the Assemble Trocar manipulation task for the Unitree G1 (29-DoF + Dex3) robot, including a 4-stage sparse MDP (lift → tip-align → insert → place), three TiledCamera presets for GR00T visual input, robot articulation config, a GR00T data config, and a minor extension patch that adds a post-reset render-warmup to fix stale camera frames under RTX rendering.

Key issues found:

  • Hardcoded local robot USD path (/mnt/Data/lw_v3/… in robot_config.py) — every other asset uses a URL-based USD_ROOT; this will fail on any machine other than the author's workstation.
  • Sparse reward is 50× the documented value — rewards are emitted as 1 / env.step_dt (= 50.0) on stage transitions, but comments and RewardsCfg describe them as "1.0 per stage".
  • Stage advancement depends on implicit reward evaluation order — update_task_stage is only called from lift_trocars_reward; the other three reward functions silently assume it runs first each step.
  • Stale extras after render warmup — _patched_reset re-computes obs after the warmup but returns the original extras, which may contain pre-warmup camera/observation data.
  • Module-level observation caches (_body_obs_cache, _dex3_obs_cache) keyed only on device+batch-size will silently corrupt data if two env instances coexist in the same process.
  • trocar_2 is missing collision_props that trocar_1 sets explicitly.

Confidence Score: 2/5

  • Not ready to merge: the hardcoded local robot USD path will break the task for all users outside the author's machine, and the sparse reward scaling discrepancy needs to be resolved before training results can be reasoned about.
  • Two P1 issues block merging: (1) the robot USD is at a local absolute path that no other contributor can resolve, making the task unusable out-of-the-box; (2) the sparse reward emits 50× the documented value per stage transition, which could mask training bugs and is inconsistent with the reward-weight comments. Additionally, the implicit reward-ordering dependency and the stale-extras bug in the extension patch need addressing before the feature is production-reliable.
  • config/robot_config.py (hardcoded path), mdp/rewards.py (reward scaling and ordering), rl/rlinf/extension.py (stale extras), mdp/observations.py (global cache)

Important Files Changed

Filename — Overview
source/isaaclab_contrib/isaaclab_contrib/rl/rlinf/extension.py — Adds render warmup monkey-patch to env.reset. Stale extras returned after re-render is a correctness concern when extras contains observation data.
source/isaaclab_tasks/isaaclab_tasks/manager_based/manipulation/assemble_trocar/config/robot_config.py — Robot USD path is hardcoded to /mnt/Data/lw_v3/…, breaking portability for any non-author machine. All other assets use a URL-based path.
source/isaaclab_tasks/isaaclab_tasks/manager_based/manipulation/assemble_trocar/g129_dex3_env_cfg.py — Main environment config assembling scene, observations, rewards, terminations, and events. trocar_2 lacks collision_props; USD_ROOT has an outstanding FIXME; otherwise structurally sound.
source/isaaclab_tasks/isaaclab_tasks/manager_based/manipulation/assemble_trocar/mdp/observations.py — Body and Dex3 joint state observations with preallocated tensor caches. Module-level global caches will interfere if multiple env instances coexist in the same process.
source/isaaclab_tasks/isaaclab_tasks/manager_based/manipulation/assemble_trocar/mdp/rewards.py — 4-stage sparse reward functions. Sparse reward value is 1/step_dt (=50) not 1.0 as documented; stage advancement relies on implicit ordering of reward term evaluation.
source/isaaclab_tasks/isaaclab_tasks/manager_based/manipulation/assemble_trocar/mdp/terminations.py — Drop and task-success termination conditions. Logic is clear; imports get_task_stage from rewards (coupling that could be tidied but is not a bug).

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["Episode Start - Stage 0"] -->|"Both trocars lifted above table"| B["Stage 1: Lifted\nlift_trocars_reward fires"]
    B -->|"Tip distance < threshold"| C["Stage 2: Tips Aligned\ntrocar_tip_alignment_reward fires"]
    C -->|"Parallel AND centers close"| D["Stage 3: Inserted\ntrocar_insertion_reward fires"]
    D -->|"Both in target zone AND below z_min"| E["Stage 4: Placed\ntrocar_placement_reward fires"]
    E --> F["task_success_termination - Episode Done"]

    A -->|"Timeout 20s"| G["time_out termination"]
    A -->|"Object z below 0.5m"| H["object_drop_termination"]
    B --> H
    C --> H
    D --> H

    F --> I["Reset Events"]
    G --> I
    H --> I

    I --> R1["reset_scene_to_default"]
    I --> R2["reset_task_stage - clears caches"]
    I --> R3["reset_tray_with_random_rotation"]
    R1 --> A
    R2 --> A
    R3 --> A

Comments Outside Diff (1)

  1. source/isaaclab_contrib/isaaclab_contrib/rl/rlinf/extension.py, line 13-31 (link)

    P2 Stale extras returned after render warmup

    _patched_reset re-computes obs from env.observation_manager after the warmup renders, but the extras dict that is returned still comes from _original_reset (i.e. before any warmup frames). If extras contains camera or privileged-observation data (as is common in Isaac Lab resets via "observations" or "log" keys), those values will reflect the pre-warmup render rather than the freshly-rendered state that obs reflects. This is an inconsistency between obs and extras that could silently corrupt the first observation seen by the policy after a reset.

    Consider also re-computing any camera-dependent fields of extras, or at least documenting that extras intentionally reflects the pre-warmup state.
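The consistency fix can be sketched in plain Python; FakeEnv, render_warmup, compute_obs, and the "observations" key below are illustrative stand-ins, not Isaac Lab's actual API:

```python
# Sketch: after the post-reset render warmup, re-read the observations and
# mirror them into extras so both reflect the post-warmup render.

class FakeEnv:
    """Stand-in for the env; 'frame' mimics the renderer's state."""

    def __init__(self):
        self.frame = 0

    def reset(self):
        obs = {"camera": f"frame-{self.frame}"}
        extras = {"observations": dict(obs)}  # captured before any warmup
        return obs, extras

    def render_warmup(self, n=2):
        self.frame += n  # warmup advances the renderer

    def compute_obs(self):
        return {"camera": f"frame-{self.frame}"}


def patched_reset(env):
    obs, extras = env.reset()
    env.render_warmup()
    obs = env.compute_obs()  # re-read observations after the warmup frames
    # Keep extras consistent with obs instead of returning the stale copy.
    if "observations" in extras:
        extras["observations"] = dict(obs)
    return obs, extras
```

With this shape, obs and extras["observations"] always describe the same (post-warmup) render.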

Reviews (1): Last reviewed commit: "Add assemble_trocar task for G129-Dex3 a..."

Comment on lines +276 to +282
stage_just_completed = (env._prev_stage_lift == 0) & (stage >= 1)
reward = torch.where(
    stage_just_completed,
    torch.ones(env.num_envs, device=env.device) / env.step_dt,
    torch.zeros(env.num_envs, device=env.device),
)


P1 Sparse reward is 50× larger than documented

The RewardsCfg docstring and inline comments state each stage gives a reward of 1.0 on completion. However, the actual value emitted is 1.0 / env.step_dt. With decimation=4 and sim.dt=1/200, step_dt = 0.02 s, so the actual reward per transition step is 50.0, not 1.0. The same pattern is repeated in trocar_tip_alignment_reward (line 456), trocar_insertion_reward (line 570), and trocar_placement_reward (line 696).

If the intent is to produce a reward equivalent to "50 reward-units delivered over one policy step" (which is a valid design choice for sparse rewards), the comments and RewardsCfg docstring need to be updated to reflect this. If the intent is to give exactly 1.0, the division by step_dt should be removed.

Suggested change

  stage_just_completed = (env._prev_stage_lift == 0) & (stage >= 1)
- reward = torch.where(
-     stage_just_completed,
-     torch.ones(env.num_envs, device=env.device) / env.step_dt,
-     torch.zeros(env.num_envs, device=env.device),
- )
+ reward = torch.where(
+     stage_just_completed,
+     torch.ones(env.num_envs, device=env.device),
+     torch.zeros(env.num_envs, device=env.device),
+ )
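For reference, the arithmetic behind the 50× figure, using the decimation and sim.dt values quoted in this comment:

```python
# With decimation=4 and sim.dt=1/200 (values cited in the review),
# one policy step lasts 0.02 s, so 1/step_dt is 50, not 1.
decimation = 4
sim_dt = 1 / 200
step_dt = decimation * sim_dt   # 0.02 s per policy step
sparse_reward = 1.0 / step_dt   # ~50.0 emitted per stage transition
```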

Comment on lines +101 to +206
def update_task_stage(
    env: ManagerBasedRLEnv,
    asset_cfg1: SceneEntityCfg,
    asset_cfg2: SceneEntityCfg,
    table_height: float = 0.85483,
    lift_threshold: float = 0.05,
    tip_align_threshold: float = 0.015,
    insertion_dist_threshold: float = 0.03,
    insertion_angle_threshold: float = 0.15,
    placement_x_min: float = -1.8,
    placement_x_max: float = -1.4,
    placement_y_min: float = 1.5,
    placement_y_max: float = 1.8,
    placement_z_min: float = 0.9,
    print_log: bool = False,
) -> torch.Tensor:
    """Update task stage based on current state.

    This function checks conditions and advances stages automatically.
    Once a stage is completed, it never goes back.
    """
    stage = get_task_stage(env)

    obj1: RigidObject = env.scene[asset_cfg1.name]
    obj2: RigidObject = env.scene[asset_cfg2.name]

    pos1 = wp.to_torch(obj1.data.root_pos_w)
    pos2 = wp.to_torch(obj2.data.root_pos_w)
    quat1 = wp.to_torch(obj1.data.root_quat_w)
    quat2 = wp.to_torch(obj2.data.root_quat_w)

    # Store old stage to detect changes (BEFORE any stage transitions)
    old_stage = stage.clone()

    # Stage 0 -> 1: Check if lifted
    target_z = table_height + lift_threshold
    is_lifted_1 = pos1[:, 2] > target_z
    is_lifted_2 = pos2[:, 2] > target_z
    both_lifted = is_lifted_1 & is_lifted_2
    stage = torch.where((stage == 0) & both_lifted, torch.ones_like(stage), stage)

    # Stage 1 -> 2: Check if tips are aligned (hole found)
    # Get tip positions
    tip_pos1 = get_trocar_tip_position(env, asset_cfg1)
    tip_pos2 = get_trocar_tip_position(env, asset_cfg2)
    tip_dist = torch.norm(tip_pos1 - tip_pos2, dim=-1)

    # Tip alignment success
    tip_aligned = tip_dist < tip_align_threshold
    stage = torch.where((stage == 1) & tip_aligned, torch.full_like(stage, 2), stage)

    # Stage 2 -> 3: Check if inserted (parallel + center close)
    # Get center distance
    center_dist = torch.norm(pos1 - pos2, dim=-1)

    # Check alignment
    target_axis1 = torch.tensor([0.0, 0.0, -1.0], device=env.device).repeat(env.num_envs, 1)
    target_axis2 = torch.tensor([0.0, 0.0, -1.0], device=env.device).repeat(env.num_envs, 1)
    axis1 = quat_apply(quat1, target_axis1)
    axis2 = quat_apply(quat2, target_axis2)
    dot_prod = torch.sum(axis1 * axis2, dim=-1)
    abs_dot = torch.clamp(torch.abs(dot_prod), max=1.0)
    angle = torch.acos(abs_dot)

    # Insertion success: parallel + center close
    is_parallel = angle < insertion_angle_threshold
    center_close = center_dist < insertion_dist_threshold
    is_inserted = is_parallel & center_close

    stage = torch.where((stage == 2) & is_inserted, torch.full_like(stage, 3), stage)

    # Stage 3 -> 4: Check if placed in target zone
    # Get environment origins to handle multi-env spatial offsets
    env_origins = env.scene.env_origins  # shape: (num_envs, 3)

    # Adjust target zone relative to each environment's origin
    curr_x_min = env_origins[:, 0] + min(placement_x_min, placement_x_max)  # (num_envs,)
    curr_x_max = env_origins[:, 0] + max(placement_x_min, placement_x_max)
    curr_y_min = env_origins[:, 1] + min(placement_y_min, placement_y_max)
    curr_y_max = env_origins[:, 1] + max(placement_y_min, placement_y_max)

    in_zone_1 = (
        (pos1[:, 0] >= curr_x_min)
        & (pos1[:, 0] <= curr_x_max)
        & (pos1[:, 1] >= curr_y_min)
        & (pos1[:, 1] <= curr_y_max)
        & (pos1[:, 2] < placement_z_min)
    )
    in_zone_2 = (
        (pos2[:, 0] >= curr_x_min)
        & (pos2[:, 0] <= curr_x_max)
        & (pos2[:, 1] >= curr_y_min)
        & (pos2[:, 1] <= curr_y_max)
        & (pos2[:, 2] < placement_z_min)
    )
    both_in_zone = in_zone_1 & in_zone_2
    stage = torch.where((stage == 3) & both_in_zone, torch.full_like(stage, 4), stage)

    # Print stage transitions (AFTER all stage transitions - always print when stage changes)
    if print_log and (stage != old_stage).any():
        for env_id in range(env.num_envs):
            if stage[env_id] != old_stage[env_id]:
                print(f"Env {env_id}: Stage {old_stage[env_id].item()} → {stage[env_id].item()}")

    env._task_stage = stage
    return stage

P1 Stage advancement depends on implicit reward evaluation order

update_task_stage (which advances env._task_stage) is only called from lift_trocars_reward. The three subsequent reward functions — trocar_tip_alignment_reward, trocar_insertion_reward, and trocar_placement_reward — call get_task_stage, which simply reads env._task_stage. This means the stage read by the later rewards is only up-to-date if lift_trocars_reward is evaluated first in every step.

Isaac Lab evaluates reward terms in the order they appear in RewardsCfg, which today happens to match (lift_trocars is declared first). However, this ordering is not enforced by the API, and any reordering of the dict (or framework change) will cause the later rewards to silently use a one-step-stale stage value, producing spurious or missed transition rewards.

Consider factoring update_task_stage into an independent event term (evaluated once per step before rewards) or at least adding a docstring/assertion that guards against incorrect evaluation order.
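One way to remove the ordering dependency, sketched in plain Python with hypothetical attribute names (common_step_counter, _stage_updated_at, compute_stage), is a per-step guard that every reward term can call safely:

```python
# Sketch: every reward term calls ensure_stage_updated() before reading the
# stage; a per-step guard runs the transition logic exactly once per step,
# so reward-term ordering no longer matters.

def ensure_stage_updated(env):
    step = env.common_step_counter
    if getattr(env, "_stage_updated_at", None) != step:
        env._task_stage = compute_stage(env)  # the real transition checks
        env._stage_updated_at = step          # mark this step as done
    return env._task_stage


def compute_stage(env):
    # Stand-in for the actual lift/align/insert/place checks.
    return env._task_stage + (1 if env.advance else 0)
```

Each reward function would then start with `stage = ensure_stage_updated(env)` instead of reading `env._task_stage` directly.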

Comment on lines +40 to +49
_body_obs_cache = {
    "device": None,
    "batch": None,
    "idx_t": None,
    "idx_batch": None,
    "pos_buf": None,
    "vel_buf": None,
    "torque_buf": None,
    "combined_buf": None,
}

P2 Module-level observation caches will interfere across multiple environment instances

_body_obs_cache and _dex3_obs_cache are module-level globals keyed only on (device, batch_size). If more than one ManagerBasedRLEnv instance is created in the same process (e.g. training env + eval env, or two parallel environments with different batch sizes), the second instantiation will either evict the first env's cached index tensors and buffers (causing silent wrong results) or — worse — two environments sharing the same batch size will share a single pre-allocated output buffer, causing them to overwrite each other's results.

Consider scoping these caches to the environment instance (e.g. as attributes on env) rather than as module-level globals.
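The env-scoped alternative can be sketched like this (get_obs_cache, the attribute name, and the list-based buffer are illustrative; the real code would preallocate torch tensors):

```python
# Sketch: store the cache on the env instance so each env owns its buffers.
# Two envs with the same batch size no longer share (and overwrite) state.

def get_obs_cache(env, batch, width):
    cache = getattr(env, "_body_obs_cache", None)
    if cache is None or cache["batch"] != batch:
        cache = {
            "batch": batch,
            # Preallocated output buffer, reused on every call for this env.
            "combined_buf": [[0.0] * width for _ in range(batch)],
        }
        env._body_obs_cache = cache  # lives and dies with this env instance
    return cache
```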

Comment on lines +135 to +151
trocar_2 = RigidObjectCfg(
    prim_path="/World/envs/env_.*/trocar_2",
    spawn=UsdFileCfg(
        usd_path=(
            f"{USD_ROOT}/LightWheel/Assets/"
            "DisposableLaparoscopicPunctureDevice001/"
            "DisposableLaparoscopicPunctureDevice005-xform.usd"
        ),
        rigid_props=sim_utils.RigidBodyPropertiesCfg(
            rigid_body_enabled=True,
            disable_gravity=False,
        ),
    ),
    init_state=RigidObjectCfg.InitialStateCfg(
        rot=[-0.71475, -0.000243, 0.05853, 0.69692], pos=[-1.50635, 1.90997, 0.8631]
    ),
)

P2 trocar_2 is missing explicit collision_props

trocar_1 has collision_props explicitly set (collision_enabled=True, contact_offset=0.001, rest_offset=-0.001) but trocar_2 does not. Since both trocars need to interact physically during the insertion stage, inconsistent collision settings could produce unreliable contact detection. Consider adding equivalent collision_props to trocar_2's UsdFileCfg:

spawn=UsdFileCfg(
    usd_path=(...),
    rigid_props=sim_utils.RigidBodyPropertiesCfg(...),
    collision_props=sim_utils.CollisionPropertiesCfg(
        collision_enabled=True,
        contact_offset=0.001,
        rest_offset=-0.001,
    ),
),

@mingxueg-nv mingxueg-nv changed the title Add assemble_trocar task for G129-Dex3 and fix reset rendering warmup… Add assemble_trocar task for G129-Dex3 and fix reset rendering warmup for RLinf extension Mar 23, 2026
@Nic-Ma

Nic-Ma commented Mar 25, 2026

Hi @kellyguo11 , could you please help review this PR?

Thanks in advance.

@isaaclab-review-bot

🤖 Isaac Lab Review Bot

Summary: The trocar assembly task design is solid (4-stage sparse reward with clean stage progression), but there are real bugs and architectural issues that need addressing before merge. Request Changes — 10 findings, 1 runtime crash bug.

Findings (ranked by severity)

  1. 🔴 Bug (runtime crash) — rewards.py:620: is_parallel.item() will crash with RuntimeError when num_envs > 1 because it's called on a (num_envs,) tensor. Should be is_parallel[0].item().

  2. 🔴 Deprecated API — events.py:139-141,275: data.default_root_state is deprecated since IsaacLab 3.0 and will be removed. Should use default_root_pose / default_root_vel.

  3. 🟠 Fragile ordering dependency — rewards.py:239: update_task_stage() is only called from lift_trocars_reward(). The other 3 reward functions silently depend on it running first. If anyone reorders RewardsCfg fields, stage values will be stale.

  4. 🟠 Global mutable cache not multi-env safe — observations.py:43: Module-level _body_obs_cache / _dex3_obs_cache dicts will thrash if two env instances exist simultaneously (train + eval).

  5. 🟠 Monkey-patching private renderer state — extension.py:484: Directly mutates _rtx_utils._last_render_update_key private module state. Needs at minimum a comment explaining the workaround and ideally should be a supported API.

  6. 🟡 Task description mismatch — YAML:196: Eval init_params.task_description says "install trocar from box" but the reused *isaaclab_config anchor says "assemble trocar from tray".

  7. 🟡 License header inconsistency — All 10 new files have dual license headers (BSD-3-Clause + Apache-2.0), violating the project's single BSD-3-Clause standard.

  8. 🟡 Dead code — reset_robot_to_default_joint_positions is exported but never used in any event config.

  9. 🟡 Hard gr00t dependency — gr00t_config.py has top-level imports from the gr00t package; needs guard/docs since it's in isaaclab_tasks.

  10. 🔵 Per-step tensor allocation — rewards.py:404: tip_offset_local.unsqueeze(0).repeat(num_envs, 1) allocates a new GPU tensor every step unnecessarily. Cache it or use expand().
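Finding 10 can be illustrated with NumPy as a stand-in for torch (np.broadcast_to plays the role of torch's expand(); the offset value is illustrative):

```python
# repeat/tile materializes a fresh array on every call, while broadcast_to
# (like torch.expand) returns a zero-copy read-only view over the original.
import numpy as np

tip_offset_local = np.array([0.0, 0.0, 0.12])  # illustrative local offset
num_envs = 4

repeated = np.tile(tip_offset_local, (num_envs, 1))          # new allocation each call
expanded = np.broadcast_to(tip_offset_local, (num_envs, 3))  # zero-copy view
```

Computing the expanded view once at setup (or caching it) avoids a fresh allocation on every reward evaluation.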

CI Status: No CI checks reported for this PR.

def _patched_reset(*args, **kwargs):
    obs, extras = _original_reset(*args, **kwargs)
    try:
        import isaaclab_physx.renderers.isaac_rtx_renderer_utils as _rtx_utils

why is this patch necessary? this is also restricting the renderer to RTX, is it not possible to use newton renderer as well?


Good question — I agree this is worth scrutinizing.

Looking at the core ManagerBasedEnv.reset() (manager_based_env.py L378-380), it already handles the rerender warmup natively:

if self.sim.has_rtx_sensors() and self.cfg.num_rerenders_on_reset > 0:
    for _ in range(self.cfg.num_rerenders_on_reset):
        self.sim.render()

So the framework already re-renders on reset when RTX sensors are present and num_rerenders_on_reset > 0 — and it does so through self.sim.render(), which is renderer-agnostic.

This monkey-patch duplicates that logic but goes directly through RTX-specific internals (_rtx_utils._last_render_update_key), which:

  1. Restricts to RTX — as you noted, Newton renderer users would get no warmup (only the except ImportError: pass path)
  2. Accesses private internals — _last_render_update_key is an implementation detail that could change
  3. Bypasses the framework's own render path — self.sim.render() already handles the render-update-key reset internally

If the built-in rerender-on-reset isn't sufficient for the RLinf use case (e.g., the env.reset() call from gym.make().unwrapped isn't triggering the warmup correctly), it would be better to investigate why and fix it at the framework level rather than patching around it here. @mingxueg-nv — is there a specific failure mode you're seeing that the existing num_rerenders_on_reset doesn't handle?


@mingxueg-nv mingxueg-nv Mar 30, 2026


Hi, @kellyguo11 , during training and evaluation, the first frame of recorded videos shows the robot in its USD default pose instead of the environment's reset pose; the camera observation returned by env.reset() does not reflect the actual scene state.
I have already tried setting num_rerenders_on_reset, but it doesn't help:

  • In IsaacLab 2.3, sim.render() called self.forward() + _app.update() to flush data and trigger the full render pipeline.

  • In IsaacLab 3.0, this was removed, sim.render() now only updates visualizers. But ManagerBasedEnv.reset() still relies on sim.render() for num_rerenders_on_reset, making it a no-op.

I have removed the RTX-only restriction from the patch, thanks.


Thanks for the detailed explanation and the code references, @mingxueg-nv — that makes sense now.

I traced through the code and can confirm the issue:

  1. ManagerBasedEnv.reset() (L378-379) calls sim.render() inside the num_rerenders_on_reset loop.
  2. In headless mode, sim.render() hits the NO_RENDERING branch (L613-619) which throttles updates and skips forward() — so the scene data never actually gets flushed to the renderer after reset.
  3. This means the first camera observation still captures the USD default pose rather than the post-reset state.

Your updated patch looks good — using set_setting("/app/player/playSimulations", False) + _app.update() is the same pattern that sim.render() uses internally for full render mode, and the explicit sensor.update(dt=0.0, force_recompute=True) loop ensures cameras pick up the new state. This is now renderer-agnostic too, which addresses my earlier concern. 👍

One minor thought: this feels like it could eventually be a fix upstream in ManagerBasedEnv.reset() itself (e.g., using forward() + _app.update() instead of bare sim.render() when render_mode="rgb_array"), so that all camera-based envs benefit. But as a workaround in the RLinf extension, this is clean and well-scoped.
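The warmup pattern discussed in this thread can be sketched roughly as below; every name (sim.set_setting, sim.app.update, scene.sensors, the sensor update signature) is taken from the discussion above as an assumption, not a verified Isaac Sim API, so treat this as a shape, not an implementation:

```python
# Hedged sketch of a renderer-agnostic post-reset warmup: pause physics
# stepping, pump the app loop to flush render data, then force-refresh
# camera sensors so the first observation reflects the reset state.

def render_warmup(env, num_frames=2):
    # Pause physics so app updates only advance rendering (assumed setting key).
    env.sim.set_setting("/app/player/playSimulations", False)
    try:
        for _ in range(num_frames):
            env.sim.app.update()  # pump the render pipeline (assumed API)
        for sensor in env.scene.sensors.values():
            # Force cameras to re-read the freshly rendered frames.
            sensor.update(dt=0.0, force_recompute=True)
    finally:
        env.sim.set_setting("/app/player/playSimulations", True)
```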

# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0

# Licensed under the Apache License, Version 2.0 (the "License");

any reason why we also need the apache license?


Good catch. The rest of the Isaac Lab codebase uses only the BSD-3-Clause header (e.g., lift/config/__init__.py, stack/config/__init__.py, etc.). This file — and several other new files in this PR (rewards.py, observations.py, etc.) — have a dual header with both BSD-3-Clause and Apache-2.0, plus the full Apache license text block.

This looks like it may have been carried over from an internal NVIDIA template. Since this code is being contributed into the Isaac Lab repo, the standard BSD-3-Clause header should be sufficient. The Apache-2.0 block (lines 6-19) can likely be removed from all the new files to stay consistent with the rest of the project.


@kellyguo11 , thanks for the reminder — I have removed all duplicated and Apache headers from all files.


Confirmed — checked the latest commit and the headers look clean now. __init__.py, camera_config.py, etc. all have just the standard BSD-3-Clause block. Thanks for the quick fix! ✅

#
# SPDX-License-Identifier: BSD-3-Clause

# Copyright (c) 2022-2026, The Isaac Lab Project Developers.

duplicated license headers


Have removed, thanks


Looks good, confirmed the duplicate is gone. Thanks! 👍

export RLINF_DATA_CONFIG_CLASS="policy.gr00t_config:IsaacLabDataConfig"
"""

from gr00t.data.dataset import ModalityConfig

will these imports get picked up as well if we are not running this task? we want to make sure users are not forced to install gr00t if they don't need to run this particular task


@mingxueg-nv mingxueg-nv Mar 30, 2026


Hi, @kellyguo11 , the config/__init__.py only imports camera_config and robot_config, so gr00t_config.py is not reachable through any import chain. This module is only loaded at runtime when RLinf resolves the string gr00t_config:IsaacLabDataConfig from the YAML config, so users who don't run this task will not trigger the gr00t import. For users running this task, we provide the required installation steps in the RLinf setup docs.


Thanks for the explanation! Confirmed — config/__init__.py only imports camera_config and robot_config, so gr00t_config.py is indeed not reachable through any static import chain. The lazy-loading via RLinf string resolution (gr00t_config:IsaacLabDataConfig) keeps the gr00t dependency isolated to users who actually run this task. LGTM on this point.
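The "module:Class" string resolution described here is the standard lazy-loading idiom; a self-contained sketch (resolve is a hypothetical helper, shown against a stdlib target so it runs anywhere):

```python
# Sketch: resolving a "module:Attr" spec defers the import until the spec
# is actually used, which is what keeps gr00t_config.py (and its gr00t
# imports) unloaded for users who never run this task.
import importlib

def resolve(spec: str):
    module_name, _, attr = spec.partition(":")
    module = importlib.import_module(module_name)  # import happens only now
    return getattr(module, attr)

# Example with a stdlib target in place of "policy.gr00t_config:IsaacLabDataConfig":
OrderedDict = resolve("collections:OrderedDict")
```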


@isaaclab-review-bot isaaclab-review-bot bot left a comment


🤖 Isaac Lab Review Bot — Environment Design Consistency Review

Summary

This PR adds an assemble_trocar manipulation task for the Unitree G129 + Dex3 robot along with an RLinf rendering warmup fix. The task implements a multi-stage surgical trocar assembly: lift → tip-align → insert → place.

Overall assessment: The environment is functional and well-implemented for the task logic, but has significant deviations from established Isaac Lab environment patterns that need to be addressed before merging into isaaclab_tasks. Below is a comprehensive comparison against reference environments (lift, stack, reach, pick_place).


1. Directory Structure

Reference pattern (lift/stack/reach):

task_name/
├── __init__.py           ← empty or minimal, no registrations
├── task_env_cfg.py       ← base/abstract env cfg with MISSING fields
├── mdp/
│   ├── __init__.py       ← uses lazy_export()
│   ├── observations.py
│   ├── rewards.py
│   └── terminations.py
└── config/
    ├── __init__.py       ← empty
    └── robot_name/
        ├── __init__.py   ← gym.register() calls here
        ├── agents/       ← RL agent configs (rsl_rl, skrl, sb3, etc.)
        └── joint_pos_env_cfg.py  ← concrete env cfg

New env structure:

assemble_trocar/
├── __init__.py           ← ❌ gym.register() here (should be in config/g129_dex3/)
├── g129_dex3_env_cfg.py  ← ❌ concrete cfg in task root (should be under config/g129_dex3/)
├── mdp/
│   ├── __init__.py       ← ❌ manual imports (should use lazy_export())
│   ├── events.py         ← ✅ (lift doesn't have this, but it's fine)
│   ├── observations.py   ← ✅
│   ├── rewards.py        ← ✅
│   └── terminations.py   ← ✅
└── config/
    ├── __init__.py       ← ❌ exports robot/camera presets (not registrations)
    ├── camera_config.py  ← ❌ should be in task config, not generic config
    ├── gr00t_config.py   ← ❌ GR00T-specific, has hard external dependency
    ├── isaaclab_ppo_gr00t_assemble_trocar.yaml ← ❌ RLinf training config doesn't belong here
    └── robot_config.py   ← ❌ should use isaaclab_assets or be task-level config

Key deviations:

  1. gym.register() should be in config/g129_dex3/__init__.py, not the task root __init__.py
  2. The concrete env cfg should be under config/g129_dex3/, not the task root
  3. The task root __init__.py should be empty/minimal (like lift/stack)
  4. Missing the base/abstract env cfg pattern (lift has LiftEnvCfg with MISSING fields)
  5. No agents/ directory with standard RL training configs

2. Environment Registration

Reference pattern (Isaac-Lift-Cube-Franka-v0):

# In config/franka/__init__.py
gym.register(
    id="Isaac-Lift-Cube-Franka-v0",
    entry_point="isaaclab.envs:ManagerBasedRLEnv",
    kwargs={
        "env_cfg_entry_point": f"{__name__}.joint_pos_env_cfg:FrankaCubeLiftEnvCfg",  # STRING reference
        "rsl_rl_cfg_entry_point": ...,  # agent configs
        "skrl_cfg_entry_point": ...,
    },
    disable_env_checker=True,
)

New env:

# In __init__.py (wrong location!)
gym.register(
    id="Isaac-Assemble-Trocar-G129-Dex3-RLinf-v0",
    kwargs={
        "env_cfg_entry_point": g129_dex3_env_cfg.G1AssembleTrocarEnvCfg,  # DIRECT CLASS reference
    },
)

Issues:

  1. Registration location: Should be in config/g129_dex3/__init__.py, not the task-level __init__.py
  2. Entry point format: Uses direct class reference instead of string f"{__name__}.module:ClassName". While this works, it's inconsistent with every other env in the repo
  3. Naming convention: Isaac-Assemble-Trocar-G129-Dex3-RLinf-v0 includes "RLinf" — task names should be framework-agnostic (e.g., Isaac-Assemble-Trocar-G129-Dex3-v0)
  4. Missing agent configs: No rsl_rl_cfg_entry_point, skrl_cfg_entry_point, etc. — even if only RLinf is used initially, the structure should support standard Isaac Lab training workflows

3. License Headers

Reference pattern (all Isaac Lab files):

# Copyright (c) 2022-2026, The Isaac Lab Project Developers (...)
# All rights reserved.
#
# SPDX-License-Identifier: BSD-3-Clause

New env files have DUAL license headers:

# Copyright (c) 2022-2026, The Isaac Lab Project Developers (...)  # ← Isaac Lab header
# SPDX-License-Identifier: BSD-3-Clause

# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION      # ← Apache-2.0 header  
# SPDX-License-Identifier: Apache-2.0
# Licensed under the Apache License, Version 2.0 (the "License")...

This creates a license conflict. Files in isaaclab_tasks should use only the BSD-3-Clause header. The Apache-2.0 header should be removed from all files.


4. Scene Configuration

Reference pattern (lift ObjectTableSceneCfg):

@configclass
class ObjectTableSceneCfg(InteractiveSceneCfg):
    robot: ArticulationCfg = MISSING          # ← abstract, set by concrete cfg
    ee_frame: FrameTransformerCfg = MISSING   # ← abstract
    object: RigidObjectCfg = MISSING          # ← abstract
    table = AssetBaseCfg(...)                 # ← concrete
    plane = AssetBaseCfg(prim_path="/World/GroundPlane", ...)  # ← ground plane
    light = AssetBaseCfg(prim_path="/World/light", ...)

New env AssembleTrocarSceneCfg:

@configclass
class AssembleTrocarSceneCfg(InteractiveSceneCfg):
    robot: ArticulationCfg = G1RobotPresets.g1_29dof_dex3_base_fix(...)  # ← concrete, not abstract
    scene = AssetBaseCfg(...)   # ← "scene" is a very generic name
    trocar_1 = RigidObjectCfg(...)
    trocar_2 = RigidObjectCfg(...)
    tray = ArticulationCfg(...)
    light = AssetBaseCfg(...)   # ← no ground plane!

Issues:

  1. No ground plane — All reference envs define a ground plane (/World/GroundPlane). The new env relies on the scene USD having one, but this should be explicit
  2. No abstract base — Robot is hardcoded instead of using MISSING pattern for robot-agnostic task definition
  3. Field naming — scene is a poor name for a scene entity (shadowing the InteractiveSceneCfg concept); should be something descriptive like operating_table or workspace
  4. Camera configs as class attributes — front_camera, left_wrist_camera, right_wrist_camera are set via classmethod calls rather than declarative config instances
  5. tray uses ArticulationCfg with empty actuators — This is a passive prop; RigidObjectCfg would be more semantically correct (or AssetBaseCfg if truly passive)
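A sketch of the scene config following the lift pattern — abstract robot, explicit ground plane, descriptive field names; the field and asset names here are suggestions, not the PR's actual code:

```python
@configclass
class AssembleTrocarSceneCfg(InteractiveSceneCfg):
    # abstract: the concrete robot is set by the config/g129_dex3 env cfg
    robot: ArticulationCfg = MISSING
    # descriptive name instead of the generic "scene"
    operating_table: AssetBaseCfg = MISSING
    trocar_1: RigidObjectCfg = MISSING
    trocar_2: RigidObjectCfg = MISSING
    # passive prop: RigidObjectCfg, not ArticulationCfg with empty actuators
    tray: RigidObjectCfg = MISSING
    # explicit ground plane, as in the reference envs
    plane = AssetBaseCfg(
        prim_path="/World/GroundPlane",
        spawn=sim_utils.GroundPlaneCfg(),
    )
    light = AssetBaseCfg(...)
```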

5. Environment Configuration

Reference pattern (lift LiftEnvCfg):

@configclass
class LiftEnvCfg(ManagerBasedRLEnvCfg):
    scene: ObjectTableSceneCfg = ObjectTableSceneCfg(num_envs=4096, env_spacing=2.5)
    observations: ObservationsCfg = ObservationsCfg()
    actions: ActionsCfg = ActionsCfg()
    commands: CommandsCfg = CommandsCfg()
    rewards: RewardsCfg = RewardsCfg()
    terminations: TerminationsCfg = TerminationsCfg()
    events: EventCfg = EventCfg()        # ← type-annotated
    curriculum: CurriculumCfg = CurriculumCfg()

New env:

@configclass
class G1AssembleTrocarEnvCfg(ManagerBasedRLEnvCfg):
    scene: AssembleTrocarSceneCfg = AssembleTrocarSceneCfg(num_envs=1, ...)
    observations: ObservationsCfg = ObservationsCfg()
    actions: ActionsCfg = ActionsCfg()
    terminations: TerminationsCfg = TerminationsCfg()
    events = EventCfg()     # ← missing type annotation!
    commands = None          # ← None instead of omitting
    rewards: RewardsCfg = RewardsCfg()
    curriculum = None        # ← None instead of omitting

Issues:

  1. events = EventCfg() missing type annotation — Should be events: EventCfg = EventCfg()
  2. num_envs=1 — Reference envs default to 4096. Setting 1 as default seems wrong for batch training
  3. commands = None and curriculum = None — These are already None by default in ManagerBasedRLEnvCfg, so explicitly setting them is unnecessary and clutters the config
  4. episode_length_s = 20.0 — This is MISSING in the base class, so it must be set; doing so in __post_init__ is fine, but 20.0 s is notably longer than lift's 5.0 s
  5. Render settings in __post_init__ — self.sim.render.enable_translucency, self.sim.render.carb_settings, rendering_mode = "quality", antialiasing_mode = "DLAA" are unusual and task-specific. No other reference env sets these; consider whether they belong in the env config or should be runtime overrides
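With the annotation fixed and the redundant None fields dropped, the cleaned-up cfg would read roughly as follows (defaults taken from the reference envs; a sketch, not the PR's code):

```python
@configclass
class G1AssembleTrocarEnvCfg(ManagerBasedRLEnvCfg):
    scene: AssembleTrocarSceneCfg = AssembleTrocarSceneCfg(num_envs=4096, env_spacing=2.5)
    observations: ObservationsCfg = ObservationsCfg()
    actions: ActionsCfg = ActionsCfg()
    rewards: RewardsCfg = RewardsCfg()
    terminations: TerminationsCfg = TerminationsCfg()
    events: EventCfg = EventCfg()  # type-annotated
    # commands/curriculum omitted: already None by default in ManagerBasedRLEnvCfg
```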

6. MDP Module (mdp/)

Reference pattern (lazy_export()):

# mdp/__init__.py
from isaaclab.utils.module import lazy_export
lazy_export()

New env:

# mdp/__init__.py
from isaaclab.envs.mdp import JointPositionActionCfg, time_out
from .events import ...
from .observations import ...
from .rewards import ...
from .terminations import ...
__all__ = [...]

Issues:

  1. Should use lazy_export() — This is the standard pattern in Isaac Lab mdp modules. It handles star imports and avoids manual __all__ maintenance
  2. Re-exports from isaaclab.envs.mdp — JointPositionActionCfg and time_out should be imported directly where used, not re-exported through the task's mdp package

7. Observations

Reference pattern (lift):

joint_pos = ObsTerm(func=mdp.joint_pos_rel)
joint_vel = ObsTerm(func=mdp.joint_vel_rel)
object_position = ObsTerm(func=mdp.object_position_in_robot_root_frame)

New env:

robot_joint_state = ObsTerm(func=mdp.get_robot_body_joint_states)      # 87 dims (29*3)
robot_dex3_joint_state = ObsTerm(func=mdp.get_robot_dex3_joint_states) # 14 dims

Issues:

  1. Hardcoded joint indices — The observation functions use magic numbers for joint reordering (body_joint_indices = [0, 3, 6, 9, ...]). This is fragile and couples the observation to a specific USD joint order. Consider using joint_names_expr patterns instead
  2. Global mutable state — _body_obs_cache and _dex3_obs_cache are module-level mutable dicts. This breaks if multiple envs with different configurations run in the same process. The cache should be per-environment instance
  3. No object observations — The env doesn't observe trocar positions/orientations in the policy group, only joint states. This means the policy can only use camera images for object state — valid for visuomotor policies, but unusual for the policy observation group
  4. concatenate_terms = False — This is intentional for visuomotor policies but worth documenting explicitly
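Issues 1 and 2 can be addressed together by resolving joint indices from name regexes and caching the result on the env instance. A minimal sketch — the resolve_joint_ids helper is hypothetical, not existing Isaac Lab API:

```python
import re

def resolve_joint_ids(env, asset_name, name_patterns):
    """Resolve joint indices from joint-name regexes, cached per env instance.

    Sketch: the cache lives on the env object rather than in a module-level
    dict, so multiple envs in one process cannot collide, and the joint
    order is derived from names rather than hardcoded magic indices.
    """
    cache = getattr(env, "_joint_id_cache", None)
    if cache is None:
        cache = {}
        env._joint_id_cache = cache
    key = (asset_name, tuple(name_patterns))
    if key not in cache:
        # joint_names follows the asset's USD joint order
        joint_names = env.scene[asset_name].joint_names
        cache[key] = [
            i for i, name in enumerate(joint_names)
            if any(re.fullmatch(p, name) for p in name_patterns)
        ]
    return cache[key]
```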

8. Reward Functions

Reference pattern (lift):

reaching_object = RewTerm(func=mdp.object_ee_distance, params={"std": 0.1}, weight=1.0)
lifting_object = RewTerm(func=mdp.object_is_lifted, params={"minimal_height": 0.04}, weight=15.0)
action_rate = RewTerm(func=mdp.action_rate_l2, weight=-1e-4)

New env (rewards.py — 736 lines!):

  • Complex multi-stage state machine (_task_stage) stored on env instance
  • Reward locking mechanism with _lift_reward_locked, _tip_reward_locked, etc.
  • Debug printing with should_print_debug()
  • update_task_stage() called inside first reward function (lift_trocars_reward)

Issues:

  1. Reward functions have side effects — lift_trocars_reward() calls update_task_stage() which modifies env._task_stage. The reward manager doesn't guarantee call order, so this is fragile. Stage updates should be in events or a dedicated pre-step hook, not buried in the first reward function
  2. Storing state on env via setattr — env._task_stage, env._prev_stage_*, env._lift_reward_locked, etc. are monkey-patched onto the env instance. This is the wrong pattern — Isaac Lab provides the observation/state manager for this, or custom attributes should be in the env cfg/env class
  3. reward / env.step_dt scaling — In sparse reward mode, rewards are divided by env.step_dt. This is unusual and couples the reward magnitude to the physics timestep
  4. No regularization rewards — Reference envs include action_rate_l2 and joint_vel_l2 penalties. This env has none, which may cause jerky motions
  5. USD stage access in reward functions — get_trocar_tip_position() accesses the USD stage via pxr API to compute tip offsets. This is extremely unusual for reward functions and may have performance implications. The offset should be precomputed in scene setup or the env __init__
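Once the stage update moves out of the reward path into an event term that runs once per step, the logic itself can stay a small pure function. A sketch of a monotonic 4-stage update — the condition names are hypothetical stand-ins for the env's actual success checks, and the real version would operate on per-env tensors rather than Python bools:

```python
def update_task_stage(stage: int, lifted: bool, tip_aligned: bool, inserted: bool) -> int:
    """Advance the 4-stage sparse task monotonically, at most one stage per call.

    Never regresses mid-episode and never skips a stage, so reward terms
    that read the stage see a consistent, order-independent value.
    """
    if stage == 0 and lifted:
        return 1
    if stage == 1 and tip_aligned:
        return 2
    if stage == 2 and inserted:
        return 3
    return stage
```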

9. External Dependencies

config/gr00t_config.py imports:

from gr00t.data.dataset import ModalityConfig
from gr00t.data.transform.base import ComposedModalityTransform
...
from gr00t.experiment.data_config import DATA_CONFIG_MAP, BaseDataConfig
from gr00t.model.transforms import GR00TTransform

Issues:

  1. gr00t is NOT a dependency of isaaclab_tasks — This file will cause import errors for anyone installing Isaac Lab normally. It should either:
    • Be moved to isaaclab_contrib (where RLinf lives)
    • Be guarded with try/except ImportError
    • Not be in the task directory at all
  2. The YAML config file (isaaclab_ppo_gr00t_assemble_trocar.yaml) is RLinf-specific training configuration with hardcoded checkpoint paths (/mnt/ckpt/...). This doesn't belong in isaaclab_tasks
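If the file must remain in place for now, the import can at least be guarded so a plain Isaac Lab install does not crash. A minimal sketch — the optional_import helper and GR00T_AVAILABLE flag are illustrative, not existing Isaac Lab API:

```python
import importlib

def optional_import(module_name):
    """Return the module if it is installed, else None (guard sketch)."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None

# Guard the optional gr00t dependency so importing this module never
# fails outright; callers check GR00T_AVAILABLE before using it.
gr00t = optional_import("gr00t")
GR00T_AVAILABLE = gr00t is not None
```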

10. RLinf Extension Changes (rendering warmup)

The monkey-patching approach:

_original_reset = env.reset
def _patched_reset(*args, **kwargs):
    obs, extras = _original_reset(*args, **kwargs)
    # force render warmup
    for _ in range(_n_warmup):
        _rtx_utils._last_render_update_key = (0, -1)  # ← private API access
        env.sim.set_setting("/app/player/playSimulations", False)
        _app.update()
        env.sim.set_setting("/app/player/playSimulations", True)
        env.scene.update(dt=0)
    obs = env.observation_manager.compute(update_history=True)
    env.obs_buf = obs
    return obs, extras
env.reset = _patched_reset

Issues:

  1. Accessing private API — _rtx_utils._last_render_update_key = (0, -1) reaches into a private attribute of isaac_rtx_renderer_utils. This will break when the renderer internals change
  2. env.obs_buf = obs override — Directly setting the observation buffer bypasses any safety checks or hooks
  3. The right approach — Isaac Lab already has num_rerenders_on_reset in ManagerBasedEnvCfg. If this doesn't work correctly for the RLinf use case, the fix should be in the core ManagerBasedEnv.reset() method, not monkey-patched from the extension. File an issue or fix it upstream
  4. Why not just set env.cfg.num_rerenders_on_reset? — The env config already supports this. If it's not working correctly, that's a bug to fix in the env, not to patch around
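Assuming num_rerenders_on_reset behaves as described for this use case, the warmup becomes a one-line config change instead of a patched reset (the value here is illustrative):

```python
@configclass
class G1AssembleTrocarEnvCfg(ManagerBasedRLEnvCfg):
    ...

    def __post_init__(self):
        # ... existing settings ...
        # re-render after each reset so tiled-camera images are warmed up,
        # replacing the monkey-patched env.reset in the RLinf extension
        self.num_rerenders_on_reset = 2
```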

Specific Issues (Numbered)

  1. Dual license headers — All new files have both BSD-3-Clause and Apache-2.0 headers. Remove Apache-2.0 headers
  2. __init__.py has duplicate copyright block — Lines 1-4 and 6-8 are both copyright notices
  3. gym.register() in wrong location — Move to config/g129_dex3/__init__.py
  4. Direct class reference for env_cfg_entry_point — Use string format f"{__name__}.module:Class"
  5. "RLinf" in env name — Remove framework-specific branding from gym ID
  6. Missing agents/ directory — Add standard RL agent configs
  7. events missing type annotation — Add events: EventCfg = EventCfg()
  8. num_envs=1 default — Should be 4096 or at least a reasonable batch size
  9. scene field name in scene cfg — Rename to something descriptive
  10. No ground plane — Add explicit ground plane config
  11. mdp/__init__.py should use lazy_export() — Follow standard pattern
  12. Module-level mutable observation cache — Move to per-env instance
  13. Reward side effects (stage machine) — Move update_task_stage to event system
  14. setattr monkey-patching on env instance — Use proper state management
  15. gr00t_config.py imports unavailable package — Move to isaaclab_contrib or guard imports
  16. RLinf YAML with hardcoded paths — Remove from isaaclab_tasks
  17. USD stage access in reward function — Precompute tip offsets during init
  18. No regularization rewards — Add action_rate and joint_vel penalties
  19. Missing num_rerenders_on_reset in env cfg — Set this instead of monkey-patching in RLinf
  20. tray uses ArticulationCfg with empty actuators — Use RigidObjectCfg or AssetBaseCfg

Recommendations (Priority Order)

P0 — Must fix before merge:

  1. Remove dual license headers; use only BSD-3-Clause
  2. Move gr00t_config.py and isaaclab_ppo_gr00t_assemble_trocar.yaml out of isaaclab_tasks (to isaaclab_contrib or a separate package)
  3. Restructure directory to match standard layout: registrations in config/g129_dex3/__init__.py
  4. Fix env_cfg_entry_point to use string format
  5. Remove "RLinf" from gym env IDs
  6. Set num_rerenders_on_reset in the env cfg instead of monkey-patching reset

P1 — Should fix:
7. Add type annotation for events
8. Use lazy_export() in mdp/__init__.py
9. Move stage state machine out of reward functions into events
10. Add ground plane to scene config
11. Fix default num_envs to a reasonable batch size
12. Add action rate / joint velocity regularization rewards

P2 — Nice to have:
13. Create abstract base env cfg with MISSING robot field
14. Move observation cache from module-level to env instance
15. Precompute trocar tip offsets during env initialization
16. Add agents/ directory with at least one standard training config
17. Rename scene field in AssembleTrocarSceneCfg to something descriptive


Labels

isaac-lab Related to Isaac Lab team
