Add assemble_trocar task for G129-Dex3 and fix reset rendering warmup for RLinf extension#5082
Conversation
… in rlinf extension Made-with: Cursor
Greptile Summary

This PR adds the Assemble Trocar manipulation task for the Unitree G1 (29-DoF + Dex3) robot, including a 4-stage sparse MDP (lift → tip-align → insert → place), three TiledCamera presets for GR00T visual input, a robot articulation config, a GR00T data config, and a minor extension patch that adds a post-reset render warmup to fix stale camera frames under RTX rendering. Key issues found:
Confidence Score: 2/5
Important Files Changed
Flowchart

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["Episode Start - Stage 0"] -->|"Both trocars lifted above table"| B["Stage 1: Lifted\nlift_trocars_reward fires"]
    B -->|"Tip distance < threshold"| C["Stage 2: Tips Aligned\ntrocar_tip_alignment_reward fires"]
    C -->|"Parallel AND centers close"| D["Stage 3: Inserted\ntrocar_insertion_reward fires"]
    D -->|"Both in target zone AND below z_min"| E["Stage 4: Placed\ntrocar_placement_reward fires"]
    E --> F["task_success_termination - Episode Done"]
    A -->|"Timeout 20s"| G["time_out termination"]
    A -->|"Object z below 0.5m"| H["object_drop_termination"]
    B --> H
    C --> H
    D --> H
    F --> I["Reset Events"]
    G --> I
    H --> I
    I --> R1["reset_scene_to_default"]
    I --> R2["reset_task_stage - clears caches"]
    I --> R3["reset_tray_with_random_rotation"]
    R1 --> A
    R2 --> A
    R3 --> A
```
...aclab_tasks/isaaclab_tasks/manager_based/manipulation/assemble_trocar/config/robot_config.py
```python
stage_just_completed = (env._prev_stage_lift == 0) & (stage >= 1)
reward = torch.where(
    stage_just_completed,
    torch.ones(env.num_envs, device=env.device) / env.step_dt,
    torch.zeros(env.num_envs, device=env.device),
)
```
Sparse reward is 50× larger than documented
The RewardsCfg docstring and inline comments state each stage gives a reward of 1.0 on completion. However, the actual value emitted is 1.0 / env.step_dt. With decimation=4 and sim.dt=1/200, step_dt = 0.02 s, so the actual reward per transition step is 50.0, not 1.0. The same pattern is repeated in trocar_tip_alignment_reward (line 456), trocar_insertion_reward (line 570), and trocar_placement_reward (line 696).
If the intent is to produce a reward equivalent to "50 reward-units delivered over one policy step" (which is a valid design choice for sparse rewards), the comments and RewardsCfg docstring need to be updated to reflect this. If the intent is to give exactly 1.0, the division by step_dt should be removed.
Suggested change:

```diff
 stage_just_completed = (env._prev_stage_lift == 0) & (stage >= 1)
 reward = torch.where(
     stage_just_completed,
-    torch.ones(env.num_envs, device=env.device) / env.step_dt,
+    torch.ones(env.num_envs, device=env.device),
     torch.zeros(env.num_envs, device=env.device),
 )
```
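The magnitude mismatch described above is simple arithmetic to verify. Below is a self-contained sketch (a hypothetical helper, not code from the PR) that makes the scaling decision an explicit, documented parameter instead of an inline division:

```python
# Hypothetical helper: emit a stage-completion bonus either as-is (the
# documented 1.0) or scaled by the policy-step duration (the PR's behaviour).
def sparse_stage_bonus(bonus: float, sim_dt: float, decimation: int,
                       scale_by_step_dt: bool) -> float:
    """Return the reward actually emitted on a stage transition."""
    step_dt = decimation * sim_dt  # policy-step duration in seconds
    return bonus / step_dt if scale_by_step_dt else bonus

# With decimation=4 and sim.dt=1/200, step_dt = 0.02 s:
as_implemented = sparse_stage_bonus(1.0, sim_dt=1 / 200, decimation=4,
                                    scale_by_step_dt=True)   # 50.0 per transition
as_documented = sparse_stage_bonus(1.0, sim_dt=1 / 200, decimation=4,
                                   scale_by_step_dt=False)   # 1.0 per transition
```

Whichever semantics is intended, naming it in the signature keeps the docstring and the emitted value from drifting apart.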
```python
def update_task_stage(
    env: ManagerBasedRLEnv,
    asset_cfg1: SceneEntityCfg,
    asset_cfg2: SceneEntityCfg,
    table_height: float = 0.85483,
    lift_threshold: float = 0.05,
    tip_align_threshold: float = 0.015,
    insertion_dist_threshold: float = 0.03,
    insertion_angle_threshold: float = 0.15,
    placement_x_min: float = -1.8,
    placement_x_max: float = -1.4,
    placement_y_min: float = 1.5,
    placement_y_max: float = 1.8,
    placement_z_min: float = 0.9,
    print_log: bool = False,
) -> torch.Tensor:
    """Update task stage based on current state.

    This function checks conditions and advances stages automatically.
    Once a stage is completed, it never goes back.
    """
    stage = get_task_stage(env)

    obj1: RigidObject = env.scene[asset_cfg1.name]
    obj2: RigidObject = env.scene[asset_cfg2.name]

    pos1 = wp.to_torch(obj1.data.root_pos_w)
    pos2 = wp.to_torch(obj2.data.root_pos_w)
    quat1 = wp.to_torch(obj1.data.root_quat_w)
    quat2 = wp.to_torch(obj2.data.root_quat_w)

    # Store old stage to detect changes (BEFORE any stage transitions)
    old_stage = stage.clone()

    # Stage 0 -> 1: Check if lifted
    target_z = table_height + lift_threshold
    is_lifted_1 = pos1[:, 2] > target_z
    is_lifted_2 = pos2[:, 2] > target_z
    both_lifted = is_lifted_1 & is_lifted_2
    stage = torch.where((stage == 0) & both_lifted, torch.ones_like(stage), stage)

    # Stage 1 -> 2: Check if tips are aligned (hole found)
    # Get tip positions
    tip_pos1 = get_trocar_tip_position(env, asset_cfg1)
    tip_pos2 = get_trocar_tip_position(env, asset_cfg2)
    tip_dist = torch.norm(tip_pos1 - tip_pos2, dim=-1)

    # Tip alignment success
    tip_aligned = tip_dist < tip_align_threshold
    stage = torch.where((stage == 1) & tip_aligned, torch.full_like(stage, 2), stage)

    # Stage 2 -> 3: Check if inserted (parallel + center close)
    # Get center distance
    center_dist = torch.norm(pos1 - pos2, dim=-1)

    # Check alignment
    target_axis1 = torch.tensor([0.0, 0.0, -1.0], device=env.device).repeat(env.num_envs, 1)
    target_axis2 = torch.tensor([0.0, 0.0, -1.0], device=env.device).repeat(env.num_envs, 1)
    axis1 = quat_apply(quat1, target_axis1)
    axis2 = quat_apply(quat2, target_axis2)
    dot_prod = torch.sum(axis1 * axis2, dim=-1)
    abs_dot = torch.clamp(torch.abs(dot_prod), max=1.0)
    angle = torch.acos(abs_dot)

    # Insertion success: parallel + center close
    is_parallel = angle < insertion_angle_threshold
    center_close = center_dist < insertion_dist_threshold
    is_inserted = is_parallel & center_close

    stage = torch.where((stage == 2) & is_inserted, torch.full_like(stage, 3), stage)

    # Stage 3 -> 4: Check if placed in target zone
    # Get environment origins to handle multi-env spatial offsets
    env_origins = env.scene.env_origins  # shape: (num_envs, 3)

    # Adjust target zone relative to each environment's origin
    curr_x_min = env_origins[:, 0] + min(placement_x_min, placement_x_max)  # (num_envs,)
    curr_x_max = env_origins[:, 0] + max(placement_x_min, placement_x_max)
    curr_y_min = env_origins[:, 1] + min(placement_y_min, placement_y_max)
    curr_y_max = env_origins[:, 1] + max(placement_y_min, placement_y_max)

    in_zone_1 = (
        (pos1[:, 0] >= curr_x_min)
        & (pos1[:, 0] <= curr_x_max)
        & (pos1[:, 1] >= curr_y_min)
        & (pos1[:, 1] <= curr_y_max)
        & (pos1[:, 2] < placement_z_min)
    )
    in_zone_2 = (
        (pos2[:, 0] >= curr_x_min)
        & (pos2[:, 0] <= curr_x_max)
        & (pos2[:, 1] >= curr_y_min)
        & (pos2[:, 1] <= curr_y_max)
        & (pos2[:, 2] < placement_z_min)
    )
    both_in_zone = in_zone_1 & in_zone_2
    stage = torch.where((stage == 3) & both_in_zone, torch.full_like(stage, 4), stage)

    # Print stage transitions (AFTER all stage transitions - always print when stage changes)
    if print_log and (stage != old_stage).any():
        for env_id in range(env.num_envs):
            if stage[env_id] != old_stage[env_id]:
                print(f"Env {env_id}: Stage {old_stage[env_id].item()} → {stage[env_id].item()}")

    env._task_stage = stage
    return stage
```
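The monotonic cascade in `update_task_stage` can be illustrated without torch. A pure-Python, single-environment sketch (hypothetical, for illustration only) mirroring the sequence of `torch.where` updates:

```python
# Each transition fires only from its predecessor stage, and the stage
# never decreases — mirroring the batched torch.where cascade above.
def advance_stage(stage: int, lifted: bool, tips_aligned: bool,
                  inserted: bool, placed: bool) -> int:
    if stage == 0 and lifted:
        stage = 1
    if stage == 1 and tips_aligned:
        stage = 2
    if stage == 2 and inserted:
        stage = 3
    if stage == 3 and placed:
        stage = 4
    return stage
```

Note that, like the sequential `torch.where` calls, the checks cascade: if all conditions hold at once, a single call can advance several stages in the same step.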
Stage advancement depends on implicit reward evaluation order
update_task_stage (which advances env._task_stage) is only called from lift_trocars_reward. The three subsequent reward functions — trocar_tip_alignment_reward, trocar_insertion_reward, and trocar_placement_reward — call get_task_stage, which simply reads env._task_stage. This means the stage read by the later rewards is only up-to-date if lift_trocars_reward is evaluated first in every step.
Isaac Lab evaluates reward terms in the order they appear in RewardsCfg, which today happens to match (lift_trocars is declared first). However, this ordering is not enforced by the API, and any reordering of the dict (or framework change) will cause the later rewards to silently use a one-step-stale stage value, producing spurious or missed transition rewards.
Consider factoring update_task_stage into an independent event term (evaluated once per step before rewards) or at least adding a docstring/assertion that guards against incorrect evaluation order.
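If an event-term refactor is deferred, a freshness marker is a cheap interim guard. A hypothetical sketch (assuming the env exposes a per-step counter such as `common_step_counter`; the attribute names mirror the PR's) that makes a stale read fail loudly instead of silently:

```python
from types import SimpleNamespace

def update_task_stage_guarded(env, new_stage):
    """Store the stage together with the step at which it was computed."""
    env._task_stage = new_stage
    env._task_stage_step = env.common_step_counter  # freshness marker

def get_task_stage_checked(env):
    """Read the stage, failing loudly if no update has happened this step."""
    if getattr(env, "_task_stage_step", None) != env.common_step_counter:
        raise RuntimeError("task stage is stale: update_task_stage must run before this term")
    return env._task_stage

# Stand-in for the env (hypothetical, for demonstration):
env = SimpleNamespace(common_step_counter=0)
update_task_stage_guarded(env, 1)
stage = get_task_stage_checked(env)  # fresh read succeeds
```

Moving the update into its own pre-reward term is still the cleaner fix; the guard just turns a silent ordering bug into an immediate error.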
```python
_body_obs_cache = {
    "device": None,
    "batch": None,
    "idx_t": None,
    "idx_batch": None,
    "pos_buf": None,
    "vel_buf": None,
    "torque_buf": None,
    "combined_buf": None,
}
```
Module-level observation caches will interfere across multiple environment instances
_body_obs_cache and _dex3_obs_cache are module-level globals keyed only on (device, batch_size). If more than one ManagerBasedRLEnv instance is created in the same process (e.g. training env + eval env, or two parallel environments with different batch sizes), the second instantiation will either evict the first env's cached index tensors and buffers (causing silent wrong results) or — worse — two environments sharing the same batch size will share a single pre-allocated output buffer, causing them to overwrite each other's results.
Consider scoping these caches to the environment instance (e.g. as attributes on env) rather than as module-level globals.
```python
trocar_2 = RigidObjectCfg(
    prim_path="/World/envs/env_.*/trocar_2",
    spawn=UsdFileCfg(
        usd_path=(
            f"{USD_ROOT}/LightWheel/Assets/"
            "DisposableLaparoscopicPunctureDevice001/"
            "DisposableLaparoscopicPunctureDevice005-xform.usd"
        ),
        rigid_props=sim_utils.RigidBodyPropertiesCfg(
            rigid_body_enabled=True,
            disable_gravity=False,
        ),
    ),
    init_state=RigidObjectCfg.InitialStateCfg(
        rot=[-0.71475, -0.000243, 0.05853, 0.69692], pos=[-1.50635, 1.90997, 0.8631]
    ),
)
```
trocar_2 is missing explicit collision_props
trocar_1 has collision_props explicitly set (collision_enabled=True, contact_offset=0.001, rest_offset=-0.001) but trocar_2 does not. Since both trocars need to interact physically during the insertion stage, inconsistent collision settings could produce unreliable contact detection. Consider adding equivalent collision_props to trocar_2's UsdFileCfg:
```python
spawn=UsdFileCfg(
    usd_path=(...),
    rigid_props=sim_utils.RigidBodyPropertiesCfg(...),
    collision_props=sim_utils.CollisionPropertiesCfg(
        collision_enabled=True,
        contact_offset=0.001,
        rest_offset=-0.001,
    ),
),
```
Hi @kellyguo11, could you please help review this PR? Thanks in advance.
🤖 Isaac Lab Review Bot

Summary: The trocar assembly task design is solid (4-stage sparse reward with clean stage progression), but there are real bugs and architectural issues that need addressing before merge. Request Changes — 10 findings, 1 runtime crash bug.

Findings (ranked by severity)
CI Status: No CI checks reported for this PR.
```python
def _patched_reset(*args, **kwargs):
    obs, extras = _original_reset(*args, **kwargs)
    try:
        import isaaclab_physx.renderers.isaac_rtx_renderer_utils as _rtx_utils
```
Why is this patch necessary? It also restricts the renderer to RTX — is it not possible to use the Newton renderer as well?
Good question — I agree this is worth scrutinizing.
Looking at the core ManagerBasedEnv.reset() (manager_based_env.py L378-380), it already handles the rerender warmup natively:

```python
if self.sim.has_rtx_sensors() and self.cfg.num_rerenders_on_reset > 0:
    for _ in range(self.cfg.num_rerenders_on_reset):
        self.sim.render()
```

So the framework already re-renders on reset when RTX sensors are present and `num_rerenders_on_reset > 0` — and it does so through `self.sim.render()`, which is renderer-agnostic.
This monkey-patch duplicates that logic but goes directly through RTX-specific internals (`_rtx_utils._last_render_update_key`), which:

- Restricts to RTX — as you noted, Newton renderer users would get no warmup (only the `except ImportError: pass` path)
- Accesses private internals — `_last_render_update_key` is an implementation detail that could change
- Bypasses the framework's own render path — `self.sim.render()` already handles the render-update-key reset internally
If the built-in rerender-on-reset isn't sufficient for the RLinf use case (e.g., the env.reset() call from gym.make().unwrapped isn't triggering the warmup correctly), it would be better to investigate why and fix it at the framework level rather than patching around it here. @mingxueg-nv — is there a specific failure mode you're seeing that the existing num_rerenders_on_reset doesn't handle?
Hi @kellyguo11, during training and evaluation the first frame of recorded videos shows the robot in its USD default pose instead of the environment's reset pose — the camera observation returned by env.reset() does not reflect the actual scene state.

I have already tried setting num_rerenders_on_reset, but it doesn't work:

- In IsaacLab 2.3, `sim.render()` called `self.forward()` + `_app.update()` to flush data and trigger the full render pipeline.
- In IsaacLab 3.0, this was removed; `sim.render()` now only updates visualizers. But `ManagerBasedEnv.reset()` still relies on `sim.render()` for `num_rerenders_on_reset`, making it a no-op.
I have removed the RTX-only restriction from the patch, thanks.
Thanks for the detailed explanation and the code references, @mingxueg-nv — that makes sense now.
I traced through the code and can confirm the issue:

- `ManagerBasedEnv.reset()` (L378-379) calls `sim.render()` inside the `num_rerenders_on_reset` loop.
- In headless mode, `sim.render()` hits the `NO_RENDERING` branch (L613-619), which throttles updates and skips `forward()` — so the scene data never actually gets flushed to the renderer after reset.
- This means the first camera observation still captures the USD default pose rather than the post-reset state.
Your updated patch looks good — using set_setting("/app/player/playSimulations", False) + _app.update() is the same pattern that sim.render() uses internally for full render mode, and the explicit sensor.update(dt=0.0, force_recompute=True) loop ensures cameras pick up the new state. This is now renderer-agnostic too, which addresses my earlier concern. 👍
One minor thought: this feels like it could eventually be a fix upstream in ManagerBasedEnv.reset() itself (e.g., using forward() + _app.update() instead of bare sim.render() when render_mode="rgb_array"), so that all camera-based envs benefit. But as a workaround in the RLinf extension, this is clean and well-scoped.
```python
# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0

# Licensed under the Apache License, Version 2.0 (the "License");
```
any reason why we also need the apache license?
Good catch. The rest of the Isaac Lab codebase uses only the BSD-3-Clause header (e.g., lift/config/__init__.py, stack/config/__init__.py, etc.). This file — and several other new files in this PR (rewards.py, observations.py, etc.) — have a dual header with both BSD-3-Clause and Apache-2.0, plus the full Apache license text block.
This looks like it may have been carried over from an internal NVIDIA template. Since this code is being contributed into the Isaac Lab repo, the standard BSD-3-Clause header should be sufficient. The Apache-2.0 block (lines 6-19) can likely be removed from all the new files to stay consistent with the rest of the project.
@kellyguo11, thanks for the reminder — I have removed all duplicated and Apache headers from all files.
Confirmed — checked the latest commit and the headers look clean now. __init__.py, camera_config.py, etc. all have just the standard BSD-3-Clause block. Thanks for the quick fix! ✅
```python
#
# SPDX-License-Identifier: BSD-3-Clause

# Copyright (c) 2022-2026, The Isaac Lab Project Developers.
```
duplicated license headers
Have removed, thanks
Looks good, confirmed the duplicate is gone. Thanks! 👍
```python
export RLINF_DATA_CONFIG_CLASS="policy.gr00t_config:IsaacLabDataConfig"
"""

from gr00t.data.dataset import ModalityConfig
```
will these imports get picked up as well if we are not running this task? we want to make sure users are not forced to install gr00t if they don't need to run this particular task
Hi, @kellyguo11 , the config/__init__.py only imports camera_config and robot_config, gr00t_config.py is not reachable through any import chain. This module is only loaded at runtime when RLinf resolves the string gr00t_config:IsaacLabDataConfig from the YAML config, so users who don't run this task will not trigger the gr00t import. For users running this task, we provide required installation step in the RLinf setup docs.
Thanks for the explanation! Confirmed — config/__init__.py only imports camera_config and robot_config, so gr00t_config.py is indeed not reachable through any static import chain. The lazy-loading via RLinf string resolution (gr00t_config:IsaacLabDataConfig) keeps the gr00t dependency isolated to users who actually run this task. LGTM on this point.
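For readers wondering what that string resolution involves, here is a self-contained sketch of the `module:ClassName` pattern with a friendlier failure mode when the optional dependency is missing (a hypothetical helper, not RLinf's actual loader):

```python
import importlib

def resolve_data_config(spec: str):
    """Resolve a 'module:ClassName' spec the way RLinf-style string configs do."""
    module_name, _, class_name = spec.partition(":")
    try:
        module = importlib.import_module(module_name)
    except ImportError as err:
        raise ImportError(
            f"'{module_name}' requires optional dependencies (e.g. gr00t); "
            "install them to use this data config"
        ) from err
    return getattr(module, class_name)
```

With this pattern a missing install surfaces as an actionable ImportError at resolution time, instead of an opaque ModuleNotFoundError deep inside the training stack.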
🤖 Isaac Lab Review Bot — Environment Design Consistency Review
Summary
This PR adds an assemble_trocar manipulation task for the Unitree G129 + Dex3 robot along with an RLinf rendering warmup fix. The task implements a multi-stage surgical trocar assembly: lift → tip-align → insert → place.
Overall assessment: The environment is functional and well-implemented for the task logic, but has significant deviations from established Isaac Lab environment patterns that need to be addressed before merging into isaaclab_tasks. Below is a comprehensive comparison against reference environments (lift, stack, reach, pick_place).
1. Directory Structure
Reference pattern (lift/stack/reach):
```
task_name/
├── __init__.py              ← empty or minimal, no registrations
├── task_env_cfg.py          ← base/abstract env cfg with MISSING fields
├── mdp/
│   ├── __init__.py          ← uses lazy_export()
│   ├── observations.py
│   ├── rewards.py
│   └── terminations.py
└── config/
    ├── __init__.py          ← empty
    └── robot_name/
        ├── __init__.py      ← gym.register() calls here
        ├── agents/          ← RL agent configs (rsl_rl, skrl, sb3, etc.)
        └── joint_pos_env_cfg.py  ← concrete env cfg
```
New env structure:
```
assemble_trocar/
├── __init__.py           ← ❌ gym.register() here (should be in config/g129_dex3/)
├── g129_dex3_env_cfg.py  ← ❌ concrete cfg in task root (should be under config/g129_dex3/)
├── mdp/
│   ├── __init__.py       ← ❌ manual imports (should use lazy_export())
│   ├── events.py         ← ✅ (lift doesn't have this, but it's fine)
│   ├── observations.py   ← ✅
│   ├── rewards.py        ← ✅
│   └── terminations.py   ← ✅
└── config/
    ├── __init__.py       ← ❌ exports robot/camera presets (not registrations)
    ├── camera_config.py  ← ❌ should be in task config, not generic config
    ├── gr00t_config.py   ← ❌ GR00T-specific, has hard external dependency
    ├── isaaclab_ppo_gr00t_assemble_trocar.yaml  ← ❌ RLinf training config doesn't belong here
    └── robot_config.py   ← ❌ should use isaaclab_assets or be task-level config
```
Key deviations:

- `gym.register()` should be in `config/g129_dex3/__init__.py`, not the task root `__init__.py`
- The concrete env cfg should be under `config/g129_dex3/`, not the task root
- The task root `__init__.py` should be empty/minimal (like lift/stack)
- Missing the base/abstract env cfg pattern (lift has `LiftEnvCfg` with `MISSING` fields)
- No `agents/` directory with standard RL training configs
2. Environment Registration
Reference pattern (Isaac-Lift-Cube-Franka-v0):

```python
# In config/franka/__init__.py
gym.register(
    id="Isaac-Lift-Cube-Franka-v0",
    entry_point="isaaclab.envs:ManagerBasedRLEnv",
    kwargs={
        "env_cfg_entry_point": f"{__name__}.joint_pos_env_cfg:FrankaCubeLiftEnvCfg",  # STRING reference
        "rsl_rl_cfg_entry_point": ...,  # agent configs
        "skrl_cfg_entry_point": ...,
    },
    disable_env_checker=True,
)
```

New env:

```python
# In __init__.py (wrong location!)
gym.register(
    id="Isaac-Assemble-Trocar-G129-Dex3-RLinf-v0",
    kwargs={
        "env_cfg_entry_point": g129_dex3_env_cfg.G1AssembleTrocarEnvCfg,  # DIRECT CLASS reference
    },
)
```

Issues:

- Registration location: Should be in `config/g129_dex3/__init__.py`, not the task-level `__init__.py`
- Entry point format: Uses a direct class reference instead of the string `f"{__name__}.module:ClassName"`. While this works, it's inconsistent with every other env in the repo
- Naming convention: `Isaac-Assemble-Trocar-G129-Dex3-RLinf-v0` includes "RLinf" — task names should be framework-agnostic (e.g., `Isaac-Assemble-Trocar-G129-Dex3-v0`)
- Missing agent configs: No `rsl_rl_cfg_entry_point`, `skrl_cfg_entry_point`, etc. — even if only RLinf is used initially, the structure should support standard Isaac Lab training workflows
3. License Headers
Reference pattern (all Isaac Lab files):

```python
# Copyright (c) 2022-2026, The Isaac Lab Project Developers (...)
# All rights reserved.
#
# SPDX-License-Identifier: BSD-3-Clause
```

New env files have DUAL license headers:

```python
# Copyright (c) 2022-2026, The Isaac Lab Project Developers (...)  # ← Isaac Lab header
# SPDX-License-Identifier: BSD-3-Clause
# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION    # ← Apache-2.0 header
# SPDX-License-Identifier: Apache-2.0
# Licensed under the Apache License, Version 2.0 (the "License")...
```

This creates a license conflict. Files in isaaclab_tasks should use only the BSD-3-Clause header. The Apache-2.0 header should be removed from all files.
4. Scene Configuration
Reference pattern (lift ObjectTableSceneCfg):

```python
@configclass
class ObjectTableSceneCfg(InteractiveSceneCfg):
    robot: ArticulationCfg = MISSING         # ← abstract, set by concrete cfg
    ee_frame: FrameTransformerCfg = MISSING  # ← abstract
    object: RigidObjectCfg = MISSING         # ← abstract
    table = AssetBaseCfg(...)                # ← concrete
    plane = AssetBaseCfg(prim_path="/World/GroundPlane", ...)  # ← ground plane
    light = AssetBaseCfg(prim_path="/World/light", ...)
```

New env AssembleTrocarSceneCfg:

```python
@configclass
class AssembleTrocarSceneCfg(InteractiveSceneCfg):
    robot: ArticulationCfg = G1RobotPresets.g1_29dof_dex3_base_fix(...)  # ← concrete, not abstract
    scene = AssetBaseCfg(...)  # ← "scene" is a very generic name
    trocar_1 = RigidObjectCfg(...)
    trocar_2 = RigidObjectCfg(...)
    tray = ArticulationCfg(...)
    light = AssetBaseCfg(...)  # ← no ground plane!
```

Issues:

- No ground plane — All reference envs define a ground plane (`/World/GroundPlane`). The new env relies on the scene USD having one, but this should be explicit
- No abstract base — The robot is hardcoded instead of using the `MISSING` pattern for a robot-agnostic task definition
- Field naming — `scene` is a poor name for a scene entity (shadowing the `InteractiveSceneCfg` concept); should be something descriptive like `operating_table` or `workspace`
- Camera configs as class attributes — `front_camera`, `left_wrist_camera`, `right_wrist_camera` are set via classmethod calls rather than declarative config instances
- `tray` uses `ArticulationCfg` with empty actuators — This is a passive prop; `RigidObjectCfg` would be more semantically correct (or `AssetBaseCfg` if truly passive)
5. Environment Configuration
Reference pattern (lift LiftEnvCfg):

```python
@configclass
class LiftEnvCfg(ManagerBasedRLEnvCfg):
    scene: ObjectTableSceneCfg = ObjectTableSceneCfg(num_envs=4096, env_spacing=2.5)
    observations: ObservationsCfg = ObservationsCfg()
    actions: ActionsCfg = ActionsCfg()
    commands: CommandsCfg = CommandsCfg()
    rewards: RewardsCfg = RewardsCfg()
    terminations: TerminationsCfg = TerminationsCfg()
    events: EventCfg = EventCfg()  # ← type-annotated
    curriculum: CurriculumCfg = CurriculumCfg()
```

New env:

```python
@configclass
class G1AssembleTrocarEnvCfg(ManagerBasedRLEnvCfg):
    scene: AssembleTrocarSceneCfg = AssembleTrocarSceneCfg(num_envs=1, ...)
    observations: ObservationsCfg = ObservationsCfg()
    actions: ActionsCfg = ActionsCfg()
    terminations: TerminationsCfg = TerminationsCfg()
    events = EventCfg()  # ← missing type annotation!
    commands = None      # ← None instead of omitting
    rewards: RewardsCfg = RewardsCfg()
    curriculum = None    # ← None instead of omitting
```

Issues:

- `events = EventCfg()` missing type annotation — should be `events: EventCfg = EventCfg()`
- `num_envs=1` — Reference envs default to 4096. Setting 1 as the default seems wrong for batch training
- `commands = None` and `curriculum = None` — These are already `None` by default in `ManagerBasedRLEnvCfg`, so explicitly setting them is unnecessary and clutters the config
- `episode_length_s = 20.0` — This is set in `__post_init__` but `MISSING` in the base class, so it must be set. This is fine, but notably longer than lift (5.0 s)
- Render settings in `__post_init__` — `self.sim.render.enable_translucency`, `self.sim.render.carb_settings`, `rendering_mode = "quality"`, `antialiasing_mode = "DLAA"` — these are unusual and task-specific. No other reference env sets these. Consider whether they belong in the env config or should be runtime overrides
6. MDP Module (mdp/)
Reference pattern (`lazy_export()`):

```python
# mdp/__init__.py
from isaaclab.utils.module import lazy_export

lazy_export()
```

New env:

```python
# mdp/__init__.py
from isaaclab.envs.mdp import JointPositionActionCfg, time_out
from .events import ...
from .observations import ...
from .rewards import ...
from .terminations import ...

__all__ = [...]
```

Issues:

- Should use `lazy_export()` — This is the standard pattern in Isaac Lab mdp modules. It handles star imports and avoids manual `__all__` maintenance
- Re-exports from `isaaclab.envs.mdp` — `JointPositionActionCfg` and `time_out` should be imported directly where used, not re-exported through the task's mdp package
7. Observations
Reference pattern (lift):

```python
joint_pos = ObsTerm(func=mdp.joint_pos_rel)
joint_vel = ObsTerm(func=mdp.joint_vel_rel)
object_position = ObsTerm(func=mdp.object_position_in_robot_root_frame)
```

New env:

```python
robot_joint_state = ObsTerm(func=mdp.get_robot_body_joint_states)       # 87 dims (29*3)
robot_dex3_joint_state = ObsTerm(func=mdp.get_robot_dex3_joint_states)  # 14 dims
```

Issues:

- Hardcoded joint indices — The observation functions use magic numbers for joint reordering (`body_joint_indices = [0, 3, 6, 9, ...]`). This is fragile and couples the observation to a specific USD joint order. Consider using `joint_names_expr` patterns instead
- Global mutable state — `_body_obs_cache` and `_dex3_obs_cache` are module-level mutable dicts. This breaks if multiple envs with different configurations run in the same process. The cache should be per-environment instance
- No object observations — The env doesn't observe trocar positions/orientations in the policy group, only joint states. This means the policy can only use camera images for object state — valid for visuomotor policies, but unusual for the `policy` observation group
- `concatenate_terms = False` — This is intentional for visuomotor policies but worth documenting explicitly
8. Reward Functions
Reference pattern (lift):

```python
reaching_object = RewTerm(func=mdp.object_ee_distance, params={"std": 0.1}, weight=1.0)
lifting_object = RewTerm(func=mdp.object_is_lifted, params={"minimal_height": 0.04}, weight=15.0)
action_rate = RewTerm(func=mdp.action_rate_l2, weight=-1e-4)
```

New env (rewards.py — 736 lines!):

- Complex multi-stage state machine (`_task_stage`) stored on the env instance
- Reward locking mechanism with `_lift_reward_locked`, `_tip_reward_locked`, etc.
- Debug printing with `should_print_debug()`
- `update_task_stage()` called inside the first reward function (`lift_trocars_reward`)

Issues:

- Reward functions have side effects — `lift_trocars_reward()` calls `update_task_stage()`, which modifies `env._task_stage`. The reward manager doesn't guarantee call order, so this is fragile. Stage updates should be in events or a dedicated pre-step hook, not buried in the first reward function
- Storing state on the env via `setattr` — `env._task_stage`, `env._prev_stage_*`, `env._lift_reward_locked`, etc. are monkey-patched onto the env instance. This is the wrong pattern — Isaac Lab provides the observation/state manager for this, or custom attributes should be in the env cfg/env class
- `reward / env.step_dt` scaling — In sparse reward mode, rewards are divided by `env.step_dt`. This is unusual and couples the reward magnitude to the physics timestep
- No regularization rewards — Reference envs include `action_rate_l2` and `joint_vel_l2` penalties. This env has none, which may cause jerky motions
- USD stage access in reward functions — `get_trocar_tip_position()` accesses the USD stage via the `pxr` API to compute tip offsets. This is extremely unusual for reward functions and may have performance implications. The offset should be precomputed in scene setup or the env `__init__`
9. External Dependencies
config/gr00t_config.py imports:

```python
from gr00t.data.dataset import ModalityConfig
from gr00t.data.transform.base import ComposedModalityTransform
...
from gr00t.experiment.data_config import DATA_CONFIG_MAP, BaseDataConfig
from gr00t.model.transforms import GR00TTransform
```

Issues:

- `gr00t` is NOT a dependency of `isaaclab_tasks` — This file will cause import errors for anyone installing Isaac Lab normally. It should either:
  - Be moved to `isaaclab_contrib` (where RLinf lives)
  - Be guarded with try/except ImportError
  - Not be in the task directory at all
- The YAML config file (`isaaclab_ppo_gr00t_assemble_trocar.yaml`) is RLinf-specific training configuration with hardcoded checkpoint paths (`/mnt/ckpt/...`). This doesn't belong in `isaaclab_tasks`
10. RLinf Extension Changes (rendering warmup)
The monkey-patching approach:

```python
_original_reset = env.reset

def _patched_reset(*args, **kwargs):
    obs, extras = _original_reset(*args, **kwargs)
    # force render warmup
    for _ in range(_n_warmup):
        _rtx_utils._last_render_update_key = (0, -1)  # ← private API access
        env.sim.set_setting("/app/player/playSimulations", False)
        _app.update()
        env.sim.set_setting("/app/player/playSimulations", True)
    env.scene.update(dt=0)
    obs = env.observation_manager.compute(update_history=True)
    env.obs_buf = obs
    return obs, extras

env.reset = _patched_reset
```

Issues:

- Accessing private API — `_rtx_utils._last_render_update_key = (0, -1)` reaches into a private attribute of `isaac_rtx_renderer_utils`. This will break when the renderer internals change
- `env.obs_buf = obs` override — Directly setting the observation buffer bypasses any safety checks or hooks
- The right approach — Isaac Lab already has `num_rerenders_on_reset` in `ManagerBasedEnvCfg`. If this doesn't work correctly for the RLinf use case, the fix should be in the core `ManagerBasedEnv.reset()` method, not monkey-patched from the extension. File an issue or fix it upstream
- Why not just set `env.cfg.num_rerenders_on_reset`? — The env config already supports this. If it's not working correctly, that's a bug to fix in the env, not to patch around
Specific Issues (Numbered)
1. Dual license headers — All new files have both BSD-3-Clause and Apache-2.0 headers. Remove the Apache-2.0 headers
2. `__init__.py` has a duplicate copyright block — Lines 1-4 and 6-8 are both copyright notices
3. `gym.register()` in wrong location — Move to `config/g129_dex3/__init__.py`
4. Direct class reference for `env_cfg_entry_point` — Use the string format `f"{__name__}.module:Class"`
5. "RLinf" in env name — Remove framework-specific branding from the gym ID
6. Missing `agents/` directory — Add standard RL agent configs
7. `events` missing type annotation — Add `events: EventCfg = EventCfg()`
8. `num_envs=1` default — Should be 4096 or at least a reasonable batch size
9. `scene` field name in scene cfg — Rename to something descriptive
10. No ground plane — Add an explicit ground plane config
11. `mdp/__init__.py` should use `lazy_export()` — Follow the standard pattern
12. Module-level mutable observation cache — Move to a per-env instance
13. Reward side effects (stage machine) — Move `update_task_stage` to the event system
14. `setattr` monkey-patching on the env instance — Use proper state management
15. `gr00t_config.py` imports an unavailable package — Move to `isaaclab_contrib` or guard imports
16. RLinf YAML with hardcoded paths — Remove from `isaaclab_tasks`
17. USD stage access in a reward function — Precompute tip offsets during init
18. No regularization rewards — Add action_rate and joint_vel penalties
19. Missing `num_rerenders_on_reset` in env cfg — Set this instead of monkey-patching in RLinf
20. `tray` uses `ArticulationCfg` with empty actuators — Use `RigidObjectCfg` or `AssetBaseCfg`
Recommendations (Priority Order)
P0 — Must fix before merge:
1. Remove dual license headers; use only BSD-3-Clause
2. Move `gr00t_config.py` and `isaaclab_ppo_gr00t_assemble_trocar.yaml` out of `isaaclab_tasks` (to `isaaclab_contrib` or a separate package)
3. Restructure the directory to match the standard layout: registrations in `config/g129_dex3/__init__.py`
4. Fix `env_cfg_entry_point` to use the string format
5. Remove "RLinf" from gym env IDs
6. Set `num_rerenders_on_reset` in the env cfg instead of monkey-patching reset
P1 — Should fix:
7. Add type annotation for events
8. Use lazy_export() in mdp/__init__.py
9. Move stage state machine out of reward functions into events
10. Add ground plane to scene config
11. Fix default num_envs to a reasonable batch size
12. Add action rate / joint velocity regularization rewards
P2 — Nice to have:
13. Create abstract base env cfg with MISSING robot field
14. Move observation cache from module-level to env instance
15. Precompute trocar tip offsets during env initialization
16. Add agents/ directory with at least one standard training config
17. Rename scene field in AssembleTrocarSceneCfg to something descriptive
Description
Adds the Assemble Trocar manipulation task for the Unitree G1 (29-DoF + Dex3), with RLinf support.
Key additions:
- `IsaacLabDataConfig` defining video/state/action modality keys, transforms (SinCos state encoding, min-max action normalization, color jitter), and model-specific settings.
- `isaaclab_contrib/rl/rlinf/extension.py` updated to support the new task registration.

Type of change
Checklist

- I have run the `pre-commit` checks with `./isaaclab.sh --format`
- I have updated the changelog and the corresponding version in the extension's `config/extension.toml` file
- I have added my name to `CONTRIBUTORS.md` or my name already exists there
pre-commitchecks with./isaaclab.sh --formatconfig/extension.tomlfileCONTRIBUTORS.mdor my name already exists there