[Exp] Add Anymal-D rough terrain warp velocity environment#5088
[Exp] Add Anymal-D rough terrain warp velocity environment#5088hujc7 wants to merge 12 commits intoisaac-sim:developfrom
Conversation
Add experimental warp-compatible manager implementations, MDP terms, utilities (buffers, modifiers, noise, warp kernels), ManagerCallSwitch for eager/captured dispatch, and manager-based env orchestration. Includes RL library wrapper updates (rsl_rl, rl_games, sb3, skrl) to accept warp env types, and minor stable fixes (settings_manager RuntimeError handling, observation_manager comment cleanup).
Add an experimental manager-based Cartpole environment using the warp manager infrastructure as a reference task for testing and benchmarking.
|
WIP but set to review for bot review |
There was a problem hiding this comment.
🤖 Isaac Lab Review Bot — PR #5088
"[Exp] Add Anymal-D rough terrain warp velocity environment" by @hujc7
Summary
This PR adds Anymal-D rough terrain locomotion on Newton/Warp with the Newton collision pipeline. It is stacked on two prior PRs (#4829 warp manager infra, #4945 warp MDP migration) and introduces the top commit(s) for rough terrain support. The PR spans 118 files across several packages:
-
Experimental warp-first manager infrastructure (
isaaclab_experimental): Complete manager-based env orchestration with Warp-native managers (Action, Observation, Event, Reward, Termination, Command), MDP terms, utilities (buffers, modifiers, noise, warp kernels), andManagerCallSwitchfor eager/captured dispatch. -
Anymal-D rough terrain env config (
isaaclab_tasks_experimental):LocomotionVelocityBaseEnvCfg→LocomotionVelocityRoughEnvCfg→AnymalDRoughEnvCfghierarchy with Newton solver settings and terrain-aware spawning. -
Newton physics fixes: Zero contact margin fix, collision pipeline buffer overflow fix (
max_triangle_pairs1M→2M),capture_unsafedecorator on lazy ArticulationData properties, and state kernel helpers. -
Stable Newton preset:
RoughPhysicsCfgadded to the stable Anymal-D env with pyramidal cone + Newton collision pipeline settings. -
Dependency bumps: mujoco/mujoco-warp 3.5→3.6.0, Newton pinned to
141baffff9. -
RL wrapper updates: rsl_rl, rl_games, sb3, skrl wrappers accept
ManagerBasedRLEnvWarptypes.
Design Assessment
Architecture: Well-structured for experimental iteration. The env config hierarchy (Base → Rough/Flat) cleanly separates shared Newton solver settings from terrain-specific overrides (buffer sizes, friction cone, CCD iterations). The ManagerCallSwitch provides a clean 3-mode dispatch (stable/warp-eager/warp-captured) per manager, which is a pragmatic approach for incremental migration.
Warp-first MDP terms follow a consistent pattern. Observation/reward/termination terms all follow func(env, out, **params) -> None writing into pre-allocated Warp arrays. The decorator system (@generic_io_descriptor_warp) mirrors the stable IO descriptor system while handling the key difference that Warp-first terms cannot produce output during inspection.
The upstream workarounds are well-documented. The PR body clearly explains the MuJoCo Warp float32 divergence, zero contact margin, and buffer overflow issues with links to upstream bug reports. The code workarounds are appropriately placed (Newton manager for contact margin and pipeline buffer size).
Architecture Impact
-
Low risk to stable code. The stable changes are minimal: a Newton physics preset for Anymal-D rough env,
capture_unsafedecorators on ArticulationData lazy properties, zero margin fix innewton_manager.py, and RL wrapper type acceptance. The bulk of the PR lives inisaaclab_experimentalandisaaclab_tasks_experimental. -
Dependency pinning concern: Newton is pinned to a specific commit hash (
141baffff9), not a release tag. This is acceptable for experimental work but should be tracked for eventual release stabilization.
Implementation Verdict
APPROVE with advisory findings. The architecture is sound, the upstream workarounds are well-motivated, and the experimental env follows existing patterns correctly. The findings below are advisory improvements, not blockers.
Test Coverage
The PR includes comprehensive parity tests in isaaclab_experimental/test/envs/mdp/:
test_actions_warp_parity.py(227 lines)test_events_warp_parity.py(340 lines)test_observations_warp_parity.py(458 lines)test_rewards_warp_parity.py(567 lines)test_terminations_warp_parity.py(349 lines)
Tests verify warp-first implementations match stable torch baselines across three execution modes (stable, warp uncaptured, warp CUDA-graph captured). Shared test fixtures are extracted into parity_helpers.py.
Training validation: 500 iterations at 4096 envs with zero NaN (reward 10.71, episode length 824). Ablation tests confirm contact margin fix and buffer size increase are essential.
CI Status
✅ labeler — pass (4s)
Note: Only the labeler check ran. No lint, build, or test CI checks are configured for this PR.
Findings
1. [Medium] Function-level state caching on reset_root_state_from_terrain is fragile
File: isaaclab_experimental/envs/mdp/events.py
The function caches Warp arrays and parsed range values as attributes on the function object itself (reset_root_state_from_terrain._scratch_pose, etc.). This pattern has several issues:
- Multiple environments: If two environments use this function with different terrains/configs, they share the same cached state. The first caller's terrain data wins.
- Re-creation: If the env is destroyed and recreated, the stale function-level state persists because the function object is module-level.
Recommendation: Move cached state to the environment instance (e.g., env._reset_terrain_cache) or use a per-asset-cfg cache key. Same issue applies to generated_commands, track_lin_vel_xy_exp, and track_ang_vel_z_exp in observations and rewards.
2. [Medium] spatial_vectorf component ordering assumes [lin, ang] but Warp uses [ang, lin]
File: isaaclab_experimental/envs/mdp/events.py — _reset_root_state_from_terrain_kernel
The kernel writes velocity as:
vel_out[env_id] = wp.spatial_vectorf(
default_vel[0] + wp.randf(state, vel_lin_lo[0], vel_lin_hi[0]), # lin_x
default_vel[1] + wp.randf(state, vel_lin_lo[1], vel_lin_hi[1]), # lin_y
default_vel[2] + wp.randf(state, vel_lin_lo[2], vel_lin_hi[2]), # lin_z
default_vel[3] + wp.randf(state, vel_ang_lo[0], vel_ang_hi[0]), # ang_x
default_vel[4] + wp.randf(state, vel_ang_lo[1], vel_ang_hi[1]), # ang_y
default_vel[5] + wp.randf(state, vel_ang_lo[2], vel_ang_hi[2]), # ang_z
)Warp's spatial_vectorf convention is [angular(0:3), linear(3:6)] per wp.spatial_top (angular) / wp.spatial_bottom (linear). If default_root_vel follows [lin, ang] ordering from ArticulationData, the random perturbation would be applied to the wrong components. Verify that the convention matches what write_root_velocity_to_sim_mask expects. If the asset data is stored as [lin, ang] then the raw default_vel[0..5] access is correct for a raw float array, but the vel_lin_lo/vel_ang_lo labeling is misleading.
3. [Low] COM randomization disabled with a None assignment instead of removal
File: isaaclab_tasks_experimental/manager_based/locomotion/velocity/velocity_env_cfg.py
base_com = None # FIXME(warp-migration): ...causes NaNsThe commented-out FIXME is good documentation, but setting base_com = None in the config means that any subclass trying to override this field will need to know about this workaround. Consider adding a tracking issue reference.
4. [Low] Pre-commit checks not run
The PR body explicitly notes [ ] Run pre-commit checks is unchecked. Since the only CI check that ran is the labeler, lint/formatting issues may exist.
Summary
A well-structured, large PR that brings Anymal-D rough terrain locomotion to the Newton/Warp experimental pipeline. The upstream physics workarounds are properly documented and justified. The experimental manager infrastructure is cleanly separated from stable code. The function-level state caching pattern (Finding 1) and spatial vector component ordering (Finding 2) warrant attention but are not blockers for an experimental feature.
WarpGraphCache now runs an eager warm-up call before graph capture so that first-call initialisation (allocations, hasattr guards, dtype casts) executes outside the capture context. Also align warp env rendering checks with stable env: use sim.is_rendering for the step-loop check and cache has_rtx_sensors at init for rerender-on-reset and render-method guards.
Add warp-first MDP terms (observations, rewards, events, terminations, actions) for manager-based envs. Update manager infrastructure with ManagerCallSwitch max_mode cap, Scene capture config, body_ids_wp resolution, and capture safety annotations. Add newton state kernels for body-frame computations used by MDP terms.
Fix WrenchComposer to use Tier 1 sim-bind buffers (body_link_pose_w, body_com_pos_b) instead of caching the lazy Tier 2 body_com_pose_w property, which became stale after the first step. Add capture_unsafe decorator for lazy-evaluated derived properties in articulation and rigid object data. Update wrench kernels to compute COM pose inline from link pose and body-frame COM offset.
Add manager-based warp env configs for classic (Ant, Humanoid), locomotion velocity (A1, AnymalB/C/D, Cassie, G1, Go1/2, H1), and manipulation reach (Franka, UR10). Include task-specific MDP terms for humanoid observations/rewards and locomotion rewards/terminations/ curriculums.
Add parity test infrastructure and tests organized by MDP category: observations, rewards, terminations, events, and actions. Tests verify warp-first implementations match stable torch baselines across three execution modes (stable, warp uncaptured, warp CUDA-graph captured). Extract shared fixtures and mock objects to parity_helpers.py.
Fix accidental revert of WarpGraphCache abstraction from stale cherry-pick. Restore register_manager_capturability and runtime _max_modes cap in get_mode_for_manager so managers can dynamically downgrade from captured mode during term preparation.
Replace scattered hasattr guards with a single _is_warmed_up flag per MDP term function. Fix parity test mocks: GRAVITY_VEC_W as 1D array, add missing mask-variant write stubs and wrench composer mock. Add unit tests for WarpGraphCache and ManagerCallSwitch.
Enable rough terrain locomotion for Anymal-D on Newton/Warp with the Newton collision pipeline. Tested at 4096 envs for 500 iterations with zero NaN, reward 10.71, episode length 824. - Add Base/Rough/Flat env config hierarchy with Newton solver settings - Add reset_root_state_from_terrain warp kernel for terrain-aware spawn - Fix zero contact margin from USD import causing CCD divergence - Increase CollisionPipeline max_triangle_pairs for mesh terrain - Use pyramidal friction cone for rough terrain solver stability - Add warm-up before CUDA graph capture in ManagerCallSwitch - Bump mujoco/mujoco-warp to 3.6.0, pin Newton to 141baffff9
Add RoughPhysicsCfg preset with Newton solver settings tuned for mesh terrain: pyramidal friction cone, increased buffer sizes, and Newton collision pipeline. Disable RayCaster height scanner for Newton as it requires Kit. Tested at 4096 envs for 300 iterations with presets=newton, reward 11.98, episode length 966.
7589193 to
4f4e142
Compare
|
@greptileai Review |
Greptile SummaryThis PR adds an experimental Anymal-D rough terrain locomotion environment running on Newton/MuJoCo Warp, along with the supporting Warp-first infrastructure (managers, MDP terms, CUDA graph capture, warm-up run) and several Newton-level workarounds for mesh-terrain stability issues (contact margin fix, increased buffer sizes, pyramidal-cone solver). It also bumps Key changes:
Confidence Score: 4/5
Important Files Changed
Sequence DiagramsequenceDiagram
participant RL as RL Training Loop
participant Env as ManagerBasedRLEnvWarp
participant MCS as ManagerCallSwitch
participant WGC as WarpGraphCache
participant EM as EventManager (Warp)
participant Kernel as CUDA Kernel
participant Terrain as TerrainImporter
RL->>Env: step(actions)
Env->>MCS: call_stage("EventManager_reset", ...)
MCS->>MCS: get_mode_for_manager → WARP_CAPTURED (2)
MCS->>WGC: capture_or_replay("EventManager_reset", fn)
alt First call (warm-up + capture)
WGC->>EM: fn() [warm-up — eager]
EM->>Terrain: read terrain_levels (int64→int32 copy)
EM->>Kernel: wp.launch reset_root_state_from_terrain
WGC->>EM: fn() [capture — inside ScopedCapture]
EM->>Kernel: wp.launch (recorded into CUDA graph)
WGC->>WGC: cache graph + result
else Subsequent calls
WGC->>WGC: wp.capture_launch(cached_graph)
Note over WGC,Terrain: ⚠ terrain_levels_wp NOT updated<br/>Curriculum changes to terrain.terrain_levels<br/>are invisible to the captured kernel
end
RL->>Env: curriculum update
Env->>Terrain: update_env_origins(env_ids, move_up, move_down)
Terrain->>Terrain: terrain_levels[env_ids] updated in-place
Note over Terrain,Kernel: Warp buffers still hold stale initial values
Reviews (1): Last reviewed commit: "Standardize _is_warmed_up flag for reset..." | Re-trigger Greptile |
| reset_root_state_from_terrain._terrain_levels_wp = wp.zeros(n_envs, dtype=wp.int32, device=env.device) | ||
| reset_root_state_from_terrain._terrain_types_wp = wp.zeros(n_envs, dtype=wp.int32, device=env.device) | ||
| levels_i32 = terrain.terrain_levels.to(dtype=torch.int32, copy=True).contiguous() | ||
| types_i32 = terrain.terrain_types.to(dtype=torch.int32, copy=True).contiguous() | ||
| wp.copy(reset_root_state_from_terrain._terrain_levels_wp, wp.from_torch(levels_i32, dtype=wp.int32)) | ||
| wp.copy(reset_root_state_from_terrain._terrain_types_wp, wp.from_torch(types_i32, dtype=wp.int32)) | ||
| reset_root_state_from_terrain._num_patches = valid_positions.shape[2] | ||
|
|
||
| # Scratch buffers | ||
| reset_root_state_from_terrain._scratch_pose = wp.zeros((env.num_envs,), dtype=wp.transformf, device=env.device) | ||
| reset_root_state_from_terrain._scratch_vel = wp.zeros( | ||
| (env.num_envs,), dtype=wp.spatial_vectorf, device=env.device | ||
| ) | ||
|
|
||
| # Pre-parse orientation range (roll, pitch, yaw) | ||
| r = [pose_range.get(key, (0.0, 0.0)) for key in ["roll", "pitch", "yaw"]] | ||
| reset_root_state_from_terrain._rot_lo = wp.vec3f(r[0][0], r[1][0], r[2][0]) | ||
| reset_root_state_from_terrain._rot_hi = wp.vec3f(r[0][1], r[1][1], r[2][1]) | ||
|
|
||
| # Pre-parse velocity range | ||
| v = [velocity_range.get(key, (0.0, 0.0)) for key in ["x", "y", "z", "roll", "pitch", "yaw"]] | ||
| reset_root_state_from_terrain._vel_lin_lo = wp.vec3f(v[0][0], v[1][0], v[2][0]) | ||
| reset_root_state_from_terrain._vel_lin_hi = wp.vec3f(v[0][1], v[1][1], v[2][1]) | ||
| reset_root_state_from_terrain._vel_ang_lo = wp.vec3f(v[3][0], v[4][0], v[5][0]) | ||
| reset_root_state_from_terrain._vel_ang_hi = wp.vec3f(v[3][1], v[4][1], v[5][1]) | ||
| reset_root_state_from_terrain._is_warmed_up = True |
There was a problem hiding this comment.
Terrain levels cached once, never updated by curriculum
_terrain_levels_wp and _terrain_types_wp are initialized from terrain.terrain_levels / terrain.terrain_types during the first warm-up call and then the _is_warmed_up guard prevents any subsequent re-initialization. The terrain curriculum (terrain_levels_vel) regularly calls terrain.update_env_origins(env_ids, move_up, move_down) which updates the torch tensors in-place, but the warp buffers are separate GPU allocations (created via to(dtype=torch.int32, copy=True)) and are never refreshed afterwards.
The result is that reset_root_state_from_terrain always spawns each robot at its initial terrain level assignment, regardless of how the curriculum has since promoted or demoted that environment. The PR's stated goal — "correctly spawns on the assigned level, enabling curriculum progression" — is not achieved: the apparent "curriculum level 4.51" in the training results reflects the initial distribution of terrain level assignments, not dynamic curriculum progression.
A straightforward fix would be to update _terrain_levels_wp / _terrain_types_wp unconditionally on every call (outside the capture region, if CUDA graphs are in use). Since the torch tensors are already on device, a wp.copy of the int32-cast view is cheap and can be done before launching the kernel:
# After _is_warmed_up block, before wp.launch:
levels_i32 = terrain.terrain_levels.to(dtype=torch.int32).contiguous()
types_i32 = terrain.terrain_types.to(dtype=torch.int32).contiguous()
wp.copy(reset_root_state_from_terrain._terrain_levels_wp, wp.from_torch(levels_i32, dtype=wp.int32))
wp.copy(reset_root_state_from_terrain._terrain_types_wp, wp.from_torch(types_i32, dtype=wp.int32))If this function runs inside a CUDA graph, the copy would need to happen in a non-captured scope (the warm-up or a pre-step hook).
| if not getattr(apply_external_force_torque, "_is_warmed_up", False): | ||
| apply_external_force_torque._scratch_forces = wp.zeros( | ||
| (env.num_envs, asset.num_bodies), dtype=wp.vec3f, device=env.device | ||
| ) | ||
| apply_external_force_torque._scratch_torques = wp.zeros( | ||
| (env.num_envs, asset.num_bodies), dtype=wp.vec3f, device=env.device | ||
| ) | ||
| apply_external_force_torque._is_warmed_up = True | ||
|
|
||
| wp.launch( | ||
| kernel=_apply_external_force_torque_kernel, | ||
| dim=env.num_envs, | ||
| inputs=[ | ||
| env_mask, | ||
| env.rng_state_wp, | ||
| apply_external_force_torque._scratch_forces, | ||
| apply_external_force_torque._scratch_torques, | ||
| force_range[0], | ||
| force_range[1], | ||
| torque_range[0], | ||
| torque_range[1], | ||
| ], | ||
| device=env.device, | ||
| ) | ||
|
|
||
| asset.permanent_wrench_composer.set_forces_and_torques_mask( | ||
| forces=apply_external_force_torque._scratch_forces, | ||
| torques=apply_external_force_torque._scratch_torques, | ||
| env_mask=env_mask, | ||
| ) |
There was a problem hiding this comment.
apply_external_force_torque ignores configured body names
The stable isaaclab.envs.mdp.events.apply_external_force_torque uses asset_cfg.body_ids to restrict which bodies receive forces. This warp version allocates scratch buffers sized (num_envs, asset.num_bodies) and the kernel iterates for b in range(force_out.shape[1]), applying random forces to all bodies regardless of asset_cfg.body_names.
This is currently harmless in the locomotion configs because force_range=(0.0, 0.0) and torque_range=(-0.0, 0.0) produce zero forces for all bodies. However, if a downstream config sets non-zero ranges (e.g. to randomize perturbation on the base only), the behavior would silently diverge from the stable implementation — applying forces to every link instead of just the configured ones.
Consider adding a body_ids parameter (similar to joint_ids in reset_joints_by_offset) so the kernel can be restricted to the configured body subset.
| if not getattr(fn, "_is_warmed_up", False) or fn._asset_name != asset_cfg.name: | ||
| fn._asset_name = asset_cfg.name | ||
| r = [com_range.get(key, (0.0, 0.0)) for key in ["x", "y", "z"]] | ||
| fn._com_lo = wp.vec3f(r[0][0], r[1][0], r[2][0]) | ||
| fn._com_hi = wp.vec3f(r[0][1], r[1][1], r[2][1]) | ||
| fn._is_warmed_up = True |
There was a problem hiding this comment.
Function-level state is shared across all callers of the same function
All event functions in this file store their "warm-up" state as attributes directly on the function object (e.g. fn._is_warmed_up, fn._scratch_forces, fn._terrain_levels_wp). This is not instance-scoped: if two environments are created in the same process, or if the same function is registered to two different event terms with different asset configs / num_envs, the second caller will silently reuse stale/wrong-sized buffers from the first.
For example, randomize_rigid_body_com checks fn._asset_name != asset_cfg.name to partially guard against this, but that guard does not cover num_envs changes. The other functions (apply_external_force_torque, push_by_setting_velocity, reset_root_state_from_terrain, etc.) do not have even this partial check.
This design is reasonable for the single-env experimental use-case described in this PR, but it would be worth adding a comment explicitly documenting the single-environment assumption to prevent surprising failures in future multi-env or serialized-training scenarios.
| "mujoco-warp==3.6.0", | ||
| "PyOpenGL-accelerate==3.1.10", | ||
| "newton==1.0.0", | ||
| "newton @ git+https://github.com/newton-physics/newton.git@141baffff9", |
There was a problem hiding this comment.
Newton pinned to an unreleased commit hash
"newton @ git+https://github.com/newton-physics/newton.git@141baffff9",Pinning to a raw, unreleased commit SHA is fragile: the commit is not immutable in the same way a signed tag or package release would be — history can be force-pushed, the SHA is not semantically versioned, and it is difficult to audit what changed between revisions. It also makes the dependency invisible to security-scanning tooling that knows about released packages.
Consider whether Newton will publish a versioned release that includes 141baffff9 so the dependency can be expressed as a proper version pin (e.g., newton>=x.y.z), or at minimum document the expected upstream release that will supersede this commit in a code comment.
Summary
LocomotionVelocityBaseEnvCfgwith shared Newton solver settings,LocomotionVelocityRoughEnvCfgandLocomotionVelocityFlatEnvCfgsubclasses with terrain-specific overridesreset_root_state_from_terrainwarp kernel for terrain-aware spawning using flat patch samplingManagerCallSwitchmujoco/mujoco-warpto 3.6.0 and pin Newton to141baffff9RoughPhysicsCfgNewton preset to stable Anymal-D rough envUpstream issues and workarounds
Running Newton's collision pipeline on mesh terrain with MuJoCo Warp as the constraint solver exposes several issues that required workarounds in IsaacLab:
MuJoCo Warp solver divergence with elliptic friction cone (float32 precision)
MuJoCo Warp uses float32 (vs float64 in MuJoCo C). The constraint solver diverges to NaN for ~1/4000 envs when using elliptic friction cone + high
impratiowith many mesh terrain contacts. This is a known upstream issue:Workaround: Use
cone="pyramidal"+impratio=1.0for rough terrain. MuJoCo docs note that "pyramidal cones sometimes make the solver faster and more robust". Upgrading to MuJoCo Warp 3.6.0 also improved stability ~7x (PR #571 fix).Zero contact margin from USD import
Newton's USD importer produces shapes with zero contact margin. Zero margin causes the CCD narrow phase to use a degenerate
is_discretepath (epsilon=0) where GJK can fail to converge on mesh terrain geometry, producing NaN in body/joint state.Workaround: Enforce minimum margin of 0.01 on all zero-margin shapes before
ModelBuilder.finalize().Collision pipeline buffer overflow on mesh terrain
The default
max_triangle_pairs(1M) andnjmax/nconmaxvalues are insufficient for mesh terrain. At 4096 envs, mesh terrain generates ~2M triangle pairs and up to 162 constraint rows per env. Buffer overflow silently drops contacts, causing robots to fall through terrain.Workaround: Increase
max_triangle_pairsto 2M inCollisionPipeline, and setnjmax=200,nconmax=100for rough terrain.Training results
The stable env shows higher reward because its
reset_root_state_uniformdoes not spawn robots on their assigned terrain level, so the terrain curriculum stays at ~0.6 (easiest terrain). The experimental warp env usesreset_root_state_from_terrainwhich correctly spawns on the assigned level, enabling curriculum progression to level ~4.5 (harder terrain). The experimental env's lower reward reflects training on harder terrain, not worse performance.Both envs exceed the Anymal-C rough benchmark reward threshold (≥7). The experimental warp env was additionally tested at 500 iterations (reward 10.71, episode length 824, zero NaN).
Dependencies
This PR is stacked on top of:
Only the top commit(s) are new in this PR.
Test plan
presets=newton: 300 iterations at 4096 envs, zero NaN, reward 11.98, episode length 966