Skip to content

[Exp] Add Anymal-D rough terrain warp velocity environment#5088

Open
hujc7 wants to merge 12 commits intoisaac-sim:developfrom
hujc7:dev-velocity-rouge
Open

[Exp] Add Anymal-D rough terrain warp velocity environment#5088
hujc7 wants to merge 12 commits intoisaac-sim:developfrom
hujc7:dev-velocity-rouge

Conversation

@hujc7
Copy link
Copy Markdown

@hujc7 hujc7 commented Mar 23, 2026

Summary

  • Enable rough terrain locomotion for Anymal-D on Newton/Warp with the Newton collision pipeline
  • Add LocomotionVelocityBaseEnvCfg with shared Newton solver settings, LocomotionVelocityRoughEnvCfg and LocomotionVelocityFlatEnvCfg subclasses with terrain-specific overrides
  • Add reset_root_state_from_terrain warp kernel for terrain-aware spawning using flat patch sampling
  • Add warm-up run before CUDA graph capture in ManagerCallSwitch
  • Bump mujoco/mujoco-warp to 3.6.0 and pin Newton to 141baffff9
  • Add RoughPhysicsCfg Newton preset to stable Anymal-D rough env

Upstream issues and workarounds

Running Newton's collision pipeline on mesh terrain with MuJoCo Warp as the constraint solver exposes several issues that required workarounds in IsaacLab:

MuJoCo Warp solver divergence with elliptic friction cone (float32 precision)

MuJoCo Warp uses float32 (vs float64 in MuJoCo C). The constraint solver diverges to NaN for ~1/4000 envs when using elliptic friction cone + high impratio with many mesh terrain contacts. This is a known upstream issue:

Workaround: Use cone="pyramidal" + impratio=1.0 for rough terrain. MuJoCo docs note that "pyramidal cones sometimes make the solver faster and more robust". Upgrading to MuJoCo Warp 3.6.0 also improved stability ~7x (PR #571 fix).

Zero contact margin from USD import

Newton's USD importer produces shapes with zero contact margin. Zero margin causes the CCD narrow phase to use a degenerate is_discrete path (epsilon=0) where GJK can fail to converge on mesh terrain geometry, producing NaN in body/joint state.

Workaround: Enforce minimum margin of 0.01 on all zero-margin shapes before ModelBuilder.finalize().

Collision pipeline buffer overflow on mesh terrain

The default max_triangle_pairs (1M) and njmax/nconmax values are insufficient for mesh terrain. At 4096 envs, mesh terrain generates ~2M triangle pairs and up to 162 constraint rows per env. Buffer overflow silently drops contacts, causing robots to fall through terrain.

Workaround: Increase max_triangle_pairs to 2M in CollisionPipeline, and set njmax=200, nconmax=100 for rough terrain.

Training results

Metric Exp warp (300it) Stable Newton (300it) Benchmark (Anymal-C rough)
Reward 8.41 11.98 ≥ 7
Episode length 710.6 966.0 ≥ 875
time_out % 48.3% 81.8%
base_contact % 51.7% 18.2%
Terrain curriculum 4.51 0.61

The stable env shows higher reward because its reset_root_state_uniform does not spawn robots on their assigned terrain level, so the terrain curriculum stays at ~0.6 (easiest terrain). The experimental warp env uses reset_root_state_from_terrain which correctly spawns on the assigned level, enabling curriculum progression to level ~4.5 (harder terrain). The experimental env's lower reward reflects training on harder terrain, not worse performance.

Both envs exceed the Anymal-C rough benchmark reward threshold (≥7). The experimental warp env was additionally tested at 500 iterations (reward 10.71, episode length 824, zero NaN).

Dependencies

This PR is stacked on top of:

Only the top commit(s) are new in this PR.

Test plan

  • Exp warp env: 500 iterations at 4096 envs, zero NaN, reward 10.71, episode length 824
  • Exp warp env with CUDA graph capture (mode 2): 300 iterations via sweep script, PASS
  • Stable env with presets=newton: 300 iterations at 4096 envs, zero NaN, reward 11.98, episode length 966
  • Ablation: removing contact margin fix → NaN crash (confirms fix is essential)
  • Ablation: removing buffer size increase → NaN crash (confirms fix is essential)
  • Run pre-commit checks

hujc7 added 2 commits March 15, 2026 00:34
Add experimental warp-compatible manager implementations, MDP terms,
utilities (buffers, modifiers, noise, warp kernels), ManagerCallSwitch
for eager/captured dispatch, and manager-based env orchestration.

Includes RL library wrapper updates (rsl_rl, rl_games, sb3, skrl) to
accept warp env types, and minor stable fixes (settings_manager
RuntimeError handling, observation_manager comment cleanup).
Add an experimental manager-based Cartpole environment using the warp
manager infrastructure as a reference task for testing and benchmarking.
@github-actions github-actions bot added the isaac-lab Related to Isaac Lab team label Mar 23, 2026
@hujc7
Copy link
Copy Markdown
Author

hujc7 commented Mar 23, 2026

WIP but set to review for bot review

Copy link
Copy Markdown

@isaaclab-review-bot isaaclab-review-bot bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 Isaac Lab Review Bot — PR #5088

"[Exp] Add Anymal-D rough terrain warp velocity environment" by @hujc7


Summary

This PR adds Anymal-D rough terrain locomotion on Newton/Warp with the Newton collision pipeline. It is stacked on two prior PRs (#4829 warp manager infra, #4945 warp MDP migration) and introduces the top commit(s) for rough terrain support. The PR spans 118 files across several packages:

  1. Experimental warp-first manager infrastructure (isaaclab_experimental): Complete manager-based env orchestration with Warp-native managers (Action, Observation, Event, Reward, Termination, Command), MDP terms, utilities (buffers, modifiers, noise, warp kernels), and ManagerCallSwitch for eager/captured dispatch.

  2. Anymal-D rough terrain env config (isaaclab_tasks_experimental): LocomotionVelocityBaseEnvCfgLocomotionVelocityRoughEnvCfgAnymalDRoughEnvCfg hierarchy with Newton solver settings and terrain-aware spawning.

  3. Newton physics fixes: Zero contact margin fix, collision pipeline buffer overflow fix (max_triangle_pairs 1M→2M), capture_unsafe decorator on lazy ArticulationData properties, and state kernel helpers.

  4. Stable Newton preset: RoughPhysicsCfg added to the stable Anymal-D env with pyramidal cone + Newton collision pipeline settings.

  5. Dependency bumps: mujoco/mujoco-warp 3.5→3.6.0, Newton pinned to 141baffff9.

  6. RL wrapper updates: rsl_rl, rl_games, sb3, skrl wrappers accept ManagerBasedRLEnvWarp types.


Design Assessment

Architecture: Well-structured for experimental iteration. The env config hierarchy (Base → Rough/Flat) cleanly separates shared Newton solver settings from terrain-specific overrides (buffer sizes, friction cone, CCD iterations). The ManagerCallSwitch provides a clean 3-mode dispatch (stable/warp-eager/warp-captured) per manager, which is a pragmatic approach for incremental migration.

Warp-first MDP terms follow a consistent pattern. Observation/reward/termination terms all follow func(env, out, **params) -> None writing into pre-allocated Warp arrays. The decorator system (@generic_io_descriptor_warp) mirrors the stable IO descriptor system while handling the key difference that Warp-first terms cannot produce output during inspection.

The upstream workarounds are well-documented. The PR body clearly explains the MuJoCo Warp float32 divergence, zero contact margin, and buffer overflow issues with links to upstream bug reports. The code workarounds are appropriately placed (Newton manager for contact margin and pipeline buffer size).


Architecture Impact

  • Low risk to stable code. The stable changes are minimal: a Newton physics preset for Anymal-D rough env, capture_unsafe decorators on ArticulationData lazy properties, zero margin fix in newton_manager.py, and RL wrapper type acceptance. The bulk of the PR lives in isaaclab_experimental and isaaclab_tasks_experimental.

  • Dependency pinning concern: Newton is pinned to a specific commit hash (141baffff9), not a release tag. This is acceptable for experimental work but should be tracked for eventual release stabilization.


Implementation Verdict

APPROVE with advisory findings. The architecture is sound, the upstream workarounds are well-motivated, and the experimental env follows existing patterns correctly. The findings below are advisory improvements, not blockers.


Test Coverage

The PR includes comprehensive parity tests in isaaclab_experimental/test/envs/mdp/:

  • test_actions_warp_parity.py (227 lines)
  • test_events_warp_parity.py (340 lines)
  • test_observations_warp_parity.py (458 lines)
  • test_rewards_warp_parity.py (567 lines)
  • test_terminations_warp_parity.py (349 lines)

Tests verify warp-first implementations match stable torch baselines across three execution modes (stable, warp uncaptured, warp CUDA-graph captured). Shared test fixtures are extracted into parity_helpers.py.

Training validation: 500 iterations at 4096 envs with zero NaN (reward 10.71, episode length 824). Ablation tests confirm contact margin fix and buffer size increase are essential.


CI Status

labeler — pass (4s)

Note: Only the labeler check ran. No lint, build, or test CI checks are configured for this PR.


Findings

1. [Medium] Function-level state caching on reset_root_state_from_terrain is fragile

File: isaaclab_experimental/envs/mdp/events.py

The function caches Warp arrays and parsed range values as attributes on the function object itself (reset_root_state_from_terrain._scratch_pose, etc.). This pattern has several issues:

  • Multiple environments: If two environments use this function with different terrains/configs, they share the same cached state. The first caller's terrain data wins.
  • Re-creation: If the env is destroyed and recreated, the stale function-level state persists because the function object is module-level.

Recommendation: Move cached state to the environment instance (e.g., env._reset_terrain_cache) or use a per-asset-cfg cache key. Same issue applies to generated_commands, track_lin_vel_xy_exp, and track_ang_vel_z_exp in observations and rewards.

2. [Medium] spatial_vectorf component ordering assumes [lin, ang] but Warp uses [ang, lin]

File: isaaclab_experimental/envs/mdp/events.py_reset_root_state_from_terrain_kernel

The kernel writes velocity as:

vel_out[env_id] = wp.spatial_vectorf(
    default_vel[0] + wp.randf(state, vel_lin_lo[0], vel_lin_hi[0]),  # lin_x
    default_vel[1] + wp.randf(state, vel_lin_lo[1], vel_lin_hi[1]),  # lin_y
    default_vel[2] + wp.randf(state, vel_lin_lo[2], vel_lin_hi[2]),  # lin_z
    default_vel[3] + wp.randf(state, vel_ang_lo[0], vel_ang_hi[0]),  # ang_x
    default_vel[4] + wp.randf(state, vel_ang_lo[1], vel_ang_hi[1]),  # ang_y
    default_vel[5] + wp.randf(state, vel_ang_lo[2], vel_ang_hi[2]),  # ang_z
)

Warp's spatial_vectorf convention is [angular(0:3), linear(3:6)] per wp.spatial_top (angular) / wp.spatial_bottom (linear). If default_root_vel follows [lin, ang] ordering from ArticulationData, the random perturbation would be applied to the wrong components. Verify that the convention matches what write_root_velocity_to_sim_mask expects. If the asset data is stored as [lin, ang] then the raw default_vel[0..5] access is correct for a raw float array, but the vel_lin_lo/vel_ang_lo labeling is misleading.

3. [Low] COM randomization disabled with a None assignment instead of removal

File: isaaclab_tasks_experimental/manager_based/locomotion/velocity/velocity_env_cfg.py

base_com = None  # FIXME(warp-migration): ...causes NaNs

The commented-out FIXME is good documentation, but setting base_com = None in the config means that any subclass trying to override this field will need to know about this workaround. Consider adding a tracking issue reference.

4. [Low] Pre-commit checks not run

The PR body explicitly notes [ ] Run pre-commit checks is unchecked. Since the only CI check that ran is the labeler, lint/formatting issues may exist.


Summary

A well-structured, large PR that brings Anymal-D rough terrain locomotion to the Newton/Warp experimental pipeline. The upstream physics workarounds are properly documented and justified. The experimental manager infrastructure is cleanly separated from stable code. The function-level state caching pattern (Finding 1) and spatial vector component ordering (Finding 2) warrant attention but are not blockers for an experimental feature.

hujc7 added 10 commits March 30, 2026 15:21
WarpGraphCache now runs an eager warm-up call before graph capture so
that first-call initialisation (allocations, hasattr guards, dtype
casts) executes outside the capture context.

Also align warp env rendering checks with stable env: use
sim.is_rendering for the step-loop check and cache has_rtx_sensors
at init for rerender-on-reset and render-method guards.
Add warp-first MDP terms (observations, rewards, events, terminations,
actions) for manager-based envs. Update manager infrastructure with
ManagerCallSwitch max_mode cap, Scene capture config, body_ids_wp
resolution, and capture safety annotations.

Add newton state kernels for body-frame computations used by MDP terms.
Fix WrenchComposer to use Tier 1 sim-bind buffers (body_link_pose_w,
body_com_pos_b) instead of caching the lazy Tier 2 body_com_pose_w
property, which became stale after the first step.

Add capture_unsafe decorator for lazy-evaluated derived properties in
articulation and rigid object data. Update wrench kernels to compute
COM pose inline from link pose and body-frame COM offset.
Add manager-based warp env configs for classic (Ant, Humanoid),
locomotion velocity (A1, AnymalB/C/D, Cassie, G1, Go1/2, H1), and
manipulation reach (Franka, UR10). Include task-specific MDP terms
for humanoid observations/rewards and locomotion rewards/terminations/
curriculums.
Add parity test infrastructure and tests organized by MDP category:
observations, rewards, terminations, events, and actions. Tests verify
warp-first implementations match stable torch baselines across three
execution modes (stable, warp uncaptured, warp CUDA-graph captured).

Extract shared fixtures and mock objects to parity_helpers.py.
Fix accidental revert of WarpGraphCache abstraction from stale
cherry-pick. Restore register_manager_capturability and runtime
_max_modes cap in get_mode_for_manager so managers can dynamically
downgrade from captured mode during term preparation.
Replace scattered hasattr guards with a single _is_warmed_up flag per
MDP term function. Fix parity test mocks: GRAVITY_VEC_W as 1D array,
add missing mask-variant write stubs and wrench composer mock. Add
unit tests for WarpGraphCache and ManagerCallSwitch.
Enable rough terrain locomotion for Anymal-D on Newton/Warp with the
Newton collision pipeline. Tested at 4096 envs for 500 iterations with
zero NaN, reward 10.71, episode length 824.

- Add Base/Rough/Flat env config hierarchy with Newton solver settings
- Add reset_root_state_from_terrain warp kernel for terrain-aware spawn
- Fix zero contact margin from USD import causing CCD divergence
- Increase CollisionPipeline max_triangle_pairs for mesh terrain
- Use pyramidal friction cone for rough terrain solver stability
- Add warm-up before CUDA graph capture in ManagerCallSwitch
- Bump mujoco/mujoco-warp to 3.6.0, pin Newton to 141baffff9
Add RoughPhysicsCfg preset with Newton solver settings tuned for mesh
terrain: pyramidal friction cone, increased buffer sizes, and Newton
collision pipeline. Disable RayCaster height scanner for Newton as it
requires Kit. Tested at 4096 envs for 300 iterations with presets=newton,
reward 11.98, episode length 966.
@hujc7 hujc7 force-pushed the dev-velocity-rouge branch from 7589193 to 4f4e142 Compare March 30, 2026 23:58
@hujc7
Copy link
Copy Markdown
Author

hujc7 commented Mar 31, 2026

@greptileai Review

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps bot commented Mar 31, 2026

Greptile Summary

This PR adds an experimental Anymal-D rough terrain locomotion environment running on Newton/MuJoCo Warp, along with the supporting Warp-first infrastructure (managers, MDP terms, CUDA graph capture, warm-up run) and several Newton-level workarounds for mesh-terrain stability issues (contact margin fix, increased buffer sizes, pyramidal-cone solver). It also bumps mujoco/mujoco-warp to 3.6.0 and pins Newton to a specific commit, and extends all major RL framework wrappers (RSL-RL, RL-Games, SB3, SKRL) to accept the new ManagerBasedRLEnvWarp family.

Key changes:

  • NewtonManager.start_simulation() now enforces a minimum 0.01 contact margin on all zero-margin shapes before finalize(), and _initialize_contacts() sets max_triangle_pairs=2_000_000 for mesh terrain — both are essential workarounds confirmed by ablation.
  • WarpGraphCache adds an eager warm-up run before CUDA graph capture so that first-call allocations (scratch buffers, dtype-conversion caches) happen outside the capture context.
  • reset_root_state_from_terrain is a new Warp kernel that samples spawn positions from pre-computed flat terrain patches; however, the cached _terrain_levels_wp / _terrain_types_wp buffers are only populated at warm-up time and never updated when the terrain curriculum changes level assignments, meaning spawning does not actually follow the curriculum (see inline comment).
  • apply_external_force_torque applies forces to all robot bodies regardless of the body_names field in asset_cfg, diverging from the stable implementation. This is latent (current configs use zero ranges) but would be incorrect with non-zero perturbation ranges.
  • Event function state is stored at function-object level (fn._is_warmed_up, scratch buffers), which is not safe if more than one environment instance exists in a process.

Confidence Score: 4/5

  • Safe to merge as an experimental module, but the terrain-curriculum correctness issue should be addressed or explicitly documented before wider adoption.
  • One P1 finding: the terrain level/type warp buffers in reset_root_state_from_terrain are cached at initialization and never refreshed, so the terrain curriculum has no effect on spawn positions after the first warm-up — contradicting the PR's stated goal of enabling curriculum progression. This is worth fixing or at minimum clearly documenting. The remaining findings are P2 style/design issues (body-name filtering, function-level state, commit-hash pin) that do not block experimental use.
  • source/isaaclab_experimental/isaaclab_experimental/envs/mdp/events.py — terrain level caching and body-name filtering bugs.

Important Files Changed

Filename Overview
source/isaaclab_experimental/isaaclab_experimental/envs/mdp/events.py New Warp-first event functions for locomotion training; contains two bugs: (1) terrain levels are cached at warm-up and never refreshed from curriculum updates, undermining terrain-aware spawning; (2) apply_external_force_torque ignores body_names and affects all bodies.
source/isaaclab_experimental/isaaclab_experimental/utils/warp_graph_cache.py Adds warm-up run before CUDA graph capture; design is correct — warm-up flushes first-call allocations so only steady-state kernel launches are captured.
source/isaaclab_newton/isaaclab_newton/physics/newton_manager.py Adds zero-contact-margin fix before finalize() and increases max_triangle_pairs to 2M for mesh terrain; logic is correct and well-documented with upstream issue references.
source/isaaclab_experimental/isaaclab_experimental/utils/manager_call_switch.py New manager routing utility that dispatches stable/warp/captured execution paths; logic is sound with proper max-mode capping, JSON config loading, and dynamic capturability registration.
source/isaaclab_tasks_experimental/isaaclab_tasks_experimental/manager_based/locomotion/velocity/velocity_env_cfg.py New shared base env config with rough/flat subclasses; correctly overrides contact solver parameters for mesh terrain and wires up flat-patch sampling for terrain-aware spawning.
source/isaaclab_tasks/isaaclab_tasks/manager_based/locomotion/velocity/config/anymal_d/rough_env_cfg.py Adds RoughPhysicsCfg Newton preset with pyramidal cone, correct buffer sizes, and Newton collision pipeline; well-commented with upstream issue references.
source/isaaclab_newton/setup.py Bumps mujoco/mujoco-warp to 3.6.0 and pins newton to an unreleased commit SHA; version bumps are well-motivated but the commit-hash pin is not semantically versioned.
source/isaaclab_tasks_experimental/isaaclab_tasks_experimental/manager_based/locomotion/velocity/config/anymal_d/rough_env_cfg.py Minimal experimental Anymal-D rough env config; correctly inherits from LocomotionVelocityRoughEnvCfg and caps Scene manager to WARP_NOT_CAPTURED.

Sequence Diagram

sequenceDiagram
    participant RL as RL Training Loop
    participant Env as ManagerBasedRLEnvWarp
    participant MCS as ManagerCallSwitch
    participant WGC as WarpGraphCache
    participant EM as EventManager (Warp)
    participant Kernel as CUDA Kernel
    participant Terrain as TerrainImporter

    RL->>Env: step(actions)
    Env->>MCS: call_stage("EventManager_reset", ...)
    MCS->>MCS: get_mode_for_manager → WARP_CAPTURED (2)
    MCS->>WGC: capture_or_replay("EventManager_reset", fn)

    alt First call (warm-up + capture)
        WGC->>EM: fn() [warm-up — eager]
        EM->>Terrain: read terrain_levels (int64→int32 copy)
        EM->>Kernel: wp.launch reset_root_state_from_terrain
        WGC->>EM: fn() [capture — inside ScopedCapture]
        EM->>Kernel: wp.launch (recorded into CUDA graph)
        WGC->>WGC: cache graph + result
    else Subsequent calls
        WGC->>WGC: wp.capture_launch(cached_graph)
        Note over WGC,Terrain: ⚠ terrain_levels_wp NOT updated<br/>Curriculum changes to terrain.terrain_levels<br/>are invisible to the captured kernel
    end

    RL->>Env: curriculum update
    Env->>Terrain: update_env_origins(env_ids, move_up, move_down)
    Terrain->>Terrain: terrain_levels[env_ids] updated in-place
    Note over Terrain,Kernel: Warp buffers still hold stale initial values
Loading

Reviews (1): Last reviewed commit: "Standardize _is_warmed_up flag for reset..." | Re-trigger Greptile

Comment on lines +698 to +723
reset_root_state_from_terrain._terrain_levels_wp = wp.zeros(n_envs, dtype=wp.int32, device=env.device)
reset_root_state_from_terrain._terrain_types_wp = wp.zeros(n_envs, dtype=wp.int32, device=env.device)
levels_i32 = terrain.terrain_levels.to(dtype=torch.int32, copy=True).contiguous()
types_i32 = terrain.terrain_types.to(dtype=torch.int32, copy=True).contiguous()
wp.copy(reset_root_state_from_terrain._terrain_levels_wp, wp.from_torch(levels_i32, dtype=wp.int32))
wp.copy(reset_root_state_from_terrain._terrain_types_wp, wp.from_torch(types_i32, dtype=wp.int32))
reset_root_state_from_terrain._num_patches = valid_positions.shape[2]

# Scratch buffers
reset_root_state_from_terrain._scratch_pose = wp.zeros((env.num_envs,), dtype=wp.transformf, device=env.device)
reset_root_state_from_terrain._scratch_vel = wp.zeros(
(env.num_envs,), dtype=wp.spatial_vectorf, device=env.device
)

# Pre-parse orientation range (roll, pitch, yaw)
r = [pose_range.get(key, (0.0, 0.0)) for key in ["roll", "pitch", "yaw"]]
reset_root_state_from_terrain._rot_lo = wp.vec3f(r[0][0], r[1][0], r[2][0])
reset_root_state_from_terrain._rot_hi = wp.vec3f(r[0][1], r[1][1], r[2][1])

# Pre-parse velocity range
v = [velocity_range.get(key, (0.0, 0.0)) for key in ["x", "y", "z", "roll", "pitch", "yaw"]]
reset_root_state_from_terrain._vel_lin_lo = wp.vec3f(v[0][0], v[1][0], v[2][0])
reset_root_state_from_terrain._vel_lin_hi = wp.vec3f(v[0][1], v[1][1], v[2][1])
reset_root_state_from_terrain._vel_ang_lo = wp.vec3f(v[3][0], v[4][0], v[5][0])
reset_root_state_from_terrain._vel_ang_hi = wp.vec3f(v[3][1], v[4][1], v[5][1])
reset_root_state_from_terrain._is_warmed_up = True
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Terrain levels cached once, never updated by curriculum

_terrain_levels_wp and _terrain_types_wp are initialized from terrain.terrain_levels / terrain.terrain_types during the first warm-up call and then the _is_warmed_up guard prevents any subsequent re-initialization. The terrain curriculum (terrain_levels_vel) regularly calls terrain.update_env_origins(env_ids, move_up, move_down) which updates the torch tensors in-place, but the warp buffers are separate GPU allocations (created via to(dtype=torch.int32, copy=True)) and are never refreshed afterwards.

The result is that reset_root_state_from_terrain always spawns each robot at its initial terrain level assignment, regardless of how the curriculum has since promoted or demoted that environment. The PR's stated goal — "correctly spawns on the assigned level, enabling curriculum progression" — is not achieved: the apparent "curriculum level 4.51" in the training results reflects the initial distribution of terrain level assignments, not dynamic curriculum progression.

A straightforward fix would be to update _terrain_levels_wp / _terrain_types_wp unconditionally on every call (outside the capture region, if CUDA graphs are in use). Since the torch tensors are already on device, a wp.copy of the int32-cast view is cheap and can be done before launching the kernel:

# After _is_warmed_up block, before wp.launch:
levels_i32 = terrain.terrain_levels.to(dtype=torch.int32).contiguous()
types_i32  = terrain.terrain_types.to(dtype=torch.int32).contiguous()
wp.copy(reset_root_state_from_terrain._terrain_levels_wp, wp.from_torch(levels_i32, dtype=wp.int32))
wp.copy(reset_root_state_from_terrain._terrain_types_wp,  wp.from_torch(types_i32,  dtype=wp.int32))

If this function runs inside a CUDA graph, the copy would need to happen in a non-captured scope (the warm-up or a pre-step hook).

Comment on lines +159 to +188
if not getattr(apply_external_force_torque, "_is_warmed_up", False):
apply_external_force_torque._scratch_forces = wp.zeros(
(env.num_envs, asset.num_bodies), dtype=wp.vec3f, device=env.device
)
apply_external_force_torque._scratch_torques = wp.zeros(
(env.num_envs, asset.num_bodies), dtype=wp.vec3f, device=env.device
)
apply_external_force_torque._is_warmed_up = True

wp.launch(
kernel=_apply_external_force_torque_kernel,
dim=env.num_envs,
inputs=[
env_mask,
env.rng_state_wp,
apply_external_force_torque._scratch_forces,
apply_external_force_torque._scratch_torques,
force_range[0],
force_range[1],
torque_range[0],
torque_range[1],
],
device=env.device,
)

asset.permanent_wrench_composer.set_forces_and_torques_mask(
forces=apply_external_force_torque._scratch_forces,
torques=apply_external_force_torque._scratch_torques,
env_mask=env_mask,
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 apply_external_force_torque ignores configured body names

The stable isaaclab.envs.mdp.events.apply_external_force_torque uses asset_cfg.body_ids to restrict which bodies receive forces. This warp version allocates scratch buffers sized (num_envs, asset.num_bodies) and the kernel iterates for b in range(force_out.shape[1]), applying random forces to all bodies regardless of asset_cfg.body_names.

This is currently harmless in the locomotion configs because force_range=(0.0, 0.0) and torque_range=(-0.0, 0.0) produce zero forces for all bodies. However, if a downstream config sets non-zero ranges (e.g. to randomize perturbation on the base only), the behavior would silently diverge from the stable implementation — applying forces to every link instead of just the configured ones.

Consider adding a body_ids parameter (similar to joint_ids in reset_joints_by_offset) so the kernel can be restricted to the configured body subset.

Comment on lines +81 to +86
if not getattr(fn, "_is_warmed_up", False) or fn._asset_name != asset_cfg.name:
fn._asset_name = asset_cfg.name
r = [com_range.get(key, (0.0, 0.0)) for key in ["x", "y", "z"]]
fn._com_lo = wp.vec3f(r[0][0], r[1][0], r[2][0])
fn._com_hi = wp.vec3f(r[0][1], r[1][1], r[2][1])
fn._is_warmed_up = True
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Function-level state is shared across all callers of the same function

All event functions in this file store their "warm-up" state as attributes directly on the function object (e.g. fn._is_warmed_up, fn._scratch_forces, fn._terrain_levels_wp). This is not instance-scoped: if two environments are created in the same process, or if the same function is registered to two different event terms with different asset configs / num_envs, the second caller will silently reuse stale/wrong-sized buffers from the first.

For example, randomize_rigid_body_com checks fn._asset_name != asset_cfg.name to partially guard against this, but that guard does not cover num_envs changes. The other functions (apply_external_force_torque, push_by_setting_velocity, reset_root_state_from_terrain, etc.) do not have even this partial check.

This design is reasonable for the single-env experimental use-case described in this PR, but it would be worth adding a comment explicitly documenting the single-environment assumption to prevent surprising failures in future multi-env or serialized-training scenarios.

"mujoco-warp==3.6.0",
"PyOpenGL-accelerate==3.1.10",
"newton==1.0.0",
"newton @ git+https://github.com/newton-physics/newton.git@141baffff9",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Newton pinned to an unreleased commit hash

"newton @ git+https://github.com/newton-physics/newton.git@141baffff9",

Pinning to a raw, unreleased commit SHA is fragile: the commit is not immutable in the same way a signed tag or package release would be — history can be force-pushed, the SHA is not semantically versioned, and it is difficult to audit what changed between revisions. It also makes the dependency invisible to security-scanning tooling that knows about released packages.

Consider whether Newton will publish a versioned release that includes 141baffff9 so the dependency can be expressed as a proper version pin (e.g., newton>=x.y.z), or at minimum document the expected upstream release that will supersede this commit in a code comment.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

isaac-lab Related to Isaac Lab team

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant