Summary
Action-chunking policies (e.g., GR00T) output a chunk of 16-32 actions per inference call. The policy replays this chunk over subsequent env.step() calls, but only the first step of each chunk needs a fresh camera frame -- all intermediate steps discard the rendered result. Today, Isaac Lab renders on every env.step() regardless, wasting significant GPU time.
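The chunk-replay pattern can be sketched roughly as follows. This is an illustration only: everything except `needs_obs_next_step()` (the hook this PR adds) is an assumption, not the real GR00T client implementation.

```python
class ActionChunkingPolicy:
    """Illustrative sketch of chunk replay; not the actual GR00T client."""

    def __init__(self, chunk_length: int = 16):
        self.chunk_length = chunk_length
        self._buffer: list = []  # actions remaining in the current chunk

    def get_action(self, obs):
        if not self._buffer:
            # One inference call yields a whole chunk; a fresh camera
            # frame is only needed for this step.
            self._buffer = self._run_inference(obs)
        return self._buffer.pop(0)

    def needs_obs_next_step(self) -> bool:
        # True only when the next get_action() will trigger inference,
        # i.e. the buffered chunk has been fully consumed.
        return len(self._buffer) == 0

    def _run_inference(self, obs) -> list:
        # Stand-in for the remote GR00T server call.
        return [f"action_{i}" for i in range(self.chunk_length)]
```

With chunk_length=16, `needs_obs_next_step()` returns True once every 16 steps, and only that step needs a rendered frame.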
This PR is a proof-of-concept demonstrating two optimizations that eliminate unnecessary rendering and significantly improve throughput. The implementation works but relies on a workaround (mutating cfg.sim.render_interval at runtime). We propose a cleaner long-term API change to Isaac Lab's env.step().
Branch
hkang/render-opt (based on latest main)
Benchmark Results
Hardware: NVIDIA L20 (48 GB), single GPU, remote policy (GR00T server + Isaac Sim client).
Tested on L20 with ActionChunkingClientSidePolicy + GR00T remote server, gr1_open_microwave (chunk_length=16), 8 envs, 100 steps:
- With render (inference step): ~4-6 step/s -- this is the step where needs_obs_next_step() returns True, so render_interval is restored to normal and env.step() renders a camera frame for the next inference
- Without render (chunk-replay step): ~9.5 step/s -- this is the step where the policy is consuming buffered actions from the chunk, needs_obs_next_step() returns False, so render_interval is set to a huge value and env.step() skips rendering entirely
The ~2x speed difference confirms that render-skipping is working. For a chunk_length of 16, only 1 out of every 16 steps needs to render, so the majority of steps run at the faster rate.
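As a sanity check, the measured per-step rates imply an amortized throughput close to the fast rate. A back-of-envelope calculation, taking ~5 step/s as the midpoint of the slow range:

```python
# Amortized throughput over one 16-step chunk, from the measured rates above.
chunk_length = 16
slow_rate = 5.0   # step/s on the rendering (inference) step, midpoint of ~4-6
fast_rate = 9.5   # step/s on chunk-replay steps

time_per_chunk = 1 / slow_rate + (chunk_length - 1) / fast_rate
amortized = chunk_length / time_per_chunk
print(f"{amortized:.1f} step/s")  # ~9.0 step/s, dominated by the fast steps
```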
How to Reproduce
Prerequisites: Two Docker containers on the same GPU node -- one for the GR00T policy server, one for the Isaac Sim client.
1. Start GR00T policy server
# Build & run the server container (from repo root)
bash docker/run_gr00t_server.sh \
    -m /path/to/models \
    --port 5555 \
    --policy_type isaaclab_arena_gr00t.policy.gr00t_remote_policy.Gr00tRemoteServerSidePolicy \
    --policy_config_yaml_path isaaclab_arena_gr00t/policy/config/gr1_manip_gr00t_closedloop_config.yaml
Wait until you see: [PolicyServer] listening on tcp://0.0.0.0:5555
2. Run the client (Isaac Sim container)
# Inside the isaaclab_arena Docker container
/isaac-sim/python.sh -m isaaclab_arena.evaluation.policy_runner \
    --headless \
    --enable_cameras \
    --policy_type isaaclab_arena.policy.action_chunking_client.ActionChunkingClientSidePolicy \
    --remote_host $(hostname) \
    --remote_port 5555 \
    --num_envs 8 \
    --num_steps 100 \
    gr1_open_microwave
You should see a repeating throughput pattern every 16 steps (= chunk_length): the first step is slow (~4-6 step/s) because needs_obs_next_step() returned True on the previous step, so this env.step() renders a camera frame. The remaining 15 steps are fast (~9-10 step/s) because the policy is replaying buffered actions and needs_obs_next_step() returns False, causing env.step() to skip rendering.
3. Compare with baseline
To see the baseline (without render optimization), revert the render_interval change in isaaclab_arena_manager_based_env.py and remove the render-skip logic in policy_runner.py.
What Changed and Why It's a Workaround
This PR modifies 5 files (42 lines added).
Optimization 1: Render once per env.step() instead of twice
In isaaclab_arena_manager_based_env.py, we set render_interval = decimation in __post_init__(). With default settings (decimation=4, render_interval=2), Isaac Lab renders twice per env.step(), but only the final frame is consumed by observation_manager.compute(). This change reduces it to 1 render per step. This is clean and correct.
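The render count per env.step() can be modeled with the modulo condition from the physics loop. The following is a simplified model of Isaac Lab's behavior, not its actual code:

```python
def renders_per_env_step(decimation: int, render_interval: int) -> int:
    # Each env.step() runs `decimation` physics substeps; a render fires
    # whenever the substep count is a multiple of render_interval.
    return sum(1 for substep in range(1, decimation + 1)
               if substep % render_interval == 0)

print(renders_per_env_step(4, 2))          # default cfg: 2 renders per step
print(renders_per_env_step(4, 4))          # render_interval = decimation: 1
print(renders_per_env_step(4, 2**31 - 1))  # huge interval (the _NO_RENDER trick): 0
```

The last line is also why the Optimization 2 workaround below functions at all: a sufficiently large render_interval never divides the substep count.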
Optimization 2: Skip rendering when the policy doesn't need observations
This is the workaround part. We add PolicyBase.needs_obs_next_step() -> bool so action-chunking policies can signal "I'm replaying buffered actions, don't need a fresh camera frame." In policy_runner.py, we toggle cfg.sim.render_interval at runtime:
# Current workaround: mutate config before each env.step()
_NO_RENDER = 2**31 - 1
if not policy.needs_obs_next_step():
    unwrapped.cfg.sim.render_interval = _NO_RENDER  # skip render
else:
    unwrapped.cfg.sim.render_interval = _render_interval  # restore
obs, _, terminated, truncated, _ = env.step(actions)
Why this is ugly:
- We mutate cfg.sim.render_interval (a config value) at runtime as a side-channel to control rendering behavior
- This only works because render_interval happens to be read inside the physics loop as a modulo condition -- it's an implementation detail, not an API contract
- If Isaac Lab changes how render_interval is used internally, this breaks silently
Proposed Clean Solution (Requires Isaac Lab Change)
The right fix is for Isaac Lab's ManagerBasedRLEnv.step() to accept a render parameter:
# Proposed Isaac Lab API
def step(self, action, render: bool = True):
    is_rendering = render and (self.sim.has_gui() or self.sim.has_rtx_sensors())
    ...
Then the rollout loop becomes clean and explicit:
actions = policy.get_action(env, obs)
obs, _, terminated, truncated, _ = env.step(
    actions,
    render=policy.needs_obs_next_step(),
)
This would:
- Eliminate runtime config mutation
- Make the semantics explicit: the caller decides whether to render
- Leave existing reset re-rendering (num_rerenders_on_reset) unchanged
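To make the intended semantics concrete, here is a self-contained sketch exercised against a stub simulation context. Both _StubSim and step_sketch are hypothetical illustrations; the real ManagerBasedRLEnv.step() does far more bookkeeping:

```python
class _StubSim:
    """Stand-in for Isaac Lab's simulation context; illustration only."""

    def __init__(self, gui: bool = False, rtx_sensors: bool = True):
        self.renders = 0
        self._gui, self._rtx = gui, rtx_sensors

    def has_gui(self) -> bool:
        return self._gui

    def has_rtx_sensors(self) -> bool:
        return self._rtx

    def step(self, render: bool) -> None:
        pass  # physics substep; nothing to model here

    def render(self) -> None:
        self.renders += 1


def step_sketch(sim, decimation: int, render_interval: int, render: bool = True):
    # Proposed behavior: the caller's `render` flag gates all rendering for
    # this env.step(); the existing modulo condition stays unchanged.
    is_rendering = render and (sim.has_gui() or sim.has_rtx_sensors())
    for substep in range(1, decimation + 1):
        sim.step(render=False)
        if is_rendering and substep % render_interval == 0:
            sim.render()


sim = _StubSim()
step_sketch(sim, decimation=4, render_interval=4, render=True)   # renders once
step_sketch(sim, decimation=4, render_interval=4, render=False)  # skips render
print(sim.renders)  # 1
```

The flag composes cleanly with the headless consideration below: a GUI session can simply ignore render=False, since has_gui() already feeds into is_rendering.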
Headless vs. Non-Headless Consideration
Developers should consider whether render-skipping should only apply in headless mode. In non-headless (GUI) mode, skipping renders would freeze the viewport on most steps, making the simulation appear broken. A possible guard:
# Only skip renders when running headless
if not policy.needs_obs_next_step() and not env.sim.has_gui():
    unwrapped.cfg.sim.render_interval = _NO_RENDER  # skip render
This ensures:
- Headless evaluation/benchmarking gets the full performance benefit
- Interactive/GUI sessions always render for visual feedback