Summary
Action-chunking policies (e.g., GR00T) output a chunk of 16-32 actions per inference call. The policy replays this chunk over subsequent env.step() calls, but only the first step of each chunk needs a fresh camera frame -- all intermediate steps discard the rendered result. Today, Isaac Lab renders on every env.step() regardless, wasting significant GPU time.
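The chunk-replay pattern can be sketched roughly as follows. This is an illustration only: everything except `needs_obs_next_step()` (the hook this PR adds) is an assumption, not the real GR00T client implementation.

```python
class ActionChunkingPolicy:
    """Illustrative sketch of chunk replay; not the actual GR00T client."""

    def __init__(self, chunk_length: int = 16):
        self.chunk_length = chunk_length
        self._buffer: list = []  # actions remaining in the current chunk

    def get_action(self, obs):
        if not self._buffer:
            # One inference call yields a whole chunk; a fresh camera
            # frame is only needed for this step.
            self._buffer = self._run_inference(obs)
        return self._buffer.pop(0)

    def needs_obs_next_step(self) -> bool:
        # True only when the next get_action() will trigger inference,
        # i.e. the buffered chunk has been fully consumed.
        return len(self._buffer) == 0

    def _run_inference(self, obs) -> list:
        # Stand-in for the remote GR00T server call.
        return [f"action_{i}" for i in range(self.chunk_length)]
```

With chunk_length=16, `needs_obs_next_step()` returns True once every 16 steps, and only that step needs a rendered frame.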
This PR is a proof-of-concept demonstrating two optimizations that eliminate unnecessary rendering and significantly improve throughput. The implementation works but relies on a workaround (mutating cfg.sim.render_interval at runtime). We propose a cleaner long-term API change to Isaac Lab's env.step().
Branch
hkang/render-opt (based on latest main)
Benchmark Results
Hardware: NVIDIA L20 (48 GB), single GPU, remote policy (GR00T server + Isaac Sim client).
Tested on L20 with ActionChunkingClientSidePolicy + GR00T remote server, gr1_open_microwave (chunk_length=16), 8 envs, 100 steps:
- With render (inference step): ~4-6 step/s -- this is the step where needs_obs_next_step() returns True, so render_interval is restored to normal and env.step() renders a camera frame for the next inference
- Without render (chunk-replay step): ~9.5 step/s -- this is the step where the policy is consuming buffered actions from the chunk, needs_obs_next_step() returns False, so render_interval is set to a huge value and env.step() skips rendering entirely
The ~2x speed difference confirms that render-skipping is working. For a chunk_length of 16, only 1 out of every 16 steps needs to render, so the majority of steps run at the faster rate.
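As a sanity check, the measured per-step rates imply an amortized throughput close to the fast rate. A back-of-envelope calculation, taking ~5 step/s as the midpoint of the slow range:

```python
# Amortized throughput over one 16-step chunk, from the measured rates above.
chunk_length = 16
slow_rate = 5.0   # step/s on the rendering (inference) step, midpoint of ~4-6
fast_rate = 9.5   # step/s on chunk-replay steps

time_per_chunk = 1 / slow_rate + (chunk_length - 1) / fast_rate
amortized = chunk_length / time_per_chunk
print(f"{amortized:.1f} step/s")  # ~9.0 step/s, dominated by the fast steps
```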
How to Reproduce
Prerequisites: Two Docker containers on the same GPU node -- one for the GR00T policy server, one for the Isaac Sim client.
1. Start GR00T policy server
# Build & run the server container (from repo root)
bash docker/run_gr00t_server.sh \
    -m /path/to/models \
    --port 5555 \
    --policy_type isaaclab_arena_gr00t.policy.gr00t_remote_policy.Gr00tRemoteServerSidePolicy \
    --policy_config_yaml_path isaaclab_arena_gr00t/policy/config/gr1_manip_gr00t_closedloop_config.yaml
Wait until you see: [PolicyServer] listening on tcp://0.0.0.0:5555
2. Run the client (Isaac Sim container)
# Inside the isaaclab_arena Docker container
/isaac-sim/python.sh -m isaaclab_arena.evaluation.policy_runner \
    --headless \
    --enable_cameras \
    --policy_type isaaclab_arena.policy.action_chunking_client.ActionChunkingClientSidePolicy \
    --remote_host $(hostname) \
    --remote_port 5555 \
    --num_envs 8 \
    --num_steps 100 \
    gr1_open_microwave
You should see a repeating throughput pattern every 16 steps (= chunk_length): the first step is slow (~4-6 step/s) because needs_obs_next_step() returned True on the previous step, so this env.step() renders a camera frame. The remaining 15 steps are fast (~9-10 step/s) because the policy is replaying buffered actions and needs_obs_next_step() returns False, causing env.step() to skip rendering.
3. Compare with baseline
To see the baseline (without render optimization), revert the render_interval change in isaaclab_arena_manager_based_env.py and remove the render-skip logic in policy_runner.py.
What Changed and Why It's a Workaround
This PR modifies 5 files (42 lines added).
Optimization 1: Render once per env.step() instead of twice
In isaaclab_arena_manager_based_env.py, we set render_interval = decimation in __post_init__(). With default settings (decimation=4, render_interval=2), Isaac Lab renders twice per env.step(), but only the final frame is consumed by observation_manager.compute(). This change reduces it to 1 render per step. This is clean and correct.
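The render count per env.step() can be modeled with the modulo condition from the physics loop. The following is a simplified model of Isaac Lab's behavior, not its actual code:

```python
def renders_per_env_step(decimation: int, render_interval: int) -> int:
    # Each env.step() runs `decimation` physics substeps; a render fires
    # whenever the substep count is a multiple of render_interval.
    return sum(1 for substep in range(1, decimation + 1)
               if substep % render_interval == 0)

print(renders_per_env_step(4, 2))          # default cfg: 2 renders per step
print(renders_per_env_step(4, 4))          # render_interval = decimation: 1
print(renders_per_env_step(4, 2**31 - 1))  # huge interval (the _NO_RENDER trick): 0
```

The last line is also why the Optimization 2 workaround below functions at all: a sufficiently large render_interval never divides the substep count.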
Optimization 2: Skip rendering when the policy doesn't need observations
This is the workaround part. We add PolicyBase.needs_obs_next_step() -> bool so action-chunking policies can signal "I'm replaying buffered actions, don't need a fresh camera frame." In policy_runner.py, we toggle cfg.sim.render_interval at runtime:
# Current workaround: mutate config before each env.step()
_NO_RENDER = 2**31 - 1
if not policy.needs_obs_next_step():
    unwrapped.cfg.sim.render_interval = _NO_RENDER  # skip render
else:
    unwrapped.cfg.sim.render_interval = _render_interval  # restore
obs, _, terminated, truncated, _ = env.step(actions)
Why this is ugly:
- We mutate cfg.sim.render_interval (a config value) at runtime as a side-channel to control rendering behavior
- This only works because render_interval happens to be read inside the physics loop as a modulo condition -- it's an implementation detail, not an API contract
- If Isaac Lab changes how render_interval is used internally, this breaks silently
Proposed Clean Solution (Requires Isaac Lab Change)
The right fix is for Isaac Lab's ManagerBasedRLEnv.step() to accept a render parameter:
# Proposed Isaac Lab API
def step(self, action, render: bool = True):
    is_rendering = render and (self.sim.has_gui() or self.sim.has_rtx_sensors())
    ...
Then the rollout loop becomes clean and explicit:
actions = policy.get_action(env, obs)
obs, _, terminated, truncated, _ = env.step(
    actions,
    render=policy.needs_obs_next_step(),
)
This would:
- Eliminate runtime config mutation
- Make the semantics explicit: the caller decides whether to render
- Leave existing reset re-rendering (num_rerenders_on_reset) unchanged
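To make the intended semantics concrete, here is a self-contained sketch exercised against a stub simulation context. Both _StubSim and step_sketch are hypothetical illustrations; the real ManagerBasedRLEnv.step() does far more bookkeeping:

```python
class _StubSim:
    """Stand-in for Isaac Lab's simulation context; illustration only."""

    def __init__(self, gui: bool = False, rtx_sensors: bool = True):
        self.renders = 0
        self._gui, self._rtx = gui, rtx_sensors

    def has_gui(self) -> bool:
        return self._gui

    def has_rtx_sensors(self) -> bool:
        return self._rtx

    def step(self, render: bool) -> None:
        pass  # physics substep; nothing to model here

    def render(self) -> None:
        self.renders += 1


def step_sketch(sim, decimation: int, render_interval: int, render: bool = True):
    # Proposed behavior: the caller's `render` flag gates all rendering for
    # this env.step(); the existing modulo condition stays unchanged.
    is_rendering = render and (sim.has_gui() or sim.has_rtx_sensors())
    for substep in range(1, decimation + 1):
        sim.step(render=False)
        if is_rendering and substep % render_interval == 0:
            sim.render()


sim = _StubSim()
step_sketch(sim, decimation=4, render_interval=4, render=True)   # renders once
step_sketch(sim, decimation=4, render_interval=4, render=False)  # skips render
print(sim.renders)  # 1
```

The flag composes cleanly with the headless consideration below: a GUI session can simply ignore render=False, since has_gui() already feeds into is_rendering.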
Headless vs. Non-Headless Consideration
Developers should consider whether render-skipping should only apply in headless mode. In non-headless (GUI) mode, skipping renders would freeze the viewport on most steps, making the simulation appear broken. A possible guard:
# Only skip renders when running headless
if not policy.needs_obs_next_step() and not env.sim.has_gui():
    unwrapped.cfg.sim.render_interval = _NO_RENDER  # skip render
This ensures:
- Headless evaluation/benchmarking gets the full performance benefit
- Interactive/GUI sessions always render for visual feedback