Conversation
Add a Flag to build_ocean so Raylib can work on Debian 11
…design choices, like using a subprocess or not, separating eval from rendering...
…ocessing. Next steps:
- Add multiprocessing
- Save logs to a CSV (with sensible filenames)
- Add a tqdm progress bar
- Add support for running this during training
- Add support for Gigaflow and eventually WOSAC
Next steps:
- Run some tests on the cluster
- Add tqdm integration
- Log the results to a CSV
Force-pushed from f73240f to 6df1be9
Greptile Summary

This PR implements multiprocessed evaluation for WOMD (Waymo Open Motion Dataset) by distributing map batches across workers. The changes add eval mode detection via

Major changes:
Critical issues found:
Confidence Score: 2/5
Important Files Changed
Flowchart

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[eval_womd starts] --> B[Distribute maps across workers]
    B --> C{num_workers > num_maps?}
    C -->|Yes| ERR[Raise error]
    C -->|No| D[Create worker envs with map ranges]
    D --> E[Initialize vecenv and policy]
    E --> F{maps_processed < num_maps?}
    F -->|Yes| G[Reset RNN state if needed]
    G --> H[Run episode_length timesteps]
    H --> I[Collect info from workers]
    I --> J{info_list has results?}
    J -->|Yes| K[Process results, increment maps_processed]
    J -->|No| L{All workers finished?}
    L -->|Yes| M[INFINITE LOOP BUG]
    L -->|No| F
    K --> F
    F -->|No| N[Close vecenv]
    N --> O[Save results to CSV]
    O --> P[Print average metrics]
    subgraph Worker Lifecycle
        W1[Worker loads batch] --> W2{eval_map_counter >= eval_last_map?}
        W2 -->|Yes| W3[Set resample_freq=0, stop processing]
        W2 -->|No| W4[Process maps]
        W4 --> W5{Resample needed?}
        W5 -->|Yes| W6[Load next batch]
        W6 --> W1
        W5 -->|No| W4
    end
    style M fill:#f99
    style ERR fill:#f99
```
Last reviewed commit: b1e9707
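The main eval loop from the flowchart can be sketched roughly as follows (all names are hypothetical, not the PR's actual API). The red node in the chart marks the bug: if every worker has finished but no info was collected, the loop spins forever unless a guard breaks out of it.

```python
def eval_loop(num_maps, workers, episode_length):
    # Sketch of the eval_womd driver loop described in the flowchart.
    maps_processed = 0
    results = []
    while maps_processed < num_maps:
        # One "Run episode_length timesteps" + "Collect info" round per worker
        info_list = [w.step(episode_length) for w in workers]
        info_list = [info for info in info_list if info is not None]
        if info_list:
            results.extend(info_list)
            maps_processed += len(info_list)
        elif all(w.finished for w in workers):
            # Guard against the infinite-loop case flagged in the review:
            # no results will ever arrive, so stop waiting.
            break
    return results
```

The `elif` branch is the missing exit path the flowchart highlights; without it, an empty `info_list` with all workers done loops back to the top forever.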
```diff
- while (use_all_maps ? map_idx < max_envs : total_agent_count < num_agents && env_count < max_envs) {
-     int map_id = use_all_maps ? map_idx++ : rand() % num_maps;
+ while (total_agent_count < num_agents && env_count < max_envs) {
+     int map_id = (eval_mode ? map_idx++ : rand()) % num_maps;
```
Modulo operation causes map index wrapping, potentially reprocessing maps.
When map_idx exceeds num_maps, the modulo wraps it back to 0, which could cause the same maps to be evaluated multiple times if fewer environments than expected are created in earlier batches.
For example, if a worker is assigned maps 0-99 but only creates 95 envs in the first batch, it will start at map 95 next time, then wrap around to maps 0-89, causing duplication.
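The wrap-around is easy to reproduce in a few lines (a standalone sketch mirroring `(eval_mode ? map_idx++ : rand()) % num_maps`, using the 95-of-100 example above):

```python
# Worker assigned maps 0..99; first batch fits only 95 envs, so the second
# batch resumes at map_idx = 95 and runs another 95 ids through the modulo.
num_maps = 100
first_batch = [i % num_maps for i in range(0, 95)]     # maps 0..94
second_batch = [i % num_maps for i in range(95, 190)]  # 95..99, then wraps to 0..89

# Maps evaluated in both batches, i.e. duplicated work
reprocessed = sorted(set(first_batch) & set(second_batch))
```

Here `reprocessed` covers maps 0 through 89: ninety maps get evaluated twice.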
It prevents the code from crashing if a map is skipped, and it will also be useful in the CARLA setting, where we want to sample the same 5 maps multiple times.
Once again, no maps should be skipped if we initialize the SDC first.
```c
// In case a map has more than num_agents agents, the eval will break.
// I don't put an assert here, but I think in a coming PR we should set
// MAX_AGENTS directly in Python and add an assert to avoid this.
if (uncomplete_last_env) {
    break;
}
```
Unhandled edge case: a map with more agents than num_agents will break eval.
The comment acknowledges this but doesn't add error handling; consider validating or asserting that no single map exceeds num_agents.
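One way to do that validation on the Python side, as the diff comment suggests, is a fail-fast check before launching workers (a hypothetical helper; names are illustrative, not the PR's actual API):

```python
def validate_map_sizes(map_agent_counts, num_agents):
    # Fail fast instead of silently breaking eval when a single map
    # has more agents than the per-env agent budget.
    oversized = {i: c for i, c in enumerate(map_agent_counts) if c > num_agents}
    if oversized:
        raise ValueError(
            f"maps exceed num_agents={num_agents}: {oversized}"
        )
```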
```c
// If we have a SDC index (WOMD), initialize it first:
int sdc_index = env->sdc_track_index;
if (sdc_index >= 0) {
    active_agent_indices[0] = sdc_index;
    env->num_created_agents++;
    env->active_agent_count++;
    env->agents[sdc_index].active_agent = 1;
}
```
Initialized the SDC (self-driving car) as the first active agent to ensure consistent agent ordering for WOMD evaluation.
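The ordering guarantee from that diff can be expressed compactly (a Python sketch of the same logic, not the PR's code; the C version mutates counters in place instead of building a list):

```python
def order_active_agents(agent_ids, sdc_index):
    # Put the SDC first (when present) so agent slot 0 is always the
    # self-driving car during WOMD evaluation, keeping ordering stable
    # across resamples.
    if sdc_index is not None and sdc_index in agent_ids:
        return [sdc_index] + [a for a in agent_ids if a != sdc_index]
    return list(agent_ids)
```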
```python
if self.eval_mode and self.resample_frequency == 0 and self.tick >= self.episode_length:
    return (self.observations, self.rewards, self.terminals, self.truncations, [])
```
Skipped further processing when an eval worker has completed all of its assigned maps, preventing unnecessary computation.
Additional Comments (1)
Same issue as the eval mode path; should check before division.
pufferlib/config/ocean/drive.ini
Outdated

```ini
num_maps = 20
num_agents = 512
num_maps=10000
eval_batch_size = 128
```
Maybe we should pull the eval configs out into a separate struct or config?
Ideally we should have multiple config structures somewhere so we can easily switch between:
- eval in WOMD (self-play, log-replay, mix-play)
- eval in Gigaflow
- WOSAC eval
- ...

I haven't found a way to do this that isn't messy. I think we should first decide how we want to structure the code in general, so I kept things very basic in this PR.
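One low-effort shape for this, if the team goes the config-struct route, is a small dataclass with named presets (purely illustrative; the field names and preset keys are assumptions, not the actual drive.ini schema):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class EvalConfig:
    # Hypothetical per-mode eval config; extend with whatever each
    # eval mode (WOMD, Gigaflow, WOSAC, ...) actually needs.
    num_maps: int
    num_agents: int
    mode: str

# Named presets make switching eval modes a one-line change.
EVAL_PRESETS = {
    "womd-self-play": EvalConfig(num_maps=10_000, num_agents=512, mode="womd-self-play"),
    "womd-log-replay": EvalConfig(num_maps=10_000, num_agents=512, mode="womd-log-replay"),
    "wosac": EvalConfig(num_maps=10_000, num_agents=512, mode="wosac"),
}
```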
Introducing multiprocessed eval that makes it possible to evaluate the model on 10k maps very fast.
For now I put it in a separate eval function called eval_womd, but it should later replace the eval function (I think we should have separate eval and render functions, by the way).
What it does:
Then each worker will process its 625 maps in chunks of eval_batch_size (let's say 128): each worker instantiates an env of 128 maps 4 times, followed by an env of 113 maps.

eval_batch_size was dropped, as it was actually useless. Now you just fix a value for num_agents like you would in training, and we fit as many maps as possible inside the env, making sure that no map is cropped. I tried to handle every edge case gracefully and tested on a lot of configs.
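The chunking arithmetic in the original scheme (before eval_batch_size was dropped) works out like this (a minimal sketch, not the PR's code):

```python
def chunk_maps(maps_per_worker, eval_batch_size):
    # Split a worker's map allocation into full-size env batches plus
    # one smaller remainder env, e.g. 625 maps at batch size 128
    # -> four envs of 128 maps, then one env of 113 maps.
    chunks = []
    remaining = maps_per_worker
    while remaining > 0:
        chunks.append(min(eval_batch_size, remaining))
        remaining -= chunks[-1]
    return chunks
```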
At the end, the command:

```
puffer eval_womd puffer_drive --eval.num-maps=10000
```

prints a summary of the metrics and logs every map's results to a CSV file, allowing you to identify which maps failed, or to compute more advanced statistics like boxplots.
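Post-processing the per-map CSV is then a few lines of stdlib Python (a sketch; the `map_id` and `collision_rate` column names are assumptions, not the PR's actual schema):

```python
import csv
from statistics import mean

def summarize(csv_path, metric="collision_rate"):
    # Average a metric across maps and find the worst-performing map,
    # the kind of per-map drill-down the CSV logging enables.
    with open(csv_path, newline="") as f:
        rows = list(csv.DictReader(f))
    values = [float(row[metric]) for row in rows]
    worst = max(rows, key=lambda row: float(row[metric]))
    return mean(values), worst["map_id"]
```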
IMPORTANT THINGS TO READ:
Results currently go to result.csv, but we should think about a proper filename and path for this.