Modifying score metric for segments of driving#319
Modifying score metric for segments of driving#319
Conversation
There was a problem hiding this comment.
Pull request overview
This PR updates Drive episode scoring/completion metrics to evaluate performance per driving segment/goal, including a heuristic for the final (possibly incomplete) goal segment, and switches completion_rate to be based on “attempted” goals rather than “sampled” goals.
Changes:
- Replace
completion_ratedenominator fromgoals_sampled_this_episodetogoals_attempted_this_episode. - Add goal-segment scoring accumulation and a last-segment heuristic to award partial credit.
- Track additional per-agent state needed for last-segment evaluation (displacement since last goal, last-goal timestep, previous goal position).
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| pufferlib/ocean/env_binding.h | Updates aggregated completion_rate computation to use attempted goals. |
| pufferlib/ocean/drive/drive.h | Implements attempted-goal counting, per-segment score logic, and last-goal heuristics; updates log aggregation accordingly. |
| pufferlib/ocean/drive/datatypes.h | Extends Agent with new fields to support attempted-goal accounting and last-segment heuristics. |
Comments suppressed due to low confidence (4)
pufferlib/ocean/drive/drive.h:1354
agent->goals_attempted_this_episodecan be 0 here (e.g., GOAL_GENERATE_NEW, no goals reached, no collision, andtime_since_last_goal < MIN_GOAL_SEGMENT_TIME_TO_ANALYZE_AGENT), which makes bothfrac_goal_reachedand the later score computation divide by zero. Add a guard (e.g., treat 0 attempts as 0 completion/score, or clamp denominator to >= 1) before doing these divisions.
env->log.goals_reached_this_episode += agent->goals_reached_this_episode;
env->log.goals_sampled_this_episode += agent->goals_sampled_this_episode;
env->log.goals_attempted_this_episode += agent->goals_attempted_this_episode;
int offroad = env->logs[i].offroad_rate;
env->log.offroad_rate += offroad;
int collided = env->logs[i].collision_rate;
env->log.collision_rate += collided;
float offroad_per_agent = env->logs[i].offroad_per_agent;
env->log.offroad_per_agent += offroad_per_agent;
float collisions_per_agent = env->logs[i].collisions_per_agent;
pufferlib/ocean/drive/drive.h:2486
last_goal_reached_timestepis initialized to 0 on reset, butenv->timestepis set toenv->init_steps. This makestime_since_last_goal = (env->timestep - last_goal_reached_timestep) * dtincorrectly include the pre-episode offset and can skew the last-segment heuristics/attempt counting. Initializelast_goal_reached_timestepto the current timestep at reset (or toenv->init_steps) instead of 0.
checkNeighbors(env, agent->sim_x, agent->sim_y, z_offsets, Z_RANGE * Z_RANGE, &list_size);
if (list_size > 0) {
DepthPoint road_neighbours[list_size];
int valid_count = 0;
pufferlib/ocean/drive/drive.h:1319
- The heuristic awards only
+0.5toenv->logs[i].score, but it also incrementsagent->goals_reached_this_episodeby 1. This makescompletion_rate/frac_goal_reachedtreat the last segment as fully completed even though the score is partial, and it changes the meaning of the "goals_reached" counter. Consider not modifyinggoals_reached_this_episodefor this heuristic (or, if partial completion is intended, update the metric naming/formula so completion_rate remains consistent with the scoring).
agent->goals_attempted_this_episode += 1; // Count as attempt if agent had enough time
float displacement_from_last_goal =
sqrtf((agent->sim_x - agent->prev_goal_x) * (agent->sim_x - agent->prev_goal_x) +
(agent->sim_y - agent->prev_goal_y) * (agent->sim_y - agent->prev_goal_y) +
pufferlib/ocean/drive/datatypes.h:195
- Minor grammar/clarity: the comment reads "goals reached + last goal(if this segment can be judged as an attempt)"; add a space before the parenthesis and consider rephrasing to make it clearer that this is a count of attempted segments (not necessarily sampled goals).
float goals_attempted_this_episode; // goals reached + last goal(if this segment can be judged as an attempt)
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Greptile SummaryThis PR refactors the scoring system to track goal completion on a per-segment basis rather than using episode-level thresholds. For Key changes:
Issues found:
Confidence Score: 2/5
Important Files Changed
Flowchart%%{init: {'theme': 'neutral'}}%%
flowchart TD
Start[Episode End] --> ComputeMetrics[compute_metrics_for_last_goal_segment]
ComputeMetrics --> CheckMode{goal_behavior?}
CheckMode -->|GOAL_GENERATE_NEW| InitAttempted[goals_attempted = goals_reached]
CheckMode -->|Other modes| SetSampled[goals_attempted = goals_sampled]
InitAttempted --> CheckCollision{collided_before_goal?}
CheckCollision -->|Yes| IncrementBad[goals_attempted += 1<br/>Bad Attempt]
CheckCollision -->|No| CheckTime{time_since_last_goal<br/>>= 1.0s?}
CheckTime -->|No| SkipAnalysis[Skip heuristic analysis]
CheckTime -->|Yes| IncrementAttempt[goals_attempted += 1]
IncrementAttempt --> CheckProgress{displacement > threshold<br/>AND avg_speed > 2.0 m/s?}
CheckProgress -->|Yes| PartialCredit[score += 0.5<br/>goals_reached += 1]
CheckProgress -->|No| NoCredit[No partial credit]
IncrementBad --> AddLog
SkipAnalysis --> AddLog
PartialCredit --> AddLog
NoCredit --> AddLog
SetSampled --> AddLog
AddLog[add_log: Compute final score] --> CalcScore{goal_behavior?}
CalcScore -->|GOAL_RESPAWN/STOP| SimpleScore[score = 1 if reached<br/>without collision]
CalcScore -->|GOAL_GENERATE_NEW| AvgScore[score = total_score /<br/>goals_attempted]
SimpleScore --> End[Aggregate metrics]
AvgScore --> End
Last reviewed commit: b80b8c1 |
| int agent_idx = env->active_agent_indices[i]; | ||
| Agent *agent = &env->agents[agent_idx]; | ||
|
|
||
| env->log.goals_reached_this_episode += agent->goals_reached_this_episode; |
There was a problem hiding this comment.
division by zero when goals_attempted_this_episode is 0 (e.g., very short episodes in GOAL_GENERATE_NEW mode where agent hasn't collided and time < 1s)
| env->log.goals_reached_this_episode += agent->goals_reached_this_episode; | |
| float frac_goal_reached = agent->goals_attempted_this_episode > 0 ? agent->goals_reached_this_episode / agent->goals_attempted_this_episode : 0.0f; |
|
|
||
| agent->goals_attempted_this_episode += 1; // Count as attempt if agent had enough time | ||
| float displacement_from_last_goal = | ||
| sqrtf((agent->sim_x - agent->prev_goal_x) * (agent->sim_x - agent->prev_goal_x) + |
There was a problem hiding this comment.
incrementing goals_reached_this_episode when goal wasn't actually reached is semantically confusing - consider renaming variable to better reflect it includes partial credit
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
Additional Comments (1)
|
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…riving Merge branch 3.0_beta
Added a score that checks if the agent completed its goal without colliding for each goal(driving segment).
Also added heuristic based check for the score of the last goal, has the agent attempted it's last goal, making some progress at a good speed.