3.0_beta merge into 3.0 (the big merge)#324

Draft
riccardosavorgnan wants to merge 18 commits into 3.0 from 3.0_beta

Conversation


@riccardosavorgnan riccardosavorgnan commented Mar 3, 2026

After confirming the 3.0_beta branch produces sensible policies, we proceed with merging all the work in the 3.0 stable branch.

riccardosavorgnan and others added 18 commits February 5, 2026 02:28
added new sub-types for entities Agent, RoadMapElement, TrafficControlElement
Implementation of new Agent and RoadMapElement datatypes.

Large PR that introduces the new datatypes in place of the old Entity.
We swapped the old data field names to match the new ones and disambiguated them between RoadMapElements and Agents.

NOTE: The code changes should NOT introduce any behaviour change!
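The entity split described above can be illustrated with a minimal sketch. Only the type names Agent, RoadMapElement, and TrafficControlElement come from this PR; every field below is an assumption for illustration, and the real datatypes live in the simulator codebase:

```python
from dataclasses import dataclass, field
from enum import Enum, auto

# Illustrative sketch of the Entity split into sub-types; all field
# names here are assumptions, not the codebase's actual layout.
class AgentKind(Enum):
    VEHICLE = auto()
    PEDESTRIAN = auto()
    CYCLIST = auto()

@dataclass
class Agent:
    """Dynamic entity: something that moves and is (or can be) controlled."""
    id: int
    kind: AgentKind
    x: float = 0.0
    y: float = 0.0
    heading: float = 0.0

@dataclass
class RoadMapElement:
    """Static map geometry, e.g. a lane or road edge polyline."""
    id: int
    polyline: list = field(default_factory=list)  # sequence of (x, y) points

@dataclass
class TrafficControlElement:
    """Signals, signs, and similar control devices."""
    id: int
    state: str = "unknown"  # e.g. "red", "green"
```

Keeping agents and map elements in separate types, rather than one Entity with overloaded fields, is what allows the field renames to be disambiguated without a behaviour change.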
* Changed counts to max supported by sim, max_controlled, and num_created; removed everything else to prevent redundancies

* Fixed memory leaks

* Fix max agent issue with WOMD maps
* Adding Infra code for Speed limits and lane center/angle

* Removing commented code for lane alignment metric

* Modifying torch.py

* Fixing reset issues and lane alignment override

* First checkpoint to add reward conditioning

* Further changes to support reward conditioning

* Fixing alignment issue

* Adding lane reward conditioning

* Modifying ego features based on conditioning logic

* added config to set randomization range

* fixing a bug: missing args in one of the bindings

* bug fix + adding conditioning features to the neural policy of the render env

* Recommended working config

* Apply suggestion from @greptile-apps[bot]

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

---------

Co-authored-by: Aditya Gupta <adigupta2602@gmail.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
* Goal speed conditioning - the agent can achieve the goal only if it's between a threshold from the target speed

* new config with working parameters

---------

Co-authored-by: riccardosavorgnan <r.savorgnan.rs@gmail.com>
…r + interval around it (#297)

* Modified goal speed interval to be based on min/max rather than center + interval around it

* restored config to defaults

---------

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
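The goal-speed change above (first a threshold around a target speed, then a min/max interval per #297) amounts to a combined position-and-speed goal test. A minimal sketch under assumed names; units (meters, m/s) and the function signature are illustrative, not the PR's actual interface:

```python
def goal_reached(dist_to_goal: float, speed: float,
                 goal_radius: float, speed_min: float, speed_max: float) -> bool:
    """Hypothetical sketch of the goal test described above: the agent
    counts as having reached the goal only when it is inside the goal
    radius AND its speed lies in [speed_min, speed_max] (a min/max
    interval rather than center +/- half-width, per #297)."""
    return dist_to_goal <= goal_radius and speed_min <= speed <= speed_max
```

Expressing the interval as explicit min/max bounds also allows asymmetric ranges, which a center-plus-interval parameterization cannot represent.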
* Implemented the Goal Radius randomization and rendering.
- We loop, picking a road element at random and checking whether it lies within the doughnut; if one is found, we select it and break out of the loop. If not, we keep looping until one is found.

- If no valid segment is found by the end of the loop, we fall back to the element closest to the doughnut. (Note that, because of random shuffling with resampling, we might not loop through all elements.)
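The sampling loop above can be sketched as follows; function and variable names are illustrative, and road elements are reduced to (x, y) points for brevity:

```python
import math
import random

def sample_goal_element(elements, center, r_inner, r_outer, max_tries=100):
    """Sketch of the randomized goal selection described above (names are
    assumptions). Road elements are sampled with replacement; the first one
    whose position falls inside the annulus ('doughnut') between r_inner and
    r_outer is chosen. If no valid element is found within max_tries, fall
    back to the element closest to the annulus."""
    def annulus_dist(pos):
        # 0.0 inside the doughnut; otherwise distance to its nearer boundary.
        d = math.dist(pos, center)
        if d < r_inner:
            return r_inner - d
        if d > r_outer:
            return d - r_outer
        return 0.0

    for _ in range(max_tries):
        elem = random.choice(elements)  # resampling: may revisit or skip elements
        if annulus_dist(elem) == 0.0:
            return elem
    # Fallback: the element closest to the doughnut.
    return min(elements, key=annulus_dist)
```

Sampling with replacement keeps each iteration cheap at the cost of possibly never visiting some elements, which is why the closest-element fallback is needed to guarantee a goal is always selected.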
* New config for testing

- 3D Carla maps are now the default;
- sample a new goal by default;
- reduced goal reward from 1.0 to 0.4;
- 300-step episode length; resample environments after every episode.
* Added branch name, latest commit hash, latest commit message, for better tracking of wandb runs

* Apply suggestions from code review

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* code cleanup

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
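Attaching the branch name, commit hash, and commit message to experiment-tracker runs is typically done by shelling out to git. A minimal sketch; the helper name and the wandb usage shown are assumptions, not this PR's actual code:

```python
import subprocess

def git_run_metadata():
    """Collect branch name, latest commit hash, and latest commit message,
    for tagging tracker runs (e.g. wandb) as described above."""
    def git(*args):
        return subprocess.check_output(("git",) + args, text=True).strip()

    return {
        "branch": git("rev-parse", "--abbrev-ref", "HEAD"),
        "commit": git("rev-parse", "HEAD"),
        "commit_message": git("log", "-1", "--pretty=%s"),
    }

# Hypothetical usage: merge into the run config so every wandb run
# records exactly which code produced it.
# wandb.init(config=git_run_metadata())
```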
* rename final weights

* oops, put things in the correct order
* Update agent cumulative displacement and log distance without collision

* Mini refactor + NEEDS DEBUGGING

* fix a small error in drive.h that was breaking the code

* Fix logic error in move_dynamics

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

---------

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
* Adding full Reward conditioning support - with rewards like comfort, overspeed, time etc.

* Adding more logs for comparison with reward conditioning - like lane center rate, avg speed etc.
Bifurcates between truncation and termination to let the RL policy use the bootstrapped value in case of truncation, potentially aiding training.

Terminate -
In the STOP/REMOVE collision behavior setting, we mark the episode terminated for the corresponding agent, or if the agent reaches the goal.

Truncate -
If the agent hasn't reached the goal and the episode length is reached, or if all agents have reached the goal (terminationMode=1)
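The terminate/truncate split above can be sketched as a pure function; the signature and names are illustrative, not the PR's actual interface:

```python
def episode_flags(collided, reached_goal, step, max_steps, all_reached,
                  collision_behavior, termination_mode):
    """Sketch of the terminal/truncation split described above.
    Returns (terminated, truncated).

    terminated: collision under STOP/REMOVE collision behavior,
                or the agent reached its goal.
    truncated:  episode length reached without the goal, or all
                agents reached their goals (termination_mode == 1)."""
    terminated = (collided and collision_behavior in ("STOP", "REMOVE")) \
                 or reached_goal
    truncated = (not reached_goal and step >= max_steps) \
                or (termination_mode == 1 and all_reached)
    return terminated, truncated
```

Distinguishing the two matters because at truncation the learner can bootstrap the value estimate of the final state, whereas a true terminal state has value zero by definition.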
…323)

* Changed goal value to 1.0
* Changed collision and offroad value ranges for randomization to [-3.0, -0.1]
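The values above imply a sampler along these lines; the key names and the choice of a uniform distribution are assumptions, since the PR only states the [-3.0, -0.1] range and the fixed goal value of 1.0:

```python
import random

def sample_reward_coefs(rng=random):
    """Hypothetical sketch of reward-coefficient randomization with the
    ranges listed above: goal reward fixed at 1.0, collision and offroad
    penalties drawn per episode from [-3.0, -0.1] (uniform assumed)."""
    return {
        "goal": 1.0,
        "collision": rng.uniform(-3.0, -0.1),
        "offroad": rng.uniform(-3.0, -0.1),
    }
```

With reward_conditioning enabled, these sampled coefficients would also be appended to the observation so the policy can be conditioned on the reward it is being trained under.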
reward_goal = 1
reward_goal_post_respawn = 0.25
; Meters around goal to be considered "reached"
; Meters around goal to be considered "reached" // ONLY active if reward_randomization = 0


This comment is sort of confusing. It's on even when reward randomization is off, right?

Comment on lines +63 to +66
reward_randomization = 1
; Options: 0 - Fixed reward values, 1 - Random reward values
reward_conditioning = 1
; Options: 1 - Add reward coefs to obs array, 0 - Don't


I feel like these are actually the same, when would these values ever be different?

vtrace_c_clip = 1
vtrace_rho_clip = 1
checkpoint_interval = 1000
checkpoint_interval = 250


probably can leave this alone?



6 participants