3.0_beta merge into 3.0 (the big merge)#324

Draft
riccardosavorgnan wants to merge 18 commits into 3.0 from 3.0_beta

Conversation


@riccardosavorgnan riccardosavorgnan commented Mar 3, 2026

After confirming the 3.0_beta branch produces sensible policies, we proceed with merging all the work in the 3.0 stable branch.

riccardosavorgnan and others added 18 commits February 5, 2026 02:28
added new sub-types for entities Agent, RoadMapElement, TrafficControlElement
Implementation of new Agent and RoadMapElement datatypes.

Large PR that introduces the new datatypes in place of the old Entity.
We swapped the old data field names to match the new ones and disambiguated them between RoadMapElements and Agents.

NOTE: The code changes should NOT introduce any behaviour change!
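The entity split described above can be illustrated with a minimal sketch. Only the type names Agent, RoadMapElement, and TrafficControlElement come from this PR; every field below is an assumption for illustration, and the real datatypes live in the simulator codebase:

```python
from dataclasses import dataclass, field
from enum import Enum, auto

# Illustrative sketch of the Entity split into sub-types; all field
# names here are assumptions, not the codebase's actual layout.
class AgentKind(Enum):
    VEHICLE = auto()
    PEDESTRIAN = auto()
    CYCLIST = auto()

@dataclass
class Agent:
    """Dynamic entity: something that moves and is (or can be) controlled."""
    id: int
    kind: AgentKind
    x: float = 0.0
    y: float = 0.0
    heading: float = 0.0

@dataclass
class RoadMapElement:
    """Static map geometry, e.g. a lane or road edge polyline."""
    id: int
    polyline: list = field(default_factory=list)  # sequence of (x, y) points

@dataclass
class TrafficControlElement:
    """Signals, signs, and similar control devices."""
    id: int
    state: str = "unknown"  # e.g. "red", "green"
```

Keeping agents and map elements in separate types, rather than one Entity with overloaded fields, is what allows the field renames to be disambiguated without a behaviour change.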
* Changed counts to max supported by sim, max_controlled, and num_created; removed everything else to prevent redundancies

* Fixed memory leaks

* Fix max agent issue with WOMD maps
* Adding Infra code for Speed limits and lane center/angle

* Removing commented code for lane alignment metric

* Modifying torch.py

* Fixing reset issues and lane alignment override

* First checkpoint to add reward conditioning

* Further changes to support reward conditioning

* Fixing alignment issue

* Adding lane reward conditioning

* Modifying ego features based on conditioning logic

* added config to set randomization range

* fixing a bug: missing args in one of the bindings

* bug fix + adding conditioning features to the neural policy of the render env

* Recommended working config

* Apply suggestion from @greptile-apps[bot]

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

---------

Co-authored-by: Aditya Gupta <adigupta2602@gmail.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
* Goal speed conditioning - the agent can achieve the goal only if it's between a threshold from the target speed

* new config with working parameters

---------

Co-authored-by: riccardosavorgnan <r.savorgnan.rs@gmail.com>
…r + interval around it (#297)

* Modified goal speed interval to be based on min/max rather than center + interval around it

* restored config to defaults

---------

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
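The goal-speed change above (first a threshold around a target speed, then a min/max interval per #297) amounts to a combined position-and-speed goal test. A minimal sketch under assumed names; units (meters, m/s) and the function signature are illustrative, not the PR's actual interface:

```python
def goal_reached(dist_to_goal: float, speed: float,
                 goal_radius: float, speed_min: float, speed_max: float) -> bool:
    """Hypothetical sketch of the goal test described above: the agent
    counts as having reached the goal only when it is inside the goal
    radius AND its speed lies in [speed_min, speed_max] (a min/max
    interval rather than center +/- half-width, per #297)."""
    return dist_to_goal <= goal_radius and speed_min <= speed <= speed_max
```

Expressing the interval as explicit min/max bounds also allows asymmetric ranges, which a center-plus-interval parameterization cannot represent.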
* Implemented the Goal Radius randomization and rendering.
- We loop, picking a road element at random and checking whether it lies within the doughnut; if one is found, we select it and break out of the loop. If not, we keep looping until one is found.

- If no valid segment is found by the end of the loop, we fall back to the element closest to the doughnut. (Note that, because of random shuffling with resampling, we might not loop through all elements.)
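The sampling loop above can be sketched as follows; function and variable names are illustrative, and road elements are reduced to (x, y) points for brevity:

```python
import math
import random

def sample_goal_element(elements, center, r_inner, r_outer, max_tries=100):
    """Sketch of the randomized goal selection described above (names are
    assumptions). Road elements are sampled with replacement; the first one
    whose position falls inside the annulus ('doughnut') between r_inner and
    r_outer is chosen. If no valid element is found within max_tries, fall
    back to the element closest to the annulus."""
    def annulus_dist(pos):
        # 0.0 inside the doughnut; otherwise distance to its nearer boundary.
        d = math.dist(pos, center)
        if d < r_inner:
            return r_inner - d
        if d > r_outer:
            return d - r_outer
        return 0.0

    for _ in range(max_tries):
        elem = random.choice(elements)  # resampling: may revisit or skip elements
        if annulus_dist(elem) == 0.0:
            return elem
    # Fallback: the element closest to the doughnut.
    return min(elements, key=annulus_dist)
```

Sampling with replacement keeps each iteration cheap at the cost of possibly never visiting some elements, which is why the closest-element fallback is needed to guarantee a goal is always selected.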
* New config for testing

- 3D Carla maps are now the default;
- sample a new goal by default;
- reduced goal reward from 1.0 to 0.4;
- 300-step episode length; resample environments after every episode.
* Added branch name, latest commit hash, latest commit message, for better tracking of wandb runs

* Apply suggestions from code review

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* code cleanup

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
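Attaching the branch name, commit hash, and commit message to experiment-tracker runs is typically done by shelling out to git. A minimal sketch; the helper name and the wandb usage shown are assumptions, not this PR's actual code:

```python
import subprocess

def git_run_metadata():
    """Collect branch name, latest commit hash, and latest commit message,
    for tagging tracker runs (e.g. wandb) as described above."""
    def git(*args):
        return subprocess.check_output(("git",) + args, text=True).strip()

    return {
        "branch": git("rev-parse", "--abbrev-ref", "HEAD"),
        "commit": git("rev-parse", "HEAD"),
        "commit_message": git("log", "-1", "--pretty=%s"),
    }

# Hypothetical usage: merge into the run config so every wandb run
# records exactly which code produced it.
# wandb.init(config=git_run_metadata())
```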
* rename final weights

* oops, put things in the correct order
* Update agent cumulative displacement and log distance without collision

* Mini refactor + NEEDS DEBUGGING

* fix a small error in drive.h that was breaking the code

* Fix logic error in move_dynamics

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

---------

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
* Adding full Reward conditioning support - with rewards like comfort, overspeed, time etc.

* Adding more logs for comparison with reward conditioning - like lane center rate, avg speed etc.
Bifurcates between truncation and termination to let the RL policy use the bootstrapped value in case of truncation, potentially aiding training.

Terminate -
In the STOP/REMOVE collision behavior setting, we mark the episode terminated for the corresponding agent, or if the agent reaches the goal.

Truncate -
If the agent hasn't reached the goal and the episode length is reached, or if all agents have reached the goal (terminationMode=1)
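The terminate/truncate split above can be sketched as a pure function; the signature and names are illustrative, not the PR's actual interface:

```python
def episode_flags(collided, reached_goal, step, max_steps, all_reached,
                  collision_behavior, termination_mode):
    """Sketch of the terminal/truncation split described above.
    Returns (terminated, truncated).

    terminated: collision under STOP/REMOVE collision behavior,
                or the agent reached its goal.
    truncated:  episode length reached without the goal, or all
                agents reached their goals (termination_mode == 1)."""
    terminated = (collided and collision_behavior in ("STOP", "REMOVE")) \
                 or reached_goal
    truncated = (not reached_goal and step >= max_steps) \
                or (termination_mode == 1 and all_reached)
    return terminated, truncated
```

Distinguishing the two matters because at truncation the learner can bootstrap the value estimate of the final state, whereas a true terminal state has value zero by definition.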
…323)

* Changed goal value to 1.0
* Changed collision and offroad value ranges for randomization to [-3.0, -0.1]
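The values above imply a sampler along these lines; the key names and the choice of a uniform distribution are assumptions, since the PR only states the [-3.0, -0.1] range and the fixed goal value of 1.0:

```python
import random

def sample_reward_coefs(rng=random):
    """Hypothetical sketch of reward-coefficient randomization with the
    ranges listed above: goal reward fixed at 1.0, collision and offroad
    penalties drawn per episode from [-3.0, -0.1] (uniform assumed)."""
    return {
        "goal": 1.0,
        "collision": rng.uniform(-3.0, -0.1),
        "offroad": rng.uniform(-3.0, -0.1),
    }
```

With reward_conditioning enabled, these sampled coefficients would also be appended to the observation so the policy can be conditioned on the reward it is being trained under.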
reward_goal = 1
reward_goal_post_respawn = 0.25
; Meters around goal to be considered "reached"
; Meters around goal to be considered "reached" // ONLY active if reward_randomization = 0


This comment is sort of confusing. It's on even when reward randomization is off, right?

Comment on lines +63 to +66
reward_randomization = 1
; Options: 0 - Fixed reward values, 1 - Random reward values
reward_conditioning = 1
; Options: 1 - Add reward coefs to obs array, 0 - Don't


I feel like these are actually the same, when would these values ever be different?

vtrace_c_clip = 1
vtrace_rho_clip = 1
checkpoint_interval = 1000
checkpoint_interval = 250


probably can leave this alone?



6 participants