3.0_beta merge into 3.0 (the big merge) #324
Draft
riccardosavorgnan wants to merge 18 commits into 3.0 from 3.0_beta
Conversation
Added new sub-types for entities: Agent, RoadMapElement, TrafficControlElement
Implementation of the new Agent and RoadMapElement datatypes. This is a large PR that introduces the new datatypes in place of the old Entity. We renamed the old data fields to match the new ones and disambiguated them between RoadMapElements and Agents. NOTE: these code changes should NOT introduce any behaviour change!
* Changed counts to max supported by sim, max_controlled, num_created, and removed everything else to prevent redundancies
* Fixed memory leaks
* Fixed max agent issue with WOMD maps
* Adding infra code for speed limits and lane center/angle
* Removing commented code for the lane alignment metric
* Modifying torch.py
* Fixing reset issues and lane alignment override
* First checkpoint to add reward conditioning
* Further changes to support reward conditioning
* Fixing alignment issue
* Adding lane reward conditioning
* Modifying ego features based on conditioning logic
* Added config to set randomization range
* Fixing a bug: missing args in one of the bindings
* Bug fix + adding conditioning features to the neural policy of the render env
* Recommended working config
* Apply suggestion from @greptile-apps[bot]

Co-authored-by: Aditya Gupta <adigupta2602@gmail.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
* Goal speed conditioning: the agent can achieve the goal only if its speed is within a threshold of the target speed
* New config with working parameters

Co-authored-by: riccardosavorgnan <r.savorgnan.rs@gmail.com>
…r + interval around it (#297)

* Modified goal speed interval to be based on min/max rather than center + interval around it
* Restored config to defaults

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
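The min/max form of the goal-speed check described in #297 can be sketched as follows. This is a hypothetical illustration; the function and parameter names are not the PR's actual identifiers:

```python
def goal_reached(dist_to_goal: float, speed: float,
                 goal_radius: float,
                 goal_speed_min: float, goal_speed_max: float) -> bool:
    """The goal counts as reached only when the agent is within the goal
    radius AND its speed falls inside the [min, max] interval, rather than
    inside a center +/- interval band."""
    within_radius = dist_to_goal <= goal_radius
    within_speed = goal_speed_min <= speed <= goal_speed_max
    return within_radius and within_speed
```

Expressing the interval directly as min/max (instead of center + half-width) makes asymmetric intervals straightforward to configure.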
* Implemented the Goal Radius randomization and rendering.
- We loop, picking a road element at random, and check whether it lies within the doughnut; if one is found, we select it and break out of the loop. If not, we keep looping until one is found.
- If no valid element is found by the end of the loop, we fall back to the one closest to the doughnut. (Note that, because of random shuffling with resampling, we might not loop through all elements.)
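The sample-with-resampling loop and its fallback described above can be sketched like this. A minimal, hypothetical version: element representation, distance metric, and `max_tries` bound are illustrative assumptions, not the PR's actual code:

```python
import random

def sample_goal_element(elements, center, r_inner, r_outer, max_tries=100):
    """Pick road elements at random (with replacement) until one falls inside
    the annulus ("doughnut") between r_inner and r_outer around center; if the
    try budget is exhausted, fall back to the element closest to the annulus.
    Because we resample, some elements may never be visited."""
    def dist(e):
        dx, dy = e[0] - center[0], e[1] - center[1]
        return (dx * dx + dy * dy) ** 0.5

    for _ in range(max_tries):
        e = random.choice(elements)  # resampling: repeats possible
        if r_inner <= dist(e) <= r_outer:
            return e

    # Fallback: element whose distance to the annulus is smallest.
    def gap(e):
        d = dist(e)
        if d < r_inner:
            return r_inner - d
        if d > r_outer:
            return d - r_outer
        return 0.0

    return min(elements, key=gap)
```

The fallback guarantees a goal is always produced even when no element lies inside the doughnut, at the cost of occasionally placing it slightly outside the intended radius band.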
* New config for testing:
  - 3D Carla maps are now the default;
  - Sample a new goal by default;
  - Reduced goal reward from 1.0 to 0.4;
  - 300-step episode length; resample environments after every episode.
* Added branch name, latest commit hash, and latest commit message for better tracking of wandb runs
* Apply suggestions from code review
* Code cleanup

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* Renamed final weights
* Oops, put things in the correct order
* Update agent cumulative displacement and log distance without collision
* Mini refactor + NEEDS DEBUGGING
* Fix a small error in drive.h that was breaking the code
* Fix logic error in move_dynamics

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
* Adding full reward conditioning support, with rewards like comfort, overspeed, time, etc.
* Adding more logs for comparison with reward conditioning, like lane-center rate, avg speed, etc.
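The "conditioning features" added to the policy input can be sketched as follows. A hypothetical illustration, not the PR's actual code; the function name and coefficient layout are assumptions:

```python
import numpy as np

def condition_ego_features(ego_obs: np.ndarray,
                           reward_coefs: np.ndarray,
                           reward_conditioning: bool) -> np.ndarray:
    """When reward conditioning is on, append the episode's reward
    coefficients (e.g. comfort, overspeed, time weights) to the ego
    observation so the policy can adapt its behaviour to the reward
    it is being trained against."""
    if not reward_conditioning:
        return ego_obs
    return np.concatenate([ego_obs, reward_coefs.astype(ego_obs.dtype)])
```

With randomized coefficients per episode, this turns a single policy into one conditioned on the reward specification it observes.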
Bifurcates between truncation and termination so the RL policy can use a bootstrapped value in the truncation case, potentially aiding training.

Terminate - with the STOP/REMOVE collision behavior setting, we mark the episode terminated for the corresponding agent on collision, or when the agent reaches its goal.
Truncate - when the agent hasn't reached its goal and the episode length is reached, or when all agents have reached their goals (terminationMode=1).
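The terminated/truncated split above can be sketched per agent as follows. A hypothetical sketch following the Gymnasium-style `(terminated, truncated)` convention; argument names are illustrative:

```python
def episode_status(reached_goal: bool, collided: bool, collision_behavior: str,
                   step: int, max_steps: int,
                   all_reached_goal: bool, termination_mode: int):
    """Returns (terminated, truncated) for one agent.

    Terminated: the agent reached its goal, or it collided while the
    collision behavior is STOP/REMOVE.
    Truncated: not terminated, and either the episode length was reached
    or all agents reached their goals (termination_mode == 1).
    Truncation lets the learner bootstrap the value of the final state
    instead of treating it as zero."""
    terminated = reached_goal or (
        collided and collision_behavior in ("STOP", "REMOVE"))
    truncated = (not terminated) and (
        step >= max_steps or (termination_mode == 1 and all_reached_goal))
    return terminated, truncated
```

Keeping the two flags mutually exclusive matters: bootstrapping on a genuinely terminal state (or zeroing the value on a merely truncated one) biases the value targets.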
…or reward conditioning coefficients (#320)
…323)

* Changed goal value to 1.0
* Changed collision and offroad value ranges for randomization to [-3.0, -0.1]
  reward_goal = 1
  reward_goal_post_respawn = 0.25
- ; Meters around goal to be considered "reached"
+ ; Meters around goal to be considered "reached" // ONLY active if reward_randomization = 0
This comment is sort of confusing. It's on even when reward randomization is off, right?
Comment on lines +63 to +66
  reward_randomization = 1
  ; Options: 0 - Fixed reward values, 1 - Random reward values
  reward_conditioning = 1
  ; Options: 1 - Add reward coefs to obs array, 0 - Dont
I feel like these are actually the same, when would these values ever be different?
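For context on how the two flags could differ: one plausible reading (a hypothetical sketch, not the PR's actual code; names and ranges are illustrative, the -3.0/-0.1 range is taken from the commit above) is that `reward_randomization` only controls how the coefficients are sampled, while `reward_conditioning` controls whether they are exposed to the policy, so randomization off + conditioning on would still feed the fixed coefficients into the observation:

```python
import numpy as np

def sample_reward_coefs(rng, reward_randomization: int,
                        fixed=(1.0, -1.0, -1.0),      # goal, collision, offroad
                        rand_range=(-3.0, -0.1)):
    """reward_randomization = 0: fixed reward values; 1: sample collision and
    offroad penalties per episode. Whether these coefs are appended to the
    observation is decided separately by reward_conditioning."""
    if reward_randomization:
        lo, hi = rand_range
        return np.array([1.0, rng.uniform(lo, hi), rng.uniform(lo, hi)],
                        dtype=np.float32)
    return np.array(fixed, dtype=np.float32)
```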
  vtrace_c_clip = 1
  vtrace_rho_clip = 1
- checkpoint_interval = 1000
+ checkpoint_interval = 250
After confirming that the 3.0_beta branch produces sensible policies, we proceed with merging all of this work into the stable 3.0 branch.