tags python,numpy,neural-network,reinforcement learning
mathjax true

Policy Gradient Methods

  • learn the state-to-action mapping (the policy) directly, which is often simpler than learning a value function first
  • need no model of the environment dynamics
  • allow continuous action spaces
  • allow stochastic policies, which can be a crucial advantage over deterministic policies
  • Actor Critic RL Methods
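
To make the points above concrete, here is a minimal sketch of a policy gradient update (REINFORCE with a running-average baseline) on a toy multi-armed bandit. It is an illustration under assumptions, not code from the referenced lecture: the function names `softmax` and `reinforce_bandit`, the reward means, the learning rate and the step count are all made up for this example. The policy is a softmax over action preferences `theta`, so it is stochastic and is learned directly from sampled rewards, without a model of the environment.

```python
import numpy as np


def softmax(prefs):
    """Turn action preferences into a probability distribution."""
    z = prefs - np.max(prefs)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()


def reinforce_bandit(mean_rewards, steps=3000, lr=0.1, seed=0):
    """Hypothetical helper: REINFORCE on a bandit with Gaussian rewards.

    mean_rewards holds the (assumed) true mean reward of each arm.
    Returns the learned action probabilities.
    """
    rng = np.random.default_rng(seed)
    theta = np.zeros(len(mean_rewards))  # policy parameters (preferences)
    baseline = 0.0                       # running average of rewards

    for t in range(1, steps + 1):
        probs = softmax(theta)
        action = rng.choice(len(theta), p=probs)         # sample from the stochastic policy
        reward = rng.normal(mean_rewards[action], 1.0)   # noisy reward, unit variance assumed

        # Gradient of log pi(action) w.r.t. theta for a softmax policy:
        # one_hot(action) - probs
        grad_log_pi = -probs
        grad_log_pi[action] += 1.0

        # Gradient ascent on expected reward; the baseline reduces variance
        theta += lr * (reward - baseline) * grad_log_pi
        baseline += (reward - baseline) / t

    return softmax(theta)


probs = reinforce_bandit(np.array([0.0, 0.5, 2.0]))
print(probs)
```

With these assumed reward means, the learned probabilities should concentrate on the last arm. The same update extends to continuous actions by swapping the softmax for, say, a Gaussian policy whose mean is the learned parameter.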

{:.caption .img} Hado van Hasselt - Reinforcement Learning, Policy Gradients and Actor Critics (2018)