15 lines (13 loc) · 660 Bytes

tags	python,numpy,neural-network,reinforcement learning
mathjax	true

Policy Gradient Methods

learns state to action mapping directly which is often more simple
no model of environment dynamics needed
allows continuous action space
allows stochastic policy which can be a crucial advantage compared to deterministic policies
Actor Critic RL Methods

{:.caption .img} Hado Van Hasselt - Reinforcement Learning, Policy Gradients and Actor Critics (2018)