module.eval() and module.train() may produce different outputs for the same module and the same input, because some modules behave differently in the two modes. The most common examples are batch-norm modules and dropout modules.
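A minimal sketch (plain Python, not torch internals) of why dropout behaves differently in the two modes: in training it randomly zeroes activations and rescales the survivors by 1/(1-p); in eval it is the identity, so the expected output matches.

```python
import random

def dropout(x, p, training):
    """Inverted dropout: active only in training mode."""
    if not training:
        return list(x)                      # eval: pass through unchanged
    keep = 1.0 - p
    # each element survives with probability `keep` and is rescaled
    return [xi / keep if random.random() < keep else 0.0 for xi in x]

random.seed(0)
x = [1.0, 2.0, 3.0, 4.0]
train_out = dropout(x, p=0.5, training=True)   # some zeros, survivors doubled
eval_out = dropout(x, p=0.5, training=False)   # identical to x
```

With p=0.5 each surviving activation is scaled by 1/0.5 = 2, so the expected value of each output matches eval mode.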
** Why do BatchNorm and Dropout behave differently in train vs. eval mode? **
.requires_grad_() is not automatically inherited by earlier computations: setting it on a tensor after a result was computed from it does not make that result require grad.
Refer to test/trials/testtensor.cpp/testXGrad()/y
Every operation performed on Tensors creates a new function object that performs the computation and records that it happened.
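A minimal sketch of that idea (in the spirit of autograd, not PyTorch's actual implementation): each operation builds a small function object that remembers its inputs and how to push gradients back through itself.

```python
class Value:
    """Scalar with reverse-mode autodiff, recording the graph as it computes."""
    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents          # recorded inputs
        self._backward_fn = None         # how to push grad to parents

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))
        def backward_fn():
            self.grad += out.grad
            other.grad += out.grad
        out._backward_fn = backward_fn
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))
        def backward_fn():
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward_fn = backward_fn
        return out

    def backward(self):
        # topological order over the recorded graph, then a reverse sweep
        order, seen = [], set()
        def visit(v):
            if v not in seen:
                seen.add(v)
                for p in v._parents:
                    visit(p)
                order.append(v)
        visit(self)
        self.grad = 1.0
        for v in reversed(order):
            if v._backward_fn:
                v._backward_fn()

x = Value(3.0)
y = Value(4.0)
z = x * y + x        # graph is recorded while computing
z.backward()
# dz/dx = y + 1 = 5, dz/dy = x = 3
```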
How the Module class (or struct) cooperates with shared_ptr: in libtorch, modules and their submodules are registered and held through shared_ptr.
torch.autograd.detect_anomaly: context manager that makes autograd raise on NaN gradients during backward and report the forward operation that produced them.
John Schulman has said dropout doesn't really help in RL. RL is very sensitive to the learning rate: e.g., 0.01 is often too high and 1e-7 too low.
- Maximization bias always exists due to estimation error
- Uniformly distributed overestimation does not affect the policy search
- But a uniform overestimation distribution cannot be assumed
- Non-uniform overestimation error can then cause convergence to a locally optimal policy
- And bootstrapping spreads the error everywhere
- Hence target networks and Double DQN were created
- Still, in many episodic cases, the estimated values are eventually corrected by the true terminal-state values
- So DQN may learn more slowly than Double DQN yet still reach a decent result
- Sutton ed2 ch6.7
- Double DQN
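The maximization bias above can be shown numerically (a plain-Python sketch, with all true action values set to 0): a single estimator takes the max over noisy estimates, so its expectation is strictly positive; a double estimator (the Double DQN idea) selects the argmax with one set of estimates and evaluates it with an independent set, which removes the upward bias on average.

```python
import random

random.seed(0)
n_actions, n_trials = 10, 20000
single_sum = 0.0
double_sum = 0.0
for _ in range(n_trials):
    # two independent noisy estimates of the same true values (all zero)
    q1 = [random.gauss(0.0, 1.0) for _ in range(n_actions)]
    q2 = [random.gauss(0.0, 1.0) for _ in range(n_actions)]
    single_sum += max(q1)                             # biased upward
    a = max(range(n_actions), key=lambda i: q1[i])    # select with q1...
    double_sum += q2[a]                               # ...evaluate with q2
single_avg = single_sum / n_trials   # clearly positive (~1.5 here)
double_avg = double_sum / n_trials   # near zero
```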
Q-learning is off-policy as: "In this case, the learned action-value function, Q, directly approximates q*, the optimal action-value function, independent of the policy being followed. This dramatically simplifies the analysis of the algorithm and enabled early convergence proofs. The policy still has an effect in that it determines which state–action pairs are visited and updated. However, all that is required for correct convergence is that all pairs continue to be updated. As we observed in Chapter 5, this is a minimal requirement in the sense that any method guaranteed to find optimal behavior in the general case must require it."
SARSA is an on-policy algorithm, as it bootstraps from Q(S', A') where A' is chosen by the current policy.
- Sutton ed2 ch6.5
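The contrast above can be sketched as the two TD updates side by side (plain Python, toy Q-table): Q-learning bootstraps from max_a Q(s', a) regardless of which action the behavior policy will actually take, while SARSA bootstraps from Q(s', a') for the a' the current policy actually chose.

```python
def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    target = r + gamma * max(Q[s_next])          # greedy bootstrap (off-policy)
    Q[s][a] += alpha * (target - Q[s][a])

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.99):
    target = r + gamma * Q[s_next][a_next]       # on-policy bootstrap
    Q[s][a] += alpha * (target - Q[s][a])

# toy table: 2 states x 2 actions
Q = [[0.0, 0.0], [0.0, 2.0]]
q_learning_update(Q, 0, 0, 1.0, 1)   # uses max(Q[1]) = 2.0 -> Q[0][0] = 0.298
sarsa_update(Q, 0, 1, 1.0, 1, 0)     # uses Q[1][0] = 0.0   -> Q[0][1] = 0.1
```

Same transition, different target: the algorithms diverge exactly when the policy's chosen A' is not the greedy action.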
- VS.
- Why are bias nodes used in neural networks? (a bias lets a unit shift its activation threshold independently of its inputs)
- "Finally, the tuned version uses a single shared bias for all action values in the top layer of the network." — Double DQN paper
- Bias layer in PyTorch
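A sketch (plain Python, hypothetical layer, not a PyTorch API) of the "single shared bias" idea quoted above: instead of one bias per action value, the output layer adds the same scalar bias to every action value.

```python
def linear_shared_bias(x, W, b):
    """y_i = sum_j W[i][j] * x[j] + b, with one scalar b shared by all outputs."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b for row in W]

x = [1.0, 2.0]
W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 action values, 2 inputs
y = linear_shared_bias(x, W, b=0.5)
# y == [1.5, 2.5, 3.5]: the same 0.5 offset is added to every action value
```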
If the loss itself is clipped (rather than the TD error or the gradient), backpropagation fails outside the clip range: the clipped loss is constant there, so its gradient is zero and no learning signal flows. The DQN paper instead clips the error term to [-1, 1], which corresponds to a Huber-style loss.
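A sketch of why clipping the loss value kills the gradient while the Huber loss does not, with the gradients written out analytically as a function of the TD error d.

```python
def grad_clipped_squared_loss(d, c=1.0):
    """d/dd of min(d*d, c): zero wherever the clip is active."""
    return 2.0 * d if d * d < c else 0.0

def grad_huber(d, c=1.0):
    """d/dd of the Huber loss (0.5*d**2 near 0): linear, then saturates at +-c."""
    return d if abs(d) <= c else c * (1.0 if d > 0 else -1.0)

# large TD error: the clipped loss gives no learning signal, Huber still does
g_clip = grad_clipped_squared_loss(3.0)   # 0.0 — learning stalls
g_huber = grad_huber(3.0)                 # 1.0 — bounded but nonzero
```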
Kaiming (He) initialization.
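A sketch of Kaiming (He) initialization for a ReLU layer in plain Python (PyTorch provides this as torch.nn.init.kaiming_normal_): weights are drawn from a normal distribution with std sqrt(2 / fan_in), which keeps activation variance roughly constant across layers.

```python
import math
import random

random.seed(0)

def kaiming_normal(fan_in, fan_out):
    """Weight matrix with entries ~ N(0, sqrt(2 / fan_in))."""
    std = math.sqrt(2.0 / fan_in)
    return [[random.gauss(0.0, std) for _ in range(fan_in)]
            for _ in range(fan_out)]

W = kaiming_normal(fan_in=512, fan_out=256)
# empirical std of the drawn weights should be near sqrt(2/512) ~= 0.0625
flat = [w for row in W for w in row]
mean = sum(flat) / len(flat)
emp_std = math.sqrt(sum((w - mean) ** 2 for w in flat) / len(flat))
```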