Derivation
30 Sep 2025
Derivation of the Action-Value Function in Off-Policy Learning
A detailed derivation of the action-value function $Q(s, a)$ in off-policy learning using importance sampling, with an explanation of why Monte Carlo prediction is implemented as a backward loop over each episode.
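As a rough illustration of the backward loop, here is a minimal sketch of off-policy Monte Carlo prediction. It assumes the weighted importance sampling variant with an incremental update, which may differ from the variant derived in the post. The names `off_policy_mc_prediction`, `target_prob`, and `behavior_prob`, and the `(state, action, reward)` episode format, are hypothetical choices made for this example.

```python
from collections import defaultdict


def off_policy_mc_prediction(episodes, target_prob, behavior_prob, gamma=1.0):
    """Estimate Q(s, a) for a target policy from behavior-policy episodes.

    Assumptions (hypothetical, for illustration only):
      - each episode is a list of (state, action, reward) tuples, where
        reward is the reward received after taking the action;
      - target_prob(s, a) and behavior_prob(s, a) return the probability of
        action a in state s under the target and behavior policies;
      - behavior_prob(s, a) > 0 for every action that appears in an episode.
    """
    Q = defaultdict(float)  # action-value estimates
    C = defaultdict(float)  # cumulative importance weights per (s, a)

    for episode in episodes:
        G = 0.0  # return, accumulated from the end of the episode
        W = 1.0  # importance ratio of the steps that follow (s, a)

        # Walking backward lets G and W be updated in O(1) per step.
        for state, action, reward in reversed(episode):
            G = gamma * G + reward
            C[(state, action)] += W
            # Weighted-importance-sampling incremental update.
            Q[(state, action)] += (W / C[(state, action)]) * (G - Q[(state, action)])
            # Extend the ratio to include this step for earlier time steps.
            W *= target_prob(state, action) / behavior_prob(state, action)
            if W == 0.0:
                break  # earlier steps would contribute zero weight
    return Q
```

Iterating from the last step to the first means the return $G$ and the importance weight $W$ can each be extended by one factor per step, rather than recomputed from scratch for every time step. Note that $W$ is updated after the $Q$ update, because the ratio for $Q(s, a)$ covers only the actions taken after $(s, a)$.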