Algorithm
11 Nov 2025
Temporal Difference (TD) Control Algorithms Comparison: SARSA, Expected SARSA, and Q-learning
Comparative analysis of the major one-step Temporal Difference (TD) control algorithms: SARSA, Expected SARSA, and Q-learning, focusing on whether each is on-policy or off-policy and how each constructs its update target.
10 Sep 2025
Detailed pseudo-code for the Dyna-Q+ algorithm, covering both deterministic and non-stationary environments, with a focus on exploration bonuses.
1 Sep 2024
Formalization of the afterstate concept in Reinforcement Learning, including value functions and Dynamic Programming / Temporal Difference algorithms.