Algorithm

11 Nov 2025

Temporal Difference (TD) Control Algorithms Comparison: SARSA, Expected SARSA, and Q-learning

Comparative analysis of the major one-step Temporal Difference (TD) control algorithms (SARSA, Expected SARSA, and Q-learning), focusing on whether each is on-policy or off-policy and on how each constructs its update target.

10 Sep 2025

Dyna-Q+ Algorithm

Detailed pseudocode for the Dyna-Q+ algorithm, covering both deterministic and non-stationary environments, with a focus on exploration bonuses.

1 Sep 2024

Afterstate Formulation

Formalization of the afterstate concept in Reinforcement Learning, including afterstate value functions and the corresponding Dynamic Programming and Temporal Difference algorithms.