Afterstate

11 Nov 2025

Temporal Difference (TD) Control Algorithms Comparison: SARSA, Expected SARSA, and Q-learning

Comparative analysis of the major one-step TD control algorithms (SARSA, Expected SARSA, and Q-learning), focusing on whether each is on-policy or off-policy and how each constructs its update target.

15 Sep 2025

Reinforcement Learning for Outfit Compatibility

Modeling the outfit compatibility problem as a Markov Decision Process (MDP), defining the state space, action space, and afterstate formulation for sequential item selection.

1 Sep 2024

Afterstate Formulation

Formalization of the afterstate concept in Reinforcement Learning, including afterstate value functions and the corresponding Dynamic Programming and Temporal Difference algorithms.