This paper reinterprets MDP planning as Bayesian inference over policies, using a novel approach to approximate the posterior distribution of optimal policies in discrete domains.
The research recasts decision-making in Markov decision processes as Bayesian inference: rather than searching directly for a single best policy, it maintains a posterior distribution over policies, so uncertainty about which strategy is optimal is represented explicitly. The method is evaluated on discrete decision-making benchmarks such as grid worlds and Blackjack, and the experiments illustrate how the inferred posterior provides insight into the uncertainty over strategies that traditional planning methods do not expose.
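To make the planning-as-inference idea concrete, the following is a minimal generic sketch (not the paper's algorithm): on a small hypothetical chain MDP, we enumerate all deterministic policies, evaluate each one's discounted return, and form a Boltzmann-style posterior in which a policy's probability grows with its value. The MDP, the temperature `tau`, and all function names are illustrative assumptions.

```python
import itertools
import numpy as np

# Hypothetical 3-state chain MDP (an assumption, not from the paper):
# states 0..2, actions 0 (left) and 1 (right); being in state 2 yields reward 1.
n_states, n_actions, gamma = 3, 2, 0.9

def step(s, a):
    """Deterministic transition: action 1 moves right, action 0 moves left."""
    s2 = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    r = 1.0 if s2 == n_states - 1 else 0.0
    return s2, r

def policy_value(pi, s0=0, horizon=50):
    """Discounted return of deterministic policy pi (a tuple: state -> action)."""
    s, total = s0, 0.0
    for t in range(horizon):
        s, r = step(s, pi[s])
        total += gamma ** t * r
    return total

# Enumerate all deterministic policies and weight each by exp(V / tau),
# a softmax "posterior over policies": better policies get more mass,
# but suboptimal ones retain probability, quantifying strategy uncertainty.
tau = 0.5
policies = list(itertools.product(range(n_actions), repeat=n_states))
values = np.array([policy_value(pi) for pi in policies])
posterior = np.exp(values / tau)
posterior /= posterior.sum()

best = policies[int(np.argmax(posterior))]
print("most probable policy:", best)
```

In this toy chain the "always move right" policy `(1, 1, 1)` receives the highest posterior probability; lowering `tau` concentrates the posterior on it, while raising `tau` spreads mass over near-optimal alternatives, mirroring how a posterior over policies expresses uncertainty between strategies.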