Multi-armed Bandits and Markov Decision Processes for Gaming

A series of assignments in which we:

  • Implemented variants of Thompson Sampling and KL-UCB to solve a batched multi-armed bandit problem
  • Applied Markov Decision Process planning to devise optimal strategies for half-field football offence and a billiards game, using Value Iteration, Linear Programming, Howard’s Policy Iteration and Monte Carlo Tree Search
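
As an illustration of the planning step, here is a minimal value iteration sketch on a toy MDP. The states, actions, transition probabilities and rewards below are hypothetical and chosen only for illustration; they are not the football or billiards models from the assignments, and the function name `value_iteration` and the `TOY_MDP` layout are assumptions of this sketch.

```python
from typing import Dict, List, Tuple

# transitions[state][action] = list of (probability, next_state, reward).
# A state with no actions is treated as terminal (value 0).
Transitions = Dict[int, Dict[str, List[Tuple[float, int, float]]]]


def q_value(outcomes, values, gamma):
    """Expected return of one action given the current value estimates."""
    return sum(p * (r + gamma * values[s2]) for p, s2, r in outcomes)


def value_iteration(transitions: Transitions, gamma: float = 1.0, tol: float = 1e-10):
    """Repeat synchronous Bellman optimality backups until values stop changing."""
    values = {s: 0.0 for s in transitions}
    while True:
        new_values = {}
        for s, actions in transitions.items():
            qs = [q_value(outcomes, values, gamma) for outcomes in actions.values()]
            new_values[s] = max(qs) if qs else 0.0
        delta = max(abs(new_values[s] - values[s]) for s in transitions)
        values = new_values
        if delta < tol:
            break
    # Greedy policy with respect to the converged values.
    policy = {
        s: max(actions, key=lambda a: q_value(actions[a], values, gamma))
        for s, actions in transitions.items()
        if actions
    }
    return values, policy


# Hypothetical toy problem: state 0 = far from goal, 1 = near goal, 2 = episode over.
TOY_MDP: Transitions = {
    0: {
        "shoot": [(0.2, 2, 1.0), (0.8, 2, 0.0)],
        "pass": [(0.9, 1, 0.0), (0.1, 2, 0.0)],
    },
    1: {
        "shoot": [(0.7, 2, 1.0), (0.3, 2, 0.0)],
        "pass": [(0.9, 1, 0.0), (0.1, 2, 0.0)],
    },
    2: {},
}

values, policy = value_iteration(TOY_MDP, gamma=1.0)
print(values, policy)
```

In this toy setting the episode always ends with probability at least 0.1 per step, so an undiscounted objective (`gamma=1.0`) still converges; a discount factor below 1 would be the safer default for MDPs without guaranteed termination.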

GitHub Repository