The first of two reports I have written this year for STOR601 is on the multi-armed bandit problem and was supervised by James Grant.
The report focuses on using Thompson sampling to minimise regret in the multi-armed bandit problem, and on approximations to Thompson sampling for cases where the method cannot be applied directly. These methods are compared empirically on simulated data.
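For readers new to the idea, here is a minimal sketch of Thompson sampling in its simplest setting. It is not taken from the report: it assumes Bernoulli rewards with independent Beta(1, 1) priors on each arm, and the function name `thompson_sampling_bernoulli` and its parameters are hypothetical, chosen for illustration only.

```python
import numpy as np


def thompson_sampling_bernoulli(true_means, horizon, seed=0):
    """Simulate Beta-Bernoulli Thompson sampling (illustrative assumption).

    Returns the cumulative expected regret after each round and the
    number of times each arm was pulled.
    """
    rng = np.random.default_rng(seed)
    true_means = np.asarray(true_means, dtype=float)
    n_arms = true_means.size

    # Beta(1, 1) prior on every arm: an assumed, uninformative choice.
    alpha = np.ones(n_arms)
    beta = np.ones(n_arms)

    best_mean = true_means.max()
    pulls = np.zeros(n_arms, dtype=int)
    regret = np.zeros(horizon)
    cumulative = 0.0

    for t in range(horizon):
        # Draw one sample per arm from its posterior and play the largest.
        samples = rng.beta(alpha, beta)
        arm = int(np.argmax(samples))

        # Simulate a Bernoulli reward from the chosen arm.
        reward = float(rng.random() < true_means[arm])

        # Conjugate posterior update for the chosen arm only.
        alpha[arm] += reward
        beta[arm] += 1.0 - reward
        pulls[arm] += 1

        # Expected regret: gap between the best arm and the arm played.
        cumulative += best_mean - true_means[arm]
        regret[t] = cumulative

    return regret, pulls


if __name__ == "__main__":
    regret, pulls = thompson_sampling_bernoulli([0.1, 0.9], horizon=2000)
    print("final cumulative regret:", regret[-1])
    print("pulls per arm:", pulls)
```

The Beta prior is conjugate to the Bernoulli likelihood, so the posterior can be sampled exactly here. When no such conjugate form is available, exact posterior sampling is generally not possible, which is the kind of situation where approximations become necessary.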
View the report here: