BookReview

The explore exploit chapter in the book “Algorithms to Live By” looks into the problem called the “Multi-armed Bandit”. The “Multi-armed Bandit” is when you have multiple independent choices with different reward probabilities, and you can only try one at a time (like a slot machine). One algorithm that the book describes to tackle this problem is the Gittins Index. The Gittins Index is a fairly complicated calculation, but the idea for the slot machines example is that it favors exploration of new machines when there is less data gathered about the odds. The amount that the index favors exploration is calculated from the “discount” factor, the higher the discount factor, the more exploration. For example, if you only have a few minutes to test the machines, then you might choose a lower discount factor, because you don’t have time to test any one thoroughly, so stick with a decent one if you find it. With a high 90% factor, shown in the graph below, even a machine with a 9 to 5 win ratio would have a lower Gittins Index than an untested machine, guiding the player to move on. ...

I’ve been reading through the book Algorithms to Live By, and I thought it would be interesting to summarize or think of an application for each chapter since the book is about how algorithms influence or can be used in everyday decisions. The first chapter is about the Optimal Stopping problem, which deals with deciding when to, well stop, given a decision that costs something over time. One example in the book is about hiring a secretary, or some other job candidate. According to the math, if you want to find the top candidate, you should reject the first 37% applicants in order to establish a baseline for the candidate quality, then accept the first candidate that outperforms the baseline. ...