Explore Exploit
The explore exploit chapter in the book “Algorithms to Live By” looks into the problem called the “Multi-armed Bandit”. The “Multi-armed Bandit” is when you have multiple independent choices with different reward probabilities, and you can only try one at a time (like a slot machine). One algorithm that the book describes to tackle this problem is the Gittins Index. The Gittins Index is a fairly complicated calculation, but the idea for the slot machines example is that it favors exploration of new machines when there is less data gathered about the odds. The amount that the index favors exploration is calculated from the “discount” factor, the higher the discount factor, the more exploration. For example, if you only have a few minutes to test the machines, then you might choose a lower discount factor, because you don’t have time to test any one thoroughly, so stick with a decent one if you find it. With a high 90% factor, shown in the graph below, even a machine with a 9 to 5 win ratio would have a lower Gittins Index than an untested machine, guiding the player to move on. ...