A series of bandit algorithms in Swift, built with functional programming and immutable data structures. Inspired by johnmyleswhite/BanditsBook.
To run on the command line:
$ swift build            // requires you to be in the ./Swiper directory
$ ./build/debug/Swiper   // builds a results_swift.tsv file in your ~/Documents/ directory
The epsilon in the Epsilon-greedy strategy controls the proportion of exploration versus exploitation: with probability epsilon a random arm is explored; otherwise the arm with the highest estimated value is exploited.
Example usage:
let epsilonGreedy = EpsilonGreedy(epsilon: 0.1, nArms: 2)
let selectedArm = epsilonGreedy.selectArm()
somethingWithCallback(color: selectedArm) { (reward) in
    let updatedEpsilonGreedy = epsilonGreedy.update(selectedArm, reward: reward)
}
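The repository's own implementation isn't shown here, but an immutable epsilon-greedy object might be sketched as below. The stored properties (`counts`, `values`) and the incremental-mean update are assumptions modeled on the BanditsBook reference; `update` returns a new instance rather than mutating state, matching the README's functional style.

```swift
import Foundation

// Sketch of an immutable epsilon-greedy bandit (assumed implementation).
struct EpsilonGreedy {
    let epsilon: Double
    let counts: [Int]     // pulls per arm
    let values: [Double]  // running mean reward per arm

    init(epsilon: Double, nArms: Int) {
        self.epsilon = epsilon
        self.counts = Array(repeating: 0, count: nArms)
        self.values = Array(repeating: 0.0, count: nArms)
    }

    private init(epsilon: Double, counts: [Int], values: [Double]) {
        self.epsilon = epsilon
        self.counts = counts
        self.values = values
    }

    // With probability epsilon explore a random arm;
    // otherwise exploit the arm with the highest estimated value.
    func selectArm() -> Int {
        if Double.random(in: 0..<1) < epsilon {
            return Int.random(in: 0..<values.count)
        }
        return values.firstIndex(of: values.max()!)!
    }

    // Returns a NEW EpsilonGreedy with the chosen arm's running mean updated.
    func update(_ arm: Int, reward: Double) -> EpsilonGreedy {
        var newCounts = counts
        var newValues = values
        newCounts[arm] += 1
        let n = Double(newCounts[arm])
        newValues[arm] = ((n - 1) / n) * values[arm] + reward / n
        return EpsilonGreedy(epsilon: epsilon, counts: newCounts, values: newValues)
    }
}
```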
The Annealing Softmax object selects arms based on a softmax function. This object does not require a temperature—the algorithm automatically manages it via simulated annealing.
Example usage:
let softmax = Softmax(nArms: 4)
let selectedArm = softmax.selectArm()
somethingWithCallback(copy: selectedArm) { (reward) in
    let updatedSoftmax = softmax.update(selectedArm, reward: reward)
}
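A sketch of what the annealing softmax might look like internally. The temperature schedule `1 / log(t + 1e-7)` (with `t` the total pull count) is an assumption borrowed from the BanditsBook convention, not taken from this repository's source; the point is that the caller never supplies a temperature, because it cools automatically as data accumulates.

```swift
import Foundation

// Sketch of an annealing softmax policy (assumed implementation).
struct Softmax {
    let counts: [Int]
    let values: [Double]

    init(nArms: Int) {
        counts = Array(repeating: 0, count: nArms)
        values = Array(repeating: 0.0, count: nArms)
    }

    init(counts: [Int], values: [Double]) {
        self.counts = counts
        self.values = values
    }

    // Arms with higher estimated value get exponentially higher probability.
    // As the temperature cools, selection concentrates on the best arm.
    func selectArm() -> Int {
        let t = Double(counts.reduce(0, +)) + 1.0
        let temperature = 1.0 / log(t + 1e-7)   // assumed annealing schedule
        let exps = values.map { exp($0 / temperature) }
        let total = exps.reduce(0, +)
        // Sample an arm in proportion to its softmax probability.
        var cumulative = 0.0
        let z = Double.random(in: 0..<1)
        for (arm, e) in exps.enumerated() {
            cumulative += e / total
            if z < cumulative { return arm }
        }
        return exps.count - 1
    }

    // Returns a NEW Softmax with the chosen arm's running mean updated.
    func update(_ arm: Int, reward: Double) -> Softmax {
        var newCounts = counts
        var newValues = values
        newCounts[arm] += 1
        let n = Double(newCounts[arm])
        newValues[arm] = ((n - 1) / n) * values[arm] + reward / n
        return Softmax(counts: newCounts, values: newValues)
    }
}
```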
The UCB strategy uses its observation history (per-arm counts and estimated values) to select its next arm. UCB1 assumes that your maximum reward is 1.
Example usage:
let ucb = UCB1(nArms: 3)
let selectedArm = ucb.selectArm()
somethingWithCallback(displayPopup: selectedArm) { (reward) in
    let updatedUcb = ucb.update(selectedArm, reward: reward)
}
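The reward-scale assumption matters because of the confidence bonus UCB1 adds to each arm's estimate. As a sketch (assumed implementation, following the standard UCB1 formula rather than this repository's source): every untried arm is pulled once, then selection maximizes `value + sqrt(2 * ln(totalCount) / count)`, which only balances correctly when rewards lie in [0, 1].

```swift
import Foundation

// Sketch of UCB1 (assumed implementation). Rewards must lie in [0, 1].
struct UCB1 {
    let counts: [Int]
    let values: [Double]

    init(nArms: Int) {
        counts = Array(repeating: 0, count: nArms)
        values = Array(repeating: 0.0, count: nArms)
    }

    init(counts: [Int], values: [Double]) {
        self.counts = counts
        self.values = values
    }

    func selectArm() -> Int {
        // Play any arm that has never been tried.
        if let untried = counts.firstIndex(of: 0) { return untried }
        // Otherwise pick the arm with the largest upper confidence bound.
        let total = Double(counts.reduce(0, +))
        let bounds = zip(values, counts).map { value, count in
            value + sqrt(2.0 * log(total) / Double(count))
        }
        return bounds.firstIndex(of: bounds.max()!)!
    }

    // Returns a NEW UCB1 with the chosen arm's running mean updated.
    func update(_ arm: Int, reward: Double) -> UCB1 {
        var newCounts = counts
        var newValues = values
        newCounts[arm] += 1
        let n = Double(newCounts[arm])
        newValues[arm] = ((n - 1) / n) * values[arm] + reward / n
        return UCB1(counts: newCounts, values: newValues)
    }
}
```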