🦊 A series of bandit algorithms in Swift

crenwick/Swiper

Swiper

Carthage compatible

A series of bandit algorithms in Swift, built with functional programming and immutable data structures. Inspired by johnmyleswhite/BanditsBook.

Swift Build System Instructions

To run on the command line:

  1. $ swift build // requires you to be in the ./Swiper directory
  2. $ ./.build/debug/Swiper // writes a results_swift.tsv file to your ~/Documents/ directory

Epsilon-Greedy

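In the epsilon-greedy strategy, epsilon is the probability of exploring: with probability epsilon the algorithm picks an arm at random, and otherwise it exploits the arm with the highest estimated reward. With epsilon: 0.1, roughly 10% of selections are explorations.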

Example usage:

let epsilonGreedy = EpsilonGreedy(epsilon: 0.1, nArms: 2)
let selectedArm = epsilonGreedy.selectArm()
somethingWithCallback(color: selectedArm) { (reward) in
  let updatedEpsilonGreedy = epsilonGreedy.update(selectedArm, reward: reward)
}
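
To make the explore/exploit split concrete, here is a minimal sketch of how an immutable epsilon-greedy bandit could be written. It is not Swiper's actual source: the type name EpsilonGreedySketch, the counts and values fields, the Double reward type, and the injectable roll parameter are all assumptions made for illustration.

```swift
// Hypothetical sketch; names and fields are assumptions, not Swiper's API.
struct EpsilonGreedySketch {
    let epsilon: Double   // probability of exploring
    let counts: [Int]     // number of pulls per arm
    let values: [Double]  // running mean reward per arm
}

extension EpsilonGreedySketch {
    // Start with no pulls and zero estimates for every arm.
    init(epsilon: Double, nArms: Int) {
        self.init(epsilon: epsilon,
                  counts: Array(repeating: 0, count: nArms),
                  values: Array(repeating: 0.0, count: nArms))
    }

    // `roll` is a uniform draw in [0, 1); it can be passed in to make
    // the explore/exploit decision deterministic.
    func selectArm(roll: Double = Double.random(in: 0..<1)) -> Int {
        if roll < epsilon {
            // Explore: pick any arm uniformly at random.
            return Int.random(in: 0..<values.count)
        }
        // Exploit: pick the arm with the highest estimate (first on ties).
        var best = 0
        for i in values.indices where values[i] > values[best] {
            best = i
        }
        return best
    }

    // Return a new value with the arm's running mean updated;
    // the receiver is left unchanged.
    func update(_ arm: Int, reward: Double) -> EpsilonGreedySketch {
        var newCounts = counts
        var newValues = values
        newCounts[arm] += 1
        let n = Double(newCounts[arm])
        newValues[arm] = values[arm] + (reward - values[arm]) / n
        return EpsilonGreedySketch(epsilon: epsilon, counts: newCounts, values: newValues)
    }
}
```

Because update returns a new value instead of mutating in place, the caller has to keep the returned bandit and use it for the next selectArm call.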

Softmax (Annealing)

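The annealing Softmax object selects arms with a softmax function: arms with higher estimated rewards are chosen with higher probability. It does not take a temperature parameter. The algorithm manages the temperature itself through annealing, lowering it as more rewards are observed so that exploration decreases over time.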

Example usage:

let softmax = Softmax(nArms: 4)
let selectedArm = softmax.selectArm()
somethingWithCallback(copy: selectedArm) { (reward) in
  let updatedSoftmax = softmax.update(selectedArm, reward: reward)
}
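
The sketch below illustrates one way annealing softmax selection can work. It is an assumption-laden illustration, not Swiper's source: the type name AnnealingSoftmaxSketch, its fields, the Double reward type, and the cooling schedule temperature = 1 / log(t), where t is the total number of pulls plus one, are all chosen for the example, and Swiper's own schedule may differ.

```swift
import Foundation

// Hypothetical sketch; names, fields, and schedule are assumptions.
struct AnnealingSoftmaxSketch {
    let counts: [Int]     // number of pulls per arm
    let values: [Double]  // running mean reward per arm
}

extension AnnealingSoftmaxSketch {
    init(nArms: Int) {
        self.init(counts: Array(repeating: 0, count: nArms),
                  values: Array(repeating: 0.0, count: nArms))
    }

    // Assumed schedule: the temperature falls as total pulls grow.
    // The small constant avoids dividing by log(1) = 0 on the first pull.
    var temperature: Double {
        let t = Double(counts.reduce(0, +) + 1)
        return 1.0 / log(t + 1e-7)
    }

    // Softmax over the estimates, scaled by the current temperature.
    var probabilities: [Double] {
        let temp = temperature
        let weights = values.map { exp($0 / temp) }
        let total = weights.reduce(0, +)
        return weights.map { $0 / total }
    }

    // Sample an arm by walking the cumulative distribution.
    // `roll` is a uniform draw in [0, 1).
    func selectArm(roll: Double = Double.random(in: 0..<1)) -> Int {
        var cumulative = 0.0
        for (arm, p) in probabilities.enumerated() {
            cumulative += p
            if roll < cumulative { return arm }
        }
        return values.count - 1  // guard against rounding error
    }

    // Return a new value with the arm's running mean updated.
    func update(_ arm: Int, reward: Double) -> AnnealingSoftmaxSketch {
        var newCounts = counts
        var newValues = values
        newCounts[arm] += 1
        let n = Double(newCounts[arm])
        newValues[arm] = values[arm] + (reward - values[arm]) / n
        return AnnealingSoftmaxSketch(counts: newCounts, values: newValues)
    }
}
```

With this schedule the first selections are close to uniform, because the temperature starts very high, and the choice concentrates on the best-looking arms as pulls accumulate.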

UCB1 (Upper Confidence Bound)

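The UCB strategy accounts for how much it knows about each arm, not just each arm's estimated reward: arms that have been tried less often get a larger exploration bonus when the next arm is selected. UCB1 assumes that the maximum reward is 1.

Example usage: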

let ucb = UCB1(nArms: 3)
let selectedArm = ucb.selectArm()
somethingWithCallback(displayPopup: selectedArm) { (reward) in
  let updatedUcb = ucb.update(selectedArm, reward: reward)
}
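
As an illustration of the selection rule, the following sketch scores each arm as its mean reward plus the standard UCB1 bonus sqrt(2 ln(total pulls) / pulls of that arm), after first trying every arm once. The type name UCB1Sketch, its fields, and the Double reward type are assumptions for this example and are not taken from Swiper's source.

```swift
import Foundation

// Hypothetical sketch; names and fields are assumptions, not Swiper's API.
struct UCB1Sketch {
    let counts: [Int]     // number of pulls per arm
    let values: [Double]  // running mean reward per arm (rewards in 0...1)
}

extension UCB1Sketch {
    init(nArms: Int) {
        self.init(counts: Array(repeating: 0, count: nArms),
                  values: Array(repeating: 0.0, count: nArms))
    }

    func selectArm() -> Int {
        // Try every arm once before comparing confidence bounds.
        if let untried = counts.firstIndex(of: 0) {
            return untried
        }
        let total = Double(counts.reduce(0, +))
        // Score = mean reward + bonus that shrinks as an arm is pulled more.
        let scores = values.indices.map { i in
            values[i] + sqrt(2.0 * log(total) / Double(counts[i]))
        }
        // Pick the arm with the highest score (first on ties).
        var best = 0
        for i in scores.indices where scores[i] > scores[best] {
            best = i
        }
        return best
    }

    // Return a new value with the arm's running mean updated.
    func update(_ arm: Int, reward: Double) -> UCB1Sketch {
        var newCounts = counts
        var newValues = values
        newCounts[arm] += 1
        let n = Double(newCounts[arm])
        newValues[arm] = values[arm] + (reward - values[arm]) / n
        return UCB1Sketch(counts: newCounts, values: newValues)
    }
}
```

Unlike the two strategies above, this selection rule uses no randomness: given the same history of pulls and rewards, it always picks the same arm.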