>>> source .venv/bin/activate # activate virtual environment
>>> pip install -r requirements.txt # install required packages
A Markov decision process is a 4-tuple ${\displaystyle (S,A,P_{a},R_{a})}$, where:
A multi-armed bandit is a 2-tuple ${\displaystyle (A,R_{a})}$, where:
A contextual multi-armed bandit is a 3-tuple ${\displaystyle (S,A,R_{a})}$, where: