All the algorithms I implemented (in Python 3 with NumPy) while reading Reinforcement Learning: An Introduction by Sutton and Barto.
There's a separate README for each topic.
High-level structure of the repo:
- Bandits
  - Epsilon-greedy
  - Optimistic initial value
  - Softmax exploration
- Dynamic Programming methods
  - Policy iteration
  - Value iteration
- Model free methods
  - Monte Carlo control
    - On-Policy Monte Carlo
    - Off-policy Monte Carlo using Importance Sampling (incomplete)
  - Temporal-difference methods
    - Q-Learning
    - SARSA
- Monte Carlo control
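
To give a flavour of the first topic in the list, here is a minimal sketch of an epsilon-greedy agent on a k-armed Gaussian bandit. It is not taken from the repo's code: the function name `run_epsilon_greedy`, its parameters, and the unit-variance reward model are all assumptions made for illustration.

```python
import numpy as np


def run_epsilon_greedy(true_means, steps=1000, epsilon=0.1, seed=0):
    """Hypothetical helper: epsilon-greedy on a Gaussian bandit.

    true_means: assumed mean reward of each arm (rewards have std 1).
    Returns the estimated action values and the pull count per arm.
    """
    rng = np.random.default_rng(seed)
    k = len(true_means)
    q = np.zeros(k)             # sample-average estimate of each arm's value
    n = np.zeros(k, dtype=int)  # number of times each arm was pulled

    for _ in range(steps):
        if rng.random() < epsilon:
            a = int(rng.integers(k))   # explore: uniformly random arm
        else:
            a = int(np.argmax(q))      # exploit: current best estimate
        r = rng.normal(true_means[a], 1.0)
        n[a] += 1
        q[a] += (r - q[a]) / n[a]      # incremental sample-average update

    return q, n


if __name__ == "__main__":
    q, n = run_epsilon_greedy([0.0, 1.0, 5.0], steps=2000)
    print("estimated values:", q)
    print("pull counts:", n)
```

The incremental update avoids storing past rewards; swapping the `1 / n[a]` step size for a constant would be one way to handle non-stationary arms.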
CS234 and David Silver's lectures often use different notations, so it is better to follow just one of them in the beginning (I prefer David Silver's lectures).
Check this out for more resources!