ReinforcementLearning

Chapter-wise implementation and analysis of the algorithms in Reinforcement Learning: An Introduction by Richard S. Sutton and Andrew G. Barto

Chapter 2

The notebook Greedy,e-Greedy,UCB,Gradient.ipynb demonstrates the following algorithms:

  1. Greedy Algorithm
  2. epsilon-Greedy Algorithm
  3. UCB
  4. Gradient Bandit

The notebook also analyzes these algorithms with Optimistic Initial Values. The results show that UCB outperforms the other algorithms on the stationary K-armed bandit problem.
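As a minimal sketch of the kind of agent compared above, here is an epsilon-greedy learner on a stationary K-armed Gaussian bandit with incremental sample-average updates (the parameter values and testbed setup are illustrative assumptions, not taken from the notebook):

```python
import numpy as np

def eps_greedy_bandit(k=10, steps=1000, eps=0.1, seed=0):
    """Run one epsilon-greedy agent on a stationary k-armed bandit.

    Illustrative sketch; the notebook's exact parameters may differ.
    """
    rng = np.random.default_rng(seed)
    q_true = rng.normal(0.0, 1.0, k)   # true action values q*(a)
    Q = np.zeros(k)                    # value estimates
    N = np.zeros(k)                    # action counts
    rewards = np.empty(steps)
    for t in range(steps):
        if rng.random() < eps:
            a = int(rng.integers(k))   # explore: random action
        else:
            a = int(np.argmax(Q))      # exploit: greedy action
        r = rng.normal(q_true[a], 1.0) # reward ~ N(q*(a), 1)
        N[a] += 1
        Q[a] += (r - Q[a]) / N[a]      # incremental sample-average update
        rewards[t] = r
    return Q, rewards

Q, rewards = eps_greedy_bandit()
```

Setting the initial `Q` entries to an optimistic constant (e.g. `Q = np.full(k, 5.0)`) reproduces the Optimistic Initial Values variant, which encourages early exploration even with `eps=0`.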

Chapter 4

The notebook RL using Dynamic Programming.ipynb demonstrates how to solve finite MDPs. The following algorithms are implemented:

  1. Policy Iteration with two arrays
  2. Policy Iteration using inplace update
  3. Value Iteration with two arrays
  4. Value Iteration using inplace updates

The results show that Value Iteration with in-place updates converges faster than the other three algorithms.
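The in-place variant can be sketched as a single sweep that overwrites each state's value immediately, so later states in the same sweep already see the updated estimates (the MDP representation below, `P[s][a]` as a list of `(prob, next_state)` pairs and `R[s][a]` as expected reward, is a hypothetical layout for illustration, not the notebook's):

```python
import numpy as np

def value_iteration_inplace(P, R, gamma=0.9, theta=1e-8):
    """In-place value iteration for a finite MDP.

    P[s][a]: list of (probability, next_state) pairs (hypothetical layout).
    R[s][a]: expected immediate reward for action a in state s.
    """
    n_states = len(P)
    V = np.zeros(n_states)
    while True:
        delta = 0.0
        for s in range(n_states):          # updates V in place during the sweep
            v_old = V[s]
            V[s] = max(
                R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])
                for a in range(len(P[s]))
            )
            delta = max(delta, abs(v_old - V[s]))
        if delta < theta:                  # stop when the sweep barely changes V
            return V

# Toy deterministic 2-state MDP (made up for this example):
P = [[[(1.0, 0)], [(1.0, 1)]],   # state 0: a0 stays, a1 moves to state 1
     [[(1.0, 1)], [(1.0, 0)]]]   # state 1: a0 stays, a1 moves to state 0
R = [[0.0, 1.0],                 # state 0: rewards for a0, a1
     [2.0, 0.0]]                 # state 1: rewards for a0, a1
V = value_iteration_inplace(P, R)
```

The two-array variants differ only in keeping a separate copy of the previous sweep's values, so each sweep propagates information one step less far, which matches the slower convergence observed above.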