We present implementations of the REINFORCE with Baseline and Monte-Carlo Tree Search algorithms on three MDPs: Cartpole, CS687-Gridworld, and Mountain Car. For extra credit, we implemented a previously unexplored MDP, Mountain Car, and we present a performance analysis of four multi-armed bandit algorithms: Epsilon-Greedy, Epsilon-Decreasing Greedy, Upper Confidence Bound (UCB), and Thompson sampling.
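As a rough illustration of two of the bandit strategies named above, here is a minimal sketch of Epsilon-Greedy and UCB1 on a Bernoulli bandit. It is not taken from this repository: the function names (`pull`, `epsilon_greedy`, `ucb1`), the arm probabilities, and the choice of the UCB1 variant with exploration bonus `sqrt(2 ln t / n)` are all assumptions made for the example.

```python
import math
import random


def pull(prob, rng):
    """Hypothetical Bernoulli arm: reward 1.0 with probability `prob`, else 0.0."""
    return 1.0 if rng.random() < prob else 0.0


def epsilon_greedy(probs, steps, epsilon=0.1, seed=0):
    """Explore a random arm with probability epsilon, otherwise exploit the best estimate."""
    rng = random.Random(seed)
    k = len(probs)
    counts = [0] * k        # number of pulls per arm
    values = [0.0] * k      # running mean reward per arm
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(k)                        # explore
        else:
            arm = max(range(k), key=lambda a: values[a])  # exploit (ties go to the lowest index)
        reward = pull(probs[arm], rng)
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean update
    return counts, values


def ucb1(probs, steps, seed=0):
    """Pull each arm once, then pick the arm maximizing mean + sqrt(2 ln t / n)."""
    rng = random.Random(seed)
    k = len(probs)
    counts = [0] * k
    values = [0.0] * k
    for t in range(1, steps + 1):
        if t <= k:
            arm = t - 1  # initialization: try every arm once
        else:
            arm = max(
                range(k),
                key=lambda a: values[a] + math.sqrt(2.0 * math.log(t) / counts[a]),
            )
        reward = pull(probs[arm], rng)
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]
    return counts, values


if __name__ == "__main__":
    arm_probs = [0.2, 0.5, 0.8]  # assumed success probabilities, for illustration only
    print("epsilon-greedy pulls:", epsilon_greedy(arm_probs, 2000)[0])
    print("UCB1 pulls:          ", ucb1(arm_probs, 2000)[0])
```

Both functions return the per-arm pull counts and the estimated mean rewards, so the strategies can be compared by how often each one selects the arm with the highest success probability.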
razor08/RL-Project
About
Implementation of REINFORCE with Baseline and Monte-Carlo Tree Search algorithms along with Multi-Armed Bandits.