Policy iteration algorithm for cart pole balancing problem, WIP.

cartpolebalancing

Using Policy Iteration to solve the Cart Pole Balancing problem. A simple fully connected neural network with one hidden layer is used to evaluate the best action for a given state. Suppose a training episode lasts for k steps. The reward for each step is collected, and after the episode ends the discounted return is calculated for each step. A (state, discounted return) pair is stored for each step of each episode. Backpropagation is done for a batch of episodes, and the process is repeated for a number of batches.
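The per-step discounted return described above can be sketched as a single backward pass over the episode's rewards. This is a minimal illustration, not the repo's code; the function name and the discount factor of 0.99 are my own assumptions.

```python
import numpy as np

def discounted_returns(rewards, gamma=0.99):
    """Compute the discounted return G_t for every step of one episode.

    Works backwards from the last step: G_t = r_t + gamma * G_{t+1}.
    (Illustrative sketch; names and gamma are assumptions, not the repo's code.)
    """
    returns = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

# CartPole gives a reward of 1 for every step the pole stays up,
# so earlier steps accumulate larger discounted returns.
print(discounted_returns([1.0] * 5, gamma=0.99))
```

Each (state, discounted return) pair from an episode would then go into the training batch for the network.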

Here's a GIF of the trained AI: Screen GIF

Simulation environment: OpenAI Gym Cartpole-v0

Forward pass and backpropagation are done in Theano. Here are good tutorials for getting started with Theano and for implementing a simple ANN.
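To make the network concrete, here is a plain-NumPy sketch of the forward pass of a one-hidden-layer fully connected network that maps a CartPole state (4 values) to action probabilities (2 actions). The layer sizes, tanh activation, and softmax output are my assumptions for illustration; the repo's actual Theano graph may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed sizes: CartPole-v0 has a 4-dimensional state and 2 actions.
n_in, n_hidden, n_out = 4, 16, 2
W1 = rng.normal(scale=0.1, size=(n_in, n_hidden)).astype(np.float32)
b1 = np.zeros(n_hidden, dtype=np.float32)
W2 = rng.normal(scale=0.1, size=(n_hidden, n_out)).astype(np.float32)
b2 = np.zeros(n_out, dtype=np.float32)

def forward(state):
    """One hidden layer (tanh), then a softmax over the two actions."""
    h = np.tanh(state @ W1 + b1)
    logits = h @ W2 + b2
    exp = np.exp(logits - logits.max())  # subtract max for numerical stability
    return exp / exp.sum()

# The action with the higher probability is picked for the given state.
probs = forward(np.zeros(n_in, dtype=np.float32))
print(probs)
```

In Theano, the same graph would be built symbolically with shared variables for the weights, and the gradients for backpropagation would come from `theano.grad`.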

I used the CPU for this. The Nvidia drivers are a bit tricky to install on Ubuntu 16.04 if you have Intel's Skylake. Here's my Theano .theanorc config for CPU:

```
[global]
floatX = float32
device = cpu
force_device = True
pycuda.init = False

[lib]
cnmem = 1

[blas]
ldflags = -L/usr/lib/ -lblas
```
