Reinforcement Learning Tutorial

About

Weekend Deep Reinforcement Learning (DRL) is a self-study of DRL in my free time. DRL is very easy, especially when you already have a bit of background in Control and Deep Learning. Even without that background, the concepts are still very simple, so why not study them and have fun with it.

My implementation aims to provide minimal code and short notes that summarize the theory.

  • The code, modules, and config system are built on mmcv's config and registry system, so it is easy to adopt them and to adjust components just by changing the config files (see the sketch after this list).
  • Lecture notes: no lengthy math, just the motivating concepts, the key equations needed for implementation, and a summary of the tricks that make the methods work. More importantly, I try to connect each method to previous ones as much as possible.
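
For reference, here is a minimal sketch of what a config file in this mmcv-style setup might look like. The keys and type names below (env, agent, type='DQN', hidden_dims, buffer) are illustrative assumptions, not the repository's actual schema; see the files under configs/ for the real structure.

# Hypothetical mmcv-style config sketch: plain Python variables that the
# registry system reads to build components. Keys and type names are
# assumptions for illustration, not the repo's actual schema.
env = dict(type='MountainCar-v0')
agent = dict(
    type='DQN',                                   # registered agent class
    network=dict(type='MLP', hidden_dims=[128, 128]),
    gamma=0.99,                                   # discount factor
    lr=1e-3,                                      # learning rate
    buffer=dict(capacity=10000, batch_size=64),   # replay buffer settings
)

Swapping in a different registered agent, or changing the network or buffer settings, then only requires editing such a file rather than the training code.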

My learning strategy is to go directly to summarizing and implementing the papers, starting from the basic ones. I hate the fact that most RL books start with a very heavy theoretical background, asking us to memorize many vague definitions, such as what On-Line, Off-Line, or Policy Gradient means. NO, NO, NO !!! Let's play with the basic blocks first. When we feel comfortable, we can recap and introduce these concepts later. It is absolutely fine if you don't remember these definitions at all.

The following are the great resources that I learned from:

1. Env Setup:

conda create -n RL python=3.8 -y
conda install tqdm matplotlib scipy
conda install pytorch torchvision cudatoolkit=10.2 -c pytorch
pip install gym
pip install 'gym[all]'   # Install the environment dependencies
# or pip install cmake 'gym[atari]'
pip install pybullet
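
After installation, a quick sanity check (a small sketch, assuming the packages above installed cleanly) confirms that PyTorch and Gym import correctly and reports whether CUDA is visible:

import gym
import torch

# Print the PyTorch version and whether the CUDA toolkit is usable.
print(torch.__version__, 'CUDA available:', torch.cuda.is_available())
# Build a lightweight built-in environment to confirm gym works.
env = gym.make('CartPole-v0')
print(env.action_space, env.observation_space)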

2. Try Gym environment

import gym
env = gym.make('CartPole-v0')
for i_episode in range(20):
    observation = env.reset() # Before start, reset the environment 
    for t in range(100):
        env.render()            
        print(observation)
        action = env.action_space.sample() # This is where your code should return action
        observation, reward, done, info = env.step(action)
        if done:
            print("Episode finished after {} timesteps".format(t+1))
            break
env.close()
  • Every environment comes with an env.action_space and an env.observation_space.
  • List all available environments: gym.envs.registry.all().
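
As a small sketch of those two points (assuming the classic gym API used above), the spaces can be inspected and sampled directly, and the registry can be queried for environment ids:

import gym

env = gym.make('CartPole-v0')
# Discrete(2): push the cart to the left or the right.
print(env.action_space, env.action_space.sample())
# Box(4,): cart position, cart velocity, pole angle, pole angular velocity.
print(env.observation_space.low, env.observation_space.high)

# List a few of the registered environment ids.
all_ids = [spec.id for spec in gym.envs.registry.all()]
print(len(all_ids), all_ids[:5])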

3. Algorithms:

Paper ranking:

  • 🏆 Must-know benchmark papers.
  • 🚀 Improved versions of the benchmark papers; come back to these after finishing the benchmark papers.
  1. Q-Learning: Introduction to RL with Q-Learning (a minimal tabular sketch follows this list)
  2. Deep Q-Learning:
  3. Actor-Critic methods:
  4. Recap and overview of RL methods:
  5. Policy Gradient:
  6. How to deal with Sparse Reward for Off-Line learning:
  7. On-Line Policy (TBD)
  8. Model-Based Learning (TBD)
  9. Multi-Agent Learning (TBD)
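
As a companion to the first topic above, the following is a minimal tabular Q-learning sketch. It only illustrates the update rule and is not the repository's implementation; it assumes the classic gym API and the FrozenLake-v0 environment id, which may differ between gym versions.

import numpy as np
import gym

env = gym.make('FrozenLake-v0')       # small discrete environment
Q = np.zeros((env.observation_space.n, env.action_space.n))
alpha, gamma, eps = 0.1, 0.99, 0.1    # learning rate, discount factor, exploration rate

for episode in range(5000):
    s = env.reset()
    done = False
    while not done:
        # Epsilon-greedy action selection.
        a = env.action_space.sample() if np.random.rand() < eps else int(Q[s].argmax())
        s_next, r, done, _ = env.step(a)
        # Q-learning update: move Q(s, a) toward the bootstrapped target
        # r + gamma * max_a' Q(s', a'), with no bootstrap on terminal states.
        target = r + gamma * Q[s_next].max() * (not done)
        Q[s, a] += alpha * (target - Q[s, a])
        s = s_next

The same bootstrapped target, with the table replaced by a neural network and transitions stored in a replay buffer, is the core idea behind the DQN covered in the Deep Q-Learning topic.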

4. Usage:

Except for the first Q-Learning tutorial, which serves as an RL introduction, all other methods can be trained with:

python tools/train.py [path/to/config.py] [--extra_args]

For example, to train a Deep Q-Learning (DQN) agent on the MountainCar environment, use:

python tools/train.py configs/DQN/dqn_mountain_car.py
