Code release for Graph Backup: Data Efficient Backup Exploiting Markovian Transitions https://arxiv.org/abs/2205.15824

ZhengyaoJiang/graphbackup

Implementation of Graph Backup: Data-Efficient Backup Exploiting Markovian Transitions

Code release for Graph Backup: Data Efficient Backup Exploiting Markovian Transitions (https://arxiv.org/abs/2205.15824).

Abstract:

The successes of deep Reinforcement Learning (RL) are limited to settings where we have a large stream of online experiences, but applying RL in the data-efficient setting with limited access to online interactions is still challenging. A key to data-efficient RL is good value estimation, but current methods in this space fail to fully utilise the structure of the trajectory data gathered from the environment. In this paper, we treat the transition data of the MDP as a graph, and define a novel backup operator, Graph Backup, which exploits this graph structure for better value estimation. Compared to multi-step backup methods such as $n$-step $Q$-Learning and TD($\lambda$), Graph Backup can perform counterfactual credit assignment and gives stable value estimates for a state regardless of which trajectory the state is sampled from. Our method, when combined with popular value-based methods, provides improved performance over one-step and multi-step methods on a suite of data-efficient RL benchmarks including MiniGrid, Minatar and Atari100K. We further analyse the reasons for this performance boost through a novel visualisation of the transition graphs of Atari games.

[Figure: introimg]

The figure above shows (a) the transition graph of an Atari game, Frostbite, and (b) the backup diagrams for different backup methods. Graph Backup exploits the graph structure of transitions to produce a value estimate.
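As a rough guide to how the backup diagrams differ, the targets can be written side by side. This is a sketch consistent with the description above, not the paper's exact definition. The notation is an assumption: $Q$ is the (target) value network, $N(\cdot)$ counts how often a transition appears in the stored data, and $G$ falls back to $Q$ for state-action pairs that were never observed or once a depth limit is reached.

$$
\begin{aligned}
\text{one-step:}\quad & y_t = r_t + \gamma \max_{a} Q(s_{t+1}, a) \\
\text{$n$-step:}\quad & y_t = \sum_{k=0}^{n-1} \gamma^{k} r_{t+k} + \gamma^{n} \max_{a} Q(s_{t+n}, a) \\
\text{graph:}\quad & G(s, a) = \sum_{(r, s')} \frac{N(s, a, r, s')}{N(s, a)} \left[ r + \gamma \max_{a'} G(s', a') \right]
\end{aligned}
$$

The one-step and $n$-step targets depend on the particular trajectory through $s_t$, whereas the graph target depends only on the state and the pooled transition counts, which is why it gives the same estimate regardless of which trajectory the state was sampled from.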

The implementation of vanilla DQN for MiniGrid and MinAtar lives in the directory gridworld and is based on https://github.com/Kaixhin/Rainbow. The implementation of Rainbow for Atari lives in the directory atari and is based on https://github.com/mila-iqia/spr.

Install

conda create -n gb python=3.9
conda activate gb
conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch
pip install -r requirements.txt
# set up Atari ROMs
cd atari
wget http://www.atarimania.com/roms/Roms.rar
unrar x Roms.rar
python -m atari_py.import_roms .
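
After the steps above, a quick way to confirm that the environment is usable is to check that the key packages can be found. The snippet below is a minimal sketch, not part of the repository; the helper name `missing_packages` is hypothetical, and the package list (`torch`, `atari_py`) is inferred from the install commands rather than from requirements.txt.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Package names inferred from the install commands above (an assumption).
print("missing:", missing_packages(["torch", "atari_py"]))
```

An empty list means both packages are discoverable by the interpreter of the active conda environment. It does not verify that the ROMs were imported.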

Usage

To run MiniGrid experiments:

cd gridworld
python core/run.py --id=T-1-1 --exp_group=T-1 --env=MiniGrid-KeyCorridorS3R1-v0 --num_steps 100000 --seed=1 --disable_noisy --disable_dist --priority-exponent=0.0 --disable_duelling --disable_noisy --distill_steps=1 --buffer_sample=uniform --initialization=distilled --multi-step=10 --backup_target=graph-limited --buffer_key=transition --branching_limit=50 --backup_target_update --discount=0.95 --learning-rate=0.001

To run MinAtar experiments:

cd gridworld
python core/run.py --id=T-2-1 --exp_group=T-2 --env=Minatar-seaquest --num_steps 100000 --seed=1 --disable_noisy --disable_dist --priority-exponent=0.0 --disable_duelling --disable_noisy --distill_steps=1 --buffer_sample=uniform --initialization=distilled --multi-step=5 --backup_target=graph-limited --buffer_key=transition --branching_limit=20 --backup_target_update --hidden-size=256 --learning-rate=0.000065 --learn-start=1600 --target-update=8000 --replay-frequency=4

To run Atari experiments:

cd atari
python scripts/run.py --game=breakout --exp_id=T-3-1 --seed=1 --num-logs=10 --spr=0 --backup=graph --augmentation none --target-augmentation 0 --momentum-tau 0.01 --n-step=10 --breath=10 --architecture=spr --learning_rate=0.0001 --limit_sample_method=uniform
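
To repeat the Atari run over several seeds, the command line can be generated rather than edited by hand. The following is a minimal sketch that only builds and prints the commands. The flags are copied from the command above; the helper `atari_command` and the `T-3-<seed>` naming pattern for `--exp_id` are assumptions, not conventions documented by the repository.

```python
import shlex

# Flags copied verbatim from the Atari command above; only --seed and --exp_id vary.
BASE_FLAGS = [
    "--game=breakout", "--num-logs=10", "--spr=0", "--backup=graph",
    "--augmentation", "none", "--target-augmentation", "0",
    "--momentum-tau", "0.01", "--n-step=10", "--breath=10",
    "--architecture=spr", "--learning_rate=0.0001",
    "--limit_sample_method=uniform",
]


def atari_command(seed, exp_group="T-3"):
    """Build the argv list for one seed (the exp_id pattern is an assumption)."""
    return ["python", "scripts/run.py",
            f"--exp_id={exp_group}-{seed}", f"--seed={seed}", *BASE_FLAGS]


commands = [atari_command(seed) for seed in (1, 2, 3)]
for argv in commands:
    # Printed only; to execute, run each from the atari directory,
    # for example with subprocess.run(argv, check=True).
    print(shlex.join(argv))
```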

For Practitioners

For practitioners who want to apply Graph Backup to their own projects or adapt it to other algorithms, we recommend checking gbsampler.py, where most of the important logic for Graph Backup is packed into a single file. This includes building the graph and using the resulting graph for value estimation.
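
To make the two steps concrete (building the graph, then using it for value estimation), here is a minimal tabular sketch. It is not the code in gbsampler.py and does not mirror its API: the class `TransitionGraph`, the function `graph_backup`, and the callable `q_fn` are hypothetical names, states are assumed to be hashable, and the branching limit used in the experiments is omitted for brevity.

```python
from collections import defaultdict


class TransitionGraph:
    """Toy transition graph over hashable states (hypothetical helper)."""

    def __init__(self):
        # counts[state][action][(reward, next_state, done)] -> times observed
        self.counts = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))

    def add(self, state, action, reward, next_state, done):
        self.counts[state][action][(reward, next_state, done)] += 1


def graph_backup(graph, q_fn, state, depth, gamma=0.99):
    """Return backed-up action values for `state`, expanding the graph to `depth`.

    Observed actions use the count-weighted average of r + gamma * max_a' G(s', a').
    Unobserved actions, unseen states, and depth 0 fall back to q_fn(state).
    """
    values = list(q_fn(state))
    if depth == 0 or state not in graph.counts:
        return values
    for action, outcomes in graph.counts[state].items():
        total = sum(outcomes.values())
        backed_up = 0.0
        for (reward, next_state, done), n in outcomes.items():
            if done:
                bootstrap = 0.0
            else:
                bootstrap = max(graph_backup(graph, q_fn, next_state, depth - 1, gamma))
            backed_up += (n / total) * (reward + gamma * bootstrap)
        values[action] = backed_up
    return values


# Two trajectories that share state "s1" but take different actions there.
g = TransitionGraph()
g.add("s0", 0, 0.0, "s1", False)  # trajectory A: s0 -> s1
g.add("s1", 0, 0.0, "s2", True)   # trajectory A ends without reward
g.add("s1", 1, 1.0, "s3", True)   # trajectory B finds the reward from s1


def zero_q(state):
    """Stand-in for a value network with two actions (assumption)."""
    return [0.0, 0.0]


target = graph_backup(g, zero_q, "s0", depth=2, gamma=0.5)
print(target)
```

In this toy example, trajectory A never reaches the reward, yet the target for `s0` picks up value through the transition that trajectory B observed at the shared state `s1`. This is the counterfactual credit assignment that a single-trajectory multi-step backup cannot provide.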

Reference

@article{jiang2022graphbackup,
  title={Graph Backup: Data Efficient Backup Exploiting Markovian Transitions},
  author={Zhengyao Jiang and Tianjun Zhang and Robert Kirk and Tim Rocktäschel and Edward Grefenstette},
  journal={arXiv preprint arXiv:2205.15824},
  year={2022},
}
