Implementation of Graph Backup: Data-Efficient Backup Exploiting Markovian Transitions

Code release for Graph Backup: Data Efficient Backup Exploiting Markovian Transitions.

Abstract:

The successes of deep Reinforcement Learning (RL) are limited to settings where we have a large stream of online experiences, but applying RL in the data-efficient setting with limited access to online interactions is still challenging. A key to data-efficient RL is good value estimation, but current methods in this space fail to fully utilise the structure of the trajectory data gathered from the environment. In this paper, we treat the transition data of the MDP as a graph, and define a novel backup operator, Graph Backup, which exploits this graph structure for better value estimation. Compared to multi-step backup methods such as $n$-step $Q$-Learning and TD($\lambda$), Graph Backup can perform counterfactual credit assignment and gives stable value estimates for a state regardless of which trajectory the state is sampled from. Our method, when combined with popular value-based methods, provides improved performance over one-step and multi-step methods on a suite of data-efficient RL benchmarks including MiniGrid, Minatar and Atari100K. We further analyse the reasons for this performance boost through a novel visualisation of the transition graphs of Atari games.

The figure above shows (a) the transition graph of an Atari game, Frostbite, and (b) the backup diagrams for different backup methods. Graph Backup exploits the graph structure of the transitions to produce a value estimate.
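As a rough guide to the backup diagrams, the three kinds of target can be written side by side. This is a sketch in generic notation rather than the paper's exact definitions: $Q$ is the bootstrapping value estimate, $\hat{p}(r, s' \mid s, a)$ is assumed to be the empirical frequency of an outcome among the stored transitions, and the graph target is assumed to fall back to $Q(s, a)$ wherever the expansion stops.

$$
\begin{aligned}
G^{(1)}(s_t, a_t) &= r_t + \gamma \max_{a} Q(s_{t+1}, a) \\
G^{(n)}(s_t, a_t) &= \sum_{k=0}^{n-1} \gamma^{k} r_{t+k} + \gamma^{n} \max_{a} Q(s_{t+n}, a) \\
G^{\text{graph}}(s, a) &= \sum_{(r, s')} \hat{p}(r, s' \mid s, a) \left[ r + \gamma \max_{a'} G^{\text{graph}}(s', a') \right]
\end{aligned}
$$

The first two targets follow a single trajectory, so the same state can receive different targets depending on where it was sampled. The third depends only on the merged graph, which is why it can pass on rewards observed in other trajectories.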

The implementation of vanilla DQN for MiniGrid and MinAtar is based on https://github.com/Kaixhin/Rainbow, under the directory gridworld. The implementation of Rainbow for Atari is based on https://github.com/mila-iqia/spr, under the directory atari.

Install

conda create -n gb python=3.9
conda activate gb
conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch
pip install -r requirements.txt
# set up the Atari ROMs
cd atari
wget http://www.atarimania.com/roms/Roms.rar
unrar x Roms.rar
python -m atari_py.import_roms .

Usage

To run MiniGrid experiments:

cd gridworld
python core/run.py --id=T-1-1 --exp_group=T-1 --env=MiniGrid-KeyCorridorS3R1-v0 --num_steps 100000 --seed=1 --disable_noisy --disable_dist --priority-exponent=0.0 --disable_duelling --disable_noisy --distill_steps=1 --buffer_sample=uniform --initialization=distilled --multi-step=10 --backup_target=graph-limited --buffer_key=transition --branching_limit=50 --backup_target_update --discount=0.95 --learning-rate=0.001

To run MinAtar experiments:

cd gridworld
python core/run.py --id=T-2-1 --exp_group=T-2 --env=Minatar-seaquest --num_steps 100000 --seed=1 --disable_noisy --disable_dist --priority-exponent=0.0 --disable_duelling --disable_noisy --distill_steps=1 --buffer_sample=uniform --initialization=distilled --multi-step=5 --backup_target=graph-limited --buffer_key=transition --branching_limit=20 --backup_target_update --hidden-size=256 --learning-rate=0.000065 --learn-start=1600 --target-update=8000 --replay-frequency=4

To run Atari experiments:

cd atari
python scripts/run.py --game=breakout --exp_id=T-3-1 --seed=1 --num-logs=10 --spr=0 --backup=graph --augmentation none --target-augmentation 0 --momentum-tau 0.01 --n-step=10 --breath=10 --architecture=spr --learning_rate=0.0001 --limit_sample_method=uniform
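To repeat a run over several seeds, the command line can be generated rather than edited by hand. The snippet below is a minimal sketch for the Atari command: the flags are copied from the example above, while the helper name `atari_commands` and the `T-3-<seed>` pattern for `--exp_id` are assumptions made for illustration, not conventions defined by this repository.

```python
import shlex

# Flags copied from the Atari example above; only --seed and --exp_id vary.
BASE_ARGS = [
    "python", "scripts/run.py", "--game=breakout", "--num-logs=10", "--spr=0",
    "--backup=graph", "--augmentation", "none", "--target-augmentation", "0",
    "--momentum-tau", "0.01", "--n-step=10", "--breath=10",
    "--architecture=spr", "--learning_rate=0.0001",
    "--limit_sample_method=uniform",
]


def atari_commands(seeds, exp_prefix="T-3"):
    """Build one argv list per seed (the exp_id naming pattern is an assumption)."""
    return [
        BASE_ARGS + [f"--exp_id={exp_prefix}-{seed}", f"--seed={seed}"]
        for seed in seeds
    ]


# Print the commands so they can be inspected before anything is launched.
for argv in atari_commands([1, 2, 3]):
    print(shlex.join(argv))
```

Each printed line is meant to be run from the atari directory. Passing an argv list to `subprocess.run(argv, check=True)` would launch the runs one after another.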

For Practitioners

Practitioners who want to apply Graph Backup to their own projects, or adapt it to other algorithms, should start with gbsampler.py, which gathers most of the core logic for Graph Backup in a single file. This includes building the transition graph and using the resulting graph for value estimation.
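To give a feel for those two steps before reading the file, here is a small tabular sketch. It does not reproduce the API of gbsampler.py: the names `TransitionGraph`, `graph_backup_target` and `zero_q` are hypothetical. It assumes hashable states, a fixed list of actions, and a Q-function used to bootstrap at leaves and unseen state-action pairs. Only the expansion depth is limited, and nothing is memoised, so it is suited to toy graphs only.

```python
from collections import Counter, defaultdict


class TransitionGraph:
    """Counts observed outcomes per (state, action); states must be hashable."""

    def __init__(self):
        self.edges = defaultdict(lambda: defaultdict(Counter))

    def add(self, state, action, reward, next_state, done):
        self.edges[state][action][(reward, next_state, done)] += 1

    def outcomes(self, state, action):
        """Return the Counter of observed (reward, next_state, done), or None."""
        return self.edges.get(state, {}).get(action)


def graph_backup_target(graph, q_fn, actions, state, action, depth, gamma):
    """Average over observed outcomes, taking a max over actions at each successor."""
    outcomes = graph.outcomes(state, action)
    if depth == 0 or not outcomes:
        return q_fn(state, action)  # leaf or unseen pair: bootstrap from q_fn
    total = sum(outcomes.values())
    target = 0.0
    for (reward, next_state, done), count in outcomes.items():
        future = 0.0
        if not done:
            future = max(
                graph_backup_target(graph, q_fn, actions, next_state, a, depth - 1, gamma)
                for a in actions
            )
        target += (count / total) * (reward + gamma * future)
    return target


def zero_q(state, action):
    """Placeholder for a learned Q-function (an assumption of this sketch)."""
    return 0.0


# Two toy trajectories that cross at state "s1"; only the second one finds the reward.
trajectory_a = [
    ("s0", "right", 0.0, "s1", False),
    ("s1", "left", 0.0, "s2", False),
    ("s2", "left", 0.0, "end", True),
]
trajectory_b = [
    ("s3", "right", 0.0, "s1", False),
    ("s1", "right", 1.0, "goal", True),
]

graph = TransitionGraph()
for transition in trajectory_a + trajectory_b:
    graph.add(*transition)

actions = ["left", "right"]
target_a = graph_backup_target(graph, zero_q, actions, "s0", "right", 3, 0.9)
target_b = graph_backup_target(graph, zero_q, actions, "s3", "right", 3, 0.9)
print(target_a, target_b)
```

Both targets come out at 0.9, although the first trajectory never observed a reward. Because the two trajectories share "s1", the reward found by the second one is propagated back to "s0", whereas a multi-step return computed along the first trajectory alone would be 0.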

Reference

@article{jiang2022graphbackup,
  title={Graph Backup: Data Efficient Backup Exploiting Markovian Transitions},
  author={Zhengyao Jiang and Tianjun Zhang and Robert Kirk and Tim Rocktäschel and Edward Grefenstette},
  journal={arXiv preprint arXiv:2205.15824},
  year={2022},
}

