MPO (Maximum a Posteriori Policy Optimization)

PyTorch implementation of MPO (works cited below) with the help of other repositories (also cited below).
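
At a high level, MPO alternates an E-step that re-weights actions by their exponentiated Q-values under a KL constraint, and an M-step that fits the parametric policy to those weights. Below is a minimal sketch of the discrete-action E-step weights; the function name, shapes, and temperature handling are illustrative assumptions, not this repository's actual code.

```python
import torch

def e_step_weights(q_values, policy_probs, eta):
    """Nonparametric MPO E-step target: q(a|s) proportional to pi(a|s) * exp(Q(s,a) / eta).

    q_values:     Q(s, a) for every discrete action, shape (batch, n_actions)
    policy_probs: current policy pi(a|s),            shape (batch, n_actions)
    eta:          temperature from the KL-constrained dual problem
    """
    weights = policy_probs * torch.exp(q_values / eta)
    # Normalize over the action dimension so each state gets a proper distribution
    return weights / weights.sum(dim=-1, keepdim=True)
```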

Policy evaluation is done using Retrace.
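
Retrace builds off-policy Q targets by correcting multi-step returns with truncated importance weights c_t = lambda * min(1, pi(a_t|s_t) / mu(a_t|s_t)). A minimal backward-recursive sketch for a single trajectory follows; the names and signature are illustrative, not the repository's exact implementation.

```python
import torch

def retrace_targets(q_taken, v_next, rewards, ratios, dones, gamma=0.99, lam=1.0):
    """Backward-recursive Retrace targets for one trajectory of length T.

    q_taken: Q(s_t, a_t)                         shape (T,)
    v_next:  E_{a~pi} Q(s_{t+1}, a)              shape (T,)
    rewards: r_t                                 shape (T,)
    ratios:  pi(a_t | s_t) / mu(a_t | s_t)       shape (T,)
    dones:   1.0 where the episode terminates    shape (T,)
    """
    c = lam * torch.clamp(ratios, max=1.0)  # truncated importance weights
    targets = torch.zeros_like(rewards)
    correction = 0.0                        # c_{t+1} * (Q_ret_{t+1} - Q_{t+1}); zero past the horizon
    for t in reversed(range(rewards.shape[0])):
        not_done = 1.0 - dones[t]
        # Q_ret(t) = r_t + gamma * [E_pi Q(s_{t+1}, .) + c_{t+1} * (Q_ret(t+1) - Q(s_{t+1}, a_{t+1}))]
        targets[t] = rewards[t] + gamma * not_done * (v_next[t] + correction)
        correction = c[t] * (targets[t] - q_taken[t])
    return targets
```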

Currently, only gym environments with discrete action spaces are supported.

Usage

Look at main.py for examples of using MPO.

The architectures for Actor and Critic can be changed in mpo_net.py.
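For orientation, an Actor/Critic pair for discrete action spaces might look like the sketch below; the layer sizes and structure are assumptions for illustration, so check mpo_net.py for the actual definitions.

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Policy network: maps an observation to a categorical distribution over actions."""
    def __init__(self, obs_dim, n_actions, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, obs):
        return torch.distributions.Categorical(logits=self.net(obs))

class Critic(nn.Module):
    """Q-network: maps an observation to Q(s, a) for every discrete action."""
    def __init__(self, obs_dim, n_actions, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, obs):
        return self.net(obs)
```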

Citations

  • Maximum a Posteriori Policy Optimisation (Original MPO algorithm)

https://arxiv.org/abs/1806.06920

  • Relative Entropy Regularized Policy Iteration (Improved MPO algorithm)

https://arxiv.org/abs/1812.02256

  • daisatojp's mpo github repository (MPO implementation as reference)

https://github.com/daisatojp/mpo

  • Openai's ACER github repository (Replay buffer implementation as reference)

https://github.com/openai/baselines/tree/master/baselines/acer

Training Results

LunarLander-v2 (result plot: mpo_on_LunarLanderV2)

  • 5 parallel environments

Acrobot-v1 (result plot: mpo_on_AcrobotV1)

  • 5 parallel environments
