Fixup Policy Optimization (FixPO)

This repo contains the code for "Guaranteed Trust Region Optimization via Two-Phase KL Penalization".

It implements an efficient trust region optimization algorithm, FixPO.

Setup Instructions

Install Python >=3.8

Install Poetry, the Python dependency manager:

curl -sSL | python3 -

Install MuJoCo 2.1:

mkdir -p $HOME/.mujoco && \
curl -O && \
tar xf mujoco210-linux-x86_64.tar.gz --directory $HOME/.mujoco

echo 'export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HOME/.mujoco/mujoco210/bin' >> ~/.bashrc
source ~/.bashrc
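
A quick way to confirm the MuJoCo steps above took effect is to check that the extracted `mujoco210/bin` directory exists and that `LD_LIBRARY_PATH` includes it. The following is a minimal sketch, not part of this repo; the helper name `check_mujoco_setup` is hypothetical, and it assumes the archive was unpacked to `$HOME/.mujoco/mujoco210` as in the commands above.

```python
import os
from pathlib import Path


def check_mujoco_setup(home, ld_library_path):
    """Return a list of problems found; an empty list means the setup looks fine.

    Hypothetical helper: it only inspects the paths used in the setup
    commands above and does not try to load MuJoCo itself.
    """
    problems = []
    bin_dir = Path(home) / ".mujoco" / "mujoco210" / "bin"

    # The tar command should have created this directory.
    if not bin_dir.is_dir():
        problems.append(f"missing directory: {bin_dir}")

    # The line appended to ~/.bashrc should have added it to LD_LIBRARY_PATH.
    entries = [entry for entry in ld_library_path.split(os.pathsep) if entry]
    if str(bin_dir) not in entries:
        problems.append(f"LD_LIBRARY_PATH does not include {bin_dir}")

    return problems


if __name__ == "__main__":
    issues = check_mujoco_setup(Path.home(), os.environ.get("LD_LIBRARY_PATH", ""))
    for issue in issues:
        print(issue)
    if not issues:
        print("MuJoCo 2.1 setup looks OK")
```

If the second check fails in a shell that was already open before the install, re-running `source ~/.bashrc` or opening a new shell is usually enough.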

Install dependencies:

pip install --upgrade --user pip
poetry install

Alternatively, use

There are subdirectories containing tianshou v0.5.0, Meta-World v2.0.0, and trust-region-layers in this repo that are patched to be compatible with each other. See patches if you would like to apply these changes yourself.

Most experiments are listed in and use code in src.

To run those experiments:

poetry run doexp

File Overview

Defines all of the experiments to run that use tianshou.

Defines all of the experiments to run that use trust_region_layers.

doexp: Runs all experiments defined in

src/ Main algorithm code.

src/ Launcher for FixPO with Gym environments.

src/ Launcher for FixPO with Meta-World environments.

src/ Launcher for PPO with Gym environments.

src/ Launcher for TRPO with Gym environments.

src/ Launcher for PPO with Meta-World environments.

src/ Helper function for setting up Meta-World environments.

src/ Helper function for setting up Meta-World environments.

patches/tianshou.patch: Patch to Tianshou v0.5.0 that allows running Meta-World experiments.

patches/metaworld.patch: Patch to Meta-World v2.0.0 that allows running Meta-World experiments.

patches/trust_region_layers.patch: Patch to trust_region_layers that allows running Meta-World experiments.

patches/mujoco_kl_config.json: Base config to use the KL projection proposed in trust_region_layers.


If you use this code in your research, please cite:

  title={Guaranteed Trust Region Optimization via Two-Phase KL Penalization},
  author={K. R. Zentner and Ujjwal Puri and Zhehui Huang and Gaurav S. Sukhatme},