Code to complement "Ready Policy One: World Building through Active Learning".
Trains an agent inside an ensemble of dynamics models.
Configuration files for every experiment in the paper live in the args_yml
directory. On the machines we trained on, we could run 5 seeds concurrently, hence the top-level script run_experiments.py
launches 5 at once, with a binary flag that toggles between seeds 0-4 and 5-9.
To run the HalfCheetah Ready Policy One experiments for seeds 5-9, run:

python run_experiments.py --yaml ./args_yml/main_exp/halfcheetah-rp1.yml --seeds5to9

Omit the --seeds5to9 flag to run seeds 0-4 instead.
If you use this code in your research, please cite the paper:

@article{rpone2020,
  title={Ready Policy One: World Building Through Active Learning},
  author={Ball, Philip and Parker-Holder, Jack and Pacchiano, Aldo and Choromanski, Krzysztof and Roberts, Stephen},
  journal={Proceedings of the 37th International Conference on Machine Learning},
  year={2020}
}
Two things to be aware of: 1) the code is not parallelised; 2) it tries to find GPUs where possible, so if GPU usage causes problems, try forcing it to run on CPU.
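As a sketch of how to force CPU execution (this uses the standard CUDA_VISIBLE_DEVICES environment variable, which is not specific to this repo), you can hide all GPUs before launching:

```shell
# Hide all GPUs from CUDA-aware frameworks such as PyTorch by clearing
# the standard CUDA_VISIBLE_DEVICES environment variable.
export CUDA_VISIBLE_DEVICES=""
# The training launch (shown above) would then fall back to CPU, e.g.:
# python run_experiments.py --yaml ./args_yml/main_exp/halfcheetah-rp1.yml --seeds5to9
echo "CUDA_VISIBLE_DEVICES='${CUDA_VISIBLE_DEVICES}'"
```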
The authors acknowledge Nikhil Barhate for his PPO-PyTorch repo; the ppo.py
file here is a heavily modified version of that code.