This is the official codebase for the ICML 2023 paper
Abstract-to-Executable Trajectory Translation for One-Shot Task Generalization by
Stone Tao, Xiaochen Li, Tongzhou Mu, Zhiao Huang, Yuzhe Qin, Hao Su
For visualizations and videos see our project page: https://trajectorytranslation.github.io/. For full details, check out our paper: https://arxiv.org/abs/2210.07658
To get started, install the repo with conda as follows:
conda env create -f environment.yml
conda activate tr2
And then run
pip install -e ./paper_rl/
pip install -e .
pip install -e external/ManiSkill2
Due to compatibility and dependency issues, we are still cleaning up the setup for the opendrawer environment (which uses ManiSkill 1). For now, you can follow the steps above and then update the conda environment with the ManiSkill 1 dependencies. Check back for updates or watch this repo.
Our approach relies on following abstract trajectories. Abstract trajectories are easily generated via heuristics that just move 3D points representing objects in space, describing a general plan of what should be achieved by a low-level agent (e.g. the robot arm) without incorporating low-level details like physical manipulation. During RL training, these abstract trajectories are loaded up and given as part of the environment observation.
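As a rough illustration of what such a heuristic can look like, the sketch below builds an abstract trajectory for a single 3D point by lifting it, moving it above a goal, and lowering it. The function names, the waypoint scheme, and the parameters `lift_height` and `steps_per_segment` are assumptions made for this example; they are not taken from the scripts in this repo.

```python
from typing import List, Sequence, Tuple

Point3D = Tuple[float, ...]


def interpolate_points(start: Sequence[float], goal: Sequence[float], num_steps: int) -> List[Point3D]:
    """Return num_steps + 1 evenly spaced points from start to goal (inclusive)."""
    if num_steps < 1:
        raise ValueError("num_steps must be at least 1")
    points = []
    for i in range(num_steps + 1):
        t = i / num_steps
        points.append(tuple(s + t * (g - s) for s, g in zip(start, goal)))
    return points


def pick_and_place_abstract_trajectory(
    block_pos: Sequence[float],
    goal_pos: Sequence[float],
    lift_height: float = 0.1,        # hypothetical parameter
    steps_per_segment: int = 4,      # hypothetical parameter
) -> List[Point3D]:
    """Heuristic plan for one point: lift it, move it above the goal, lower it."""
    above_start = (block_pos[0], block_pos[1], block_pos[2] + lift_height)
    above_goal = (goal_pos[0], goal_pos[1], goal_pos[2] + lift_height)
    waypoints = [tuple(block_pos), above_start, above_goal, tuple(goal_pos)]
    trajectory = [waypoints[0]]
    for a, b in zip(waypoints, waypoints[1:]):
        # Skip the first point of each segment; it duplicates the previous end point.
        trajectory.extend(interpolate_points(a, b, steps_per_segment)[1:])
    return trajectory


trajectory = pick_and_place_abstract_trajectory((0.0, 0.0, 0.0), (0.4, 0.2, 0.0))
print(len(trajectory), trajectory[0], trajectory[-1])
```

Note that the plan only describes where the object should go. It says nothing about grasping, contact, or joint control, which is the part left to the low-level agent.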
Follow the subsequent sections for instructions on obtaining abstract trajectories, training with them, and evaluating with them.
The dataset files can all be found at this Google Drive link: https://drive.google.com/file/d/1z38DTgzmTc2mfePYnP9qNDUfGgN80FYH/view?usp=sharing
Download the archive and unzip it into a folder called datasets
for the rest of the code to work.
To generate the abstract trajectories for each environment, see the scripts in scripts/abstract_trajectories/<env_name>
To train with online RL, specify a base configuration yml file and an experiment name:
python scripts/train_translation_online.py \
cfg=train_cfg.yml restart_training=True logging_cfg.exp_name=test_exp exp_cfg.epochs=2000
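The command takes its settings as key=value pairs, where a dotted key such as exp_cfg.epochs addresses a nested entry of the configuration. The sketch below shows one way such overrides could be merged into a nested dictionary. It is only an illustration of the convention: `parse_value`, `apply_overrides`, and the placeholder base values are assumptions made for this example, and the training script may parse its arguments differently.

```python
from typing import Any, Dict, List


def parse_value(raw: str) -> Any:
    """Convert a command-line string into bool, int, or float; otherwise keep it as str."""
    if raw in ("True", "False"):
        return raw == "True"
    for cast in (int, float):
        try:
            return cast(raw)
        except ValueError:
            pass
    return raw


def apply_overrides(config: Dict[str, Any], overrides: List[str]) -> Dict[str, Any]:
    """Apply 'a.b=value' style overrides to a nested dict in place and return it."""
    for item in overrides:
        dotted_key, raw_value = item.split("=", 1)
        *parents, leaf = dotted_key.split(".")
        node = config
        for name in parents:
            # Create intermediate dictionaries on demand.
            node = node.setdefault(name, {})
        node[leaf] = parse_value(raw_value)
    return config


# Placeholder base values, standing in for what a train_cfg.yml might contain.
base_cfg = {"exp_cfg": {"epochs": 100}, "logging_cfg": {"exp_name": "default"}}
cfg = apply_overrides(
    base_cfg,
    ["restart_training=True", "logging_cfg.exp_name=test_exp", "exp_cfg.epochs=2000"],
)
print(cfg)
```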
Results, including saved model checkpoints and evaluation videos, are stored in a results
folder. Note that results/<exp_name>/models/best_train_EpRet.pt
will be the model with the best training return.
To achieve greater precision and a higher success rate, you can run a "finetuning" step that turns on gradient accumulation to stabilize RL training. This was used in the paper to train agents for the Blockstacking task. To do so, run the following, specifying the initial weights (from the initial online training):
python scripts/train_translation_online.py \
cfg=train_cfg.yml restart_training=True logging_cfg.exp_name=test_exp_finetune exp_cfg.epochs=2000 \
pretrained_ac_weights=results/test_exp/models/best_train_EpRet.pt exp_cfg.accumulate_grads=True
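For readers unfamiliar with gradient accumulation, the sketch below shows the general idea on a toy one-parameter regression problem: gradients from several micro-batches are summed and applied in a single update, which matches the update from one larger batch. This is a generic illustration in plain Python, not the code path that exp_cfg.accumulate_grads enables; the model, data, and function names are assumptions made for this example.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (input x, target y)


def mse_grad(weight: float, batch: List[Sample]) -> float:
    """Gradient of mean((weight * x - y)^2) with respect to weight."""
    return sum(2.0 * (weight * x - y) * x for x, y in batch) / len(batch)


def accumulated_step(weight: float, micro_batches: List[List[Sample]], lr: float) -> float:
    """Accumulate gradients over all micro-batches, then apply a single update."""
    total_samples = sum(len(batch) for batch in micro_batches)
    grad_sum = 0.0
    for batch in micro_batches:
        # Weight each micro-batch by its size so the total matches one large batch.
        grad_sum += mse_grad(weight, batch) * len(batch)
    return weight - lr * grad_sum / total_samples


data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
micro_batches = [data[:2], data[2:]]

w_accumulated = accumulated_step(0.0, micro_batches, lr=0.01)
w_full_batch = 0.0 - 0.01 * mse_grad(0.0, data)
print(w_accumulated, w_full_batch)
```

Averaging over more samples per update reduces the variance of each gradient step, which is the usual reason accumulation helps stabilize training.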
For each environment, there is an associated train_cfg.yml
file that specifies the base hyperparameters for online RL training and environment configs. These are stored at cfgs/<env_name>/train.yml
To batch evaluate trained models, specify the configuration file and the model weights:
python scripts/eval_translation.py \
cfg=eval_cfg.yml model=results/test_exp/models/best_train_EpRet.pt
To simply watch the trained model, specify the configuration file, the model weights, and the ID of the trajectory
python scripts/watch_translation.py \
cfg=watch_cfg.yml model=results/test_exp/models/best_train_EpRet.pt traj_id=2
For each environment, there is an associated config file for evaluation and watching. These are stored at cfgs/<env_name>/<eval|watch>.yml
To reproduce Table 1 in our paper, see the scripts in scripts/exps/<env_name>/*.sh. These are copy-and-pastable bash scripts that reproduce the individual results of each trial used to compute the mean values shown in Table 1, including training and evaluation.
Pretrained models and weights can be downloaded here: https://drive.google.com/file/d/15mTVSWTdX805EO1XGNBG20BE80BKBkah/view?usp=sharing. They are organized as results/<env_name>/<model>
We are still cleaning and organizing results for the other non-core environments that were tested, as well as for one of the ablation studies. Stay tuned for updates by watching this repository.
Open-sourced code for the real-world experiments is a work in progress, but here is a high-level overview: we first predict the pose of a block in the real world, place it in simulation, and run our trained blockstacking TR2-GPT2 agent to generate a simulated trajectory. Using position control, we execute the simulated trajectory step by step on the real robot arm. Then we place a new block into view and repeat the steps until done.
This part is still a work in progress, as we are cleaning out old research and experimental code to make extending the environments easier. In general, however, you can subclass the TrajectoryEnv class, which lets you load abstract trajectories, stack observations, skip sampling, and more. See the existing environments for examples of how to do this (BoxPusher is a simple, generally cleaner example).
To cite our work, you can use the following bibtex
@inproceedings{tao2023tr2,
title = {Abstract-to-Executable Trajectory Translation for One-Shot Task Generalization},
author = {Tao, Stone and Li, Xiaochen and Mu, Tongzhou and Huang, Zhiao and Qin, Yuzhe and Su, Hao},
booktitle = {Fortieth International Conference on Machine Learning},
year = {2023},
}