We use the Franka Panda robot model with operational space control, and train with a task-specific dense reward. All agents receive a 168×168 egocentric RGB view as input. The positions of movable objects are randomized in each episode.
- Door opening: a robot arm must turn the handle and open the door in front of it.
- Nut assembly: two colored pegs (one square, one round) are mounted on the tabletop. The robot must fit the round nut onto the round peg.
- Two-arm lifting: two arms on opposite ends must each grab a handle of a large pot and lift it above a certain height.
- Peg-in-hole: one arm holds a board with a square hole in the center, and the other holds a long peg. The two arms must coordinate to insert the peg into the hole.
All agents are trained with a clean background and objects, and evaluated on 3 progressively harder sets of environments. We design 10 variations for each task and difficulty level, and report the mean reward over 100 evaluation episodes (10 per variation). SECANT gains, on average, +287.5% more reward on the easy set, +374.3% on the hard set, and +351.6% on the extreme set over the best prior method.
Please refer to Installation. Use `secant.envs.robosuite.make_robosuite()` to create a standardized Robosuite Gym environment:
```python
from secant.envs.robosuite import make_robosuite

env = make_robosuite(
    task="Door",
    mode="train",
    scene_id=0,
)

env.reset()
done = False
while not done:
    action = env.action_space.sample()
    obs, reward, done, info = env.step(action)
```
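Beyond the single training environment above, evaluation covers multiple modes and scenes. As a sketch of how an evaluation sweep could be enumerated (assuming the documented convention that `scene_id` is `0` for training and ranges over `0`–`9` for each evaluation mode; `eval_configs` is a hypothetical helper, not part of the SECANT API):

```python
# Hypothetical helper: enumerate make_robosuite(...) kwargs for a full
# evaluation sweep over one task. Assumes scene_id 0-9 per eval mode.
EVAL_MODES = ["eval-easy", "eval-hard", "eval-extreme"]

def eval_configs(task):
    """Yield keyword-argument dicts, one per evaluation scene."""
    for mode in EVAL_MODES:
        for scene_id in range(10):
            yield {"task": task, "mode": mode, "scene_id": scene_id}

configs = list(eval_configs("Door"))
# In a real sweep, each entry would be expanded as make_robosuite(**cfg).
```

This yields 30 configurations per task (3 modes × 10 scenes), matching the 10 variations per difficulty level described above.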
Important Notes:

- `task` can be set to one of `["Door", "TwoArmPegInHole", "NutAssemblyRound", "TwoArmLift"]`.
- `mode` can be set to one of `["train", "eval-easy", "eval-hard", "eval-extreme"]`.
- `scene_id` is `0` for the training mode and is in a range of `0` to `9` for the evaluation modes.
- Robosuite supports both image and robot state modalities. Pass `["rgb"]` or `["state"]` to `obs_modality` to turn on either mode, or pass `["rgb", "state"]` to turn on both. For the image modality, robosuite supports different viewpoints such as `frontview`, `sideview`, `birdview`, `agentview`, `robotview`, and `eye_in_hand`. To control which view(s) the environment returns, pass the name(s) of the desired views to `obs_cameras`, e.g. `obs_cameras=["agentview", "frontview"]`.
- The created environment instance has the properties `observation_space` and `action_space`. Please refer to OpenAI Gym's API. Because both rgb and state modalities are supported, the observation space is a dictionary of `gym.Box` objects rather than a single `gym.Box` object. Observations have the form `{"rgb": {"frontview": np.array, "sideview": np.array}, "state": np.array}` if multiple views are requested, or `{"rgb": np.array, "state": np.array}` if only one view is requested.
- Set `headless=True` if no human rendering is needed. Use `render_camera` to set the camera name to use when `env.render()` is called. Note that when the image modality is turned on, `render_camera` has to be included in `obs_cameras`.
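Because the `"rgb"` entry of an observation is a bare array for one view but a nested dict for several, downstream code may want to normalize it. A minimal sketch of such handling, using dummy arrays in the two documented layouts (`extract_rgb_views` is a hypothetical helper, not part of the SECANT API):

```python
import numpy as np

def extract_rgb_views(obs):
    """Return {view_name: image} whether one or many views were requested
    (hypothetical helper illustrating the documented observation layouts)."""
    rgb = obs["rgb"]
    if isinstance(rgb, dict):   # multiple views: nested dict of arrays
        return dict(rgb)
    return {"rgb": rgb}         # single view: bare array

# Dummy observations mimicking the two layouts described above.
multi = {
    "rgb": {"frontview": np.zeros((168, 168, 3)),
            "sideview": np.zeros((168, 168, 3))},
    "state": np.zeros(10),
}
single = {"rgb": np.zeros((168, 168, 3)), "state": np.zeros(10)}

views_multi = extract_rgb_views(multi)    # keys: frontview, sideview
views_single = extract_rgb_views(single)  # single key: "rgb"
```

This keeps agent code agnostic to how many cameras were passed to `obs_cameras`.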