Commit d94666c: add pose_estimation; add sample data and demo code

andyzeng committed Sep 18, 2016
1 parent 419d595
Showing 100 changed files with 2,212 additions and 849 deletions.
82 changes: 68 additions & 14 deletions README.md
@@ -2,13 +2,55 @@



## Quick Start: Matlab Demo
Estimates 6D object poses on the sample scene data (in `data/sample`) using pre-computed object segmentation results from the [Deep Learning FCN ROS Package](#deep-learning-fcn-ros-package).
1. Navigate to `pose_estimation/src`, edit the file paths and options at the top of `demo.m`, then run `demo.m` in Matlab (see the step-by-step demo under [6D Pose Estimation ROS Package](#6d-pose-estimation-ros-package) below).


## Documentation
* [6D Pose Estimation ROS Package](#6d-pose-estimation-ros-package)
* [Realsense Standalone](#realsense-standalone)
* [Realsense ROS Package](#realsense-ros-package)
* [Deep Learning FCN ROS Package](#deep-learning-fcn-ros-package)
* [FCN Training with Marvin](#fcn-training-with-marvin)
* [Evaluation Code](#evaluation-code)

## 6D Pose Estimation ROS Package
A Matlab ROS package for estimating 6D object poses by model-fitting with ICP over RGB-D object segmentation results.
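For reference, point-to-point ICP alternates between nearest-neighbor correspondence search (the role of the `KNNSearch.cu` kernel compiled below) and solving for the rigid transform that best aligns the matched points. A standard formulation of the inner step (the package's exact variant may differ) is

$$
(R^{*}, t^{*}) = \arg\min_{R \in SO(3),\; t \in \mathbb{R}^{3}} \sum_{i} \lVert R\,p_{i} + t - q_{i} \rVert_{2}^{2}
$$

where $p_i$ are object model points and $q_i$ are their nearest neighbors in the segmented scene point cloud.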

### Dependencies
1. [Deep Learning FCN ROS Package](#deep-learning-fcn-ros-package) and all of its dependencies.
2. Recommended: Matlab 2015b or later

### Compilation
1. Copy the ROS package `ros_packages/.../pose_estimation` into your catkin workspace source directory (e.g. `catkin_ws/src`)
2. Follow the instructions on the top of `pose_estimation/src/make.m` to compile ROS custom messages for Matlab
3. Compile the GPU CUDA kernel in `pose_estimation/src` (these steps are condensed in the sketch after the block below):
```shell
nvcc -ptx KNNSearch.cu
```
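Taken together, a minimal sketch of the compilation flow, assuming a catkin workspace at `~/catkin_ws` (all paths below are hypothetical placeholders):

```shell
# Step 1: copy the package into the catkin workspace source directory
# (fill in the path to pose_estimation within this repository)
cp -r /path/to/pose_estimation ~/catkin_ws/src/

# Step 2 happens inside Matlab: follow the instructions at the top of
# pose_estimation/src/make.m to compile the ROS custom messages

# Step 3: compile the CUDA k-nearest-neighbors kernel into a PTX file,
# which Matlab can load at runtime (e.g. via parallel.gpu.CUDAKernel)
cd ~/catkin_ws/src/pose_estimation/src
nvcc -ptx KNNSearch.cu
```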

### Usage
* Start `roscore`
* To start the pose estimation service, run `pose_estimation/src/startService.m` (see the launch sketch after this list). At each call (see the service request format described in `pose_estimation/srv/EstimateObjectPose.srv`), the service:
  * Calibrates the camera poses of the scene using the calibration data
  * Performs 3D background subtraction
  * Estimates the 6D pose of each object in the scene via model-fitting
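A minimal launch sketch, assuming Matlab is on your `PATH` (the package path is a hypothetical placeholder):

```shell
# Terminal 1: start the ROS master
roscore

# Terminal 2: launch Matlab headlessly and start the service
cd /path/to/pose_estimation/src
matlab -nodesktop -r "startService"
```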

### Demo
1. Install all dependencies and compile this package
2. Start `roscore` in a terminal
3. Create a temporary directory to be used by marvin_convnet for reading RGB-D data and saving segmentation masks
   * `mkdir /path/to/your/data/tmp`
4. `rosrun marvin_convnet detect _read_directory:="/path/to/your/data/tmp"`
5. Navigate to `pose_estimation/src`
6. Edit the file paths and options at the top of `demo.m`
7. Open Matlab and run (the full sequence is condensed in the sketch after this list):
```matlab
startService
demo
```
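The demo steps above, condensed into one hypothetical shell session (edit `demo.m` beforehand, and run the final Matlab commands in the Matlab window that opens):

```shell
roscore &                                 # step 2: ROS master in the background
mkdir -p /path/to/your/data/tmp           # step 3: temporary RGB-D directory
rosrun marvin_convnet detect _read_directory:="/path/to/your/data/tmp" &  # step 4
cd /path/to/pose_estimation/src           # step 5
matlab                                    # step 7: then run startService and demo
```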

## Realsense Standalone

A standalone C++ executable for streaming and capturing data (RGB-D frames and 3D point clouds) in real-time using [librealsense](https://github.com/IntelRealSense/librealsense). Tested on Ubuntu 14.04 and 16.04 with an Intel® RealSense™ F200 Camera.
@@ -63,7 +105,7 @@ See `ros-packages/realsense_camera`
* Used for saving point clouds

### Compilation
1. Copy the ROS package `ros_packages/.../realsense_camera` into your catkin workspace source directory (e.g. `catkin_ws/src`)
2. If necessary, configure `realsense_camera/CMakeLists.txt` according to your respective dependencies
3. In your catkin workspace, compile the package with `catkin_make`
4. Source `devel/setup.sh` (see the condensed sketch below)
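The compilation steps, condensed (the source path is a hypothetical placeholder):

```shell
cp -r /path/to/realsense_camera ~/catkin_ws/src/   # copy the package
cd ~/catkin_ws
catkin_make                                        # build the workspace
source devel/setup.sh                              # overlay the workspace environment
```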
@@ -105,26 +147,37 @@ sudo cp cuda/include/* /usr/local/cudnn/v5/include/
* Used for saving images

### Compilation
1. Copy the ROS package `ros_packages/.../marvin_convnet` into your catkin workspace source directory (e.g. `catkin_ws/src`)
2. If necessary, configure `marvin_convnet/CMakeLists.txt` according to your respective dependencies
3. In your catkin workspace, compile the package with `catkin_make`
4. Source `devel/setup.sh`

### Usage
* Navigate to `models/competition/` and run bash script `./download_weights.sh` to download our trained weights for object segmentation (trained on our [training dataset](http://www.cs.princeton.edu/~andyz/apc2016))
* Edit `marvin_convnet/src/detect.cu`: towards the top of the file, specify the filepath to the network architecture .json file and the .marvin weights file.
* Create a folder called `tmp` in `apc-vision-toolbox/data` (e.g. `apc-vision-toolbox/data/tmp`). This is where marvin_convnet will read/write RGB-D data. The data in `tmp` follows the format of the scenes in our [datasets](http://www.cs.princeton.edu/~andyz/apc2016) and of the data saved by [Realsense Standalone](#realsense-standalone).
* marvin_convnet offers two services: `save_images` and `detect`. The former retrieves RGB-D data from the [Realsense ROS Package](#realsense-ros-package) and writes it to disk in the `tmp` folder; the latter reads RGB-D data from the `tmp` folder, feeds it forward through the FCN, and saves the resulting response images to disk.
* To start the RGB-D data saving service, run:

```shell
rosrun marvin_convnet save_images _write_directory:="/path/to/your/data/tmp" _camera_service_name:="/realsense_camera"
```

* To start the FCN service, run:

```shell
rosrun marvin_convnet detect _read_directory:="/path/to/your/data/tmp" _service_name:="/marvin_convnet"
```

* Example ROS service call to run object segmentation for the glue bottle and the Expo board eraser (assuming the scene's RGB-D data is in the `tmp` folder):
```shell
rosservice call /marvin_convnet ["elmers_washable_no_run_school_glue","expo_dry_erase_board_eraser"] 0 0
```

## FCN Training with Marvin

Code and models for training object segmentation using [FCNs (Fully Convolutional Networks)](https://arxiv.org/abs/1411.4038) with [Marvin](http://marvin.is/), a lightweight GPU-only neural network framework. Includes network architecture .json files in `convnet-training/models` and a Marvin data layer in `convnet-training/apc.hpp` that randomly samples RGB-D images (RGB and HHA) from our [segmentation training dataset](http://www.cs.princeton.edu/~andyz/apc2016).

See `convnet-training`

@@ -146,19 +199,20 @@ sudo cp cuda/include/* /usr/local/cudnn/v5/include/
* Used for reading images

### Setup Instructions
1. Download our [segmentation training dataset](http://www.cs.princeton.edu/~andyz/apc2016)
2. Navigate to directory `convnet-training/`
3. Specify the training dataset filepath in the APCData layer of the network architecture in `models/rgb-fcn/train_shelf_color.json`
4. Navigate to `models/weights/` and run bash script `./download_weights.sh` to download VGG weights pre-trained on ImageNet (see [Marvin](http://marvin.is/) for more pre-trained weights)
5. Back in `convnet-training/`, run `./compile.sh` in the terminal to compile Marvin
6. Run `./marvin train models/rgb-fcn/train_shelf_color.json models/weights/vgg16_imagenet_half.marvin` in the terminal to train a segmentation model on RGB-D data with objects in the shelf; for objects in the tote, use the network architecture `models/rgb-fcn/train_tote_color.json` (these steps are condensed in the sketch below)
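A condensed sketch of the steps above, run from the repository root (the training dataset filepath inside the .json is whatever you set in step 3):

```shell
cd convnet-training
(cd models/weights && ./download_weights.sh)   # fetch VGG ImageNet weights
./compile.sh                                   # compile Marvin
./marvin train models/rgb-fcn/train_shelf_color.json models/weights/vgg16_imagenet_half.marvin
```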

## Evaluation Code
Code used to perform the experiments in the paper; tests the full vision system on the 'Shelf & Tote' benchmark dataset.

See `evaluation`

### Setup Instructions
1. Download our 'Shelf & Tote' benchmark dataset from [here](http://www.cs.princeton.edu/~andyz/apc2016) and extract its contents to `apc-vision-toolbox/data/benchmark` (e.g. `apc-vision-toolbox/data/benchmark/office`, `apc-vision-toolbox/data/benchmark/warehouse`, etc.)
2. In `evaluation/getError.m`, change the variable `benchmarkPath` to point to the filepath of your benchmark dataset directory
3. We have provided our vision system's predictions in a saved Matlab .mat file `evaluation/predictions.mat`. To compute the accuracy of these predictions against the ground truth labels of the 'Shelf & Tote' benchmark dataset, run `evaluation/getError.m` (see the sketch below)
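A minimal sketch for running the evaluation headlessly, assuming Matlab is on your `PATH` and `benchmarkPath` has been edited as described above:

```shell
cd evaluation
matlab -nodesktop -r "getError; exit"
```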

90 changes: 90 additions & 0 deletions data/sample/calibration/shelf/cam.poses.A.txt
@@ -0,0 +1,90 @@
# Camera-to-camera extrinsic matrix (camera pose) from frame-000000 to frame-000007
9.06193058e-01 3.42857864e-02 -4.21471976e-01 1.54523532e-01
-9.89394920e-02 9.86232759e-01 -1.32498762e-01 4.56576146e-02
4.11126646e-01 1.61769681e-01 8.97109498e-01 4.00954915e-02
0.00000000e+00 0.00000000e+00 0.00000000e+00 1.00000000e+00

# Camera-to-camera extrinsic matrix (camera pose) from frame-000001 to frame-000007
9.74033990e-01 1.72623355e-02 -2.25742770e-01 8.42798845e-02
-5.44936063e-02 9.85651131e-01 -1.59756985e-01 5.61639449e-02
2.19745838e-01 1.67910271e-01 9.60998391e-01 1.65755026e-02
0.00000000e+00 0.00000000e+00 0.00000000e+00 1.00000000e+00

# Camera-to-camera extrinsic matrix (camera pose) from frame-000002 to frame-000007
9.99941171e-01 9.65149581e-03 4.95001832e-03 -6.76798065e-04
-8.57582876e-03 9.82882741e-01 -1.84032532e-01 6.82766391e-02
-6.64147679e-03 1.83979255e-01 9.82907689e-01 8.37727195e-03
0.00000000e+00 0.00000000e+00 0.00000000e+00 1.00000000e+00

# Camera-to-camera extrinsic matrix (camera pose) from frame-000003 to frame-000007
9.67112011e-01 6.05468410e-03 2.54278781e-01 -1.01311126e-01
3.73703670e-02 9.85484988e-01 -1.65598289e-01 6.27228817e-02
-2.51590567e-01 1.69654586e-01 9.52848103e-01 2.41726814e-02
0.00000000e+00 0.00000000e+00 0.00000000e+00 1.00000000e+00

# Camera-to-camera extrinsic matrix (camera pose) from frame-000004 to frame-000007
8.85187346e-01 9.49438568e-03 4.65137849e-01 -1.78356751e-01
6.75713285e-02 9.86566465e-01 -1.48730377e-01 5.92754124e-02
-4.60301507e-01 1.63084230e-01 8.72654603e-01 5.42887834e-02
0.00000000e+00 0.00000000e+00 0.00000000e+00 1.00000000e+00

# Camera-to-camera extrinsic matrix (camera pose) from frame-000005 to frame-000007
9.08533859e-01 -9.44479893e-02 4.06996074e-01 -1.42541407e-01
9.68736841e-02 9.95188207e-01 1.46942527e-02 5.83083812e-03
-4.06425536e-01 2.60769830e-02 9.13311707e-01 3.42312243e-02
0.00000000e+00 0.00000000e+00 0.00000000e+00 1.00000000e+00

# Camera-to-camera extrinsic matrix (camera pose) from frame-000006 to frame-000007
9.75567915e-01 -4.42249756e-02 2.15200824e-01 -7.68780799e-02
4.49305673e-02 9.98988808e-01 1.61447263e-03 7.44182428e-03
-2.15054615e-01 8.09406742e-03 9.76568481e-01 1.14108454e-02
0.00000000e+00 0.00000000e+00 0.00000000e+00 1.00000000e+00

# Camera-to-camera extrinsic matrix (camera pose) from frame-000007 to frame-000007
1.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
0.00000000e+00 1.00000000e+00 0.00000000e+00 0.00000000e+00
0.00000000e+00 0.00000000e+00 1.00000000e+00 0.00000000e+00
0.00000000e+00 0.00000000e+00 0.00000000e+00 1.00000000e+00

# Camera-to-camera extrinsic matrix (camera pose) from frame-000008 to frame-000007
9.77786599e-01 5.05112051e-02 -2.03425627e-01 7.39230849e-02
-5.01212684e-02 9.98718104e-01 7.07162812e-03 -5.35480146e-03
2.03522053e-01 3.28140726e-03 9.79064863e-01 6.79063607e-03
0.00000000e+00 0.00000000e+00 0.00000000e+00 1.00000000e+00

# Camera-to-camera extrinsic matrix (camera pose) from frame-000009 to frame-000007
9.06949455e-01 9.70696317e-02 -4.09902638e-01 1.50787138e-01
-9.18811102e-02 9.95243091e-01 3.23890512e-02 -1.23167106e-02
4.11096762e-01 8.28707713e-03 9.11554045e-01 3.04464373e-02
0.00000000e+00 0.00000000e+00 0.00000000e+00 1.00000000e+00

# Camera-to-camera extrinsic matrix (camera pose) from frame-000010 to frame-000007
9.04354087e-01 1.57968695e-01 -3.96471408e-01 1.47415385e-01
-8.15376054e-02 9.75816796e-01 2.02813212e-01 -6.94670376e-02
4.18921598e-01 -1.51087628e-01 8.95364297e-01 3.55340376e-02
0.00000000e+00 0.00000000e+00 0.00000000e+00 1.00000000e+00

# Camera-to-camera extrinsic matrix (camera pose) from frame-000011 to frame-000007
9.76539943e-01 7.69014782e-02 -2.01136525e-01 7.61511874e-02
-4.03811507e-02 9.82885646e-01 1.79736388e-01 -6.45838775e-02
2.11516197e-01 -1.67397638e-01 9.62932463e-01 1.10402609e-02
0.00000000e+00 0.00000000e+00 0.00000000e+00 1.00000000e+00

# Camera-to-camera extrinsic matrix (camera pose) from frame-000012 to frame-000007
9.99998641e-01 -1.56633748e-03 -5.15200283e-04 -6.83942273e-05
1.63268064e-03 9.84298747e-01 1.76503003e-01 -5.78220348e-02
2.30647723e-04 -1.76503604e-01 9.84299967e-01 5.96666405e-03
0.00000000e+00 0.00000000e+00 0.00000000e+00 1.00000000e+00

# Camera-to-camera extrinsic matrix (camera pose) from frame-000013 to frame-000007
9.75575029e-01 -8.57247925e-02 2.02248911e-01 -7.19359239e-02
5.06092233e-02 9.83654260e-01 1.72809152e-01 -5.54876912e-02
-2.13757031e-01 -1.58352634e-01 9.63966999e-01 1.39479082e-02
0.00000000e+00 0.00000000e+00 0.00000000e+00 1.00000000e+00

# Camera-to-camera extrinsic matrix (camera pose) from frame-000014 to frame-000007
8.99900305e-01 -1.58189084e-01 4.06393473e-01 -1.49015967e-01
8.56453963e-02 9.77850486e-01 1.90979825e-01 -5.60021794e-02
-4.27602978e-01 -1.37057073e-01 8.93516117e-01 4.19620656e-02
0.00000000e+00 0.00000000e+00 0.00000000e+00 1.00000000e+00
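
Format note on these calibration files: each block is a 4×4 homogeneous transform written as plain text, with a comment naming its source and target frames; frame-000007 serves as the reference view (its own entry is the identity). Under the usual convention (an assumption worth verifying against the toolbox code), a block

$$
T_{i \to 7} = \begin{bmatrix} R & t \\ \mathbf{0}^{\top} & 1 \end{bmatrix}
$$

maps homogeneous points expressed in camera frame $i$ into the coordinates of camera frame-000007.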

90 changes: 90 additions & 0 deletions data/sample/calibration/shelf/cam.poses.B.txt
@@ -0,0 +1,90 @@
# Camera-to-camera extrinsic matrix (camera pose) from frame-000000 to frame-000007
9.05247257e-01 2.89516540e-02 -4.23897635e-01 1.49310311e-01
-9.97035745e-02 9.84292822e-01 -1.45694330e-01 5.84530536e-02
4.13021308e-01 1.74153502e-01 8.93914961e-01 4.00667733e-02
0.00000000e+00 0.00000000e+00 0.00000000e+00 1.00000000e+00

# Camera-to-camera extrinsic matrix (camera pose) from frame-000001 to frame-000007
9.76027409e-01 1.84703952e-02 -2.16862495e-01 7.64990842e-02
-5.62802670e-02 9.83922320e-01 -1.69497492e-01 6.38484133e-02
2.10245164e-01 1.77639277e-01 9.61374671e-01 1.52328476e-02
0.00000000e+00 0.00000000e+00 0.00000000e+00 1.00000000e+00

# Camera-to-camera extrinsic matrix (camera pose) from frame-000002 to frame-000007
9.99939471e-01 1.01612775e-02 4.21927314e-03 -3.39140620e-03
-9.23715004e-03 9.83659092e-01 -1.79803963e-01 6.77236939e-02
-5.97736435e-03 1.79754106e-01 9.83693414e-01 7.54003432e-03
0.00000000e+00 0.00000000e+00 0.00000000e+00 1.00000000e+00

# Camera-to-camera extrinsic matrix (camera pose) from frame-000003 to frame-000007
9.73498541e-01 -4.48275288e-03 2.28649285e-01 -8.80713433e-02
4.76295117e-02 9.81858050e-01 -1.83538003e-01 7.15631809e-02
-2.23678386e-01 1.89564432e-01 9.56050891e-01 2.05026384e-02
0.00000000e+00 0.00000000e+00 0.00000000e+00 1.00000000e+00

# Camera-to-camera extrinsic matrix (camera pose) from frame-000004 to frame-000007
8.96544327e-01 -1.46288801e-02 4.42712397e-01 -1.66263642e-01
9.87479638e-02 9.80903326e-01 -1.67563434e-01 7.03188472e-02
-4.31806797e-01 1.93944994e-01 8.80867884e-01 4.99335242e-02
0.00000000e+00 0.00000000e+00 0.00000000e+00 1.00000000e+00

# Camera-to-camera extrinsic matrix (camera pose) from frame-000005 to frame-000007
9.05536125e-01 -8.34244741e-02 4.15986397e-01 -1.53295248e-01
8.94276848e-02 9.95980412e-01 5.07022091e-03 4.25375109e-03
-4.14737284e-01 3.26094322e-02 9.09356701e-01 3.79101188e-02
0.00000000e+00 0.00000000e+00 0.00000000e+00 1.00000000e+00

# Camera-to-camera extrinsic matrix (camera pose) from frame-000006 to frame-000007
9.76479572e-01 -4.14692150e-02 2.11584378e-01 -7.83460624e-02
4.43033703e-02 9.98980503e-01 -8.66982143e-03 5.93579989e-03
-2.11009138e-01 1.78398046e-02 9.77321280e-01 1.13443493e-02
0.00000000e+00 0.00000000e+00 0.00000000e+00 1.00000000e+00

# Camera-to-camera extrinsic matrix (camera pose) from frame-000007 to frame-000007
1.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
0.00000000e+00 1.00000000e+00 0.00000000e+00 0.00000000e+00
0.00000000e+00 0.00000000e+00 1.00000000e+00 0.00000000e+00
0.00000000e+00 0.00000000e+00 0.00000000e+00 1.00000000e+00

# Camera-to-camera extrinsic matrix (camera pose) from frame-000008 to frame-000007
9.78630296e-01 5.52592281e-02 -1.98063527e-01 6.69496630e-02
-5.35004321e-02 9.98466509e-01 1.42244501e-02 -4.07535827e-03
1.98545830e-01 -3.32399352e-03 9.80085968e-01 6.43911432e-03
0.00000000e+00 0.00000000e+00 0.00000000e+00 1.00000000e+00

# Camera-to-camera extrinsic matrix (camera pose) from frame-000009 to frame-000007
9.14591958e-01 1.00486098e-01 -3.91693878e-01 1.33637585e-01
-9.54076667e-02 9.94908817e-01 3.24626480e-02 -3.41561345e-03
3.92961738e-01 7.68052220e-03 9.19522747e-01 2.85597281e-02
0.00000000e+00 0.00000000e+00 0.00000000e+00 1.00000000e+00

# Camera-to-camera extrinsic matrix (camera pose) from frame-000010 to frame-000007
9.05477435e-01 1.57156091e-01 -3.94224020e-01 1.41790888e-01
-7.92369450e-02 9.75178771e-01 2.06755584e-01 -6.22180308e-02
4.16931794e-01 -1.55975409e-01 8.95454941e-01 3.54998236e-02
0.00000000e+00 0.00000000e+00 0.00000000e+00 1.00000000e+00

# Camera-to-camera extrinsic matrix (camera pose) from frame-000011 to frame-000007
9.77589946e-01 8.00082873e-02 -1.94721781e-01 6.67172086e-02
-4.35950010e-02 9.81853397e-01 1.84562680e-01 -5.99440969e-02
2.05954787e-01 -1.71937724e-01 9.63337970e-01 1.09906624e-02
0.00000000e+00 0.00000000e+00 0.00000000e+00 1.00000000e+00

# Camera-to-camera extrinsic matrix (camera pose) from frame-000012 to frame-000007
9.99960437e-01 -2.84680346e-03 8.42732460e-03 -5.87763588e-03
1.34444403e-03 9.84890233e-01 1.73174543e-01 -5.68454979e-02
-8.79298358e-03 -1.73156362e-01 9.84855095e-01 4.41092430e-03
0.00000000e+00 0.00000000e+00 0.00000000e+00 1.00000000e+00

# Camera-to-camera extrinsic matrix (camera pose) from frame-000013 to frame-000007
9.73023063e-01 -8.22832744e-02 2.15535571e-01 -8.16399246e-02
4.55981708e-02 9.84396366e-01 1.69954706e-01 -5.81909463e-02
-2.26156862e-01 -1.55541820e-01 9.61592333e-01 1.35834602e-02
0.00000000e+00 0.00000000e+00 0.00000000e+00 1.00000000e+00

# Camera-to-camera extrinsic matrix (camera pose) from frame-000014 to frame-000007
9.01818422e-01 -1.67823869e-01 3.98194279e-01 -1.44817903e-01
9.78829696e-02 9.76885125e-01 1.90037830e-01 -6.06841190e-02
-4.20882951e-01 -1.32403177e-01 8.97400100e-01 4.05301648e-02
0.00000000e+00 0.00000000e+00 0.00000000e+00 1.00000000e+00
