
Stereo Visual Odometry based SLAM demonstrated on the KITTI dataset. Based on OpenCV, Eigen, Sophus, Ceres Solver and ROS.


apresland/visual-slam


Stereo Visual Odometry

Visual odometry (VO) is the process of estimating the egomotion of an agent (e.g., a vehicle, human, or robot) using only the input of one or more cameras attached to it. Application domains include robotics, wearable computing, augmented reality, and automotive. The advantage of VO over wheel odometry is that VO is not affected by wheel slip on uneven terrain or in other adverse conditions. It has been demonstrated that, compared to wheel odometry, VO provides more accurate trajectory estimates, with relative position error ranging from 0.1% to 2%.

Project Description

This project implements a complete Stereo Visual Odometry (VO) frontend that performs pose estimation, demonstrated on the KITTI dataset. The solution publishes estimated poses and perception results to ROS topics that can be visualized with RViz running in a dedicated Docker container. The project dependencies are summarised as follows:

  • Implemented in C++(17) using CMake.
  • OpenCV for FAST feature detection, KLT optical-flow, PnP and RANSAC.
  • Robot Operating System (ROS) as middleware.
  • Sophus for Lie Algebra and Eigen for Linear Algebra.
  • Ceres Solver for local pose-graph optimization.
  • Rviz to visualize perception and estimated poses.
  • Docker to ease dependency management and provide modular services.
  • Docker-Compose to run the multi-container application and provide a dedicated network.
  • CLion configured to connect to a remote Docker target.

Project structure

.
├── CMakeLists.txt
├── docker_build.sh
├── docker_up.sh
├── docker_down.sh
├── docker-compose.yaml
├── Dockerfile_odometry
├── Dockerfile_visualizer
├── odometry.env
├── odometry.rviz
├── src
│   ├── backend.cpp
│   ├── backend.h
│   ├── CMakeLists.txt
│   ├── context.h
│   ├── frontend.cpp
│   ├── frontend.h
│   ├── map
│   │   ├── CMakeLists.txt
│   │   ├── map.cpp
│   │   └── map.h
│   ├── optimize
│   │   ├── CMakeLists.txt
│   │   ├── optimization.cpp
│   │   └── optimization.h
│   ├── package.xml
│   ├── sensor
│   │   ├── camera.cpp
│   │   ├── camera.h
│   │   ├── CMakeLists.txt
│   │   ├── feature.h
│   │   ├── frame.h
│   │   ├── mappoint.cpp
│   │   └── mappoint.h
│   ├── sequence.cpp
│   ├── sequence.h
│   ├── solve
│   │   ├── CMakeLists.txt
│   │   ├── detector.cpp
│   │   ├── detector.h
│   │   ├── estimation.cpp
│   │   ├── estimation.h
│   │   ├── matcher.cpp
│   │   ├── matcher.h
│   │   ├── tracker.cpp
│   │   ├── tracker.h
│   │   ├── triangulator.cpp
│   │   └── triangulator.h
│   ├── system.cpp
│   ├── system.h
│   ├── vizualization.cpp
│   └── vizualization.h

Problem Formulation

The vehicle is in motion and taking images with a rigidly attached camera system at discrete time instants k. This results in a left and a right image at every time instant, denoted by Il,0:n = {Il,0, ... , Il,n} and Ir,0:n = {Ir,0, ... , Ir,n} as shown in the illustration.

Two camera positions at adjacent time instants k-1 and k are related by the rigid body transformation Tk,k-1 of the following form:

Tk,k-1 = |  Rk,k-1   tk,k-1  |
         |     0        1    |

where Rk,k-1, an element of SO(3), is the rotation matrix and tk,k-1 is the translation vector. The set T1:n = {T1,0, ... , Tn,n-1} contains all subsequent motions. The set of camera poses C0:n = {C0, ... , Cn} contains the transformations of the camera with respect to the initial coordinate frame at k = 0. The current pose Cn can be computed by concatenating all the transformations Tk (k = 1 ... n), and therefore Cn = Cn-1Tn, with C0 being the camera pose at the instant k = 0.

The task of VO is to compute the relative transformations Tk from the images Ik and Ik-1 and then to concatenate the transformations to recover the full trajectory C0:n of the camera. This means that VO recovers the path incrementally, pose after pose.

Algorithm Description

The VO algorithm with 3-D-to-2-D correspondences is summarized as follows.

Algorithm: Visual Odometry from 3-D-to-2-D Correspondences.

  1. Do only once:
    1.1     Capture two frames Ik-2, Ik-1
    1.2     Detect and match features between frames
    1.3     Triangulate features from Ik-2, Ik-1

  2. Do at each iteration:
    2.1     Capture new frame Ik
    2.2     Detect features and match with previous frame Ik-1
    2.3     Compute camera pose (PnP) from 3-D-to-2-D matches
    2.4     Triangulate all new feature matches between Ik and Ik-1
    2.5     Go to 2.1

1. Image Capture

Capture a stereo image pair at times k and k-1 and process the images to compensate for lens distortion. Perform stereo rectification so that epipolar lines become horizontal. In the KITTI dataset the input images are already corrected for lens distortion and stereo rectified.

2. Feature Detection

Generate features on the left camera image Il,k-1 using the FAST (Features from Accelerated Segment Test) corner detector. FAST is computationally less expensive than other feature detectors such as SIFT and SURF. Apply feature bucketing: the image is divided into non-overlapping rectangles and a constant number of feature points with maximal response values is selected from each bucket. Bucketing has two benefits: i) input features are well distributed throughout the image, which results in higher accuracy in motion estimation; ii) the computational complexity of the algorithm is reduced by the smaller sample of features.

3. Feature Matching (Tracking)

Egomotion estimation requires features to be matched between the left and right images at time k-1 and with the left image at time k. We therefore match in a 'circle': starting from features detected in the current left image, the best match is found in the previous left image, next in the previous right image, then in the current right image, and finally in the current left image again. A circle match is accepted if the last feature coincides with the first. We use the KLT (Kanade-Lucas-Tomasi) method for matching: features from the image at time k-1 are tracked at time k using a 15x15 search window and a 3-level image pyramid. The KLT tracker outputs the corresponding coordinates for each input feature, together with a status flag and an error measure for each track. Feature points tracked with high error or low accuracy are dropped from further computation.

4. Motion Estimation

Motion estimation is the computation of the camera motion between the current image and the previous image. Concatenating these single movements recovers the full trajectory of the vehicle. The transformation Tk between two images Ik-1 and Ik can be computed from two sets of corresponding features fk-1 and fk at time instants k-1 and k respectively. In general, feature correspondences can be specified in 2-D or 3-D, but we choose 3-D-to-2-D correspondences, where fk-1 are specified in 3-D and fk are their corresponding 2-D reprojections on the image Ik. The general formulation in this case is to find the Tk that minimizes the sum of the image reprojection errors

Tk = argmin_Tk Σ || pk − p̂k-1 ||²

where p̂k-1 is the reprojection of the 3-D point Xk-1 into the image Ik according to the transformation Tk, and pk is the observed 2-D feature. This problem is known as Perspective-from-n-Points (PnP).

5. Triangulation and Keyframe Selection

The motion estimation method above requires triangulation of 3-D points (structure) from 2-D image correspondences. Triangulated 3-D points are determined by intersecting back-projected rays from 2-D image correspondences in at least two image frames. In perfect conditions these rays would intersect in a single 3-D point, but because of image noise, camera model and calibration errors, and feature matching uncertainty, they almost never intersect exactly. Therefore, the point at minimal distance, in the least-squares sense, from all the rays is taken as an estimate of the 3-D point position.
