This repository contains my personal notes and notebooks used as part of the course 02504 Computer Vision, at DTU in Spring 2024. Syntax for writing markdown can be found at ashki23.
Week | Topic |
---|---|
Week 1: Pinhole and homogeneous | Homogeneous coordinates, pinhole model, projection |
Week 2: Camera model and homography | Camera model, lens distortion, homography, point normalization |
Week 3: Multiview geometry | Epipolar geometry, triangulation |
Week 4: Camera calibration | Linear camera calibration |
Week 5: Nonlinear and calibration | Levenberg-Marquardt, gradients, rotations in optimization |
Week 6: Simple Features | Harris corners, Gaussian filtering, Gaussian derivatives |
Week 7: Robust model fitting | Hough lines, Hough transform, RANSAC |
Week 8: SIFT features | BLOB detection, scale space pyramid, difference of Gaussians, SIFT features |
Week 9: Geometry Constrained Feature Matching | Estimate fundamental matrix using RANSAC, Sampson's distance |
Week 10: Image Stitching | Estimate homography, panoramas |
Clone the repository.

```bash
git clone https://github.com/yufanana/ComputerVision02504.git
cd ComputerVision02504
```
Create an environment with Conda.

```bash
conda create --name cv
conda activate cv
pip install -r requirements.txt
```
Or, create a Python virtual environment with virtualenv.

```bash
pip install virtualenv
virtualenv .venv

# On Linux/macOS
source .venv/bin/activate
pip install -r requirements.txt

# On Windows
.venv\Scripts\activate
pip install -r requirements.txt
```
When running the Jupyter notebooks, select the kernel of the environment created above.
Set up pre-commit for code checking. Hooks can be found in `.pre-commit-config.yaml`.

```bash
conda activate cv
pre-commit install
pre-commit run --all-files
pre-commit run --file my_file.ipynb
```

Now, pre-commit will run every time before a commit. Remember to do `git add .` to add the changes made by pre-commit (if any). Hooks can be temporarily disabled using:

```bash
git commit -m "<message>" --no-verify
```
- Levenberg-Marquardt: solves nonlinear least squares problems with a 2nd order approximation that uses only 1st order derivatives
- Gradients: analytical or finite differences (Taylor series)
- Rotations in optimization (Euler angles, axis-angle, quaternions)
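As an illustrative sketch (assuming SciPy is available, and using a made-up curve-fitting problem rather than a calibration one), `method="lm"` selects a Levenberg-Marquardt solver that needs only the residuals and their 1st order Jacobian:

```python
import numpy as np
from scipy.optimize import least_squares

# Fit y = a * exp(b * x) to synthetic data. method="lm" is SciPy's
# Levenberg-Marquardt solver: it only uses residuals and (optionally)
# their 1st order Jacobian -- no 2nd derivatives are ever formed.
x = np.linspace(0, 1, 20)
y = 2.0 * np.exp(1.5 * x)

def residuals(p):
    a, b = p
    return a * np.exp(b * x) - y

sol = least_squares(residuals, x0=[1.0, 1.0], method="lm")
```

On this clean synthetic data the solver recovers the generating parameters `a = 2.0`, `b = 1.5`.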
Problems with image correspondence
- Scale, rotation, translation $\to$ appearance changes depending on distance and pose of camera
- Other issues: occlusion, lighting intensity, lighting direction, clutter
- Key points/interest points/feature points: coordinate position of points in an image
- Descriptors: characterizes pattern around a point (usually a vector)
Convolution
- Synonymous with filtering.
- Is commutative: $I_g = g * I = I * g$
- Is separable: $I_g = (gg^T) * I = g * (g^T * I)$
- Size of Gaussian filter
  - Uses an empiric rule of $3\sigma$ or $5\sigma$
  - size $= 2 \cdot \text{rule} \cdot \sigma + 1$
  - e.g. 5-rule, $\sigma=2$, size $= 2 \cdot 5 \cdot 2 + 1 = 21$
Derivative of Gaussian

- Convolving with the derivative of a Gaussian gives the derivative of the blurred image: $\frac{\partial}{\partial x}(g * I) = \frac{\partial g}{\partial x} * I$
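A minimal NumPy sketch of the size rule and the two 1D kernels (`gaussian_kernels` is an illustrative name, not from the course code); convolving an image with `gd` along one axis and `g` along the other gives the derivative of the blurred image separably:

```python
import numpy as np

def gaussian_kernels(sigma, rule=5):
    """1D Gaussian and its derivative, sized by size = 2 * rule * sigma + 1."""
    size = 2 * rule * sigma + 1              # e.g. rule=5, sigma=2 -> 21
    x = np.arange(size) - size // 2
    g = np.exp(-x**2 / (2 * sigma**2))
    g /= g.sum()                             # normalize so blurring preserves intensity
    gd = -x / sigma**2 * g                   # derivative of the Gaussian
    return g, gd

g, gd = gaussian_kernels(sigma=2)
```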
Harris Corners
- Points with locally maximum change from a small shift
- A local area where $\Delta I(x,y,\Delta_x, \Delta_y)^2$ is large no matter the shift $(\Delta_x, \Delta_y)$
Harris corner measure
- Approximated using a Taylor series expansion

$$
\begin{align*}
c(x,y,\Delta_x, \Delta_y) &= g * \Delta I(x,y,\Delta_x, \Delta_y)^2 \\
&= g * (I(x,y) - I(x+\Delta_x, y+\Delta_y))^2 \\
&\approx g * \left( \begin{bmatrix} I_x & I_y \end{bmatrix} \begin{bmatrix} \Delta_x \\ \Delta_y \end{bmatrix} \right)^2 \\
&= g * \left( \begin{bmatrix} \Delta_x & \Delta_y \end{bmatrix} \begin{bmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{bmatrix} \begin{bmatrix} \Delta_x \\ \Delta_y \end{bmatrix} \right) \\
&= \begin{bmatrix} \Delta_x & \Delta_y \end{bmatrix} \begin{bmatrix} g * (I_x^2) & g * (I_x I_y) \\ g * (I_x I_y) & g * (I_y^2) \end{bmatrix} \begin{bmatrix} \Delta_x \\ \Delta_y \end{bmatrix}
\end{align*}
$$
Structure tensor
$$
\begin{align*}
C(x,y) &= \begin{bmatrix} g * (I_x^2) & g * (I_x I_y) \\ g * (I_x I_y) & g * (I_y^2) \end{bmatrix} \\
&= \begin{bmatrix} a & c \\ c & b \end{bmatrix}
\end{align*}
$$
Use the eigenvalues of $C$ to characterize the local structure: two large eigenvalues indicate a corner.
Harris corner metric
$$
\begin{align*}
r(x,y) &= \lambda_1 \lambda_2 - k(\lambda_1 + \lambda_2)^2 \\
&= ab - c^2 - k(a+b)^2
\end{align*}
$$

with typically $k \approx 0.04$ to $0.06$
- Corners are at points with $r(x,y) > \tau$
- Threshold $\tau$ is chosen in the range $0.1 \cdot \max(r(x,y)) < \tau < 0.8 \cdot \max(r(x,y))$
- Find local maxima using non-max suppression
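The steps above can be sketched as follows (assuming SciPy; `harris_response` and `harris_corners` are illustrative names, and Sobel filters stand in for the Gaussian derivative):

```python
import numpy as np
from scipy.ndimage import gaussian_filter, maximum_filter, sobel

def harris_response(im, sigma=2.0, k=0.06):
    """Harris metric r = det(C) - k * trace(C)^2 from the structure tensor C."""
    Ix = sobel(im, axis=1)                   # image gradients
    Iy = sobel(im, axis=0)
    a = gaussian_filter(Ix * Ix, sigma)      # g * (Ix^2)
    b = gaussian_filter(Iy * Iy, sigma)      # g * (Iy^2)
    c = gaussian_filter(Ix * Iy, sigma)      # g * (Ix * Iy)
    return a * b - c**2 - k * (a + b) ** 2

def harris_corners(im, sigma=2.0, k=0.06, rel_tau=0.1):
    """Threshold the metric, then non-max suppression in a 5x5 neighbourhood."""
    r = harris_response(im, sigma, k)
    nms = r == maximum_filter(r, size=5)
    return np.argwhere((r > rel_tau * r.max()) & nms)
```

For example, on a white square against a black background, the four corners of the square give the strongest responses.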
Canny Edges
- Metric is the gradient magnitude $m(x,y)$
- Choose thresholds $\tau_1 > \tau_2$
- Seed: label edges where $m(x,y) > \tau_1$
- Grow: grow edges where $m(x,y) > \tau_2$, labelling new points only if they are next to previously labelled edges
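A sketch of the seed-and-grow (hysteresis) step using connected components (assuming SciPy; `hysteresis` is an illustrative name):

```python
import numpy as np
from scipy.ndimage import label

def hysteresis(m, tau1, tau2):
    """Seed where m > tau1, then keep every weak connected component
    (m > tau2) that contains at least one seed (tau1 > tau2)."""
    comp, _ = label(m > tau2)              # connected components of weak edges
    seeds = np.unique(comp[m > tau1])      # component ids that contain a seed
    return np.isin(comp, seeds[seeds > 0])
```

Growing a whole component at once is equivalent to repeatedly adding weak neighbours of labelled edges until nothing changes.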
Hough Lines
- Vertical lines are undefined in Cartesian slope-intercept form $\Rightarrow$ use polar coordinates
- Represent a line with $(\theta, r)$
  - $\theta$ is the angle between the line's normal and the x-axis
  - $r$ is the distance of the line from the origin along the normal
Hough Transform
- Each point votes for all the possible lines that go through it
- Each point corresponds to a sinusoidal curve in $(\theta, r)$ Hough space
- Peak in Hough space $\Rightarrow$ line in the image
- Find peaks using non-max suppression in a region
- Hough space is not practical for more than 3 DoF
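The voting scheme can be sketched as a 2D accumulator over $(\theta, r)$ (NumPy only; `hough_lines` and the bin counts are illustrative choices):

```python
import numpy as np

def hough_lines(points, n_theta=180, n_r=100, r_max=100.0):
    """Each point votes for every (theta, r) line passing through it."""
    thetas = np.linspace(0, np.pi, n_theta, endpoint=False)
    acc = np.zeros((n_theta, n_r), dtype=int)
    for x, y in points:
        r = x * np.cos(thetas) + y * np.sin(thetas)   # normal form: r = x cos t + y sin t
        bins = np.round((r + r_max) / (2 * r_max) * (n_r - 1)).astype(int)
        ok = (bins >= 0) & (bins < n_r)
        acc[np.arange(n_theta)[ok], bins[ok]] += 1
    return acc, thetas

# points on the vertical line x = 10 all vote for (theta, r) = (0, 10)
pts = [(10, y) for y in range(20)]
acc, thetas = hough_lines(pts)
```

The peak of the accumulator then identifies the line, here at $\theta = 0$.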
Random sample consensus, RANSAC
- Randomly sample minimum number of points needed to fit the model
- e.g. 2 data points for a line
- e.g. 8 corresponding data points for fundamental matrix
- Fit the model to the random samples
- Measure inliers: points closer to the model than a threshold $\to$ indicates a good fit of the model
  - e.g. Euclidean distance for a line
  - e.g. Sampson distance for the fundamental matrix
- Consensus is the number of inliers
- Keep track of best model and best inliers with the highest consensus
- Refit model to all inliers of the best model
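For the line case, the loop can be sketched as follows (NumPy only; `ransac_line` and `fit_line` are illustrative names, and the final refit to the best inliers is left out):

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_line(p, q):
    """Homogeneous line through two points via the cross product,
    scaled so that l @ [x, y, 1] is a signed point-line distance."""
    l = np.cross([*p, 1.0], [*q, 1.0])
    return l / np.hypot(l[0], l[1])

def ransac_line(points, n_iter=100, tau=1.0):
    best_line, best_inliers = None, np.zeros(len(points), dtype=bool)
    ph = np.c_[points, np.ones(len(points))]              # homogeneous points
    for _ in range(n_iter):
        i, j = rng.choice(len(points), 2, replace=False)  # minimal sample: 2 points
        line = fit_line(points[i], points[j])
        inliers = np.abs(ph @ line) < tau                 # distance below threshold
        if inliers.sum() > best_inliers.sum():            # consensus = no. of inliers
            best_line, best_inliers = line, inliers
    return best_line, best_inliers
```

In a full implementation the model is refit (e.g. by total least squares) to all inliers of the best model at the end.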
RANSAC iterations
- Estimate the upper bound of the number of iterations required to have at least one sample with only inliers
- The estimate is updated while running RANSAC
- $\hat \epsilon = 1 - \frac{s}{m}$
  - $s$: no. of inliers of the best model
  - $m$: total no. of data points
- $\hat N = \frac{\log(1-p)}{\log(1-(1-\hat \epsilon)^n)}$
  - $p$: probability that at least one of the $\hat N$ samples has only inliers, e.g. $0.99$
  - $n$: no. of points per sample
- Terminate once there are more than $\hat N$ iterations
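The bound is cheap to evaluate; a small sketch (NumPy only, `ransac_iterations` is an illustrative name):

```python
import numpy as np

def ransac_iterations(n_inliers, n_points, n_sample, p=0.99):
    """Estimated upper bound N on iterations so that, with probability p,
    at least one sample contains only inliers."""
    eps_hat = 1 - n_inliers / n_points                       # outlier ratio
    return int(np.ceil(np.log(1 - p) / np.log(1 - (1 - eps_hat) ** n_sample)))

# e.g. 50% inliers, 8-point samples for a fundamental matrix
N = ransac_iterations(n_inliers=50, n_points=100, n_sample=8)
print(N)  # 1177
```

Note how quickly $\hat N$ grows with the sample size $n$, which is why minimal samples are used.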
Model | Codimension |
---|---|
Line | 1 |
Fundamental Matrix | 1 |
Essential Matrix | 1 |
Homography | 2 |
Camera Matrix | 2 |
See examples in ex8.ipynb
SIFT: features localized at interest points, adapted to scale, invariant to appearance changes
- scale-space blob detection using difference of Gaussians (DoG)
- interest point localization
- orientation assignment
- interest point descriptor
BLOB: binary large object
Hessian matrix
- contains 2nd order derivatives of images
- measures curvature
- eigenvalues and eigenvectors are used to measure the direction of most change
- the Laplacian (the trace of the Hessian) equals the sum of the eigenvalues and is used as the blob response
- the Laplacian is approximated with DoG
Difference of Gaussians (DoG)
- iteratively blurring already blurred images (efficient)
- scale invariance: allows features to be detected at different scales
- kernel size increases with each iteration
- $DoG = L(x,y,k\sigma) - L(x,y,\sigma)$
- the same threshold can be applied for all scale spaces
- find local extrema of DoGs in scale space
  - use the absolute value to find minima & maxima simultaneously
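A sketch of the DoG construction (assuming SciPy; `dog_pyramid` is an illustrative name, using the fact that Gaussian standard deviations add in quadrature so each level can blur the previous one):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def dog_pyramid(im, sigma=1.6, k=np.sqrt(2), n=4):
    """Scale space L(x, y, k^i * sigma) built by iteratively blurring the
    already-blurred image, then DoGs as differences of neighbouring scales."""
    scales = [gaussian_filter(im, sigma)]
    for i in range(1, n):
        # incremental blur: sigma_next^2 = sigma_prev^2 + s_inc^2
        s_inc = sigma * np.sqrt(k ** (2 * i) - k ** (2 * (i - 1)))
        scales.append(gaussian_filter(scales[-1], s_inc))
    return [b - a for a, b in zip(scales, scales[1:])]
```

For a bright impulse, the DoG at the impulse is negative (the wider Gaussian has the lower peak), which is why the absolute value is used to find minima and maxima together.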
Orientation assignment
- compute orientation of gradient around BLOB
- compute circular histogram of gradient orientations
- use histogram peak to assign orientation of point
Matching descriptors
- Use Euclidean distance between normalized vectors
- Cross checking: keep matches that are closest to each other
- Ratio test: compute the ratio between the closest and 2nd closest match, keep if it is below a threshold, e.g. 0.7
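A brute-force matcher with the ratio test can be sketched as follows (NumPy only; `match_ratio_test` is an illustrative name, and cross checking is left out):

```python
import numpy as np

def match_ratio_test(desc1, desc2, ratio=0.7):
    """Match descriptors by Euclidean distance; keep (i, j) only when the
    closest neighbour beats the 2nd closest by the ratio test."""
    # pairwise distances between all descriptors
    d = np.linalg.norm(desc1[:, None, :] - desc2[None, :, :], axis=2)
    matches = []
    for i, row in enumerate(d):
        j1, j2 = np.argsort(row)[:2]
        if row[j1] < ratio * row[j2]:      # unambiguous nearest neighbour
            matches.append((i, j1))
    return matches
```

A descriptor that is nearly equidistant to its two closest candidates is rejected as ambiguous.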
Variations
- RootSIFT: Hellinger kernel
- SURF, ORB, BRIEF, BRISK
Fundamental and Essential Matrix
Estimate F by solving $0 = B^{(i)} \, \text{flatten}(F)$ using SVD, where:
$$
\begin{align*}
B^{(i)} &= \begin{bmatrix} x_{1i}x_{2i} & y_{1i}x_{2i} & x_{2i} & x_{1i}y_{2i} & y_{1i}y_{2i} & y_{2i} & x_{1i} & y_{1i} & 1 \end{bmatrix} \\
\text{flatten}(F) &= \begin{bmatrix} F_{11} & F_{12} & F_{13} & F_{21} & F_{22} & F_{23} & F_{31} & F_{32} & F_{33} \end{bmatrix}^T
\end{align*}
$$
F has 9 entries but is defined only up to scale, so 8 point correspondences suffice to estimate it linearly.
It is also possible to estimate using 7 points.
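The linear estimate can be sketched as follows (NumPy only; `estimate_F` is an illustrative name, and point normalization and the rank-2 enforcement are left out):

```python
import numpy as np

def estimate_F(q1, q2):
    """Linear estimate of F from n >= 8 correspondences (n x 2 arrays).
    Stacks the rows B^(i) and takes the right singular vector with the
    smallest singular value as flatten(F)."""
    x1, y1 = q1[:, 0], q1[:, 1]
    x2, y2 = q2[:, 0], q2[:, 1]
    B = np.column_stack([x1 * x2, y1 * x2, x2,
                         x1 * y2, y1 * y2, y2,
                         x1, y1, np.ones(len(q1))])
    _, _, Vt = np.linalg.svd(B)
    return Vt[-1].reshape(3, 3)
```

For exact correspondences the epipolar constraint $q_{2i}^T F q_{1i} = 0$ then holds up to numerical precision.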
$q_{2i}^T F q_{1i}$ measures how far a correspondence is from the epipolar lines (an algebraic distance).
Use Sampson's distance to measure distance from model.
$$ d_{Samp} (F, q_{1i}, q_{2i}) = \dfrac{(q_{2i}^T F q_{1i})^2} {(q_{2i}^T F)_1^2 + (q_{2i}^T F)_2^2 + (F q_{1i})_1^2 + (F q_{1i})_2^2} $$
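The formula translates directly to code (NumPy only; `sampson_distance` is an illustrative name):

```python
import numpy as np

def sampson_distance(F, q1, q2):
    """Sampson distance of a correspondence (q1, q2), homogeneous 3-vectors,
    to the model F; note (q2^T F)_j = (F^T q2)_j."""
    Fq1 = F @ q1
    Ftq2 = F.T @ q2
    return (q2 @ Fq1) ** 2 / (Ftq2[0]**2 + Ftq2[1]**2 + Fq1[0]**2 + Fq1[1]**2)
```

A perfect correspondence gives a distance of zero; any deviation from the epipolar line makes it positive.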
Threshold for RANSAC
- Assume each sample's error follows an $m$-dimensional normal distribution
- Choose a confidence level e.g. 95%
- Look up the CDF of the $\chi_m^2$ distribution
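A sketch of the lookup (assuming SciPy; `ransac_threshold` is an illustrative name, thresholding the squared distance):

```python
import numpy as np
from scipy.stats import chi2

def ransac_threshold(sigma, m=1, confidence=0.95):
    """Inlier if squared distance < chi2_m^{-1}(confidence) * sigma^2."""
    return chi2.ppf(confidence, df=m) * sigma**2

# codimension-1 model (e.g. line, fundamental matrix): ~3.84 * sigma^2
tau_sq = ransac_threshold(sigma=1.0, m=1)
```

The degrees of freedom $m$ correspond to the model's codimension in the table above.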
RANSAC Workflow
- Find features in both images using SIFT
- Match features using brute force matcher (e.g. 1000 matches)
- Sample 8 of these matching features (8 points from image 1, 8 points from image 2)
- Estimate fundamental matrix using SVD
- Compute the Sampson distance to the estimated F for all matches
- Classify matches as inliers if distance < threshold
- Repeat for fixed number of iterations
- Refit fundamental matrix on set of best inliers