This project aims to build intuition behind the creation of a large composite image by stitching together photographs. More specifically, in the first part of the project, I manually defined correspondence points between pairs of images, recovered the homography between them via least squares, warped and rectified images, and blended the warped images into mosaics.
In the second part, I automated the correspondence-determination process from Part 1 so that I could stitch images together into panoramas automatically. That is, I detected interest points with the Harris corner detector, filtered them with Adaptive Non-Maximal Suppression, extracted and matched feature descriptors with Lowe's ratio test, and estimated the homography robustly with RANSAC.
Here is the series of pictures that I took to blend into mosaics:
From lecture, we define a projective transformation using a homography matrix as follows, where homogeneous coordinates are only defined up to a scale factor \(w\): \[H\mathbf{p} = w\mathbf{p'}\] \[\begin{bmatrix} a & b & c\\ d & e & f\\ g & h & 1 \end{bmatrix}\begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = w\begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix}\]
Expanding this matrix multiplication out and substituting \(w = gx + hy + 1\) into the first two equations, we get the following system of equations: \[ \begin{cases} ax + by + c = wx' \\ dx + ey + f = wy' \\ gx + hy + 1 = w \end{cases} \Rightarrow \begin{cases} ax + by + c - gxx' - hyx' = x' \\ dx + ey + f - gxy' - hyy' = y' \end{cases} \]
Rewriting the system of equations as matrix multiplication, we obtain \[\begin{bmatrix} x & y & 1 & 0 & 0 & 0 & -xx' & -yx'\\ 0 & 0 & 0 & x & y & 1 & -xy' & -yy' \end{bmatrix}\begin{bmatrix} a \\ b \\ c \\ d \\ e \\ f \\ g \\ h \end{bmatrix} = \begin{bmatrix} x' \\ y' \end{bmatrix} \]
Since each correspondence point gives us two equations and \(H\) has 8 unknown entries, we need at least 4 correspondence points, and the matrix equation becomes \[\begin{bmatrix} x_1 & y_1 & 1 & 0 & 0 & 0 & -x_1x_1' & -y_1x_1' \\ 0 & 0 & 0 & x_1 & y_1 & 1 & -x_1y_1' & -y_1y_1' \\ x_2 & y_2 & 1 & 0 & 0 & 0 & -x_2x_2' & -y_2x_2' \\ 0 & 0 & 0 & x_2 & y_2 & 1 & -x_2y_2' & -y_2y_2' \\ x_3 & y_3 & 1 & 0 & 0 & 0 & -x_3x_3' & -y_3x_3' \\ 0 & 0 & 0 & x_3 & y_3 & 1 & -x_3y_3' & -y_3y_3' \\ x_4 & y_4 & 1 & 0 & 0 & 0 & -x_4x_4' & -y_4x_4' \\ 0 & 0 & 0 & x_4 & y_4 & 1 & -x_4y_4' & -y_4y_4' \\ \end{bmatrix}\begin{bmatrix} a \\ b \\ c \\ d \\ e \\ f \\ g \\ h \end{bmatrix} = \begin{bmatrix} x_1' \\ y_1' \\ x_2' \\ y_2' \\ x_3' \\ y_3' \\ x_4' \\ y_4' \end{bmatrix} \]
Since I used more than 4 correspondence points (on average, about 14 correspondence points for each pair of images), the system of equations is overdetermined. As a result, I used np.linalg.lstsq to solve for the least-squares solution and then reorganized the entries of the resulting vector to obtain an approximate \(H\).
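Here is a minimal sketch of this least-squares setup, assuming the correspondence points arrive as (n, 2) numpy arrays (the actual computeH may differ in such details):

```python
import numpy as np

def computeH(pts1, pts2):
    """Recover the homography H mapping pts1 onto pts2 in the least-squares
    sense. pts1, pts2: (n, 2) arrays of corresponding (x, y) points, n >= 4."""
    A, b = [], []
    for (x, y), (xp, yp) in zip(pts1, pts2):
        # Two rows per correspondence, exactly as in the matrix above.
        A.append([x, y, 1, 0, 0, 0, -x * xp, -y * xp])
        A.append([0, 0, 0, x, y, 1, -x * yp, -y * yp])
        b.extend([xp, yp])
    # Least-squares solution of the overdetermined system A h = b.
    h, *_ = np.linalg.lstsq(np.array(A), np.array(b), rcond=None)
    # Reorganize the 8-vector into a 3x3 matrix with the last entry fixed to 1.
    return np.append(h, 1).reshape(3, 3)
```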
Given an image to be warped im and a homography matrix \(H\), I warp the image as follows:

1. Apply \(H\) to the four corners of im. This gives the polygon where the projected im will be in the resulting warped image.
2. Compute the source coordinate of each pixel in the resulting image by applying the inverse of the homography matrix to each pixel of the resulting image (inverse warping guarantees smoothness in the resulting image).
3. Interpolate with scipy.interpolate.griddata to obtain the final warped image.
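A minimal sketch of this inverse-warping procedure, assuming a color image as an HxWx3 numpy array (the actual warpImage may differ; in particular, griddata over a full pixel grid is slow, and the canvas offset handling here is simplified):

```python
import numpy as np
from scipy.interpolate import griddata

def warpImage(im, H):
    """Inverse-warp `im` with homography H; returns the warped canvas and
    its offset (xmin, ymin) in the projected coordinate frame."""
    h, w = im.shape[:2]
    # Step 1: project the four corners to find the output bounding box.
    corners = np.array([[0, 0, 1], [w - 1, 0, 1],
                        [w - 1, h - 1, 1], [0, h - 1, 1]], dtype=float).T
    proj = H @ corners
    proj = proj[:2] / proj[2]                       # normalize homogeneous coords
    xmin, ymin = np.floor(proj.min(axis=1)).astype(int)
    xmax, ymax = np.ceil(proj.max(axis=1)).astype(int)

    # Step 2: inverse-map every pixel of the output canvas into the source.
    xs, ys = np.meshgrid(np.arange(xmin, xmax), np.arange(ymin, ymax))
    out_pts = np.stack([xs.ravel(), ys.ravel(), np.ones(xs.size)])
    src = np.linalg.inv(H) @ out_pts
    src = (src[:2] / src[2]).T                      # (N, 2) source (x, y) coords

    # Step 3: interpolate source pixel values at the inverse-mapped locations.
    gx, gy = np.meshgrid(np.arange(w), np.arange(h))
    grid_pts = np.stack([gx.ravel(), gy.ravel()], axis=1)
    out = np.zeros((ymax - ymin, xmax - xmin, im.shape[2]))
    for c in range(im.shape[2]):
        vals = griddata(grid_pts, im[..., c].ravel(), src,
                        method='linear', fill_value=0)
        out[..., c] = vals.reshape(out.shape[:2])
    return out, (xmin, ymin)
```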
To rectify an image, I completed the following steps:

1. Defined correspondence points between the corners of the skewed rectangular object and the corners of an axis-aligned rectangle, and computed the homography between them.
2. Used warpImage from the previous part to recover the original data.

Here are some results of applying rectification to different rectangular objects/surfaces:
Rectification of my monitor
Rectification of my laptop
Rectification of a book
To blend two images together, I computed a distance metric with scipy.ndimage.distance_transform_edt for each image and created a mask where a pixel value is 1 if the distance metric for that pixel is greater for the warped image and 0 otherwise; the mask then determines which image each pixel of the mosaic is taken from (see the sketch below).

To blend three images together, where the center image is chosen to be the reference image, I warped the left and right images into the frame of the center image and applied the two-image blending procedure above twice.
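A minimal sketch of the two-image blend referenced above, assuming both images have already been warped onto a shared canvas (HxWx3 arrays with zeros outside each image's footprint); a hard mask is the simplest way to apply it, and the actual blend may smooth the seam further:

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def blend_pair(warped, base):
    """Blend two aligned images using a distance-transform mask."""
    # Distance from every pixel to the nearest pixel outside each footprint;
    # larger values mean "deeper inside" that image.
    d_warped = distance_transform_edt(np.any(warped > 0, axis=2))
    d_base = distance_transform_edt(np.any(base > 0, axis=2))
    # Mask is 1 where the warped image's distance metric is greater, 0 otherwise.
    mask = (d_warped > d_base)[..., None]
    return np.where(mask, warped, base)
```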
Notice that the colors are a little off in some of the blended mosaics because the lighting conditions of the input images weren't consistent (and potentially because the iPhone camera automatically adjusts the focus and exposure of each photo).
Here are the results of blending the images into mosaics; for each group of images, the first row contains the input images and the second row displays the resulting warped and blended image:
To identify the corners/interest points of an image, I used the functions provided by course staff, which output a list of coordinates whose corner strength is higher than that of their immediate neighbors. Notice that without further filtering, there are far too many candidate Harris corners, which makes the subsequent steps computationally inefficient. Here are the Harris corners on two example images:
Images of Soda Hall with labeled Harris Corners
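The staff-provided functions are not reproduced in this writeup; as a rough, hypothetical stand-in, the same behavior (local maxima of the Harris strength map) can be sketched with skimage:

```python
from skimage.feature import corner_harris, peak_local_max

def get_harris_corners(im_gray):
    """Stand-in Harris detector: strength map plus its local maxima."""
    strength = corner_harris(im_gray)                  # per-pixel corner strength
    coords = peak_local_max(strength, min_distance=1)  # local maxima, (row, col)
    return strength, coords
```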
I then implemented Adaptive Non-Maximal Suppression (ANMS) on the list of Harris corners, which restricts the maximum number of interest points and ensures that the resulting points are spatially well distributed over the input image. Here are the remaining keypoints after ANMS:
Images of Soda Hall with labeled Harris Corners after ANMS
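A minimal sketch of ANMS as described above, where coords is an (n, 2) array of corner locations and strengths holds their Harris responses (n_keep and c_robust are hypothetical parameter names):

```python
import numpy as np

def anms(coords, strengths, n_keep=500, c_robust=0.9):
    """Keep the n_keep corners with the largest suppression radii."""
    n = len(coords)
    radii = np.full(n, np.inf)
    for i in range(n):
        # A point is only suppressed by points sufficiently stronger than it.
        stronger = strengths * c_robust > strengths[i]
        if np.any(stronger):
            dists = np.linalg.norm(coords[stronger] - coords[i], axis=1)
            radii[i] = dists.min()
    # Points with large radii survive longest as the suppression radius
    # shrinks, which yields a spatially even distribution of keypoints.
    return coords[np.argsort(-radii)[:n_keep]]
```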
To extract a feature from each Harris corner that survives ANMS, I extracted an axis-aligned 40x40 patch and downsampled it with anti_aliasing=True to obtain an 8x8 feature patch while avoiding aliasing. I then normalized the descriptors to ensure that their mean is 0 and their variance is 1. Here are some examples of the feature descriptors of an image of Soda Hall:
Examples of feature descriptors of an image of Soda Hall
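A minimal sketch of this descriptor extraction, assuming skimage.transform.resize for the anti-aliased downsampling and (row, col) corner coordinates:

```python
import numpy as np
from skimage.transform import resize

def extract_descriptors(im_gray, corners):
    """Extract normalized 8x8 descriptors from 40x40 windows around corners."""
    descs = []
    for y, x in corners:
        window = im_gray[y - 20:y + 20, x - 20:x + 20]
        if window.shape != (40, 40):   # skip corners too close to the border
            continue
        # Downsample 40x40 -> 8x8; anti_aliasing=True applies a Gaussian
        # prefilter so the subsampling does not alias.
        patch = resize(window, (8, 8), anti_aliasing=True)
        # Bias/gain normalization: zero mean, unit variance.
        descs.append((patch - patch.mean()) / (patch.std() + 1e-8))
    return np.array(descs)
```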
To match the features, I first flattened them from 8x8 patches into 64-dimensional vectors. I computed the Euclidean distance between each vector in the first set of descriptors and all vectors in the second set to obtain a list of distances. I then sorted this list in ascending order to retrieve the distances to the 1-NN and 2-NN and used Lowe's technique to determine matches: if the ratio of the 1-NN distance over the 2-NN distance is below a threshold \(t\), then the 1-NN is considered a match. Here are the matched features between two images of Soda Hall:
Feature matches with threshold \(t=0.5\)
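A minimal sketch of this ratio-test matching (the brute-force distance computation assumes both descriptor sets fit comfortably in memory):

```python
import numpy as np

def match_features(desc1, desc2, threshold=0.5):
    """Match descriptors with Lowe's ratio test; returns index pairs."""
    # Flatten 8x8 patches into 64-dimensional vectors.
    v1 = desc1.reshape(len(desc1), -1)
    v2 = desc2.reshape(len(desc2), -1)
    # Pairwise Euclidean distances between all descriptors.
    dists = np.linalg.norm(v1[:, None, :] - v2[None, :, :], axis=2)
    matches = []
    for i in range(len(v1)):
        order = np.argsort(dists[i])
        nn1, nn2 = dists[i, order[0]], dists[i, order[1]]
        # Accept only if the best match is much closer than the second best.
        if nn1 / nn2 < threshold:
            matches.append((i, order[0]))
    return matches
```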
Even after using Lowe's technique to reduce the number of outliers, there remain inaccurate pairs of correspondence points, mainly because Euclidean distance, or least squares, is still vulnerable to outliers. As a result, I implemented RANSAC as follows:

1. Randomly sample 4 pairs of correspondence points and compute a candidate homography with computeH from Part A.
2. For every pair of correspondence points, project the first point with the candidate homography; if its distance to the second point is below a threshold epsilon, add that pair of points to the list of inliers.
3. If the list of inliers for this homography has greater size than the current list of best correspondence points, set it to be the list of best correspondence points.
4. Repeat steps 1-3 num_times times and output the resulting list of best correspondence points.

Here is the result of running RANSAC on the set of matched features from Lowe's technique:
Feature matches with threshold \(t=0.5\) after RANSAC
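A minimal sketch of this RANSAC loop, reusing the computeH sketch from earlier (num_times and epsilon echo the parameter names in the text; pts1 and pts2 are assumed to be (n, 2) numpy arrays of matched points):

```python
import numpy as np

def ransac(pts1, pts2, num_times=1000, epsilon=2.0):
    """Return indices of the largest inlier set found over num_times trials."""
    n = len(pts1)
    pts1_h = np.hstack([pts1, np.ones((n, 1))])   # homogeneous coordinates
    best_inliers = np.array([], dtype=int)
    for _ in range(num_times):
        # Sample a minimal set of 4 correspondences and fit a homography.
        sample = np.random.choice(n, 4, replace=False)
        H = computeH(pts1[sample], pts2[sample])
        # Project all first points and measure the reprojection error.
        proj = H @ pts1_h.T
        proj = (proj[:2] / proj[2]).T
        errors = np.linalg.norm(proj - pts2, axis=1)
        inliers = np.flatnonzero(errors < epsilon)
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    return best_inliers
```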
Here are the automatically and manually stitched results side by side for comparison:
We notice that the automatically and manually stitched mosaics yield quite similar results, illustrating the robustness of the algorithms involved, such as Harris corners, ANMS, and RANSAC.
The coolest things I learned from this project are rectification and automatic stitching through feature descriptor extraction and matching. After project 3 and the course lectures, I was already intrigued by the application of linear algebra to warping one shape into another. While working on this project, I was particularly surprised and amazed by how much we could stretch an image with the homography matrix to accurately rectify a rectangular surface, without encountering aliasing problems even when the surface is extremely skewed.
Similarly, from project 3 to project 4A, the typical procedure for determining the transformation matrix between two images included manually defining correspondence points through a mouse-clicking interface, which involves a decent amount of tedious work, and recovering the transformation matrix using the numpy library. This procedure is effectively automated in project 4B, where I implemented interest point detection with Harris corners and ANMS, feature descriptor extraction with Gaussian blurring, and robust feature matching with Lowe's technique and RANSAC. Thanks to this much-needed automation, I was able to explore mosaicing images with fewer corners (I didn't include results for those images in this writeup).