This project aims to build intuition behind the creation of a large composite image by stitching together photographs. More specifically, in the first part of the project, I manually defined correspondence points between pairs of images, recovered the homography between them via least squares, warped and rectified images, and blended the warped images into mosaics.
In the second part, I automated the correspondence-determination process from Part 1 so that I could stitch images together into panoramas automatically. That is, I detected interest points with the Harris corner detector, filtered them with Adaptive Non-Maximal Suppression, extracted and matched feature descriptors with Lowe's ratio test, and estimated the homography robustly with RANSAC.
Here is the series of pictures that I took to blend into mosaics:
From lecture, we define a projective transformation using a homography matrix as follows, where homogeneous coordinates are only defined up to a scale factor \(w\): \[H\mathbf{p} = w\mathbf{p'}\] \[\begin{bmatrix} a & b & c\\ d & e & f\\ g & h & 1 \end{bmatrix}\begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = w\begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix}\]
Expanding this matrix multiplication out and substituting \(w = gx + hy + 1\) into the first two equations, we get the following system of equations: \[ \begin{cases} ax + by + c = wx' \\ dx + ey + f = wy' \\ gx + hy + 1 = w \end{cases} \Rightarrow \begin{cases} ax + by + c - gxx' - hyx' = x' \\ dx + ey + f - gxy' - hyy' = y' \end{cases} \]
Rewriting the system of equations as matrix multiplication, we obtain \[\begin{bmatrix} x & y & 1 & 0 & 0 & 0 & -xx' & -yx'\\ 0 & 0 & 0 & x & y & 1 & -xy' & -yy' \end{bmatrix}\begin{bmatrix} a \\ b \\ c \\ d \\ e \\ f \\ g \\ h \end{bmatrix} = \begin{bmatrix} x' \\ y' \end{bmatrix} \]
Since each correspondence point gives us two equations and \(H\) has 8 unknown entries, we need at least 4 correspondence points, and the matrix equation becomes \[\begin{bmatrix} x_1 & y_1 & 1 & 0 & 0 & 0 & -x_1x_1' & -y_1x_1' \\ 0 & 0 & 0 & x_1 & y_1 & 1 & -x_1y_1' & -y_1y_1' \\ x_2 & y_2 & 1 & 0 & 0 & 0 & -x_2x_2' & -y_2x_2' \\ 0 & 0 & 0 & x_2 & y_2 & 1 & -x_2y_2' & -y_2y_2' \\ x_3 & y_3 & 1 & 0 & 0 & 0 & -x_3x_3' & -y_3x_3' \\ 0 & 0 & 0 & x_3 & y_3 & 1 & -x_3y_3' & -y_3y_3' \\ x_4 & y_4 & 1 & 0 & 0 & 0 & -x_4x_4' & -y_4x_4' \\ 0 & 0 & 0 & x_4 & y_4 & 1 & -x_4y_4' & -y_4y_4' \\ \end{bmatrix}\begin{bmatrix} a \\ b \\ c \\ d \\ e \\ f \\ g \\ h \end{bmatrix} = \begin{bmatrix} x_1' \\ y_1' \\ x_2' \\ y_2' \\ x_3' \\ y_3' \\ x_4' \\ y_4' \end{bmatrix} \]
Since I used more than 4 correspondence points (on average, about 14 correspondence points for each pair of images), the system of equations is overdetermined. As a result, I used np.linalg.lstsq to solve for the least-squares solution and then reorganized the entries of the resulting vector to obtain an approximate \(H\).
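Here is a minimal sketch of this least-squares setup, assuming the correspondence points arrive as (n, 2) numpy arrays (the actual computeH may differ in such details):

```python
import numpy as np

def computeH(pts1, pts2):
    """Recover the homography H mapping pts1 onto pts2 in the least-squares
    sense. pts1, pts2: (n, 2) arrays of corresponding (x, y) points, n >= 4."""
    A, b = [], []
    for (x, y), (xp, yp) in zip(pts1, pts2):
        # Two rows per correspondence, exactly as in the matrix above.
        A.append([x, y, 1, 0, 0, 0, -x * xp, -y * xp])
        A.append([0, 0, 0, x, y, 1, -x * yp, -y * yp])
        b.extend([xp, yp])
    # Least-squares solution of the overdetermined system A h = b.
    h, *_ = np.linalg.lstsq(np.array(A), np.array(b), rcond=None)
    # Reorganize the 8-vector into a 3x3 matrix with the last entry fixed to 1.
    return np.append(h, 1).reshape(3, 3)
```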
Given an image to be warped im and a homography matrix \(H\), I warp the image as follows:

1. Apply \(H\) to the four corners of im. This gives the polygon where the projected im will be in the resulting warped image.
2. Compute the source coordinate of each pixel in the resulting image by applying the inverse of the homography matrix to each pixel of the resulting image (inverse warping guarantees smoothness in the resulting image).
3. Interpolate with scipy.interpolate.griddata to obtain the final warped image.
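A minimal sketch of this inverse-warping procedure, assuming a color image as an HxWx3 numpy array (the actual warpImage may differ; in particular, griddata over a full pixel grid is slow, and the canvas offset handling here is simplified):

```python
import numpy as np
from scipy.interpolate import griddata

def warpImage(im, H):
    """Inverse-warp `im` with homography H; returns the warped canvas and
    its offset (xmin, ymin) in the projected coordinate frame."""
    h, w = im.shape[:2]
    # Step 1: project the four corners to find the output bounding box.
    corners = np.array([[0, 0, 1], [w - 1, 0, 1],
                        [w - 1, h - 1, 1], [0, h - 1, 1]], dtype=float).T
    proj = H @ corners
    proj = proj[:2] / proj[2]                       # normalize homogeneous coords
    xmin, ymin = np.floor(proj.min(axis=1)).astype(int)
    xmax, ymax = np.ceil(proj.max(axis=1)).astype(int)

    # Step 2: inverse-map every pixel of the output canvas into the source.
    xs, ys = np.meshgrid(np.arange(xmin, xmax), np.arange(ymin, ymax))
    out_pts = np.stack([xs.ravel(), ys.ravel(), np.ones(xs.size)])
    src = np.linalg.inv(H) @ out_pts
    src = (src[:2] / src[2]).T                      # (N, 2) source (x, y) coords

    # Step 3: interpolate source pixel values at the inverse-mapped locations.
    gx, gy = np.meshgrid(np.arange(w), np.arange(h))
    grid_pts = np.stack([gx.ravel(), gy.ravel()], axis=1)
    out = np.zeros((ymax - ymin, xmax - xmin, im.shape[2]))
    for c in range(im.shape[2]):
        vals = griddata(grid_pts, im[..., c].ravel(), src,
                        method='linear', fill_value=0)
        out[..., c] = vals.reshape(out.shape[:2])
    return out, (xmin, ymin)
```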
To rectify an image, I completed the following steps:

1. Defined correspondence points between the corners of the skewed rectangular object and the corners of an axis-aligned rectangle, and computed the homography between them.
2. Used warpImage from the previous part to recover the original data.

Here are some results of applying rectification to different rectangular objects/surfaces:
Rectification of my monitor
Rectification of my laptop
Rectification of a book
To blend two images together, I computed a distance metric with scipy.ndimage.distance_transform_edt for each image and created a mask where a pixel value is 1 if the distance metric for that pixel is greater for the warped image and 0 otherwise; the mask then determines which image each pixel of the mosaic is taken from (see the sketch below).

To blend three images together, where the center image is chosen to be the reference image, I warped the left and right images into the frame of the center image and applied the two-image blending procedure above twice.
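A minimal sketch of the two-image blend referenced above, assuming both images have already been warped onto a shared canvas (HxWx3 arrays with zeros outside each image's footprint); a hard mask is the simplest way to apply it, and the actual blend may smooth the seam further:

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def blend_pair(warped, base):
    """Blend two aligned images using a distance-transform mask."""
    # Distance from every pixel to the nearest pixel outside each footprint;
    # larger values mean "deeper inside" that image.
    d_warped = distance_transform_edt(np.any(warped > 0, axis=2))
    d_base = distance_transform_edt(np.any(base > 0, axis=2))
    # Mask is 1 where the warped image's distance metric is greater, 0 otherwise.
    mask = (d_warped > d_base)[..., None]
    return np.where(mask, warped, base)
```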
Notice that the colors are a little off in some of the blended mosaics because the lighting conditions of the input images weren't consistent (and potentially because the iPhone camera automatically adjusts the focus and exposure of each photo).
Here are the results of blending the images into mosaics; for each group of images, the first row contains the input images and the second row displays the resulting warped and blended image:
To identify the corners/interest points of an image, I used the functions provided by course staff, which output a list of coordinates whose corner strength is higher than that of their immediate neighbors. Notice that without further filtering, there are far too many candidate Harris corners, which makes the subsequent steps computationally inefficient. Here are the Harris corners on two example images:
Images of Soda Hall with labeled Harris Corners
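The staff-provided functions are not reproduced in this writeup; as a rough, hypothetical stand-in, the same behavior (local maxima of the Harris strength map) can be sketched with skimage:

```python
from skimage.feature import corner_harris, peak_local_max

def get_harris_corners(im_gray):
    """Stand-in Harris detector: strength map plus its local maxima."""
    strength = corner_harris(im_gray)                  # per-pixel corner strength
    coords = peak_local_max(strength, min_distance=1)  # local maxima, (row, col)
    return strength, coords
```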
I then implemented Adaptive Non-Maximal Suppression (ANMS) on the list of Harris corners, which restricts the maximum number of interest points and ensures that the resulting points are spatially well distributed over the input image. Here are the remaining keypoints after ANMS:
Images of Soda Hall with labeled Harris Corners after ANMS
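A minimal sketch of ANMS as described above, where coords is an (n, 2) array of corner locations and strengths holds their Harris responses (n_keep and c_robust are hypothetical parameter names):

```python
import numpy as np

def anms(coords, strengths, n_keep=500, c_robust=0.9):
    """Keep the n_keep corners with the largest suppression radii."""
    n = len(coords)
    radii = np.full(n, np.inf)
    for i in range(n):
        # A point is only suppressed by points sufficiently stronger than it.
        stronger = strengths * c_robust > strengths[i]
        if np.any(stronger):
            dists = np.linalg.norm(coords[stronger] - coords[i], axis=1)
            radii[i] = dists.min()
    # Points with large radii survive longest as the suppression radius
    # shrinks, which yields a spatially even distribution of keypoints.
    return coords[np.argsort(-radii)[:n_keep]]
```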
To extract a feature from each Harris corner that survives ANMS, I extracted an axis-aligned 40x40 patch and downsampled it with anti_aliasing=True to obtain an 8x8 feature patch while avoiding aliasing. I then normalized the descriptors to ensure that their mean is 0 and their variance is 1. Here are some examples of the feature descriptors of an image of Soda Hall:
Examples of feature descriptors of an image of Soda Hall
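A minimal sketch of this descriptor extraction, assuming skimage.transform.resize for the anti-aliased downsampling and (row, col) corner coordinates:

```python
import numpy as np
from skimage.transform import resize

def extract_descriptors(im_gray, corners):
    """Extract normalized 8x8 descriptors from 40x40 windows around corners."""
    descs = []
    for y, x in corners:
        window = im_gray[y - 20:y + 20, x - 20:x + 20]
        if window.shape != (40, 40):   # skip corners too close to the border
            continue
        # Downsample 40x40 -> 8x8; anti_aliasing=True applies a Gaussian
        # prefilter so the subsampling does not alias.
        patch = resize(window, (8, 8), anti_aliasing=True)
        # Bias/gain normalization: zero mean, unit variance.
        descs.append((patch - patch.mean()) / (patch.std() + 1e-8))
    return np.array(descs)
```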
To match the features, I first flattened them from 8x8 patches into 64-dimensional vectors. I computed the Euclidean distance between each vector in the first set of descriptors and all vectors in the second set to obtain a list of distances. I then sorted this list in ascending order to retrieve the distances to the 1-NN and 2-NN and used Lowe's technique to determine matches: if the ratio of the 1-NN distance over the 2-NN distance is below a threshold \(t\), then the 1-NN is considered a match. Here are the matched features between two images of Soda Hall:
Feature matches with threshold \(t=0.5\)
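A minimal sketch of this ratio-test matching (the brute-force distance computation assumes both descriptor sets fit comfortably in memory):

```python
import numpy as np

def match_features(desc1, desc2, threshold=0.5):
    """Match descriptors with Lowe's ratio test; returns index pairs."""
    # Flatten 8x8 patches into 64-dimensional vectors.
    v1 = desc1.reshape(len(desc1), -1)
    v2 = desc2.reshape(len(desc2), -1)
    # Pairwise Euclidean distances between all descriptors.
    dists = np.linalg.norm(v1[:, None, :] - v2[None, :, :], axis=2)
    matches = []
    for i in range(len(v1)):
        order = np.argsort(dists[i])
        nn1, nn2 = dists[i, order[0]], dists[i, order[1]]
        # Accept only if the best match is much closer than the second best.
        if nn1 / nn2 < threshold:
            matches.append((i, order[0]))
    return matches
```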
Even after using Lowe's technique to reduce the number of outliers, there remain inaccurate pairs of correspondence points, mainly because Euclidean distance, or least squares, is still vulnerable to outliers. As a result, I implemented RANSAC as follows:

1. Randomly sample 4 pairs of correspondence points and compute a candidate homography with computeH from Part A.
2. For every pair of correspondence points, project the first point with the candidate homography; if its distance to the second point is below a threshold epsilon, add that pair of points to the list of inliers.
3. If the list of inliers for this homography has greater size than the current list of best correspondence points, set it to be the list of best correspondence points.
4. Repeat steps 1-3 num_times times and output the resulting list of best correspondence points.

Here is the result of running RANSAC on the set of matched features from Lowe's technique:
Feature matches with threshold \(t=0.5\) after RANSAC
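A minimal sketch of this RANSAC loop, reusing the computeH sketch from earlier (num_times and epsilon echo the parameter names in the text; pts1 and pts2 are assumed to be (n, 2) numpy arrays of matched points):

```python
import numpy as np

def ransac(pts1, pts2, num_times=1000, epsilon=2.0):
    """Return indices of the largest inlier set found over num_times trials."""
    n = len(pts1)
    pts1_h = np.hstack([pts1, np.ones((n, 1))])   # homogeneous coordinates
    best_inliers = np.array([], dtype=int)
    for _ in range(num_times):
        # Sample a minimal set of 4 correspondences and fit a homography.
        sample = np.random.choice(n, 4, replace=False)
        H = computeH(pts1[sample], pts2[sample])
        # Project all first points and measure the reprojection error.
        proj = H @ pts1_h.T
        proj = (proj[:2] / proj[2]).T
        errors = np.linalg.norm(proj - pts2, axis=1)
        inliers = np.flatnonzero(errors < epsilon)
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    return best_inliers
```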
Here are the automatically and manually stitched results side by side for comparison:
We notice that the automatically and manually stitched mosaics yield quite similar results, illustrating the robustness of the algorithms involved, such as Harris corners, ANMS, and RANSAC.
The coolest things I learned from this project are rectification and automatic stitching through feature descriptor extraction and matching. After project 3 and the course lectures, I was already intrigued by the application of linear algebra to warping one shape into another. While working on this project, I was particularly surprised and amazed by how much we could stretch an image with the homography matrix to accurately rectify a rectangular surface, without encountering aliasing problems even when the surface is extremely skewed.
Similarly, from project 3 to project 4A, the typical procedure for determining the transformation matrix between two images included manually defining correspondence points through a mouse-clicking interface, which involves a decent amount of tedious work, and recovering the transformation matrix using the numpy library. This procedure is effectively automated in project 4B, where I implemented interest point detection with Harris corners and ANMS, feature descriptor extraction with Gaussian blurring, and robust feature matching with Lowe's technique and RANSAC. Thanks to this much-needed automation, I was able to explore mosaicing images with fewer corners (I didn't include results for those images in this writeup).