This project builds intuition behind image filtering and investigates different methods of leveraging frequencies to alter, process, blend, and combine images.
I defined the two finite difference operators D_x and D_y to be np.array([[1, -1]]) and np.array([[1, -1]]).T respectively. I then convolved these two kernels with the image using scipy.signal.convolve2d to produce the partial derivatives with respect to x and y, named gx and gy. To compute the gradient magnitude image, I computed np.sqrt(gx ** 2 + gy ** 2), which treats the corresponding pair of pixel values of the two partial-derivative images as a gradient vector and calculates its L2 norm to obtain the final pixel value. I then binarized this image with thresholds 0.1 and 0.2 to obtain edge images.
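The pipeline above can be sketched as follows; the tiny step-edge image and the threshold value are illustrative stand-ins, not the actual project inputs:

```python
import numpy as np
from scipy.signal import convolve2d

# Finite difference operators
D_x = np.array([[1, -1]])
D_y = np.array([[1, -1]]).T

def gradient_edges(img, threshold):
    """Partial derivatives, gradient magnitude, and a binarized edge image."""
    gx = convolve2d(img, D_x, mode="same")
    gy = convolve2d(img, D_y, mode="same")
    # Treat (gx, gy) at each pixel as a gradient vector and take its L2 norm
    magnitude = np.sqrt(gx ** 2 + gy ** 2)
    return gx, gy, magnitude, magnitude > threshold

# A tiny synthetic image with a vertical step edge (for illustration only)
img = np.hstack([np.zeros((4, 4)), np.ones((4, 4))])
gx, gy, mag, edges = gradient_edges(img, threshold=0.1)
```

With mode="same" the outputs keep the input's shape; the binarized image is True along the vertical step.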
Partial derivative with respect to x
Partial derivative with respect to y
Gradient magnitude image
Binarized gradient magnitude image with threshold = 0.1
Binarized gradient magnitude image with threshold = 0.2
I created a Gaussian kernel with kernel_size = 10 and sigma = kernel_size / 6 using cv2.getGaussianKernel. I then blurred the image by convolving it with this Gaussian kernel, and the blurred image underwent the same process and operations as in part 2.1.
There are some differences between the images produced by this method and the results from the previous part. The partial derivatives with respect to x and y are smoother in this case, matching the output of convolving the result from the previous part directly with the Gaussian (because convolution is associative). The edges in the binarized gradient magnitude image (the edge image) are also thicker and fuller. Last but not least, given the same threshold of 0.1, some of the edges (for example, the ones in the camera) are absent in the blurred version.
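A minimal sketch of the blur-then-differentiate step, assuming the 2D Gaussian is built as the outer product of cv2's 1D kernel (the synthetic step image is illustrative):

```python
import cv2
import numpy as np
from scipy.signal import convolve2d

def gaussian_2d(ksize, sigma):
    """2D Gaussian kernel as the outer product of cv2's 1D kernel."""
    g1d = cv2.getGaussianKernel(ksize, sigma)
    return g1d @ g1d.T

D_x = np.array([[1, -1]])
D_y = np.array([[1, -1]]).T

kernel_size = 10
g2d = gaussian_2d(kernel_size, kernel_size / 6)

# Blur first, then take finite differences of the smoothed image
img = np.hstack([np.zeros((20, 20)), np.ones((20, 20))])
blurred = convolve2d(img, g2d, mode="same")
gx = convolve2d(blurred, D_x, mode="same")
gy = convolve2d(blurred, D_y, mode="same")
mag = np.sqrt(gx ** 2 + gy ** 2)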
Partial derivative with respect to x
Partial derivative with respect to y
Gradient magnitude image
Binarized gradient magnitude image with threshold = 0.05
Binarized gradient magnitude image with threshold = 0.1
I first blurred the finite difference operators D_x and D_y using the same Gaussian kernel as in part 2.2.1, producing the partial derivatives of the Gaussian kernel with respect to x (named gdx) and y (named gdy) respectively. Then, I employed the same method as in part 2.1, using gdx and gdy to produce the following images. Inspecting the results closely, I can verify that the images obtained in this part are similar, if not identical, to those from the method in 2.2.1, aside from minor differences in some edges due to noise.
Partial derivative of image with respect to x
Partial derivative of image with respect to y
Gradient magnitude image
Binarized gradient magnitude image with threshold = 0.05
Binarized gradient magnitude image with threshold = 0.1
I first blurred the image with a Gaussian kernel, which acts as a low-pass filter removing the higher frequencies. I then subtracted the blurred image from the original image to obtain the high frequencies of the image. Finally, I added the high frequencies back to the original image to acquire an image with sharpened, or emphasized, edges. Concretely, I computed sharpened_img = img + alpha * (img - blurred_img), where blurred_img = convolve2d(img, gaussian_kernel).
Notice that
\[f + \alpha( f-f*g) = f * ((1+\alpha)e - \alpha g), \;\;\;\;\;\; \text{where \(e\) denotes the unit impulse, or an identity kernel.}\]
I created the identity kernel with numpy and used it to construct the modified kernel sharp_kernel = (1 + alpha) * unit_impulse - alpha * gaussian_kernel. Then, I convolved the original image with this modified kernel to obtain the sharpened image.
Original Image of Taj Mahal
Blurred Image of Taj Mahal
Sharpened Image of Taj Mahal (\(\alpha = 1\))
Sharpened Image of Taj Mahal (\(\alpha = 2\))
Original Image of Sydney Opera House
Blurred Image of Sydney Opera House
Sharpened Image of Sydney Opera House (\(\alpha = 1\))
Sharpened Image of Sydney Opera House (\(\alpha = 2\))
I first blurred the image with a Gaussian kernel of kernel_size = 20 and sigma = kernel_size / 6. I then resharpened the blurred image using the same Gaussian kernel and alpha = 2.
Original Image of Notre-Dame
Blurred Image of Notre-Dame
Resharpened Image of Notre-Dame
The resharpened image contains many of the high frequencies of the original image, as evidenced by the well-defined edges of the cathedral. That is, sharpening does well at recovering from the blurring to an extent. However, the effects of blurring still persist in the resharpened image, as sharpening was unable to recover some of the lost information/frequencies (for example, the trees at the bottom of the cathedral still look a little blurry, and some columns on the left of the image do not look sharp).
Given two images, I extracted the low frequencies from one image using a Gaussian filter with sigma = sigma1 and kernel_size = 6 * sigma1, and the high frequencies from the other image by subtracting a Gaussian filter (with sigma = sigma2 and kernel_size = 6 * sigma2) from the impulse filter. I then created the hybrid image by adding the low and high frequencies together. After trying different combinations of grayscale and color, I noticed that it works better to use color for both the high-frequency and low-frequency components.
Nutmeg
Derek
Nutmeg + Derek
For this case, I used sigma1 = 6.5 to extract the low frequencies from Derek and sigma2 = 15 to extract the high frequencies from Nutmeg.
Cristiano Ronaldo
Lionel Messi
Cristiano Messi
For this failed case, I used sigma1 = 1 to extract the low frequencies from Messi and sigma2 = 3 to extract the high frequencies from Ronaldo. This case is a failure to some extent, primarily because the resolutions of the two inputs are too different, making the resulting aligned and combined image seem less natural.
Confused Bean
Happy Bean
Ambiguous Bean
For this case, I used sigma1 = 1 to extract the low frequencies from Confused Bean and sigma2 = 2.5 to extract the high frequencies from Happy Bean.
Mark Wahlberg
Matt Damon
Mark Damon
For this case, I used sigma1 = 4 to extract the low frequencies from Mark and sigma2 = 6 to extract the high frequencies from Matt.
We can see from the Fourier transforms that the hybrid image is indeed the sum of the low frequencies of Mark and the high frequencies of Matt.
To create the Gaussian stack, I initialized level 0 of the stack with the original image. For each successive level, I blurred the previous level using a Gaussian kernel, ultimately resulting in a stack of images of the same size. Within each iteration of the loop that builds the Gaussian stack, I subtracted the newly blurred Gaussian level from the previous Gaussian level to obtain an entry for the Laplacian stack. Finally, I appended the last image from the Gaussian stack to the Laplacian stack, resulting in two stacks with the same number of levels.
Here are the Laplacian and Gaussian stacks for the apple image respectively:
Laplacian stack of apple
Gaussian stack of apple
Here are the Laplacian and Gaussian stacks for the orange image respectively:
Laplacian stack of orange
Gaussian stack of orange
I created the Laplacian stacks for each of the two input images. I also constructed a Gaussian stack for the mask input image to smooth out the transition between the two images. Then, I generated the stack for the blended image by computing blended_stack = (1 - mask_gaussian) * image_1_laplacian + mask_gaussian * image_2_laplacian at each level. Finally, I collapsed the stack to get the final blended result: np.sum(blended_stack, axis = 0).
For all of the more irregular masks (i.e. aside from the linear mask in Oraple), I used Meta AI's Segment Anything to generate them by feeding in an input image and extracting the desired binary mask as a jpg file.
Image of Berkeley
Image of snowing sky
Mask input image
Laplacian stack of Berkeley
Laplacian stack of the snowing sky
Gaussian stack of the mask input image
Stack of the blended image
Final blended image
All of the displayed results on the website, including the Gaussian/Laplacian stacks and multiresolution blending, are implemented in color, i.e. I applied the Gaussian filter to each channel separately and then stacked the channels back together for the resulting image.
The most important thing I learned from this project is image processing and manipulation through the perspective of frequency. Prior to this project, I had worked with a variety of image processing techniques such as filtering, compression, and segmentation, but all of those methods work directly with the raw pixel values of the image. In this project, I had the opportunity to extract the low and high frequencies of an image and manipulate them appropriately to produce effects that are often provided by photo editing applications, such as blending images or creating hybrid images.