For this project, we produce seamlessly stitched mosaics by shooting overlapped photographs with an identical center of projection, computing projective transformations between those photographs, warping them toward whichever is selected as a base image, and finally blending all of the processed photographs into one big panorama. To begin, we shoot pictures that are related by a projective transformation (by taking pictures from the same point of view but with different view directions). We are also careful to ensure that consecutive photographs have some common content, because in order to recover the transform between two images we have to select corresponding points in the images' overlapping scenes.
In selecting corresponding points, we have two options: manually click on high-profile feature points within the overlap across each of the two images, or let the computer find all the features itself. The first option is a ginput call – we have to do all the thinking, so it's easy for the computer and hard for us. On the other hand, the second option is hard for the computer and easy for us. We obviously invented computers so we could enslave them and make them do all the difficult things in life, so everyone's preference here should be a no-brainer. Unfortunately, we still have to give it some orders. How should our feature points be selected? In this case, we'll use the Harris interest point detector to identify decent features (i.e. Harris points) in an image. A Harris point is a fancy name for a corner, a location in the image with a large spatial gradient in all directions, found via Harris's algorithm. Several reduced-size sets of Harris points can be seen below.
5000 / 235746 Harris points (randomly selected)
1000 / 238458 Harris points
1000 / 159983 Harris points
From this moment onward, we mostly use corner_peaks (rather than peak_local_max) to compute Harris points.
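As a rough sketch of this detection step (the filename, grayscale conversion, and min_distance value below are my assumptions, not the project's exact settings):

```python
from skimage import io, color
from skimage.feature import corner_harris, corner_peaks

# Harris corner response over a grayscale copy of the photograph.
gray = color.rgb2gray(io.imread("im1.jpg"))
response = corner_harris(gray)

# corner_peaks keeps local maxima of the response as (row, col) coordinates,
# enforcing a minimum pixel separation between the points it returns.
points = corner_peaks(response, min_distance=5)
strengths = response[points[:, 0], points[:, 1]]  # corner strengths, useful for filtering later
```

Even then, corner_peaks hands back far more corners than we want to match (tens of thousands in the figures below), so, as the captions indicate, the set is thinned to 500 points with adaptive non-maximal suppression (ANMS), which favors corners that are both strong and spatially well spread out rather than simply the 500 strongest.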
corner_peaks output (17767 points)
500 ANMS-filtered
500 highest strength
500 lowest from all ~200k
corner_peaks output (3944 points)
After we obtain our feature points, we have to match them. Our plan of attack: extract feature descriptors for each point and then match those instead. Said descriptors characterize the immediate region around every feature point. Accordingly, every such descriptor will be represented for us as a normalized local patch of size 8 x 8 – downsampled, that is, from 40 x 40. We make our descriptors invariant under rotation in order to increase robustness. To reiterate, these are the steps we take to generate our feature descriptors:
For every ANMS point:
- grab the ~60 x ~60 grayscale window around the point
- compute the average gradient at the point in the x- and y-directions
- rotate the window by the negative arctangent of (y-gradient / x-gradient)
- extract the central 40 x 40 window from the rotated patch
- blur it with a Gaussian filter
- downsize the patch so that it's 8 x 8
- bias/gain-normalize the patch using I' = (I - μ) / σ
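A rough NumPy/SciPy sketch of that loop body (the exact window handling and the reading of "average gradient at the point" are my guesses, and points too close to the image border are ignored for brevity):

```python
import numpy as np
from scipy.ndimage import gaussian_filter, rotate
from skimage.transform import resize

def make_descriptor(gray, r, c):
    """Build one 8x8 bias/gain-normalized descriptor around the point (r, c)."""
    # Grab a generous ~60x60 window so the 40x40 crop survives the rotation.
    win = gray[r - 30:r + 30, c - 30:c + 30]

    # Average gradient over the window sets the patch's canonical orientation.
    gy, gx = np.gradient(win)
    angle = np.arctan2(gy.mean(), gx.mean())

    # Rotate by the negative of that angle so every patch ends up "upright".
    win = rotate(win, np.degrees(-angle), reshape=False)

    # Central 40x40 crop, Gaussian blur, then downsample to 8x8.
    patch = win[10:50, 10:50]
    patch = gaussian_filter(patch, sigma=2)
    patch = resize(patch, (8, 8), anti_aliasing=True)

    # Bias/gain normalization: I' = (I - mu) / sigma.
    return (patch - patch.mean()) / patch.std()
```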
ANMS points (500 in total)
Descriptors upscaled to 80 x 80
If the ratio exceeds 0.66, a match is more likely to be incorrect than correct
This was the only thing in the project that worked on the first try. I was hyped.
im1 (30 matches)
im2 (30 matches)
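For reference, a minimal sketch of how such matching can be done: pair descriptors by SSD and keep a pair only when the best match clearly beats the second-best (the 1-NN / 2-NN ratio from the figure above). The 0.66 threshold comes from that caption; the array layout (one flattened 64-vector per descriptor) is my assumption.

```python
import numpy as np

def match_descriptors(desc1, desc2, ratio_thresh=0.66):
    """Match rows of desc1 to rows of desc2 by SSD with a Lowe-style ratio test."""
    matches = []
    for i, d in enumerate(desc1):
        ssd = np.sum((desc2 - d) ** 2, axis=1)   # SSD to every descriptor in image 2
        nn1, nn2 = np.argsort(ssd)[:2]           # best and second-best neighbors
        # Keep the match only if the best neighbor clearly beats the runner-up.
        if ssd[nn1] / ssd[nn2] < ratio_thresh:
            matches.append((i, nn1))
    return matches
```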
It looks good, but we're not necessarily out of the woods yet. We need to be dead sure that there are no homography-ruining outliers in our set of feature matches. (By "dead sure," I actually mean "almost" dead sure, because nothing in life is certain.) So we use RANSAC. RANSAC, or RANdom SAmple Consensus, is a technique for estimating the homography matrix (math for that below) while avoiding feature-space outliers. How does it do this? The steps are as follows (all of these actions are performed within a loop that runs for 5000 iterations):

1. randomly sample four of the feature matches
2. compute the exact homography defined by those four correspondences
3. apply that homography to every matched point in the first image and measure the distance to its partner in the second image
4. keep track of the largest set of inliers (matches whose error falls below a small threshold) seen so far

After the loop, the final homography is recomputed from the full inlier set.
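A sketch of that loop under my own naming assumptions: pts1 and pts2 are the matched (x, y) coordinates as N x 2 arrays, eps is an assumed inlier radius in pixels, and compute_homography is the least-squares solver that matches the math in the next section.

```python
import numpy as np

def compute_homography(src, dst):
    """Least-squares homography mapping src (N x 2) points onto dst (N x 2) points."""
    A, b = [], []
    for (x, y), (xp, yp) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -x * xp, -y * xp]); b.append(xp)
        A.append([0, 0, 0, x, y, 1, -x * yp, -y * yp]); b.append(yp)
    h = np.linalg.lstsq(np.asarray(A, float), np.asarray(b, float), rcond=None)[0]
    return np.append(h, 1).reshape(3, 3)

def ransac_homography(pts1, pts2, iters=5000, eps=2.0):
    """Estimate a homography from noisy matches; eps is the inlier radius in pixels."""
    pts1_h = np.hstack([pts1, np.ones((len(pts1), 1))]).T   # 3 x N homogeneous points
    best_inliers = np.array([], dtype=int)

    for _ in range(iters):
        # 1. Sample four matches and fit an exact homography to them.
        idx = np.random.choice(len(pts1), 4, replace=False)
        H = compute_homography(pts1[idx], pts2[idx])

        # 2. Project every point from image 1 and compare with its match in image 2.
        proj = H @ pts1_h
        proj = (proj[:2] / proj[2]).T
        errors = np.linalg.norm(proj - pts2, axis=1)

        # 3. Remember the largest consensus set found so far.
        inliers = np.where(errors < eps)[0]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers

    # 4. Refit with least squares over all inliers for the final answer.
    return compute_homography(pts1[best_inliers], pts2[best_inliers]), best_inliers
```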
In order to identify the transformation between any two of our images, we need to compute the homography matrix \(H\). We can relate a pair of corresponding points through the equation \(\textbf{p}' = H\textbf{p}\), where \(\textbf{p}\) is an \(\begin{bmatrix}x & y & 1\end{bmatrix}^T\) vector representing a point in the first image and \(\textbf{p}'\) is a \(\begin{bmatrix}wx' & wy' & w\end{bmatrix}^T\) vector representing a scaled point in the second image. \(H\), incidentally, is what we're solving for: a 3 x 3 matrix with eight degrees of freedom.
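To make that concrete, one standard way to set up the solve is to fix the bottom-right entry of \(H\) to 1 (absorbing the arbitrary scale \(w\)), leaving eight unknowns:

\[
H = \begin{bmatrix} a & b & c \\ d & e & f \\ g & h & 1 \end{bmatrix},
\qquad
x' = \frac{ax + by + c}{gx + hy + 1},
\qquad
y' = \frac{dx + ey + f}{gx + hy + 1}.
\]

Multiplying through by the denominator turns each correspondence \((x, y) \leftrightarrow (x', y')\) into two linear equations,

\[
ax + by + c - x x' g - y x' h = x',
\qquad
dx + ey + f - x y' g - y y' h = y',
\]

so four correspondences pin \(H\) down exactly, and anything beyond that (e.g. RANSAC's full inlier set) gives an overdetermined \(2n \times 8\) system we solve with least squares.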
Let's go into a little more detail on that front. We're warping an image, which means we're taking the colors that were already in the image and moving them to different places. However, in our case we actually want to make use of an inverse warp. An inverse warp asks, for every pixel location in the output shape, "which color from the input image should go here?" The answer usually lands between input pixel centers, so we bilinearly interpolate among the four nearest input pixels rather than rounding to just one. Mathematically, of course, it's our homography that tells us the locations of the input colors to use. To derive the inverse warp, we begin again from our initial equation (but en masse): \(P' = HP\). This time, \(P\) is a 3 x \(n\) matrix containing all of the points from the input image (the one we're warping; the one whose colors we're moving around) in homogeneous coordinates. Likewise, \(P'\) is a 3 x \(n\) matrix containing scaled points from the image whose coordinate system we're warping to. \(H\), as before, is the homography matrix that converts \(P\)'s spatial domain into \(P'\)'s spatial domain.
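A minimal sketch of that inverse warp, assuming scipy's map_coordinates handles the bilinear sampling (the function name and argument layout are mine):

```python
import numpy as np
from scipy.ndimage import map_coordinates

def inverse_warp(img, H, out_shape):
    """Pull colors from `img` into an output canvas, given H mapping img -> output coords."""
    H_inv = np.linalg.inv(H)
    rows, cols = out_shape[:2]

    # Every output pixel location in homogeneous (x, y, 1) form.
    ys, xs = np.mgrid[0:rows, 0:cols]
    pts = np.stack([xs.ravel(), ys.ravel(), np.ones(xs.size)])

    # Map each output location back into the input image.
    src = H_inv @ pts
    src /= src[2]                       # undo the projective scale w
    src_x, src_y = src[0], src[1]

    # Bilinearly sample each color channel at the (generally non-integer) locations.
    out = np.zeros((rows, cols, img.shape[2]))
    for c in range(img.shape[2]):
        out[..., c] = map_coordinates(img[..., c], [src_y, src_x],
                                      order=1, cval=0).reshape(rows, cols)
    return out
```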
Enough math. Let's take what we've done so far and produce something visual! As a test of projective warp correctness, we attempt to rectify a couple of images. Specifically, we'll try to set up our warps so that certain planar surfaces are frontal-parallel in the output image. In the examples below, we select four input points over the corners of some planar surface, and then use the rectified version of those points as the corresponding \(P'\) coordinates.
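As a usage example (reusing the compute_homography and inverse_warp sketches from above; the specific corner coordinates and output size here are made up):

```python
import numpy as np

# img: the source photo as an H x W x 3 float array (loaded earlier).
# Four clicked corners of a planar surface in the input image, as (x, y)...
src = np.array([[312, 140], [705, 188], [698, 630], [305, 602]], float)
# ...and where we want them to land: an axis-aligned square.
dst = np.array([[0, 0], [400, 0], [400, 400], [0, 400]], float)

H = compute_homography(src, dst)              # maps input coords -> rectified coords
rectified = inverse_warp(img, H, (400, 400, 3))
```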
Sunset view from my apartment (11/11/16)
Rectified using sq. window as a basis
Assortment of rectangular objects
Rectified over Family Life
Here, we warp and blend a sequence of overlapping images together – creating, in expectation, a nicely stitched panorama. Given a set of photographs that don't span more than 180 degrees (faraway scenes work well, since we can then project everything onto a plane at infinity), we start by selecting a centrally positioned photograph to serve as the base panorama image. We then project and blend all other images one-by-one into this photograph's coordinate system, amassing a holistic panorama as we go. That's pretty much it! Since individual images are projected directly onto their correct coordinates in the overall panorama, we don't need to align anything after a warp.

All we need to do is blend, which we handle through weighted averaging at every pixel. To facilitate this weighting, we maintain alpha values for each image's pixels that represent how far every pixel is from the unwarped image's center column, i.e. how important the pixel is to its source. We set the alpha value for the center column to 1 and let it fall off linearly toward the sides (so that it is 0 at the edges). Then, after warping, we find the output pixels covered only by the individual image, the output pixels covered only by the growing panorama, and the output pixels where the two overlap. For the non-overlap pixels, we use the unmodified RGB values from their source images. For the overlap pixels, we normalize the alpha values so that they sum to 1 and then linearly interpolate between the color values of each image (using the normalized alphas as weights). Finally, we set the entire alpha channel to 1 in the output panorama at each stage in order to get rid of any residual fading effects.

Below we have some results. The photographs are from me biking up to Indian Rock and taking a bunch of pictures of the distance; they yielded some decent mosaics in my opinion (especially after cropping!). [Part B edit: The automatically stitched images have now been included above each of their manual counterparts.]
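Before the images, a quick sketch of the alpha-weighted blending just described (the helper names, and the assumption that each warped image arrives with its warped alpha mask, are mine):

```python
import numpy as np

def column_alpha(height, width):
    """Alpha mask that is 1 at the center column and falls off linearly to 0 at the sides."""
    col = np.arange(width)
    center = (width - 1) / 2.0
    falloff = 1.0 - np.abs(col - center) / center
    return np.tile(falloff, (height, 1))

def blend(pano_rgb, pano_a, img_rgb, img_a):
    """Weighted-average blend of one warped image into the growing panorama canvas."""
    only_pano = (pano_a > 0) & (img_a == 0)
    only_img  = (img_a > 0) & (pano_a == 0)
    overlap   = (pano_a > 0) & (img_a > 0)

    out = np.zeros_like(pano_rgb)
    out[only_pano] = pano_rgb[only_pano]          # non-overlap pixels keep their source colors
    out[only_img]  = img_rgb[only_img]

    # In the overlap, normalize the two alphas so they sum to 1 and interpolate.
    total = pano_a[overlap] + img_a[overlap]
    w = (pano_a[overlap] / total)[..., None]
    out[overlap] = w * pano_rgb[overlap] + (1 - w) * img_rgb[overlap]
    return out
```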
Cropped view from a rock, pt. 1 (auto)
View from a rock, pt. 1 (auto)
Cropped view from a rock, pt. 1 (manual)
View from a rock, pt. 1 (manual)
Left image
Center image
Right image
Cropped view from a rock, pt. 2 (auto)
View from a rock, pt. 2 (auto)
Cropped view from a rock, pt. 2 (manual)
View from a rock, pt. 2 (manual)
"The Circle", cropped (auto)
It's up north somewhere (auto)
"The Circle", cropped (manual)
Manual feature selection
For non-planar (read: wide) fields of view, we'll want to utilize a cylindrical projection. Although this distorts the edges of the image, it's a worthwhile tradeoff when we're looking for a panorama with an exceedingly large horizontal dimension. As an added requirement, however, we need our input images to be horizontally aligned from the get-go; we're taking the homography out of the mix here. This time, we compute the inverse warping as follows:
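Written out with an assumed focal length \(f\) in pixels and image center \((x_c, y_c)\), this is the standard inverse cylindrical mapping: for each pixel \((x', y')\) of the cylindrical output we compute

\[
\theta = \frac{x' - x_c}{f}, \qquad h = \frac{y' - y_c}{f},
\]

and sample the input image (bilinearly, as before) at

\[
x = x_c + f \tan\theta, \qquad y = y_c + \frac{f\,h}{\cos\theta}.
\]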
Input image
Cylindrically warped output
Input images 1–7
Using our cylindrical warping technique, we can now put together a panorama spanning 360 degrees of view. We do this by using the same stitching algorithm as that of the previous section, while also making sure to merge the final image and the initial image at the end (so as to create a "true" 360 experience). The input images and final panorama can be seen below. Alignment wasn't perfect, but in examining the source images I've come to the conclusion that a perfect translational x-alignment doesn't exist anyway. To deal with the misalignment we'd probably have to do something more sophisticated (or take better pictures).
Input images 8–9
As discussed in one of the previous sections, our feature descriptors are also invariant under rotation. This is nice, because even if we fail at photography and roll the camera (rotate it about a non-yaw axis), our automatically detected features can still be standardized and matched. You probably don't believe me, so I've prepared as proof a couple of demonstrations –
Kavi im1 feature detection (no rotation)
Kavi im2 feature detection (no rotation)
Kavi im1 feature detection (rotation)
Kavi im2 feature detection (rotation)
Indian Rock image 1 features
Indian Rock image 2 features
Rotationally invariant panorama stitching
Here we implement an algorithm that, given an unordered list of images (some of which might match up), is able to automatically discover pairwise panoramas and stitch them together. It does this by extracting feature descriptors for every image and then exhaustively searching for every image's nearest neighbor via the SSD matching technique we've come to know and love. (The image with the most feature matches is our best guess for each image's neighbor.) Since not every image is guaranteed to have a panorama partner, we consider an image to be isolated when, again emulating Lowe, the ratio between the match count with the 1-NN and the match count with the 2-NN is less than some threshold... say 4. If the 2-NN count is 0, we're willing to pair the image with its 1-NN as long as the 1-NN count is greater than 4. We are also forced to assume isolation if an image has fewer than four feature matches with its 1-NN (or RANSAC later leaves it with fewer than three inliers overall). Four is the absolute minimum number of feature pairings required, since that's how many correspondences we need in order to define a homography.

To test our discovery algorithm, we feed it the following five images in the following scrambled order and let it do its thing. We expect an output of two panoramas... and to our excitement, the algorithm delivers.
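For concreteness, a sketch of the pairing logic described above (match_descriptors is the ratio-test matcher sketched earlier, descriptors[i] is the stacked descriptor array for image i, and the thresholds mirror the ones in this paragraph):

```python
import numpy as np

def find_partners(descriptors):
    """For each image, guess its panorama partner (or None) from descriptor match counts."""
    n = len(descriptors)
    # counts[i, j] = number of ratio-test matches between image i and image j.
    counts = np.zeros((n, n), dtype=int)
    for i in range(n):
        for j in range(i + 1, n):
            m = len(match_descriptors(descriptors[i], descriptors[j]))
            counts[i, j] = counts[j, i] = m

    partners = []
    for i in range(n):
        order = np.argsort(counts[i])[::-1]          # other images, ranked by match count
        nn1, nn2 = order[0], order[1]
        c1, c2 = counts[i, nn1], counts[i, nn2]
        if c1 < 4:                                   # too few matches to even fit a homography
            partners.append(None)
        elif c2 == 0:                                # no runner-up: accept if comfortably above 4
            partners.append(nn1 if c1 > 4 else None)
        elif c1 / c2 >= 4:                           # 1-NN clearly dominates the 2-NN
            partners.append(nn1)
        else:
            partners.append(None)
        # (RANSAC on the surviving pairs can still demote a partner to None later.)
    return partners
```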
Output panorama #1 (raw)
Output panorama #2 (raw)
First of all, I learned that planar mappings don't really work for > 180° mosaics, as I spent an embarrassing amount of time trying to warp and merge two images that spanned an excessive view space. However, I'm glad I got to experience the power of homographies / projective transformations. It was very exciting to be able to modify the perspective of my images in a specific way, and to be able to stitch multiple photographs into one! I never realized that it was possible to achieve these effects with just a 3 x 3 matrix and some blending. I leave you with a rectified image of my lunch. (Because I definitely ate all of these pizzas.) Thanks for reading! :)
Lunchtime carnage
Rectified carnage