In this assignment, we explore non-parametric warps and cross-dissolution as an approach to face morphing. For those unfamiliar with the concept, we define a "face morph" to be a transformation between two faces that transitions seamlessly over some period of time. As an overview, we accomplish such a task by

- selecting matching sets of feature points on two input images,
- creating a triangulation across the mean of those feature points,
- interpolating between the feature points in each image to create our output shape, i.e. some (non-uniformly?) averaged set of points,
- computing a local warp (affine transformation) between each triangle in our output shape and the corresponding triangle in each of our input shapes,
- bilinearly interpolating over input colors in order to arrive at final values for the pixels in every output triangle,
- and finally celebrating with poorly cooked food.

Let's walk through the algorithm (and maybe go into a bit more detail), so that you can be even more convinced. Say we have the following two images: one of me and one of my friend Tony. In the interest of higher-quality results, I've standardized the image size at a not-at-all arbitrary `476` x `543` px and "polygonal lasso tool"ed away most of the background. The edges of the remaining content have been blurred to soften the transition a bit:
`Me (photo credits: me)` |
`Tony` |

In this case, the points and mesh might end up looking like this (note that the triangulation was computed over the averaged points, although it is shown below as being applied to each image's individual set of points):

`Triangulation over feature points` |
`Corresponding triangulation for Tony's photograph` |
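
As a rough sketch of the triangulation step: the snippet below averages two matching point sets and runs a Delaunay triangulation over the mean, so that the same vertex indices can be reused for either face. It assumes `scipy.spatial.Delaunay` as the triangulation routine (the write-up does not say which one was used), and the function name `triangulate_mean` and the toy points are purely illustrative.

```python
import numpy as np
from scipy.spatial import Delaunay


def triangulate_mean(points_a, points_b):
    """Triangulate the mean of two matching (N, 2) point sets.

    Returns the mean points and an (M, 3) array of vertex indices.
    The indices can be applied to points_a, points_b, or any
    interpolated shape, which keeps the triangles in correspondence.
    """
    points_a = np.asarray(points_a, dtype=float)
    points_b = np.asarray(points_b, dtype=float)
    mean_points = (points_a + points_b) / 2.0
    triangulation = Delaunay(mean_points)
    return mean_points, triangulation.simplices


# Toy data (hypothetical): four shared corners plus one interior point
# that sits in a slightly different place in each "face".
pts_a = np.array([[0, 0], [10, 0], [10, 10], [0, 10], [4, 5]])
pts_b = np.array([[0, 0], [10, 0], [10, 10], [0, 10], [6, 5]])
mean_pts, simplices = triangulate_mean(pts_a, pts_b)

# The same index triples describe the mesh on either image.
triangles_on_a = pts_a[simplices]  # shape (M, 3, 2)
triangles_on_b = pts_b[simplices]
```

Triangulating the mean rather than either individual face is a compromise: it should make it less likely that triangles become badly stretched when the mesh is applied to either set of points.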

But how do we do that? Well, we're going to need a bit of math. It turns out that by the same principle as barycentric interpolation, we can define an affine transformation on the vertices of a triangle and it will correctly apply to all of the coordinates within the triangle. In other words, the transformation that turns one triangle's vertices into another triangle's vertices will also map each of the points within the first triangle into the associated point within the second triangle. So when we've defined this warp, we can bring a triangle from one image to the location of the corresponding triangle in another image. (By "corresponding triangle", we refer to the identical triangle in the Delaunay collection, just set up across a different set of points.)

We define \(X\) to be the \(3\) x \(3\) matrix whose columns are the source triangle's vertices (in homogeneous coordinates!) and \(Y\) to be the same matrix for the destination triangle. We'll refer to the transformation we're looking for as \(A\), which will also be a \(3\) x \(3\) matrix such that $$AX = Y$$ ...which, when expanded, might be shown as $$ A \begin{bmatrix} (x_1)_{x} & (x_2)_{x} & (x_3)_{x} \\ (x_1)_{y} & (x_2)_{y} & (x_3)_{y} \\ 1 & 1 & 1 \end{bmatrix} = \begin{bmatrix} (y_1)_{x} & (y_2)_{x} & (y_3)_{x} \\ (y_1)_{y} & (y_2)_{y} & (y_3)_{y} \\ 1 & 1 & 1 \end{bmatrix} $$ (where \(x_i\) is the \(i^{th}\) vertex of the source triangle and \((x_i)_{x}\) would be its \(x\)-coordinate. \(y_i\) and its coordinates would be defined similarly, of course, except that it would be part of the destination triangle instead).

Representations aside, we can solve for our affine transformation (/warp) very easily: $$ \begin{aligned} AX &= Y \\ AXX^{-1} &= YX^{-1} \\ A &= YX^{-1} \end{aligned} $$ Then we can vectorize our operations by warping every triangle point at once, which is nice: $$ A \begin{bmatrix} (x_1)_{x} & (x_2)_{x} & \dots & (x_n)_{x} \\ (x_1)_{y} & (x_2)_{y} & \dots & (x_n)_{y} \\ 1 & 1 & \dots & 1 \end{bmatrix} = \begin{bmatrix} (y_1)_{x} & (y_2)_{x} & \dots & (y_n)_{x} \\ (y_1)_{y} & (y_2)_{y} & \dots & (y_n)_{y} \\ 1 & 1 & \dots & 1 \end{bmatrix} $$ (Programmatically speaking, we obtain all of the triangle points through the use of `scikit-image`'s polygon function.)
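
Here is a minimal NumPy sketch of the math above, under the assumption that triangles are stored as `(3, 2)` arrays of `(x, y)` vertices. The helper names `to_homogeneous`, `compute_affine`, and `apply_affine` are made up for illustration. The points handed to `apply_affine` could be the pixel coordinates of a triangle's interior; here they are simply passed in as an array.

```python
import numpy as np


def to_homogeneous(points):
    """Turn (N, 2) points into a 3 x N matrix with a bottom row of ones."""
    points = np.asarray(points, dtype=float)
    return np.vstack([points.T, np.ones(len(points))])


def compute_affine(src_tri, dst_tri):
    """Return the 3 x 3 matrix A satisfying A @ X = Y.

    X and Y hold the source and destination vertices as columns
    in homogeneous coordinates.
    """
    X = to_homogeneous(src_tri)
    Y = to_homogeneous(dst_tri)
    # A = Y X^{-1}. Solving X^T A^T = Y^T gives the same A without
    # forming the inverse explicitly.
    return np.linalg.solve(X.T, Y.T).T


def apply_affine(A, points):
    """Warp every (x, y) point at once with a single matrix product."""
    warped = A @ to_homogeneous(points)
    return warped[:2].T


src = np.array([[0, 0], [1, 0], [0, 1]])
dst = np.array([[2, 3], [4, 3], [2, 7]])
A = compute_affine(src, dst)

# Interior points follow along: the source centroid lands on the
# destination centroid.
centroid = apply_affine(A, src.mean(axis=0, keepdims=True))
```

Because both \(X\) and \(Y\) have a bottom row of ones, the last row of `A` comes out as `[0, 0, 1]`, which is what makes it an affine transformation rather than a general projective one.
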
Anyway, we now have a means for computing affine transformations between arbitrary triangles... but which triangles are we warping from?

You may be wondering: why "from"? Answer: we want to carry out an inverse warp in order to avoid having holes in our morphed image – if we do it this way, and compute the warp from every pixel in the final image, then we *know* that every pixel will end up with some color value. There's no such guarantee with a forward warp.
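
To make the inverse warp concrete, here is a hedged sketch of the sampling step. For each output pixel we map its coordinates back into a source image with an output-to-source affine matrix and read off a bilinearly interpolated color. The names `bilinear_sample`, `fill_from_source`, and `A_out_to_src` are hypothetical, and clamping coordinates to the image border is one possible convention among several.

```python
import numpy as np


def bilinear_sample(image, xs, ys):
    """Sample an (H, W) or (H, W, C) image at fractional (xs, ys)."""
    image = np.asarray(image, dtype=float)
    h, w = image.shape[:2]
    # Clamp so that lookups just outside the image reuse the border.
    xs = np.clip(np.asarray(xs, dtype=float), 0, w - 1)
    ys = np.clip(np.asarray(ys, dtype=float), 0, h - 1)
    x0 = np.floor(xs).astype(int)
    y0 = np.floor(ys).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    fx = xs - x0
    fy = ys - y0
    if image.ndim == 3:
        # Broadcast the weights over the color channels.
        fx = fx[..., None]
        fy = fy[..., None]
    top = (1 - fx) * image[y0, x0] + fx * image[y0, x1]
    bottom = (1 - fx) * image[y1, x0] + fx * image[y1, x1]
    return (1 - fy) * top + fy * bottom


def fill_from_source(out, src_image, A_out_to_src, out_xs, out_ys):
    """Color the output pixels (out_xs, out_ys) by looking backwards.

    A_out_to_src is the 3 x 3 affine matrix that maps output-triangle
    coordinates to source-triangle coordinates. Every listed output
    pixel receives a value, so no holes can appear.
    """
    coords = A_out_to_src @ np.vstack(
        [out_xs, out_ys, np.ones(len(out_xs))]
    )
    out[out_ys, out_xs] = bilinear_sample(src_image, coords[0], coords[1])
    return out
```

A forward warp would instead push source pixels to (generally fractional) output positions, and nothing would force every output pixel to be hit.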

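So to answer our earlier question, we're warping from each triangle in the output shape to the corresponding triangle in each of the two input images.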

Finally, to obtain the overall color for each pixel in the destination image, we would cross-dissolve (i.e. linearly interpolate) between the colors from the two source images. $$ \text{final color} := t \cdot \text{color1} + (1 - t) \cdot \text{color2} $$ Aaand after doing this for every point in the destination image, we'd end up with our morphed image (again, parameterized by \(t\)). For instance, the below image has \(t = 0.5\) and is therefore exactly in the middle:

`50% Tony, 50% Owen` |
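
The cross-dissolve itself is a one-liner. The sketch below assumes both inputs have already been warped into the same output shape and are stored as float arrays of equal size; `cross_dissolve` is just an illustrative name.

```python
import numpy as np


def cross_dissolve(color1, color2, t):
    """Blend two aligned colors or images: t * color1 + (1 - t) * color2."""
    color1 = np.asarray(color1, dtype=float)
    color2 = np.asarray(color2, dtype=float)
    return t * color1 + (1.0 - t) * color2


# With t = 0.5 every channel is the plain average of the two inputs.
halfway = cross_dissolve([0, 0, 0], [100, 200, 50], 0.5)
```

Sweeping `t` over evenly spaced values between 0 and 1 (for example with `np.linspace`) would give the frames of an animation.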

`They say the whole is greater than the sum of its parts` |

`William` |
`Combined GIF` |

The IMM Face Database is a freely available dataset of annotated Danish faces. (By "annotated", I mean that the feature points have already been selected.) The dataset contains various poses and expressions of forty people (of whom 33 are men; we choose to focus only on the men since there are a lot more of them). Each person has been photographed in six different settings, for example "front-facing neutral expression", "left-facing neutral expression", "front-facing smile"... you get the idea. During this part of the assignment, we compute the average of the 33 male faces for a single image type at a time.

This means averaging over the feature points for each face in order to discover the average face *shape*, and then morphing each of the faces into this average shape. These gentlemen have been morphed into the average face shape of a smiling male Dane:

`Guy #10` |
`Guy #20` |
`Guy #40` |

`Original #10` |
`Original #20` |
`Original #40` |

`Just your average smiling Danish male` |
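
In code, the two averaging steps might look something like the following. This is only a sketch: it assumes the annotations have been loaded into one array of shape `(num_faces, num_points, 2)` and that each face has already been warped into the mean shape by some separate routine. Both function names are hypothetical.

```python
import numpy as np


def mean_shape(shapes):
    """Average matching feature points across faces.

    shapes: (num_faces, num_points, 2) array -> (num_points, 2) array.
    """
    return np.asarray(shapes, dtype=float).mean(axis=0)


def mean_appearance(warped_faces):
    """Average the pixel values of faces already warped to one shape."""
    return np.asarray(warped_faces, dtype=float).mean(axis=0)


# Two toy "faces" with two feature points each.
toy_shapes = np.array([
    [[0.0, 0.0], [10.0, 0.0]],
    [[2.0, 4.0], [12.0, 2.0]],
])
toy_mean = mean_shape(toy_shapes)
```

Averaging the pixels only after warping matters: averaging the unwarped photographs directly would blur features that sit at different positions in different faces.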

`▲ Front-facing / ▼ Left-facing`

`This cracked me up` |
`My tilted Danish face` |

How else can we mess with people's faces? Well, we could create a caricature (def. "comically or grotesquely exaggerated representation") by extrapolating from the population mean of the previous section. In other words, we can calculate the difference between our image shape and the mean shape, scale it, and add it back to the mean shape. This allows us to take a parametrically defined step in the direction of that difference (which is to say we can take as large a step as we want), creating with luck a nicely exaggerated image. Mathematically: $$\text{caricature shape} = \text{mean shape} + \alpha (\text{image shape} - \text{mean shape})$$

Note that for this to be a true extrapolation, \(\alpha\) should be \(< 0\) or \(> 1\).
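
The formula translates almost word for word into NumPy. In this small sketch the function name `caricature_shape` and the toy coordinates are illustrative only, and shapes are assumed to be `(num_points, 2)` arrays.

```python
import numpy as np


def caricature_shape(image_shape, mean_shape, alpha):
    """Return mean_shape + alpha * (image_shape - mean_shape)."""
    image_shape = np.asarray(image_shape, dtype=float)
    mean_shape = np.asarray(mean_shape, dtype=float)
    return mean_shape + alpha * (image_shape - mean_shape)


mean_pts = np.array([[5.0, 5.0], [10.0, 5.0]])
face_pts = np.array([[4.0, 6.0], [12.0, 5.0]])

# alpha = 2 doubles each point's offset from the mean.
exaggerated = caricature_shape(face_pts, mean_pts, 2.0)
```

With \(\alpha = 0\) we get the mean shape back and with \(\alpha = 1\) the original face, which is why values strictly between 0 and 1 only interpolate.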

Then we can warp an arbitrary face to the caricature shape as normal, and produce images like this one (created using the front-facing neutral mean and an \(\alpha\) of 2):

`Nice` |

`Image type 2,` \(\alpha = -0.5\) |
`Image type 5,` \(\alpha = 1.5\) |

Let's take my white roommate, Oliver, and make him a little less white by morphing his face with that of the average Japanese actor. Original images and their correspondence point triangulations are as follows; for reference, I obtained the Japanese average from this website.

`Midpoint image` |
`t = 21` |
`warp_frac = 1.0, t = 30` |

`Shape only` |
`Appearance only` |
`Both` |

Finally, I made a morphing video in order to highlight temporal variations in my facial attributes. The video is composed of ten photographs taken at chronologically increasing points in time.

Original photographs (subject age ranging from about 0 y/o to 19 y/o):

Ideally the images would be more evenly spaced, but I was hard-pressed to find pictures of myself from the middle of my childhood. As a result, I ended up with a denser sampling of photographs from my earliest and most recent years. (I suppose it could be worse.)