Insight in Plain Sight

Deconstructing the Homography Matrix

The homography is a core concept in computer vision and multiple view geometry. It describes the mapping between two images that observe the same plane.

Using homogeneous coordinates one can describe this mapping by a 3x3 matrix:
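Written out, with x and x' denoting corresponding image points in homogeneous coordinates and h_ij the entries of H (symbol names assumed here for illustration), the mapping can be sketched as:

```latex
\mathbf{x}' \sim H\,\mathbf{x}, \qquad
\begin{pmatrix} x' \\ y' \\ w' \end{pmatrix}
\sim
\begin{pmatrix}
h_{11} & h_{12} & h_{13} \\
h_{21} & h_{22} & h_{23} \\
h_{31} & h_{32} & h_{33}
\end{pmatrix}
\begin{pmatrix} x \\ y \\ 1 \end{pmatrix}
```

Here the symbol ~ means equality up to a non-zero scale factor, and the mapped image position is (x'/w', y'/w').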

At first, it can be intimidating to interpret the effects of a matrix with 9 entries and 8 degrees of freedom (DoF). One degree of freedom is lost because the matrix is only defined up to scale. But we can decompose the matrix into separate parts, each with fewer DoF and easier to understand.
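A small NumPy sketch can make the "9 parameters, 8 degrees of freedom" point concrete: multiplying the matrix by any non-zero factor leaves the mapping unchanged. The helper apply_homography and the example matrix are hypothetical and chosen only for illustration.

```python
import numpy as np

def apply_homography(H, point):
    """Map a 2D point through H using homogeneous coordinates."""
    x = np.array([point[0], point[1], 1.0])  # lift to homogeneous coordinates
    y = H @ x
    return y[:2] / y[2]                      # divide by the last entry

# Hypothetical example matrix, not taken from any real image pair.
H = np.array([[1.2, 0.1, 5.0],
              [-0.2, 0.9, 3.0],
              [0.001, 0.002, 1.0]])

p = (10.0, 20.0)
mapped = apply_homography(H, p)
mapped_scaled = apply_homography(2.5 * H, p)  # same matrix, different scale

print(mapped, mapped_scaled)  # both lines show the same 2D point
```

Because the overall scale has no effect, one of the nine entries is redundant, which leaves 8 DoF.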

The Transformation Hierarchy

To do so, we first have to understand the hierarchy of transformations. Each level is more general than the one before:

Euclidean
Similarity
Affine
Projective

We will cover each transform in turn and uncover the homography matrix step by step.

Euclidean Transform

Euclidean geometry is very natural to us. It describes the motion of rigid objects. A Euclidean transform consists of a rotation and a translation. In the plane it has 3 DoF: one rotation angle and two translation components.
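As a minimal sketch, a planar Euclidean transform can be written as a 3x3 matrix built from an angle and a translation. The function name euclidean_matrix and the sample values are assumptions made for this example.

```python
import numpy as np

def euclidean_matrix(theta, tx, ty):
    """3x3 homogeneous matrix: rotation by theta, then translation (tx, ty)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, tx],
                     [s,  c, ty],
                     [0.0, 0.0, 1.0]])

H_e = euclidean_matrix(np.pi / 6, 2.0, -1.0)  # 3 parameters = 3 DoF

# Two points in homogeneous coordinates, 5 units apart.
a = np.array([0.0, 0.0, 1.0])
b = np.array([3.0, 4.0, 1.0])
a2, b2 = H_e @ a, H_e @ b

dist_before = np.linalg.norm(b[:2] - a[:2])
dist_after = np.linalg.norm(b2[:2] - a2[:2])
print(dist_before, dist_after)  # the distance is preserved
```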

Similarity Transform

A similarity transform is a Euclidean transform plus a uniform scaling of the space. It has 4 DoF. In photogrammetry, we can usually reconstruct a scene only up to a similarity transform. This is the case when we have only images and no metric measurements of the real world.
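The following sketch illustrates the extra scale parameter: angles survive the transform, while all lengths are multiplied by the same factor. The names similarity_matrix and angle_at, and the triangle used, are hypothetical.

```python
import numpy as np

def similarity_matrix(s, theta, tx, ty):
    """3x3 homogeneous matrix: uniform scale s, rotation theta, translation."""
    c, sn = np.cos(theta), np.sin(theta)
    return np.array([[s * c, -s * sn, tx],
                     [s * sn,  s * c, ty],
                     [0.0, 0.0, 1.0]])

def angle_at(p, q, r):
    """Angle at vertex q formed by the 2D points p, q, r."""
    u, v = p - q, r - q
    cos_angle = u @ v / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.arccos(np.clip(cos_angle, -1.0, 1.0))

H_s = similarity_matrix(2.0, 0.7, 1.0, 3.0)  # 4 parameters = 4 DoF

# A right triangle with the right angle at the first vertex.
triangle = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 3.0]])
homog = np.hstack([triangle, np.ones((3, 1))])
mapped = (H_s @ homog.T).T[:, :2]  # the last coordinate stays 1

angle_before = angle_at(triangle[1], triangle[0], triangle[2])
angle_after = angle_at(mapped[1], mapped[0], mapped[2])
side_ratio = (np.linalg.norm(mapped[1] - mapped[0])
              / np.linalg.norm(triangle[1] - triangle[0]))
print(angle_before, angle_after, side_ratio)
```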

Affine Transform

The affine transformation is more general still. It includes the similarity transform, but it can also stretch and shear the space. We can also describe a transformation by its invariants, by asking the question: “What does not change?”. Under a Euclidean transform, lengths and angles do not change. Under a similarity transform, angles and ratios of lengths do not change. And under an affine transform, parallel lines stay parallel. The affine transform has 6 DoF.
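To illustrate the invariant, the sketch below pushes two parallel segments through an affine matrix and checks that their directions are still parallel, while a right angle is not preserved. The matrix entries and the helper transform are made up for this example.

```python
import numpy as np

# Hypothetical affine matrix: 6 free entries, last row fixed to (0, 0, 1).
H_a = np.array([[1.5, 0.8, 2.0],
                [0.3, 0.7, -1.0],
                [0.0, 0.0, 1.0]])

def transform(H, pts):
    """Apply a 3x3 homogeneous matrix to an (n, 2) array of 2D points."""
    homog = np.hstack([pts, np.ones((len(pts), 1))])
    out = (H @ homog.T).T
    return out[:, :2] / out[:, 2:3]

# Two parallel segments: directions (2, 1) and (4, 2).
seg1 = np.array([[0.0, 0.0], [2.0, 1.0]])
seg2 = np.array([[0.0, 3.0], [4.0, 5.0]])
m1, m2 = transform(H_a, seg1), transform(H_a, seg2)

d1, d2 = m1[1] - m1[0], m2[1] - m2[0]
cross = d1[0] * d2[1] - d1[1] * d2[0]  # zero means still parallel

# The images of the x and y axis directions are the columns of the 2x2 block.
dot = H_a[:2, 0] @ H_a[:2, 1]          # non-zero: the right angle is lost
print(cross, dot)
```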

Projective Transform

Finally, we arrive at the projective transform. It is special because all the other transforms can be described in Euclidean space, whereas a projective transform only makes sense in projective space. So, what power does this transform hold? Its distinctive ability is to move points at infinity to finite positions. For example, lines that were parallel can intersect at a finite point after the transformation. This is exactly what happens when you observe a plane from another viewpoint.
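A short sketch of this effect, using the standard cross-product construction for lines and intersections in homogeneous coordinates. The matrix H_p and the two "rail" lines are hypothetical values picked for illustration.

```python
import numpy as np

# Hypothetical projective matrix: identity except for one entry in the last row.
H_p = np.array([[1.0, 0.0, 0.0],
                [0.0, 1.0, 0.0],
                [0.0, 0.002, 1.0]])

def line_through(p, q):
    """Homogeneous line through two homogeneous points."""
    return np.cross(p, q)

def intersection(l1, l2):
    """Homogeneous intersection point of two homogeneous lines."""
    return np.cross(l1, l2)

# Two parallel lines x = -1 and x = 1, each given by two points.
a1, a2 = np.array([-1.0, 0.0, 1.0]), np.array([-1.0, 10.0, 1.0])
b1, b2 = np.array([1.0, 0.0, 1.0]), np.array([1.0, 10.0, 1.0])

# Before: the lines meet at an ideal point (last entry 0).
meet_before = intersection(line_through(a1, a2), line_through(b1, b2))

# After: transform the points, rebuild the lines, intersect again.
meet_after = intersection(line_through(H_p @ a1, H_p @ a2),
                          line_through(H_p @ b1, H_p @ b2))
vanishing_point = meet_after[:2] / meet_after[2]
print(meet_before, vanishing_point)
```

The intersection moves from infinity to a finite "vanishing point", much like rails that appear to converge in a photograph.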

The Chain Decomposition

One can go one step further and isolate the specific effects of the different transforms. Remember, we are dealing with a transformation hierarchy. It means that the affine transformation includes all similarity transforms. And a projective transform includes all other transforms. In fact, every general projective transform can be decomposed into three parts:
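In the notation of Hartley and Zisserman (see Literature), the chain can be written as below. The product on the right-hand side is worked out here for illustration and assumes the bottom-right scalar v is non-zero:

```latex
H = H_S\, H_A\, H_P =
\begin{pmatrix} sR & \mathbf{t} \\ \mathbf{0}^\top & 1 \end{pmatrix}
\begin{pmatrix} K & \mathbf{0} \\ \mathbf{0}^\top & 1 \end{pmatrix}
\begin{pmatrix} I & \mathbf{0} \\ \mathbf{v}^\top & v \end{pmatrix}
=
\begin{pmatrix} sRK + \mathbf{t}\mathbf{v}^\top & v\,\mathbf{t} \\ \mathbf{v}^\top & v \end{pmatrix}
```

Here s is the scale, R a 2x2 rotation, t the translation, K an upper-triangular 2x2 matrix with det K = 1, and the bold v a 2-vector. Counting parameters gives 4 (similarity) + 2 (K) + 2 (vector v) = 8 DoF; the scalar v adds none, because H is only defined up to scale.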

What distinguishes a projective transform from an affine one is the additional matrix factor whose two DoF sit in the vector v. With these two additional parameters we gain the “ability” to affect points at infinity. Remember that vectors whose last entry equals 0 are ideal points, that is, the points at infinity where parallel lines intersect.

An affine transformation cannot bring ideal points back from infinity; it always maps an ideal point to another ideal point:
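For an ideal point with entries x_1, x_2, 0 and an affine matrix with 2x2 block A and translation t (symbols assumed here for illustration), the computation can be sketched as:

```latex
\begin{pmatrix} A & \mathbf{t} \\ \mathbf{0}^\top & 1 \end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ 0 \end{pmatrix}
=
\begin{pmatrix} A \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} \\ 0 \end{pmatrix}
```

The last entry is still 0, so the result remains at infinity; only its direction may have changed.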

The isolated projective part, in contrast, can map an ideal point to a finite one:
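With v = (v_1, v_2) denoting the two entries in the last row and a scalar v in the bottom-right corner (notation assumed here for illustration), applying the projective part to an ideal point gives:

```latex
\begin{pmatrix} I & \mathbf{0} \\ \mathbf{v}^\top & v \end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ 0 \end{pmatrix}
=
\begin{pmatrix} x_1 \\ x_2 \\ v_1 x_1 + v_2 x_2 \end{pmatrix}
```

Whenever v_1 x_1 + v_2 x_2 is non-zero, the last entry is no longer 0 and the point lands at a finite position.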

Similarly, the affine part of the decomposition does not contain a translation vector t, since translation is already covered by the similarity transform. Note also that the similarity and Euclidean transforms are represented by a single matrix.
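As a rough sketch of how such a chain could be computed numerically (this procedure is not taken from the article): normalise H so that its bottom-right entry is 1, read off t and v, and split the remaining 2x2 block with a QR factorisation. The function name decompose_homography and the example matrix are hypothetical, and the sketch assumes that H[2, 2] is non-zero and that the remaining block has a positive determinant.

```python
import numpy as np

def decompose_homography(H):
    """Return (H_S, H_A, H_P) with H_S @ H_A @ H_P equal to H / H[2, 2].

    Assumption: H[2, 2] != 0 and det(A - t v^T) > 0.
    """
    H = H / H[2, 2]                    # fix the free scale: bottom-right entry 1
    A, t, v = H[:2, :2], H[:2, 2], H[2, :2]
    M = A - np.outer(t, v)             # M = s * R * K
    if np.linalg.det(M) <= 0:
        raise ValueError("sketch only handles det(A - t v^T) > 0")

    Q, U = np.linalg.qr(M)             # M = Q @ U, U upper triangular
    D = np.diag(np.sign(np.diag(U)))   # sign flips to make diag(U) positive
    R, U = Q @ D, D @ U                # D @ D = I, so R @ U is still M
    s = np.sqrt(np.linalg.det(U))      # uniform scale
    K = U / s                          # upper triangular with det(K) = 1

    H_S = np.eye(3)
    H_S[:2, :2], H_S[:2, 2] = s * R, t  # similarity part (4 DoF)
    H_A = np.eye(3)
    H_A[:2, :2] = K                     # affine part (2 DoF)
    H_P = np.eye(3)
    H_P[2, :2] = v                      # projective part (2 DoF)
    return H_S, H_A, H_P

# Hypothetical example matrix.
H = np.array([[1.2, 0.1, 5.0],
              [-0.2, 0.9, 3.0],
              [0.001, 0.002, 1.0]])

H_S, H_A, H_P = decompose_homography(H)
reconstructed = H_S @ H_A @ H_P
print(np.allclose(reconstructed, H / H[2, 2]))
```

Each factor can then be inspected or applied on its own, which isolates the effect of the similarity, affine and projective parts.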

Summary

We saw that a homography sits at the top of a hierarchy of transformations. From Euclidean to similarity, affine and projective, each transformation adds a bit of functionality. Instead of understanding the matrix H as a whole, we can decompose H into a chain of transformations and isolate their specific effects.

Literature

R. Hartley, A. Zisserman, Multiple View Geometry in Computer Vision