Insight in Plain Sight
Deconstructing the Homography Matrix
The homography is a core concept in computer vision and multiple view geometry. It describes the mapping between two images that observe the same plane.
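Using homogeneous coordinates, this mapping can be described by a 3x3 matrix H that takes a point (x, y) of the first image to the corresponding point of the second image:

$$\mathbf{x}' = H\mathbf{x}, \qquad \begin{pmatrix} x' \\ y' \\ w' \end{pmatrix} = \begin{bmatrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ h_{31} & h_{32} & h_{33} \end{bmatrix} \begin{pmatrix} x \\ y \\ 1 \end{pmatrix}$$

The image coordinates of the mapped point are then (x'/w', y'/w').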
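At first, interpreting the effects of a matrix with 9 parameters and 8 degrees of freedom (DoF) might seem intimidating. There are only 8 DoF because H is defined up to scale: H and kH (k ≠ 0) describe the same mapping. But we can decompose the matrix into separate parts, each with fewer DoF and easier to understand.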
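A minimal sketch in plain Python of applying a homography to a point, and of the scale ambiguity that leaves 8 rather than 9 DoF. The helper name `apply_homography` and the matrix entries are hypothetical, chosen only for illustration.

```python
def apply_homography(H, x, y):
    """Map the point (x, y) through the 3x3 matrix H (given as a list of rows)."""
    # Lift to homogeneous coordinates (x, y, 1) and multiply by H.
    xh = [x, y, 1.0]
    xp, yp, wp = (sum(H[r][c] * xh[c] for c in range(3)) for r in range(3))
    # Divide by the last coordinate to return to image coordinates.
    return xp / wp, yp / wp

# Hypothetical example homography.
H = [[1.2, 0.1, 5.0],
     [0.0, 0.9, -3.0],
     [0.001, 0.002, 1.0]]
kH = [[2.5 * h for h in row] for row in H]  # the same homography scaled by k = 2.5

p = apply_homography(H, 10.0, 20.0)
q = apply_homography(kH, 10.0, 20.0)
print(p, q)  # identical up to floating-point rounding

# Fixing the scale (here h33 = 1) leaves 8 free entries: the 8 DoF.
normalized = [[h / kH[2][2] for h in row] for row in kH]
print(normalized)  # recovers H up to rounding
```

Because every multiple of H gives the same mapped point, one entry can always be fixed, which is why the count is 8 DoF.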
The Transformation Hierarchy
To do that, we first have to understand the hierarchy of transformations. Each class is more powerful than the previous one and contains it as a special case:
- Euclidean
- Similarity
- Affine
- Projective
Euclidean Transform
Euclidean geometry feels very natural to us: it describes the movement of rigid objects. A Euclidean transform consists of a rotation and a translation. It has 3 DoF: one rotation angle and two translation components.
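To make this concrete, here is a small sketch that builds a Euclidean transform as a 3x3 homogeneous matrix and checks that it preserves distances. The helpers `euclidean` and `apply` are hypothetical names for this illustration, and the angle and translation are arbitrary example values.

```python
import math

def euclidean(theta, tx, ty):
    """3x3 Euclidean transform: rotation by theta (radians), then translation (tx, ty)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, tx],
            [s, c, ty],
            [0.0, 0.0, 1.0]]

def apply(H, p):
    """Apply a 3x3 transform to the inhomogeneous 2D point p = (x, y)."""
    x, y = p
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return (u / w, v / w)

# 3 DoF: one angle and two translation components (example values).
E = euclidean(math.radians(30), 4.0, -2.0)
a, b = (0.0, 0.0), (3.0, 4.0)
print(math.dist(a, b), math.dist(apply(E, a), apply(E, b)))  # both 5.0 up to rounding
```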
Similarity Transform
A similarity transform is almost the same as a Euclidean one, except that the space is additionally scaled by a uniform factor. It has 4 DoF. In photogrammetry, a scene can usually be reconstructed only up to a similarity transform when all we have are images and no metric measurements of the outside world.
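The following sketch, assuming a positive scale s and arbitrary example values, checks numerically that a similarity changes lengths but keeps angles and ratios of lengths. The helpers `similarity`, `apply` and `angle_at` are hypothetical names for this illustration.

```python
import math

def similarity(s, theta, tx, ty):
    """3x3 similarity: uniform scale s > 0, rotation theta (radians), translation (tx, ty)."""
    c, sn = math.cos(theta), math.sin(theta)
    return [[s * c, -s * sn, tx],
            [s * sn, s * c, ty],
            [0.0, 0.0, 1.0]]

def apply(H, p):
    """Apply a 3x3 transform to the inhomogeneous 2D point p = (x, y)."""
    x, y = p
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return (u / w, v / w)

def angle_at(o, p, q):
    """Signed angle at vertex o between the rays o->p and o->q, in radians."""
    ux, uy = p[0] - o[0], p[1] - o[1]
    vx, vy = q[0] - o[0], q[1] - o[1]
    return math.atan2(ux * vy - uy * vx, ux * vx + uy * vy)

S = similarity(2.0, 0.5, 1.0, 1.0)  # example values, 4 DoF in total
o, p, q = (0.0, 0.0), (1.0, 0.0), (1.0, 1.0)
o2, p2, q2 = (apply(S, pt) for pt in (o, p, q))

print(angle_at(o, p, q), angle_at(o2, p2, q2))  # same angle, pi/4
print(math.dist(o, p), math.dist(o2, p2))        # length doubles: 1.0 -> about 2.0
print(math.dist(o, q) / math.dist(o, p),
      math.dist(o2, q2) / math.dist(o2, p2))     # ratio of lengths is unchanged
```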
Affine Transform
The affine transformation is more general still. It includes every similarity transform, but it can also stretch and shear the space. It has 6 DoF. We can also characterize a transformation by its invariants, by asking the question: “What does not change?”. Under a Euclidean transform, lengths, angles and areas do not change. Under a similarity transform, lengths change, but angles and ratios of lengths do not. And under an affine transform, angles are no longer preserved, but parallel lines stay parallel.
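A short sketch of both invariance claims, using an arbitrary example matrix that stretches and shears: parallel segments keep parallel images, while a right angle does not survive. The helper names (`affine`, `apply`, `direction`, `cross2`, `dot2`) are hypothetical.

```python
def affine(a11, a12, a21, a22, tx, ty):
    """3x3 affine transform: linear part [[a11, a12], [a21, a22]] plus translation (tx, ty)."""
    return [[a11, a12, tx],
            [a21, a22, ty],
            [0.0, 0.0, 1.0]]

def apply(H, p):
    """Apply a 3x3 transform to the inhomogeneous 2D point p = (x, y)."""
    x, y = p
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return (u / w, v / w)

def direction(p, q):
    """Direction vector of the segment from p to q."""
    return (q[0] - p[0], q[1] - p[1])

def cross2(u, v):
    """2D cross product; zero exactly when u and v are parallel."""
    return u[0] * v[1] - u[1] * v[0]

def dot2(u, v):
    """2D dot product; zero exactly when u and v are perpendicular."""
    return u[0] * v[0] + u[1] * v[1]

A = affine(1.0, 0.7, 0.2, 1.5, 2.0, -1.0)  # example stretch + shear

# Two parallel segments with direction (1, 2).
d1 = direction(*(apply(A, p) for p in [(0.0, 0.0), (1.0, 2.0)]))
d2 = direction(*(apply(A, p) for p in [(3.0, 0.0), (4.0, 2.0)]))
print(cross2(d1, d2))  # about 0: the images are still parallel

# The perpendicular directions (1, 0) and (0, 1), starting at the origin.
o, ex, ey = (apply(A, p) for p in [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)])
print(dot2(direction(o, ex), direction(o, ey)))  # about 1.0, not 0: the right angle is lost
```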
Projective Transform
Finally, we arrive at the projective transform. It is special because the other transforms always map points at infinity to points at infinity, so they can be treated as maps of the ordinary Euclidean plane. A projective transform can move points to and from infinity, so it only makes sense in projective space. So, what power does this transform hold? Its distinctive ability is to map points at infinity (ideal points) to finite points. For example, parallel lines can intersect at a finite point after the transformation. This is exactly what happens when a plane is observed from another viewpoint: parallel railway tracks appear to converge toward a vanishing point. The projective transform has 8 DoF.
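The claim about parallel lines can be checked numerically. In homogeneous coordinates, the line through two points is their cross product, and the intersection of two lines is the cross product of the two line vectors. The sketch below uses the parallel lines y = 0 and y = 1 and a hypothetical example homography with a non-zero bottom row; the helpers `cross3` and `matvec` are illustrative names.

```python
def cross3(a, b):
    """Cross product of two homogeneous 3-vectors."""
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def matvec(H, x):
    """Multiply a 3x3 matrix (list of rows) by a homogeneous 3-vector."""
    return [sum(H[r][c] * x[c] for c in range(3)) for r in range(3)]

# Two parallel lines, y = 0 and y = 1, each given by two homogeneous points.
p1, q1 = [0.0, 0.0, 1.0], [1.0, 0.0, 1.0]
p2, q2 = [0.0, 1.0, 1.0], [1.0, 1.0, 1.0]

meet = cross3(cross3(p1, q1), cross3(p2, q2))
print(meet)  # last entry 0: an ideal point, the lines meet at infinity

# Example homography whose bottom row is not (0, 0, 1).
H = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.5, 0.0, 1.0]]
mp1, mq1, mp2, mq2 = (matvec(H, x) for x in (p1, q1, p2, q2))
meet_h = cross3(cross3(mp1, mq1), cross3(mp2, mq2))
print(meet_h)  # last entry non-zero: the mapped lines meet at the finite point (2, 0)
```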
The Chain Decomposition
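One can go one step further and isolate the specific effects of the different transforms. Remember, we are dealing with a transformation hierarchy: the affine transforms include all similarity transforms, and the projective transforms include all the others. In fact, provided its bottom-right entry v is non-zero, a projective transform can be decomposed into a chain of three parts (Hartley and Zisserman):

$$H = H_S H_A H_P = \begin{bmatrix} sR & \mathbf{t} \\ \mathbf{0}^\top & 1 \end{bmatrix} \begin{bmatrix} K & \mathbf{0} \\ \mathbf{0}^\top & 1 \end{bmatrix} \begin{bmatrix} I & \mathbf{0} \\ \mathbf{v}^\top & v \end{bmatrix} = \begin{bmatrix} sRK + \mathbf{t}\mathbf{v}^\top & v\,\mathbf{t} \\ \mathbf{v}^\top & v \end{bmatrix}$$

Here $H_S$ is a similarity with scale $s$, 2x2 rotation $R$ and translation $\mathbf{t}$; $H_A$ is an affine transform whose 2x2 block $K$ is upper triangular with $\det K = 1$; and $H_P$ is the purely projective part with the vector $\mathbf{v} = (v_1, v_2)^\top$ and the scalar $v$.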
We can see that what distinguishes a projective transform from an affine one is the multiplication by a matrix whose two additional DoF sit in the vector v = (v1, v2). With these two parameters we gain the ability to affect points at infinity. Recall that homogeneous vectors whose last entry equals 0 are ideal points, i.e. the points at infinity where parallel lines intersect.
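An affine transformation cannot move an ideal point to a finite point; it only maps ideal points to other ideal points. With an invertible 2x2 matrix A and a translation t:

$$\begin{bmatrix} A & \mathbf{t} \\ \mathbf{0}^\top & 1 \end{bmatrix} \begin{pmatrix} x_1 \\ x_2 \\ 0 \end{pmatrix} = \begin{pmatrix} A \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} \\ 0 \end{pmatrix}$$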
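The isolated projective part, in contrast, does exactly that:

$$\begin{bmatrix} I & \mathbf{0} \\ \mathbf{v}^\top & v \end{bmatrix} \begin{pmatrix} x_1 \\ x_2 \\ 0 \end{pmatrix} = \begin{pmatrix} x_1 \\ x_2 \\ v_1 x_1 + v_2 x_2 \end{pmatrix}$$

so the ideal point becomes a finite point whenever $v_1 x_1 + v_2 x_2 \neq 0$.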
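A tiny numerical version of this contrast, assuming example values: an affine matrix (bottom row 0, 0, 1) and a purely projective part with identity upper-left block, v = (0.2, -0.1) and scalar v = 1. The names `matvec`, `affine_matrix` and `projective_part` are hypothetical.

```python
def matvec(M, x):
    """Multiply a 3x3 matrix (list of rows) by a homogeneous 3-vector."""
    return [sum(M[r][c] * x[c] for c in range(3)) for r in range(3)]

ideal = [3.0, 1.0, 0.0]  # the point at infinity in direction (3, 1)

# Example affine matrix: arbitrary 2x2 block A, translation t, bottom row (0, 0, 1).
affine_matrix = [[1.0, 0.5, 2.0],
                 [0.3, 1.2, -1.0],
                 [0.0, 0.0, 1.0]]
# Example projective part: identity block, v = (0.2, -0.1), scalar v = 1.
projective_part = [[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0],
                   [0.2, -0.1, 1.0]]

print(matvec(affine_matrix, ideal))    # last entry 0.0: still an ideal point
print(matvec(projective_part, ideal))  # last entry v1*x1 + v2*x2 (about 0.5): a finite point
```

Note that the translation column has no effect on the ideal point in the affine case, because it is multiplied by the last entry 0.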
Note also that the affine factor contains no translation vector t, since translation is already covered by the similarity factor. Likewise, the similarity and Euclidean transforms are represented by a single factor, since a Euclidean transform is just a similarity with unit scale.
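The decomposition can also be computed directly. The sketch below writes the similarity factor with scale s, rotation R and translation t, the affine factor with an upper-triangular K (det K = 1), and the projective factor with the vector v and scalar v. It assumes that h33 ≠ 0, so H can be scaled to make the scalar v equal to 1, and that the 2x2 block sRK has a positive determinant, so R is a proper rotation. The 2x2 factorization of sRK uses Gram-Schmidt on its columns (a QR factorization), which is a choice made for this sketch. The names `decompose` and `matmul` are hypothetical.

```python
import math

def matmul(A, B):
    """Product of two 3x3 matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

def decompose(H):
    """Split H into (H_S, H_A, H_P) with H / h33 = H_S @ H_A @ H_P.

    Assumes h33 != 0 and det(sRK) > 0 (R a proper rotation, s > 0).
    """
    h33 = H[2][2]
    Hn = [[h / h33 for h in row] for row in H]  # fix the free scale so that v = 1
    v1, v2 = Hn[2][0], Hn[2][1]
    tx, ty = Hn[0][2], Hn[1][2]
    # The top-left block is sRK + t v^T, so subtract t v^T to get M = sRK.
    m11, m12 = Hn[0][0] - tx * v1, Hn[0][1] - tx * v2
    m21, m22 = Hn[1][0] - ty * v1, Hn[1][1] - ty * v2
    det = m11 * m22 - m12 * m21
    if det <= 0:
        raise ValueError("this sketch needs det(sRK) > 0")
    s = math.sqrt(det)  # det(sRK) = s^2 because det R = det K = 1
    # Gram-Schmidt on the columns of M / s = R K.
    c1x, c1y, c2x, c2y = m11 / s, m21 / s, m12 / s, m22 / s
    k11 = math.hypot(c1x, c1y)
    r1x, r1y = c1x / k11, c1y / k11  # first column of R
    r2x, r2y = -r1y, r1x             # second column: r1 rotated by +90 degrees
    k12 = r1x * c2x + r1y * c2y
    k22 = r2x * c2x + r2y * c2y      # equals 1 / k11, so det K = 1
    H_S = [[s * r1x, s * r2x, tx], [s * r1y, s * r2y, ty], [0.0, 0.0, 1.0]]
    H_A = [[k11, k12, 0.0], [0.0, k22, 0.0], [0.0, 0.0, 1.0]]
    H_P = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [v1, v2, 1.0]]
    return H_S, H_A, H_P

# Usage with an example homography (h33 = 1 already).
H = [[1.1, 0.3, 4.0], [-0.2, 0.9, 2.0], [0.01, 0.02, 1.0]]
H_S, H_A, H_P = decompose(H)
recomposed = matmul(matmul(H_S, H_A), H_P)
print(max(abs(recomposed[i][j] - H[i][j]) for i in range(3) for j in range(3)))  # close to 0
```

Because H is first divided by h33, any non-zero multiple of H yields the same three factors.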
Summary
We saw that a homography fits into a hierarchy of transformations. From Euclidean to similarity, affine and projective, each transformation adds a bit of functionality. Instead of trying to understand the matrix H as a whole, we can decompose H into a chain of transformations and isolate their specific effects.
Literature
R. Hartley, A. Zisserman, Multiple View Geometry in Computer Vision