Multiple View Geometry in Computer Vision Chapter 6 Solutions -- Camera Models


1. Let $I_0$ be a projective image, and $I_1$ be an image of $I_0$ (an image of an image). Let the composite image be denoted by $I'$. Show that the apparent camera center of $I'$ is the same as that of $I_0$. Speculate on how this explains why a portrait’s eyes “follow you round the room”. Verify on the other hand that all other parameters of $I'$ and $I_0$ may be different.

I was really confused about what this question was asking, so I asked the authors to clarify it. Dr. Richard Hartley graciously replied and helped me understand the question.

Let me rephrase the question in my own words. If $I_0$ is a projective image of a 3D scene taken with camera $C_0$, $I_1$ is the projective image of $I_0$ taken with a different camera, and $I'$ (which looks exactly like $I_1$) is a projective image of the original 3D scene, then show that the camera that produced $I'$ must have the same center as $C_0$.

It has been stated in this chapter (pg 164) that the image of an image (a plane) induces a homography on the original image. This means that the perspective image $PX$ of $X$ becomes $HPX$ under the picture-of-a-picture process, where $H$ is a $3\times 3$ homography. This is proven in section 8.1.1 (pg 196) and can also be proven using an alternate method¹.
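To see the composite camera concretely, here is a small NumPy sketch. It assumes the printed picture $I_0$ lies on the world plane $Z = 0$ of the second camera, with image point $(x, y, w)^T$ placed at $(x, y, 0, w)^T$; the random cameras `P0`, `P1` and the seed are hypothetical choices for illustration only. Under that assumption the homography $H$ is just columns 1, 2 and 4 of the second camera.

```python
import numpy as np

rng = np.random.default_rng(7)
P0 = rng.standard_normal((3, 4))  # camera that takes the original picture I_0
P1 = rng.standard_normal((3, 4))  # camera that photographs the printed picture

# Assumption: the print of I_0 lies on the plane Z = 0 of P1's world frame,
# with image point (x, y, w) placed at the 3D point (x, y, 0, w).
H = P1[:, [0, 1, 3]]  # P1 restricted to that plane is a 3x3 homography

X = np.append(rng.standard_normal(3), 1.0)  # a scene point
x0 = P0 @ X                                 # its image in I_0
X_on_print = np.array([x0[0], x0[1], 0.0, x0[2]])
x1 = P1 @ X_on_print                        # its image in I_1

print(np.allclose(x1, H @ P0 @ X))  # True: the composite camera is H P0
```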

Furthermore, as $H$ is a full-rank $3\times 3$ matrix, left-multiplying $P$ by it does not change the rank, so $HP$ also has rank 3 and hence a nullspace of dimension 1. Every vector in the nullspace of $P$ is also in the nullspace of $HP$ (if $PC = 0$ then $HPC = 0$), and both nullspaces are 1-dimensional, so they are equal and spanned by $C$, the center of projection of $C_0$. Hence the center of projection of the camera that produced the image $I'$ is the same as that of the camera that produced the image $I_0$.
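A quick numerical sanity check of this argument, as a NumPy sketch with a random camera and a random homography (both hypothetical): the centre recovered from the nullspace of $HP$ matches the centre of $P$. The helper `camera_centre` is my own and assumes a finite camera, so that the last coordinate of $C$ can be normalised to 1.

```python
import numpy as np


def camera_centre(P):
    """Right null vector of a rank-3, 3x4 camera, scaled so the last entry is 1."""
    # The last right-singular vector spans the 1-D nullspace of P.
    _, _, Vt = np.linalg.svd(P)
    C = Vt[-1]
    return C / C[-1]


rng = np.random.default_rng(0)
P = rng.standard_normal((3, 4))  # generic camera (rank 3 almost surely)
H = rng.standard_normal((3, 3))  # generic homography (invertible almost surely)

C_P = camera_centre(P)
C_HP = camera_centre(H @ P)

print(np.allclose(H @ P @ C_P, 0))  # True: C lies in the nullspace of HP
print(np.allclose(C_P, C_HP))       # True: both cameras share one centre
```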

This result can be understood in another way too. As the images $I_1$ and $I_0$ are related by a homography $H$, infinitely many projection matrices can produce $I_1$ from $I_0$: a homography has 8 degrees of freedom and a projection matrix has 11, which leaves $11 - 8 = 3$ free parameters. We need the one projection matrix that can also produce $I'$ from the original 3D scene ($I'$ and $I_1$ are identical in content but are produced by different cameras looking at different scenes: the first looks at the 3D scene and the other looks at $I_0$). This can happen only when the images $I_1$ and $I'$ are coincident and the rays back-projected from $I'$ into the 3D scene align with the rays back-projected from $I_1$ onto $I_0$. This in turn implies that the rays back-projected from $I'$ into the 3D scene align with the rays back-projected from $I_0$ into the 3D scene. This transitive relation holds because the intersections of a ray with $I_0$ and with $I'$ (or $I_1$) are images of the same scene point. Hence the cameras that produce $I'$ and $I_0$ have coincident centers.

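All other parameters (the principal plane, axis planes, vanishing points of the axes, the principal point and the principal axis vector) may be different because of the left multiplication by $H$. For example, the principal plane of $P$ is its third row $P^{3T}$, whereas the principal plane of $HP$ is the third row of $HP$, a linear combination of all three rows of $P$ with weights given by the third row of $H$.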
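The same kind of random test (again a hypothetical NumPy sketch, not from the book) shows that the principal plane and the principal point do change. It uses $\textbf{x}_0 = M\textbf{m}^3$ for the principal point, where $M$ is the left $3 \times 3$ block of the camera and $\textbf{m}^{3T}$ is its third row.

```python
import numpy as np


def principal_plane(P):
    """The principal plane is the third row of P."""
    return P[2]


def principal_point(P):
    """Inhomogeneous principal point x0 = M m3 (M: left 3x3 block, m3: its third row)."""
    M = P[:, :3]
    x0 = M @ M[2]
    return x0[:2] / x0[2]


rng = np.random.default_rng(1)
P = rng.standard_normal((3, 4))
H = rng.standard_normal((3, 3))
HP = H @ P

# Two homogeneous 4-vectors represent the same plane iff stacking them gives rank 1.
same_plane = np.linalg.matrix_rank(np.vstack([principal_plane(P), principal_plane(HP)])) == 1
same_point = np.allclose(principal_point(P), principal_point(HP))
print(same_plane, same_point)  # both False for a generic H
```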

When we walk around the room, what we see is, approximately, a homography of the original image: it is as if we had not moved and the portrait had been replaced by a projectively transformed copy of itself. By the result above, that copy has the same apparent camera center as the original portrait. So, if the subject was looking down the principal axis toward the center of projection, and we stand at approximately the same distance from the portrait as the camera was from the subject in the scene, the transformed portrait still appears to look at the center of projection (us), and the eyes seem to “follow us” as we move. Of course, central projection is only an approximation to how human vision actually works.

2. Show that the ray back-projected from an image point $\textbf{x}$ under a projective camera $P$ (as in (6.14-p162)) may be written as $$L^* = P^T[\textbf{x}]_\times P$$ where $L^*$ is the dual Plücker representation of a line (3.9-p71).

I’m going to prove this by showing that the given expression has the following properties.

1. It is a skew-symmetric matrix.
2. It is a $4 \times 4$ matrix with rank 2.
3. It has $C$ and $P^+\textbf{x}$ in its nullspace.

The first two properties show that it is a Plücker matrix, since a $4 \times 4$ skew-symmetric matrix of rank 2 represents a line. The last property shows that it is the line through $C$ and $P^+\textbf{x}$, which we know to be the ray back-projected from the image point $\textbf{x}$.

It is a skew symmetric matrix.
Expanding the expression $L^* = P^T[\textbf{x}]_\times P$ we get $$L^* = \begin{pmatrix}P^1 & P^2 & P^3\end{pmatrix} \begin{pmatrix}0 & -x_3 & x_2 \\ x_3 & 0 & -x_1 \\ -x_2 & x_1 & 0\end{pmatrix} \begin{pmatrix}P^{1T} \\ P^{2T} \\ P^{3T}\end{pmatrix}$$ $$= \begin{pmatrix}P^1 & P^2 & P^3\end{pmatrix} \begin{pmatrix} -x_3P^{2T} + x_2P^{3T} \\ x_3P^{1T} - x_1P^{3T} \\ -x_2P^{1T} + x_1P^{2T} \end{pmatrix}$$ $$= x_1(P^3P^{2T} - P^2P^{3T}) + x_2(P^1P^{3T} - P^3P^{1T}) + x_3(P^2P^{1T} - P^1P^{2T})$$ where $P^{1T}, P^{2T}, P^{3T}$ are rows of $P$.
As each of $P^3P^{2T} - P^2P^{3T}$, $P^1P^{3T} - P^3P^{1T}$ and $P^2P^{1T} - P^1P^{2T}$ is a skew-symmetric matrix, $L^*$ is a linear combination of skew-symmetric matrices and hence is a skew-symmetric matrix itself.
Furthermore, since $P$ has rank 3, its rows $P^{1T}, P^{2T}, P^{3T}$ are linearly independent, so these three skew-symmetric matrices (the dual Plücker matrices of the lines where pairs of the planes $P^1, P^2, P^3$ meet) are linearly independent too. Hence $L^*$ is not the zero matrix for any $\textbf{x} \neq 0$. This point will be useful in proving the second property.
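The expansion above is easy to get wrong by a sign, so here is a NumPy check. It is a sketch with a random $P$ and $\textbf{x}$, and `skew` is a hypothetical helper for $[\textbf{x}]_\times$. It compares $P^T[\textbf{x}]_\times P$ against the three-term expansion, confirms skew-symmetry, and checks that the three basis matrices are linearly independent.

```python
import numpy as np


def skew(x):
    """[x]_x, the matrix with skew(x) @ y == np.cross(x, y)."""
    return np.array([[0.0, -x[2], x[1]],
                     [x[2], 0.0, -x[0]],
                     [-x[1], x[0], 0.0]])


rng = np.random.default_rng(2)
P = rng.standard_normal((3, 4))
x = rng.standard_normal(3)
p1, p2, p3 = P  # the rows P^{1T}, P^{2T}, P^{3T}

L_star = P.T @ skew(x) @ P

# The three skew-symmetric "basis" matrices from the expansion.
B1 = np.outer(p3, p2) - np.outer(p2, p3)
B2 = np.outer(p1, p3) - np.outer(p3, p1)
B3 = np.outer(p2, p1) - np.outer(p1, p2)
expansion = x[0] * B1 + x[1] * B2 + x[2] * B3

print(np.allclose(L_star, expansion))  # True: matches the expansion
print(np.allclose(L_star, -L_star.T))  # True: skew-symmetric
print(np.linalg.matrix_rank(np.array([B1.ravel(), B2.ravel(), B3.ravel()])))  # 3
```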
It is a $4 \times 4$ matrix with rank 2.
$L^*$ is a product of a $4 \times 3$, a $3 \times 3$ and a $3 \times 4$ matrix and hence will have dimension $4 \times 4$.
We know that $[\textbf{x}]_\times$ has rank 2 and $P$ has rank 3. The rank of a product of matrices is always less than or equal to the rank of the matrix with the smallest rank. Hence, in this case, $L^*$ can have at most rank 2. The rank of a skew-symmetric matrix is always even. So, as $L^*$ is not 0, its rank must be 2.
It has $C$ and $P^+\textbf{x}$ in its nullspace.
Clearly $P^T[\textbf{x}]_\times PC = 0$ as $PC = 0$. Now, $$P^T[\textbf{x}]_\times PP^+\textbf{x} = P^T[\textbf{x}]_\times \textbf{x} = 0$$ as $PP^+ = I$ and $\textbf{x}$ is in the nullspace of $[\textbf{x}]_\times$ by definition.
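Hence $C$ and $P^+\textbf{x}$ are in the nullspace of $L^*$. They are distinct points, since $PC = 0$ while $P(P^+\textbf{x}) = \textbf{x} \neq 0$, so they span the 2-dimensional nullspace of $L^*$ and define the line $L^*$.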
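Putting the three properties together numerically, here is another hypothetical NumPy sketch: `np.linalg.pinv` plays the role of $P^+$, the image point is arbitrary, and an explicit rank tolerance is used so round-off does not affect the rank count. It shows that $L^*$ has rank 2 and that both $C$ and $P^+\textbf{x}$ lie in its nullspace.

```python
import numpy as np


def skew(x):
    """[x]_x, the matrix with skew(x) @ y == np.cross(x, y)."""
    return np.array([[0.0, -x[2], x[1]],
                     [x[2], 0.0, -x[0]],
                     [-x[1], x[0], 0.0]])


rng = np.random.default_rng(3)
P = rng.standard_normal((3, 4))
x = np.array([0.3, -1.2, 1.0])  # an arbitrary image point

L_star = P.T @ skew(x) @ P
C = np.linalg.svd(P)[2][-1]     # camera centre: right null vector of P
X = np.linalg.pinv(P) @ x       # the point P^+ x on the back-projected ray

print(np.linalg.matrix_rank(L_star, tol=1e-9))                  # 2
print(np.allclose(L_star @ C, 0), np.allclose(L_star @ X, 0))  # True True
print(np.allclose(P @ X, x))    # True: P P^+ = I, so X differs from C
```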

3. The affine camera
(a) Show that the affine camera is the most general linear mapping on homogeneous coordinates that maps parallel world lines to parallel image lines. To do this consider the projection of points on $\boldsymbol\pi_\infty$, and show that only if $P$ has the affine form will they map to points at infinity in the image.
(b) Show that for parallel lines mapped by an affine camera the ratio of lengths on line segments is an invariant. What other invariants are there under an affine camera?

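(a) Parallel world lines meet at a point on $\boldsymbol\pi_\infty$, and their images are parallel exactly when the image of that point lies on the line at infinity of the image plane. Since every point of $\boldsymbol\pi_\infty$ is the common point of some family of parallel lines, every point on $\boldsymbol\pi_\infty$ must map to a point at infinity in the image: $$\begin{pmatrix}P^{1T} \\ P^{2T} \\ P^{3T} \end{pmatrix} \begin{pmatrix}X \\ Y \\ Z \\ 0\end{pmatrix} = \begin{pmatrix}x \\ y \\ 0\end{pmatrix}$$

For this to hold, $P^{3T}\textbf{X} = 0$. Expanding this, we get $$p_{31}X + p_{32}Y + p_{33}Z + p_{34}\cdot 0 = 0$$ If this condition is to be satisfied for all values of $X, Y$ and $Z$, $p_{31}, p_{32}$ and $p_{33}$ must all be zero. Finally, $p_{34}$ cannot also be zero, as that would make the third row of $P$ zero and $P$ a matrix of rank at most 2. Hence, as $P$ is defined up to scale, it is no specialization to set $p_{34} = 1$. This means $P^3 = (0, 0, 0, 1)^T$, so $P$ must have the affine form. Conversely, if $P$ has this form, every point $(X, Y, Z, 0)^T$ maps to a point with third coordinate 0, so parallel world lines map to parallel image lines.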
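A numerical illustration of both directions, as a NumPy sketch with randomly generated cameras (the helpers `image_direction` and `cross2` are hypothetical): an affine camera keeps points at infinity at infinity and maps parallel world lines to parallel image lines, whereas a generic projective camera does not.

```python
import numpy as np


def image_direction(P, A, D):
    """Inhomogeneous 2D direction of the image of the world segment from A to A + D."""
    a = P @ np.append(A, 1.0)
    b = P @ np.append(A + D, 1.0)
    return b[:2] / b[2] - a[:2] / a[2]


def cross2(u, v):
    """z-component of the 2D cross product; zero iff u and v are parallel."""
    return u[0] * v[1] - u[1] * v[0]


rng = np.random.default_rng(4)
# Affine camera: arbitrary first two rows, third row (0, 0, 0, 1).
P_affine = np.vstack([rng.standard_normal((2, 4)), [0.0, 0.0, 0.0, 1.0]])
P_proj = rng.standard_normal((3, 4))  # generic projective camera

# Five points on pi_infinity, one per column: (X, Y, Z, 0).
X_inf = np.vstack([rng.standard_normal((3, 5)), np.zeros((1, 5))])
print(np.allclose((P_affine @ X_inf)[2], 0))  # True: they stay at infinity
print(np.allclose((P_proj @ X_inf)[2], 0))    # False for a generic P

# Two parallel world lines through A and B with common direction D.
A, B, D = rng.standard_normal(3), rng.standard_normal(3), rng.standard_normal(3)
print(np.isclose(cross2(image_direction(P_affine, A, D),
                        image_direction(P_affine, B, D)), 0))  # True: still parallel
```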

(b) If the ratio of the lengths of two parallel line segments is $\lambda$, then the inhomogeneous points defining the line segments are related as follows.

$$\widetilde{\textbf{X}}_1 - \widetilde{\textbf{X}}_2 = \lambda(\widetilde{\textbf{X}}_3 - \widetilde{\textbf{X}}_4)$$

where $\widetilde{\textbf{X}}_1$ and $\widetilde{\textbf{X}}_2$ are the endpoints of the first segment and $\widetilde{\textbf{X}}_3$ and $\widetilde{\textbf{X}}_4$ are the endpoints of the second segment.

Under an affine camera, the homogeneous point $\textbf{X} = (X, Y, Z, 1)^T$ is mapped to $$\begin{pmatrix}P^{1T}\textbf{X} \\ P^{2T}\textbf{X} \\ 1 \end{pmatrix}$$

Hence the inhomogeneous image coordinates are $$\widetilde{\textbf{x}} = \begin{pmatrix}P^{1T}\textbf{X} \\ P^{2T}\textbf{X} \end{pmatrix} = \begin{pmatrix}P^{1T} \\ P^{2T}\end{pmatrix}\textbf{X}$$

So the image of the first line segment is $$\begin{pmatrix}P^{1T} \\ P^{2T}\end{pmatrix}(\textbf{X}_1 - \textbf{X}_2)$$ $$= \begin{pmatrix}P^{1T} \\ P^{2T}\end{pmatrix}\begin{pmatrix}\widetilde{\textbf{X}}_1 - \widetilde{\textbf{X}}_2 \\ 0 \end{pmatrix}$$ $$= \lambda\begin{pmatrix}P^{1T} \\ P^{2T}\end{pmatrix}\begin{pmatrix}\widetilde{\textbf{X}}_3 - \widetilde{\textbf{X}}_4 \\ 0 \end{pmatrix}$$ $$= \lambda\begin{pmatrix}P^{1T} \\ P^{2T}\end{pmatrix}(\textbf{X}_3 - \textbf{X}_4)$$ which is $\lambda$ times the image of the second segment.

Hence the ratio of the lengths of parallel line segments is preserved by an affine camera.
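A numerical check of the ratio invariance, as a hypothetical NumPy sketch: the random affine camera, the direction `D` and the value $\lambda = 2.5$ are arbitrary choices, not anything from the book.

```python
import numpy as np


def project(P, X):
    """Project an inhomogeneous 3D point; return inhomogeneous image coordinates."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]


rng = np.random.default_rng(5)
P_affine = np.vstack([rng.standard_normal((2, 4)), [0.0, 0.0, 0.0, 1.0]])

lam = 2.5
D = rng.standard_normal(3)               # common direction of the two segments
X1, X3 = rng.standard_normal(3), rng.standard_normal(3)
X2 = X1 - lam * D                        # X1 - X2 = lam * D
X4 = X3 - D                              # X3 - X4 = D, so X1 - X2 = lam * (X3 - X4)

world_ratio = np.linalg.norm(X1 - X2) / np.linalg.norm(X3 - X4)
image_ratio = (np.linalg.norm(project(P_affine, X1) - project(P_affine, X2))
               / np.linalg.norm(project(P_affine, X3) - project(P_affine, X4)))
print(world_ratio, image_ratio)          # both 2.5 up to rounding
```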

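Other invariants of an affine camera are ratios of areas of regions lying on the same plane (or on parallel planes) and affine combinations of points, such as centroids: the image of a centroid is the centroid of the images.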
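A sketch checking these two invariants numerically with a random affine camera (NumPy; the plane, the triangles and the helper names are hypothetical). Both triangles lie on one world plane, the second with four times the area of the first, and the image area ratio comes out the same; the image of a centroid also coincides with the centroid of the images.

```python
import numpy as np


def project(P, X):
    """Project an inhomogeneous 3D point; return inhomogeneous image coordinates."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]


def area3(a, b, c):
    """Area of a triangle in 3D."""
    return 0.5 * np.linalg.norm(np.cross(b - a, c - a))


def area2(a, b, c):
    """Area of a triangle in the image plane."""
    u, v = b - a, c - a
    return 0.5 * abs(u[0] * v[1] - u[1] * v[0])


rng = np.random.default_rng(6)
P_affine = np.vstack([rng.standard_normal((2, 4)), [0.0, 0.0, 0.0, 1.0]])

# A world plane through O spanned by E1 and E2; (s, t) are coordinates on it.
O, E1, E2 = rng.standard_normal(3), rng.standard_normal(3), rng.standard_normal(3)


def on_plane(s, t):
    return O + s * E1 + t * E2


T1 = [on_plane(0, 0), on_plane(1, 0), on_plane(0, 1)]  # area 1/2 in (s, t)
T2 = [on_plane(2, 1), on_plane(4, 1), on_plane(2, 3)]  # area 2 in (s, t)

world_area_ratio = area3(*T2) / area3(*T1)
image_area_ratio = (area2(*[project(P_affine, X) for X in T2])
                    / area2(*[project(P_affine, X) for X in T1]))
print(world_area_ratio, image_area_ratio)  # both 4 up to rounding

# Centroids: the image of the centroid is the centroid of the images.
pts = rng.standard_normal((6, 3))
centroid_ok = np.allclose(project(P_affine, pts.mean(axis=0)),
                          np.mean([project(P_affine, X) for X in pts], axis=0))
print(centroid_ok)  # True
```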

References

¹ Marc Pollefeys. Visual 3D Modeling from Images. Relation between projection matrices and image homographies. https://www.cs.unc.edu/~marc/tutorial/node40.html