Multiple View Geometry in Computer Vision Chapter 4 Solutions -- Estimation

Here’s a quick index to all the problems in this chapter. I have skipped the text for notes and only included the text for the exercises.

iii iv v vi viii

The main index can be found here.

Note that there is a typographical error in Note (i). $H\textbf{x}_i = k(x_i, y_i, 1)^T$ should be $H\textbf{x}_i = k(x’_i, y’_i, 1)^T$ instead.

III. Scaling unbounded point sets. In the case of points at or near infinity in a plane, it is neither reasonable nor feasible to normalize coordinates using the isotropic (or non-isotropic) scaling schemes presented in this chapter, since the centroid and scale are infinite or near infinite. A method that seems to give good results is to normalize the set of points $\textbf{x}_i = (x_i, y_i, w_i)^T$ such that $$\sum\limits_{i} x_i = \sum\limits_{i} y_i = 0 ; \sum\limits_{i} x^2_i + y^2_i = 2\sum\limits_{i} w^2_i; \sum\limits_{i} x^2_i + y^2_i + w^2_i = 1 \forall i$$
Note that the coordinates $x_i$ and $y_i$ appearing here are the homogeneous coordinates, and the conditions no longer imply that the centroid is at the origin. Investigate methods of achieving this normalization, and evaluate its properties.

An equivalent problem, that of normalizing line coordinates when some lines pass through or near the origin, is discussed in detail in the paper “A new normalized method on line-based homography estimation” ¹. I recommend reading the paper to get a complete understanding of how the normalization reduces the condition number of the DLT matrix, making it less sensitive to noise. I will summarize below the transformations that lead to the required normalizations, and their properties. The goal is to spread out the points around the origin instead of having clusters of points far apart.

The following matrix $T$ ensures that $\sum\limits_{i} x_i = \sum\limits_{i} y_i = 0 \forall i$. In effect, it makes the relative distribution of the point coordinates more homogeneous about the $w$-axis in the $xyw$ 3D space.

$$T = \begin{pmatrix} 1 & 0 & -\frac{\sum{x_i}}{\sum{w_i}} \\ 0 & 1 & -\frac{\sum{y_i}}{\sum{w_i}} \\ 0 & 0 & 1 \end{pmatrix}$$

The following matrix $S$, applied after $T$, ensures that $\sum\limits_{i} x^2_i + y^2_i = 2\sum\limits_{i} w^2_i \forall i$. It makes the ratio of distance of a point from the $xy$ plane to its distance from the $w$ axis equal to $\sqrt{2}$, thus make the distribution of the point coordinates in $xyw$ space more homogeneous.

$$S = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & \sqrt\frac{\sum\limits_{i=1}^{n}{x^2_i + y^2_i}}{2\sum\limits_{i=1}^{n}{w^2_i}} \end{pmatrix}$$

Finally $\textbf{x}_i/\begin{Vmatrix}\textbf{x}_i\end{Vmatrix}$ ensures that $\sum\limits_{i} x^2_i + y^2_i + w^2_i = 1 \forall i$, thus making the transformed point coordinates lie on the unit sphere centered at the origin.

Note that simply satisfying the constraint $\sum\limits_{i} x^2_i + y^2_i + w^2_i = 1 \forall i$ without requiring the other two does not help much as demonstrated by Fig. 2a in the paper which shows that after such a normalization three points are clustered near (0, 0, 1) and one point is far away near (0.5, 0.8, 0), in other words, the matrix is still ill-conditioned.

In effect, the normalizing transformations reduce the condition number of the matrix by homogeneously distributing the point coordinates around the origin in 3D space.

IV. Transformation invariance of DLT. We consider computation of a 2D homography by minimizing algebraic error $\begin{Vmatrix}\textbf{Ah}\end{Vmatrix}$ subject to various constraints. Prove the following cases:
(a) If $\begin{Vmatrix}\textbf{Ah}\end{Vmatrix}$ is minimized subject to the constraint $h_9 = H_{33} = 1$, then the result is invariant under change of scale but not translation of coordinates.
(b) If instead the constraint is $H^2_{31} + H^2_{32} = 1$ then the result is similarity invariant.
(c) Affine case: The same is true for the constraint $H_{31} = H_{32} = 0; H_{33} = 1$.

For each case, we need to prove that the given transformation class preserves the constraint under the transformation $\widetilde{H} = T’HT^{-1}$.

Note that in each case, we are only concerned with the last row of $\widetilde{H}$.

Let $T$ be a similarity transformation. Any affine transformation leaves the last component of a homogeneous vector unchanged. Hence the last row of $H$ will be left unchanged by multiplication with $T’$, i.e $T’H$.

Thus the last row of $\widetilde{H}$ only depends on the result of right multiplication with $T^{-1}$, i.e. $HT^{-1}$.

(a) If $T$ represented a change of scale transform then so would $T^{-1}$. Such a matrix would have the last column as $(0, 0, 1)^T$ and hence would leave $H_{33}$ unchanged. Hence if $H_{33} = 1$, then $\widetilde{H}_{33} = 1$.

If $T$ represented a translation, then so would $T^{-1}$. Such a matrix would have the last column as $(-t_x, -t_y, 1)^T$ making $\widetilde{H}_{33} = -t_xH_{31} - t_yH_{32} + H_{33}$. Hence $H_{33}$ being equal to 1 would not imply that $\widetilde{H}_{33} = 1$.

Thus, if $\begin{Vmatrix}\textbf{Ah}\end{Vmatrix}$ is minimized subject to the constraint $h_9 = H_{33} = 1$, then the result is invariant under change of scale but not translation of coordinates.

(b) If $T$ represented a similarity transform then the first two columns of $T^{-1}$ would be $(cos\theta/s, -sin\theta/s, 0)^T$ and $(sin\theta/s, cos\theta/s, 0)^T$. This means $\widetilde{H}_{31} = 1/s(H_{31}cos\theta - H_{32}sin\theta)$ and $\widetilde{H}_{32} = 1/s(H_{31}sin\theta + H_{32}cos\theta)$. Furthermore,

$$\widetilde{H}^2_{31} + \widetilde{H}^2_{32} = 1/s^2[(H_{31}cos\theta - H_{32}sin\theta)^2 + (H_{31}sin\theta + H_{32}cos\theta)^2]$$ $$= 1/s^2(H^2_{31}cos^2\theta + H^2_{32}sin^2\theta - 2H_{31}H_{32}sin\theta cos\theta + H^2_{31}sin^2\theta + H^2_{32}cos^2\theta + 2H_{31}H_{32}sin\theta cos\theta)$$ $$= 1/s^2(H^2_{31}cos^2\theta + H^2_{32}sin^2\theta + H^2_{31}sin^2\theta + H^2_{32}cos^2\theta)$$ $$= 1/s^2(H^2_{31} + H^2_{32})$$ $$= 1/s^2$$

As $H^2_{31} + H^2_{32} = 1$ implies $\widetilde{H}^2_{31} + \widetilde{H}^2_{32} = constant$, the DLT algorithm will be invariant under this constraint.

(c) From the discussion in (b), we can see that if $H_{32} = H_{31} = 0$ then $\widetilde{H}_{32} = \widetilde{H}_{31} = 0$. The last column of $T^{-1}$ will be of the form $(a, b, 1)^T$ if $T$ is a similarity transform leading to $\widetilde{H}_{33} = aH_{31} + bH_{32} + H_{33} = 0 + 0 + H_{33} = 1$. As the constraints $H_{32} = H_{31} = 0; H_{33} = 1$ imply $\widetilde{H}_{32} = \widetilde{H}_{31} = 0; \widetilde{H}_{33} = 1$, the DLT algorithm will be invariant under these constraints.

V. Expressions for image coordinate derivative. For the map $\textbf{x}’ = (x’, y’, w’)^T = H\textbf{x}$, derive the following expressions (where $\widetilde{\textbf{x}}’ = (\tilde{x’}, \tilde{y’})^T = (x’/w’, y’/w’)$ are the inhomogeneous coordinates of the image point):
(a) Derivative wrt $\textbf{x}$ $$\partial\tilde{\textbf{x}}’ / \partial{\textbf{x}} = \frac{1}{w’}\begin{pmatrix}\textbf{h}^{1T} - \tilde{x’}\textbf{h}^{3T} \\ \textbf{h}^{2T} - \tilde{y’}\textbf{h}^{3T}\end{pmatrix}$$ where $\textbf{h}^{jT}$ is the $j$-th row of $H$.
(b) Derivative wrt $H$ $$\partial{\tilde{\textbf{x}}}’ / \partial{\textbf{h}} = \frac{1}{w’}\begin{pmatrix} \textbf{x}^T & 0 & -\tilde{x’}\textbf{x}^T \\ 0 & \textbf{x}^T & -\tilde{y’}\textbf{x}^T\end{pmatrix}$$ with $\textbf{h}$ as defined in (4.2-p89).

(a) It is clear that

$$\frac{\partial{\textbf{h}^{iT}\textbf{x}}}{\partial{x_1}} = \textbf{h}_{i1}, \frac{\partial{\textbf{h}^{iT}\textbf{x}}}{\partial{y_1}} = \textbf{h}_{i2}, \frac{\partial{\textbf{h}^{iT}\textbf{x}}}{\partial{w_1}} = \textbf{h}_{i3}$$

Hence, $$\frac{\partial{\textbf{h}^{iT}\textbf{x}}}{\partial{\textbf{x}}} = \textbf{h}^{iT}$$

Now, $$\tilde{x’} = x’/w’$$ $$= \frac{\textbf{h}^{1T}\textbf{x}}{\textbf{h}^{3T}\textbf{x}}$$

Applying the chain rule $$\partial\tilde{x}’ / \partial\textbf{x} = \frac{1}{\textbf{h}^{3T}\textbf{x}}\frac{\partial{\textbf{h}^{1T}\textbf{x}}}{\partial\textbf{x}} - \frac{\textbf{h}^{1T}\textbf{x}}{(\textbf{h}^{3T}\textbf{x})^2} \frac{\partial{\textbf{h}^{3T}\textbf{x}}}{\partial\textbf{x}}$$

Substituting the values of the partial derivatives we calculated earlier $$\partial\tilde{x}’ / \partial\textbf{x} = \frac{1}{\textbf{h}^{3T}\textbf{x}}\textbf{h}^{1T} - \frac{\textbf{h}^{1T}\textbf{x}}{(\textbf{h}^{3T}\textbf{x})^2}\textbf{h}^{3T}$$

Substituting $x’ = \textbf{h}^{1T}\textbf{x}$ and $w’ = \textbf{h}^{3T}\textbf{x}$ $$\partial\tilde{x}’ / \partial\textbf{x} = \frac{1}{w’}\textbf{h}^{1T} - \frac{x’}{(w’)^2}\textbf{h}^{3T}$$

Finally, substituting $\tilde{x’} = x’/w’$ $$\partial\tilde{x}’ / \partial\textbf{x} = \frac{1}{w’}\textbf{h}^{1T} - \frac{\tilde{x’} }{w’}\textbf{h}^{3T}$$ $$= \frac{1}{w’}(\textbf{h}^{1T} - \tilde{x’} \textbf{h}^{3T})$$

By applying a similar process we observe that $$\partial\tilde{y}’ / \partial\textbf{x} = = \frac{1}{w’}(\textbf{h}^{2T} - \tilde{y’} \textbf{h}^{3T})$$

Hence, we have established that $$\partial\tilde{\textbf{x}}’ / \partial{\textbf{x}} = \frac{1}{w’}\begin{pmatrix} \textbf{h}^{1T} - \tilde{x’}\textbf{h}^{3T} \\ \textbf{h}^{2T} - \tilde{y’}\textbf{h}^{3T} \end{pmatrix}$$

(b) In a similar fashion, applying the chain rule, we get $$\partial\tilde{x}’ / \partial\textbf{x} = \frac{1}{\textbf{h}^{3T}\textbf{x}}\frac{\partial{\textbf{h}^{1T}\textbf{x}}}{\partial\textbf{h}} - \frac{\textbf{h}^{1T}\textbf{x}}{(\textbf{h}^{3T}\textbf{x})^2} \frac{\partial{\textbf{h}^{3T}\textbf{x}}}{\partial\textbf{h}}$$

It is clear that $$\partial{\textbf{h}^{1T}\textbf{x}} / {\partial\textbf{h}} = \begin{pmatrix} \textbf{x}^T & 0 & 0 \end{pmatrix}$$ $$\partial{\textbf{h}^{2T}\textbf{x}} / {\partial\textbf{h}} = \begin{pmatrix} 0 & \textbf{x}^T & 0 \end{pmatrix}$$ $$\partial{\textbf{h}^{3T}\textbf{x}} / {\partial\textbf{h}} = \begin{pmatrix} 0 & 0 & \textbf{x}^T \end{pmatrix}$$

Substituting this into the expansion above $$\partial\tilde{x}’ / \partial\textbf{h} = \frac{1}{\textbf{h}^{3T}\textbf{x}}\begin{pmatrix} \textbf{x}^T & 0 & 0 \end{pmatrix} - \frac{\textbf{h}^{1T}\textbf{x}}{(\textbf{h}^{3T}\textbf{x})^2} \begin{pmatrix} 0 & 0 & \textbf{x}^T \end{pmatrix}$$

Substituting $x’ = \textbf{h}^{1T}\textbf{x}$ and $w’ = \textbf{h}^{3T}\textbf{x}$ $$\partial\tilde{x}’ / \partial\textbf{x} = \frac{1}{w’}\begin{pmatrix} \textbf{x}^T & 0 & 0 \end{pmatrix} - \frac{x’}{(w’)^2}\begin{pmatrix} 0 & 0 & \textbf{x}^T \end{pmatrix}$$

Finally, substituting $\tilde{x’} = x’/w’$ $$\partial\tilde{x}’ / \partial\textbf{x} = \frac{1}{w’}\begin{pmatrix} \textbf{x}^T & 0 & 0 \end{pmatrix} - \frac{\tilde{x’} }{w’}\begin{pmatrix} 0 & 0 & \textbf{x}^T \end{pmatrix}$$

$$= \frac{1}{w’}\begin{pmatrix} \textbf{x}^T & 0 & -\tilde{x’} \textbf{x}^T \end{pmatrix}$$

Performing a similar process for $\tilde{y’}$, we get $$\partial\tilde{y}’ / \partial\textbf{x} = \frac{1}{w’}\begin{pmatrix} 0 & \textbf{x}^T & -\tilde{y’} \textbf{x}^T \end{pmatrix}$$

Hence, we have established that $$\partial{\tilde{\textbf{x}}}’ / \partial{\textbf{h}} = \frac{1}{w’}\begin{pmatrix} \textbf{x}^T & 0 & -\tilde{x’}\textbf{x}^T \\ 0 & \textbf{x}^T & -\tilde{y’}\textbf{x}^T\end{pmatrix}$$

VI. Sampson error with non-isotropic error distributions. The derivation of Sampson error in section 4.2.6(p98) assumed that points were measured with circular error distributions. In the case where the point $\textbf{X} = (x, y, x’, y’)$ is measured with covariance matrix $\Sigma_\textbf{x}$ it is appropriate instead to minimize the Mahalanobis norm $\begin{Vmatrix}\boldsymbol{\delta}_\textbf{x}\end{Vmatrix}^2_{\Sigma_\textbf{x}} = \boldsymbol{\delta}^T_\textbf{x}\Sigma^{-1}_\textbf{x}\boldsymbol{\delta}_\textbf{x}$. Show that in this case the formulae corresponding to (4.11-p99) and (4.12-p99) are $$\boldsymbol{\delta}_\textbf{x} = -\Sigma_\textbf{x}J^T(J\Sigma_\textbf{x}J^T)^{-1}\epsilon$$ and $$\begin{Vmatrix}\boldsymbol{\delta}_\textbf{x}\end{Vmatrix}^2_{\Sigma_\textbf{x}} = \epsilon^T(J\Sigma_\textbf{x}J^T)^{-1}\epsilon$$ Note that if the measurements in the two images are independent, then the covariance matrix $\Sigma_\textbf{x}$ will be block-diagonal with two $2\times2$ diagonal blocks corresponding to the two images.

In this case, we need to find the extrema of $\boldsymbol\delta_\textbf{x}^T\Sigma_\textbf{x}^{-1}\boldsymbol\delta_\textbf{x} - 2\boldsymbol\lambda^T(J\boldsymbol\delta_\textbf{x} - \epsilon)$.

Derivative of $\boldsymbol\delta_\textbf{x}^T\Sigma_\textbf{x}^{-1}\boldsymbol\delta_\textbf{x}$ with respect to $\boldsymbol\delta_\textbf{x}$ is $2\boldsymbol\delta_\textbf{x}^T\Sigma_\textbf{x}^{-1}$ as $\Sigma_\textbf{x}^{-1}$ is symmetric (due to $\Sigma_\textbf{x}$ being symmetric). Hence taking the derivative of the whole expression with respect to $\boldsymbol\delta_\textbf{x}$ and equating to zero will give us

$$2\boldsymbol\delta_\textbf{x}^T\Sigma_\textbf{x}^{-1} - 2\boldsymbol\lambda^TJ = \textbf{0}^T$$ $$\implies \boldsymbol\delta_\textbf{x} = \Sigma_\textbf{x}J^T\lambda$$

Taking the derivative with respect to $\lambda$ and equating to zero will give us back the original constraint, $J\boldsymbol\delta_\textbf{x} + \epsilon$, as before. Substituting for $\boldsymbol\delta_\textbf{x}$ leads to

$$J\Sigma_\textbf{x}J^T\lambda = -\epsilon$$ $$\implies \lambda = -(J\Sigma_\textbf{x}J^T)^{-1}\epsilon$$ $$\implies \boldsymbol\delta_\textbf{x} = -\Sigma_\textbf{x}J^T(J\Sigma_\textbf{x}J^T)^{-1}\epsilon$$

and

$$\begin{Vmatrix}\boldsymbol{\delta}_\textbf{x}\end{Vmatrix}^2_{\Sigma_\textbf{x}} = \boldsymbol{\delta}^T_\textbf{x}\Sigma^{-1}_\textbf{x}\boldsymbol{\delta}_\textbf{x} = (-\epsilon^T(J\Sigma_\textbf{x}J^T)^{-T}J\Sigma^{T}_\textbf{x})\Sigma_\textbf{x}^{-1}(-\Sigma_\textbf{x}J^T(J\Sigma_\textbf{x}J^T)^{-1}\epsilon)$$ $$= (\epsilon^T(J\Sigma_\textbf{x}J^T)^{-T}J\Sigma^{T}_\textbf{x})(\Sigma_\textbf{x}^{-1}\Sigma_\textbf{x})(J^T(J\Sigma_\textbf{x}J^T)^{-1}\epsilon)$$ $$= (\epsilon^T(J\Sigma_\textbf{x}J^T)^{-T}J\Sigma^{T}_\textbf{x})(J^T(J\Sigma_\textbf{x}J^T)^{-1}\epsilon)$$ $$= \epsilon^T(J\Sigma_\textbf{x}J^T)^{-T}(J\Sigma^{T}_\textbf{x}J^T)(J\Sigma_\textbf{x}J^T)^{-1}\epsilon$$ $$= \epsilon^T(J\Sigma_\textbf{x}J^T)^{-T}\epsilon$$ $$= \epsilon^T(J\Sigma^T_\textbf{x}J^T)^{-1}\epsilon$$ $$= \epsilon^T(J\Sigma_\textbf{x}J^T)^{-1}\epsilon$$

VIII. Minimizing geometric error for affine transformations. Given a set of correspondences $(x_i, y_i) \longleftrightarrow (x’_i, y’_i)$, find an affine transformation $H_A$ that minimizes geometric error (4.8-p95). We will step through the derivation of a linear algorithm based on Sampson’s approximation which is exact in this case. The complete method is summarized in algorithm 4.7.
(a) Show that the optimum affine transformation takes the centroid of the $\textbf{x}_i$ to the centroid of the $\textbf{x}’_i$, so by translating the points to have their centroid at the origin, the translation part of the transformation is determined.

Let’s express the affine transformation as a composition of transformation and scaling (in that order) as $A = ST$, where $S$ is the scaling transform and $T$ is the translation.

Then $T$ will have the form $$\begin{pmatrix} 1 & 0 & t_x \\ 0 & 1 & t_y \\ 0 & 0 & 1 \end{pmatrix}$$

with $\textbf{h} = (1, 0, t_x, 0, 1, t_y, 0, 0, 1)^T$.

The residual $\epsilon$ in this case will be $$\begin{pmatrix} y’_i\textbf{h}^{3T}\textbf{x}_i - w’_i\textbf{h}^{2T}\textbf{x}_i \\ w’_i\textbf{h}^{1T}\textbf{x}_i - x’_i\textbf{h}^{3T}\textbf{x}_i \end{pmatrix} = \begin{pmatrix} w_iy’_i - w’_iy_i + w_iw’_it_y \\ w’_ix_i + w_iw’_it_x - w_ix’_i \end{pmatrix}$$

Assuming that $w_i = w’_i = 1$, the inhomogeneous coordinates are simply $x_i, y_i, x’_i, y’_i$ and the residual can be written as $$\begin{pmatrix} y’_i - y_i + t_y \\ x_i + t_x - x’_i \end{pmatrix}$$

The Jacobian for this residual will then be $$J = \begin{pmatrix} 0 & -1 & 0 & 1 \\ 1 & 0 & -1 & 0 \end{pmatrix}$$

$$\implies (JJ^T)^{-1} = \frac{1}{2}I$$

where $I$ is the identity matrix.

The Sampson error for the $i^{th}$ correspondence can then be calculated as $$\epsilon_i^T(JJ^T)^{-1}\epsilon_i = \frac{1}{2}\epsilon_i^T\epsilon_i$$

From this it is clear that minimizing the geometric error is the same as minimizing the Sampson error which is the same as minimizing the algebraic error for a translation.

Expanding this error and dropping the constant, we get the error for the $i^{th}$ correspondence to be $$(y’_i - y_i + t_y)^2 + (x’_i - x_i + t_x)^2$$ $$= y’^2_i + y^2_i + t^2_y - 2y’_iy_i - 2y_it_y + 2y’_it_y + x’^2_i + x^2_i + t^2_x - 2x’_ix_i - 2x_it_x + 2x’_it_x $$

The sum of the errors for all such correspondences will be $$= \sum\limits_{i}y’^2_i + \sum\limits_{i}y^2_i + nt^2_y - 2\sum\limits_{i}y’_iy_i - 2t_y\sum\limits_{i}y_i + 2t_y\sum\limits_{i}y’_i + \sum\limits_{i}x’^2_i + \sum\limits_{i}x^2_i + nt^2_x - 2\sum\limits_{i}x’_ix_i - 2t_x\sum\limits_{i}x_i + 2t_x\sum\limits_{i}x’_it_x $$

To find the optimal values of $t_x$ and $t_y$ we take the derivatives w.r.t. $t_x$ and $t_y$ and set them to zero. By doing this we get

$$2nt_x + 2\sum\limits_{i}x’_i - 2\sum\limits_{i}x_i = 0$$ $$2nt_y + 2\sum\limits_{i}y’_i - 2\sum\limits_{i}y_i = 0$$

Hence, $$t_x = \frac{\sum\limits_{i}x’_i - \sum\limits_{i}x_i}{n} = \bar{x}’ - \bar{x}$$ $$t_y = \frac{\sum\limits_{i}y’_i - \sum\limits_{i}y_i}{n} = \bar{y}’ - \bar{y}$$

where $(\bar{x}, \bar{y})$ and $(\bar{x}’ , \bar{y}’ )$ are the centroids of the points in each image respectively.

Thus the translation portion of the optimal affine transformation takes the centroid of points in the first image to the centroid of the points in the second.

References

Zeng Hui, Xiaoming Deng, and Zhanyi Hu. A new normalized method on line-based homography estimation. Pattern Recognition Letters 29.9 (2008): 1236-1244. ^[return]