Here’s a quick index to all the problems in this chapter. I have skipped the text for notes and only included the text for the exercises.
The main index can be found here.
Note that there is a typographical error in Note (i). Hxi=k(xi,yi,1)T should be Hxi=k(x′i,y′i,1)T instead.
III. Scaling unbounded point sets. In the case of points at or near infinity in a plane, it is neither reasonable nor feasible to normalize coordinates using the isotropic (or non-isotropic) scaling schemes presented in this chapter, since the centroid and scale are infinite or near infinite. A method that seems to give good results is to normalize the set of points xi=(xi,yi,wi)T such that ∑ixi=∑iyi=0;∑ix2i+y2i=2∑iw2i;∑ix2i+y2i+w2i=1∀i
Note that the coordinates xi and yi appearing here are the homogeneous coordinates, and the conditions no longer imply that the centroid is at the origin. Investigate methods of achieving this normalization, and evaluate its properties.
An equivalent problem, that of normalizing line coordinates when some lines pass through or near the origin, is discussed in detail in the paper “A new normalized method on line-based homography estimation” 1. I recommend reading the paper to get a complete understanding of how the normalization reduces the condition number of the DLT matrix, making it less sensitive to noise. I will summarize below the transformations that lead to the required normalizations, and their properties. The goal is to spread out the points around the origin instead of having clusters of points far apart.
The following matrix T ensures that ∑ixi=∑iyi=0∀i. In effect, it makes the relative distribution of the point coordinates more homogeneous about the w-axis in the xyw 3D space.
T=(10−∑xi∑wi01−∑yi∑wi001)
The following matrix S, applied after T, ensures that ∑ix2i+y2i=2∑iw2i∀i. It makes the ratio of distance of a point from the xy plane to its distance from the w axis equal to √2, thus make the distribution of the point coordinates in xyw space more homogeneous.
S=(10001000√n∑i=1x2i+y2i2n∑i=1w2i)
Finally xi/‖xi‖ ensures that ∑ix2i+y2i+w2i=1∀i, thus making the transformed point coordinates lie on the unit sphere centered at the origin.
Note that simply satisfying the constraint ∑ix2i+y2i+w2i=1∀i without requiring the other two does not help much as demonstrated by Fig. 2a in the paper which shows that after such a normalization three points are clustered near (0, 0, 1) and one point is far away near (0.5, 0.8, 0), in other words, the matrix is still ill-conditioned.
In effect, the normalizing transformations reduce the condition number of the matrix by homogeneously distributing the point coordinates around the origin in 3D space.
IV. Transformation invariance of DLT. We consider computation of a 2D homography by minimizing algebraic error ‖Ah‖ subject to various constraints. Prove the following cases:
(a) If ‖Ah‖ is minimized subject to the constraint h9=H33=1, then the result is invariant under change of scale but not translation of coordinates.
(b) If instead the constraint is H231+H232=1 then the result is similarity invariant.
(c) Affine case: The same is true for the constraint H31=H32=0;H33=1.
For each case, we need to prove that the given transformation class preserves the constraint under the transformation ˜H=T′HT−1.
Note that in each case, we are only concerned with the last row of ˜H.
Let T be a similarity transformation. Any affine transformation leaves the last component of a homogeneous vector unchanged. Hence the last row of H will be left unchanged by multiplication with T′, i.e T′H.
Thus the last row of ˜H only depends on the result of right multiplication with T−1, i.e. HT−1.
(a) If T represented a change of scale transform then so would T−1. Such a matrix would have the last column as (0,0,1)T and hence would leave H33 unchanged. Hence if H33=1, then ˜H33=1.
If T represented a translation, then so would T−1. Such a matrix would have the last column as (−tx,−ty,1)T making ˜H33=−txH31−tyH32+H33. Hence H33 being equal to 1 would not imply that ˜H33=1.
Thus, if ‖Ah‖ is minimized subject to the constraint h9=H33=1, then the result is invariant under change of scale but not translation of coordinates.
(b) If T represented a similarity transform then the first two columns of T−1 would be (cosθ/s,−sinθ/s,0)T and (sinθ/s,cosθ/s,0)T. This means ˜H31=1/s(H31cosθ−H32sinθ) and ˜H32=1/s(H31sinθ+H32cosθ). Furthermore,
˜H231+˜H232=1/s2[(H31cosθ−H32sinθ)2+(H31sinθ+H32cosθ)2] =1/s2(H231cos2θ+H232sin2θ−2H31H32sinθcosθ+H231sin2θ+H232cos2θ+2H31H32sinθcosθ) =1/s2(H231cos2θ+H232sin2θ+H231sin2θ+H232cos2θ) =1/s2(H231+H232) =1/s2
As H231+H232=1 implies ˜H231+˜H232=constant, the DLT algorithm will be invariant under this constraint.
(c) From the discussion in (b), we can see that if H32=H31=0 then ˜H32=˜H31=0. The last column of T−1 will be of the form (a,b,1)T if T is a similarity transform leading to ˜H33=aH31+bH32+H33=0+0+H33=1. As the constraints H32=H31=0;H33=1 imply ˜H32=˜H31=0;˜H33=1, the DLT algorithm will be invariant under these constraints.
V. Expressions for image coordinate derivative. For the map x′=(x′,y′,w′)T=Hx, derive the following expressions (where ˜x′=(~x′,~y′)T=(x′/w′,y′/w′) are the inhomogeneous coordinates of the image point):
(a) Derivative wrt x ∂˜x′/∂x=1w′(h1T−~x′h3Th2T−~y′h3T) where hjT is the j-th row of H.
(b) Derivative wrt H ∂˜x′/∂h=1w′(xT0−~x′xT0xT−~y′xT) with h as defined in (4.2-p89).
(a) It is clear that
∂hiTx∂x1=hi1,∂hiTx∂y1=hi2,∂hiTx∂w1=hi3
Hence, ∂hiTx∂x=hiT
Now, ~x′=x′/w′ =h1Txh3Tx
Applying the chain rule ∂˜x′/∂x=1h3Tx∂h1Tx∂x−h1Tx(h3Tx)2∂h3Tx∂x
Substituting the values of the partial derivatives we calculated earlier ∂˜x′/∂x=1h3Txh1T−h1Tx(h3Tx)2h3T
Substituting x′=h1Tx and w′=h3Tx ∂˜x′/∂x=1w′h1T−x′(w′)2h3T
Finally, substituting ~x′=x′/w′ ∂˜x′/∂x=1w′h1T−~x′w′h3T =1w′(h1T−~x′h3T)
By applying a similar process we observe that ∂˜y′/∂x==1w′(h2T−~y′h3T)
Hence, we have established that ∂˜x′/∂x=1w′(h1T−~x′h3Th2T−~y′h3T)
(b) In a similar fashion, applying the chain rule, we get ∂˜x′/∂x=1h3Tx∂h1Tx∂h−h1Tx(h3Tx)2∂h3Tx∂h
It is clear that ∂h1Tx/∂h=(xT00) ∂h2Tx/∂h=(0xT0) ∂h3Tx/∂h=(00xT)
Substituting this into the expansion above ∂˜x′/∂h=1h3Tx(xT00)−h1Tx(h3Tx)2(00xT)
Substituting x′=h1Tx and w′=h3Tx ∂˜x′/∂x=1w′(xT00)−x′(w′)2(00xT)
Finally, substituting ~x′=x′/w′ ∂˜x′/∂x=1w′(xT00)−~x′w′(00xT)
=1w′(xT0−~x′xT)
Performing a similar process for ~y′, we get ∂˜y′/∂x=1w′(0xT−~y′xT)
Hence, we have established that ∂˜x′/∂h=1w′(xT0−~x′xT0xT−~y′xT)
VI. Sampson error with non-isotropic error distributions. The derivation of Sampson error in section 4.2.6(p98) assumed that points were measured with circular error distributions. In the case where the point X=(x,y,x′,y′) is measured with covariance matrix Σx it is appropriate instead to minimize the Mahalanobis norm \begin{Vmatrix}\boldsymbol{\delta}_\textbf{x}\end{Vmatrix}^2_{\Sigma_\textbf{x}} = \boldsymbol{\delta}^T_\textbf{x}\Sigma^{-1}_\textbf{x}\boldsymbol{\delta}_\textbf{x}. Show that in this case the formulae corresponding to (4.11-p99) and (4.12-p99) are \boldsymbol{\delta}_\textbf{x} = -\Sigma_\textbf{x}J^T(J\Sigma_\textbf{x}J^T)^{-1}\epsilon and \begin{Vmatrix}\boldsymbol{\delta}_\textbf{x}\end{Vmatrix}^2_{\Sigma_\textbf{x}} = \epsilon^T(J\Sigma_\textbf{x}J^T)^{-1}\epsilon Note that if the measurements in the two images are independent, then the covariance matrix \Sigma_\textbf{x} will be block-diagonal with two 2\times2 diagonal blocks corresponding to the two images.
In this case, we need to find the extrema of \boldsymbol\delta_\textbf{x}^T\Sigma_\textbf{x}^{-1}\boldsymbol\delta_\textbf{x} - 2\boldsymbol\lambda^T(J\boldsymbol\delta_\textbf{x} - \epsilon).
Derivative of \boldsymbol\delta_\textbf{x}^T\Sigma_\textbf{x}^{-1}\boldsymbol\delta_\textbf{x} with respect to \boldsymbol\delta_\textbf{x} is 2\boldsymbol\delta_\textbf{x}^T\Sigma_\textbf{x}^{-1} as \Sigma_\textbf{x}^{-1} is symmetric (due to \Sigma_\textbf{x} being symmetric). Hence taking the derivative of the whole expression with respect to \boldsymbol\delta_\textbf{x} and equating to zero will give us
2\boldsymbol\delta_\textbf{x}^T\Sigma_\textbf{x}^{-1} - 2\boldsymbol\lambda^TJ = \textbf{0}^T \implies \boldsymbol\delta_\textbf{x} = \Sigma_\textbf{x}J^T\lambda
Taking the derivative with respect to \lambda and equating to zero will give us back the original constraint, J\boldsymbol\delta_\textbf{x} + \epsilon, as before. Substituting for \boldsymbol\delta_\textbf{x} leads to
J\Sigma_\textbf{x}J^T\lambda = -\epsilon \implies \lambda = -(J\Sigma_\textbf{x}J^T)^{-1}\epsilon \implies \boldsymbol\delta_\textbf{x} = -\Sigma_\textbf{x}J^T(J\Sigma_\textbf{x}J^T)^{-1}\epsilon
and
\begin{Vmatrix}\boldsymbol{\delta}_\textbf{x}\end{Vmatrix}^2_{\Sigma_\textbf{x}} = \boldsymbol{\delta}^T_\textbf{x}\Sigma^{-1}_\textbf{x}\boldsymbol{\delta}_\textbf{x} = (-\epsilon^T(J\Sigma_\textbf{x}J^T)^{-T}J\Sigma^{T}_\textbf{x})\Sigma_\textbf{x}^{-1}(-\Sigma_\textbf{x}J^T(J\Sigma_\textbf{x}J^T)^{-1}\epsilon) = (\epsilon^T(J\Sigma_\textbf{x}J^T)^{-T}J\Sigma^{T}_\textbf{x})(\Sigma_\textbf{x}^{-1}\Sigma_\textbf{x})(J^T(J\Sigma_\textbf{x}J^T)^{-1}\epsilon) = (\epsilon^T(J\Sigma_\textbf{x}J^T)^{-T}J\Sigma^{T}_\textbf{x})(J^T(J\Sigma_\textbf{x}J^T)^{-1}\epsilon) = \epsilon^T(J\Sigma_\textbf{x}J^T)^{-T}(J\Sigma^{T}_\textbf{x}J^T)(J\Sigma_\textbf{x}J^T)^{-1}\epsilon = \epsilon^T(J\Sigma_\textbf{x}J^T)^{-T}\epsilon = \epsilon^T(J\Sigma^T_\textbf{x}J^T)^{-1}\epsilon = \epsilon^T(J\Sigma_\textbf{x}J^T)^{-1}\epsilon
VIII. Minimizing geometric error for affine transformations. Given a set of correspondences (x_i, y_i) \longleftrightarrow (x’_i, y’_i), find an affine transformation H_A that minimizes geometric error (4.8-p95). We will step through the derivation of a linear algorithm based on Sampson’s approximation which is exact in this case. The complete method is summarized in algorithm 4.7.
(a) Show that the optimum affine transformation takes the centroid of the \textbf{x}_i to the centroid of the \textbf{x}’_i, so by translating the points to have their centroid at the origin, the translation part of the transformation is determined.
Let’s express the affine transformation as a composition of transformation and scaling (in that order) as A = ST, where S is the scaling transform and T is the translation.
Then T will have the form \begin{pmatrix} 1 & 0 & t_x \\ 0 & 1 & t_y \\ 0 & 0 & 1 \end{pmatrix}
with \textbf{h} = (1, 0, t_x, 0, 1, t_y, 0, 0, 1)^T.
The residual \epsilon in this case will be \begin{pmatrix} y’_i\textbf{h}^{3T}\textbf{x}_i - w’_i\textbf{h}^{2T}\textbf{x}_i \\ w’_i\textbf{h}^{1T}\textbf{x}_i - x’_i\textbf{h}^{3T}\textbf{x}_i \end{pmatrix} = \begin{pmatrix} w_iy’_i - w’_iy_i + w_iw’_it_y \\ w’_ix_i + w_iw’_it_x - w_ix’_i \end{pmatrix}
Assuming that w_i = w’_i = 1, the inhomogeneous coordinates are simply x_i, y_i, x’_i, y’_i and the residual can be written as \begin{pmatrix} y’_i - y_i + t_y \\ x_i + t_x - x’_i \end{pmatrix}
The Jacobian for this residual will then be J = \begin{pmatrix} 0 & -1 & 0 & 1 \\ 1 & 0 & -1 & 0 \end{pmatrix}
\implies (JJ^T)^{-1} = \frac{1}{2}I
where I is the identity matrix.
The Sampson error for the i^{th} correspondence can then be calculated as \epsilon_i^T(JJ^T)^{-1}\epsilon_i = \frac{1}{2}\epsilon_i^T\epsilon_i
From this it is clear that minimizing the geometric error is the same as minimizing the Sampson error which is the same as minimizing the algebraic error for a translation.
Expanding this error and dropping the constant, we get the error for the i^{th} correspondence to be (y’_i - y_i + t_y)^2 + (x’_i - x_i + t_x)^2 = y’^2_i + y^2_i + t^2_y - 2y’_iy_i - 2y_it_y + 2y’_it_y + x’^2_i + x^2_i + t^2_x - 2x’_ix_i - 2x_it_x + 2x’_it_x
The sum of the errors for all such correspondences will be = \sum\limits_{i}y’^2_i + \sum\limits_{i}y^2_i + nt^2_y - 2\sum\limits_{i}y’_iy_i - 2t_y\sum\limits_{i}y_i + 2t_y\sum\limits_{i}y’_i + \sum\limits_{i}x’^2_i + \sum\limits_{i}x^2_i + nt^2_x - 2\sum\limits_{i}x’_ix_i - 2t_x\sum\limits_{i}x_i + 2t_x\sum\limits_{i}x’_it_x
To find the optimal values of t_x and t_y we take the derivatives w.r.t. t_x and t_y and set them to zero. By doing this we get
2nt_x + 2\sum\limits_{i}x’_i - 2\sum\limits_{i}x_i = 0 2nt_y + 2\sum\limits_{i}y’_i - 2\sum\limits_{i}y_i = 0
Hence, t_x = \frac{\sum\limits_{i}x’_i - \sum\limits_{i}x_i}{n} = \bar{x}’ - \bar{x} t_y = \frac{\sum\limits_{i}y’_i - \sum\limits_{i}y_i}{n} = \bar{y}’ - \bar{y}
where (\bar{x}, \bar{y}) and (\bar{x}’ , \bar{y}’ ) are the centroids of the points in each image respectively.
Thus the translation portion of the optimal affine transformation takes the centroid of points in the first image to the centroid of the points in the second.
References
- Zeng Hui, Xiaoming Deng, and Zhanyi Hu. A new normalized method on line-based homography estimation. Pattern Recognition Letters 29.9 (2008): 1236-1244. [return]