A statistically-based Newton method for pose refinement

Image and Vision Computing 16 (1998) 541-544

Arthur Pece*, Anthony Worrall*

University of Reading, Computational Vision Group, Department of Computer Science, P.O. Box 225, Reading RG6 6AY, UK

Abstract

Given a measure for the match of an instantiated model line to an image, it is possible to minimize the probability of obtaining an accidental match by descending the gradient of the 'line energy' (log-probability). By projecting this gradient onto parameter space, Newton's method can be applied to recovering the pose parameters of 3D models that minimize the probability of an accidental match between instantiated model and image.

© 1998 Elsevier Science B.V. All rights reserved.

Keywords: Newton method; Pose refinement; Vehicle tracking; Model-based vision

There are two main approaches to recovering the pose of a 3D object from image data. The purely top-down, passive search approach relies on projecting the model onto the image and finding the object pose that maximizes some probabilistic measure of the goodness-of-fit between model projection and image, without inverting the perspective projection [1]. The active search approach relies on detecting simple image features (e.g. edges or high-contrast points), matching these features to geometrical components of the object (e.g. the closest projected model lines), and minimizing the distances between image features and model features by perspective inversion [2-7].

The method described in this paper combines some of the features of both approaches: perspective inversion is used, but no image features are matched to model lines; instead, the inverse perspective is used to project into parameter space the gradient of a probabilistic measure, similar to the measure employed in passive search methods.
By avoiding commitment to particular correspondences between image features and model lines, this method is more robust than other active search methods, while maintaining the speed advantage over passive search methods.

Electronic Annexes available.
* E-mail: [A.E.C. Pete, Anthony.Worral]
0262-8856/98/19.00. PII S0262-8856(98)00098-S

1. The statistical framework

The particular application for which our method was developed is tracking cars and similar vehicles in traffic. The relevant prior knowledge is a 3D model of the car and a definition of the ground plane (relative to the camera) to which the car is assumed to be constrained [1]. Following earlier work [1], the probability of a given model pose is estimated from the sums of squared image derivatives, in the directions normal to projected model lines:

[Eq. (1) is missing from the source text; only the lower summation limit k = -m survives.]

where $v_i$ is the unit normal to the projected model line at location $u_i$; $\Delta I(u_i)$ is the discrete derivative, in the direction $v_i$, of the grey-level value at location $u_i$ (obtained by bilinear interpolation from pixel values); and $w(a)$ is a window function. Any smooth, even, non-negative, integrable window function that decreases monotonically away from the origin can be used. We use a Gaussian window:

$$ w(a) = \exp\left(-\frac{a^2}{2\sigma^2}\right) \tag{2} $$

This point evaluation is computed at equally-spaced points on each projected model line. By summing over all point evaluations on a line, a line evaluation score is obtained. The log-probability of this score has been estimated by Monte Carlo methods and found to be linearly dependent on the score and inversely proportional to the square root of line length:

[Eq. (3) is missing from the source text.]

where the sum in the left-hand side of Eq. (3) is over a line of length $L$, $C_L$ is a constant (depending only on $L$) needed to ensure that probabilities add up to unity, and $\beta$ is a constant independent of $L$. As shown in the Annexe, Eq. (3) is a good approximation to the log-probability except for very low evaluation scores. Such a simple scaling law is a common statistical property of natural images [8].

Assuming that the scores for different model lines are statistically independent, the log-probability of accidentally obtaining a given set of scores for the model lines is equal to the sum of the log-probabilities of the scores, i.e. to the sum of the point evaluations over all model lines, normalized by the square roots of the corresponding line lengths. The log-probability of accidentally obtaining the line scores computed at a given pose can be defined as the energy $E$ of the model at that pose.

2. Energy minimization

Having derived a probabilistic objective function for the model pose, our approach differs from the previous 'passive search' approach [1] in trying to minimize it by Newton's method, rather than by the downhill simplex method.

2.1. Relationship between pose parameters and image coordinates

The pose of a rigid object on the ground plane is defined by two translational and one rotational degrees of freedom. The discrepancy between the model pose and the actual pose can be expressed by the three-vector $\Delta p = (\Delta X, \Delta Y, \Delta\theta)$. This discrepancy will induce a discrepancy between model lines (projected onto the image plane) and image edges. Given $n$ points on the model lines, it is possible to compute the relationship between changes of the three pose parameters and changes $\Delta u$ of the $2n$ image coordinates of the line-points, where $u$ is a $2n$ vector. By using a small-angle approximation, and assuming that distances within the object are negligible compared to the distance between object and camera, this relationship can be summarized by a Jacobian:

$$ \Delta u = J \Delta p \tag{4} $$

where $J$ is a $2n \times 3$ matrix of derivatives of image coordinates with respect to (w.r.t.) the pose parameters of the object. The form of the Jacobian is given in the Annexe.

2.2. Newton's method

The model energy, as a function of the pose parameters, can be approximated by the first three terms of its Taylor expansion:

$$ E(\Delta p) = E_0 + (\nabla_u E)_0^T J \Delta p + \tfrac{1}{2} \Delta p^T J^T (\nabla_u^2 E)_0 J \Delta p + \ldots \tag{5} $$

where $\nabla_u E$ is a $2n$ vector whose elements are the derivatives of the normalized point evaluations w.r.t. the displacements of the line-points along the normal, and $\nabla_u^2 E$ is a $2n \times 2n$ diagonal matrix whose diagonal elements are the second derivatives of the normalized point evaluations w.r.t. the line-point displacements. The model energy can be minimized by Newton's method [11] by solving the system for the pose parameters at which the gradient is zero:

$$ \Delta p = -H^{-1} J^T (\nabla_u E)_0 \tag{6} $$

where the Hessian $H$ is given by:

$$ H = J^T (\nabla_u^2 E)_0 J \tag{7} $$

Unfortunately, the Hessian is not positive definite if a smooth window function is used in Eq. (1). However, an approximation for the Hessian can be used that is guaranteed to remain positive definite and can be proved to lead to the same local minima of the energy. Full details are given in the Annexe.

3. Results

Fig. 1 shows the results of convergence tests comparable to those shown in Refs. [7,9]: an optimal pose for a car model is determined by eye to fit a car image, and the method is applied to the car model after it has been displaced from the optimal pose by fixed amounts, ranging from -1 m to 1 m along both the x and y axes and from -25 to 25 degrees of rotation around the z axis. The histograms show that the optimal pose is recovered from most starting positions, with the exception of those in the corners of the grid.

These results compare favourably with those obtained by the most effective method developed to date [7]. The computational cost per iteration is almost identical for the two methods; therefore we conclude that the new method is an improvement both in speed and accuracy.

[Figure 1. Panels: X-Y histogram of initial poses; X-Y histogram of poses after 25 iterations; Needle plot of poses in peak of X-Y histogram after 25 iterations; Needle plot of initial poses; Needle plot of poses after 25 iterations; Initial poses which converge to the peak.]

Fig. 1. Results of using the Newton method. The top row shows the initial conditions and the middle row shows the result after 25 iterations. The bottom row shows the 60% of the poses which converged to the peak in the histogram (±10 cm) in their final and initial poses. (The plots for all iterations are available in the Annexe.)

4. Conclusions

The method outlined in this paper differs from previous active search methods [7] in being based on a statistical framework, similar to the framework of passive search methods [1]. Numerical experiments show that, in addition to clarifying the theoretical relationship between active search and passive search, the new method also achieves faster and more stable convergence.

The formulation of the method is sufficiently general to allow its use with different evaluation functions. The function that we use allows a linear approximation of the log-probability, which results in a simple and fast computation of gradient and Hessian during the iteration.

The method can be easily extended so that the evaluation is performed over successively smaller scales, by decreasing the value of $\sigma$ at each iteration. This modification could further increase the radius of convergence.

The least-squares method [7] has been applied to shape recovery as well as pose recovery; we are planning to test the new method on shape recovery as well. The only difference between the two applications is in the form of the Jacobian.
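
As a rough illustration of why only the Jacobian changes between applications, the Newton update of Eqs. (6) and (7) can be sketched as a function that takes J as an argument. This is a minimal sketch under stated assumptions, not the authors' implementation: the names `newton_pose_step` and `solve_linear`, the pure-Python solver, and the `floor` parameter are all hypothetical. In particular, clamping the second derivatives to a small positive floor is a stand-in chosen here to keep H positive definite; the approximation actually used by the method is given in the Annexe and is not reproduced.

```python
def solve_linear(A, b):
    """Solve A x = b for a small dense system by Gaussian elimination
    with partial pivoting. A is a list of rows, b a list of floats."""
    n = len(b)
    # Build the augmented matrix [A | b] without modifying the inputs.
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-300:
            raise ValueError("matrix is singular")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    # Back substitution.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def newton_pose_step(J, grad, curv, floor=1e-6):
    """One Newton update dp = -H^{-1} J^T g, with H = J^T D J.

    J    : 2n rows, one per image coordinate, each with one entry
           per parameter (3 for ground-plane pose).
    grad : 2n first derivatives of the energy w.r.t. line-point
           displacements (the vector called nabla_u E in the text).
    curv : 2n diagonal second derivatives (the diagonal of nabla_u^2 E).
    floor: ASSUMPTION, not from the paper: curvatures are clamped to
           at least this value so that H stays positive definite.
    """
    d = [max(c, floor) for c in curv]
    rows = range(len(J))
    m = len(J[0])
    # H = J^T D J (m x m), with D diagonal.
    H = [[sum(d[i] * J[i][a] * J[i][b] for i in rows) for b in range(m)]
         for a in range(m)]
    # Right-hand side: -J^T g.
    rhs = [-sum(J[i][a] * grad[i] for i in rows) for a in range(m)]
    return solve_linear(H, rhs)


# Toy usage: two line-points (2n = 4 coordinates), three pose parameters.
J_toy = [[1.0, 0.0, 0.0],
         [0.0, 1.0, 0.0],
         [0.0, 0.0, 1.0],
         [1.0, 1.0, 1.0]]
grad_toy = [-0.5, 0.25, -0.1, -0.35]
curv_toy = [1.0, 1.0, 1.0, 1.0]
dp_toy = newton_pose_step(J_toy, grad_toy, curv_toy)
```

Under this sketch, moving from pose refinement to shape recovery or to camera-parameter refinement would amount to passing a different `J` (with a different number of columns); the update itself is unchanged.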
Given the greater robustness of the new method, it becomes feasible to apply it to refinement of extrinsic camera parameters: again, the only modification that is needed is in the form of the Jacobian. Preliminary tests suggest that the method can perform quite well in this application [10].

Acknowledgements

The research described in this paper was supported by the EU-TMR program SMART II.

References

[1] G.D. Sullivan, Visual interpretation of known objects in constrained scenes, Phil. Trans. R. Soc. Lond. B 337 (1992) 361-370.
[2] D.G. Lowe, Fitting parametrized 3-D models to images, IEEE Trans. PAMI 13 (5) (1991) 441-450.
[3] D. Koller, K. Daniilidis, H.-H. Nagel, Model-based object tracking in monocular image sequences of road traffic scenes, Int. J. Comp. Vis. 10 (3) (1993) 257-281.
[4] R.S. Stephens, Real-time 3D object tracking, Proceedings of the Alvey Vis. Conference, University of Sheffield Printing Unit, Sheffield, 1989, pp. 85-90.
[5] C. Harris, Tracking with rigid models, in: A. Blake, A. Yuille (Eds.), Active Vision, MIT Press, Cambridge, MA, 1992, pp. 59-73.
[6] H. Kollnig, H.-H. Nagel, 3D pose estimation by fitting image gradients directly to polyhedral models, Proceedings of the 5th ICCV, IEEE Computer Soc. Press, Los Alamitos, CA, 1995, pp. 569-574.
[7] A.D. Worrall, J.M. Ferryman, G.D. Sullivan, K.D. Baker, Pose and structure recovery using active models, Proceedings of the 6th BMVC, University of Sheffield Printing Unit, Sheffield, 1995, pp. 137-146.
[8] D.L. Ruderman, The statistics of natural images, Network 5 (1994) 517-548.
[9] A.D. Worrall, G.D. Sullivan, K.D. Baker, Pose refinement of active models using forces in 3D, Proceedings of the 3rd ECCV, Springer-Verlag, New York, 1994, pp. 341-352.
[10] A.E.C. Pece, G.D. Sullivan, Model-based control of an active camera head, Proceedings of the EU-HCM SMART Workshop, Lisbon, April 1995.
[11] W.H. Press, S.A. Teukolsky, W.T. Vetterling, B.P. Flannery, Numerical Recipes in C, 2nd edn., Cambridge University Press, Cambridge, 1992.