A novel parametrization of the perspective-three-point problem for a direct computation of absolute camera position and orientation

Laurent Kneip (laurent.kneip@mavt.ethz.ch), Davide Scaramuzza (davide.scaramuzza@mavt.ethz.ch), Roland Siegwart (rsiegwart@ethz.ch)
Autonomous Systems Lab, ETH Zurich

Abstract

The Perspective-Three-Point (P3P) problem aims at determining the position and orientation of the camera in the world reference frame from three 2D-3D point correspondences. This problem is known to provide up to four solutions that can then be disambiguated using a fourth point. All existing solutions attempt to first solve for the position of the points in the camera reference frame, and then compute the position and orientation of the camera in the world frame, which aligns the two point sets. In contrast, in this paper we propose a novel closed-form solution to the P3P problem, which computes the aligning transformation directly in a single stage, without the intermediate derivation of the points in the camera frame. This is made possible by introducing intermediate camera and world reference frames, and expressing their relative position and orientation using only two parameters. The projection of a world point into the parametrized camera pose then leads to two conditions and finally a quartic equation for finding up to four solutions for the parameter pair. A subsequent backsubstitution directly leads to the corresponding camera poses with respect to the world reference frame. We show that the proposed algorithm offers accuracy and precision comparable to a popular, standard, state-of-the-art approach but at much lower computational cost (15 times faster). Furthermore, it provides improved numerical stability and is less affected by degenerate configurations of the selected world points.
The superior computational efficiency is particularly suitable for any RANSAC outlier-rejection step, which is always recommended before applying PnP or non-linear optimization of the final solution.

1. Introduction

The Perspective-n-Point (PnP) problem originated from camera calibration [1, 10, 17, 28]. Also known as pose estimation, it aims at retrieving the position and orientation of the camera with respect to a scene object from n corresponding 3D points. This problem has found many applications in computer animation [30], computer vision [16], augmented reality, automation, image analysis, automated cartography [10], photogrammetry [1, 24], robotics [35], and model-based machine vision systems [34]. In 1981, Fischler and Bolles [10] summarized the problem as follows: "Given the relative spatial locations of n control points, and given the angle to every pair of control points P_i from an additional point called the center of perspective C, find the lengths of the line segments joining C to each of the control points." The next step then consists of retrieving the orientation and translation of the camera with respect to the object reference frame.

The Direct Linear Transformation was first developed by photogrammetrists [31] as a solution to the PnP problem when the 3D points are in a general configuration, and was then introduced to the computer vision community [7, 16]. When the points are coplanar, the homography transformation can be exploited [16] instead.

In this paper, we address the particular case of PnP for n = 3. This problem is also known as the Perspective-Three-Point (P3P) problem. The P3P is the smallest subset of control points that yields a finite number of solutions.
When the intrinsic camera parameters are known and we have n ≥ 4 points, the solution is generally unique.

The P3P problem was first investigated in 1841 by Grunert [14] and in 1903 by Finsterwalder [8], who noticed that for a calibrated camera there can be up to four solutions, which can then be disambiguated using a fourth point. In the literature, there exist many solutions to this problem, which can be classified into iterative, non-iterative, linear, and non-linear ones. In 1991, Haralick et al. [15] reviewed the major direct solutions up to 1991, including the six algorithms given by Grunert (1841) [14], Finsterwalder (1903), as summarized by Finsterwalder and Scheufele in [8], Merritt (1949) [25], Fischler and Bolles (1981) [10], Hung et al. (1985) [20], Linnainmaa et al. (1988) [23], and Grafarend et al. (1989) [13], respectively. They also gave the analytical solution for the P3P problem with resultant computation. Different solutions to the P3P problem were later proposed by Quan and Lan (1999) [28] and Gao et al. (2003) [12]. A different approach, for non-single-viewpoint cameras, was proposed by Nister and Stewenius in 2006 [27].

It is important to remark here that P3P is the most basic case of the PnP problem. All PnP problems include the P3P problem as a special case. Among the methods that handle arbitrary values of n are those of Fischler and Bolles (1981) [10], Dhome et al. (1989) [6], Horaud et al. (1989) [17], Haralick et al. (1991) [15], DeMenthon and Davis (1995) [4, 5], Quan and Lan (1999) [28], Triggs (1999) [32], Fiore (2001) [9], Ansar and Daniilidis (2003) [2], and Lepetit et al. (2009) [22]; the last of these also works for deformable objects.

Applications such as feature-point-based camera tracking [29, 21], structure from motion, and visual odometry [26] require dealing with hundreds or even thousands of noisy feature points and outliers in real time, which requires computationally efficient methods.
The standard approach consists of first using P3P in a RANSAC scheme [10] to remove the outliers, and then PnP on all remaining inliers. If necessary, a further non-linear optimization can also be applied to refine the final solution.

All existing P3P algorithms cited above first estimate the distances $\|CP_i\|$ between the camera center $C$ and the 3D points $P_i$ from constraints given by the triangles $CP_iP_j$ (see Fig. 1). Once the distances are known, the $P_i$ are expressed in the camera frame as $P_i^\nu$. Then, the orientation and translation $[R|t]$ of the camera in the world reference frame is taken to be the transformation that aligns the points $P_i$ on $P_i^\nu$, and can be found in closed form using quaternions [18] or singular value decomposition (SVD) [3, 19, 33, 11]. Particularly in RANSAC, the transformation into the world reference frame is a necessary step, as it allows us to compute the camera projection matrix, which is then used, in combination with the reprojection error, to validate the RANSAC hypotheses.

In contrast to all previous approaches, in this paper we provide a closed-form solution for the P3P problem, which directly computes the position and orientation (i.e., $[R|t]$) of the camera in the world reference frame as a function of the image coordinates and the coordinates of the reference points in the world frame. To the best of our knowledge, this is the first work in this endeavor. The performance of the proposed algorithm will be evaluated against the implementation of Gao et al. [12], which is one of the most popular and robust P3P solvers. The main advantage of the direct computation of $[R|t]$ is its superior computational efficiency. We avoid both the first stage, determining the points in the camera reference frame, and the second stage, computing the aligning transformation, which would require SVD [33, 11].
As we will show in the results section, our algorithm is 15 times faster than Gao's and requires only 2 microseconds on a 2.8 GHz dual-core laptop, which makes it well suited to RANSAC implementations. The second advantage is its superior numerical stability and robustness compared with Gao's solution.

The structure of the paper is as follows. Section 2 presents the derivations that lead to the new solution of the P3P problem for retrieving the camera position and orientation directly. Section 3 provides a thorough analysis of the algorithm's performance, including numerical stability, computational cost, accuracy, and precision. The results will be compared to Gao's implementation [12]. Section 4, finally, concludes the work.

2. Theory

We consider the problem illustrated in Fig. 1. The goal is to find the exact position $C$ and orientation matrix $R$ of a camera with respect to the world frame $(O, X, Y, Z)$, under the condition that the absolute spatial coordinates of three observed feature points $P_1$, $P_2$, and $P_3$ are given. We furthermore assume that the intrinsic camera parameters are known. Hence, we can assume that the unit vectors $\vec{f}_1$, $\vec{f}_2$, and $\vec{f}_3$, pointing towards the three considered feature points from the camera frame, are given.

Figure 1. Synopsis of the problem.

Let us denote the original camera frame with $\nu$. The first step involves the definition of a new, intermediate camera frame $\tau$ from the feature vectors $\vec{f}_1$ and $\vec{f}_2$ inside $\nu$. As shown in Fig. 2, the new camera frame is defined as $\tau = (C, \vec{t}_x, \vec{t}_y, \vec{t}_z)$, where

\[
\vec{t}_x = \vec{f}_1, \qquad
\vec{t}_z = \frac{\vec{f}_1 \times \vec{f}_2}{\|\vec{f}_1 \times \vec{f}_2\|}, \qquad
\vec{t}_y = \vec{t}_z \times \vec{t}_x .
\]

Figure 2. Illustration of the intermediate camera frame $\tau = (C, \vec{t}_x, \vec{t}_y, \vec{t}_z)$ and the intermediate world frame $\eta = (P_1, \vec{n}_x, \vec{n}_y, \vec{n}_z)$.

Via the transformation matrix $T = [\vec{t}_x, \vec{t}_y, \vec{t}_z]^T$, feature vectors can then be transformed into $\tau$ using

\[
\vec{f}_i^{\,\tau} = T \cdot \vec{f}_i . \tag{1}
\]

If we are able to define the orientation of $\tau$ with respect to the world frame, the orientation of $\nu$ follows directly using $T$.

The second step involves the definition of a new world frame $\eta$ from the world points $P_1$, $P_2$, and $P_3$. The new spatial frame is defined as $\eta = (P_1, \vec{n}_x, \vec{n}_y, \vec{n}_z)$, where

\[
\vec{n}_x = \frac{\overrightarrow{P_1P_2}}{\|\overrightarrow{P_1P_2}\|}, \qquad
\vec{n}_z = \frac{\vec{n}_x \times \overrightarrow{P_1P_3}}{\|\vec{n}_x \times \overrightarrow{P_1P_3}\|}, \qquad
\vec{n}_y = \vec{n}_z \times \vec{n}_x .
\]

Via the transformation matrix $N = [\vec{n}_x, \vec{n}_y, \vec{n}_z]^T$, world points can finally be transformed into $\eta$ using

\[
P_i^\eta = N \cdot (P_i - P_1) . \tag{2}
\]

Again, if we are able to define the orientation of $\tau$ with respect to $\eta$, the orientation of $\tau$ inside the world frame follows via $N$, and the orientation of $\nu$ then follows via $T$. The same holds for the camera center $C$, which, if defined inside $\eta$, is recovered inside the world frame via a straightforward linear transformation. The resulting situation is illustrated in Fig. 2. The condition for the existence of $\eta$ is that $P_1$, $P_2$, and $P_3$ are not collinear. This can easily be checked by verifying that $\overrightarrow{P_1P_2} \times \overrightarrow{P_1P_3}$ is not zero.

In the following, we will focus on the transformation between $\eta$ and $\tau$. We define the semi-plane $\Pi$ that contains the points $P_1$, $P_2$, and $C$, and hence also the unit vectors $\vec{n}_x$, $\vec{t}_x$, $\vec{t}_y$, $\vec{f}_1$, and $\vec{f}_2$, as shown in Fig. 3. Points $P_1$, $P_2$, and $C$ form a triangle of which two parameters are known, namely the distance $d_{12}$ between $P_1$ and $P_2$, and the angle $\beta$ between $\vec{f}_1$ and $\vec{f}_2$. The latter can be easily obtained via the dot product $\cos\beta = \vec{f}_1 \cdot \vec{f}_2$. Since the later parametrization will only depend on $\cot\beta$, we define

\[
b = \cot\beta = \pm\sqrt{\frac{1}{1-\cos^2\beta} - 1}
  = \pm\sqrt{\frac{1}{1-(\vec{f}_1 \cdot \vec{f}_2)^2} - 1} . \tag{3}
\]

The sign of $b$ is given by the sign of $\cos\beta$.
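The frame construction in (1) to (3) can be sketched in a few lines. The following is a minimal pure-Python illustration, not the authors' Matlab or C++ code; the helper names (`sub`, `dot`, `cross`, `unit`, `intermediate_frames`) and the toy input values are assumptions chosen so that the result is easy to check by hand ($\beta = 45°$, hence $b = 1$).

```python
import math

def sub(a, b): return [x - y for x, y in zip(a, b)]
def dot(a, b): return sum(x * y for x, y in zip(a, b))
def cross(a, b):
    return [a[1]*b[2] - a[2]*b[1], a[2]*b[0] - a[0]*b[2], a[0]*b[1] - a[1]*b[0]]
def unit(a):
    n = math.sqrt(dot(a, a))
    return [x / n for x in a]

def intermediate_frames(P1, P2, P3, f1, f2):
    """Return (T, N, b); rows of T are t_x, t_y, t_z, rows of N are n_x, n_y, n_z."""
    tx = f1
    tz = unit(cross(f1, f2))
    ty = cross(tz, tx)
    nx = unit(sub(P2, P1))
    nz = unit(cross(nx, sub(P3, P1)))  # divides by zero if the points are collinear
    ny = cross(nz, nx)
    cos_beta = dot(f1, f2)
    # Equation (3): b = cot(beta), with the sign taken from cos(beta).
    b = math.copysign(math.sqrt(1.0 / (1.0 - cos_beta ** 2) - 1.0), cos_beta)
    return [tx, ty, tz], [nx, ny, nz], b

# Hypothetical toy input: f1 and f2 enclose 45 degrees.
f1 = [0.0, 0.0, 1.0]
f2 = unit([1.0, 0.0, 1.0])
P1, P2, P3 = [0.0, 0.0, 0.0], [2.0, 0.0, 0.0], [1.0, 3.0, 0.0]
T, N, b = intermediate_frames(P1, P2, P3, f1, f2)
f2_tau = [dot(row, f2) for row in T]            # equation (1) applied to f2
P3_eta = [dot(row, sub(P3, P1)) for row in N]   # equation (2) applied to P3
```

By construction, `f2_tau` has a zero z-component (both $\vec{f}_1$ and $\vec{f}_2$ lie in the x-y plane of $\tau$), and `P3_eta` has a zero z-component with a positive y-component, which is the $(p_1, p_2, 0)^T$ form used later in the derivation.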
We define the free parameter $\alpha \in [0;\pi]$ as the angle $\angle P_2P_1C$. Using the sine law, we obtain

\[
\frac{\|\overrightarrow{CP_1}\|}{d_{12}} = \frac{\sin(\pi - \alpha - \beta)}{\sin\beta} .
\]

The position of the camera center $C$ inside the plane $\Pi$ is then given by

\[
C^\Pi(\alpha)
= \begin{pmatrix} \cos\alpha \cdot \|\overrightarrow{CP_1}\| \\ \sin\alpha \cdot \|\overrightarrow{CP_1}\| \\ 0 \end{pmatrix}
= \begin{pmatrix} d_{12}\cos\alpha\,\sin(\alpha+\beta)\,\sin^{-1}\beta \\ d_{12}\sin\alpha\,\sin(\alpha+\beta)\,\sin^{-1}\beta \\ 0 \end{pmatrix}
= \begin{pmatrix} d_{12}\cos\alpha\,(\sin\alpha\cot\beta + \cos\alpha) \\ d_{12}\sin\alpha\,(\sin\alpha\cot\beta + \cos\alpha) \\ 0 \end{pmatrix}
\]

\[
\Rightarrow \quad
C^\Pi(\alpha) = \begin{pmatrix} d_{12}\cos\alpha\,(\sin\alpha \cdot b + \cos\alpha) \\ d_{12}\sin\alpha\,(\sin\alpha \cdot b + \cos\alpha) \\ 0 \end{pmatrix} . \tag{4}
\]

Figure 3. Semi-plane $\Pi$ containing the triangle $(P_1, P_2, C)$. The blue trajectory indicates the possible locations of the camera centre $C$ depending on the free parameter $\alpha$ and the fixed parameters $d_{12}$ and $\beta$.

The basis vectors of $\tau$ inside $\Pi$ are $\vec{t}_x^{\,\Pi} = (-\cos\alpha, -\sin\alpha, 0)^T$, $\vec{t}_y^{\,\Pi} = (\sin\alpha, -\cos\alpha, 0)^T$, and $\vec{t}_z^{\,\Pi} = (0, 0, 1)^T$.

In order to have $C$, $\vec{t}_x$, $\vec{t}_y$, and $\vec{t}_z$ expressed inside $\eta$, we need to take into account a second free parameter, namely the rotation $\theta$ of $\Pi$ around $\vec{n}_x$, as illustrated in Fig. 4. The corresponding rotation matrix is given by

\[
R_\theta = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\theta & -\sin\theta \\ 0 & \sin\theta & \cos\theta \end{pmatrix} .
\]

Note that $\theta \in [0;\pi]$ if $f^\tau_{3,z} < 0$, and $\theta \in [-\pi;0]$ if $f^\tau_{3,z} > 0$, where $\vec{f}_3^{\,\tau}$ is obtained from $\vec{f}_3$ via (1). This constraint follows from the condition that $\vec{f}_3$ and $P_3$ need to lie on the same side of $\Pi$. It follows that the camera center $C$ inside $\eta$ is given by

\[
C^\eta(\alpha,\theta) = R_\theta \cdot C^\Pi
= \begin{pmatrix} d_{12}\cos\alpha\,(\sin\alpha \cdot b + \cos\alpha) \\ d_{12}\sin\alpha\cos\theta\,(\sin\alpha \cdot b + \cos\alpha) \\ d_{12}\sin\alpha\sin\theta\,(\sin\alpha \cdot b + \cos\alpha) \end{pmatrix} , \tag{5}
\]

and the transformation matrix from $\eta$ to $\tau$ is given by

\[
Q(\alpha,\theta) = \left[ R_\theta \cdot \left( \vec{t}_x^{\,\Pi} \;\; \vec{t}_y^{\,\Pi} \;\; \vec{t}_z^{\,\Pi} \right) \right]^T
= \begin{pmatrix} -\cos\alpha & -\sin\alpha\cos\theta & -\sin\alpha\sin\theta \\ \sin\alpha & -\cos\alpha\cos\theta & -\cos\alpha\sin\theta \\ 0 & -\sin\theta & \cos\theta \end{pmatrix} . \tag{6}
\]

Figure 4. Rotation of the plane $\Pi$ around $\vec{n}_x$ by the angle $\theta$.

The two conditions for finding the correct values of the parameters $\alpha$ and $\theta$ are then established by transforming the third point $P_3^\eta$ into $\tau$, and imposing that the direction of this point is equal to that of $\vec{f}_3^{\,\tau}$. Since $P_3^\eta = (p_1, p_2, 0)^T$, we obtain

\[
P_3^\tau = Q(\alpha,\theta) \cdot \left( P_3^\eta - C^\eta(\alpha,\theta) \right)
= \begin{pmatrix} -\cos\alpha \cdot p_1 - \sin\alpha\cos\theta \cdot p_2 + d_{12}(\sin\alpha \cdot b + \cos\alpha) \\ \sin\alpha \cdot p_1 - \cos\alpha\cos\theta \cdot p_2 \\ -\sin\theta \cdot p_2 \end{pmatrix} . \tag{7}
\]

After defining

\[
\phi_1 = \frac{f^\tau_{3,x}}{f^\tau_{3,z}} \quad \text{and} \quad \phi_2 = \frac{f^\tau_{3,y}}{f^\tau_{3,z}} , \tag{8}
\]

the two conditions finally result in

\[
\begin{cases}
\phi_1 = \dfrac{P^\tau_{3,x}}{P^\tau_{3,z}} \\[2ex]
\phi_2 = \dfrac{P^\tau_{3,y}}{P^\tau_{3,z}}
\end{cases}
\;\Leftrightarrow\;
\begin{cases}
\phi_1 = \dfrac{-\cos\alpha \cdot p_1 - \sin\alpha\cos\theta \cdot p_2 + d_{12}(\sin\alpha \cdot b + \cos\alpha)}{-\sin\theta \cdot p_2} \\[2ex]
\phi_2 = \dfrac{\sin\alpha \cdot p_1 - \cos\alpha\cos\theta \cdot p_2}{-\sin\theta \cdot p_2}
\end{cases}
\]

\[
\Leftrightarrow\;
\begin{cases}
\dfrac{\sin\theta}{\sin\alpha}\, p_2 = \dfrac{-\cot\alpha \cdot p_1 - \cos\theta \cdot p_2 + d_{12}(b + \cot\alpha)}{-\phi_1} \\[2ex]
\dfrac{\sin\theta}{\sin\alpha}\, p_2 = \dfrac{p_1 - \cot\alpha\cos\theta \cdot p_2}{-\phi_2}
\end{cases}
\]

\[
\Rightarrow\quad
\cot\alpha = \frac{\frac{\phi_1}{\phi_2}\, p_1 + \cos\theta \cdot p_2 - d_{12} \cdot b}{\frac{\phi_1}{\phi_2}\cos\theta \cdot p_2 - p_1 + d_{12}} . \tag{9}
\]

Furthermore, we have

\[
\begin{aligned}
\phi_2 = \frac{P^\tau_{3,y}}{P^\tau_{3,z}}
&\;\Leftrightarrow\; \sin^2\theta \cdot \phi_2^2\, p_2^2 = \sin^2\alpha\,(p_1 - \cot\alpha\cos\theta \cdot p_2)^2 \\
&\;\Leftrightarrow\; (1-\cos^2\theta)(1+\cot^2\alpha)\,\phi_2^2\, p_2^2
 = p_1^2 - 2\cot\alpha\cos\theta \cdot p_1 p_2 + \cot^2\alpha\cos^2\theta \cdot p_2^2 .
\end{aligned} \tag{10}
\]

Substituting (9) into (10), expanding, and collecting terms leads to a fourth-order polynomial of the form

\[
a_4 \cos^4\theta + a_3 \cos^3\theta + a_2 \cos^2\theta + a_1 \cos\theta + a_0 = 0 , \tag{11}
\]

where

\[
\begin{aligned}
a_4 ={}& -\phi_2^2 p_2^4 - \phi_1^2 p_2^4 - p_2^4 \\
a_3 ={}& 2 p_2^3 d_{12} b + 2\phi_2^2 p_2^3 d_{12} b - 2\phi_1\phi_2 p_2^3 d_{12} \\
a_2 ={}& -\phi_2^2 p_1^2 p_2^2 - \phi_2^2 p_2^2 d_{12}^2 b^2 - \phi_2^2 p_2^2 d_{12}^2 + \phi_2^2 p_2^4 + \phi_1^2 p_2^4 + 2 p_1 p_2^2 d_{12} \\
       & + 2\phi_1\phi_2 p_1 p_2^2 d_{12} b - \phi_1^2 p_1^2 p_2^2 + 2\phi_2^2 p_1 p_2^2 d_{12} - p_2^2 d_{12}^2 b^2 - 2 p_1^2 p_2^2 \\
a_1 ={}& 2 p_1^2 p_2 d_{12} b + 2\phi_1\phi_2 p_2^3 d_{12} - 2\phi_2^2 p_2^3 d_{12} b - 2 p_1 p_2 d_{12}^2 b \\
a_0 ={}& -2\phi_1\phi_2 p_1 p_2^2 d_{12} b + \phi_2^2 p_2^2 d_{12}^2 + 2 p_1^3 d_{12} - p_1^2 d_{12}^2 + \phi_2^2 p_1^2 p_2^2 - p_1^4 \\
       & - 2\phi_2^2 p_1 p_2^2 d_{12} + \phi_1^2 p_1^2 p_2^2 + \phi_2^2 p_2^2 d_{12}^2 b^2 .
\end{aligned}
\]

Up to four real solutions for $\cos\theta$ are then obtained by applying Ferrari's closed-form solution for the roots of a fourth-order polynomial. Via substitution into (9), each value of $\cos\theta$ then also leads to exactly one value of $\cot\alpha$. Each real $(\alpha,\theta)$ pair is then backsubstituted into (5) and (6), and the camera center and orientation with respect to the world reference frame are finally given as

\[
C = P_1 + N^T \cdot C^\eta \tag{12}
\]

and

\[
R = N^T \cdot Q^T \cdot T . \tag{13}
\]

Note that a proper implementation of the algorithm avoids any computationally expensive trigonometric functions. Using the restricted domains of the parameters $\alpha$ and $\theta$, all trigonometric forms of the parameters that appear can be derived directly from $\cot\alpha$ and $\cos\theta$ using simple trigonometric relationships. Furthermore, during the tests we observed that, due to noise, we sometimes get complex solutions with small imaginary parts instead of real ones.
In this case, it is better to retain the real part of these solutions instead of ignoring them completely.

The full procedure may be summarized as follows:

• compute the transformation matrix $T$ and the feature vector $\vec{f}_3^{\,\tau}$ using (1)
• compute the transformation matrix $N$ and the world point $P_3^\eta$ using (2)
• extract $p_1$ and $p_2$ from $P_3^\eta$
• compute $d_{12}$ and $b$ using (3)
• compute $\phi_1$ and $\phi_2$ using (8)
• compute the factors $a_4$, $a_3$, $a_2$, $a_1$, and $a_0$ of polynomial (11)
• find the real roots of the polynomial (values for $\cos\theta$)
• for each solution, find the value of $\cot\alpha$ using (9)
• compute all necessary trigonometric forms of $\alpha$ and $\theta$ using trigonometric relationships and the restricted parameter domains
• for each solution, compute $C^\eta$ and $Q$ using (5) and (6), respectively
• for each solution, compute the absolute camera center $C$ and orientation $R$ using (12) and (13), respectively
• backproject a fourth point for disambiguation

Please note that the final version of the Matlab and C++ implementations used during the experiments can be downloaded at http://www.laurentkneip.de

3. Results

The algorithm presented in Section 2 has been thoroughly tested by means of synthetic data, and compared to Gao's [12] solution to the P3P problem. The code for the comparison algorithm is available online. In order to have a fair comparison, Gao's solution for finding the three distances between the camera center $C$ and the world points $P_i$ has been extended by Arun's method [3] to find the aligning transformation between the two point sets. This is needed in order to derive the absolute position and orientation of the camera frame from the relative position of the three points, and thus obtain comparable entities. Gao's method also returns up to four possible solutions.
For both algorithms, the disambiguation of the four possible solutions has been done using the same fourth point and exactly the same method.

The synthetic data consists of 1,000 3D points that are uniformly distributed in a volume of 4 × 4 × 4, centered around the origin of the world frame. The position of the camera is fixed at $C = (0, 0, 6)^T$, and the orientation is kept at

\[
R = \begin{pmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & -1 \end{pmatrix} ,
\]

so that the camera looks straight down.

For each experimental run, synthetic 2D-3D correspondences are created by randomly selecting three points from the entire point set, and projecting them into image space using a virtual calibrated camera with resolution 640 × 480, principal point $(u_c, v_c) = (320, 240)$, and effective focal lengths $f_u = f_v = 800$. Depending on the experiment, a different level of white Gaussian noise ranging from 0 to 5 pixels is then added to the 2D coordinates before finally reprojecting the features onto the unit sphere.
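The procedure of Section 2 can be exercised on a noise-free instance of this synthetic setup. The sketch below is a minimal pure-Python illustration under stated assumptions, not the authors' Matlab or C++ implementation: all function and variable names are hypothetical, the three world points are arbitrary values inside the 4 × 4 × 4 volume, and the quartic (11) is solved with a Durand-Kerner iteration as a stand-in for Ferrari's closed form. The sign of $\sin\theta$ is chosen from $f^\tau_{3,z}$ as described for the domain of $\theta$.

```python
import math

def sub(a, b): return [x - y for x, y in zip(a, b)]
def dot(a, b): return sum(x * y for x, y in zip(a, b))
def cross(a, b):
    return [a[1]*b[2] - a[2]*b[1], a[2]*b[0] - a[0]*b[2], a[0]*b[1] - a[1]*b[0]]
def unit(a):
    n = math.sqrt(dot(a, a))
    return [x / n for x in a]
def matvec(M, v): return [dot(row, v) for row in M]
def transpose(M): return [list(col) for col in zip(*M)]
def matmul(A, B): return [[dot(row, col) for col in zip(*B)] for row in A]

def quartic_roots(a4, a3, a2, a1, a0):
    """Durand-Kerner iteration (assumption: used here instead of Ferrari)."""
    c = [a3 / a4, a2 / a4, a1 / a4, a0 / a4]
    z = [complex(0.4, 0.9) ** k for k in range(4)]
    for _ in range(500):
        nxt = []
        for i, zi in enumerate(z):
            val = (((zi + c[0]) * zi + c[1]) * zi + c[2]) * zi + c[3]
            den = 1.0
            for j, zj in enumerate(z):
                if j != i:
                    den *= zi - zj
            nxt.append(zi - val / den)
        z = nxt
    return z

def p3p(P1, P2, P3, f1, f2, f3):
    """Return four candidate (C, R) pairs following equations (1) to (13)."""
    tx, tz = f1, unit(cross(f1, f2))
    T = [tx, cross(tz, tx), tz]
    f3t = matvec(T, f3)                                      # (1)
    nx = unit(sub(P2, P1))
    nz = unit(cross(nx, sub(P3, P1)))
    N = [nx, cross(nz, nx), nz]
    p1, p2, _ = matvec(N, sub(P3, P1))                       # (2)
    d12 = math.sqrt(dot(sub(P2, P1), sub(P2, P1)))
    cb = dot(f1, f2)
    b = math.copysign(math.sqrt(1 / (1 - cb * cb) - 1), cb)  # (3)
    h1, h2 = f3t[0] / f3t[2], f3t[1] / f3t[2]                # (8): phi_1, phi_2
    a4 = -h2**2*p2**4 - h1**2*p2**4 - p2**4
    a3 = 2*p2**3*d12*b + 2*h2**2*p2**3*d12*b - 2*h1*h2*p2**3*d12
    a2 = (-h2**2*p1**2*p2**2 - h2**2*p2**2*d12**2*b**2 - h2**2*p2**2*d12**2
          + h2**2*p2**4 + h1**2*p2**4 + 2*p1*p2**2*d12 + 2*h1*h2*p1*p2**2*d12*b
          - h1**2*p1**2*p2**2 + 2*h2**2*p1*p2**2*d12 - p2**2*d12**2*b**2
          - 2*p1**2*p2**2)
    a1 = (2*p1**2*p2*d12*b + 2*h1*h2*p2**3*d12 - 2*h2**2*p2**3*d12*b
          - 2*p1*p2*d12**2*b)
    a0 = (-2*h1*h2*p1*p2**2*d12*b + h2**2*p2**2*d12**2 + 2*p1**3*d12
          - p1**2*d12**2 + h2**2*p1**2*p2**2 - p1**4 - 2*h2**2*p1*p2**2*d12
          + h1**2*p1**2*p2**2 + h2**2*p2**2*d12**2*b**2)
    poses = []
    for root in quartic_roots(a4, a3, a2, a1, a0):
        ct = root.real                                       # keep the real part
        st = math.sqrt(max(0.0, 1 - ct * ct))
        if f3t[2] > 0:
            st = -st                                         # theta in [-pi, 0]
        cot_a = (h1/h2*p1 + ct*p2 - d12*b) / (h1/h2*ct*p2 - p1 + d12)  # (9)
        sa = 1 / math.sqrt(1 + cot_a**2)                     # alpha in [0, pi]
        ca = cot_a * sa
        k = d12 * (sa * b + ca)
        C_eta = [k*ca, k*sa*ct, k*sa*st]                     # (5)
        Q = [[-ca, -sa*ct, -sa*st], [sa, -ca*ct, -ca*st], [0.0, -st, ct]]  # (6)
        C = [u + v for u, v in zip(P1, matvec(transpose(N), C_eta))]      # (12)
        R = matmul(matmul(transpose(N), transpose(Q)), T)    # (13)
        poses.append((C, R))
    return poses

# Noise-free check with the camera pose and intrinsics of Section 3.
C_true = [0.0, 0.0, 6.0]
R_true = [[1.0, 0.0, 0.0], [0.0, -1.0, 0.0], [0.0, 0.0, -1.0]]
fu = fv = 800.0
uc, vc = 320.0, 240.0
world = [[1.0, 0.5, -0.5], [-1.2, 0.8, 0.3], [0.3, -1.2, 1.0]]  # hypothetical points
bearings = []
for P in world:
    x, y, z = matvec(transpose(R_true), sub(P, C_true))      # point in camera frame
    u, v = fu * x / z + uc, fv * y / z + vc                   # pinhole projection
    bearings.append(unit([(u - uc) / fu, (v - vc) / fv, 1.0]))  # onto the unit sphere
poses = p3p(*world, *bearings)
best_C, best_R = min(poses, key=lambda s: sum((a - c) ** 2 for a, c in zip(s[0], C_true)))
print(best_C)
```

Here the candidate closest to the known pose is picked only for checking purposes; in practice the ambiguity would be resolved by backprojecting a fourth point, as in the procedure above.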