A Variational Framework for Structure from Motion in Omnidirectional Image Sequences

Luigi Bagnato · Pascal Frossard · Pierre Vandergheynst

J Math Imaging Vis (2011) 41:182–193
DOI 10.1007/s10851-011-0267-1
Published online: 1 March 2011
© Springer Science+Business Media, LLC 2011

Abstract  We address the problem of depth and ego-motion estimation from omnidirectional images. We formulate a correspondence-free structure-from-motion problem for sequences of images mapped on the 2-sphere. A novel graph-based variational framework is first proposed for depth estimation between pairs of images. The estimation is cast as a TV-L1 optimization problem that is solved by a fast graph-based algorithm. The ego-motion is then estimated directly from the depth information, without explicit computation of the optical flow. Both problems are finally addressed together in an iterative algorithm that alternates between depth and ego-motion estimation for fast computation of 3D information from motion in image sequences.
Experimental results demonstrate the effective performance of the proposed algorithm for 3D reconstruction from synthetic and natural omnidirectional images.

This work has been partially supported by the Swiss National Science Foundation under Grant 200021-125651.

L. Bagnato, Signal Processing Laboratory (LTS2 and LTS4), Institute of Electrical Engineering, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, 1015 Switzerland. e-mail: luigi.bagnato@epfl.ch
P. Frossard, Signal Processing Laboratory (LTS4), Institute of Electrical Engineering, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, 1015 Switzerland. e-mail: pascal.frossard@epfl.ch
P. Vandergheynst, Signal Processing Laboratory (LTS2), Institute of Electrical Engineering, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, 1015 Switzerland. e-mail: pierre.vandergheynst@epfl.ch

Keywords  Structure from motion · Ego-motion · Depth estimation · Omnidirectional · Variational

1 Introduction

Recently, omnidirectional imagers such as catadioptric cameras have sparked tremendous interest in image processing and computer vision. These sensors are particularly attractive due to their (nearly) full field of view. The visual information coming from a sequence of omnidirectional images can be used to perform a 3D reconstruction of a scene. This type of problem is usually referred to as Structure from Motion (SFM) [9] in the literature. Let us imagine a monocular observer that moves in a rigid, unknown world; the SFM problem consists in estimating the 3D rigid self-motion parameters, i.e., the rotation and the direction of translation, and the structure of the scene, usually represented as a depth map with respect to the observer position. Structure from motion has attracted considerable attention in the research community over the years, with applications such as autonomous navigation, mixed reality, or 3D video.

In this paper we introduce a novel structure from motion framework for omnidirectional image sequences.
We first consider that the images can be mapped onto the 2-sphere, which allows us to unify various models of single effective viewpoint cameras. Then we propose a correspondence-free SFM algorithm that uses only the differential motion between two consecutive frames of an image sequence, through brightness derivatives. Since the estimation of a dense depth map is typically an ill-posed problem, we build on [3] and propose a novel variational framework that solves the SFM problem on the 2-sphere when the camera motion is unknown. Variational techniques are among the most successful approaches for solving under-determined inverse problems, and efficient implementations have been proposed recently so that their use becomes appealing [26]. We show in this paper that it is possible to extend very efficient variational approaches to SFM problems, while naturally handling the geometry of omnidirectional images. We embed a discrete image in a weighted graph whose connections are given by the topology of the manifold and the geodesic distances between pixels. We then cast the depth estimation problem as a TV-L1 optimization problem, and we solve the resulting variational problem with fast graph-based optimization techniques similar to [10, 20, 27]. To the best of our knowledge, this is the first time that graph-based variational techniques are applied to obtain a dense depth map from omnidirectional image pairs.

Then we address the problem of ego-motion estimation from the depth information. The camera motion is not perfectly known in practice, but it can be estimated from the depth map. We propose to compute the parameters of the 3D camera motion with a low-complexity least squares estimation algorithm that determines the most likely motion between omnidirectional images using the depth information. Our formulation avoids the explicit computation of the optical flow field and the use of feature matching algorithms.
Finally, we combine both estimation procedures to solve the SFM problem in the generic situation where the camera motion is not known a priori. The proposed iterative algorithm alternately estimates depth and camera ego-motion in a multi-resolution framework, providing an efficient solution to the SFM problem in omnidirectional image sequences. Experimental results with synthetic spherical images and natural images from a catadioptric sensor confirm the validity of our approach for 3D reconstruction.

The rest of the paper is structured as follows. We first provide a brief overview of the related work in Sect. 2. Then, we describe in Sect. 3 the framework used in this paper for motion and depth estimation and the corresponding discrete operators in graph-based representations. The variational depth estimation problem is presented in Sect. 4, and the ego-motion estimation is discussed in Sect. 5. Section 6 presents the joint depth and ego-motion estimation algorithm, while Sect. 7 presents experiments of 3D reconstruction from synthetic and natural omnidirectional image sequences.

2 Related Work

The depth and ego-motion estimation problems have been quite widely studied in the last couple of decades, and we describe here the most relevant papers that present correspondence-free techniques. Correspondence-free algorithms get rid of feature computation and matching steps that might prove to be complex and sensitive to transformations between images. Most of the literature in correspondence-free depth estimation is dedicated to stereo depth estimation [22]. In the stereo depth estimation problem, cameras are usually separated by a large distance in order to efficiently capture the geometry of the scene. Registration techniques are often used to find a disparity map between the two image views, and the disparity is eventually translated into a depth map. In our problem, we rather assume that the displacement between two consecutive frames in the sequence is small, as generally happens in image sequences.
This permits the computation of the differential motion between images and the construction of low-complexity depth estimation through image brightness derivatives. Most of the research about correspondence-free depth estimation has concentrated on perspective images; depth estimation has also been studied in the case of omnidirectional images in [18], which remains one of the rare works that carefully considers the specific geometry of the images in the depth estimation. We handle this geometry by graph-based processing on a spherical manifold, and we introduce a novel variational framework in our algorithm, which is expected to provide high robustness to quantization errors, noise, or illumination gradients.

On the other hand, ego-motion estimation approaches usually proceed by first estimating the image displacement field, the so-called optical flow. The optical flow field can be related to the global motion parameters by a mapping that depends on the specific imaging surface of the camera. The mapping typically defines the space of solutions for the motion parameters, and specific techniques can eventually be used to obtain an estimate of the ego-motion [6, 13, 16, 24]. Most techniques are sensitive to noisy estimation of the optical flow. Optical flow estimation is a highly ill-posed inverse problem that needs some form of regularization in order to obtain displacement fields that are physically meaningful; a common approach is to impose a smoothness constraint on the field [5, 14]. In order to avoid the computation of the optical flow, one can use the so-called "direct approach", where image derivatives are directly related to the motion parameters. Without any assumption on the scene, the search space of the ego-motion parameters is limited by the depth positivity constraint. For example, the works in [15, 23] estimate the motion parameters that result in the smallest amount of negative values in the depth map.
Some algorithms originally proposed for planar cameras have later been adapted to cope with the geometrical distortion introduced by omnidirectional imaging systems. For example, an omnidirectional ego-motion algorithm has been presented by Gluckman in [11], where the optical flow field is estimated in the catadioptric image plane and then back-projected onto a spherical surface. Not many works, though, have tried to take advantage of the wider field of view of omnidirectional devices: in spherical images the focus of expansion and the focus of contraction are both present, which implies that translational motion cannot be confused with rotational motion. In our work, we take advantage of the latter property and directly estimate the ego-motion with a very efficient scheme based on a least squares optimization problem, which further permits avoiding the computation of the optical flow.

Ideas of alternating minimization steps have also been proposed in [1, 12]. In these works, however, the authors use planar sensors and assume an initial rough estimate of the depth map. In addition, they use a simple locally constant depth model. In our experiments we show that this model is an oversimplification of the real world, which does not apply to scenes with a complex structure. In the novel framework proposed in this paper, we use a spherical camera model and derive a linear set of motion equations that explicitly include camera rotation. The complete ego-motion parameters can then be efficiently estimated jointly with depth.

3 Framework Description

In this section, we introduce the framework and the notation that will be used in the paper. We derive the equations that relate the global motion parameters and the depth map to the brightness derivatives on the sphere.
Finally, we show how we embed our spherical framework in a weighted graph structure and define differential operators in this representation.

We choose to work on the 2-sphere S^2, which is a natural spatial domain to perform processing of omnidirectional images, as shown in [8] and references therein. For example, catadioptric camera systems with a single effective viewpoint permit a one-to-one mapping of the catadioptric plane onto a sphere via inverse stereographic projection [4]. The centre of that sphere is co-located with the focal point of the parabolic mirror, and each direction represents a light ray incident to that point. We assume then that a pre-processing step transforms the original omnidirectional images into spherical ones, as depicted in Fig. 1.

Fig. 1  Left: the original catadioptric image. Right: projection on the sphere

The starting point of our analysis is the brightness consistency equation, which assumes that pixel intensity values do not change during motion between successive frames. Let us denote I(t, y) an image sequence, where t is time and y = (y_1, y_2, y_3) describes a spatial position in 3-dimensional space. If we consider only two consecutive frames in the image sequence, we can drop the time variable t and use I_0 and I_1 to refer to the first and the second frame, respectively. The brightness consistency assumption then reads I_0(y) - I_1(y + u) = 0, where u is the displacement field between the frames. We can linearize the brightness consistency constraint around y + u_0 as:

  I_1(y + u_0) + (\nabla I_1(y + u_0))^T (u - u_0) - I_0(y) = 0,    (1)

with an obvious abuse of notation for the equality. This equation relates the motion field u (also known as the optical flow field) to the (spatial and temporal) image derivatives.
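A quick numerical illustration of the linearization (1): using a toy planar image pair (rather than spherical data) and u_0 = 0, the linearized residual should be far smaller than the raw frame difference when the displacement is small. The test signal and all variable names below are our own illustrative choices, not part of the paper.

```python
import numpy as np

# Toy planar frames: I1(p) = I0(p - u), so that I0(p) = I1(p + u)
# (brightness consistency). The displacement u is deliberately small.
xs = np.linspace(0, 4 * np.pi, 256)
x, y = np.meshgrid(xs, xs)
u = np.array([0.02, -0.01])              # assumed small displacement (dx, dy)
I0 = np.sin(x) * np.cos(y)
I1 = np.sin(x - u[0]) * np.cos(y - u[1])

# Linearization around u0 = 0 as in Eq. (1): I1 + (grad I1)^T u - I0 ~ 0.
dIdy, dIdx = np.gradient(I1, xs, xs)     # derivatives along rows (y), columns (x)
residual_lin = I1 + dIdx * u[0] + dIdy * u[1] - I0
residual_raw = I1 - I0                   # residual with the gradient term dropped

# Away from the grid boundary, the linearized residual is much smaller.
inner = (slice(5, -5), slice(5, -5))
ratio = np.abs(residual_lin[inner]).max() / np.abs(residual_raw[inner]).max()
```

The linearized residual scales like the square of the displacement norm, versus linearly for the raw difference, so the ratio shrinks as the motion between frames gets smaller, which is exactly the small-displacement regime the paper assumes.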
It is probably worth stressing that, for this simple linear model to hold, we assume that the displacement u - u_0 between the two scene views I_0 and I_1 is sufficiently small.

When data live on S^2, we can express the gradient operator \nabla from (1) in spherical coordinates as:

  \nabla I(\phi, \theta) = \frac{1}{\sin\theta}\, \partial_\phi I(\phi,\theta)\, \hat{\phi} + \partial_\theta I(\phi,\theta)\, \hat{\theta},    (2)

where \theta \in [0, \pi] is the colatitude angle, \phi \in [0, 2\pi[ is the azimuthal angle, and \hat{\phi}, \hat{\theta} are the unit vectors on the tangent plane corresponding to infinitesimal displacements in \phi and \theta, respectively (see Fig. 2). Note also that by construction the optical flow field u is defined on the tangent bundle TS = \bigcup_{\omega \in S^2} T_\omega S^2, i.e. u : S^2 \subset R^3 \to TS.

Fig. 2  The representation and coordinates on the 2-sphere S^2

3.1 Global Motion and Optical Flow

Under the assumption that the motion is slow between frames, we have derived above a linear relationship between the apparent motion u on the spherical retina and the brightness derivatives. If the camera undergoes a rigid translation t and a rotation around the axis \Omega, then we can derive a geometrical constraint between u and the parameters of the 3D motion of the camera. Let us consider a point P in the scene, with respect to a coordinate system fixed at the center of the camera. We can express P as P = D(r) r, where r is the unit vector giving the direction to P and D(r) is the distance of the scene point from the center of the camera. During camera motion, as illustrated in Fig. 3, the scene point moves with respect to the camera by the quantity:

  \delta P = -t - \Omega \times r.    (3)

Fig. 3  The sphere and the motion parameters

We can now build the geometric relationship that relates the motion field u to the global motion parameters t and \Omega.
It reads:

  u(r) = -\frac{t}{D(r)} - \Omega \times r = -Z(r)\, t - \Omega \times r,    (4)

where the function Z(r) is defined as the multiplicative inverse of the distance function D(r). In the following we will refer to Z as the depth map. In (4) we find all the unknowns of our SFM problem: the depth map Z(r) describing the structure of the scene and the 3D motion parameters t and \Omega. Due to the multiplication between Z(r) and t, both quantities can only be estimated up to a scale factor. In the following we will therefore consider that t has unitary norm.

We can finally combine (1) and (4) into a single equation:

  I_1(r + u_0) + (\nabla I_1(r + u_0))^T (-Z(r)\, t - \Omega \times r - u_0) - I_0(r) = 0.    (5)

Equation (5) relates image derivatives directly to the 3D motion parameters. The equation is not linear in the unknowns, and it defines an under-constrained system (i.e., more unknowns than equations). We will use this equation as a constraint in the optimization problem proposed in the next section.

3.2 Discrete Differential Operators on the 2-Sphere

We have developed our previous equations in the continuous spatial domain, but we have to remember that our images are digital. Although the 2-sphere is a simple manifold with constant curvature and a simple topology, special attention has to be paid to the definition of the differential operators that are used in the variational framework.

We assume that the omnidirectional images recorded by the sensor are interpolated onto a spherical equiangular grid {\theta_m = m\pi/M, \phi_n = 2\pi n/N}, with M \cdot N the total number of samples. This operation can be performed, for example, by mapping the omnidirectional image on the sphere and then using bilinear interpolation to extract the values at the given positions (\theta_m, \phi_n).
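Since \Omega \times r = -r \times \Omega, the motion model (4) is linear in the six motion parameters once the depth map Z is fixed: stacking u(r_i) = -Z(r_i) t + [r_i]_\times \Omega over the sampled directions yields an overdetermined linear system, which is the essence of a least-squares ego-motion step. Below is a minimal noise-free numerical sketch with synthetic directions and depth; it is not the paper's actual estimator, which works from brightness derivatives through (5) rather than from a known flow field.

```python
import numpy as np

def skew(v):
    """Cross-product matrix: skew(v) @ x == np.cross(v, x)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

rng = np.random.default_rng(0)
n = 200
r = rng.normal(size=(n, 3))
r /= np.linalg.norm(r, axis=1, keepdims=True)   # unit viewing directions on S^2

Z = rng.uniform(0.2, 1.0, size=n)               # synthetic depth map (inverse distance)
t_true = np.array([0.6, -0.8, 0.0])             # unit-norm translation (scale fixed)
w_true = np.array([0.01, -0.02, 0.03])          # small rotation Omega

# Motion field of Eq. (4): u(r) = -Z(r) t - Omega x r.
u = -Z[:, None] * t_true - np.cross(w_true, r)

# Stack u_i = [-Z_i I | [r_i]_x] [t; Omega] and solve by least squares.
A = np.zeros((3 * n, 6))
for i in range(n):
    A[3*i:3*i+3, :3] = -Z[i] * np.eye(3)
    A[3*i:3*i+3, 3:] = skew(r[i])               # [r_i]_x Omega == -Omega x r_i
x, *_ = np.linalg.lstsq(A, u.ravel(), rcond=None)
t_est, w_est = x[:3], x[3:]
```

With noise-free data the six parameters are recovered exactly; note that the scale ambiguity discussed above is resolved here only because Z was generated at a known scale.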
In spherical coordinates, a simple discretization of the gradient obtained from finite differences reads:

  \nabla_\theta f(\theta_{i,j}, \phi_{i,j}) = \frac{f(\theta_{i+1,j}, \phi_{i,j}) - f(\theta_{i,j}, \phi_{i,j})}{\Delta\theta},
  \nabla_\phi f(\theta_{i,j}, \phi_{i,j}) = \frac{1}{\sin\theta_{i,j}}\, \frac{f(\theta_{i,j}, \phi_{i,j+1}) - f(\theta_{i,j}, \phi_{i,j})}{\Delta\phi}.    (6)

The discrete divergence, by analogy with the continuous setting, is defined by \operatorname{div} = -\nabla^*, where \nabla^* is the adjoint of \nabla. It is then easy to verify that the divergence is given by:

  \operatorname{div} p(\theta_{i,j}, \phi_{i,j}) = \frac{p_\phi(\theta_{i,j}, \phi_{i,j}) - p_\phi(\theta_{i,j}, \phi_{i,j-1})}{\sin\theta_{i,j}\, \Delta\phi} + \frac{\sin\theta_{i,j}\, p_\theta(\theta_{i,j}, \phi_{i,j}) - \sin\theta_{i-1,j}\, p_\theta(\theta_{i-1,j}, \phi_{i,j})}{\sin\theta_{i,j}\, \Delta\theta}.    (7)

Both (6) and (7) contain a (\sin\theta)^{-1} term that induces very high values around the poles (i.e., for \theta \simeq 0 and \theta \simeq \pi) and can cause numerical instability. We therefore propose to define discrete differential operators on weighted graphs (i.e., discrete manifolds) as a general way to deal with geometry in a coordinate-free fashion.

We represent our discretized (spherical) imaging surface as a weighted graph, where the vertices represent image pixels and the edges define connections between pixels (i.e., the topology of the surface), as represented in Fig. 4. A weighted undirected graph \Gamma = (V, E, w) consists of a set of vertices V, a set of vertex pairs E \subseteq V \times V, and a weight function w : E \to R satisfying w(u,v) > 0 and w(u,v) = w(v,u), \forall (u,v) \in E. Following Zhou et al. [27], we now define the gradient and divergence over \Gamma as:

  (\nabla_w f)(u,v) = \sqrt{\frac{w(u,v)}{d(u)}}\, f(u) - \sqrt{\frac{w(u,v)}{d(v)}}\, f(v)    (8)

and

  (\operatorname{div}_w F)(u) = \sum_{v \sim u} \sqrt{\frac{w(u,v)}{d(v)}}\, (F(v,u) - F(u,v)),    (9)
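The graph operators (8) and (9) are easy to exercise numerically. The sketch below takes d(u) to be the weighted vertex degree, the usual choice in Zhou et al.'s framework; since this excerpt cuts off before d is defined, treat that as an assumption. On a regular graph, where all degrees are equal, the adjoint identity \operatorname{div}_w = -\nabla_w^* reduces to \langle \nabla_w f, F \rangle_E = -\langle f, \operatorname{div}_w F \rangle_V with plain inner products, which we verify on a small ring graph with unit weights (helper names are ours, not the paper's):

```python
import numpy as np

n = 8
# Directed edge list of a ring graph (both orientations of each undirected edge).
edges = [(i, (i + 1) % n) for i in range(n)] + [((i + 1) % n, i) for i in range(n)]
idx = {e: k for k, e in enumerate(edges)}       # directed edge -> index
w = np.ones(len(edges))                         # unit weights
d = np.full(n, 2.0)                             # weighted degree; the ring is regular

def graph_grad(f):
    # Eq. (8): (grad_w f)(u,v) = sqrt(w/d(u)) f(u) - sqrt(w/d(v)) f(v)
    return np.array([np.sqrt(w[k] / d[u]) * f[u] - np.sqrt(w[k] / d[v]) * f[v]
                     for k, (u, v) in enumerate(edges)])

def graph_div(F):
    # Eq. (9): (div_w F)(u) = sum_{v~u} sqrt(w/d(v)) (F(v,u) - F(u,v))
    out = np.zeros(n)
    for k, (u, v) in enumerate(edges):
        out[u] += np.sqrt(w[k] / d[v]) * (F[idx[(v, u)]] - F[k])
    return out

rng = np.random.default_rng(1)
f = rng.normal(size=n)
F = rng.normal(size=len(edges))
lhs = graph_grad(f) @ F                          # <grad_w f, F>_E
rhs = -(f @ graph_div(F))                        # -<f, div_w F>_V
```

On a regular graph the gradient of a constant vertex function also vanishes, mirroring the continuous operator; on irregular graphs the degree normalization makes both properties hold only in the degree-weighted inner products.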