J Math Imaging Vis (2011) 41:182–193
DOI 10.1007/s10851-011-0267-1

A Variational Framework for Structure from Motion in Omnidirectional Image Sequences

Luigi Bagnato · Pascal Frossard · Pierre Vandergheynst

Published online: 1 March 2011
© Springer Science+Business Media, LLC 2011
Abstract
We address the problem of depth and ego-motion estimation from omnidirectional images. We propose a correspondence-free structure-from-motion problem for sequences of images mapped on the 2-sphere. A novel graph-based variational framework is first proposed for depth estimation between pairs of images. The estimation is cast as a TV-L1 optimization problem that is solved by a fast graph-based algorithm. The ego-motion is then estimated directly from the depth information without explicit computation of the optical flow. Both problems are finally addressed together in an iterative algorithm that alternates between depth and ego-motion estimation for fast computation of 3D information from motion in image sequences. Experimental results demonstrate the effective performance of the proposed algorithm for 3D reconstruction from synthetic and natural omnidirectional images.
This work has been partially supported by the Swiss National Science Foundation under Grant 200021-125651.

L. Bagnato (✉)
Signal Processing Laboratory (LTS2 and LTS4), Institute of Electrical Engineering, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, 1015 Switzerland
e-mail: luigi.bagnato@epfl.ch

P. Frossard
Signal Processing Laboratory (LTS4), Institute of Electrical Engineering, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, 1015 Switzerland
e-mail: pascal.frossard@epfl.ch

P. Vandergheynst
Signal Processing Laboratory (LTS2), Institute of Electrical Engineering, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, 1015 Switzerland
e-mail: pierre.vandergheynst@epfl.ch
Keywords Structure from motion · Ego-motion · Depth estimation · Omnidirectional · Variational
1 Introduction
Recently, omnidirectional imagers, such as catadioptric cameras, have sparked tremendous interest in image processing and computer vision. These sensors are particularly attractive due to their (nearly) full field of view. The visual information coming from a sequence of omnidirectional images can be used to perform a 3D reconstruction of a scene. This type of problem is usually referred to as Structure from Motion (SFM) [9] in the literature. Let us imagine a monocular observer that moves in a rigid unknown world; the SFM problem consists in estimating the 3D rigid self-motion parameters, i.e., rotation and direction of translation, and the structure of the scene, usually represented as a depth map with respect to the observer position. Structure from motion has attracted considerable attention in the research community over the years, with applications such as autonomous navigation, mixed reality, or 3D video.

In this paper we introduce a novel structure from motion framework for omnidirectional image sequences. We first consider that the images can be mapped on the 2-sphere, which makes it possible to unify various models of single effective viewpoint cameras. Then we propose a correspondence-free SFM algorithm that uses only differential motion between two consecutive frames of an image sequence through brightness derivatives. Since the estimation of a dense depth map is typically an ill-posed problem, we build on [3] and propose a novel variational framework that solves the SFM problem on the 2-sphere when the camera motion is unknown. Variational techniques are among the most successful approaches to solving under-determined inverse problems, and efficient implementations have been proposed recently, so that their use becomes appealing [26]. We show in this paper that it is possible to extend very efficient variational approaches to SFM problems, while naturally handling the geometry of omnidirectional images. We embed a discrete image in a weighted graph whose connections are given by the topology of the manifold and the geodesic distances between pixels. We then cast the depth estimation problem as a TV-L1 optimization problem, and we solve the resulting variational problem with fast graph-based optimization techniques similar to [10, 20, 27]. To the best of our knowledge, this is the first time that graph-based variational techniques are applied to obtain a dense depth map from omnidirectional image pairs.

Then we address the problem of ego-motion estimation from the depth information. The camera motion is not perfectly known in practice, but it can be estimated from the depth map. We propose to compute the parameters of the 3D camera motion with the help of a low-complexity least squares estimation algorithm that determines the most likely motion between omnidirectional images using the depth information. Our formulation avoids the explicit computation of the optical flow field and the use of feature matching algorithms. Finally, we combine both estimation procedures to solve the SFM problem in the generic situation where the camera motion is not known a priori. The proposed iterative algorithm alternately estimates depth and camera ego-motion in a multi-resolution framework, providing an efficient solution to the SFM problem in omnidirectional image sequences. Experimental results with synthetic spherical images and natural images from a catadioptric sensor confirm the validity of our approach for 3D reconstruction.

The rest of the paper is structured as follows. We first provide a brief overview of the related work in Sect. 2. Then, we describe in Sect. 3 the framework used in this paper for motion and depth estimation and the corresponding discrete operators in graph-based representations. The variational depth estimation problem is presented in Sect. 4, and the ego-motion estimation is discussed in Sect. 5. Section 6 presents the joint depth and ego-motion estimation algorithm, while Sect. 7 presents experiments of 3D reconstruction from synthetic and natural omnidirectional image sequences.
2 Related Work
The depth and ego-motion estimation problems have been quite widely studied in the last couple of decades, and we describe here the most relevant papers that present correspondence-free techniques. Correspondence-free algorithms get rid of feature computation and matching steps that might prove to be complex and sensitive to transformations between images. Most of the literature in correspondence-free depth estimation is dedicated to stereo depth estimation [22]. In the stereo depth estimation problem, cameras are usually separated by a large distance in order to efficiently capture the geometry of the scene. Registration techniques are often used to find a disparity map between the two image views, and the disparity is eventually translated into a depth map. In our problem, we rather assume that the displacement between two consecutive frames in the sequence is small, as generally happens in image sequences. This permits computing the differential motion between images and building low-complexity depth estimation through image brightness derivatives. Then, most of the research about correspondence-free depth estimation has concentrated on perspective images; depth estimation has also been studied in the case of omnidirectional images in [18], which remains one of the rare works that carefully considers the specific geometry of the images in the depth estimation. We handle this geometry by graph-based processing on a spherical manifold and we introduce a novel variational framework in our algorithm, which is expected to provide high robustness to quantization errors, noise, or illumination gradients.

On the other hand, ego-motion estimation approaches usually proceed by first estimating the image displacement field, the so-called optical flow. The optical flow field can be related to the global motion parameters by a mapping that depends on the specific imaging surface of the camera. The mapping typically defines the space of solutions for the motion parameters, and specific techniques can eventually be used to obtain an estimate of the ego-motion [6, 13, 16, 24]. Most techniques reveal sensitivity to noisy estimation of the optical flow. The optical flow estimation is a highly ill-posed inverse problem that needs some sort of regularization in order to obtain displacement fields that are physically meaningful; a common approach is to impose a smoothness constraint on the field [5, 14]. In order to avoid the computation of the optical flow, one can use the so-called "direct approach," where image derivatives are directly related to the motion parameters. Without any assumption on the scene, the search space of the ego-motion parameters is limited by the depth positivity constraint. For example, the works in [15, 23] estimate the motion parameters that result in the smallest number of negative values in the depth map. Some algorithms originally proposed for planar cameras have later been adapted to cope with the geometrical distortion introduced by omnidirectional imaging systems. For example, an omnidirectional ego-motion algorithm has been presented by Gluckman in [11], where the optical flow field is estimated in the catadioptric image plane and then back-projected onto a spherical surface. Few works, though, have tried to take advantage of the wider field of view of omnidirectional devices: in spherical images the focus of expansion and the focus of contraction are both present, which implies that translational motion cannot be confused with rotational motion. In our work, we take advantage of the latter property and directly estimate the ego-motion with a very efficient scheme based on a least squares optimization problem, which further avoids the computation of the optical flow.

Ideas of alternating minimization steps have also been proposed in [1, 12]. In these works, however, the authors use planar sensors and assume an initial rough estimate of the depth map. In addition, they use a simple locally constant depth model. In our experiments we show that this model is an oversimplification of the real world, which does not apply to scenes with a complex structure. In the novel framework proposed in this paper, we use a spherical camera model and we derive a linear set of motion equations that explicitly include camera rotation. The complete ego-motion parameters can then be efficiently estimated jointly with depth.
3 Framework Description
In this section, we introduce the framework and the notation that will be used in the paper. We derive the equations that relate global motion parameters and depth map to the brightness derivatives on the sphere. Finally, we show how we embed our spherical framework on a weighted graph structure and define differential operators in this representation.

We choose to work on the 2-sphere S^2, which is a natural spatial domain to perform processing of omnidirectional images, as shown in [8] and references therein. For example, catadioptric camera systems with a single effective viewpoint permit a one-to-one mapping of the catadioptric plane onto a sphere via inverse stereographic projection [4]. The centre of that sphere is co-located with the focal point of the parabolic mirror, and each direction represents a light ray incident to that point. We assume then that a pre-processing step transforms the original omnidirectional images into spherical ones, as depicted in Fig. 1.

Fig. 1 Left: the original catadioptric image. Right: projection on the sphere
The starting point of our analysis is the brightness consistency equation, which assumes that pixel intensity values do not change during motion between successive frames. Let us denote I(t, y) an image sequence, where t is time and y = (y_1, y_2, y_3) describes a spatial position in 3-dimensional space. If we consider only two consecutive frames in the image sequence, we can drop the time variable t and use I_0 and I_1 to refer to the first and the second frame, respectively. The brightness consistency assumption then reads I_0(y) − I_1(y + u) = 0, where u is the displacement field between the frames. We can linearize the brightness consistency constraint around y + u_0 as:

I_1(y + u_0) + (∇I_1(y + u_0))^T (u − u_0) − I_0(y) = 0,   (1)

with an obvious abuse of notation for the equality. This equation relates the motion field u (also known as the optical flow field) to the (spatial and temporal) image derivatives. It is probably worth stressing that, for this simple linear model to hold, we assume that the displacement u − u_0 between the two scene views I_0 and I_1 is sufficiently small.

When data live on S^2, we can express the gradient operator ∇ from (1) in spherical coordinates as:

∇I(φ, θ) = (1/sin θ) ∂_φ I(φ, θ) φ̂ + ∂_θ I(φ, θ) θ̂,   (2)

where θ ∈ [0, π] is the colatitude angle, φ ∈ [0, 2π[ is the azimuthal angle, and φ̂, θ̂ are the unit vectors on the tangent plane corresponding to infinitesimal displacements in φ and θ, respectively (see Fig. 2). Note also that by construction the optical flow field u is defined on the tangent bundle TS = ⋃_{ω ∈ S^2} T_ω S^2, i.e., u : S^2 ⊂ R^3 → TS.

Fig. 2 The representation and coordinates on the 2-sphere S^2

Fig. 3 The sphere and the motion parameters

3.1 Global Motion and Optical Flow

Under the assumption that the motion is slow between frames, we have derived above a linear relationship between the apparent motion u on the spherical retina and the brightness derivatives. If the camera undergoes rigid translation t
and rotation around the axis Ω, then we can derive a geometrical constraint between u and the parameters of the 3D motion of the camera. Let us consider a point P in the scene, with respect to a coordinate system fixed at the center of the camera. We can express P as P = D(r) r, where r is the unit vector giving the direction to P and D(r) is the distance of the scene point from the center of the camera. During camera motion, as illustrated in Fig. 3, the scene point moves with respect to the camera by the quantity:

δP = −t − Ω × r.   (3)

We can now build the geometric relationship that relates the motion field u to the global motion parameters t and Ω. It reads:

u(r) = −t / D(r) − Ω × r = −Z(r) t − Ω × r,   (4)

where the function Z(r) is defined as the multiplicative inverse of the distance function D(r). In the following we will refer to Z as the depth map. In (4) we find all the unknowns of our SFM problem: the depth map Z(r) describing the structure of the scene and the 3D motion parameters t and Ω. Due to the multiplication between Z(r) and t, both quantities can only be estimated up to a scale factor, so in the following we will consider that t has unitary norm.

We can finally combine (1) and (4) in a single equation:

I_1(r + u_0) + (∇I_1(r + u_0))^T (−Z(r) t − Ω × r − u_0) − I_0(r) = 0.   (5)
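As a toy numerical illustration of the motion field in (4), the sketch below (NumPy; all names are ours, and the final projection onto the tangent plane is our addition, since the flow field is tangent to the sphere by construction) generates u(r) = −Z(r) t − Ω × r for random viewing directions:

```python
import numpy as np

rng = np.random.default_rng(0)

def motion_field(r, Z, t, Omega):
    """Rigid motion field u(r) = -Z(r) t - Omega x r, projected onto
    the tangent plane of the sphere at each direction r."""
    u = -Z[..., None] * t - np.cross(Omega, r)
    # keep only the tangential component, so that u(r) lies in T_r S^2
    u -= np.sum(u * r, axis=-1, keepdims=True) * r
    return u

# toy data: 100 random unit viewing directions and inverse depths
r = rng.normal(size=(100, 3))
r /= np.linalg.norm(r, axis=-1, keepdims=True)
Z = rng.uniform(0.1, 1.0, size=100)   # inverse depth Z(r) = 1/D(r)
t = np.array([1.0, 0.0, 0.0])         # unit-norm translation
Omega = np.array([0.0, 0.0, 0.1])     # small rotation

u = motion_field(r, Z, t, Omega)      # tangent field, shape (100, 3)
```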
Equation (5) relates image derivatives directly to 3D motion parameters. The equation is not linear in the unknowns and it defines an under-constrained system (i.e., more unknowns than equations). We will use this equation as a constraint in the optimization problem proposed in the next section.

3.2 Discrete Differential Operators on the 2-Sphere

We have developed our previous equations in the continuous spatial domain, but we have to remember that our images are digital. Although the 2-sphere is a simple manifold with constant curvature and a simple topology, special attention has to be paid to the definition of the differential operators that are used in the variational framework.

We assume that the omnidirectional images recorded by the sensor are interpolated onto a spherical equiangular grid:
{θ_m = mπ/M, φ_n = 2nπ/N}, with M · N the total number of samples. This operation can be performed, for example, by mapping the omnidirectional image on the sphere and then using bilinear interpolation to extract the values at the given positions (θ_m, φ_n).
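The equiangular sampling and the associated viewing directions can be written out directly; the following NumPy sketch (our own variable names) builds the grid and the unit vectors r(θ_m, φ_n):

```python
import numpy as np

M, N = 64, 128                        # colatitude / azimuth samples
theta = np.arange(M) * np.pi / M      # theta_m = m * pi / M
phi = np.arange(N) * 2.0 * np.pi / N  # phi_n = 2 * pi * n / N
T, P = np.meshgrid(theta, phi, indexing="ij")

# unit direction vectors r(theta, phi) for every grid sample
r = np.stack([np.sin(T) * np.cos(P),
              np.sin(T) * np.sin(P),
              np.cos(T)], axis=-1)    # shape (M, N, 3)
```

Bilinear interpolation of the mapped omnidirectional image at these (θ_m, φ_n) positions then yields the spherical image samples.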
In spherical coordinates, a simple discretization of the gradient obtained from finite differences reads:
∇_θ f(θ_{i,j}, φ_{i,j}) = [f(θ_{i+1,j}, φ_{i,j}) − f(θ_{i,j}, φ_{i,j})] / Δθ,
∇_φ f(θ_{i,j}, φ_{i,j}) = (1/sin θ_{i,j}) [f(θ_{i,j}, φ_{i,j+1}) − f(θ_{i,j}, φ_{i,j})] / Δφ.   (6)

The discrete divergence, by analogy with the continuous setting, is defined by div = −∇∗, where ∇∗ is the adjoint of ∇. It is then easy to verify that the divergence is given by:

div p(θ_{i,j}, φ_{i,j}) = [p_φ(θ_{i,j}, φ_{i,j}) − p_φ(θ_{i,j}, φ_{i,j−1})] / (sin θ_{i,j} Δφ)
  + [sin θ_{i,j} p_θ(θ_{i,j}, φ_{i,j}) − sin θ_{i−1,j} p_θ(θ_{i−1,j}, φ_{i,j})] / (sin θ_{i,j} Δθ).   (7)

Both (6) and (7) contain a (sin θ)^{−1} term that induces very high values around the poles (i.e., for θ ≃ 0 and θ ≃ π) and can cause numerical instability. We therefore propose to define discrete differential operators on weighted graphs (i.e., on a discrete manifold) as a general way to deal with geometry in a coordinate-free fashion.

We represent our discretized (spherical) imaging surface as a weighted graph, where the vertices represent image pixels and edges define connections between pixels (i.e., the topology of the surface), as represented in Fig. 4. A weighted undirected graph
G = (V, E, w) consists of a set of vertices V, a set of vertex pairs E ⊆ V × V, and a weight function w : E → R satisfying w(u,v) > 0 and w(u,v) = w(v,u), ∀(u,v) ∈ E. In the following, d(u) = Σ_{v∼u} w(u,v) denotes the degree of a vertex u, where v ∼ u runs over the vertices connected to u. Following Zhou et al. [27], we now define the gradient and divergence over G as:

(∇_w f)(u, v) = √(w(u,v)/d(u)) f(u) − √(w(u,v)/d(v)) f(v)   (8)

and

(div_w F)(u) = Σ_{v∼u} √(w(u,v)/d(u)) (F(v,u) − F(u,v)).   (9)
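The graph gradient and divergence of (8) and (9) are straightforward to transcribe, and the adjointness relation div = −∇∗ can be checked numerically. The sketch below (dense NumPy arrays and our own naming conventions, with the degree d(u) = Σ_{v∼u} w(u,v)) verifies ⟨∇_w f, F⟩ = −⟨f, div_w F⟩ on a small random graph:

```python
import numpy as np

rng = np.random.default_rng(1)

n = 6
# symmetric positive weights on a small complete graph, zero diagonal
W = rng.uniform(0.5, 2.0, size=(n, n))
W = (W + W.T) / 2.0
np.fill_diagonal(W, 0.0)
d = W.sum(axis=1)                     # vertex degrees d(u)

f = rng.normal(size=n)                # function on vertices
F = rng.normal(size=(n, n))           # function on ordered vertex pairs

def grad_w(f):
    """(grad_w f)(u, v) = sqrt(w/d(u)) f(u) - sqrt(w/d(v)) f(v)."""
    s = np.sqrt(W / d[:, None])       # s[u, v] = sqrt(w(u,v)/d(u))
    return s * f[:, None] - s.T * f[None, :]

def div_w(F):
    """(div_w F)(u) = sum_{v~u} sqrt(w/d(u)) (F(v,u) - F(u,v))."""
    s = np.sqrt(W / d[:, None])
    return (s * (F.T - F)).sum(axis=1)

# adjointness check: <grad_w f, F> = -<f, div_w F>
lhs = np.sum(grad_w(f) * F)
rhs = -np.dot(f, div_w(F))
```

In an actual implementation the weight matrix would of course be sparse, with edges restricted to the neighbourhood structure of the spherical grid.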