International Journal of Computer Vision 57(1), 23–47, 2004
© 2004 Kluwer Academic Publishers. Manufactured in The Netherlands.

A General Framework for the Selection of World Coordinate Systems in Perspective and Catadioptric Imaging Applications

JOÃO P. BARRETO AND HELDER ARAUJO
Institute of Systems and Robotics, Department of Electrical and Computer Engineering, University of Coimbra, Coimbra, Portugal

Received January 15, 2002; Revised November 19, 2002; Accepted April 17, 2003
Abstract.
An imaging system with a single effective viewpoint is called a central projection system. The conventional perspective camera is an example of a central projection system. A catadioptric realization of omnidirectional vision combines reflective surfaces with lenses. Catadioptric systems with a unique projection center are also examples of central projection systems. Whenever an image is acquired, points in 3D space are mapped into points in the 2D image plane. The image formation process represents a transformation from ℜ^3 to ℜ^2, and mathematical models can be used to describe it. This paper discusses the definition of world coordinate systems that simplify the modeling of general central projection imaging. We show that an adequate choice of the world coordinate reference system can be highly advantageous. Such a choice does not imply that new information will be available in the images. Instead the geometric transformations will be represented in a common and more compact framework, while simultaneously enabling new insights. The first part of the paper focuses on static imaging systems that include both perspective cameras and catadioptric systems. A systematic approach to select the world reference frame is presented. In particular we derive coordinate systems that satisfy two differential constraints (the “compactness” and the “decoupling” constraints). These coordinate systems have several advantages for the representation of the transformations between the 3D world and the image plane. The second part of the paper applies the derived mathematical framework to active tracking of moving targets. In applications of visual control of motion the relationship between motion in the scene and image motion must be established. In the case of active tracking of moving targets these relationships become more complex due to camera motion. Suitable world coordinate reference systems are defined for three distinct situations: a perspective camera with planar translation motion, a perspective camera with pan and tilt rotation motion, and a catadioptric imaging system rotating around an axis going through the effective viewpoint and the camera center. Position and velocity equations relating image motion, camera motion and target 3D motion are derived and discussed. Control laws to perform active tracking of moving targets using visual information are established.
Keywords:
sensor modeling, catadioptric, omnidirectional vision, visual servoing, active tracking
1. Introduction
Many applications in computer vision, such as surveillance and model acquisition for virtual reality, require that a large field of view is imaged. Visual control of motion can also benefit from enhanced fields of view. The computation of camera motion from a sequence of images obtained with a traditional camera suffers from the problem that the direction of translation may lie outside the field of view. Panoramic imaging overcomes this problem, making the uncertainty of camera motion estimation independent of the motion direction (Gluckman and Nayar, 1998). In position based visual servoing, keeping the target in the field of view during motion raises severe difficulties (Malis et al., 1999). With a large field of view this problem no longer exists.
One effective way to enhance the field of view of a camera is to use mirrors (Bogner, 1995; Nalwa, 1996; Yagi and Kawato, 1990; Yamazawa et al., 1993, 1995). The general approach of combining mirrors with conventional imaging systems is referred to as catadioptric image formation (Hecht and Zajac, 1974). The fixed viewpoint constraint is a requirement ensuring that the visual sensor only measures the intensity of light passing through a single point in 3D space (the projection center). Vision systems verifying the fixed viewpoint constraint are called central projection systems. Central projection systems present interesting geometric properties. A single effective viewpoint is a necessary condition for the generation of geometrically correct perspective images (Baker and Nayar, 1998), and for the existence of epipolar geometry inherent to the moving sensor and independent of the scene structure (Svoboda et al., 1998). It is highly desirable for any vision system to have a single viewpoint. The conventional perspective CCD camera is widely used in computer vision applications. In general it is described by a central projection model with a single effective viewpoint. Central projection cameras are specializations of the general projective camera that can be modeled by a 3 × 4 matrix with rank 3 (Hartley and Zisserman, 2000). Baker and Nayar (1998) derive the entire class of catadioptric systems with a single effective viewpoint. Systems built using a parabolic mirror with an orthographic camera, or a hyperbolic, elliptical or planar mirror with a perspective camera, verify the fixed viewpoint constraint. Geyer and Daniilidis (2000) introduce a unifying theory for all central catadioptric systems where conventional perspective imaging appears as a particular case. They show that central panoramic projection is isomorphic to a projective mapping from the sphere to a plane with a projection center on the perpendicular to the plane. A modified version of this unifying model is introduced in Barreto and Araujo (2001).

General central projection image formation can be represented by a transformation from ℜ^3 to ℜ^2. Whenever an image is acquired, points in 3D space are mapped into points in the 2D image plane. Cartesian coordinate systems are typically used to reference points both in space and in the image plane. The mapping is noninjective and implies loss of information. The relationships between position and velocity in the 3D space and position and velocity in the image are in general complex, difficult and nonlinear. This paper shows that the choice of the coordinate system used to reference points in the 3D space is important. The intrinsic nature of the image formation process is kept unchanged, but the mathematical relationship between the world and the image becomes simpler and more intuitive. This can help not only the understanding of the imaging process but also the development of new algorithms and applications. The first part of the paper focuses on static imaging systems that include both perspective cameras and catadioptric systems. A general framework to describe the mapping from 3D points to 2D points in the image plane is presented. The mathematical expression of this global mapping depends on the coordinate system used to reference points in the scene. A systematic approach to select the world coordinate system is presented and discussed. Differential constraints are defined to enable the choice of a 3D reference frame. Coordinate transformations satisfying these differential constraints bring advantageous properties when mapping 3D space velocities into 2D image velocities. One such coordinate transformation is described for the case of the perspective camera and then generalized for central catadioptric image formation. Using these coordinate transformations does not imply that new information is available in the images. Instead the geometric transformations are represented in a common and more compact framework, while simultaneously enabling new insights into the image formation process.
Examples and applications that benefit from an adequate choice of the world coordinate system are presented and discussed.

The second part of our article applies the derived mathematical framework to active tracking of moving targets. For this purpose it is assumed that the imaging sensor is mounted on a moving platform. Three different cases are considered: a perspective camera with translational motion in the XY plane, a perspective camera with rotational pan and tilt motion, and a parabolic omnidirectional camera with a rotational degree of freedom around the Z axis. The goal of the tracking application is to control the motion of the platform in such a way that the position of the target in the image plane is kept constant. In the classical eye-in-hand positioning problem the camera is typically attached to the end effector of a 6 d.o.f. manipulator. The platforms considered in this work have less than 3 d.o.f. For the purpose of controlling the constrained 3D motion of these robots it is not necessary to determine the full pose of the target. It is assumed that target motion is characterized by the 3D position and velocity of the corresponding mass center in an inertial reference frame. It is also assumed that the position of each degree of freedom is known (possibly via an encoder).

In active tracking applications the image motion depends both on target and camera 3D motion. The derived general framework to describe the mapping between 3D points and points in the 2D image plane is extended to central catadioptric imaging systems with rigid motion. The mathematical expression of the global mapping depends on the world coordinates used to reference points in the scene. General criteria to select suitable coordinate systems are discussed. Adequate choices are presented for each type of platform. The derived mathematical framework is used to establish the position and velocity relationships between target 3D motion, camera motion and image motion. The expressions obtained are used to implement image based active visual tracking. Simplifications of the obtained equations (to decouple the degrees of freedom of the pan and tilt vision system) are discussed.
2. Static Imaging Systems
This section focuses on static central projection vision systems. Examples of such systems are the perspective camera and catadioptric systems that verify the fixed viewpoint constraint (Baker and Nayar, 1998). The image acquisition process maps points from the 3D space into the 2D image plane. Image formation performs a transformation from ℜ^3 to ℜ^2 that can be denoted by F. A generic framework to illustrate the transformation F is proposed. This framework applies to both conventional perspective cameras and central projection catadioptric systems. It is desirable that F be as simple and as compact as possible. This can be achieved by selecting a specific coordinate system to reference the world points. General criteria to select the world coordinate system are presented and discussed. Advantages of using different world coordinate systems to change the format of the F mapping are presented.
2.1. Mapping Points from the 3D Space into the 2D Image Plane
Figure 1 depicts a generic framework to illustrate the transformation F from ℜ^3 into ℜ^2 performed by a central projection vision system. If the vision system has a unique viewpoint, it preserves central projection and geometrically correct perspective images can always be generated (Gluckman and Nayar, 1998).

Figure 1. Schematic of the mapping performed by general central projection imaging systems.
X_w = (X, Y, Z)^t is a vector with the Cartesian 3D coordinates of a point in space. The domain of the transformation is the set D of visible points in the world, with D ⊂ ℜ^3. Function f_h maps ℜ^3 into the projective space ℘^3. It is a noninjective and surjective function transforming X_w = (X, Y, Z)^t into X_h = (X, Y, Z, 1)^t, the homogeneous world point coordinates. P is an arbitrary 3 × 4 homogeneous matrix with rank 3. It represents a general projective transformation performing a linear mapping of ℘^3 into the projective plane ℘^2 (x = P X_h). The rank 3 requirement is due to the fact that if the rank were less than 3 the range of the matrix would be a line or a point and not the whole plane; the requirement thus guarantees that the transformation is surjective. In the case of P being a camera model it can be written as P = K R [I | −C̃], where I is the 3 × 3 identity matrix, K is the intrinsic parameters matrix, R the rotation matrix between the camera and world coordinate systems, and C̃ the projection center in world coordinates (Hartley and Zisserman, 2000). If nothing is stated we will assume K = I and standard central projection with P = [I | 0]. Function f_i transforms coordinates in the projective plane x = (x, y, z)^t into Cartesian coordinates in the image plane x_i = (x_i, y_i)^t. It is a noninjective, surjective function from ℘^2 into ℜ^2 that maps projective rays in the world into points in the image. For conventional perspective cameras x_i = f_i(x) ⇔ (x_i, y_i) = (x/z, y/z). However, as will be shown later, these relations are more complex for generic catadioptric systems. The transformation
F maps 3D world points into 2D points in the image. Points in the scene were so far represented using standard Cartesian coordinates. However, a different coordinate system can be used to reference points in the 3D world space. Assume that Ω = (φ, ψ, ρ)^t are the point coordinates in the new reference frame and that X_w = T(Ω), where T is a bijective function from ℜ^3 into ℜ^3. The transformation F, mapping 3D world points Ω into image points x_i (see Eq. (1)), can be written as the composition of Eq. (2).

x_i = F(Ω)    (1)

F(Ω) = f_i(P f_h(T(Ω)))    (2)

Equation (3), obtained by differentiating Eq. (1) with respect to time, establishes the relationship between the velocity in 3D space Ω̇ = (φ̇, ψ̇, ρ̇)^t and the velocity in the image ẋ_i = (ẋ_i, ẏ_i)^t. ẋ_i and Ω̇ are related by the jacobian matrix J_F of the transformation F. Equation (4) shows J_F as the product of the jacobians of the transformations that make up F.
ẋ_i = J_F Ω̇    (3)

J_F = J_{f_i} · J_P · J_{f_h} · J_T    (4)

Function T represents a change of coordinates. It must be bijective, which guarantees that it admits an inverse. Assume that Γ is the inverse function of T (Γ = T^{−1}). Function Γ, from ℜ^3 into ℜ^3, transforms Cartesian coordinates X_w into the new coordinates Ω (Eq. (5)). J_Γ is the jacobian matrix of Γ (Eq. (6)). Since T is bijective, the jacobian matrix J_T is nonsingular, with inverse J_Γ. Replacing J_T by J_Γ^{−1} in Eq. (4) yields Eq. (7), showing the jacobian matrix of F expressed in terms of the scalar functions of Γ and their partial derivatives.

Γ(X_w) = (φ(X, Y, Z), ψ(X, Y, Z), ρ(X, Y, Z))^t    (5)

J_Γ = [ φ_X  φ_Y  φ_Z
        ψ_X  ψ_Y  ψ_Z
        ρ_X  ρ_Y  ρ_Z ]    (6)

J_F = J_{f_i} · J_P · J_{f_h} · J_Γ^{−1}    (7)
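The composition of Eq. (2) and the jacobian of Eq. (3) can be evaluated numerically. The sketch below (assuming K = I and P = [I | 0], as stated in the text) uses a spherical-coordinate T purely as an illustrative choice of world coordinates, not the transformation derived later in the paper; J_F is approximated by central finite differences.

```python
import numpy as np

# Sketch of the composed mapping F = f_i ∘ P ∘ f_h ∘ T (Eq. (2)) for a
# perspective camera with K = I and P = [I | 0]. The transformation T
# used here (spherical coordinates) is only an illustrative choice.

P = np.hstack([np.eye(3), np.zeros((3, 1))])   # standard central projection

def T(omega):
    """Illustrative T: spherical (phi, psi, rho) -> Cartesian X_w."""
    phi, psi, rho = omega
    return rho * np.array([np.sin(phi) * np.cos(psi),
                           np.sin(phi) * np.sin(psi),
                           np.cos(phi)])

def f_h(Xw):
    """Homogeneous lift of R^3 into P^3."""
    return np.append(Xw, 1.0)

def f_i(x):
    """Projective plane -> image plane: (x, y, z) -> (x/z, y/z)."""
    return np.array([x[0] / x[2], x[1] / x[2]])

def F(omega):
    return f_i(P @ f_h(T(omega)))              # Eq. (2)

def jacobian_fd(func, p, eps=1e-6):
    """Central-difference approximation of the jacobian J_F of Eq. (3)."""
    cols = []
    for k in range(p.size):
        d = np.zeros_like(p)
        d[k] = eps
        cols.append((func(p + d) - func(p - d)) / (2 * eps))
    return np.column_stack(cols)

omega = np.array([0.4, 0.3, 2.0])
J_F = jacobian_fd(F, omega)                    # 2 x 3 jacobian
print(F(omega))
print(J_F)
```

With this particular T the third column of J_F comes out numerically zero: the image point of a perspective camera depends only on the viewing direction, not on the radius. This is exactly the behaviour formalized by the "compactness constraint" of Section 2.2.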
2.2. Criteria to Select the World Coordinate System
Function F is a transformation from ℜ^3 (3D world space) into ℜ^2 (image plane). In Eqs. (8) and (9), F and J_F are written in terms of scalar functions and their partial derivatives. The relationship between world and image points can be complex and counterintuitive. The mathematical expression of the mapping function F depends on the transformation T (see Eqs. (2), (4) and (7)). The selection of a certain coordinate system to reference points in the scene changes the way F is written but does not change the intrinsic nature of the mapping. However, with an adequate choice of the world coordinate system, the mathematical relationship between position and velocity in space and position and velocity in the image plane can become simpler, more intuitive, or simply more suitable for a specific application. In this section we discuss criteria for the selection of the world coordinate system.

F(Ω) = (h(φ, ψ, ρ), g(φ, ψ, ρ))^t    (8)

J_F = [ h_φ  h_ψ  h_ρ
        g_φ  g_ψ  g_ρ ]    (9)
2.2.1. The Compactness Constraint.
Consider central projection vision systems as mappings of 3D points, expressed in Cartesian coordinates X_w = (X, Y, Z)^t, into the 2D image coordinates x_i = (x_i, y_i). The transformation is a function from ℜ^3 into ℜ^2 with loss of information (depth). In general the two coordinates in the image plane depend on all three coordinates in space. The image gives partial information about each one of the three world coordinates, but we are not able to recover any of those parameters without further constraints. The imaging process implies loss of information, and there is no additional transformation T that can change that. However, it would be advantageous if the image coordinates depended on only two of the 3D parameters. In many situations this can be achieved by means of a change of coordinates T. The coordinate change must be performed in such a way that F depends on only two of those coordinates. Assuming that Ω = (φ, ψ, ρ) are the new 3D coordinates, F becomes a function of only φ and ψ, which means that the partial derivatives h_ρ and g_ρ are both equal to zero. Whenever a certain change of coordinates T leads to a jacobian matrix J_F with a zero column, it is said that the mapping F is in a compact form and that the coordinate transformation T verifies the “compactness constraint”.
Assume that a world coordinate system satisfying the “compactness constraint” is selected. If Eq. (10) is verified then the image coordinates (x_i, y_i) depend only on (φ, ψ) and F becomes a function from ℜ^2 into ℜ^2 (x_i = F(Ω_c) with Ω_c = (φ, ψ)^t). A function from ℜ^3 into ℜ^2 is never invertible, thus putting F in a compact form is a necessary condition for finding an inverse mapping F^{−1}. If F^{−1} exists then two of the three 3D parameters of motion can be recovered from the image (Ω_c = F^{−1}(x_i)) and the jacobian matrix J_F can be written in terms of the image coordinates x_i and y_i. By verifying the “compactness constraint” the position and velocity relationships between the 3D world and the image plane tend to be more compact and intuitive, and vision yields all the information about two of the 3D world coordinates and none about the third.

h_ρ = 0  ∧  g_ρ = 0    (10)
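As a concrete illustration, consider a perspective camera with P = [I | 0], so that x_i = (X/Z, Y/Z). Under the hypothetical choice Γ(X_w) = (arctan(X/Z), arctan(Y/Z), ‖X_w‖) (an illustrative guess, not the coordinate system derived in the paper), the image coordinates become (tan φ, tan ψ) and do not depend on ρ, so Eq. (10) holds. The sketch below verifies h_ρ = g_ρ = 0 numerically.

```python
import numpy as np

# Numerical check of the "compactness constraint" (Eq. (10)).
# Perspective camera with P = [I | 0]: x_i = (X/Z, Y/Z).
# Hypothetical coordinate change (illustrative, not from the paper):
#   phi = arctan(X/Z), psi = arctan(Y/Z), rho = |X_w|.

def T(omega):
    """Inverse of the illustrative Gamma: (phi, psi, rho) -> X_w."""
    phi, psi, rho = omega
    d = np.array([np.tan(phi), np.tan(psi), 1.0])  # viewing direction
    return rho * d / np.linalg.norm(d)

def F(omega):
    X, Y, Z = T(omega)
    return np.array([X / Z, Y / Z])                # perspective projection

def partial(func, p, k, eps=1e-6):
    """Central-difference partial derivative w.r.t. coordinate k."""
    d = np.zeros(3)
    d[k] = eps
    return (func(p + d) - func(p - d)) / (2 * eps)

omega = np.array([0.5, -0.2, 3.0])
h_rho, g_rho = partial(F, omega, 2)   # third column of J_F (w.r.t. rho)
print(h_rho, g_rho)                   # both vanish: F is in compact form
```

Here x_i = tan φ and y_i = tan ψ, so the ρ column of J_F is identically zero and F collapses to a function of (φ, ψ) alone.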
2.2.2. The Decoupling Constraint.
Assume that the “compactness constraint” is verified. This means that a coordinate transformation T is used such that the image coordinates (x_i, y_i) depend only on (φ, ψ). It would also be advantageous to define a world coordinate system such that x_i depends only on φ and y_i depends only on ψ. This is equivalent to saying that h_ψ and g_φ are both zero. The one-to-one correspondence is an advantageous feature, allowing a better understanding of the imaging process and simplifying subsequent calculations. If a coordinate transformation T is used such that both Eqs. (10) and (11) are verified, then it is said that F is in a compact and decoupled form and that T verifies both the “compactness constraint” and the “decoupling constraint”.

h_ψ = 0  ∧  g_φ = 0    (11)

In short, given a general central projection mapping, the goal is to select a coordinate transformation T verifying both:

• the “compactness constraint” (Eq. (10))
• the “decoupling constraint” (Eq. (11))

The coordinate system used to reference points in the scene does not change the intrinsic nature of the mapping nor introduce any additional information. There are situations where it is impossible to find a world coordinate transformation that verifies the “compactness constraint” and/or the “decoupling constraint.” Methodologies to determine whether such a transformation exists will be introduced later.
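Both constraints can be checked together by inspecting the full 2 × 3 jacobian J_F in the new coordinates. The sketch below uses a hypothetical world coordinate change for the perspective camera with P = [I | 0], namely φ = arctan(X/Z), ψ = arctan(Y/Z), ρ = ‖X_w‖ (an illustrative guess, not the paper's derived system); J_F should then exhibit the pattern [[h_φ, 0, 0], [0, g_ψ, 0]].

```python
import numpy as np

# Joint check of the "compactness" (Eq. (10)) and "decoupling" (Eq. (11))
# constraints for a perspective camera with P = [I | 0], using the
# hypothetical world coordinates (illustrative, not from the paper):
#   phi = arctan(X/Z), psi = arctan(Y/Z), rho = |X_w|.

def T(omega):
    phi, psi, rho = omega
    d = np.array([np.tan(phi), np.tan(psi), 1.0])
    return rho * d / np.linalg.norm(d)

def F(omega):
    X, Y, Z = T(omega)
    return np.array([X / Z, Y / Z])

def jacobian_fd(func, p, eps=1e-6):
    """Central-difference jacobian of func at p."""
    cols = []
    for k in range(p.size):
        d = np.zeros_like(p)
        d[k] = eps
        cols.append((func(p + d) - func(p - d)) / (2 * eps))
    return np.column_stack(cols)

J_F = jacobian_fd(F, np.array([0.3, 0.7, 2.5]))
# Expected sparsity pattern: J_F = [[h_phi, 0, 0], [0, g_psi, 0]],
# i.e. h_rho = g_rho = 0 (compact) and h_psi = g_phi = 0 (decoupled).
print(np.round(J_F, 6))
```

With this choice x_i = tan φ and y_i = tan ψ, so each image coordinate depends on exactly one world coordinate, which is what the decoupling constraint asks for.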
2.3. Conventional Perspective Camera
Consider image acquisition performed by a static conventional perspective camera. The image formation process follows the scheme depicted in Fig. 1, where function f_i is given by Eq. (12). Assume that the matrix of intrinsic parameters is K = I and P = [I | 0] (the origin of the Cartesian reference frame is coincident with the camera center and the image plane is perpendicular to the Z axis). This section derives a world coordinate system that verifies both the compactness and the decoupling constraint. If nothing is stated we will work with the inverse transformation Γ instead of the direct transformation T.

f_i(): (x, y, z) −→ (x/z, y/z)    (12)
2.3.1. Constraining Γ to Obtain a New World Coordinate System.
Functions f_i, P and f_h, as well as their jacobian matrices, are defined for the perspective camera case. Replacing J_Γ (Eq. (6)) in Eq. (7) yields J_F in terms of the partial derivatives of the scalar functions of Γ (the computation is omitted). If F is in a compact form then the third column of J_F must be zero (Eq. (10)), which leads to Eqs. (13). A transformation of coordinates Γ that verifies the compactness constraint can be computed by solving the partial differential Eqs. (13) with respect to the scalar functions φ, ψ and ρ (Eq. (5)).

Z(φ_Y ψ_Z − φ_Z ψ_Y) + X(φ_Y ψ_X − φ_X ψ_Y) = 0
Z(φ_Z ψ_X − φ_X ψ_Z) + Y(φ_Y ψ_X − φ_X ψ_Y) = 0    (13)

The partial differential equations corresponding to the “decoupling constraint” can be derived in a similar way. If the mapping F is decoupled then both h_ψ and g_φ must be zero, which leads to Eqs. (14). A world coordinate transformation Γ verifying both the compactness and the decoupling constraints can be computed by solving Eqs. (13) and (14) simultaneously. Nevertheless, the integration of systems of partial differential equations can be difficult and in general generates many solutions. Adequate coordinate systems will instead be derived by geometrical means. Equations (13) and (14) will be used to prove that the selected coordinate transformation verifies the compactness and/or decoupling constraints.
Z(φ_Y ρ_Z − φ_Z ρ_Y) + X(φ_Y ρ_X − φ_X ρ_Y) = 0
Z(ψ_Z ρ_X − ψ_X ρ_Z) + Y(ψ_Y ρ_X − ψ_X ρ_Y) = 0    (14)
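These constraint equations can be evaluated directly for a candidate Γ whose gradients are known in closed form. The sketch below checks, at random points in front of the camera, that the hypothetical choice φ = arctan(X/Z), ψ = arctan(Y/Z), ρ = √(X² + Y² + Z²) (an illustrative example, not the transformation derived in the paper) makes the residuals of Eqs. (13) and (14), as written above, vanish.

```python
import numpy as np

# Residuals of the constraint Eqs. (13) and (14) for an illustrative
# transformation Gamma (a hypothetical example, not from the paper):
#   phi = arctan(X/Z), psi = arctan(Y/Z), rho = sqrt(X^2 + Y^2 + Z^2).
# Its gradients are available in closed form, so the partial
# differential equations can be evaluated directly.

def gradients(X, Y, Z):
    rho = np.sqrt(X**2 + Y**2 + Z**2)
    grad_phi = np.array([Z, 0.0, -X]) / (X**2 + Z**2)
    grad_psi = np.array([0.0, Z, -Y]) / (Y**2 + Z**2)
    grad_rho = np.array([X, Y, Z]) / rho
    return grad_phi, grad_psi, grad_rho

def residuals(X, Y, Z):
    (pX, pY, pZ), (qX, qY, qZ), (rX, rY, rZ) = gradients(X, Y, Z)
    eq13a = Z * (pY * qZ - pZ * qY) + X * (pY * qX - pX * qY)
    eq13b = Z * (pZ * qX - pX * qZ) + Y * (pY * qX - pX * qY)
    eq14a = Z * (pY * rZ - pZ * rY) + X * (pY * rX - pX * rY)
    eq14b = Z * (qZ * rX - qX * rZ) + Y * (qY * rX - qX * rY)
    return np.array([eq13a, eq13b, eq14a, eq14b])

rng = np.random.default_rng(0)
for _ in range(5):
    X, Y = rng.uniform(-2.0, 2.0, 2)
    Z = rng.uniform(1.0, 4.0)        # keep the point in front of the camera
    print(np.max(np.abs(residuals(X, Y, Z))))   # all residuals vanish
```

Since all four residuals are zero, this particular Γ satisfies both the compactness and the decoupling constraints, consistent with the fact that it yields x_i = tan φ and y_i = tan ψ for the standard perspective projection.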