Absolute Scale in Structure from Motion from a Single Vehicle-Mounted Camera by Exploiting Nonholonomic Constraints

Davide Scaramuzza¹, Friedrich Fraundorfer², Roland Siegwart¹, and Marc Pollefeys²
¹ ETH Zurich, Autonomous Systems Lab
² ETH Zurich, Computer Vision and Geometry Lab
Abstract
In structure from motion with a single camera it is well known that the scene can be recovered only up to a scale. In order to compute the absolute scale, one needs to know the baseline of the camera motion or the dimension of at least one element in the scene. In this paper, we show that there exists a class of structure-from-motion problems where it is possible to compute the absolute scale completely automatically without using this knowledge, that is, when the camera is mounted on wheeled vehicles (e.g. cars, bikes, or mobile robots). The construction of these vehicles puts interesting constraints on the camera motion, which are known as "nonholonomic constraints". The interesting case is when the camera has an offset to the vehicle's center of motion. We show that by just knowing this offset, the absolute scale can be computed with good accuracy when the vehicle moves. We give a mathematical derivation and provide experimental results on both simulated and real data. To our knowledge, this is the first time nonholonomic constraints of wheeled vehicles are used to estimate the absolute scale. We believe that the proposed method can be useful in those research areas involving visual odometry and mapping with vehicle-mounted cameras.
1. Introduction
Visual odometry (also called structure from motion) is the problem of recovering the motion of a camera from the visual input alone. This can be done by using single cameras (perspective or omnidirectional) [2, 10], stereo cameras [6], or multi-camera systems [1]. The advantage of using more than one camera is that both the motion and the 3D structure can be computed directly in the absolute scale when the distance between the cameras is known. Furthermore, the cameras do not necessarily need to have an overlapping field of view, as shown in [1]. Conversely, when using a single camera the absolute scale must be computed in other ways, such as by measuring the motion baseline or the size of an element in the scene [2], or by using other sensors like IMU and GPS [7].

Figure 1. If the camera is located on the vehicle's non-steering axle, the rotation and translation of both the camera and the car are exactly the same (a). If the camera is mounted with an offset to the axle, rotation and translation of camera and car are different (b). While case (a) can be used to simplify motion estimation, case (b) can be used to compute absolute scale from a single camera.

In the case of a single camera mounted on a vehicle, the camera follows the movement of the vehicle. Most wheeled vehicles (e.g. cars, bikes, mobile robots) possess an instantaneous center of rotation, that is, there exists a point around which each wheel of the vehicle follows a circular course [11]. For instance, for car-like vehicles the existence of this point is ensured by the Ackermann steering principle (Fig. 2). This property assures that the vehicle undergoes rolling motion, i.e. without slippage. Accordingly, the motion of the vehicle can be locally described by circular motion. As we will show in the paper, this property puts interesting constraints on the camera motion. Depending on the position of the camera on such a vehicle, the camera can undergo exactly the same motion or deviate from it. The interesting case is when the camera has an offset to the vehicle's center of motion (see Fig. 1). By just knowing this offset and the camera relative motion from the point correspondences, the absolute scale can be computed. The key concept of our new method is that, because of the Ackermann steering model, the different motions of camera and vehicle can be computed from the same camera measurements. Then, the difference between them can be used to compute the absolute scale. In this paper, we describe the method to compute the absolute scale for a vehicle moving in a plane. We give a minimal solution as well as a least-squares solution for the absolute scale. In addition, we also present an efficient algorithm that can cope with outliers.

The recent street-level mapping efforts of various companies and research centers make the proposed approach very interesting. In these cases the cars are usually equipped with a single omnidirectional camera, and with our novel method it would be possible to compute the absolute scale of the recovered map.

This paper is organized as follows. Section 2 reviews the related work. Section 3 explains the motion model of wheeled vehicles. Section 4 provides the equations for computing the absolute scale. Section 5 explains how to detect the circular motion. Finally, Sections 6 and 7 present the experimental results and conclusions.
2. Related work
The standard way to get the absolute scale in motion estimation is the use of a stereo setup with known baseline. A very well working approach in this fashion has been demonstrated by Nister et al. [6]. The fields of view of the two cameras were overlapping and motion estimation was done by triangulating feature points, tracking them, and estimating new poses from them. Other approaches using stereo setups are described in [3, 4] and can be traced back to as early as [5]. A recent approach from Clipp et al. [1] relaxed the need of overlapping stereo cameras. They proposed a method for motion estimation including absolute scale from two non-overlapping cameras. From independently tracked features in both cameras and with known baseline, the full 6-DOF¹ motion could be estimated. In their approach the motion up to scale was computed from feature tracks in one camera. The remaining absolute scale could then be computed from one additional feature track in the other camera.

For the case of single cameras, some prior knowledge about the scene has been used to recover the absolute scale. Davison et al. [2] used a pattern of known size for both initializing the feature locations and computing the absolute scale in 6-DOF visual odometry. Scaramuzza et al. [10] used the distance of the camera to the plane of motion and feature tracks from the ground plane to compute the absolute scale in a visual odometry system for ground vehicle applications. In this paper, we propose a completely novel approach to compute the absolute scale from a single camera mounted on a vehicle. Our method exploits the constraint imposed by nonholonomic wheeled vehicles, that is, that their motion can be locally described by circular motion.

¹ DOF = Degrees Of Freedom
3. Motion model of nonholonomic vehicles
A vehicle is said to be nonholonomic if its controllable degrees of freedom are less than its total degrees of freedom [11]. An automobile is an example of a nonholonomic vehicle. The vehicle has three degrees of freedom, namely its position and orientation in the plane. Yet it has only two controllable degrees of freedom, which are the acceleration and the angle of the steering. A car's heading (the direction in which it is traveling) must remain aligned with the orientation of the car, or 180° from it if the car is going backward. It has no other allowable direction. The nonholonomicity of a car makes parking and turning in the road difficult. Other examples of nonholonomic wheeled vehicles are bikes and most mobile robots.

The nonholonomicity reveals an interesting property of the vehicle's motion, that is, the existence of an Instantaneous Center of Rotation (ICR). Indeed, for the vehicle to exhibit rolling motion without slipping, a point must exist around which each wheel of the vehicle follows a circular course. The ICR can be computed by intersecting all the roll axes of the wheels (see Fig. 2). For cars the existence of the ICR is ensured by the Ackermann steering principle [11]. This principle ensures a smooth movement of the vehicle by applying different steering angles to the left and right front wheels while turning. This is needed as all four wheels move in a circle on four different radii around the ICR (Fig. 2). As the reader can perceive, every point of the vehicle and any camera installed on it undergoes locally planar circular motion. Straight motion can be represented along a circle of infinite radius of curvature.

Let us now derive the mathematical constraint on the vehicle motion. Planar motion is described by three parameters, namely the rotation angle θ, the direction of translation ϕ_v, and the length ρ of the translation vector (Fig. 3(a)). However, for the particular case of circular motion and when the vehicle's origin is chosen along the non-steering axle as in Fig. 3(a), we have the interesting property that ϕ_v = θ/2. This property can be trivially verified by trigonometry. Accordingly, if the camera reference frame coincides with the car reference frame, the camera must verify the same constraint ϕ_c = θ/2. However, this constraint is no longer valid if the camera has an offset L with respect to the vehicle's origin, as shown in Fig. 3(b). In this case, as we will show in the next section, a more complex constraint exists which relates ϕ_c to θ through the offset L and the vehicle's displacement ρ. Since L is constant and can be measured very accurately, we will show that it is then possible to estimate ρ (in the absolute scale) by just knowing ϕ_c and θ from point correspondences.
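The property ϕ_v = θ/2 can be checked numerically: a point on the non-steering axle that rotates by θ about the ICR translates in a direction that bisects the rotation angle, for any turning radius. A minimal sketch (the planar x-z parameterization and the function name are our own):

```python
import math

def vehicle_translation_direction(theta, radius):
    """Direction phi_v of the translation vector (measured from the
    heading axis) after a point on the non-steering axle rotates by
    theta about an ICR at lateral distance `radius`."""
    dx = radius * (1.0 - math.cos(theta))  # lateral displacement
    dz = radius * math.sin(theta)          # forward displacement
    return math.atan2(dx, dz)
```

Since (1 − cos θ)/sin θ = tan(θ/2), the returned angle equals θ/2 regardless of the radius.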
Figure 2. General Ackermann steering principle.
Figure 3. Camera and vehicle motion under the circular motion constraint. When the camera and vehicle reference systems coincide, ϕ_c = ϕ_v = θ/2 (a). When the camera has an offset L with respect to the vehicle's origin, we still have ϕ_v = θ/2 but ϕ_c ≠ θ/2 (b).
4. Absolute scale computation
4.1. Camera undergoing planar circular motion
Figure 3 shows the camera and vehicle coordinate systems. Both coordinate systems are aligned so that there is no additional rotation between them. The camera is denoted by P_1 and is located at C_1 = [0, 0, L]^T in the vehicle coordinate system. The camera matrix P_1 is therefore

P_1 = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & -L \end{bmatrix}    (1)

The camera P_1 and the vehicle now undergo the following circular motion, denoted by the rotation R and the translation T (see also Fig. 3):

R = \begin{bmatrix} \cos\theta & 0 & -\sin\theta \\ 0 & 1 & 0 \\ \sin\theta & 0 & \cos\theta \end{bmatrix}    (2)

T = \rho \begin{bmatrix} \sin(\theta/2) \\ 0 \\ \cos(\theta/2) \end{bmatrix}    (3)

The transformed camera P_2 is then

P_2 = [R_2 \mid t_2] = P_1 \begin{bmatrix} R & -RT \\ 0 & 1 \end{bmatrix}    (4)

To compute the motion between the two cameras P_2 and P_1, the camera P_2 can be expressed in the coordinate system of P_1. Let us denote it by P'_2:

P'_2 = [R'_2 \mid t'_2] = P_2 \begin{bmatrix} P_1 \\ 0\;0\;0\;1 \end{bmatrix}^{-1}    (5)

The rotation part R'_2 equals R_2 (which equals R) and the translation part t'_2 is

t'_2 = \begin{bmatrix} \rho\sin(\theta/2) - L\sin\theta \\ 0 \\ L\cos\theta - \rho\cos(\theta/2) - L \end{bmatrix}    (6)

Then, the essential matrix E for our setup, describing the relative motion from camera P_1 to P'_2, is defined as E = [t'_2]_× R'_2 and can be written as:

E = \begin{bmatrix} 0 & L + \rho\cos(\theta/2) - L\cos\theta & 0 \\ L - \rho\cos(\theta/2) - L\cos\theta & 0 & \rho\sin(\theta/2) + L\sin\theta \\ 0 & \rho\sin(\theta/2) - L\sin\theta & 0 \end{bmatrix}    (7)

Finally, the essential matrix can also be expressed in terms of the absolute distance λ between the two camera centers and the camera relative motion (θ, ϕ_c). Thus, using the previous expressions for R'_2 and t'_2 (but now in terms of λ, θ, and ϕ_c) we obtain:

E = \lambda \begin{bmatrix} 0 & \cos(\theta - \varphi_c) & 0 \\ -\cos(\varphi_c) & 0 & \sin(\varphi_c) \\ 0 & \sin(\theta - \varphi_c) & 0 \end{bmatrix}    (8)

These two expressions for E will be used in the next sections.
4.2. Computing ρ and λ from the rotation and translation angles

To recap, the parameter ρ is the absolute distance between the two vehicle positions (Fig. 3(a)), while λ is the absolute distance between the two camera centers, which is λ = ||t'_2|| (Fig. 3(b)).

It is convenient to be able to express ρ and λ in terms of the rotation angle θ and the directional angle ϕ_c of the camera translation vector, because these parameters can be estimated from feature correspondences. For this we equate the camera center C'_2 = -R^T t'_2 with a parameterization in ϕ_c. From this we get the equations for ρ and λ in terms of θ and ϕ_c:

\rho = \frac{L\sin(\varphi_c) + L\sin(\theta - \varphi_c)}{-\sin(\theta/2 - \varphi_c)}    (9)

\lambda = \frac{-2L\sin(\theta/2)}{\sin(\theta/2 - \varphi_c)}    (10)

Note, expressions (9) and (10) are exactly the core of this paper, that is, we can actually compute the absolute distance between the vehicle/camera centers as a function of the camera offset L and the camera relative motion (θ, ϕ_c). In the next subsections we will give a minimal and a least-squares solution to compute (θ, ϕ_c) directly from a set of point correspondences. Finally, note that (9) and (10) are valid only if θ ≠ 0. Thus, we can only estimate the absolute scale if the vehicle rotates. The accuracy of the absolute scale estimates will be evaluated in Section 6.1.

Observe that we can also write an equation for L. This allows us to compute the offset of the camera from the rear axle of the vehicle from ground truth data (GPS, wheel odometry, etc.), i.e. to calibrate the camera to the vehicle coordinate frame. By solving (9) with respect to L we have:

L = \rho\,\frac{-\sin(\theta/2 - \varphi_c)}{\sin(\varphi_c) + \sin(\theta - \varphi_c)}    (11)
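Equations (9) and (10) translate directly into code. The sketch below implements them and checks them against the camera displacement implied by the Section 4.1 conventions (names are ours; θ ≠ 0 is assumed):

```python
import numpy as np

def absolute_scale(theta, phi_c, L):
    """Eqs. (9) and (10): vehicle displacement rho and camera-center
    distance lambda from the relative motion (theta, phi_c) and the
    known offset L.  Only valid when the vehicle turns (theta != 0)."""
    denom = np.sin(theta / 2 - phi_c)
    rho = (L * np.sin(phi_c) + L * np.sin(theta - phi_c)) / -denom
    lam = -2 * L * np.sin(theta / 2) / denom
    return rho, lam

# Forward check: simulate the camera displacement for a known motion,
# read off (theta, phi_c), and invert back to the absolute scale.
theta, rho_true, L = 0.3, 1.5, 0.9
dc = np.array([rho_true * np.sin(theta / 2) + L * np.sin(theta),
               0.0,
               rho_true * np.cos(theta / 2) + L * np.cos(theta) - L])
phi_c = np.arctan2(dc[0], dc[2])
rho, lam = absolute_scale(theta, phi_c, L)  # recovers rho_true and ||dc||
```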
4.3. Least-squares solution: the 3-point algorithm

In this section, we provide a least-squares solution to compute θ and ϕ_c from a set of good feature correspondences. Two corresponding points p = (x, y, z)^T and p' = (x', y', z')^T must fulfill the epipolar constraint

p'^T E p = 0    (12)

Using the expression (8) of the essential matrix, the epipolar constraint expands to:

-xy'\cos(\varphi_c) + yx'\cos(\theta - \varphi_c) + zy'\sin(\varphi_c) + yz'\sin(\theta - \varphi_c) = 0.    (13)

Given m image points, θ and ϕ_c can be computed indirectly using the singular value decomposition of the coefficient matrix [xy', yx', zy', yz'], with [h_1, h_2, h_3, h_4] the unknown vector, which is defined by:

h_1 = -\cos(\varphi_c), \quad h_2 = \cos(\theta - \varphi_c)
h_3 = \sin(\varphi_c), \quad h_4 = \sin(\theta - \varphi_c).    (14)

Note, as the solution is valid up to a scale, we actually need at least 3 point correspondences to find a solution. Finally, the angles θ and ϕ_c can be derived by means of a four-quadrant inverse tangent. However, as the elements of the unknown vector are not independent from each other, nonlinear optimization could be applied to recover more accurate estimates. The next section covers how to deal with this.
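The linear 3-point algorithm can be sketched as follows: stack the coefficients of Eq. (13), take the right null vector via SVD, and read off the angles with a four-quadrant arctangent. Our sketch resolves the sign ambiguity of the null vector by assuming forward motion (|ϕ_c| < π/2); that assumption and the function name are ours:

```python
import numpy as np

def three_point_linear(p, q):
    """Least-squares estimate of (theta, phi_c) from m >= 3 bearing
    correspondences p (first frame) and q (second frame), given as
    (m, 3) arrays.  Solves the stacked Eq. (13) by SVD."""
    x, y, z = p[:, 0], p[:, 1], p[:, 2]
    xp, yp, zp = q[:, 0], q[:, 1], q[:, 2]
    # Coefficient matrix of Eq. (13); the unknown vector is
    # h = (-cos phi_c, cos(theta - phi_c), sin phi_c, sin(theta - phi_c))
    A = np.stack([x * yp, y * xp, z * yp, y * zp], axis=1)
    h = np.linalg.svd(A)[2][-1]   # right singular vector of smallest sigma
    if h[0] > 0:                  # h is defined up to sign; enforce
        h = -h                    # cos(phi_c) > 0 (forward motion)
    phi_c = np.arctan2(h[2], -h[0])
    theta = np.arctan2(h[3], h[1]) + phi_c
    return theta, phi_c
```

With noise-free correspondences the null vector is exact; with noisy data the same call gives the least-squares estimate over all m points.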
4.4. Minimal solution: nonlinear 2-point algorithm

As shown by Eq. (13), the epipolar constraint can be reduced to a nonlinear equation f(θ, ϕ_c) = 0 which can be solved by Newton's iterative method. This method is based on a first-order Taylor expansion of f, that is,

f(\theta, \varphi_c) \approx f(\theta_0, \varphi_{c0}) + J_f(\theta_0, \varphi_{c0}) \begin{bmatrix} \theta - \theta_0 \\ \varphi_c - \varphi_{c0} \end{bmatrix}    (15)

where f(θ_0, ϕ_{c0}) can be computed from (13) and the Jacobian J_f(θ_0, ϕ_{c0}) = [∂f/∂θ, ∂f/∂ϕ_c] can be written as:

J_f(\theta_0, \varphi_{c0}) = \big[\, -yx'\sin(\theta_0 - \varphi_{c0}) + yz'\cos(\theta_0 - \varphi_{c0}), \;\; xy'\sin(\varphi_{c0}) + yx'\sin(\theta_0 - \varphi_{c0}) - yz'\cos(\theta_0 - \varphi_{c0}) + zy'\cos(\varphi_{c0}) \,\big]    (16)

Stacking the constraint of each of the two point correspondences makes f a 2-vector and J_f a 2 × 2 matrix. Newton's method is an iterative method which starts from an initial seed and converges to the solution through successive approximations computed as:

\begin{bmatrix} \theta_{i+1} \\ \varphi_{c,i+1} \end{bmatrix} = \begin{bmatrix} \theta_i \\ \varphi_{c,i} \end{bmatrix} - J_f(\theta_i, \varphi_{c,i})^{-1} f(\theta_i, \varphi_{c,i})    (17)

In all the experimental results we had convergence by taking the point (θ_0, ϕ_{c0}) = (0, 0) as initial seed. The algorithm converged very quickly (3-4 iterations). Since only two unknowns are determined, two is the minimum number of matches required by this algorithm to compute the solution.

A comparison of the performance between the linear 3-point and the nonlinear 2-point algorithm is given in the experimental Section 6.1.
5. Circular motion detection
The equations for absolute scale estimation give correct results only if the motion is circular. Thus we have to identify sections of circular motion in a camera path prior to computing the absolute scale. For circular motion θ/2 − ϕ_v = 0; the idea is therefore to look for motion that satisfies this condition. The first step is to compute ϕ_v from the camera motion (θ, ϕ_c). ϕ_v is a function of L, ρ, and ϕ_c, but ρ is unknown. We therefore propose to search for the ϕ_v(L, ρ, ϕ_c) that minimizes the criterion for circular motion, θ/2 − ϕ_v, by varying ρ. This is a 1D optimization over ρ that can be solved with Newton's method. The optimization converges very quickly and returns the ϕ_v that minimizes our circular motion condition. If the motion is circular, |θ/2 − ϕ_v| gets very small; for non-circular motion the condition is not exactly satisfied. To distinguish between circular and non-circular motion we introduce the threshold thresh_circ. A motion is classified as circular if |θ/2 − ϕ_v| < thresh_circ and non-circular otherwise.
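A sketch of the detection step. Given an up-to-scale motion (θ, ϕ_c), we sweep the unknown scale, convert the camera translation into the implied vehicle translation by undoing the lever arm of the offset L (geometry reconstructed from the Section 4.1 conventions), and classify the motion as circular if some scale brings |θ/2 − ϕ_v| below the threshold. A grid search stands in for the paper's 1D Newton iteration; names, ranges, and the threshold value are ours:

```python
import numpy as np

def implied_vehicle_direction(lam, theta, phi_c, L):
    """phi_v implied by a camera translation of length lam in direction
    (sin phi_c, 0, cos phi_c), after removing the offset lever arm."""
    cam_t = lam * np.array([np.sin(phi_c), 0.0, np.cos(phi_c)])
    R = np.array([[np.cos(theta), 0.0, -np.sin(theta)],
                  [0.0, 1.0, 0.0],
                  [np.sin(theta), 0.0, np.cos(theta)]])
    offset = np.array([0.0, 0.0, L])
    T = cam_t - R.T @ offset + offset   # implied vehicle translation
    return np.arctan2(T[0], T[2])

def is_circular(theta, phi_c, L, thresh_circ=0.01):
    """Circular-motion test: does any scale satisfy phi_v = theta/2?"""
    lams = np.linspace(0.05, 10.0, 4000)
    residuals = [abs(theta / 2 - implied_vehicle_direction(l, theta, phi_c, L))
                 for l in lams]
    return min(residuals) < thresh_circ
```

In practice the threshold trades off how many frame pairs are accepted against how well they satisfy the circular motion model.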
Figure 4. The relative error (%) of the absolute scale estimate as a function of the rotation angle θ. Comparison between the linear 3-point method (circles) and the nonlinear 2-point method (squares). Mean and standard deviation over the trials are shown.
6. Experiments
6.1. Synthetic data
We investigated the performance of the algorithms in geometrically realistic conditions. In particular, we simulated a vehicle moving in urban canyons where the distance between the camera and the facades is about 10 meters. We set the first camera at the origin and randomized scene points uniformly inside several different planes, which stand for the facades of urban buildings. We used overall 1600 scene points. The second camera was positioned according to the motion direction of the vehicle, which moves along circular trajectories about the instantaneous center of rotation. Therefore, the position of the second camera was simulated according to the previous equations by taking into account the rotation angle θ, the vehicle displacement ρ, and the offset L of the camera from the vehicle's origin. To make our analysis more general, we considered an omnidirectional camera (with the same model used in the real experiments); therefore the scene points are projected from all directions. Finally, we also simulated feature location errors by introducing σ = 0.3 pixel Gaussian noise in the data. The image resolution was set to 640 × 480 pixels.

In this experiment, we want to evaluate the accuracy of the estimated absolute scale as a function of the rotation angle θ. As observed in equation (9), the estimate of the absolute scale ρ from the camera relative motion is possible only for θ ≠ 0. Therefore, we can intuitively expect that the absolute scale accuracy increases with θ. In this experiment, we performed many trials (one hundred) for different values of θ (varying from 0 up to 30 deg). The results shown in Fig. 4 are the average. As observed, the accuracy improves with θ, with an error smaller than 5% for θ larger than 10 deg. The performance of the linear and nonlinear algorithms is similar when θ > 10 deg, while the nonlinear method performs better for smaller θ.

• Algorithm:
  – Compute camera motion estimate up to scale
  – Identify sections for which the circular motion constraint is satisfied
    ∗ Transform camera translation vector into vehicle translation vector
    ∗ Check circular motion criterion (|θ/2 − ϕ_v| < thresh_circ)
  – Compute absolute scale (λ, ρ) from θ, ϕ_c, L for the detected sections

Figure 5. Outline of the absolute scale algorithm
6.2. Real data
In this section we demonstrate the absolute scale computation on an image sequence acquired by a car equipped with an omnidirectional camera driving through a city on a 3 km tour. A picture of our vehicle (a Smart) is shown in Fig. 1. The omnidirectional camera is composed of a hyperbolic mirror (KAIDAN 360 One VR) and a digital color camera (SONY XCD-SX910, image size 1280 × 960 pixels). The camera was installed as shown in Fig. 1(b). The offset of the camera from the rear axle is L = 0.9 m. The camera system was calibrated using the toolbox from Scaramuzza [9, 8]. Images were taken at an average framerate of 10 Hz at a vehicle speed ranging from 0 to 45 km/h. In an initial step, up-to-scale motion estimation was performed. We did this for all 4000 frames of the dataset. In addition to the visual measurements, we have the wheel odometry measurements of the car. We will use the odometry measurements as the baseline to which we compare our absolute scale values. Here it should be noted that the wheel odometry does not represent exactly the same measurement as our estimated absolute scale. The wheel odometry represents the length of the arc the wheels were following, while the absolute scale represents the direct distance between the locations at which the frames were captured. To identify sections of circular motion we look at the motion of neighboring frames. If the motion between neighboring frames is too small we look ahead to frames that are further out. In the experiments we look ahead at most 15 frames. For each frame pair we check if it represents circular motion by checking if |θ/2 − ϕ_v| < thresh_circ, as described in Section 5. The basic outline of the algorithm is described in Fig. 5. Fig. 6(a) shows a single curve from the path. The section apparently is partly a circular motion, which looks quite reasonable on inspection. In the picture, sections of cir