Working paper: The Quasi-Newton Least Squares Method: A New and Fast Secant Method Analyzed for Linear Systems

Rob Haelterman∗, Joris Degroote†, Dirk Van Heule∗, Jan Vierendeels†
Abstract

We present a new quasi-Newton method that can solve systems of equations of which no information is known explicitly and which requires no special structure of the system matrix, like positive definiteness or sparseness. The method builds an approximate Jacobian based on input-output combinations of a black box system, uses a rank-one update of this Jacobian after each iteration, and satisfies the secant equation. While it has originally been developed for nonlinear equations, we analyze its properties and performance when applied to linear systems. Analytically, the method is shown to be convergent in $n+1$ iterations ($n$ being the number of unknowns), irrespective of the nature of the system matrix. The performance of this method is greatly superior to that of other quasi-Newton methods and comparable with that of GMRes when tested on a number of standardized test cases.
Disclaimer
For the published version see here:
http://epubs.siam.org/doi/abs/10.1137/070710469
1 Introduction
In this paper we start from a system of equations given by
$$K(p) = 0, \qquad (1)$$
where $K : D_K \subset \mathbb{R}^{n\times 1} \to \mathbb{R}^{n\times 1}$ has continuous first partial derivatives in $D_K$, a single solution $p^* \in D_K$, and a nonsingular Jacobian $K'(p^*)$.
∗Department of Mathematics, Royal Military Academy, Renaissancelaan 30, B-1000 Brussels, Belgium (Robby.Haelterman@rma.ac.be, Dirk.Van.Heule@rma.ac.be).
†Department of Flow, Heat, and Combustion Mechanics, Ghent University, St.-Pietersnieuwstraat 41, B-9000 Gent, Belgium (Joris.Degroote@ugent.be, Jan.Vierendeels@ugent.be).
We will solve (1) in an iterative way starting from an initial value $p_o$. One of the most widely used methods to do so is Newton's method:
$$K'(p_s)\, d_s = -K(p_s), \qquad (2)$$
$$p_{s+1} = p_s + \theta_s d_s \qquad (3)$$
($s = 0, 1, 2, \ldots$), where $\theta_s$ is a scalar parameter. Newton's method has been studied extensively and exhibits superlinear convergence whenever $p_o$ is close enough to the exact solution $p^*$ of (1), as specified by the Newton–Kantorovich theorem [22]; if $K'(p)$ is also Lipschitz continuous for all $p$ close enough to $p^*$, then the convergence is quadratic.

If the matrix of the system is not known, or the Jacobian is unavailable or too expensive to compute, then a number of matrix-free methods are at our disposal that use only information derived from the consecutive iterates and that build an approximation $\hat{K}'_s \in \mathbb{R}^{n\times n}$ of $K'(p_s)$ based on those values. This approach falls under the general framework of quasi-Newton methods.
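For concreteness, the iteration (2)–(3) can be sketched in a few lines of NumPy. This is only an illustration: the damping $\theta_s = 1$ and the two-dimensional test system $K$ below are assumptions chosen for demonstration, and the sketch uses the exact Jacobian, which is precisely what the quasi-Newton methods of this paper avoid.

```python
import numpy as np

def newton(K, K_prime, p0, tol=1e-12, max_iter=50, theta=1.0):
    """Newton iteration: solve K'(p_s) d_s = -K(p_s), then p_{s+1} = p_s + theta * d_s."""
    p = np.asarray(p0, dtype=float)
    for _ in range(max_iter):
        r = K(p)
        if np.linalg.norm(r) < tol:
            break
        d = np.linalg.solve(K_prime(p), -r)  # equation (2)
        p = p + theta * d                    # equation (3)
    return p

# Hypothetical test system with a root at p* = (1, 2).
K = lambda p: np.array([p[0]**2 - 1.0 + (p[1] - 2.0), p[1] - 2.0])
K_prime = lambda p: np.array([[2.0 * p[0], 1.0], [0.0, 1.0]])
p_star = newton(K, K_prime, np.array([3.0, 3.0]))
```

Starting close enough to the root, the error is squared at every step, which is the quadratic convergence described above.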
Quasi-Newton methods have been used intensively for solving linear and nonlinear systems and for minimization problems [20]. Their main attraction is that they avoid the cumbersome computation of derivatives for the Jacobians. Recently, interest in quasi-Newton methods has waned, as automatic differentiation has become available [10, 14], except for a recent algorithm by Eirola and Nevanlinna [11, 12], the research performed by Deuflhard (e.g., [8, 9]) and by Spedicato, Xia, and Zhang [25], and work on secant-related Krylov–Newton solvers used by Dawson et al. [3].

We are mainly interested in quasi-Newton methods because

• we do not have access to the Jacobian, as we are working with black-box systems, which also makes automatic differentiation impossible;

• the cost of a function evaluation is sufficiently high that numerical differentiation becomes prohibitive. For this reason we will judge the performance of a method by the number of function evaluations it needs to converge.

In this paper we propose a new quasi-Newton method that has its origins in [29, 30], where an approximate Jacobian was needed for the strong
coupling of fluid-structure interaction problems using two commercially available codes (for the structure and the fluid) that are considered black boxes. The approximate Jacobian was constructed based on input-output combinations of each black box in a least squares sense. Introducing a new black box system $H$ such that $H(p) = K(p) + p$, we are able to apply this method to solve (1) in a matrix-free manner based on the values $(p_i, H(p_i))$ ($i = 0, 1, 2, \ldots, s$) that arise during the iteration process when solving (1). It is shown in this paper that the method can be written as a quasi-Newton method with a rank-one update of the approximate Jacobian after each iterate.

While the method has its origin as a nonlinear solver, we consider only linear systems in this paper. Studying quasi-Newton methods for linear problems is important not only because many problems are linear or nearly linear but also because the properties of a method in the linear
case often define the local convergence behavior of the method in the nonlinear case. This can be understood by observing that, close to a solution of (1) where the Jacobian is nonsingular, the linear approximation of $K(p)$ tends to be dominant. Hence, the generated sequence $p_s$ tends to behave like in the linear case. This is the main reason why the local convergence of Newton's method is quadratic [22].

When studying the analytical properties of the method for linear systems, we will pose $H(p) = Ap - b$, where $A \in \mathbb{R}^{n\times n}$ and $p, b \in \mathbb{R}^{n\times 1}$, and where $A - I$ is assumed to be nonsingular, but without further requirements like positive definiteness or sparseness. While most matrix-free solvers assume that $Ap$ can be formed, we assume we can form only $H(p)$. Finding $b$ by computing $H(0)$ is a possibility, but, as we assumed that a "call" of $H$ is very expensive, it is to be avoided.

We show that line searches cannot improve the method's performance and that it converges in at most $n+1$ iterations. This answers one of the questions that Martinez left open in [20], namely, the question of whether Broyden-like methods exist that converge in fewer than $2n$ steps for linear systems.

The performance of this new method is compared with that of other known methods that use rank-one updates and that can be used with black boxes, for instance Broyden's first and second method [1, 2], the column updating method [17, 19, 20], and the inverse column updating method [16, 18]. We also compare the method with GMRes. Even though our method has not originally been developed as a linear solver, the results show that the method has comparable, but slightly lower, performance to GMRes when the Euclidean norm of the residual is used as the criterion, and far better overall performance than the other quasi-Newton methods.

The paper is organized as follows: in section 2 we start with some general definitions, conventions, and theorems; in section 3 we analyze the construction of the approximate Jacobian that was proposed in [29], after adaptation to our purposes, and present some theorems regarding its properties and convergence; in sections 4 and 5 a review is given of other known quasi-Newton methods and the GMRes method, respectively; in section 6 the relative performance of the different methods is compared on the heat equation, a test case proposed by Deuflhard [8], and on a number of standardized test problems from the Matrix Market repository [21].
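In code, the change of variables from $K$ to $H$ is a one-line wrapper around the black box, and in the linear setting it yields exactly $H(p) = Ap - b$. The sketch below illustrates this; the particular $A$ and $b$ are arbitrary examples, and the direct solve for the root is only for checking (the paper's premise is that $A$ itself is not accessible).

```python
import numpy as np

def make_H(K):
    """Wrap a black-box residual K so that H(p) = K(p) + p;
    a root of K(p) = 0 is then a fixed point of H."""
    return lambda p: K(p) + p

# Hypothetical linear black box: K(p) = (A - I) p - b, so that H(p) = A p - b.
A = np.array([[2.0, 1.0],
              [0.0, 3.0]])
b = np.array([1.0, 1.0])
K = lambda p: (A - np.eye(2)) @ p - b
H = make_H(K)

p = np.array([1.0, 2.0])
assert np.allclose(H(p), A @ p - b)         # H has the stated linear form
p_star = np.linalg.solve(A - np.eye(2), b)  # root of K (A - I is nonsingular here)
assert np.allclose(K(p_star), 0.0)
```

Note that, as in the text, forming $H(p)$ never requires the matrix $A$ explicitly; only evaluations of the wrapped black box are used by the method.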
2 Introductory definitions and theorems
We will use $e = p - p^*$, $e_s = p_s - p^*$, and so on for the errors. All matrix norms that are used are natural matrix norms, unless otherwise stated. $\langle \cdot, \cdot \rangle$ denotes the standard scalar product between vectors.
Definition 2.1 A natural matrix norm is a matrix norm induced by a vector norm in the following manner (with $M \in \mathbb{R}^{n\times m}$):
$$\|M\| = \sup_{x \neq 0} \frac{\|Mx\|}{\|x\|} \quad \text{or equivalently} \quad \|M\| = \sup_{\|x\| = 1} \|Mx\|. \qquad (4)$$
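For the Euclidean vector norm, the induced norm in (4) is the largest singular value of $M$. The following sketch illustrates this numerically; sampling the unit sphere only approximates the supremum from below, so the tolerances are assumptions of the illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((4, 3))

# Approximate sup over unit vectors x of ||M x|| by sampling the unit sphere.
xs = rng.standard_normal((3, 100000))
xs /= np.linalg.norm(xs, axis=0)          # project samples onto the unit sphere
sampled_sup = np.max(np.linalg.norm(M @ xs, axis=0))

exact = np.linalg.norm(M, 2)              # induced 2-norm = largest singular value
assert sampled_sup <= exact + 1e-12       # sampling never exceeds the true supremum
assert exact - sampled_sup < 1e-2         # and comes close with enough samples
```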
Definition 2.2 (see [7]) Let $\{x_s\}_{s \in \mathbb{N}}$ be a sequence with $x_s$ and $x^* \in \mathbb{R}^{n\times 1}$. We say that the sequence $\{x_s\}$ converges superlinearly¹ towards $x^*$ with order $\alpha > 1$ if
$$\exists\, \epsilon > 0 : \|x_{s+1} - x^*\| \le \epsilon \|x_s - x^*\|^\alpha \qquad (5)$$
for any arbitrary norm $\|\cdot\|$ in $\mathbb{R}^{n\times 1}$.
Definition 2.3 (see [7]) Let $\{x_s\}_{s \in \mathbb{N}}$ be a sequence with $x_s$ and $x^* \in \mathbb{R}^{n\times 1}$. We say that the sequence $\{x_s\}$ converges superlinearly towards $x^*$ if
$$\lim_{s \to \infty} \frac{\|x_{s+1} - x^*\|}{\|x_s - x^*\|} = 0 \qquad (6)$$
for any arbitrary norm $\|\cdot\|$ in $\mathbb{R}^{n\times 1}$.
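A scalar illustration of Definitions 2.2 and 2.3 (the sequence below is a textbook example chosen for demonstration, not taken from the paper): $x_{s+1} = x_s^2$ starting at $x_0 = 1/2$ converges to $x^* = 0$ with order $\alpha = 2$, so the ratio in (6) tends to zero.

```python
# Quadratically convergent scalar sequence x_{s+1} = x_s^2, x* = 0.
x = 0.5
errors = []
for _ in range(5):
    errors.append(abs(x))
    x = x * x

# Definition 2.2 with alpha = 2 and eps = 1: ||x_{s+1}|| <= ||x_s||^2.
assert all(e_next <= e * e + 1e-15 for e, e_next in zip(errors, errors[1:]))

# Definition 2.3: the ratios ||x_{s+1}|| / ||x_s|| decrease towards 0.
ratios = [e_next / e for e, e_next in zip(errors, errors[1:])]
assert all(r2 < r1 for r1, r2 in zip(ratios, ratios[1:]))
```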
Definition 2.4 (see [7]) Let $f : \Omega \subset \mathbb{R}^{n\times 1} \to \mathbb{R}^{m\times 1}$. $f$ is Lipschitz continuous on $\Omega$ if $\exists\, \epsilon > 0$ (the Lipschitz constant) such that
$$\forall\, p_1, p_2 \in \Omega : \|f(p_1) - f(p_2)\| \le \epsilon \|p_1 - p_2\|. \qquad (7)$$
(If $\epsilon < 1$, then $f$ is called a contraction mapping with respect to the chosen norm.)
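For a linear map $f(p) = Bp$, any induced norm of $B$ is a Lipschitz constant, and $\|B\| < 1$ makes $f$ a contraction. A minimal check (the matrix $B$ and the test points are arbitrary examples):

```python
import numpy as np

B = 0.5 * np.array([[0.6, 0.2],
                    [0.1, 0.5]])
f = lambda p: B @ p

lip = np.linalg.norm(B, 2)        # induced 2-norm: a Lipschitz constant for f
assert lip < 1.0                  # so f is a contraction in the Euclidean norm

p1, p2 = np.array([1.0, -2.0]), np.array([0.3, 4.0])
assert np.linalg.norm(f(p1) - f(p2)) <= lip * np.linalg.norm(p1 - p2) + 1e-12
```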
Definition 2.5 Any $n + 1$ vectors $x_o, x_1, \ldots, x_n \in \mathbb{R}^{n\times 1}$ are in general position if the vectors $x_n - x_j$ ($j = 0, \ldots, n-1$) are linearly independent.
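Definition 2.5 can be tested by stacking the difference vectors as columns of a matrix and checking its rank; a small illustrative example with $n = 2$:

```python
import numpy as np

# Three points (n + 1 = 3, n = 2): x_o, x_1, x_2 in R^2.
x_o = np.array([0.0, 0.0])
x_1 = np.array([1.0, 0.0])
x_2 = np.array([0.0, 1.0])

# Differences x_n - x_j for j = 0, ..., n - 1, stacked as columns.
D = np.column_stack([x_2 - x_o, x_2 - x_1])
in_general_position = (np.linalg.matrix_rank(D) == 2)
assert in_general_position
```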
Lemma 2.1 $\forall\, u, v \in \mathbb{R}^{n\times 1} : \det(I + uv^T) = 1 + \langle u, v \rangle$.
Proof Let $P = I + uv^T$.

For $u = 0$ or $v = 0$ the proof is trivial.

If $u, v \neq 0$ and $\langle u, v \rangle \neq 0$, then any vector orthogonal to $v$ is a right eigenvector of $P$ (corresponding to an eigenvalue 1) and any multiple of $u$ is also a right eigenvector of $P$ (corresponding to an eigenvalue $1 + \langle u, v \rangle \neq 1$). As there are $n - 1$ linearly independent vectors orthogonal to $v$ and as the algebraic multiplicity of an eigenvalue is larger than or equal to its geometric multiplicity, we see that the algebraic multiplicity of the eigenvalue 1 is at least $n - 1$. As there is another eigenvalue different from 1, the algebraic multiplicity of the eigenvalue 1 must be equal to $n - 1$. As the determinant of a matrix equals the product of its eigenvalues, we have that $\det P = 1 + \langle u, v \rangle$.

If $u, v \neq 0$ and $\langle u, v \rangle = 0$, then $1 + \langle u, v \rangle = 1$. But then
$$(P - I)^2 = (uv^T)^2 = u (v^T u) v^T = \langle u, v \rangle\, uv^T = 0.$$
So the space of generalized eigenvectors corresponding to the eigenvalue 1 has dimension $n$. Hence, the algebraic multiplicity of the eigenvalue 1 is $n$ and $\det P = 1$.
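Lemma 2.1 (a rank-one case of the matrix determinant lemma) is easy to confirm numerically; the sketch below exercises both cases of the proof on random data:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5
u = rng.standard_normal(n)
v = rng.standard_normal(n)

# Lemma 2.1: det(I + u v^T) = 1 + <u, v>.
lhs = np.linalg.det(np.eye(n) + np.outer(u, v))
rhs = 1.0 + np.dot(u, v)
assert np.isclose(lhs, rhs)

# The orthogonal case <u, v> = 0 gives det = 1.
v_perp = v - (np.dot(u, v) / np.dot(u, u)) * u   # make v orthogonal to u
assert np.isclose(np.linalg.det(np.eye(n) + np.outer(u, v_perp)), 1.0)
```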
¹These definitions are based on "q-superlinearity" as detailed in [15]; as we use only this type of convergence criterion, we will simply use the term "superlinear."
Theorem 2.1 (fundamental theorem of linear algebra²) Let $M \in \mathbb{R}^{n\times m}$; then $N(M) = (R(M^T))^\perp$, where $N(M)$ is the kernel (or null space) of $M$, $R$ denotes the range, and $(\cdot)^\perp$ gives the orthogonal complement of a vector space.
For a proof of this theorem we refer to [26].
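An illustrative NumPy check of Theorem 2.1: compute an orthonormal basis of $N(M)$ from the SVD and verify that it is orthogonal to $R(M^T)$ (the random matrix and tolerances are assumptions of the sketch).

```python
import numpy as np

rng = np.random.default_rng(2)
M = rng.standard_normal((3, 5))          # wide matrix, so N(M) is nontrivial

_, s, Vt = np.linalg.svd(M)
rank = int(np.sum(s > 1e-12))
null_basis = Vt[rank:].T                 # orthonormal basis of N(M)

# Rank-nullity: dim N(M) + rank(M) = number of columns.
assert rank + null_basis.shape[1] == M.shape[1]

# Every element of N(M) is orthogonal to every element of R(M^T).
z = null_basis @ rng.standard_normal(null_basis.shape[1])   # arbitrary null vector
y = M.T @ rng.standard_normal(M.shape[0])                   # arbitrary element of R(M^T)
assert np.allclose(M @ z, 0.0)
assert abs(np.dot(y, z)) < 1e-10
```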
Theorem 2.2 (the Sherman–Morrison theorem) Let $M \in \mathbb{R}^{n\times n}$ be nonsingular, and let $u, v \in \mathbb{R}^{n\times 1}$ be vectors such that $v^T M^{-1} u \neq -1$; then $M + uv^T$ is nonsingular and
$$\left(M + uv^T\right)^{-1} = M^{-1} - \frac{M^{-1} uv^T M^{-1}}{1 + v^T M^{-1} u}. \qquad (8)$$
For the proof we refer to [24].
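Equation (8) is what makes rank-one updates of an approximate inverse cheap: given $M^{-1}$, the updated inverse costs $O(n^2)$ instead of $O(n^3)$. A numerical sketch on random data (the shift $nI$ only ensures a well-conditioned example):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 6
M = rng.standard_normal((n, n)) + n * np.eye(n)   # well-conditioned, nonsingular
u = rng.standard_normal(n)
v = rng.standard_normal(n)

M_inv = np.linalg.inv(M)
denom = 1.0 + v @ M_inv @ u
assert abs(denom) > 1e-12                          # v^T M^{-1} u != -1

# Equation (8): rank-one update of the inverse in O(n^2).
updated_inv = M_inv - np.outer(M_inv @ u, v @ M_inv) / denom
assert np.allclose(updated_inv, np.linalg.inv(M + np.outer(u, v)))
```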
Theorem 2.3 Assume that

• $K : \mathbb{R}^{n\times 1} \to \mathbb{R}^{n\times 1}$ is differentiable in an open set $D_K \subset \mathbb{R}^{n\times 1}$;

• the equation $K(p) = 0$ has a solution $p^* \in D_K$;

• $K' : D_K \to \mathbb{R}^{n\times n}$ is Lipschitz continuous with Lipschitz constant $\kappa$;

• $K'(p^*)$ is nonsingular.

Assume that we use a Newton method (equations (2) and (3)) where we replace $K(p_s)$ by $\hat{K}_s = K(p_s) + E_s$ and $K'(p_s)$ by $\hat{K}'_s = K'(p_s) + E_s$; then there exist $\epsilon$, $\delta$, and $\tau$ such that if $p_s \in \mathcal{B}(p^*, \delta)$ and $\|E_s\| \le \tau$, then the following properties hold:

1. $\hat{K}'_s$ is nonsingular.

2. $\|e_{s+1}\| \le \epsilon \left( \|e_s\|^2 + \|E_s\|\,\|e_s\| + \|E_s\| \right)$.

For the proof of this theorem we refer to [15].
Lemma 2.2 Let $V \in \mathbb{R}^{n\times s}$ be a matrix with column rank $s$; then
$$V \left(V^T V\right)^{-1} V^T = L_s L_s^T, \qquad (9)$$
with $L_s = [\bar{L}_1 \,|\, \bar{L}_2 \,|\, \ldots \,|\, \bar{L}_s]$, where $\bar{L}_k$ is the $k$th left (normalized) singular vector of $V$.
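Before turning to the proof, (9) admits a quick numerical check: the left-hand side is the orthogonal projector onto the column space of $V$, and the right-hand side rebuilds it from the first $s$ left singular vectors (the random matrix below is an arbitrary example; it has full column rank with probability 1).

```python
import numpy as np

rng = np.random.default_rng(4)
n, s = 7, 3
V = rng.standard_normal((n, s))

# Left-hand side of (9): the orthogonal projector onto the column space of V.
proj = V @ np.linalg.inv(V.T @ V) @ V.T

# Right-hand side: L_s L_s^T built from the first s left singular vectors.
L, sigma, Rt = np.linalg.svd(V, full_matrices=True)
L_s = L[:, :s]
assert np.allclose(proj, L_s @ L_s.T)
```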
Proof We can write the singular value decomposition³ of $V$ as $V = LSR^T$, where the singular values are given by $\sigma_i$ ($i = 1, \ldots, s$). According to the conventions of the singular value decomposition we have $S_{ii} = \sigma_i$ and $S_{ij} = 0$ when $i \neq j$.

Then we can write
$$V \left(V^T V\right)^{-1} V^T = LSR^T \left(RS^T L^T LSR^T\right)^{-1} RS^T L^T. \qquad (10)$$
As $L$ is a unitary matrix this simplifies to
$$V \left(V^T V\right)^{-1} V^T = LSR^T \left(RS^T SR^T\right)^{-1} RS^T L^T. \qquad (11)$$
³Different conventions exist for the singular value decomposition. We will use the one where $L \in \mathbb{R}^{n\times n}$, $S \in \mathbb{R}^{n\times s}$, and $R \in \mathbb{R}^{s\times s}$, and where the singular values are ordered in a nonincreasing way.