A general Krylov method for solving symmetric systems of linear equations

Anders FORSGREN∗    Tove ODLAND∗

Technical Report TRITA-MAT-2014-OS-01
Department of Mathematics
KTH Royal Institute of Technology
March 2014
Abstract

Krylov subspace methods are used for solving systems of linear equations $Hx + c = 0$. We present a general Krylov subspace method that can be applied to a symmetric system of linear equations, i.e., for a system in which $H = H^T$. In each iteration, we have a choice of scaling of the orthogonal vectors that successively make the Krylov subspaces available. We define an extended representation of each such Krylov vector, so that the Krylov vector is represented by a triple. We show that our Krylov subspace method is able to determine whether the system of equations is compatible. If so, a solution is computed. If not, a certificate of incompatibility is computed. The method of conjugate gradients is obtained as a special case of our general method. Our framework gives a way to understand the method of conjugate gradients, in particular when $H$ is not (positive) definite. Finally, we derive a minimum-residual method based on our framework and show how the iterates may be updated explicitly based on the triples.
1. Introduction
An important problem in numerical linear algebra is to solve a system of equations where the matrix is symmetric. Such a problem may be posed as

$$Hx + c = 0, \qquad (1.1)$$

for $x \in \mathbb{R}^n$, with $c \in \mathbb{R}^n$ and $H = H^T \in \mathbb{R}^{n \times n}$. Our primary motivation comes from KKT systems arising in optimization, where $H$ is symmetric but in general indefinite, see, e.g., [5], but there are many other applications. Throughout, $H$ is assumed symmetric; any other assumptions on $H$ at particular instances will be stated explicitly. It is also assumed throughout that $c \neq 0$.

A strategy for a class of methods for solving (1.1) is to generate linearly independent vectors $q_k$, $k = 0, 1, \dots$, until $q_k$ becomes linearly dependent for some $k = r \leq n$, and hence $q_r = 0$. These methods will have finite termination properties.

∗ Optimization and Systems Theory, Department of Mathematics, KTH Royal Institute of Technology, SE-100 44 Stockholm, Sweden (andersf@kth.se, odland@kth.se). Research partially supported by the Swedish Research Council (VR).
In this paper we consider a general Krylov subspace method in which the generated vectors form an orthogonal basis for the Krylov subspaces generated by $H$ and $c$,

$$\mathcal{K}_0(c,H) = \{0\}, \qquad \mathcal{K}_k(c,H) = \operatorname{span}\{c, Hc, H^2 c, \dots, H^{k-1} c\}, \quad k = 1, 2, \dots. \qquad (1.2)$$

In our method, the process by which these vectors are generated bears much resemblance to the Lanczos process, with the difference that the available scaling parameter is left explicitly undecided. The method of conjugate gradients is derived as a special case of this general Krylov method. In addition, the minimum-residual method and explicit recursions for the minimum-residual iterates can be derived based on our framework.

There have been many contributions to the theory regarding the Lanczos process, the method of conjugate gradients and minimum-residual methods; see, e.g., [6] for an extensive survey of the years 1948-1976, and [7].

We assume exact arithmetic, and the theory developed in this paper is based on that assumption. At the end of the paper we briefly discuss computational aspects of our results in finite precision.

The outline of the paper is as follows. In Section 2 we define and give a recursion for the Krylov vectors $q_k \in \mathcal{K}_{k+1}(c,H)$, $k = 0, \dots, r$. In Section 3 we define triples $(q_k, y_k, \delta_k)$ associated with $q_k$, $k = 0, \dots, r$, such that $q_k = Hy_k + \delta_k c$, with $q_k \in \mathcal{K}_{k+1}(c,H)$, $y_k \in \mathcal{K}_k(c,H)$ and $\delta_k \in \mathbb{R}$, $k = 0, \dots, r$. These triples are defined up to a scaling, and recursions for them are given in Proposition 3.1. We then state several results concerning the triples, and using these results we devise an algorithm in Section 3.3 that either gives the solution, if (1.1) is compatible (in this case we show that $\delta_r \neq 0$), or gives a certificate of incompatibility (in this case $\delta_r = 0$). The main attribute is that if we do not require $\delta_k$ to attain a specific value or sign, then we do not have to require anything more than symmetry of $H$.

Further, in Section 4, the method of conjugate gradients is illustrated in our framework, which gives a better way of understanding its behavior, in particular when $H$ is not a (positive) definite matrix. Finally, in Section 5, a minimum-residual method applicable also to incompatible systems is illustrated in this framework, and explicit recursions for the minimum-residual iterates are given. An algorithm for this method is given in Section 5.1.
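The orthogonal basis vectors underlying (1.2) can be sketched numerically. The following is a minimal illustration, our own and not the paper's algorithm: it generates vectors $q_k \in \mathcal{K}_{k+1}(c,H) \cap \mathcal{K}_k(c,H)^\perp$ by modified Gram-Schmidt; the function name and tolerance are illustrative choices.

```python
import numpy as np

# A minimal sketch (our own, not the paper's method): generate the orthogonal
# Krylov vectors q_0, ..., q_{r-1} of K_r(c, H) from (1.2) by modified
# Gram-Schmidt on the directions c, Hc, H^2 c, ...

def krylov_vectors(H, c, tol=1e-10):
    """Return q_0, ..., q_{r-1}, an orthogonal basis of K_r(c, H)."""
    qs = [np.asarray(c, dtype=float)]
    while True:
        v = H @ qs[-1]                 # candidate direction in K_{k+1}(c, H)
        nv0 = np.linalg.norm(v)
        for q in qs:                   # project out K_k(c, H)
            v = v - (q @ v) / (q @ q) * q
        if np.linalg.norm(v) <= tol * nv0:
            return qs                  # K_{k+1}(c, H) ∩ K_k(c, H)^perp = {0}
        qs.append(v)

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 6))
H = A + A.T                            # symmetric, in general indefinite
c = rng.standard_normal(6)
qs = krylov_vectors(H, c)
Q = np.column_stack(qs)
norms = np.linalg.norm(Q, axis=0)
C = (Q.T @ Q) / np.outer(norms, norms)  # normalized Gram matrix
assert np.allclose(C, np.eye(len(qs)), atol=1e-10)
```

For a generic random $H$ and $c$ the loop produces $r = n$ vectors; for structured $H$ (e.g., with an invariant subspace containing $c$) it terminates earlier.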
1.1. Notation
The letters $i$, $j$ and $k$ denote integer indices; other lowercase letters such as $q$, $y$ and $c$ denote column vectors, possibly with super- and/or subscripts. For a symmetric matrix $H$, $H \succ 0$ denotes that $H$ is positive definite. Analogously, $H \succeq 0$ is used to denote that $H$ is positive semidefinite. $\mathcal{N}(H)$ denotes the null space of $H$ and $\mathcal{R}(H)$ denotes the range space of $H$. We will denote by $Z$ an orthonormal matrix whose columns form a basis for $\mathcal{N}(H)$. If $H$ is nonsingular, then $Z$ is to be interpreted as an empty matrix. When referring to a norm, the Euclidean norm is used throughout.
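One concrete way to obtain such a matrix $Z$, an assumption on our part since the paper does not prescribe a method, is from a singular value decomposition; the sketch below returns an orthonormal null-space basis, and an empty matrix when $H$ is nonsingular, matching the convention above.

```python
import numpy as np

# Sketch (our own choice of method): an orthonormal basis Z for N(H) via the
# SVD; right singular vectors with (numerically) zero singular values span N(H).

def nullspace_basis(H, tol=1e-10):
    _, s, Vt = np.linalg.svd(H)
    mask = s <= tol * s[0]        # assumes H != 0; s[0] is the largest value
    return Vt[mask].T             # n-by-0 "empty matrix" if H is nonsingular

H = np.diag([2.0, -1.0, 0.0])     # symmetric and singular: N(H) = span{e_3}
Z = nullspace_basis(H)
assert Z.shape == (3, 1)
assert np.allclose(H @ Z, 0)
assert np.allclose(Z.T @ Z, np.eye(1))
assert nullspace_basis(np.eye(2)).shape == (2, 0)   # nonsingular case
```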
2. Background
Regarding (1.1), the raw data available are the matrix $H$ and the vector $c$, together with combinations of the two, for example represented by the Krylov subspaces generated by $H$ and $c$, as defined in (1.2). For an introduction and background on Krylov subspaces, see, e.g., [18].

Without loss of generality, the scaling of the first Krylov vector $q_0$ may be chosen so that $q_0 = c$. Then one sequence of linearly independent vectors may be generated by letting $q_k \in \mathcal{K}_{k+1}(c,H) \cap \mathcal{K}_k(c,H)^\perp$, $k = 1, \dots, r$, such that $q_k \neq 0$ for $k = 0, 1, \dots, r-1$ and $q_r = 0$, where $r$ is the minimum index $k$ for which $\mathcal{K}_{k+1}(c,H) \cap \mathcal{K}_k(c,H)^\perp = \{0\}$. The vectors $\{q_0, q_1, \dots, q_{r-1}\}$ form an orthogonal, hence linearly independent, basis of $\mathcal{K}_r(c,H)$. We will refer to these vectors as the Krylov vectors. With $q_0 = c$, each vector $q_k$, $k = 1, \dots, r-1$, is uniquely determined up to a scaling.

A vector $q_k \in \mathcal{K}_{k+1}(c,H)$ may be expressed as

$$q_k = \sum_{j=0}^{k} \delta_k^{(j)} H^j c, \qquad (2.1)$$

for some parameters $\delta_k^{(j)}$, $j = 0, \dots, k$. In order to ensure that $q_k = 0$ if and only if $H^k c$ is linearly dependent on $c$, $Hc$, ..., $H^{k-1} c$, it must hold that $\delta_k^{(k)} \neq 0$. Also note that for $k < r$, the vectors $c$, $Hc$, ..., $H^{k-1} c$ are linearly independent. Hence, since $q_k$ is uniquely determined up to a nonzero scaling, so are $\delta_k^{(j)}$, $j = 0, \dots, k$. For $k = r$, $q_r = 0$, and since $\delta_r^{(r)} \neq 0$, the same argument shows that $\delta_r^{(j)}$, $j = 0, \dots, r$, are uniquely determined up to a common nonzero scaling. This is made precise in Lemma A.1.

The following proposition states a recursion for such a sequence of vectors where the scaling factors, denoted by
$\{\alpha_k\}_{k=0}^{r-1}$, are left unspecified. The recursion in Proposition 2.1 is a slight generalization of the Lanczos process for generating mutually orthogonal vectors, see [10,11], in which the scaling of each vector $q_k$ is chosen such that $\|q_k\| = 1$, $k = 0, \dots, r-1$. For completeness, this proposition and its proof are included.
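The representation (2.1) can also be checked numerically. The sketch below, our own construction, builds a few orthogonal Krylov vectors by Gram-Schmidt, recovers the coefficients $\delta_k^{(j)}$ by least squares against the monomial basis, and confirms that the leading coefficient $\delta_k^{(k)}$ is nonzero.

```python
import numpy as np

# Sketch (our own): expand an orthogonal Krylov vector q_k in the monomial
# basis c, Hc, ..., H^k c of (2.1) and check the leading coefficient.
rng = np.random.default_rng(1)
A = rng.standard_normal((5, 5))
H = A + A.T
c = rng.standard_normal(5)

qs = [c]                                  # q_0 = c
for _ in range(3):                        # q_1, q_2, q_3 by Gram-Schmidt
    v = H @ qs[-1]
    for q in qs:
        v = v - (q @ v) / (q @ q) * q
    qs.append(v)

k = 3
K = np.column_stack([np.linalg.matrix_power(H, j) @ c for j in range(k + 1)])
delta, *_ = np.linalg.lstsq(K, qs[k], rcond=None)    # coefficients in (2.1)
assert np.linalg.norm(K @ delta - qs[k]) <= 1e-8 * np.linalg.norm(qs[k])
assert abs(delta[k]) > 1e-6               # delta_k^(k) != 0, as required
```

With this unscaled Gram-Schmidt the leading coefficient works out to 1 by construction; rescaling $q_k$ by any nonzero factor rescales all $\delta_k^{(j)}$ by the same factor, matching the uniqueness-up-to-scaling statement.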
Proposition 2.1. Let $r$ denote the smallest positive integer $k$ for which $\mathcal{K}_{k+1}(c,H) \cap \mathcal{K}_k(c,H)^\perp = \{0\}$. Given $q_0 = c \in \mathcal{K}_1(c,H)$, there exist vectors $q_k$, $k = 1, \dots, r$, such that

$$q_k \in \mathcal{K}_{k+1}(c,H) \cap \mathcal{K}_k(c,H)^\perp, \quad k = 1, \dots, r,$$

for which $q_k \neq 0$, $k = 1, \dots, r-1$, and $q_r = 0$. Each such $q_k$, $k = 1, \dots, r-1$, is uniquely determined up to a scaling, and a sequence $\{q_k\}_{k=1}^{r}$ may be generated as

$$q_1 = \alpha_0 \left( -Hq_0 + \frac{q_0^T H q_0}{q_0^T q_0}\, q_0 \right), \qquad (2.2a)$$

$$q_{k+1} = \alpha_k \left( -Hq_k + \frac{q_k^T H q_k}{q_k^T q_k}\, q_k + \frac{q_{k-1}^T H q_k}{q_{k-1}^T q_{k-1}}\, q_{k-1} \right), \quad k = 1, \dots, r-1, \qquad (2.2b)$$

where $\alpha_k$, $k = 0, \dots, r-1$, are free and nonzero parameters. In addition, it holds that

$$q_{k+1}^T q_{k+1} = -\alpha_k\, q_{k+1}^T H q_k, \quad k = 0, \dots, r-1. \qquad (2.3)$$
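The recurrence can be exercised directly. The sketch below, illustrative and with arbitrarily chosen nonzero $\alpha_k$, runs (2.2a)-(2.2b) on a random symmetric indefinite $H$ and checks mutual orthogonality and the identity (2.3).

```python
import numpy as np

# The recurrence (2.2a)-(2.2b) with freely chosen nonzero alpha_k (a sketch).
rng = np.random.default_rng(2)
A = rng.standard_normal((6, 6))
H = A + A.T                               # symmetric, in general indefinite
c = rng.standard_normal(6)
alphas = [0.5, -2.0, 1.0, 3.0, -0.25]     # free and nonzero, chosen arbitrarily

qs = [c]                                  # q_0 = c
for k, a in enumerate(alphas):
    qk = qs[-1]
    v = -H @ qk + (qk @ H @ qk) / (qk @ qk) * qk            # terms of (2.2a)
    if k >= 1:                            # extra term of (2.2b)
        qm = qs[-2]
        v = v + (qm @ H @ qk) / (qm @ qm) * qm
    qs.append(a * v)                      # q_{k+1}

Q = np.column_stack(qs)
norms = np.linalg.norm(Q, axis=0)
C = (Q.T @ Q) / np.outer(norms, norms)
assert np.allclose(C, np.eye(len(qs)), atol=1e-8)  # mutual orthogonality
for k, a in enumerate(alphas):            # identity (2.3)
    assert np.isclose(qs[k + 1] @ qs[k + 1], -a * (qs[k + 1] @ (H @ qs[k])))
```

Note that no definiteness of $H$ is used anywhere: only symmetry enters, through the three-term structure.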
Proof. Given $q_0 = c$, let $k$ be an integer such that $1 \leq k \leq r-1$. Assume that $q_i$, $i = 0, \dots, k$, are mutually orthogonal with $q_i \in \mathcal{K}_{i+1}(c,H) \cap \mathcal{K}_i(c,H)^\perp$. Let $q_{k+1} \in \mathcal{K}_{k+2}(c,H)$ be expressed as

$$q_{k+1} = -\alpha_k H q_k + \sum_{i=0}^{k} \eta_k^{(i)} q_i, \quad k = 0, \dots, r-1. \qquad (2.4)$$

In order for $q_{k+1}$ to be orthogonal to $q_i$, $i = 0, \dots, k$, the parameters $\eta_k^{(i)}$, $i = 0, \dots, k$, are uniquely determined as follows.

For $k = 0$, to have $q_0^T q_1 = 0$, it must hold that

$$\eta_0^{(0)} = \alpha_0 \frac{q_0^T H q_0}{q_0^T q_0},$$

hence obtaining $q_1 \in \mathcal{K}_2(c,H) \cap \mathcal{K}_1(c,H)^\perp$ as in (2.2a), where $\alpha_0$ is free and nonzero.

For $k$ such that $1 \leq k \leq r-1$, in order to have $q_i^T q_{k+1} = 0$, $i = 0, \dots, k$, it must hold that

$$\eta_k^{(k)} = \alpha_k \frac{q_k^T H q_k}{q_k^T q_k}, \qquad \eta_k^{(k-1)} = \alpha_k \frac{q_{k-1}^T H q_k}{q_{k-1}^T q_{k-1}},$$

and $\eta_k^{(i)} = 0$, $i = 0, \dots, k-2$. The last relation follows by the symmetry of $H$. Hence we obtain $q_{k+1} \in \mathcal{K}_{k+2}(c,H) \cap \mathcal{K}_{k+1}(c,H)^\perp$ as in the three-term recurrence of (2.2b), where $\alpha_k$, $k = 1, \dots, r-1$, are free and nonzero.

Since $q_1$ is orthogonal to $q_0$, and since $q_{k+1}$ is orthogonal to $q_k$ and $q_{k-1}$, $k = 1, \dots, r-1$, pre-multiplication of (2.2) with $q_{k+1}^T$ yields

$$q_{k+1}^T q_{k+1} = -\alpha_k\, q_{k+1}^T H q_k, \quad k = 0, \dots, r-1.$$

Finally, note that if $q_{k+1}$ is given by (2.2), then the only term that increases the power of $H$ is $\alpha_k(-Hq_k)$. Since $\alpha_k \neq 0$, repeated use of this argument gives $\delta_{k+1}^{(k+1)} \neq 0$ if $q_{k+1}$ is expressed by (2.1). In fact, $\delta_{k+1}^{(k+1)} = (-1)^{k+1} \prod_{i=0}^{k} \alpha_i \neq 0$. Hence, by Lemma A.1, $q_{k+1} = 0$ implies $\mathcal{K}_{k+2}(c,H) \cap \mathcal{K}_{k+1}(c,H)^\perp = \{0\}$, so that $k + 1 = r$, as required.

The choice of sign in front of the first term of (2.4) is arbitrary. Our choice of a minus sign is made in order to achieve coherence with existing theory, as will be shown later on.

Many methods for solving (1.1) are based on using the Lanczos process, in which the scaling factors are chosen such that the generated vectors have norm one, and matrix-factorization techniques are used on the symmetric tridiagonal matrix obtained by putting the three-term recurrence in matrix form. For an introduction to how Krylov subspace methods are formalized in this way, see, e.g., [2,15]. For our purposes we leave these available scaling factors unspecified.
