A general Krylov method for solving symmetric systems of linear equations

Anders Forsgren* and Tove Odland*

Technical Report TRITA-MAT-2014-OS-01
Department of Mathematics
KTH Royal Institute of Technology
March 2014

Abstract

Krylov subspace methods are used for solving systems of linear equations Hx + c = 0. We present a general Krylov subspace method that can be applied to a symmetric system of linear equations, i.e., to a system in which H = H^T. In each iteration, we have a choice of scaling of the orthogonal vectors that successively make the Krylov subspaces available. We define an extended representation of each such Krylov vector, so that the Krylov vector is represented by a triple. We show that our Krylov subspace method is able to determine whether the system of equations is compatible. If so, a solution is computed. If not, a certificate of incompatibility is computed. The method of conjugate gradients is obtained as a special case of our general method. Our framework gives a way to understand the method of conjugate gradients, in particular when H is not (positive) definite. Finally, we derive a minimum-residual method based on our framework and show how the iterates may be updated explicitly based on the triples.

1. Introduction

An important problem in numerical linear algebra is to solve a system of equations where the matrix is symmetric. Such a problem may be posed as

    Hx + c = 0,    (1.1)

for x ∈ R^n, with c ∈ R^n and H = H^T ∈ R^{n×n}. Our primary motivation comes from KKT systems arising in optimization, where H is symmetric but in general indefinite, see, e.g., [5], but there are many other applications. Throughout, H is assumed symmetric; any other assumptions on H at particular instances will be stated explicitly. It is also assumed throughout that c ≠ 0.

A strategy for a class of methods for solving (1.1) is to generate linearly independent vectors q_k, k = 0, 1, ...,
until q_k becomes linearly dependent for some k = r ≤ n, and hence q_r = 0. These methods have finite termination properties.

* Optimization and Systems Theory, Department of Mathematics, KTH Royal Institute of Technology, SE-100 44 Stockholm, Sweden (andersf@kth.se, odland@kth.se). Research partially supported by the Swedish Research Council (VR).

In this paper we consider a general Krylov subspace method in which the generated vectors form an orthogonal basis for the Krylov subspaces generated by H and c,

    K_0(c,H) = {0},  K_k(c,H) = span{c, Hc, H^2 c, ..., H^{k-1} c},  k = 1, 2, ....    (1.2)

In our method, the process by which these vectors are generated bears much resemblance to the Lanczos process, with the difference that the available scaling parameter is left explicitly undecided. The method of conjugate gradients is derived as a special case of this general Krylov method. In addition, the minimum-residual method and explicit recursions for the minimum-residual iterates can be derived based on our framework.

There have been many contributions to the theory regarding the Lanczos process, the method of conjugate gradients and minimum-residual methods; see, e.g., [6] for an extensive survey of the years 1948-1976, and [7].

We assume exact arithmetic, and the theory developed in this paper is based on that assumption. At the end of the paper we briefly discuss computational aspects of our results in finite precision.

The outline of the paper is as follows. In Section 2 we define and give a recursion for the Krylov vectors q_k ∈ K_{k+1}(c,H), k = 0, ..., r. In Section 3 we define triples (q_k, y_k, δ_k) associated with q_k, k = 0, ..., r, such that q_k = H y_k + δ_k c, with q_k ∈ K_{k+1}(c,H), y_k ∈ K_k(c,H) and δ_k ∈ R, k = 0, ..., r.
These triples are defined up to a scaling, and recursions for them are given in Proposition 3.1. We then state several results concerning the triples, and using these results we devise an algorithm in Section 3.3 that either gives the solution if (1.1) is compatible (in this case we show that δ_r ≠ 0), or gives a certificate of incompatibility (in this case δ_r = 0). The main attribute is that if we do not require δ_k to attain a specific value or sign, then we do not have to require anything more than symmetry of H.

Further, in Section 4, the method of conjugate gradients is illustrated in our framework, which gives a better way of understanding its behavior, in particular when H is not a (positive) definite matrix. Finally, in Section 5, a minimum-residual method applicable also to incompatible systems is illustrated in this framework, and explicit recursions for the minimum-residual iterates are given. An algorithm for this method is given in Section 5.1.

1.1. Notation

The letters i, j and k denote integer indices; other lowercase letters, such as q, y and c, denote column vectors, possibly with super- and/or subscripts. For a symmetric matrix H, H ≻ 0 denotes that H is positive definite. Analogously, H ⪰ 0 is used to denote that H is positive semidefinite. N(H) denotes the null space of H, and R(H) denotes the range space of H. We denote by Z an orthonormal matrix whose columns form a basis for N(H). If H is nonsingular, then Z is to be interpreted as an empty matrix. When referring to a norm, the Euclidean norm is used throughout.

2. Background

Regarding (1.1), the raw data available are the matrix H and the vector c, and combinations of the two, for example represented by the Krylov subspaces generated by H and c, as defined in (1.2).
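To make the definition of the Krylov subspaces in (1.2) concrete, the following NumPy sketch (our own illustration, not part of the paper; function and variable names are hypothetical) stacks the vectors c, Hc, ..., H^{k−1}c as columns and reports the dimension of their span, which grows with k until it stagnates at the grade r of c with respect to H:

```python
import numpy as np

def krylov_matrix(H, c, k):
    """Columns c, Hc, ..., H^(k-1) c, spanning K_k(c, H) as in (1.2)."""
    cols = [c]
    for _ in range(k - 1):
        cols.append(H @ cols[-1])
    return np.column_stack(cols)

# A small symmetric indefinite example.
H = np.array([[2.0, 1.0, 0.0],
              [1.0, -1.0, 1.0],
              [0.0, 1.0, 3.0]])
c = np.array([1.0, 0.0, 0.0])

for k in range(1, 5):
    print(k, np.linalg.matrix_rank(krylov_matrix(H, c, k)))
```

For this particular H and c the rank grows 1, 2, 3 and then stays at 3, so r = n = 3 here; in general r ≤ n.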
For an introduction and background on Krylov subspaces, see, e.g., [18].

Without loss of generality, the scaling of the first Krylov vector q_0 may be chosen so that q_0 = c. Then one sequence of linearly independent vectors may be generated by letting q_k ∈ K_{k+1}(c,H) ∩ K_k(c,H)^⊥, k = 1, ..., r, such that q_k ≠ 0 for k = 0, 1, ..., r−1 and q_r = 0, where r is the minimum index k for which K_{k+1}(c,H) ∩ K_k(c,H)^⊥ = {0}. The vectors {q_0, q_1, ..., q_{r−1}} form an orthogonal, hence linearly independent, basis of K_r(c,H). We will refer to these vectors as the Krylov vectors. With q_0 = c, each vector q_k, k = 1, ..., r−1, is uniquely determined up to a scaling.

A vector q_k ∈ K_{k+1}(c,H) may be expressed as

    q_k = Σ_{j=0}^{k} δ_k^{(j)} H^j c,    (2.1)

for some parameters δ_k^{(j)}, j = 0, ..., k. In order to ensure that q_k = 0 if and only if H^k c is linearly dependent on c, Hc, ..., H^{k−1} c, it must hold that δ_k^{(k)} ≠ 0. Also note that for k < r, the vectors c, Hc, ..., H^{k−1} c are linearly independent. Hence, since q_k is uniquely determined up to a nonzero scaling, so are δ_k^{(j)}, j = 0, ..., k. For k = r, q_r = 0, and since δ_r^{(r)} ≠ 0, the same argument shows that δ_r^{(j)}, j = 0, ..., r, are uniquely determined up to a common nonzero scaling. This is made precise in Lemma A.1.

The following proposition states a recursion for such a sequence of vectors in which the scaling factors, denoted by {α_k}_{k=0}^{r−1}, are left unspecified. The recursion in Proposition 2.1 is a slight generalization of the Lanczos process for generating mutually orthogonal vectors, see [10,11], in which the scaling of each vector q_k is chosen such that ||q_k|| = 1, k = 0, ..., r−1. For completeness, this proposition and its proof are included.

Proposition 2.1.
Let r denote the smallest positive integer k for which K_{k+1}(c,H) ∩ K_k(c,H)^⊥ = {0}. Given q_0 = c ∈ K_1(c,H), there exist vectors q_k, k = 1, ..., r, such that

    q_k ∈ K_{k+1}(c,H) ∩ K_k(c,H)^⊥,  k = 1, ..., r,

for which q_k ≠ 0, k = 1, ..., r−1, and q_r = 0. Each such q_k, k = 1, ..., r−1, is uniquely determined up to a scaling, and a sequence {q_k}_{k=1}^{r} may be generated as

    q_1 = α_0 ( −H q_0 + ((q_0^T H q_0)/(q_0^T q_0)) q_0 ),    (2.2a)

    q_{k+1} = α_k ( −H q_k + ((q_k^T H q_k)/(q_k^T q_k)) q_k + ((q_{k−1}^T H q_k)/(q_{k−1}^T q_{k−1})) q_{k−1} ),  k = 1, ..., r−1,    (2.2b)

where α_k, k = 0, ..., r−1, are free and nonzero parameters. In addition, it holds that

    q_{k+1}^T q_{k+1} = −α_k q_{k+1}^T H q_k,  k = 0, ..., r−1.    (2.3)

Proof. Given q_0 = c, let k be an integer such that 1 ≤ k ≤ r−1. Assume that q_i, i = 0, ..., k, are mutually orthogonal with q_i ∈ K_{i+1}(c,H) ∩ K_i(c,H)^⊥. Let q_{k+1} ∈ K_{k+2}(c,H) be expressed as

    q_{k+1} = −α_k H q_k + Σ_{i=0}^{k} η_k^{(i)} q_i,  k = 0, ..., r−1.    (2.4)

In order for q_{k+1} to be orthogonal to q_i, i = 0, ..., k, the parameters η_k^{(i)}, i = 0, ..., k, are uniquely determined as follows.

For k = 0, to have q_0^T q_1 = 0, it must hold that

    η_0^{(0)} = α_0 (q_0^T H q_0)/(q_0^T q_0),

hence we obtain q_1 ∈ K_2(c,H) ∩ K_1(c,H)^⊥ as in (2.2a), where α_0 is free and nonzero. For k such that 1 ≤ k ≤ r−1, in order to have q_i^T q_{k+1} = 0, i = 0, ..., k, it must hold that

    η_k^{(k)} = α_k (q_k^T H q_k)/(q_k^T q_k),  η_k^{(k−1)} = α_k (q_{k−1}^T H q_k)/(q_{k−1}^T q_{k−1}),  and  η_k^{(i)} = 0, i = 0, ..., k−2.

The last relation follows from the symmetry of H.
Hence we obtain q_{k+1} ∈ K_{k+2}(c,H) ∩ K_{k+1}(c,H)^⊥ as in the three-term recurrence (2.2b), where α_k, k = 1, ..., r−1, are free and nonzero.

Since q_1 is orthogonal to q_0, and since q_{k+1} is orthogonal to q_k and q_{k−1}, k = 1, ..., r−1, pre-multiplication of (2.2) by q_{k+1}^T yields

    q_{k+1}^T q_{k+1} = −α_k q_{k+1}^T H q_k,  k = 0, ..., r−1.

Finally, note that if q_{k+1} is given by (2.2), then the only term that increases the power of H is α_k(−H q_k). Since α_k ≠ 0, repeated use of this argument gives δ_{k+1}^{(k+1)} ≠ 0 if q_{k+1} is expressed by (2.1). In fact, δ_{k+1}^{(k+1)} = (−1)^{k+1} Π_{i=0}^{k} α_i ≠ 0. Hence, by Lemma A.1, q_{k+1} = 0 implies K_{k+2}(c,H) ∩ K_{k+1}(c,H)^⊥ = {0}, so that k + 1 = r, as required.

The choice of sign in front of the first term of (2.4) is arbitrary. Our choice of a minus sign is made in order to obtain coherence with existing theory, as will be shown later on.

Many methods for solving (1.1) are based on the Lanczos process, in which the scaling factors are chosen such that the generated vectors have norm one, and matrix-factorization techniques are applied to the symmetric tridiagonal matrix obtained by putting the three-term recurrence on matrix form. For an introduction to how Krylov subspace methods are formalized in this way, see, e.g., [2,15]. For our purposes we leave these available scaling factors unspecified.
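The three-term recurrence (2.2) and property (2.3) can be exercised numerically. The NumPy sketch below (our own illustration under exact-arithmetic assumptions, not the paper's code; names are hypothetical) generates the vectors q_k for an arbitrary choice of the free nonzero scalings α_k and measures how far the computed q_k are from mutual orthogonality:

```python
import numpy as np

def general_krylov_vectors(H, c, alphas, tol=1e-12):
    """q_0 = c; q_{k+1} from the three-term recurrence (2.2),
    stopping when q_{k+1} vanishes (i.e., k + 1 = r)."""
    qs = [c]
    for k, alpha in enumerate(alphas):
        q = qs[-1]
        v = -H @ q + ((q @ (H @ q)) / (q @ q)) * q
        if k >= 1:
            qp = qs[-2]
            v += ((qp @ (H @ q)) / (qp @ qp)) * qp
        q_next = alpha * v
        if np.linalg.norm(q_next) <= tol * np.linalg.norm(c):
            break  # q_{k+1} = 0 up to roundoff, so k + 1 = r
        qs.append(q_next)
    return qs

H = np.array([[2.0, 1.0, 0.0],
              [1.0, -1.0, 1.0],
              [0.0, 1.0, 3.0]])
c = np.array([1.0, 0.0, 0.0])
alphas = [0.7, -1.3, 2.0]  # arbitrary nonzero scaling parameters

qs = general_krylov_vectors(H, c, alphas)
Q = np.column_stack(qs)
G = Q.T @ Q
print(np.max(np.abs(G - np.diag(np.diag(G)))))  # off-diagonal mass of Q^T Q
```

Relation (2.3), q_{k+1}^T q_{k+1} = −α_k q_{k+1}^T H q_k, also holds for each computed pair, independently of the (nonzero) values chosen for the α_k.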