Working paper: The Quasi-Newton Least Squares Method: A New and Fast Secant Method Analyzed for Linear Systems

Rob Haelterman∗, Joris Degroote†, Dirk Van Heule∗, Jan Vierendeels†

∗ Department of Mathematics, Royal Military Academy, Renaissancelaan 30, B-1000 Brussels, Belgium (Robby.Haelterman@rma.ac.be, Dirk.Van.Heule@rma.ac.be).
† Department of Flow, Heat, and Combustion Mechanics, Ghent University, St.-Pietersnieuwstraat 41, B-9000 Gent, Belgium (Joris.Degroote@ugent.be, Jan.Vierendeels@ugent.be).

Abstract

We present a new quasi-Newton method that can solve systems of equations of which no information is known explicitly and which requires no special structure of the system matrix, like positive definiteness or sparseness. The method builds an approximate Jacobian based on input-output combinations of a black box system, uses a rank-one update of this Jacobian after each iteration, and satisfies the secant equation. While it has originally been developed for nonlinear equations, we analyze its properties and performance when applied to linear systems. Analytically, the method is shown to be convergent in $n + 1$ iterations ($n$ being the number of unknowns), irrespective of the nature of the system matrix. The performance of this method is greatly superior to other quasi-Newton methods and comparable with GMRes when tested on a number of standardized test-cases.

Disclaimer

For the published version see here: http://epubs.siam.org/doi/abs/10.1137/070710469

1 Introduction

In this paper we start from a system of equations given by

$$K(p) = 0, \qquad (1)$$

where $K : D_K \subset \mathbb{R}^{n \times 1} \to \mathbb{R}^{n \times 1}$ has continuous first partial derivatives in $D_K$, a single solution $p^* \in D_K$, and a nonsingular Jacobian $K'(p^*)$.

We will solve (1) in an iterative way starting from an initial value $p_0$. One of the most widely used methods to do so is Newton's method:

$$K'(p_s)\, d_s = -K(p_s), \qquad (2)$$
$$p_{s+1} = p_s + \theta_s d_s \qquad (3)$$

($s = 0, 1, 2, \ldots$), where $\theta_s$ is a scalar parameter. Newton's method has been studied extensively and exhibits superlinear convergence whenever $p_0$ is close enough to the exact solution $p^*$ of (1), as specified by the Newton–Kantorovich theorem [22]; if $K'(p)$ is also Lipschitz continuous for all $p$ close enough to $p^*$, then the convergence is quadratic.

If the matrix of the system is not known, or the Jacobian is unavailable or too expensive to compute, then a number of matrix-free methods are at our disposal that use only information derived from the consecutive iterates and that build an approximation $\hat{K}'_s \in \mathbb{R}^{n \times n}$ of $K'(p_s)$ based on those values. This approach falls under the general framework of quasi-Newton methods.

Quasi-Newton methods have been used intensively for solving linear and nonlinear systems and for minimization problems [20]. Their main attraction is that they avoid the cumbersome computation of derivatives for the Jacobians. Recently, interest in quasi-Newton methods has waned as automatic differentiation has become available [10, 14], except for a recent algorithm by Eirola and Nevanlinna [11, 12], the research performed by Deuflhard (e.g., [8, 9]) and Spedicato, Xia, and Zhang [25], and work on secant-related Krylov–Newton solvers used by Dawson et al. [3].
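To make the iteration (2)-(3) concrete, here is a minimal Python sketch of Newton's method under the stated assumptions; the callables `K` and `Kprime` (residual and exact Jacobian) and the fixed step parameter `theta` are illustrative choices for this sketch, not prescribed by the paper.

```python
import numpy as np

def newton(K, Kprime, p0, theta=1.0, tol=1e-12, max_iter=50):
    """Newton iteration (2)-(3): solve K'(p_s) d_s = -K(p_s),
    then step p_{s+1} = p_s + theta * d_s."""
    p = np.asarray(p0, dtype=float)
    for _ in range(max_iter):
        r = K(p)
        if np.linalg.norm(r) < tol:          # converged
            break
        d = np.linalg.solve(Kprime(p), -r)   # equation (2)
        p = p + theta * d                    # equation (3)
    return p
```

The black box setting of this paper is precisely the case where `Kprime` is unavailable, which motivates replacing it by an approximate Jacobian built from the iterates themselves.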
We are mainly interested in quasi-Newton methods because

• we do not have access to the Jacobian, as we are working with black box systems, which also makes automatic differentiation impossible;
• the cost of a function evaluation is sufficiently high that numerical differentiation becomes prohibitive. For this reason we will judge the performance of the method by the number of function evaluations it needs to converge.

In this paper we propose a new quasi-Newton method that has its origins in [29, 30], where an approximate Jacobian was needed for the strong coupling of fluid-structure interaction problems using two commercially available codes (for the structure and the fluid) that are considered black boxes. The approximate Jacobian was constructed based on input-output combinations of each black box in a least squares sense. Introducing a new black box system $H$ such that $H(p) = K(p) + p$, we are able to apply this method to solve (1) in a matrix-free manner based on the values $(p_i, H(p_i))$ ($i = 0, 1, 2, \ldots, s$) that arise during the iteration process when solving (1). It is shown in this paper that the method can be written as a quasi-Newton method with a rank-one update of the approximate Jacobian after each iterate.

While the method has its origin as a nonlinear solver, we consider only linear systems in this paper. Studying quasi-Newton methods for linear problems is important not only because many problems are linear or nearly linear but also because the properties of a method in the linear case often define the local convergence behavior of the method in the nonlinear case. This can be understood by observing that, close to a solution of (1) where the Jacobian is nonsingular, the linear approximation of $K(p)$ tends to be dominant. Hence, the generated sequence $p_s$ tends to behave like in the linear case. This is the main reason why the local convergence of Newton's method is quadratic [22].

When studying the analytical properties of the method for linear systems, we will pose $H(p) = Ap - b$, where $A \in \mathbb{R}^{n \times n}$ and $p, b \in \mathbb{R}^{n \times 1}$, and where $A - I$ is assumed to be nonsingular, but without further requirements like positive definiteness or sparseness. While most matrix-free solvers assume that $Ap$ can be formed, we assume we can form only $H(p)$. Finding $b$ by computing $H(0)$ is a possibility, but, as we assumed that a "call" of $H$ is very expensive, it is to be avoided.

We show that line-searches cannot improve the method's performance and that it converges in at most $n + 1$ iterations. This answers one of the questions that Martínez left open in [20], namely, the question of whether Broyden-like methods exist that converge in fewer than $2n$ steps for linear systems.

The performance of this new method is compared with that of other known methods that use rank-one updates and that can be used with black boxes, for instance Broyden's first and second method [1, 2], the column updating method [17, 19, 20], and the inverse column updating method [16, 18]. We also compare the method with GMRes.
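As a reference point for the rank-one update family just listed, the sketch below implements Broyden's first ("good") method [1, 2], one of the comparison methods, not the new least squares method of this paper; the initial approximation `B0` (e.g., the identity) and the other names are illustrative.

```python
import numpy as np

def broyden1(K, p0, B0, tol=1e-12, max_iter=100):
    """Broyden's first method: after each step the approximate Jacobian B
    receives a rank-one update chosen so that the updated B satisfies the
    secant equation B_new @ dp = df."""
    p = np.asarray(p0, dtype=float)
    B = np.asarray(B0, dtype=float)
    f = K(p)
    for _ in range(max_iter):
        if np.linalg.norm(f) < tol:
            break
        dp = np.linalg.solve(B, -f)                    # quasi-Newton step
        p_new = p + dp
        f_new = K(p_new)
        df = f_new - f
        B = B + np.outer(df - B @ dp, dp) / (dp @ dp)  # rank-one secant update
        p, f = p_new, f_new
    return p
```

The secant equation $B_{s+1} \Delta p_s = \Delta K_s$ enforced by this update is the same structural property that, per the abstract, the method analyzed in this paper satisfies.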
Even though our method has not originally been developed as a linear solver, the results show that it has comparable, but slightly lower, performance than GMRes when the Euclidean norm of the residual is used as the criterion, and far better overall performance than the other quasi-Newton methods.

The paper is organized as follows: in section 2 we start with some general definitions, conventions, and theorems; in section 3 we analyze the construction of the approximate Jacobian that was proposed in [29], after adaptation to our purposes, and present some theorems regarding its properties and convergence; in sections 4 and 5 a review is given of other known quasi-Newton methods and the GMRes method, respectively; in section 6 the relative performance of the different methods is compared on the heat equation, a test-case proposed by Deuflhard [8], and on a number of standardized test-problems from the Matrix Market repository [21].

2 Introductory definitions and theorems

We will use $e = p - p^*$, $e_s = p_s - p^*$, and so on for the errors. All matrix norms that are used are natural matrix norms, unless otherwise stated. $\langle \cdot, \cdot \rangle$ denotes the standard scalar product between vectors.

Definition 2.1 A natural matrix norm is a matrix norm induced by a vector norm in the following manner (with $M \in \mathbb{R}^{n \times m}$):

$$\|M\| = \sup_{x \neq 0} \frac{\|Mx\|}{\|x\|} \quad \text{or equivalently} \quad \|M\| = \sup_{\|x\| = 1} \|Mx\|. \qquad (4)$$

Definition 2.2 (see [7]) Let $\{x_s\}_{s \in \mathbb{N}}$ be a sequence with $x_s$ and $x^* \in \mathbb{R}^{n \times 1}$. We say that the sequence $\{x_s\}$ converges superlinearly¹ towards $x^*$ with order $\alpha > 1$ if

$$\exists\, \epsilon > 0 : \|x_{s+1} - x^*\| \leq \epsilon \|x_s - x^*\|^{\alpha} \qquad (5)$$

for any arbitrary norm $\|\cdot\|$ in $\mathbb{R}^{n \times 1}$.

Definition 2.3 (see [7]) Let $\{x_s\}_{s \in \mathbb{N}}$ be a sequence with $x_s$ and $x^* \in \mathbb{R}^{n \times 1}$. We say that the sequence $\{x_s\}$ converges superlinearly towards $x^*$ if

$$\lim_{s \to \infty} \frac{\|x_{s+1} - x^*\|}{\|x_s - x^*\|} = 0 \qquad (6)$$

for any arbitrary norm $\|\cdot\|$ in $\mathbb{R}^{n \times 1}$.

Definition 2.4 (see [7]) Let $f : \Omega \subset \mathbb{R}^{n \times 1} \to \mathbb{R}^{m \times 1}$. $f$ is Lipschitz continuous on $\Omega$ if $\exists\, \epsilon > 0$ (the Lipschitz constant) such that

$$\forall\, p_1, p_2 \in \Omega : \|f(p_1) - f(p_2)\| \leq \epsilon \|p_1 - p_2\|. \qquad (7)$$

(If $\epsilon < 1$, then $f$ is called a contraction mapping with respect to the chosen norm.)

Definition 2.5 Any $n + 1$ vectors $x_0, x_1, \ldots, x_n \in \mathbb{R}^{n \times 1}$ are in general position if the vectors $x_n - x_j$ ($j = 0, \ldots, n - 1$) are linearly independent.

Lemma 2.1 $\forall\, u, v \in \mathbb{R}^{n \times 1} : \det(I + uv^T) = 1 + \langle u, v \rangle$.

Proof. Let $P = I + uv^T$. For $u = 0$ or $v = 0$ the proof is trivial.

If $u, v \neq 0$ and $\langle u, v \rangle \neq 0$, then any vector orthogonal to $v$ is a right eigenvector of $P$ (corresponding to an eigenvalue 1) and any multiple of $u$ is also a right eigenvector of $P$ (corresponding to an eigenvalue $1 + \langle u, v \rangle \neq 1$). As there are $n - 1$ linearly independent vectors orthogonal to $v$ and as the algebraic multiplicity of an eigenvalue is larger than or equal to its geometric multiplicity, we see that the algebraic multiplicity of the eigenvalue 1 is at least $n - 1$. As there is another eigenvalue different from 1, the algebraic multiplicity of the eigenvalue 1 must be equal to $n - 1$. As the determinant of a matrix equals the product of its eigenvalues, we have that $\det P = 1 + \langle u, v \rangle$.

If $u, v \neq 0$ and $\langle u, v \rangle = 0$, then $1 + \langle u, v \rangle = 1$. But then

$$(P - I)^2 = (uv^T)^2 = u (v^T u) v^T = 0.$$

So the space of generalized eigenvectors corresponding to the eigenvalue 1 has dimension $n$. Hence, the algebraic multiplicity of the eigenvalue 1 is $n$ and $\det P = 1$. □

¹ These definitions are based on "q-superlinearity" as detailed in [15]; as we use only this type of convergence criterion, we will simply use the term "superlinear."
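Lemma 2.1 is easy to sanity-check numerically; the snippet below (a verification sketch, not part of the paper) compares both sides of $\det(I + uv^T) = 1 + \langle u, v \rangle$ for random vectors.

```python
import numpy as np

# Verify Lemma 2.1: det(I + u v^T) = 1 + <u, v>.
rng = np.random.default_rng(0)
n = 6
u = rng.standard_normal(n)
v = rng.standard_normal(n)
lhs = np.linalg.det(np.eye(n) + np.outer(u, v))
rhs = 1.0 + u @ v
print(abs(lhs - rhs) < 1e-10)  # True (up to rounding)
```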
Theorem 2.1 (fundamental theorem of linear algebra²) Let $M \in \mathbb{R}^{n \times m}$; then $N(M) = (R(M^T))^\perp$, where $N(M)$ is the kernel (or null space) of $M$, $R$ denotes the range, and $(\cdot)^\perp$ gives the orthogonal complement of a vector-space.

For a proof of this theorem we refer to [26].

Theorem 2.2 (the Sherman–Morrison theorem) Let $M \in \mathbb{R}^{n \times n}$ be nonsingular, and let $u, v \in \mathbb{R}^{n \times 1}$ be vectors such that $v^T M^{-1} u \neq -1$; then $M + uv^T$ is nonsingular and

$$\left( M + uv^T \right)^{-1} = M^{-1} - \frac{M^{-1} u v^T M^{-1}}{1 + v^T M^{-1} u}. \qquad (8)$$

For the proof we refer to [24].

Theorem 2.3 Assume that

• $K : \mathbb{R}^{n \times 1} \to \mathbb{R}^{n \times 1}$ is differentiable in an open set $D_K \subset \mathbb{R}^{n \times 1}$;
• the equation $K(p) = 0$ has a solution $p^* \in D_K$;
• $K' : D_K \to \mathbb{R}^{n \times n}$ is Lipschitz continuous with Lipschitz constant $\kappa$;
• $K'(p^*)$ is nonsingular.

Assume that we use a Newton method (equations (2) and (3)) where we replace $K(p_s)$ by $\hat{K}_s = K(p_s) + E_s$ and $K'(p_s)$ by $\hat{K}'_s = K'(p_s) + E_s$; then there exist $\epsilon$, $\delta$, and $\tau$ such that if $p_s \in B(p^*, \delta)$ and $\|E_s\| \leq \tau$, then the following properties hold:

1. $\hat{K}'_s$ is nonsingular.
2. $\|e_{s+1}\| \leq \epsilon \left( \|e_s\|^2 + \|E_s\| \|e_s\| + \|E_s\| \right)$.

For the proof of this theorem we refer to [15].

Lemma 2.2 Let $V \in \mathbb{R}^{n \times s}$ be a matrix with column-rank $s$; then

$$V \left( V^T V \right)^{-1} V^T = L_s L_s^T, \qquad (9)$$

with $L_s = [\bar{L}_1 \,|\, \bar{L}_2 \,|\, \ldots \,|\, \bar{L}_s]$, where $\bar{L}_k$ is the $k$th left (normalized) singular vector of $V$.

Proof. We can write the singular value decomposition³ of $V$ as $V = LSR^T$, where the singular values are given by $\sigma_i$ ($i = 1, \ldots, s$). According to the conventions of the singular value decomposition we have $S_{ii} = \sigma_i$ and $S_{ij} = 0$ when $i \neq j$. Then we can write

$$V \left( V^T V \right)^{-1} V^T = LSR^T \left( R S^T L^T L S R^T \right)^{-1} R S^T L^T. \qquad (10)$$

As $L$ is a unitary matrix this simplifies to

$$V \left( V^T V \right)^{-1} V^T = LSR^T \left( R S^T S R^T \right)^{-1} R S^T L^T. \qquad (11)$$

As $R$ is also unitary, $(R S^T S R^T)^{-1} = R (S^T S)^{-1} R^T$, so that

$$V \left( V^T V \right)^{-1} V^T = L S \left( S^T S \right)^{-1} S^T L^T.$$

Since $S (S^T S)^{-1} S^T \in \mathbb{R}^{n \times n}$ has ones in its first $s$ diagonal entries and zeros elsewhere, only the first $s$ columns of $L$ remain, and $V (V^T V)^{-1} V^T = L_s L_s^T$. □

³ Different conventions exist for the singular value decomposition. We will use the one where $L \in \mathbb{R}^{n \times n}$, $S \in \mathbb{R}^{n \times s}$, and $R \in \mathbb{R}^{s \times s}$ and where the singular values are ordered in a nonincreasing way.
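Lemma 2.2 can likewise be verified numerically; the snippet below (an illustration, not from the paper) checks identity (9) by comparing the projector $V (V^T V)^{-1} V^T$ with $L_s L_s^T$ built from the first $s$ left singular vectors; NumPy's SVD uses the same nonincreasing ordering as footnote 3.

```python
import numpy as np

# Verify Lemma 2.2: V (V^T V)^{-1} V^T = L_s L_s^T for a full column-rank V.
rng = np.random.default_rng(0)
n, s = 8, 3
V = rng.standard_normal((n, s))        # full column-rank with probability 1

P = V @ np.linalg.inv(V.T @ V) @ V.T   # orthogonal projector onto range(V)
L, sigma, RT = np.linalg.svd(V)        # full SVD: V = L S R^T
Ls = L[:, :s]                          # first s (normalized) left singular vectors
print(np.allclose(P, Ls @ Ls.T))       # True
```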