
A Successive Overrelaxation Back Propagation Algorithm for Neural Network Training

IEEE Transactions on Neural Networks, Vol. XX, No. Y, Month 1999

Renato De Leone, Rosario Capparuccia, Emanuela Merelli

Renato De Leone, Rosario Capparuccia: Dipartimento di Matematica e Fisica, Università di Camerino, Camerino, ITALY, email: deleone@camserv.unicam.it, mc3155@mclink.it. Emanuela Merelli: Computing School, Università di Camerino, Camerino, ITALY, email: merelli@camcic.unicam.it. Research of the authors was supported in part by a grant from the Italian National Research Council.

Abstract: A variation of the classical Back-Propagation algorithm for neural network training is proposed and convergence is established using the perturbation results of Mangasarian and Solodov [1]. The algorithm is similar to the Successive Overrelaxation (SOR) algorithm for systems of linear equations and linear complementarity problems in using the most recently computed values of the weights to update the values on the remaining arcs.

Keywords: Back Propagation, Successive Overrelaxation.

I. Introduction

In recent years great attention has been devoted to neural network algorithms for the solution of difficult real-world problems, from visual pattern recognition [2] to English text pronunciation [3], from protein secondary structure prediction [4] to speech recognition [5]. Based on a biological analogy [6], neural networks try to emulate the human brain's ability to learn from examples, to learn from incomplete data and, especially, to generalize concepts.

A neural network is composed of a set of nodes connected by arcs. To each arc is associated a real number that represents the weight on the connection. In this paper we are only interested in feed-forward networks [7]. We assume that the nodes are partitioned into three layers or levels: the input layer, the middle or hidden layer, and the output layer. Connections exist only between nodes in the input and the hidden layer and between nodes in the hidden and the output layer. Despite the simplicity of the model, any Boolean function can be represented in this model [8], [9].

The two most important problems that must be solved in using neural networks are the definition of the topology of the network (i.e., the number of nodes in each level and the connections) [10, Chapter 10] and the determination of the weights on the connecting arcs. For the problem of determining these weights (also called the learning problem) many algorithms have been proposed. The Back-Propagation algorithm is probably the best known of these algorithms [11]. It consists of two phases. In the first phase, called the forward phase, an input is presented to the network, then propagated forward, and output values are computed. These values are compared with the expected output values, and in the second phase (the backward phase) the weights on the connections are modified according to the difference between the expected and the computed values. The Back-Propagation algorithms can be considered simplified (or partial) gradient descent algorithms. Despite the simplicity of the algorithm, until recently it was only known that, under stochastic assumptions, the sequence produced by the Back-Propagation algorithm either diverges or converges almost surely [12]. Deterministic convergence of the Back-Propagation sequence was proved by Mangasarian and Solodov [1], Grippo [13] and Luo and Tseng [14]. In particular, in [1] it is shown that the algorithm can be viewed as an ordinary perturbed gradient-type algorithm for unconstrained minimization. In [13] a regularization term is added to the error function, while in [14] a projection onto a box is introduced to ensure that the iterates produced by the algorithm remain bounded, and therefore convergence can be established.

In this paper we will discuss and prove convergence of a variant of the Back-Propagation algorithm similar in spirit to the Successive Overrelaxation (SOR) algorithm for systems of linear equations and linear complementarity problems [15], [16].
We call our algorithm the BPSOR algorithm to distinguish it from the standard algorithm (which we call BPJOR). The convergence proof follows quite closely the proof given in [1].

The paper is organized as follows. In Section II we introduce the notation and the optimization problem we need to solve to determine the weights. The standard Back-Propagation algorithm and the proposed BPSOR are presented in Section III. In Section IV we introduce a more general optimization problem and an algorithm that reduces to the BPSOR algorithm when applied to the minimization problem introduced in Section II. Convergence of this algorithm is shown in Section V. Finally, in Section VI, we show that the assumptions needed in the convergence theorem are satisfied for the functions used in neural networks.

We briefly describe our notation now. All vectors are column vectors. Subscripts indicate components of a vector, while superscripts are used to identify different vectors. The symbol ‖x‖ indicates the Euclidean norm of a vector x. The superscript T indicates transpose. The first derivative of a real function h of a single variable will be indicated by h'. The gradient of a function f from IR^n to IR will be denoted by ∇f; the derivative with respect to a single component x_i will be indicated by ∂f/∂x_i. The set LC¹_L(IR^n) is the set of all Lipschitz continuous functions from IR^n to IR with Lipschitz constant L.

II. Problem notation and definition

In this section we introduce the notation for our neural network problem and the optimization problem we need to solve in order to determine the weights on the connections. We assume that a finite number of instances (patterns) of inputs and corresponding output values are given. With ξ_k, k = 1,...,K, we denote the inputs for a specific pattern; the corresponding desired output values for the same pattern will be indicated by ζ_i, i = 1,...,I. We assume that M patterns are presented. The neural network we consider is a layered feed-forward network with three levels and J hidden units. The total number of nodes is therefore K + I + J. We will indicate with

w_jk the weights for the arcs from the input level to the hidden level nodes, j = 1,...,J, k = 1,...,K,

W_ij the weights for the arcs from the hidden level to the output level nodes, i = 1,...,I, j = 1,...,J.

For each hidden unit j = 1,...,J define:

  h_j := Σ_{k=1}^{K} w_jk ξ_k      (1)

and

  o_j := g_j(h_j) = g_j(Σ_{k=1}^{K} w_jk ξ_k)      (2)

where g_j is the activation function for the j-th node in the hidden layer. Similarly, for each output unit i = 1,...,I, define:

  H_i := Σ_{j=1}^{J} W_ij o_j      (3)

and

  O_i := G_i(H_i) = G_i(Σ_{j=1}^{J} W_ij o_j)      (4)

where G_i is the activation function for the i-th node in the output layer. The objective function we want to minimize is:

  E(w, W) := (1/2) Σ_{μ=1}^{M} Σ_{i=1}^{I} (ζ_i − O_i)²      (5)

where O_i is given by (4) and, for each pattern μ, the quantities ξ_k and ζ_i are the inputs and desired outputs of that pattern. The individual error functions are defined by:

  E_μ(w, W) := (1/2) Σ_{i=1}^{I} (ζ_i − O_i)²      (6)

and therefore we have

  E(w, W) = Σ_{μ=1}^{M} E_μ(w, W).

The components of the gradient of the individual error functions E_μ are given by:

  ∂E_μ(w, W)/∂W_qr = − [ζ_q − G_q(Σ_{j=1}^{J} W_qj g_j(Σ_{k=1}^{K} w_jk ξ_k))] G'_q(Σ_{j=1}^{J} W_qj g_j(Σ_{k=1}^{K} w_jk ξ_k)) g_r(Σ_{k=1}^{K} w_rk ξ_k)      (7)

and

  ∂E_μ(w, W)/∂w_st = − Σ_{i=1}^{I} [ζ_i − G_i(Σ_{j=1}^{J} W_ij g_j(Σ_{k=1}^{K} w_jk ξ_k))] G'_i(Σ_{j=1}^{J} W_ij g_j(Σ_{k=1}^{K} w_jk ξ_k)) W_is g'_s(Σ_{k=1}^{K} w_sk ξ_k) ξ_t      (8)

where G' and g' are the first derivatives of G and g, respectively.
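To make the forward computations (1)-(4) and the gradient components (7)-(8) concrete, the following Python fragment evaluates them for a single pattern. It is only a minimal sketch of the formulas above, under the assumption that all activation functions g_j and G_i are the logistic sigmoid; the helper names (forward, pattern_error_grad) and the NumPy interface are ours, not the paper's.

import numpy as np

def sigmoid(x):
    # logistic activation, used here for both g_j and G_i
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_prime(x):
    s = sigmoid(x)
    return s * (1.0 - s)

def forward(w, W, xi):
    # w: J x K input-to-hidden weights, W: I x J hidden-to-output weights,
    # xi: input pattern of length K
    h = w @ xi          # h_j, eq. (1)
    o = sigmoid(h)      # o_j = g_j(h_j), eq. (2)
    H = W @ o           # H_i, eq. (3)
    O = sigmoid(H)      # O_i = G_i(H_i), eq. (4)
    return h, o, H, O

def pattern_error_grad(w, W, xi, zeta):
    # individual error (6) and its gradient components (7)-(8) for one pattern
    h, o, H, O = forward(w, W, xi)
    E_mu = 0.5 * np.sum((zeta - O) ** 2)              # eq. (6)
    delta_out = -(zeta - O) * sigmoid_prime(H)        # common factor in (7) and (8)
    dE_dW = np.outer(delta_out, o)                    # eq. (7), shape I x J
    delta_hid = (W.T @ delta_out) * sigmoid_prime(h)  # back-propagated term in (8)
    dE_dw = np.outer(delta_hid, xi)                   # eq. (8), shape J x K
    return E_mu, dE_dw, dE_dW

With w of shape J x K, W of shape I x J and a pattern (xi, zeta), pattern_error_grad returns E_μ together with the two gradient blocks that appear in the updates of Section III.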
III. The Back-Propagation algorithm

We are now ready to state the standard Back-Propagation algorithm. We call this algorithm BPJOR to distinguish it from the modified algorithm we will introduce later in this section.

In the standard Back-Propagation algorithm, to obtain the new weights (w^new, W^new) from the previous weights (w^old, W^old) we need to:

(i) choose a pattern μ;
(ii) update the weights using the formula:

  (w^new, W^new) = (w^old, W^old) − η ∇E_μ(w^old, W^old).

The generic step of the BPJOR algorithm can be realized as follows:

(a) choose a pattern μ;
(b) compute h_j using (1) for all j = 1,...,J;
(c) compute H_i using (3) for all i = 1,...,I;
(d) update the weights for the arcs from the hidden to the output layer:

  W_rq^new = W_rq^old + η [ζ_q − G_q(H_q)] G'_q(H_q) g_r(h_r)

for r = 1,...,J and q = 1,...,I;
(e) update the weights for the arcs from the input to the hidden layer:

  w_st^new = w_st^old + η Σ_{i=1}^{I} [ζ_i − G_i(H_i)] G'_i(H_i) W_is^old g'_s(h_s) ξ_t

for s = 1,...,J and t = 1,...,K.

The variant of the Back-Propagation algorithm we introduce here differs from the standard algorithm in the use of the most recent information for the updating of the weights. For this reason we call this algorithm BPSOR. The generic step of the BPSOR algorithm can be realized as follows:

(a) choose a pattern μ;
(b) compute h_j using (1) for all j = 1,...,J;
(c) compute H_i using (3) for all i = 1,...,I;
(d) update the weights for the arcs from the hidden to the output layer and the quantities H_q:

  W_rq^new = W_rq^old + η [ζ_q − G_q(H_q)] G'_q(H_q) g_r(h_r)
  H_q ← H_q + (W_rq^new − W_rq^old) g_r(h_r)

for r = 1,...,J and q = 1,...,I;
(e) update the weights for the arcs from the input to the hidden layer and the quantities h_s:

  w_st^new = w_st^old + η Σ_{i=1}^{I} [ζ_i − G_i(H_i)] G'_i(H_i) W_is^new g'_s(h_s) ξ_t
  h_s ← h_s + (w_st^new − w_st^old) ξ_t

for s = 1,...,J and t = 1,...,K.

Concerning the above algorithm we make the following observations:

- the updating of the weights always uses the most recently computed information about the weights on the other arcs; for this reason we call this algorithm BPSOR;
- for each node at the top (output) level, the updating for the first of the entering arcs is identical for both the BPSOR and the BPJOR; the two algorithms differ in the updating of the other arcs;
- for each node in the middle (hidden) layer, the updating for the first of the entering arcs in the BPSOR and the BPJOR differs in the use of W_is^old (for BPJOR) instead of W_is^new (for BPSOR).
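The difference between the two update rules can also be seen in code. The sketch below, which reuses the sigmoid, sigmoid_prime and forward helpers from the previous fragment, is our illustration of one BPJOR step and one BPSOR step for a single pattern; it assumes logistic activations and is not taken from the paper.

def bpjor_step(w, W, xi, zeta, eta):
    # one BPJOR step (a)-(e): every update uses only the old weights
    h, o, H, O = forward(w, W, xi)
    delta_out = (zeta - O) * sigmoid_prime(H)        # [zeta_q - G_q(H_q)] G'_q(H_q)
    W_new = W + eta * np.outer(delta_out, o)         # step (d)
    delta_hid = (delta_out @ W) * sigmoid_prime(h)   # step (e), uses W^old
    w_new = w + eta * np.outer(delta_hid, xi)
    return w_new, W_new

def bpsor_step(w, W, xi, zeta, eta):
    # one BPSOR step: each single-arc update sees the most recently computed
    # weights, and the quantities H_q and h_s are refreshed immediately
    h, o, H, O = forward(w, W, xi)
    n_out, n_hid = W.shape
    n_in = xi.shape[0]
    W_new = W.copy()
    for q in range(n_out):                           # output node q
        for r in range(n_hid):                       # entering arc r, step (d)
            W_new[q, r] = W[q, r] + eta * (zeta[q] - sigmoid(H[q])) \
                                        * sigmoid_prime(H[q]) * o[r]
            H[q] += (W_new[q, r] - W[q, r]) * o[r]   # refresh H_q
    w_new = w.copy()
    for s in range(n_hid):                           # hidden node s
        for t in range(n_in):                        # entering arc t, step (e)
            delta = np.sum((zeta - sigmoid(H)) * sigmoid_prime(H) * W_new[:, s])
            w_new[s, t] = w[s, t] + eta * delta * sigmoid_prime(h[s]) * xi[t]
            h[s] += (w_new[s, t] - w[s, t]) * xi[t]  # refresh h_s
    return w_new, W_new

In bpsor_step the inner loops make the Gauss-Seidel character explicit: as soon as an arc weight changes, the stored quantities H_q and h_s are corrected so that every subsequent update already reflects it.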
IV. The general problem

In this section we introduce a general minimization problem and show convergence for a new minimization algorithm that reduces to the BPSOR algorithm. Our objective is the unconstrained minimization problem

  minimize f(x)

where f: IR^n → IR is a continuously differentiable function given by the summation of a finite number of functions f_j(x):

  f(x) = Σ_{j=1}^{N} f_j(x).      (9)

The algorithm we propose here is a variation of the serial algorithm studied in [1] and the convergence proof follows closely the Mangasarian-Solodov proof.

SUM-SOR Algorithm

(i) Start with any x^0 ∈ IR^n and set i = 0. Having x^i, stop if ∇f(x^i) = 0, else compute x^{i+1} as follows.
(ii) Choose η_i ∈ (0, 1) and set z^{i,1} = x^i.
(iii) For j = 1,...,N, having z^{i,j}, compute z^{i,j+1} as follows: for each k = 1,...,n compute the k-th component of z^{i,j+1} from the formula

  z_k^{i,j+1} = z_k^{i,j} − η_i (∂f_j/∂x_k)(z^{i,j,k}),

where

  z^{i,j,k} = (z_1^{i,j+1}, ..., z_{k−1}^{i,j+1}, z_k^{i,j}, ..., z_n^{i,j})^T.

(iv) Set x^{i+1} = z^{i,N+1}.

In steps (ii) and (iii) above we introduced the vectors z^{i,j} and z^{i,j,k}. For a fixed value of j we compute the new k-th component of the vector z^{i,j+1} by a step along the gradient of the function f_j(·) evaluated at the point z^{i,j,k}, which has the first k − 1 components equal to the components of z^{i,j+1} and the remaining components equal to those of z^{i,j}. Therefore, the most recent information is used in obtaining the newer components of the vector z^{i,j+1}.

V. Convergence of the SUM-SOR algorithm

To show convergence of the algorithm presented in the previous section we show that it satisfies the criteria of Theorem 2.1 and Corollary 2.1 in [1]. In particular, we must show (similarly to Theorem 3.1 in [1]) that an appropriate choice of the parameter η_i (the learning rate in neural network terminology) exists that guarantees that for each accumulation point x̄ of the sequence {x^i} we have ∇f(x̄) = 0.

Theorem V.1: Let S ⊂ IR^n be any bounded set. Suppose that the function f satisfies the conditions

  f ∈ LC¹_L(IR^n),   ‖∇f(x)‖ ≤ M,   f(x) ≥ f̃,   ∀ x ∈ IR^n,      (10)

for some M > 0 and some f̃. If the learning rates η_i are chosen in such a way that

  Σ_{i=0}^{∞} η_i = ∞,   Σ_{i=0}^{∞} η_i² < ∞,      (11)

then, for any sequence {x^i} ⊂ S generated by the SUM-SOR Algorithm, it follows that {f(x^i)} converges, {∇f(x^i)} → 0, and for each accumulation point x̄ of the sequence {x^i}, ∇f(x̄) = 0.

Proof: We shall show that the assumptions of Theorem 2.2 in [1] are satisfied.
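A compact Python sketch of the SUM-SOR iteration may help fix ideas. The interface grad_components (a list of callables, one per f_j, returning the partial derivative ∂f_j/∂x_k at a given point), the learning-rate schedule η_i = η_0/(i + 1), and the step-size based stopping test are our assumptions for illustration only; the schedule is simply one choice that satisfies condition (11), and the stopping test is a practical stand-in for the exact test ∇f(x^i) = 0 in step (i).

import numpy as np

def sum_sor(x0, grad_components, N, eta0=0.5, max_iter=1000, tol=1e-8):
    # grad_components[j](z, k) returns the partial derivative of f_j with
    # respect to x_k evaluated at z (hypothetical interface, not the paper's)
    x = np.asarray(x0, dtype=float).copy()
    n = x.size
    for i in range(max_iter):
        eta_i = eta0 / (i + 1)       # satisfies (11): the sum diverges, the squares are summable
        z = x.copy()                 # z^{i,1} = x^i
        for j in range(N):           # sweep over the functions f_1, ..., f_N
            for k in range(n):
                # at this point z equals z^{i,j,k}: components 0..k-1 already
                # updated (z^{i,j+1}), components k..n-1 still equal to z^{i,j}
                z[k] -= eta_i * grad_components[j](z, k)
        x_next = z                   # x^{i+1} = z^{i,N+1}
        if np.linalg.norm(x_next - x) < tol:   # practical surrogate for the gradient test
            return x_next
        x = x_next
    return x

The in-place update of z is exactly what realizes the vectors z^{i,j,k}: each partial derivative is evaluated at a point that already contains every component computed so far.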