Generative Models for Discovering Sparse Distributed Representations

Geoffrey E. Hinton and Zoubin Ghahramani
Department of Computer Science
University of Toronto
Toronto, Ontario, M5S 1A4, Canada
hinton@cs.toronto.edu, zoubin@cs.toronto.edu

May 9, 1997

A modified version to appear in Philosophical Transactions of the Royal Society B, 1997.

Abstract

We describe a hierarchical, generative model that can be viewed as a non-linear generalization of factor analysis and can be implemented in a neural network. The model uses bottom-up, top-down and lateral connections to perform Bayesian perceptual inference correctly. Once perceptual inference has been performed, the connection strengths can be updated using a very simple learning rule that only requires locally available information. We demonstrate that the network learns to extract sparse, distributed, hierarchical representations.

1 Introduction

Many neural network models of visual perception assume that the sensory input arrives at the bottom, visible layer of the network and is then converted by feedforward connections into successively more abstract representations in successive hidden layers. Such models are biologically unrealistic because they do not allow for top-down effects when perceiving noisy or ambiguous data (Mumford, 1994; Gregory, 1970) and they do not explain the prevalence of top-down connections in cortex. In this paper, we take seriously the idea that vision is inverse graphics (Horn, 1977) and so we start with a stochastic, generative neural network that uses top-down connections to convert an abstract representation of a scene into an intensity image. This neurally instantiated graphics model is learned, and the top-down connection strengths contain the network's visual knowledge of the world. Visual perception consists of inferring the underlying state of the stochastic graphics model using the false but useful assumption that the observed sensory input was generated by the model. Since the top-down graphics model is stochastic, there are usually many different states of the hidden units that could have generated the same image, though some of these hidden state configurations are typically much more probable than others. For the simplest generative models, it is tractable to represent the entire posterior probability distribution over hidden configurations that results from observing an image. For more complex models, we shall have to be content with a perceptual inference process that picks one or a few configurations roughly according to their posterior probabilities (Hinton and Sejnowski, 1983).

One advantage of starting with a generative model is that it provides a natural specification of what visual perception ought to do. For example, it specifies exactly how top-down expectations should be used to disambiguate noisy data without unduly distorting reality. Another advantage is that it provides a sensible objective function for unsupervised learning. Learning can be viewed as maximizing the likelihood of the observed data under the generative model. This is mathematically
equivalent to discovering efficient ways of coding the sensory data, because the data could be communicated to a receiver by sending the underlying states of the generative model, and this is an efficient code if and only if the generative model assigns high probability to the sensory data.

In this paper we present a sequence of progressively more sophisticated generative models. For each model, the procedures for performing perceptual inference and for learning the top-down weights follow naturally from the generative model itself. We start with two very simple models, factor analysis and mixtures of Gaussians, that were first developed by statisticians. Many of the existing models of how cortex learns are actually even simpler versions of these statistical approaches in which certain variances have been set to zero. We explain factor analysis and mixtures of Gaussians in some detail. To clarify the relationships between these statistical methods and neural network models, we describe the statistical methods as neural networks that can both generate data using top-down connections and perform perceptual interpretation of observed data using bottom-up connections. We then describe a historical sequence of more sophisticated hierarchical, non-linear generative models and the learning algorithms that go with them. We conclude with a new model, the rectified Gaussian belief net, and present examples where it is very effective at discovering hierarchical sparse distributed representations of the type advocated by Barlow (1989) and Olshausen and Field (1996). The new model makes strong suggestions about the role of both top-down and lateral connections in cortex and it also suggests why topographic maps are so prevalent.

2 Mixtures of Gaussians

A mixture of Gaussians is a model that describes some real data points in terms of underlying Gaussian clusters. There are three aspects of this model which we shall discuss. First, given parameters that specify the means, variances and mixing proportions of the clusters, the model defines a generative distribution which assigns a probability to any possible data point. Second, given the parameter values and a data point, the perceptual interpretation process infers the posterior probability that the data came from each of the clusters. Third, given a set of observed data points, the learning process adjusts the parameter values to maximize the probability that the generative model would produce the observed data.

Viewed as a neural network, a mixture of Gaussians consists of a layer of visible units whose state vector represents a data point and a layer of hidden units each of which represents a cluster (see figure 1). To generate a data point we first pick one of the hidden units, $j$, with a probability $\pi_j$ and give it a state $s_j = 1$. All other hidden states are set to 0. The generative weight vector of the hidden unit, $\mathbf{g}_j$, represents the mean of a Gaussian cluster. When unit $j$ is activated it sends a top-down input of $g_{ji}$ to each visible unit, $i$. Local, zero-mean, Gaussian noise with variance $\sigma_i^2$ is added to the top-down input to produce a sample from an axis-aligned Gaussian that has mean $\mathbf{g}_j$ and a covariance matrix that has the $\sigma_i^2$ terms along the diagonal and zeros elsewhere. The probability of generating a particular vector of visible states, $\mathbf{d}$ with elements $d_i$, is therefore:

\[ p(\mathbf{d}) = \sum_j \pi_j \prod_i \frac{1}{\sqrt{2\pi}\,\sigma_i} \, e^{-(d_i - g_{ji})^2 / 2\sigma_i^2} \qquad (1) \]

Figure 1: A generative neural network for mixtures of Gaussians.
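As a concrete illustration of this top-down generative process and of Eq. 1, the following NumPy sketch samples data points from a mixture of Gaussians and evaluates their log likelihood. The toy parameter values (three hidden units, two visible units) and the helper names `generate` and `log_likelihood` are hypothetical choices made here for illustration, not part of the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy parameters: three hidden units (clusters), two visible units.
pi    = np.array([0.5, 0.3, 0.2])       # mixing proportions pi_j, one per hidden unit j
g     = np.array([[ 0.0, 0.0],          # generative weights g_ji: row j is the mean of cluster j
                  [ 3.0, 3.0],
                  [-2.0, 4.0]])
sigma = np.array([0.5, 1.0])            # per-visible-unit noise standard deviations sigma_i

def generate(n_samples):
    """Top-down generation: pick hidden unit j with probability pi_j (s_j = 1),
    send g_ji to each visible unit i, then add local zero-mean Gaussian noise."""
    j = rng.choice(len(pi), size=n_samples, p=pi)
    return g[j] + rng.normal(0.0, sigma, size=(n_samples, len(sigma)))

def log_likelihood(d):
    """log p(d) for one data vector d, following Eq. 1."""
    # log of the axis-aligned Gaussian density of d under each cluster j (the product over i in Eq. 1)
    log_p_d_given_j = -0.5 * np.sum(((d - g) / sigma) ** 2 + np.log(2 * np.pi * sigma ** 2), axis=1)
    # p(d) = sum_j pi_j p(d | s_j = 1), accumulated stably in the log domain
    return np.logaddexp.reduce(np.log(pi) + log_p_d_given_j)

data = generate(5)
print([round(log_likelihood(d), 3) for d in data])
```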
Interpreting a data point, $\mathbf{d}$, consists of computing the posterior probability that it was generated from each of the hidden units, assuming that it must have come from one of them. Each hidden unit, $j$, first computes the probability density of the data point under its Gaussian model:

\[ p(\mathbf{d} \mid s_j = 1) = \prod_i \frac{1}{\sqrt{2\pi}\,\sigma_i} \, e^{-(d_i - g_{ji})^2 / 2\sigma_i^2} \qquad (2) \]

These conditional probabilities are then weighted by the mixing proportions, $\pi_j$, and normalized to give the posterior probability, or responsibility, of each hidden unit, $j$, for the data point. By Bayes' theorem:

\[ p(s_j = 1 \mid \mathbf{d}) = \frac{\pi_j \, p(\mathbf{d} \mid s_j = 1)}{\sum_k \pi_k \, p(\mathbf{d} \mid s_k = 1)} \qquad (3) \]

The computation of $p(\mathbf{d} \mid s_j = 1)$ in Eq. 2 can be done very simply by using recognition connections, $r_{ij}$, from the visible to the hidden units. The recognition connections are set equal to the generative connections, $r_{ij} = g_{ji}$. The normalization in Eq. 3 could be done by using direct lateral connections or interneurons to ensure that the total activity in the hidden layer is a constant.

Learning consists of adjusting the generative parameters $\mathbf{g}$, $\sigma$, $\pi$ so as to maximize the product of the probabilities assigned to all the observed data points by Eq. 1. An efficient way to perform the learning is to sweep through all the observed data points computing $p(s_j = 1 \mid \mathbf{d})$ for each hidden unit and then to reset all the generative parameters in parallel. Angle brackets are used to denote averages over the training data.

\[ \mathbf{g}_j(\mathit{new}) = \frac{\langle p(s_j = 1 \mid \mathbf{d}) \, \mathbf{d} \rangle}{\langle p(s_j = 1 \mid \mathbf{d}) \rangle} \qquad (4) \]

\[ \sigma_i^2(\mathit{new}) = \Big\langle \sum_j p(s_j = 1 \mid \mathbf{d}) \, (d_i - g_{ji})^2 \Big\rangle \qquad (5) \]

\[ \pi_j(\mathit{new}) = \langle p(s_j = 1 \mid \mathbf{d}) \rangle \qquad (6) \]

This is a version of the "Expectation and Maximization" algorithm (Dempster et al., 1977) and is guaranteed to raise the likelihood of the observed data unless it is already at a local optimum. The computation of the posterior probabilities of the hidden states given the data (i.e. perceptual inference) is called the E-step and the updating of the parameters is called the M-step.

Instead of performing an M-step after a full sweep through the data, it is possible to use an online gradient algorithm that uses the same posterior probabilities of hidden states but updates each generative weight using a version of the delta rule with a learning rate of $\epsilon$:

\[ \Delta g_{ji} = \epsilon \, p(s_j = 1 \mid \mathbf{d}) \, (d_i - g_{ji}) \qquad (7) \]
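A minimal sketch of Eqs. 2-7 in NumPy, reusing the hypothetical `g`, `sigma`, `pi` arrays from the previous snippet: the E-step computes the responsibilities of Eq. 3, the batch M-step resets all parameters in parallel as in Eqs. 4-6, and `online_delta_update` applies the delta-rule variant of Eq. 7. The function names and the learning-rate value are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def responsibilities(D, g, sigma, pi):
    """E-step / perceptual inference (Eqs. 2-3): p(s_j = 1 | d) for every data
    vector d (rows of D) and every hidden unit j (columns of the result)."""
    diff = D[:, None, :] - g[None, :, :]                      # (n_data, n_hidden, n_visible)
    # log p(d | s_j = 1) from Eq. 2, computed in the log domain for stability
    log_p = -0.5 * np.sum((diff / sigma) ** 2 + np.log(2 * np.pi * sigma ** 2), axis=2)
    log_post = np.log(pi) + log_p                             # unnormalized log posterior
    log_post -= log_post.max(axis=1, keepdims=True)
    r = np.exp(log_post)
    return r / r.sum(axis=1, keepdims=True)                   # Bayes normalization of Eq. 3

def m_step(D, r):
    """Batch M-step (Eqs. 4-6): reset all generative parameters in parallel.
    The angle-bracket averages become means over the rows of D."""
    g_new = (r.T @ D) / r.sum(axis=0)[:, None]                                               # Eq. 4
    var_new = np.mean(np.sum(r[:, :, None] * (D[:, None, :] - g_new) ** 2, axis=1), axis=0)  # Eq. 5
    pi_new = r.mean(axis=0)                                                                  # Eq. 6
    return g_new, np.sqrt(var_new), pi_new

def online_delta_update(d, g, sigma, pi, lr=0.01):
    """Online alternative to the full M-step (Eq. 7):
    delta g_ji = lr * p(s_j = 1 | d) * (d_i - g_ji) for a single data point d."""
    r = responsibilities(d[None, :], g, sigma, pi)[0]
    return g + lr * r[:, None] * (d[None, :] - g)

# One full EM sweep over a hypothetical data matrix D (rows are data points) would be:
#   r = responsibilities(D, g, sigma, pi)
#   g, sigma, pi = m_step(D, r)
```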
The k-means algorithm (a form of vector quantization) is the limiting case of a mixture of Gaussians model where the variances are assumed equal and infinitesimal and the $\pi_j$ are assumed equal. Under these assumptions the posterior probabilities in Eq. 3 go to binary values, with $p(s_j = 1 \mid \mathbf{d}) = 1$ for the Gaussian whose mean is closest to $\mathbf{d}$ and 0 otherwise. Competitive learning algorithms (e.g. Rumelhart and Zipser, 1985) can generally be viewed as ways of fitting mixture of Gaussians generative models. They are usually inefficient because they do not use a full M-step, and slightly wrong because they pick a single winner among the hidden units instead of making the states proportional to the posterior probabilities.

Kohonen's self-organizing maps (Kohonen, 1982), Durbin and Willshaw's elastic net (1987), and the generative topographic map (Bishop et al., in press) are variations of vector quantization or mixture of Gaussians models in which additional constraints are imposed that force neighboring hidden units to have similar generative weight vectors. These constraints typically lead to a model of the data that is worse when measured by Eq. 1. So in these models, topographic maps are not a natural consequence of trying to maximize the likelihood of the data. They are imposed on the mixture model to make the solution easier to interpret and more brain-like. By contrast, the algorithm we present later has to produce topographic maps to maximize the data likelihood in a sparsely connected net.

Because the recognition weights are just the transpose of the generative weights, and because many researchers do not think in terms of generative models, neural network models that perform competitive learning typically only have the recognition weights required for perceptual inference. The weights are learned by applying the rule that is appropriate for the generative weights. This makes the model much simpler to implement but harder to understand.

Neural net models of unsupervised learning that are derived from mixtures have simple learning rules and produce representations that are a highly non-linear function of the data, but they suffer from a disastrous weakness in their representational abilities. Each data point is represented by the identity of the winning hidden unit (i.e. the cluster it belongs to). So for the representation to contain, on average, $n$ bits of information about the data, there must be at least $2^n$ hidden units.[1]

[1] This point is often obscured by the fact that the posterior distribution is a vector of real-valued states across the hidden units. This vector contains a lot of information about the data, and supervised Radial Basis Function networks make use of this rich information. However, from the generative or coding viewpoint, the posterior distribution must be viewed as a probability distribution across discrete, impoverished representations, not as a real-valued representation.