
Bertin's Graphics and Multidimensional Data Analysis

Chapter 3
Jean-Hugues Chauchat and Alban Risson

1 Introduction

The objective of this chapter is to show how Bertin's graphics are a straightforward and accurate method for communicating the results of some multidimensional statistical methods such as principal components analysis, correspondence analysis, and cluster analysis. These graphics remain true to the original data, using only permutations of rows and columns of the data matrix.

The idea of permuting the rows and columns of a matrix for the purpose of revealing hidden structure in a data matrix is an old one: the pioneering work was done by Sir W. M. Flinders Petrie almost a century ago. He was looking for a sequence in prehistoric remains, that is, a chronological seriation. As noticed by Arabie et al. (1978), Caraux (1984), and Marcotorchino (1987), this idea is having an increasing influence in applied mathematics, especially in the behavioral sciences.

Bertin (1967, 1981) laid histograms side by side, using an appropriate scale, and permuted the elements to reveal underlying structures in the data. We consider two types of statistical methods that can help us to discover rapidly the best pair of permutations of the rows and columns of the table among the n! × p! possible solutions: (1) identification of a diagonal pattern when it exists, for example, a predominant factor in correspondence analysis or principal components analysis, and (2) classification of rows and columns by cluster analysis.

The first type of solution (diagonal pattern) is known as seriation or ordination, the second type as block seriation or cliques. In both cases, Bertin's graphics give an easily understandable visual representation of the results of the statistical data analysis; each bit of information, each entry of the table, is presented in its original form, with only the order of the rows and columns changed.
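The claim that a permutation leaves every entry intact can be checked mechanically. The following is a minimal Python sketch, assuming NumPy is available; the 3 × 4 table, the two orderings, and the helper name `permute` are invented for illustration and do not come from the chapter.

```python
import math
import numpy as np

# Hypothetical 3 x 4 table of non-negative counts (values invented for illustration).
table = np.array([
    [5, 0, 2, 9],
    [0, 7, 1, 0],
    [3, 1, 8, 4],
])

def permute(matrix, row_order, col_order):
    """Return the matrix with rows and columns rearranged; no value is altered."""
    return matrix[np.ix_(row_order, col_order)]

row_order = [1, 2, 0]      # an arbitrary permutation of the 3 rows
col_order = [1, 2, 0, 3]   # an arbitrary permutation of the 4 columns
reordered = permute(table, row_order, col_order)

# Every entry survives unchanged: the same multiset of values and the same
# margins, up to the same reordering.
same_values = sorted(table.ravel().tolist()) == sorted(reordered.ravel().tolist())
same_row_sums = bool((table.sum(axis=1)[row_order] == reordered.sum(axis=1)).all())
same_col_sums = bool((table.sum(axis=0)[col_order] == reordered.sum(axis=0)).all())

def count_arrangements(n_rows, n_cols):
    """Number of (row order, column order) pairs: n! * p!."""
    return math.factorial(n_rows) * math.factorial(n_cols)

n_solutions = count_arrangements(*table.shape)
```

Even this toy table has 3! × 4! = 144 possible displays. For a table with 6 rows and 14 columns the count is 6! × 14!, roughly 6.3 × 10^13, which is why a statistical method is used to pick the ordering rather than an exhaustive search.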
We present a new exploratory method that integrates multidimensional data analysis and graphical methods and is implemented in the software program AMADO (Risson et al. 1994). This methodology can be applied to any matrix consisting of positive values: contingency tables, logical tables representing a response pattern or a graph, symmetric tables of similarities, and so on.

2 Bertin's Rules of Graphical Syntax

Contrary to a table, with which the aim is to make every cell available to the reader, a graph should be read in an instant: similarities and differences should be immediately apparent. Let the rows of a table be the horizontal dimension (say X) and the columns of the table be the vertical dimension (say Y). A color variation or shading in light intensity may induce a third visible dimension (say Z); this third dimension is used to represent the numerical values of the data table.

During the 1960s, Bertin and his team worked with groups of wooden cubes covered with paper on which were drawn rectangles from histograms; rows (or columns) were then moved by hand until a diagonal structure, or a block model, was obtained. Later, the use of numerical multivariate descriptive statistical analysis methods (Lebart et al. 1984) replaced this purely visual approach.

Looking for a unidimensional ordering, one may use correspondence analysis (CA) to find the optimal ranking of the row and column variables. The first axis of the CA solution gives the numerical scale for the rows and columns so that each individual may be characterized in a scatterplot by the coordinates of the individual's categories i and j. There are n_ij individuals at the same position, so the number n_ij can be used as the third dimension Z. The correlation coefficient between the two scaled row and column variables is the square root of the eigenvalue associated with this first principal axis (Nishisato, 1980, chap. 3; Tenenhaus and Young, 1985), also called the canonical correlation (see, for example, Greenacre, 1993a). Bertin's graphics (see Figure 3) can be seen as a type of scatterplot: coordinates from CA become ranks, and the area of each rectangle is proportional to the number of observations/cases with those ranks. With this interpretation, the best permutation of rows and columns would maximize the Spearman rank correlation coefficient.

Looking for a block seriation, one may use any appropriate cluster analysis method on rows and/or columns.

3 A Simple Example of Bertin's Graphics

In Chapter 1 of this book, de Leeuw presents a small data set on the UCLA statistics department. The data are given in Table 1; Figure 1 shows the corresponding display using Bertin's graphics. From this display it is easy to see that both sociologists as well as three of the four mathematicians were born in the United States, whereas the fourth mathematician, the psychologist, the statistician, and the educator were born outside of the United States.

Table 1: The (0/1) matrix, logic table that represents Jan de Leeuw's UCLA statistics program graph

                                                                 Born in     Born out of
           Mathematics  Sociology  Statistics  Psychology  Education  the U.S.A.  the U.S.A.
Ferguson        1           0          0           0           0          1           0
Li              1           0          0           0           0          0           1
Ylvisaker       1           0          0           0           0          1           0
Berk            0           1          0           0           0          1           0
De Leeuw        0           0          1           0           0          0           1
Mason           0           1          0           0           0          1           0
Bentler         0           0          0           1           0          0           1
Muthén          0           0          0           0           1          0           1
Jennrich        1           0          0           0           0          1           0

[Figure 1: Bertin's graphic from de Leeuw's graph. Rows, top to bottom: Berk, Mason, Ferguson, Jennrich, Ylvisaker, Li, Bentler, De Leeuw, Muthén.]

4 Livestock Slaughtered in the European Community in 1995

The data in Table 2 were obtained from the European Community Statistical Office (EUROSTAT) in Luxembourg. Here, we consider the number of livestock (in thousands) slaughtered in 1995 in EEC countries.
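Before turning to the larger table, the seriation described in Section 2 can be tried on the small matrix of Table 1. The sketch below is one common way to compute the first correspondence analysis axis, through a singular value decomposition of the standardized residuals; it assumes NumPy, and the function name `first_ca_axis` and the shortened column labels are choices made for this illustration, not the AMADO implementation.

```python
import numpy as np

people = ["Ferguson", "Li", "Ylvisaker", "Berk", "De Leeuw",
          "Mason", "Bentler", "Muthen", "Jennrich"]
attributes = ["Mathematics", "Sociology", "Statistics", "Psychology",
              "Education", "Born in USA", "Born outside USA"]

# Table 1 as a 0/1 array (rows = people, columns = attributes).
table1 = np.array([
    [1, 0, 0, 0, 0, 1, 0],
    [1, 0, 0, 0, 0, 0, 1],
    [1, 0, 0, 0, 0, 1, 0],
    [0, 1, 0, 0, 0, 1, 0],
    [0, 0, 1, 0, 0, 0, 1],
    [0, 1, 0, 0, 0, 1, 0],
    [0, 0, 0, 1, 0, 0, 1],
    [0, 0, 0, 0, 1, 0, 1],
    [1, 0, 0, 0, 0, 1, 0],
], dtype=float)

def first_ca_axis(counts):
    """Return first-axis standard scores for rows and columns, and the
    first singular value (the canonical correlation)."""
    P = counts / counts.sum()          # correspondence matrix
    r = P.sum(axis=1)                  # row masses
    c = P.sum(axis=0)                  # column masses
    # Standardized residuals from the independence model.
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    return U[:, 0] / np.sqrt(r), Vt[0, :] / np.sqrt(c), sv[0]

row_scores, col_scores, rho = first_ca_axis(table1)

# The sign of an axis is arbitrary; orient it so "Born in USA" is positive.
if col_scores[attributes.index("Born in USA")] < 0:
    row_scores, col_scores = -row_scores, -col_scores

# Seriation: sort rows and columns by their first-axis scores.
row_order = np.argsort(-row_scores, kind="stable")
col_order = np.argsort(-col_scores, kind="stable")
seriated = table1[np.ix_(row_order, col_order)]

print("columns:", [attributes[j] for j in col_order])
for i, row in zip(row_order, seriated.astype(int)):
    print(f"{people[i]:<10}", " ".join(str(v) for v in row))
print("canonical correlation:", round(float(rho), 3))
```

Rows with identical profiles (for example the two sociologists) receive equal scores, so their relative order within a group is not meaningful. The value `rho` is the square root of the first eigenvalue, that is, the canonical correlation mentioned in Section 2.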
4.1 Correspondence Analysis and Bertin's Graphics

Such a contingency table can be displayed via correspondence analysis. The first factorial plane is shown in Figure 2. This map might be hard to read: many people will read that "Adult Bovines" are mostly found in Italy, because these two points appear near one another on the plot, or that there are more pigs in Finland than in Denmark because the former is closer to "Pigs" than the latter. These erroneous conclusions are quite common.

Table 2: Livestock slaughtered in the European Community in 1995 (1000 animals)

Row values, run together in column order (Austria, Belgium, Denmark, Finland, France, Germany, Greece):
Heifers        6967575157767431
Adult bovines  53371170338239684251235
Calves         13033655 10 2042 SOI 80
Pigs           4954l129419873206624859393532268
Sheep          280226974769620577712
Caprines       00011058124819

Row values, run together in column order (Ireland, Italy, Netherlands, Portugal, Spain, Sweden, UK):
Heifers        4157558485359152940
Adult bovines  151434lll.18132519655013266
Calves         0 l321119871253026
Pigs           :100211992l8616420927539374314376
Sheep          L\·298 796062610832008518919311
Caprines       0 483l7205U391030

Source: EUROSTAT

[Figure 2: The European Community's livestock: correspondence analysis. Map of the first factorial plane; the first axis carries 65.7% of the inertia, the second axis (λ2 = 0.11) carries 25.9%.]
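The misreading described above comes from treating closeness on the map as if it were volume. A quick way to see the difference is to compute the two kinds of conditional frequencies directly from a table: each species' total split by country, and each country's total split by species. The sketch below assumes NumPy; the counts and the country labels A, B, C are invented for illustration and are not the Table 2 values.

```python
import numpy as np

# Hypothetical counts (in thousands), invented for illustration only;
# these are NOT the Table 2 values.
species = ["Pigs", "Sheep"]
countries = ["A", "B", "C"]
counts = np.array([
    [900.0, 100.0,  50.0],   # Pigs
    [100.0,  10.0, 200.0],   # Sheep
])

# Share of each species' total that comes from each country (rows sum to 1).
by_species = counts / counts.sum(axis=1, keepdims=True)

# Share of each country's total that each species represents (columns sum to 1).
by_country = counts / counts.sum(axis=0, keepdims=True)

pigs = species.index("Pigs")
a, b = countries.index("A"), countries.index("B")

# Country B is slightly more specialized in pigs than country A ...
print("pig share of country total:", by_country[pigs, a].round(3), by_country[pigs, b].round(3))
# ... yet country A slaughters nine times as many pigs as country B.
print("pig counts:", counts[pigs, a], counts[pigs, b])
print("share of all pigs:", by_species[pigs, a].round(3), by_species[pigs, b].round(3))
```

In this toy table, country B would sit at least as close to "Pigs" as country A on a correspondence analysis map, because the map reflects profiles rather than volumes, while the raw counts show that A produces far more pigs.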
Such errors are impossible with the graphics in Figures 3 and 4. Figure 3 depicts the raw data after rows and columns have been permuted with respect to their order on the first CA axis. The northern and eastern countries (Denmark, Belgium, Netherlands, Germany, Finland, Sweden, and Austria), where pigs and beef are the most important products, are opposed to the western and southern countries (Portugal, France, Italy, Spain, Ireland, United Kingdom, and Greece), which produce sheep and goats rather than pigs.

In Figure 4 the row and column reordering is maintained but the conditional distributions are shown: first of countries, given species (i.e., row profiles), and second of species, given countries (i.e., column profiles). Figure 4a shows that the larger part of pigs produced in the European Community comes from Germany and then from Spain, France, The Netherlands, and so on, whereas Figure 4b shows that the countries in which pigs are the main product are Denmark, Belgium, The Netherlands, Germany, and so on. These Bertin's graphics represent the original data perfectly, but contrary to correspondence analysis, they are limited in their ability to display more than one factor at a time.

Figure 5 is similar to Figure 4 after permutation of rows and columns with respect to the second principal axis; beef- and sheep-producing countries are opposed to those producing pigs or goats. These graphics show the additional information carried by the second axis, as well as the peculiar position of Greece and goats.

[Figure 3: The European Community's livestock: Bertin's graphic after reclassification according to the ranking of values on the first axis of correspondence analysis.]

[Figure 4: Conditional frequencies: Bertin's graphic after reclassification according to the ranking of values on the first axis of correspondence analysis. Panel 4.a: Conditional frequencies of countries, given livestock, first axis of correspondence analysis. Panel 4.b: Conditional frequencies of livestock, given countries, first axis of correspondence analysis. Country order: Denmark, Belgium, N.L., Germany, Finland, Sweden, Austria, Portugal, France, Italy, Spain, Ireland, U.K., Greece.]

[Figure 5: Conditional frequencies: Bertin's graphic after reclassification according to the ranking of values on the second axis of correspondence analysis. Panel 5.a: Conditional frequencies of countries, given livestock, second axis of correspondence analysis. Panel 5.b: Conditional frequencies of livestock, given countries, second axis of correspondence analysis. Country order: Greece, Denmark, N.L., Belgium, Austria, Portugal, Germany, Sweden, Finland, France, Spain, Italy, U.K., Ireland.]

4.2 Cluster Analysis and Bertin's Graphics

Usually, results from hierarchical clustering are depicted by a tree; the tree shows how the clusters were formed but it distorts the distances between the clustered rows or columns into ultrametric distances. Moreover, the tree does not give information on why two rows, or two columns, were found to be close or distant. Bertin's graphics can be applied to the original data, where rows or columns are ranked according to their location in the tree.

Clustering of the countries is performed by Ward's method (Ward, 1963) using the chi-squared distance (Greenacre, 1988a; Jambu, 1989). Figure 6a shows the hierarchical clustering tree, revealing the main ethnographic and cultural ensembles of Europe: the British Isles, the Roman world (Italy, France, Spain, Portugal), Greece on its own, and northeastern Europe around Germany, where the three countries Belgium, Netherlands, and Denmark stand out. Figure 6b shows the profile for each country, and it is now apparent what links countries of the same cluster and what separates those in different clusters. Northeastern countries produce pigs, no goats, and hardly any sheep; large bovines and sheep are produced in the British Isles, but no goats or calves, and so on. One sees how the two methods complement each other: clusters are found automatically and give a good classification of countries, and Bertin's graphics assist in the interpretation of the clusters by returning to the original data.

[Figure 6: (a) Hierarchical tree for European Community countries. (b) Cluster Bertin's graphic.]

5 Conclusion

Bertin's graphics provide a visual complement to the solutions of correspondence and cluster analyses. Data matrices are represented by a matrix of histograms, all on the same scale, where rows and columns are optimally permuted.
This permutation is defined in terms of either progressive variation, or seriation, or by homogeneous groups distinct from one another, or block modeling. These permutation criteria, which Bertin defined empirically, are the very criteria of the multivariate statistical methods: the diagonal seriation corresponds to the maximum correlation permutation of rows and columns in CA, and the block criteria correspond to the homogeneity of groups in cluster analysis, for example, Ward's minimization of intracluster variance. Here, we can quote Arabie et al. (1978):

    It is intuitively convincing that row-column permutations of a matrix leave the raw data far more chaste than do data analysis techniques requiring a priori replacement or aggregation, e.g. taking ranks, or replacing subsets of the data by various summary statistics (...). For this reason, permutation methods are an important member of the small but growing family of data analysis methods following the philosophy that aggregation is to be inferred at the end of the analysis, not imposed at the beginning.

Software Note: AMADO

The program AMADO is an implementation of Bertin's method. AMADO is distributed in Windows and Macintosh versions by CISIA (1 av. Herbillon, 94160 Saint-Mandé, France). A user's guide (Risson et al., 1994) is available in French, and Italian and English versions will become available.

Acknowledgments

We would like to thank Jean Dumais, visiting professor at Université Lumière, for his assistance.