A study of two graph algorithms in topic-driven summarization

A study of two graph algorithms in topic-driven summarization
of 4
All materials on our website are shared by users. If you have any questions about copyright issues, please report us to resolve them. We are always happy to assist you.
Related Documents
  Workshop on TextGraphs, at HLT-NAACL 2006  , pages 29–32,New York City, June 2006. c  2006 Association for Computational Linguistics A Study of Two Graph Algorithms in Topic-driven Summarization Vivi Nastase 1 and  Stan Szpakowicz 1 , 21 School of Information Technology and Engineering,University of Ottawa, Ottawa, Canada 2 Institute of Computer Science,Polish Academy of Sciences, Warsaw, Poland { vnastase, szpak } Abstract We study how two graph algorithms ap-ply to topic-driven summarization in thescope of Document Understanding Con-ferences. The DUC 2005 and 2006 taskswere to summarize into 250 words a col-lection of documents on a topic consist-ing of a few statements or questions.Our algorithms select sentences for ex-traction. We measure their performanceon the DUC2005 test data, using the Sum-mary Content Units made available afterthe challenge. One algorithm matches agraph representing the entire topic againsteach sentence in the collection. Theother algorithm checks, for pairs of open-class words in the topic, whether they canbe connected in the syntactic graph of each sentence. Matching performs bet-ter than connecting words, but a combi-nation of both methods works best. Theyalso both favour longer sentences, whichmakes summaries more fluent. 1 Introduction The DUC 2005 and 2006 summarization challengeswere motivated by the desire to make summariza-tion relevant to real users. The task was focussed byspecifying an information need as a  topic : one or afew statements or questions (Dang, 2005). Systemsusually employ such data as a source of key wordsor phrases which then help rank document sentencesby relevance to the topic.We explore other information that can be ex-tracted from a topic description. In particular, welook at connections between open-class words. Adependency parser, MiniPar (Lin, 1998), builds adependency relation graph for each sentence. Weapply such graphs in two ways. We match a graphthat covers the entire topic description against thegraph for each sentence in the collection. We alsoextract all pairs of open-class words from the topicdescription, and check whether they are connectedin the sentence graphs. Both methods let us rank sentences; the top-ranking ones go into a summaryof at most 250 words. We evaluate the summarieswith the summary content units (SCU) data madeavailable after DUC 2005 (Nenkova and Passon-neau, 2004; Copeck and Szpakowicz, 2005). Theexperiments show that using more information than just keywords leads to summaries with more SCUs(total and unique) and higher SCU weight.We present related work in section 2, and the dataand the representation we work with in section 3.Section 4 shows the algorithms in more detail. Wedescribe the experiments and their results in section5, and draw a few conclusions in section 6. 2 Related work Erkan and Radev (2004), Mihalcea (2004), Mihal-cea and Tarau (2004) introduced graph methodsfor summarization, word sense disambiguation andother NLP applications.The summarization graph-based systems imple-ment a form of sentence ranking, based on the ideaof prestige or centrality in social networks. In thiscase the network consists of sentences, and signifi-cantly similar sentences are interconnected. Variousmeasures (such as node degree) help find the most central  sentences, or to score each sentence.In topic-driven summarization, one or more sen-tences or questions describe an information needwhich the summaries must address. Previous sys-tems extracted key words or phrases from topicsand used them to focus the summary (Fisher et al.,2005).Our experiments show that there is more to topicsthan key words or phrases. We will experiment withusing grammatical dependency relations for the task of extractive summarization.In previous research, graph-matching using gram-matical relations was used to detect textual entail-ment (Haghighi et al., 2005). 3 Data 3.1 Topics We work with a list of topics from the test data inthe DUC 2005 challenge. A topic has an identifier,category (general/specific), title and a sequence of statements or questions, for example: 29  d307bspecificNew Hydroelectric ProjectsWhat hydroelectric projects are plannedor in progress and what problems areassociated with them? We apply MiniPar to the titles and contentsof the topics, and to all documents. The out-put is post-processed to produce dependency pairsonly for open-class words. The dependency pairsbypass prepositions and subordinators/coordinatorsbetween clauses, linking the corresponding open-class words. After post-processing, the topic willbe represented like this: QUESTION NUMBER: d307bLIST OF WORDS:associate, hydroelectric, in, plan,problem, progress, project, new, themLIST OF PAIRS:relation(project, hydroelectric)relation(project, new)relation(associate, problem)relation(plan, project)relation(in, progress)relation(associate, them) The parser does not always produce perfectparses. In this example itdid notassociate the phrase in progress  with the noun  projects , so we missed theconnection between  projects  and  progress .In the next step, we expand each open-class wordin the topic with all its  WordNet   synsets and one-stephypernyms and hyponyms. We have two variantsof the topic file: with all open-class words from thetopic description  Topics all , andonly withnouns andverbs  Topics NV    . 3.2 Documents For each topic, we summarize a collection of up to50 news items. In our experiments, we build a filewith all documents for a given topic, one sentenceper line, cleaned of XML tags. We process each filewith MiniPar, and post-process the output similarlyto the topics. For documents we keep the list of de-pendency relations but not a separate list of words.This processing also gives one file per topic, eachsentence followed by itslist of dependency relations. 3.3 Summary Content Units The DUC 2005 summary evaluation included ananalysis based on  Summary Content Units . SCUsare manually-selected topic-specific summary-worthy phrases which the summarization systemsare expected to include in their output (Nenkova andPassonneau, 2004; Copeck and Szpakowicz, 2005).The SCUs for 20 of the test topics became availableafter the challenge. We use the SCU data to mea-sure the performance of our graph-matching andpath-search algorithms: the total number, weightand number of unique SCUs per summary, andthe number of negative SCU sentences, explicitlymarked as not relevant to the summary. 4 Algorithms 4.1 Topic ↔ sentence graph matching (GM) We treat a sentence and a topic as graphs. The nodesare the open-class words in the sentence or topic (wealso refer to them as keywords), and the edges arethe dependency relations extracted from MiniPar’soutput. In order to maximize the matching score, wereplace a word  w S   in the sentence with  w Q  from thequery, if   w S   appears in the  WordNet   expansion of words in  w Q .To score a match between a sentence and a graph,we compute and then combine two partial scores: S  N   (node match score) the node (keyword) overlapbetween the two text units. A keyword countis equal to the number of dependency pairs itappears with in the document sentence; S  E   (edge match score) the edge (dependency rela-tion) overlap.The overall score is  S   =  S  N   + WeightFactor  ∗  S  E  , where  WeightFactor  ∈{ 0 , 1 , 2 ,..., 15 , 20 , 50 , 100 } . Varying the weightfactor allows us to find various combinationsof node and edge score matches which work best for sentence extraction in summarization.When  WeightFactor  = 0 , the sentence scorescorrespond to keyword counts. 4.2 Path search for topic keyword pairs (PS) Here too we look at sentences as graphs. We onlytake the list of words from the topic representation.For each pair of those words, we check whether theyboth appear in the sentence and are connected inthe sentence graph. We use the list of   WordNet  -expanded terms again, to maximize matching. Thefinal score for the sentence has two components:the node-match score  S  N  , and  S  P  , the number of word pairs from the topic description connected bya path in the sentence graph. The final score is S   =  S  N  + WeightFactor ∗ S  P  .  WeightFactor , inthe same range as previously, is meant to boost thecontribution of the path score towards the final scoreof the sentence. 5 Experiments and results We produce a summary for each topic and each ex-perimental configuration. We take the most highlyranked (complete) sentences for which the totalnumber of words does not exceed the 250-wordlimit. Next, we gather SCU data for each sentence ineach summary from the SCU information files. Fora specific experimental configuration – topic repre-sentation, graph algorithm – we produce summariesfor the 20 documents with the weight factor values 0 , 1 , 2 ,..., 15 , 20 , 50 , 100 . Each experimental con-figuration generates 19 sets of average results, one 30   10 11 12 13 14 15 16 17 18 0 5 10 15 20    A  v  e  r  a  g  e   (  p  o  s   i   t   i  v  e   )   S   C   U  w  e   i  g   h   t  s   /  s  u  m  m  a  r  y Pair/connections weight factorGraph match methodNVall 10 11 12 13 14 15 16 0 5 10 15 20    A  v  e  r  a  g  e   (  p  o  s   i   t   i  v  e   )   S   C   U  w  e   i  g   h   t  s   /  s  u  m  m  a  r  y Pair/connections weight factorPath search methodNVall Figure 1: Average SCU weights for graph matching(GM) and path search (PS) with different topic rep-resentationsper weight factor. For one weight factor, we gen-erate summaries for the 20 topics, and then averagetheir SCU statistics, including SCU weight, numberofunique SCUsand total number of SCUs. Inthe re-sults which follow we present average SCU weightper summary. The number of unique SCUs andthe number of SCUs closely follow the presentedgraphs. The overlap of SCUs (number of SCUs / number of unique SCUs) reaches a maximum of 1.09. There was no explicit redundancy elimination,mostly because the SCU overlap was so low.We compare the performance of the two algo-rithms, GM and PS, on the two topic representations– with all open-class words and only with nouns andverbs. Figure 1 shows the performance of the meth-ods in terms of average SCU weights per summaryfor each weight factor considered  1 .The results allow us to make several observations. •  Keyword-only match performs worse that ei-ther GM or PS. The points corresponding tokeyword (node) match only are the points forwhich the weight factor is 0. In this case thedependency pairs match and paths found in thegraph do not contribute to the overall score. •  Both graph algorithms achieve better perfor-mance for only the nouns and verbs from the 1 The summary statisticslevel off above a certain weight fac-tor, so we include only the non-flat part of the graph. topic than for all open-class words. If, how-ever, the topic requests entities or events withspecific properties, described by adjectives oradverbs, using only nouns and verbs may pro-duce worse results. •  GM performs better than PS for both types of topic descriptions. In other words, looking atthe same words that appear in the topic, con-nected in the same way, leads to better resultsthan finding pairs of words that are “somehow”connected. •  Higher performance for higher weight factorsfurther supports the point that looking for wordconnections, instead of isolated words, helpsfind sentences with information content morerelated to the topic.For the following set of experiments, we use thetopics with the word list containing only nouns andverbs. We want to compare graph matching andpath search further. One issue that comes to mind iswhether a combination of the two methods will per-form better than each of them individually. Figure 2plots the average of SCU weights per summary.  11 12 13 14 15 16 17 18 19 0 5 10 15 20    A  v  e  r  a  g  e   (  p  o  s   i   t   i  v  e   )   S   C   U  w  e   i  g   h   t  s   /  s  u  m  m  a  r  y Pair/connections weight factorGM & PSGMPS Figure 2: Graph matching, path search and theircombinationWe observe that the combination of graph match-ing and path search gives better results than ei-ther method alone. The sentence score com-bines the number of edges matched and the num-ber of connections found with equal weight fac-tors for the edge match and path score. Thisraises the question whether different weights forthe edge match and path would lead to betterscores. Figure 3 plots the results produced us-ing the score computation formula  S   =  S  N   + WeightFactor E   ∗  S  E   +  WeightFactor P   ∗  S  P  ,where both  WeightFactor E   and  WeightFactor P  are integers from 0 to 30.The lowest scores are for the weight factors 0,when sentence score depends only on the keywordscore. There is an increase in average SCU weights 31   0 5 10 15Pair weight factor 5 10 15 20 25 30Path weight factor 15.5 16 16.5 17 17.5 18 18.5 19 19.5Avg weight of SCUs Figure 3: Graph match and path search combinedwith different weight factorstowards higher values of weight factors. A transpar-ent view of the 3D graph shows that graph match hashigher peaks toward higher weight factors than pathsearch, and higher also than the situation when pathsearch and graph match have equal weights.The only sentences in the given documents taggedwith SCU information are those which appeared inthe summaries generated by the competing teamsin 2005. Our results are therefore actually a lowerbound – more of the sentences selected may includerelevant information. A manual analysis of the sum-maries generated using only keyword counts showedthat, for these summaries, the sentences not contain-ing SCUs were not informative. We cannot check this for all the summaries generated in these ex-periments, because the number is very large, above1000. Anaverage summary had 8.24 sentences, with3.19 sentences containing SCUs. We cannot saymuch about the sentences that do not contain SCUs.This may raise doubts about our results. Supportfor the fact that the results reflect a real increase inperformance comes from the weights of the SCUsadded: the average SCU weight increases from 2.5when keywords are used to 2.75 for path search al-gorithm, and 2.91 for graph match and the combina-tion of path search and graph match. This shows thatby increasing the weight of graph edges and pathsin the scoring of a sentence, the algorithm can pick more and better SCUs, SCUswhich more people seeas relevant to the topic. It would be certainly in-teresting to have a way of assessing the “SCU-less”sentences in the summary. We leave that for futurework, and possibly future developments in SCU an-notation. 6 Conclusions We have studied how two algorithms influence sum-marization by sentence extraction. They match thetopic description and sentences in a document. Theresults show that using connections between thewords in the topic description improves the accu-racy of sentence scoring compared to simple key-word match. Finding connections between querywords in a sentence depends on finding the corre-sponding words in the sentence. In our experiments,we have used one-step extension in  WordNet   (alongIS-A links) to find such correspondences. It is, how-ever, a limited solution, and better word matchesshould be attempted, such as for example word sim-ilarity scores in  WordNet  .In summarization by sentence extraction, otherscores affect sentence ranking, for example positionin the document and paragraph or proximity to otherhigh-ranked sentences. We have analyzed the effectofconnections inisolation, toreduce the influence of other factors. A summarization system would com-bine all these scores, and possibly produce better re-sults. Word connections or pairs could also be used just as keywords were, as part of a feature descrip-tion of documents, to be automatically ranked usingmachine learning. References Terry Copeck and Stan Szpakow-icz. 2005. Leveraging pyramids. Trang Dang. 2005. Overview of DUC 2005.¨unes¸ Erkan and Dragomir Radev. 2004. LexRank: Graph-based centrality as salience in text summarization.  Journalof Artificial Intelligence Research , (22).Seeger Fisher, Brian Roark, Jianji Yang, and BillHersh. 2005. Ogi/ohsu baseline query-directedmulti-document summarization system for duc-2005. Haghighi, Andrew Ng, and Christopher Manning. 2005.Robust textual inference via graph matching. In  Proc. of  HLT-EMNLP 2005 , Vancouver, BC, Canada.Dekang Lin. 1998. Dependency-based evaluation of MiniPar.In  Workshop on the Evaluation of Parsing Systems , Granada,Spain.Rada Mihalcea and Paul Tarau. 2004. Textrank: Bringing orderinto texts. In  Proc. of EMNLP 2004 , Barcelona, Spain.Rada Mihalcea. 2004. Graph-based ranking algorithmsfor sen-tence extraction, applied to text summarization. In  Proc. of  ACL 2004 , Barcelona, Spain.Ani Nenkova and Rebecca Passonneau. 2004. Evaluating con-tent selection in summarization: the pyramid method. In Proc. of NAACL-HLT 2004 . 32
Similar documents
View more...
Related Search
We Need Your Support
Thank you for visiting our website and your interest in our free products and services. We are nonprofit website to share and download documents. To the running of this website, we need your help to support us.

Thanks to everyone for your continued support.

No, Thanks