Title: A Natural Language Processing Based Internet Agent
Author(s): Yang, MH; Yang, CCC; Chung, YM
Citation: Computational Cybernetics and Simulation, IEEE International Conference on Systems, Man, and Cybernetics Conference Proceedings, Orlando, Florida, USA, 12-15 October 1997, v. 1, p. 100-105
Issue Date: 1997
URL: http://hdl.handle.net/10722/45588
Rights: ©1997 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.

A Natural Language Processing Based Internet Agent

Ming-Hsuan Yang, Beckman Institute, University of Illinois, Urbana, IL 61801. Email: myang1@uiuc.edu
Christopher C. Yang, Department of Computer Science, The University of Hong Kong, Hong Kong. Email: yang@cs.hku.hk
Yi-Ming Chung, NCSA, University of Illinois, Urbana, IL 61801. Email: ychung@ncsa.uiuc.edu

Abstract

Searching for useful information is a difficult job by virtue of the information overloading problem. Technological advances, notably the World-Wide Web (WWW), allow every ordinary information owner to offer information online for others to access and retrieve. However, this also makes up a global information system that is extremely large-scale, diverse and dynamic. Internet agents and Internet search engines have been used to deal with such problems, but the search results are usually not quite relevant to what a user wants, since most of them use simple keyword matching.

In this paper, we propose a natural language processing based agent (NIAGENT) that understands a user's natural query. NIAGENT not only cooperates with a meta Internet search engine in order to increase the recall of web pages but also analyzes the contents of the referenced documents to increase precision. Moreover, the proposed agent is autonomous, light-weighted, and multithreaded.
The architectural design also represents an interesting application of the distributed and cooperative computing paradigm. A prototype of NIAGENT, implemented in Java, shows its promise in finding more useful information than keyword based searching.

1 Introduction

With the phenomenal growth of the Internet and World Wide Web, the information overload problem is more serious and inevitable. Although a lot of information is available on the Internet, it is usually difficult to find particular pieces of information efficiently. To address this problem, several tools have been developed to help search relevant information more effectively by either assisted browsing or keyword/phrase based searching. Assisted browsing, such as WebWatcher [6] and Syskill & Webert [10], guides/suggests a user along an appropriate path through the web based on its knowledge of the user's interests, of the location and relevance of various items in the collection, and the way in which others have interacted with the collection in the past. Internet search engines, such as AltaVista and Lycos, send out spiders or robots to index any visited web pages and allow keyword or phrase based search. One characteristic of these approaches is that they all rely on a central server to solve the problem. Also, these approaches are either time consuming or the search results are often not quite relevant to what a user wants.

The agent concept can be used to simplify the solution of large problems by distributing them to collaborating problem solving units. This distributed problem solving paradigm is particularly suitable for information retrieval on the web. In this paper, we focus on developing an intelligent and efficient search agent. All the major search engines use different schemes to index web pages by keyword or phrase, and support Boolean operations in keyword or phrase search. The major problem with these search engines is that many irrelevant pieces of information are also returned (i.e.
low precision of extracted information) since they use unordered keywords and phrases as indices. In other words, a document is deemed relevant to a query if all the phrases are matched. But since different phrases can appear in the same text without any relationship between them, the web pages recalled by keyword based matching are usually uninteresting or irrelevant to a user's query. Also, each search engine usually returns different documents for the same query because they use different ranking algorithms in indexing, different indexing cycles, and different resources. It has been shown in [12] that users could miss 77% of the references they would find most relevant by relying on a single search engine. MetaCrawler [13] is a meta search engine designed to address these problems by aggregating Internet search services. In order to achieve high recall in retrieved documents, it is necessary to cooperate with as many Internet search engines as possible. However, the search results of MetaCrawler are usually not relevant to the query because of the problem with keyword matching. Therefore, it is important to increase precision in the retrieved information.

We propose a Natural language processing based Internet AGENT, NIAGENT, that understands natural queries. It cooperates with an NLP agent at the MIT Media Lab to understand the input query. Meaningful noun phrases are extracted out of the natural sentence and formatted as appropriate queries to search engines such as MetaCrawler or other search engines. NIAGENT fetches back the web pages, based on the recalled references from search engines, and then cooperates with PARAGENT (PARAgraph Analysis AGENT) to analyze their text contents and sift out irrelevant documents. By cooperating with MetaCrawler and PARAGENT, NIAGENT increases not only recall but also precision in retrieved documents.
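The recall argument above, cooperating with as many engines as possible, amounts to taking the union of the engines' result lists. A minimal sketch in Python (the engine functions and URLs are hypothetical stand-ins, not real search APIs):

```python
# Sketch: aggregate hits from several search services to raise recall,
# in the spirit of MetaCrawler. Each engine function is a hypothetical
# stand-in returning a list of result URLs.

def engine_a(query):
    # stand-in for one search service's results
    return ["http://a.example/1", "http://shared.example/x"]

def engine_b(query):
    # stand-in for another service with a different index
    return ["http://b.example/2", "http://shared.example/x"]

def merged_recall(query, engines):
    """Union the result lists, preserving first-seen order."""
    seen, merged = set(), []
    for engine in engines:
        for url in engine(query):
            if url not in seen:
                seen.add(url)
                merged.append(url)
    return merged

hits = merged_recall("constraint intelligent agents", [engine_a, engine_b])
# The union is at least as large as any single engine's list, which is
# the recall argument; precision still has to come from a later
# content-analysis step.
```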
Our experiments show that NIAGENT is not only more user-friendly but also more effective in searching for relevant and useful information.

2 NIAGENT

We agree with the arguments of Lewis and Jones in [7] that "All the evidence suggests that for end-user searching, the indexing language should be natural language, rather than controlled language oriented ... For interactive searching, the indexing language should be directly accessible by the user for request formulation; users should not be required to express their needs in a heavily controlled and highly artificial language" and "Evidence also suggests that combining single terms into compound terms, representing relatively fixed complex concepts, may be useful ..." In light of this trend, we propose an intelligent agent that cooperates with other agents to understand the natural query, extract meaningful noun phrases, and analyze the search results. Natural queries relieve users of the pains and efforts of learning the strict formats of different Internet search engines. Analysis of the text, based on noun phrases, helps in extracting relevant information with high precision. With recent research results in artificial intelligence and natural language processing, mature technologies are ready to help in designing intelligent information filtering systems.

Figure 1 shows the architectural design of NIAGENT. The design represents an example of distributed and cooperative computing in that NIAGENT cooperates with other agents to understand the user's interests and to search for relevant references. A user makes a natural query without the need to learn the different formats of various Internet search engines. NIAGENT cooperates with Chopper, a natural language understanding agent, to figure out the interests of the user by extracting meaningful phrases. Appropriate queries to Internet search engines or spiders are then made by NIAGENT.
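The step of turning extracted noun phrases into an engine query might be sketched as below. The AND syntax and the query-URL format are assumptions for illustration only; each 1997-era engine had its own CGI format, which the paper does not document.

```python
# Sketch: format extracted noun phrases as a keyword query. Quoting
# multi-word phrases so they match as units, and joining with AND, are
# assumed conventions, not the documented syntax of any real engine.
from urllib.parse import quote_plus

def build_query(noun_phrases):
    """Quote multi-word phrases, then join terms with AND."""
    parts = ['"%s"' % p if " " in p else p for p in noun_phrases]
    return " AND ".join(parts)

def build_url(base, noun_phrases):
    # hypothetical endpoint and parameter name, for illustration
    return base + "?query=" + quote_plus(build_query(noun_phrases))

q = build_query(["constraint", "intelligent agents"])
# q == 'constraint AND "intelligent agents"'
```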
Based on the returned hyperlinks from search engines, NIAGENT fetches back the referred documents and asks PARAGENT to sift out irrelevant documents. Finally, the relevant documents are returned to the user.

Figure 1: NIAGENT architecture

2.1 Understanding a Natural Query

In order to build an intelligent system as a society of interacting agents, each having their own specific competence, to "do the right thing" [8], NIAGENT cooperates with Chopper to understand a user's natural query. Chopper, developed by the Machine Understanding Group at MIT, is a natural language analysis engine that generates an analysis of the phrase structure and parts of speech in the input sentence. This parser consists of three parts: segmentation identifies individual words and some proper names; tagging determines part of speech information using a hand-coded probabilistic grammar; phrasing determines (sometimes overlapping) phrase boundaries based on sentence tagging [4].

Instead of learning to form a query based on Boolean operations and phrases for a specific Internet search engine, a user interested in how search techniques of constraint satisfaction can be used to develop an intelligent agent can ask NIAGENT a question, in natural English, like "How can we use constraint to develop intelligent agents?" NIAGENT cooperates with Chopper to understand the interests of the user and then extracts the meaningful noun phrases to form appropriate queries for other agents such as MetaCrawler or various Internet spiders [1]. Figure 2 shows the analysis result of the natural query discussed above.

Figure 2: Analysis results of MIT Chopper

It has been shown in [2] that an occurrence of a noun phrase in a document usually provides much more evidence than other phrases for the concept of the text. For the query just described, only the noun phrases "constraint" and "agent" capture the concept of the natural query.
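As an illustration only, a crude stand-in for this extraction step can be sketched with a hand-made stop list of function words and common verbs; Chopper itself uses a probabilistic grammar and phrase-boundary analysis, which are not reproduced here.

```python
# Sketch: approximate noun-phrase extraction by keeping maximal runs of
# tokens that are not on a stop list. The stop list is an assumption
# for illustration, not Chopper's actual lexicon or grammar.
import re

STOP = {"how", "can", "we", "use", "to", "develop", "a", "an", "the",
        "of", "in", "for", "do", "does", "is", "are", "what", "which"}

def noun_phrases(query):
    """Return runs of consecutive non-stop words as candidate phrases."""
    tokens = re.findall(r"[a-z]+", query.lower())
    phrases, run = [], []
    for tok in tokens:
        if tok in STOP:
            if run:
                phrases.append(" ".join(run))
                run = []
        else:
            run.append(tok)
    if run:
        phrases.append(" ".join(run))
    return phrases

print(noun_phrases("How can we use constraint to develop intelligent agents?"))
# ['constraint', 'intelligent agents']
```

On the paper's example query this crude filter happens to recover the concept-bearing phrases, but a real tagger is needed in general, since stop lists cannot tell a noun from a verb in context.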
Other phrases such as "how", "can", "we", "use", etc., do not carry useful information in the query. Based on this analysis, NIAGENT extracts meaningful noun phrases from the analyzed results suggested by Chopper and understands the user's natural query.

2.2 Cooperating with Internet Search Engines

The World Wide Web can be viewed as an information food chain where the maze of web pages and hyperlinks is at the very bottom of the chain [3]. In this analogy, Internet search engines such as AltaVista or Lycos are information herbivores that graze on web pages and regurgitate them as searchable indices. MetaCrawler was developed to be one of the information carnivores that hunt and feast on the herbivores. NIAGENT is also at the top of the information food chain in that it works with MetaCrawler in order to intelligently hunt for useful information. Concept-containing noun phrases, generated by NIAGENT and Chopper, are passed to MetaCrawler as directives to hunt for information herbivores. Finally, the caught prey is forwarded back to NIAGENT.

2.3 Analyzing Web Pages

Most search engines return to a user hyperlinks that contain the queried phrases. However, a recalled web page that has the keywords in its text is not necessarily relevant to the query. It has been shown that basic compound phrases would not typically be further combined into frames, templates, or other structured units unless there is a syntactic or semantic relationship between them [7]. For example, a web page might have "constraint satisfaction problem" and "agent" in a list of research interests or in different paragraphs about problem solving techniques. Most Internet services would think, based on keyword matching, that this web page is relevant to the user's query discussed previously, since the key phrases "constraint" and "agent" are matched. But the contents of the web page are actually not related to the interests of the user.
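The segment-level test this motivates, accepting a page only if every query phrase falls inside one logical segment, can be sketched as follows. Splitting on blank lines is an assumed stand-in for PARAGENT's actual layout cues, which the paper describes only at a high level.

```python
# Sketch: a page is deemed relevant only if some single paragraph
# contains every query phrase, approximating the co-occurrence test
# attributed to PARAGENT. Blank-line splitting is an assumption.

def relevant(page_text, phrases):
    """True if one paragraph contains all the phrases (case-insensitive)."""
    paragraphs = [p for p in page_text.split("\n\n") if p.strip()]
    for para in paragraphs:
        low = para.lower()
        if all(ph.lower() in low for ph in phrases):
            return True
    return False

# A conference listing mentions both phrases, but in different items:
listing = "Constraint Programming 96 call for papers.\n\nAgent workshop."
# A genuinely related abstract uses them in one paragraph:
related = "We develop intelligent agents that use constraint satisfaction."
```

With the query phrases "constraint" and "agent", the listing fails the test while the related abstract passes it, which is exactly the distinction plain keyword matching misses.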
Figure 3 shows a paragraph of a web page that consists of a list of conferences including the queried phrases:

- AGENT THEORIES, ARCHITECTURES, AND LANGUAGES - Third International Workshop, m.wooldridge@doc.mmu.ac.uk
- Second Call for Papers, Constraint Programming 96, August 19-22, 1996, Peter van Beek

However, the description of this document is just an unordered set of phrases and individual words. Therefore, this web page should not be considered relevant to the user's interest.

The relevant information to be extracted is the relationships between individual phrases. It is not enough to identify isolated phrases, which can be done by a simple keyword search.

Figure 3: A web page with matched keywords (http://www.dbai.tuwien.ac.at/marchives/sched-l/index.html)

In order to intelligently extract relevant information from the World Wide Web, NIAGENT first fetches back the web pages recalled by the search engines and passes them to PARAGENT to analyze the syntactic relations of phrases. While a web page that contains all the queried phrases is not necessarily relevant to a user's interests, a web page is usually directly relevant to the query if those phrases appear in the same logical segment. The rationale here is that more complex structures for deeper understanding of text are computationally expensive and difficult, whereas simple keyword matching is not likely to provide good precision in information retrieval.

PARAGENT uses page layout cues to divide a web page into coherent segments (usually paragraphs) as the first step in analyzing the contents. Then the relationships between noun phrases are analyzed by PARAGENT to determine the relevance of the web page. Figure 4 shows a Web document that contains information interesting to the user. Note that all the meaningful noun phrases extracted by NIAGENT and Chopper appear in the same paragraph.

Figure 5 shows a prototype of NIAGENT based on our architectural design.
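Fetching the recalled pages is an independent task per hyperlink, which is what makes the multithreaded design mentioned in the abstract natural. A sketch with a thread pool (the fetch function is a stub for illustration, not a real HTTP client):

```python
# Sketch: download the recalled pages concurrently before handing them
# to the content analysis step. A thread pool stands in for the raw
# threads of the Java prototype; fetch() is a hypothetical stub.
from concurrent.futures import ThreadPoolExecutor

def fetch(url):
    # stand-in for an HTTP GET; a real agent would download the page
    return "<html>body of %s</html>" % url

def fetch_all(urls, workers=4):
    """Fetch every URL concurrently, keeping the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fetch, urls))

pages = fetch_all(["http://a.example", "http://b.example"])
```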
A user can key in a natural query and select an Internet search engine (such as MetaCrawler, AltaVista, etc.). NIAGENT will show the concept-containing keywords and send them as queries. The analyzed web pages that are relevant to the input query are displayed in the text field. The current implementation (in Java) is multithreaded, light-weighted, and portable.

Figure 4: A web page with matched keywords (http://www.ie.utoronto.ca/EIL/profiles/Chris/cw-dai.abstract.html)

2.4 Experimental Results

In order to compare the performance of NIAGENT with other Internet search engines in terms of precision, we conducted several experiments. For each example in the test set, MetaCrawler returns 20 references (the same amount of recall) and NIAGENT determines the precision based on the contents of the recalled web pages. These web pages are then evaluated by two fellow colleagues to determine their relevance. Table 1 summarizes the performance results of NIAGENT versus MetaCrawler.

For the first test query, only 7 out of 20 web pages recalled by MetaCrawler are relevant to the user's interest. On the other hand, NIAGENT achieves high precision (90%) in this experiment. The result is not surprising since NIAGENT takes one more step to analyze the contents of the referenced web pages rather than use simple keyword search. From our prelimi-