Context based user ranking in forums for expert finding using WordNet dictionary and social network analysis

Amin Omidvar · Mehdi Garakani · Hamid R. Safarpour

Published online: 1 December 2013
© Springer Science+Business Media New York 2013

Abstract  Online forums have become one of the most popular collaborative tools on the Internet, where people are free to express their opinions. Forums provide facilities for knowledge management through which members can share their knowledge with each other. The main problem with knowledge sharing on forums is the extensive amount of data they contain, combined with the lack of any mechanism to determine its validity. For knowledge seekers, knowing the expertise level of each member in a specific context is therefore important for finding valid answers. In this research, a novel algorithm is proposed to determine people's expertise level based on context. The AskMe forum was chosen for the evaluation of the proposed method, and its data was processed in several stages. First, a special crawling program was developed to gather data from the AskMe forum. Then, the raw data was extracted, transformed, and loaded into a purpose-built database using SQL Server Integration Services. Afterwards, each person's expertise level for a specified context was calculated by applying the proposed method to the processed data. Finally, evaluation tests were applied in order to measure the accuracy of the proposed method and compare it with other methods.

Keywords  Algorithms · Expert finding · Online forums · Link analysis · WordNet dictionary

1 Introduction

Expertise sharing through Internet applications is considered by many scholars as the next step of knowledge management for organizations. Since the knowledge resources and expertise available within organizations are limited, the demand for seeking knowledge from external sources such as the Internet is increasing.
Nowadays, employees often seek knowledge from Internet applications for problem solving, especially in industries where finding the best solution is challenging [1, 2]. Some of these applications, like forums, play an important role in knowledge sharing among their members. Forums are a special environment where people are free to express their ideas by posting questions and answers. Attributes of online forums such as ease of use, usefulness, social influence, and ease of communication have made them welcome among many Internet users and turned them into one of the most popular and useful web applications. People help each other in forums for many reasons, such as reputation-enhancement benefits, direct learning benefits, expected reciprocity, and altruism. Many well-known companies such as Microsoft, Oracle, TurboTax, Dell, Amazon, IBM, and Yahoo run forums for both customers and employees for knowledge sharing and technical support. Based on the literature, an effective knowledge management system should provide not only documented knowledge but also experts who can carry out social or organizational tasks and hold valuable knowledge [3]. A user is recognized as an expert in a particular subject if he or she has a high level of knowledge in that area.

A. Omidvar, M. Garakani: Department of Computer Engineering and IT, Amirkabir University of Technology, 424 Hafez Ave, 15875-4413 Tehran, Iran
H. R. Safarpour: Department of Finance and Economics, Southern Illinois University of Edwardsville, Edwardsville, IL, USA
Inf Technol Manag (2014) 15:51–63. DOI 10.1007/s10799-013-0173-x

Because of the importance of forums as a tool for knowledge sharing and the necessity of finding valuable answers among a huge number of posts, many methods have been proposed to address such needs.
Moreover, some drawbacks, such as the unstructured nature of posts and the high volume of shared data, are outstanding challenges that should be addressed.

The first important drawback of forums is the response-time gap. There is a significant difference between experts' questions and newbies' questions with regard to the time it takes to receive an answer: experts' questions take more time to get a first reply. In a question and answer (QA) forum, the likelihood of solving a problem posed by an expert is typically lower than for a problem posed by a newbie, so complicated questions are often lost in the flood of easy ones. By calculating the knowledge level of each member, the system could direct questions to users who are likely to answer them.

The second problem relates to recognizing the best answer among all received posts. Since forums are flooded with an extensive amount of data, there should be a mechanism to determine the validity of submitted answers so that the questioner can distinguish valid answers among the received replies. Most forums have a mechanism to measure users' reputation, which is often shown with Duke stars. For example, in the Oracle Java forum, a member's knowledge level is shown with Duke stars: the more Duke stars a user has, the higher his or her level of knowledge. One disadvantage of this method is that its validity depends on users' judgment. Moreover, Duke stars cannot represent in which fields a user has knowledge; for example, a person may be an expert in servlet programming but a newbie in mobile programming. Therefore, an automatic method to calculate people's knowledge in QA forums based on context is required.

1.1 Motivation

In order to solve the aforementioned problems, much research has been conducted so far. This research can be classified into two categories: link analysis and information retrieval methods.
Methods in both categories have drawbacks, some of which are mentioned in this section.

In link analysis methods, the users' social network is extracted from their posts and each user's authority score is then calculated. One major problem with link analysis methods is that they do not utilize the content of the posts. Moreover, link analysis methods are not context based, which means they are unable to recognize the expertise level of users on a specific topic. In addition, link analysis methods cannot detect replies that are unrelated to the asked question (e.g. posts sent for advertising purposes).

There are a number of information retrieval methods that can capture users' expertise automatically. Information retrieval methods can be used to find experts because texts contain terms that are relevant to the users' areas of expertise. Although information retrieval methods are effective, they cannot calculate each user's influence in the social network, and social cognition research has shown that social influence plays an important role in the perception of expertise [4]. Also, most of the proposed information retrieval methods are unable to recognize synonyms, or even words that belong to a common context, such as Java and Pascal, which are both computer programming languages.

Finally, most previous methods in both categories share a common weakness: the user-specific information need is neglected, whereas our proposed method can find experts dynamically based on topic queries. These problems are the motivation for proposing a novel expert finding technique that is able to process unstructured textual data along with the relations between members. Finding a solution to the aforementioned challenges is the primary goal of this study.

1.2 Contribution

First of all, the novelty of the proposed method lies in finding semantically relevant posts by employing text mining techniques and the semantic similarity function provided by the WordNet dictionary.
This technique figures out the relevance of users' posts to the specified context. Another unique feature of the proposed method is that the social network of users in an online forum can be extracted and weighted according to the calculated similarity values. Also, by employing a customized link analysis algorithm, the relative importance of each user is calculated according to the specified context. The proposed method has an important advantage over prior expert finding methods due to its ability to recognize synonymous words using the WordNet dictionary. The experimental results clearly show that the proposed method outperforms approaches that employ link analysis or context analysis techniques alone. Moreover, the context based expert finding approach leads to higher accuracy for expert finding.

This paper is organized as follows. In Sect. 2, the most related works are reviewed. In Sect. 3, our methodology is presented with a step-by-step explanation of its stages. In Sect. 4, the accuracy of the proposed method is calculated and compared with other methods, and finally the study is rounded off with a conclusion in Sect. 5.

2 Literature review

For about fifteen years, expert finding has been a prominent issue attracting many researchers' attention. In the past, most research in this field aimed at finding experts in organizations; now the tendency is more toward finding experts on the Internet. Knowledge sharing environments like forums are helpful tools for sharing knowledge and establishing relations between members. Expert finding methods are categorized into two groups: information retrieval and link analysis.

2.1 Expert finding based on link analysis

Link analysis methods are widely used to find experts in online forums.
Expert finding through link analysis consists of constructing a user social network and then applying some kind of ranking algorithm to determine users' authority.

Link analysis algorithms such as HITS and PageRank have been used to determine experts' level of knowledge. One such work was carried out as a research project to rank transferred emails between IBM's employees. The authors found that link analysis algorithms can yield better results than content analysis methods. However, their research had some drawbacks, such as the small size of their network, which could not reflect the characteristics of knowledge relations in real online communities [5, 6].

In 2007, Ackerman and colleagues conducted a study to find experts in the Sun Java forum. They pre-processed the extracted posts to create the members' social network. By leveraging a simulation technique, they examined the effects of network structure on the accuracy of their proposed algorithm and discovered structural features that affect the performance of expert finding algorithms [7]. The QuME engine was proposed to match questioners with responders on the Sun Java forum; it examines forum members to find the best answerers for each posted question. This engine has not been evaluated yet, so there is no evidence that it works properly in online forums. Also, QuME, along with other expert finding algorithms, only calculates members' expertise in the Java programming language and cannot determine users' level of expertise in different sub-areas of the Java concept map [8].

Adamic and her team proposed a novel model to find the best answers in Yahoo Answers (YA) [9]. YA is an active forum with a great diversity of knowledge being shared. All categories of this forum were studied and then classified based on the interaction patterns and content properties found among its members.
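As an illustration of the link analysis idea discussed above, the following is a minimal sketch (not any of the cited authors' implementations) of PageRank power iteration over a small reply graph, where an edge u → v means user u's question was answered by user v, so authority flows toward frequent answerers:

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Rank nodes of a directed graph given as {node: [outgoing neighbours]}."""
    nodes = list(graph)
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iterations):
        new_rank = {u: (1.0 - damping) / n for u in nodes}
        for u, targets in graph.items():
            if targets:
                # distribute u's current rank evenly over its outgoing links
                share = rank[u] / len(targets)
                for v in targets:
                    new_rank[v] += damping * share
            else:
                # dangling node: spread its rank evenly over all nodes
                for v in nodes:
                    new_rank[v] += damping * rank[u] / n
        rank = new_rank
    return rank

# hypothetical "asker -> answerer" links: B and C answered A, C answered B and D
replies = {"A": ["B", "C"], "B": ["C"], "C": [], "D": ["C"]}
scores = pagerank(replies)
# C receives the most incoming links, so it gets the highest authority score
```

The same iteration applies unchanged when the edges are weighted by context similarity, which is the direction the proposed method takes.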
While interactions in some categories resemble expertise-sharing online communities, others revolve around everyday advice, discussion, and support. Given this diversity of categories in which members can take part, the authors discovered that some members focus narrowly on specific topics while others participate across various categories. The entropy of members' activities was depicted in their research, and they found that lower entropy correlates with higher-rated answers. They also predicted which answer would be chosen as the best answer by combining user attributes with reply characteristics [9].

The SNPageRank algorithm was proposed to find experts on social networks. A star schema data warehouse was employed to store the vast amount of data gathered from the FriendFeed website, and the results were compared with experts' opinions using Spearman's correlation function [10]. In another study, it was shown that social influence is a key factor in making solutions more broadly accepted [11].

Komeda et al. [12] concluded that "cognitively central members of a community can provide social validation for other members' knowledge, and that, concurrently, their knowledge is confirmed by other members, leading to the perception of well-balanced knowledge or expertise in the focal task domain". Other studies have shown that network centrality is positively related to technical and administrative innovation in organizations. Like formal authority, higher network centrality expresses a higher degree of control over, and access to, sensitive data [13, 14].

2.2 Expert finding based on information retrieval

There are a number of information retrieval methods that can capture users' expertise automatically. Balog et al. [15] presented two models based on probabilistic language modelling techniques.
In the first model, a textual representation of each user's knowledge is built from the associated documents; candidates can then be ranked using this representation. In the second model, all documents are ranked according to a given context, and it is then determined from the associated documents whether a candidate is an expert or not. In 2010, a novel method was proposed to rank members of the Sun Java forum based on their estimated knowledge, represented as a numeric score between 0 and 1; it utilizes the forum's posts to implicitly create a knowledge model for each participant [16].

Abel et al. [17] proposed a rule-based recommender system for online communities. Their system employs users' posts and rating-score information to extract rules for its recommendations. Using collaborative techniques, a novel recommender system was proposed by Castro-Herrera [18]: profiles are created from users' contributions in threads and then exploited to find the similarity among users, and users' interests are mined from the main keywords of their posts. A novel expert recommender system for online communities was also proposed by Ziberna and Vehovar [19].

Balog and de Rijke [20] proposed a method that aims at finding similar experts instead of using an explicit description of the demanded expertise. The impact of rich query modelling along with non-local evidence on expert finding systems was investigated in another study [21]. In the research conducted by Liu et al. [22], a method based on language models is presented that automatically finds experts in online communities; it has been evaluated on large-scale real data.

The Answer Garden system analyses threads and categorizes them into an ontology. By navigating the ontology tree, one can find experts as the leaf nodes.
However, an ontology may not be available, and building one is a cumbersome task. Moreover, this system needs predefined experts, which cannot be changed easily [23, 24].

The Expert/Expert-Locator (EEL) system, proposed by Streeter and Lochbaum [25], can field requests for technical information. It builds a semantic space of organizations and terms by utilizing statistical matrix decomposition in order to find term-based semantic similarity in textual data. ContactFinder was developed by Krulwich and Burkey [26] to match bulletin board members with people who have the knowledge required to help them, based on historical data.

2.3 Context similarity algorithms

The context based expert finding algorithm in this research needs a method to figure out the similarity between two contexts, so a broad survey of context similarity functions was conducted. The results indicate that there are two categories of methods to compute the similarity between two contexts; these categories, along with their subcategories, are illustrated in Fig. 1.

The first category consists of algorithms that calculate the similarity between contexts solely by examining their grammatical and lexical structures. One of the most famous algorithms in this category is the Levenshtein algorithm [27]. A major disadvantage of the methods in this category is their weakness in finding synonyms. For example, the Levenshtein algorithm recognizes "fridge" and "refrigerator" as different subjects, although they are used interchangeably and can be considered synonyms.

Methods in the second category use a dictionary to compute the similarity between different contexts. In contrast to the algorithms in the first category, they consider the semantics of contexts to determine the amount of similarity between them. The algorithms in this category are classified into three sub-categories.
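For reference, the Levenshtein edit distance mentioned above can be sketched with standard dynamic programming. It counts the minimum number of insertions, deletions, and substitutions between two strings, which is exactly why lexically distant synonyms such as "fridge" and "refrigerator" score as very dissimilar:

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character edits turning a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]                  # deleting the first i characters of a
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

levenshtein("kitten", "sitting")       # the classic example: 3 edits
levenshtein("fridge", "refrigerator")  # large distance despite synonymy
```

The dictionary based methods of the second category are meant to fix precisely this blindness to meaning.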
Some of them employ the WordNet ontology to find the similarity between contexts, such as the OSS function [28].

"Ontology tree" refers to an approach in which contexts are represented as nodes of a context ontology tree, in a hierarchical structure. Semantic similarity techniques that use the WordNet ontology tree can be classified into three categories: edge based, node based, and hybrid approaches. The simplest similarity measurement is the edge based approach, where the distance between two concepts is calculated by counting the edges between them. Resnik [29] subtracted the path length between two concepts from the maximum possible path length in order to compute their similarity. Another popular edge based approach was proposed by Leacock and Chodorow [30], who scaled the shortest path between two concepts by the maximum depth of the hierarchical ontology.

In the node based approach, the similarity of two concepts is defined as the ratio between the amounts of information needed to state their commonality [31]. In the hybrid approach, the aforementioned approaches are combined; Jiang and Conrath [32] proposed such a hybrid approach and defined the edge strength between two concepts as the difference of information between them.

One limitation of these approaches is that the ontology tree may be constructed unevenly; in particular, one branch of a node may be split coarsely while another branch is split in more detail. Methods in the second sub-category use a text mining approach to find the similarity between words through their meanings in the dictionary.

To overcome the mentioned limitations of the methods in the first sub-category, a novel method is presented in [33] which uses the WordNet English lexical reference system.
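The edge based scaling described above can be sketched on a toy hierarchy. The formula sim(a, b) = -log(len(a, b) / (2D)) follows the Leacock–Chodorow measure, with the path length len counted in nodes and D the taxonomy depth; the tiny is-a tree below is purely illustrative, not WordNet itself:

```python
import math

# toy is-a hierarchy, child -> parent (WordNet is of course far larger)
parent = {
    "refrigerator": "appliance", "fridge": "appliance",
    "appliance": "artifact",
    "java": "language", "pascal": "language",
    "language": "artifact",
}

def path_to_root(word):
    """Chain of concepts from word up to the root of the hierarchy."""
    path = [word]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path

def shortest_path(a, b):
    """Path length in nodes between a and b via their lowest common ancestor."""
    pa, pb = path_to_root(a), path_to_root(b)
    common = next(x for x in pa if x in pb)
    return pa.index(common) + pb.index(common) + 1

def lch_similarity(a, b, max_depth=3):
    # scale the shortest path by twice the maximum depth of the hierarchy
    return -math.log(shortest_path(a, b) / (2.0 * max_depth))

lch_similarity("fridge", "refrigerator")  # siblings: short path, higher score
lch_similarity("fridge", "java")          # distant concepts: lower score
```

Note how this repairs the Levenshtein failure case: "fridge" and "refrigerator" sit under the same parent and therefore come out highly similar, regardless of their spelling.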
Using this approach, words that occur very frequently in a concept's explanation in WordNet are given no significant weight, while words that occur only a few times are given more weight. This is because frequent words are unlikely to discriminate concepts sufficiently. Finally, the similarity between two arbitrary contexts of an ontology tree, i.e. two different nodes, is computed from their weighted similarity distance.

The WordNet dictionary contains 155,287 words organized in 117,659 synsets, for a total of 206,941 word-sense pairs. It provides an online dictionary that is constructed not only in alphabetical order but in a more conceptual way, showing semantic relationships in terms of similar meanings, part-of relations, and subsumption relations among concepts [34]. Since the WordNet dictionary includes a large number of words and their meanings, the context similarity function proposed in [33] can cover a significant proportion of the words found in the information shared on online forums. The last sub-category consists of methods that employ ontology and word meanings in dictionaries together in order to calculate context similarity.

Fig. 1  Context similarity algorithms: the grammatical and lexical approach versus the dictionary based approach (based on the ontology, based on the meanings of the words, and hybrid)

3 Methodology

In Fig. 2, the framework of the proposed methodology for expert finding in forums is depicted; this section is therefore divided according to the phases of the proposed methodology.

3.1 Dataset

Matthew Haughey founded the MetaFilter forum in 1999. The site was programmed by its founder using Microsoft SQL Server and Macromedia ColdFusion. Currently, this online community enjoys international popularity among Internet users.
People who are members of MetaFilter can send their posts to the site, others may then comment on these posts, and readers can mark other users' comments if they like them. In the early years membership of MetaFilter was free, but after 2004 signups were reopened with a 5 USD lifetime fee.

AskMe is the most successful subsite of MetaFilter and was launched in 2003. In this forum, members are permitted to send posts to the online community without the link requirement. AskMe rapidly grew into a strong side community with slightly different etiquette requirements. Today, threads in this forum cover a broad spectrum of topics. The community has various categories where people can post and comment on different topics. Furthermore, questions carry assigned tags that relate to the context of the question, and the best answer among all answers to each resolved question is chosen by the user who asked it [35].

In this study, this forum is used as the dataset, since it is a well-established online community that has accumulated a high volume of posting-replying data with rich social interactions embedded in it. A web crawler was developed to crawl all pages of the AskMe forum from its establishment in 2003 to the end of 2010, approximately eight years of data.

3.2 Crawler

A web crawler is computer software used to download the web pages associated with given seed URLs and recursively extract their hyperlinks.

Fig. 2  Proposed methodology for context based user ranking
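The paper does not publish its crawler, but the behaviour just described, downloading pages from seed URLs and recursively following their hyperlinks, can be sketched with the Python standard library alone. The fetching step is stubbed out with a callable so the sketch stays self-contained, and the example URLs are hypothetical:

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkExtractor(HTMLParser):
    """Collect the href targets of all <a> tags on a page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(seed_url, fetch, max_pages=100):
    """Breadth-first crawl; `fetch(url)` must return the page's HTML."""
    queue, seen = deque([seed_url]), {seed_url}
    while queue and len(seen) <= max_pages:
        url = queue.popleft()
        parser = LinkExtractor()
        parser.feed(fetch(url))
        for href in parser.links:
            absolute = urljoin(url, href)   # resolve relative links
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return seen

# toy two-level "site" standing in for real HTTP requests
pages = {
    "http://example.org/": '<a href="/q/1">q1</a><a href="/q/2">q2</a>',
    "http://example.org/q/1": '<a href="/">home</a>',
    "http://example.org/q/2": "",
}
visited = crawl("http://example.org/", fetch=lambda u: pages.get(u, ""))
```

In a real crawler the `fetch` parameter would wrap an HTTP client with politeness delays and error handling; the `seen` set is what keeps the recursion from revisiting pages.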

