A statistically-based ontology matching tool

Distrib Parallel Databases
DOI 10.1007/s10619-017-7206-0

Peter Ochieng (1) · Swaib Kyanda (1)
(1) Makerere University, Kampala, Uganda
Peter Ochieng: onexpeters@gmail.com
Swaib Kyanda: kswaibk@gmail.com

© Springer Science+Business Media, LLC 2017

Abstract

Ontologies have become a popular means of knowledge sharing and reuse. This has motivated the development of large independent ontologies within the same or different domains, with some overlapping information among them. In order to match such large ontologies, automatic matchers become an inevitable solution. This work explores the use of a predictive statistical model to establish an alignment between two input ontologies. We demonstrate how to integrate ontology partitioning and parallelism in the ontology matching process in order to make the statistical predictive model scalable to large ontology matching tasks. Unlike most ontology matching tools, which establish 1:1 cardinality mappings, our statistical model generates one-to-many cardinality mappings.

Keywords: Ontology · Supervised · Parallel · Matching · Machine learning

1 Introduction

Ontologies play a key role in sharing and reusing knowledge among software agents [1]. Due to their importance, a great number of ontologies have been developed, with each ontology describing a given domain from its own subjective view. This has resulted in the existence of multiple ontologies in the same or different domains with some level of heterogeneity among them. To resolve this heterogeneity, ontology matching is usually performed in order to find correspondences between semantically related entities of different ontologies [2]. Consequently, a number of tools have been developed to perform ontology matching [3,4]. However, the existing ontology matching tools still have a number of limitations which restrict their effectiveness during the ontology matching process.

1. Most tools still cannot scale to large ontology matching tasks.
   Ontology matching has high space and time complexity, which a tool has to reduce in order to be scalable. For example, consider the task of matching two input ontologies where each ontology has $n$ entities. Matching the two ontologies in a Cartesian product fashion results in a space complexity of $O(n^2)$ (i.e., the number of comparison computations between the entity pairs of the source and target ontologies increases quadratically with the number of entities in the two ontologies). Therefore, a large amount of memory is required to hold the similarity values between the entities. If we consider the efficiency of a matcher and assume that it requires time $t$ to perform a single similarity computation between two entities, the total time complexity of a single matcher is $O(n^2 \times t)$. When large ontologies are being matched (i.e., $n$ is large), this all-against-all fashion of matching can be expensive in terms of execution time. Most ontology matching tools do not have techniques to handle the high space and time complexities associated with matching large ontologies, which makes them unable to handle large matching tasks.

2. Even the few scalable state-of-the-art tools, such as YAM++ [5], the LogMap family [6], AML [7] and XMAP++ [8], still suffer from the following challenges:

   - They restrict the mappings they establish to 1:1 cardinality equivalence mappings, ignoring the fact that an entity from the source ontology may be involved in multiple relationships with a number of entities from the target ontology. For instance, consider the task of matching two input ontologies where one ontology has the concepts Year, Month and Day while the other one has a concept Date. Ontology matching tools performing one-to-one equivalence mapping will establish only one of the mappings {Year ≡ Date}, {Month ≡ Date} or {Day ≡ Date}. However, the exact mappings that should hold are {Year ⊑ Date}, {Month ⊑ Date} and {Day ⊑ Date}. Therefore, restricting mappings to one-to-one equivalence mappings omits many potential mappings or generates wrong equivalence relationships. Furthermore, implicit mappings such as {Year, Month, Day} ≡ {Date} cannot be established.

   - They achieve scalability at the expense of recall. These tools still leave out many expected mappings when matching two large ontologies. Figure 1 compares the best average precision and average recall over the last five years (2012–2016) in the large biomedical track at OAEI. From Fig. 1, the best average precision values in all five years are above 90%. For recall, the highest average value attained was 79.1%, in 2012. The comparatively low recall shows that the scalable ontology matching tools still leave out many expected mappings when matching large ontologies.

Fig. 1 Average top precision, recall and F-measure produced by ontology matching tools over the last 5 years

This work explores the use of a statistical predictive model to perform ontology matching. We specifically exploit the model to make the following contributions:

1. We identify all the possible mappings that exist between the entities of the source and target ontologies. By doing this we seek to improve the recall of the alignment produced between any two input ontologies.
2. We address the challenges of high space and time complexities associated with large ontology matching by applying a Single Instruction Multiple Data (SIMD) parallelism implementation [9] during the loading of input ontologies and the matching of entities.
3. We propose a statistical predictive model where the training data is independent of the input ontologies being matched.

Our proposed statistical model generates an alignment between two input ontologies that contains all plausible relationships that hold between entities of the two ontologies. The model uses historical, trusted mappings as the basis for predicting new mappings between two input ontologies. We allow the model to first predict a set of possible alignments between two input ontologies, after which it selects the most plausible alignment from the initial set predicted.

1.1 Ontology matching formalization

Ontology matching is the process of detecting semantic similarities between entities in heterogeneous ontologies. Given two ontologies, the source ontology $O_1$ and the target ontology $O_2$, let $e_s$ represent an entity from $O_1$ and $e_t$ an entity from $O_2$. A correspondence between $O_1$ and $O_2$ is a triple $\langle e_s, e_t, r \rangle$, with $r$ being the semantic relation that holds between the entities $e_s$ and $e_t$. For instance, in Fig. 2, $\langle \mathit{Document}, \mathit{Documents}, \equiv \rangle$ shows an equivalence relationship that exists between a concept $\mathit{Document} \in O_1$ and $\mathit{Documents} \in O_2$. In some cases, a correspondence may be represented as a 4-tuple, i.e. $\langle e_s, e_t, r, v \rangle$, where $v$ is the confidence value attached to the established relationship $r$ between the entities $e_s$ and $e_t$. The example from Fig. 2 can now be represented as $\langle \mathit{Document}, \mathit{Documents}, \equiv, 0.96 \rangle$. An alignment between two ontologies $O_1$ and $O_2$ is the set of all correspondences between them. In Fig. 2, a possible alignment $A$ considering only equivalence relationships would be:

$$A = \{\langle \mathit{Person}, \mathit{Agent}, \equiv, 0.77 \rangle,\ \langle \mathit{Document}, \mathit{Documents}, \equiv, 0.96 \rangle,\ \langle \mathit{WrittenBy}, \mathit{hasWritten}, \equiv, 0.90 \rangle,\ \langle \mathit{PaperReview}, \mathit{Review}, \equiv, 0.84 \rangle\}$$

Fig. 2 Examples of input ontologies

In the remaining sections of the paper, Sect. 2 discusses related work, Sect. 3 gives a detailed framework of our proposed statistical predictive model, Sect. 4 discusses how the statistical predictive model was implemented as part of the ontology matching tool, and finally Sect. 5 contains the evaluation of the entire ontology matching tool.

2 Related work

A number of tools have been proposed to perform ontology matching. However, most of them are restricted to establishing only 1:1 cardinality mappings or are not tailored to handle large ontology matching tasks.

AgreementMaker Light (AML) [7] is an ontology matching tool that establishes 1:1 equivalence mappings between entities of the input ontologies. It uses the HashMap data structure to tackle the problem of space complexity in large ontology matching problems. It performs cross-searches between different HashMaps, hence limiting the search space to $O(n)$.

DKP-AOM [10] also performs ontology matching by producing 1:1 cardinality mappings. To match large ontologies, it partitions each input ontology along the owl:disjointWith axioms that are modeled in the ontology. This partitioning technique is, however, ineffective, since most ontologies are modeled without explicitly including owl:disjointWith axioms. DKP-AOM fails to produce partitions of an ontology that has no owl:disjointWith axioms, and it produces large partitions of an ontology that has only a few of them, hence not significantly reducing the search space. This may explain why DKP-AOM fails to complete tasks in the large biomedical ontologies track of OAEI.

CLONA [11] is an ontology matching system that is limited to matching multilingual ontologies. It addresses the quadratic complexity problem by using Lucene to develop an index of the concepts, relationships, data types and instances in the input ontologies. It then intersects the source and target ontology indexes to establish initial candidate mappings of entities.
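
The index-intersection idea that CLONA applies to candidate generation can be illustrated with a minimal Python sketch. This is not CLONA's implementation and it does not use Lucene: a plain in-memory dictionary stands in for the index, and the function names (`tokenize`, `build_index`, `candidate_mappings`), the token-based normalization and the sample entity identifiers are all assumptions made for illustration.

```python
import re
from collections import defaultdict


def tokenize(label):
    """Split a label such as 'PaperReview' or 'paper_review' into lowercase tokens."""
    parts = re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", label)
    return frozenset(part.lower() for part in parts)


def build_index(labels):
    """Index entities by their token set (hypothetical stand-in for a real index)."""
    index = defaultdict(set)
    for entity, label in labels.items():
        index[tokenize(label)].add(entity)
    return index


def candidate_mappings(source_labels, target_labels):
    """Pair entities whose index keys coincide, instead of comparing all pairs."""
    source_index = build_index(source_labels)
    target_index = build_index(target_labels)
    # Intersecting the key sets replaces the all-against-all comparison.
    shared_keys = source_index.keys() & target_index.keys()
    return sorted(
        (s, t)
        for key in shared_keys
        for s in source_index[key]
        for t in target_index[key]
    )


# Illustrative labels only; the identifiers are made up for this sketch.
source = {"o1:PaperReview": "PaperReview", "o1:Document": "Document"}
target = {
    "o2:paper_review": "paper_review",
    "o2:Review": "Review",
    "o2:Documents": "Documents",
}
print(candidate_mappings(source, target))
# prints [('o1:PaperReview', 'o2:paper_review')]
```

Note that Document and Documents are not paired here, because the sketch matches exact token sets only. A real system would need further normalization (for example stemming) or a later similarity step to recover such pairs.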

EXONA [12] is an ontology matching system solely for instance matching. It uses indexing to reduce the space complexity associated with large ontology matching tasks. It indexes the instances of the input ontologies using their URIs. It then performs a cross-index query of the source and target ontology indexes to establish initial candidate mappings.

LogMap [13] is a tool that uses logic reasoning to refine initially established candidate mappings. To generate candidate mappings, it creates an inverted index of the source and target ontologies using the entities' label names. It then intersects the inverted indexes of the input ontologies to establish candidate mappings, and uses repair strategies to refine them. LogMap currently has three variants: LogMapLt, LogMapC, and LogMapBio.

YAM++ [5,14] is an ontology matching tool that exploits different strategies to perform ontology matching. Based on user configuration, YAM++ can perform matching based on information retrieval or machine learning techniques. In the machine learning case, a user provides a set of mappings which is used to train a classifier. This technique is restricted by the fact that, for large ontology matching tasks, a user may not be able to generate enough data to train a classifier. Other ontology matching tools that exploit machine learning include [15–18] and [19]. These tools, however, cannot scale to large ontology matching tasks and are restricted to producing only 1:1 mappings.

3 Statistically-based ontology matching framework

3.1 Building statistical model

In this section, we present the statistical model which we use to establish an alignment between two input ontologies. To establish an alignment, we utilize historical mappings between ontologies to learn and predict mappings in a new ontology matching task. The new ontology matching task may involve two entirely new input ontologies or one of the ontologies that has been used in the learning process (i.e. prediction is independent of the ontologies used in the training of the model). In order to effectively use machine learning to predict the mappings that exist between the source and target ontologies of the new task, the matching process must be broken down into key events which can be modeled into a sound statistical model. The parameterization, i.e. the breaking down of the matching process into key events, is done such that it possesses two key properties:

1. Discriminative power: the parameters (events) should contain all information necessary for disambiguation of a decision.
2. Compactness: the model should have as few parameters as possible. The number of parameters used in the model is related to the amount of training data required. Given the limited availability of training data, it is beneficial for the model to have as few parameters as possible.

Therefore, the key questions that guide the statistical model creation are:

1. What are the key parameters that can be used to model the ontology matching process such that the model is compact and possesses great discriminative power for each decision it faces?