BMC Bioinformatics (2015) 16:138
doi: 10.1186/s12859-015-0564-6
ISSN 1471-2105 | Research article | Submitted: 19 February 2014 | Accepted: 31 March 2015
Article URL: http://dx.doi.org/10.1186/s12859-015-0564-6

This Provisional PDF corresponds to the article as it appeared upon acceptance. Fully formatted PDF and full text (HTML) versions will be made available soon. For information about publishing your research in BioMed Central journals, go to http://www.biomedcentral.com/info/authors/

© 2015 Tsatsaronis et al.; licensee BioMed Central. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

An overview of the BioASQ large-scale biomedical semantic indexing and question answering competition

George Tsatsaronis 1,* (george.tsatsaronis@biotec.tu-dresden.de; * corresponding author), Georgios Balikas 5 (georgios.balikas@imag.fr), Prodromos Malakasiotis 4 (rulller@gmail.com), Ioannis Partalas 7 (ioannis.partalas@imag.fr), Matthias Zschunke 2 (mzschunke@transinsight.com), Michael R Alvers 2 (malvers@transinsight.com), Dirk Weissenborn 1 (dirk.weissenborn@gmail.com), Anastasia Krithara 3 (akrithara@iit.demokritos.gr), Sergios Petridis 3 (eserxio@gmail.com), Dimitris Polychronopoulos 3 (dpolychr@gmail.com), Yannis Almirantis 3 (yalmir@bio.demokritos.gr), John Pavlopoulos 4 (annis@aueb.gr), Nicolas Baskiotis 5 (nicolas.baskiotis@lip6.fr), Patrick Gallinari 5 (Patrick.Gallinari@lip6.fr), Thierry Artières 5 (thierry.artieres@lip6.fr), Axel Ngonga 6 (axel.ngonga@gmail.com), Norman Heino 6 (heino@informatik.uni-leipzig.de), Eric Gaussier 7 (eric.gaussier@imag.fr), Liliana Barrio-Alvers 2 (lalvers@transinsight.com), Michael Schroeder 1,2 (ms@biotec.tu-dresden.de), Ion Androutsopoulos 4 (ion@aueb.gr), Georgios Paliouras 3 (paliourg@iit.demokritos.gr)

Affiliations
1 Biotechnology Center, TU Dresden, Tatzberg 47-49, 01307 Dresden, Germany
2 Transinsight GmbH, Tatzberg 47-49, 01307 Dresden, Germany
3 NCSR Demokritos, Ag. Paraskevi, 60228 Athens, Greece
4 Athens University of Economics and Business, Patission 76, 10434 Athens, Greece
5 Université Pierre et Marie Curie-Paris 6, 4 Place Jussieu, 75005 Paris, France
6 Universität Leipzig, Augustusplatz 10, 04109 Leipzig, Germany
7 Université Joseph Fourier, 621 Avenue Centrale, 38041 Saint-Martin-d'Hères, France

Abstract

Background
This article provides an overview of the first BioASQ challenge, a competition on large-scale biomedical semantic indexing and question answering (QA), which took place between March and September 2013. BioASQ assesses the ability of systems to semantically index very large numbers of biomedical scientific articles, and to return concise and user-understandable answers to given natural language questions by combining information from biomedical articles and ontologies.

Results
The 2013 BioASQ competition comprised two tasks, Task 1a and Task 1b. In Task 1a participants were asked to automatically annotate new PubMed documents with MeSH headings. Twelve teams participated in Task 1a, with a total of 46 system runs submitted, and one of the teams performed consistently better than the MTI indexer used by NLM to suggest MeSH headings to curators. Task 1b used benchmark datasets containing 29 development and 282 test English questions, along with gold standard (reference) answers, prepared by a team of biomedical experts from around Europe; participants had to produce answers automatically. Three teams participated in Task 1b, with 11 system runs. The BioASQ infrastructure, including benchmark datasets, evaluation mechanisms, and the results of the participants and baseline methods, is publicly available.

Conclusions
A publicly available evaluation infrastructure for biomedical semantic indexing and QA has been developed, which includes benchmark datasets, and can be used to evaluate systems that: assign MeSH headings to published articles or to English questions; retrieve relevant RDF triples from ontologies, and relevant articles and snippets from PubMed Central; produce "exact" and paragraph-sized "ideal" answers (summaries). The results of the systems that participated in the 2013 BioASQ competition are promising. In Task 1a one of the systems performed consistently better than the NLM's MTI indexer. In Task 1b the systems received high scores in the manual evaluation of the "ideal" answers; hence, they produced high quality summaries as answers. Overall, BioASQ helped obtain a unified view of how techniques from text classification, semantic indexing, document and passage retrieval, question answering, and text summarization can be combined to allow biomedical experts to obtain concise, user-understandable answers to questions reflecting their real information needs.

Keywords
BioASQ competition, Hierarchical text classification, Semantic indexing, Information retrieval, Passage retrieval, Question answering, Multi-document text summarization

Background

BioASQ is an EU-funded support action [1] to set up a challenge on biomedical semantic indexing and question answering (QA). Participants are required to semantically index documents from large-scale biomedical repositories (e.g., MEDLINE) and to assemble information from multiple heterogeneous sources (e.g., scientific articles, ontologies) in order to compose answers to real-life biomedical English questions.
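The benchmark datasets mentioned in the abstract pair each English question with gold material (relevant documents, snippets, concepts, RDF triples) and with "exact" and paragraph-sized "ideal" answers. Purely as an illustration of what one such record might contain, here is a minimal sketch; every field name and value below is invented for this example and is not the article's published schema.

```python
# Illustrative sketch of one benchmark question record; field names and
# values are hypothetical, not the official BioASQ data format.
example_question = {
    "body": "What are the physiological manifestations of disorder Y?",
    "type": "factoid",                                  # e.g., yes/no, factoid, list, summary
    "concepts": ["MESH:D012345"],                       # hypothetical ontology concept identifier
    "documents": ["http://www.ncbi.nlm.nih.gov/pubmed/12345678"],   # hypothetical citation
    "snippets": [{
        "document": "http://www.ncbi.nlm.nih.gov/pubmed/12345678",
        "text": "Disorder Y typically presents with manifestation A ...",
    }],
    "triples": [{"s": "...", "p": "...", "o": "..."}],  # Linked Open Data triples
    "exact_answer": ["manifestation A"],                # short, "exact" answer
    "ideal_answer": "Disorder Y typically presents with manifestation A ...",  # "ideal" summary
}
```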
BioASQ addresses a central problem biomedical knowledge workers face: to synthesise and filter information from multiple, large, fast-growing sources. Existing search engines (e.g., PubMed, GoPubMed [2,3]) only partially address this need. They focus on a limited range of resources (e.g., only MEDLINE articles and concepts from Gene Ontology or MeSH), whereas multiple sources (e.g., including specialised drug databases and ontologies) often need to be combined. Furthermore, they mostly retrieve possibly relevant texts or structured information, which the users then have to study, filter, and combine by themselves to obtain the answers they seek. By contrast, QA systems aim to directly produce answers [4]. Semantic indexing, i.e., annotating documents with concepts from established semantic taxonomies or, more generally, ontologies, provides a means to combine multiple sources and facilitates matching questions to answers. In recent years, many methods have been developed that utilize existing ontology structures and concepts to index documents and perform semantic search [5]. In the biomedical domain, however, semantic indexing is currently performed largely manually, and needs to be automated to cope with the vast amount of new information that becomes available daily. At the same time, current semantic indexing and QA methods require more research to reach a level of effectiveness and efficiency acceptable to biomedical experts. BioASQ sets up ambitious, yet feasible and clearly defined challenge tasks, intended to lead to integrated, efficient, and effective semantic indexing and QA methods for the biomedical domain. In addition, BioASQ helps establish an evaluation framework for biomedical semantic indexing and QA systems. It does so by developing realistic, high-quality benchmark datasets and adopting (or refining) existing evaluation measures for its challenge tasks.

Figure 1 provides a general overview of biomedical semantic indexing and QA in BioASQ. Other recent systems follow a similar approach [4,6]. Starting with a variety of data sources (lower right corner of the figure), semantic indexing and integration brings the data into a form that can be used to respond effectively to domain-specific questions. A semantic QA system associates ontology concepts with each question and uses the semantic index to retrieve pieces of structured information (e.g., Linked Open Data triples) and relevant texts (documents or abstracts, e.g., from PubMed or PubMed Central). This is depicted in the middle of the figure, by the processes included in the Question Processing and Semantic Indexing and Integration boxes. The retrieved information is then turned into a concise, user-understandable form, which may be, for example, a ranked list of candidate answers (e.g., in factoid questions, like "What are the physiological manifestations of disorder Y?") or a collection of text snippets (ideally forming a coherent summary) jointly providing the requested information (e.g., in "What is known about the metabolism of drug Z?").
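To make the idea of semantic indexing concrete, the sketch below annotates a piece of text with ontology concepts by simple dictionary matching of concept labels. It is a minimal illustration under invented data (the concept identifiers and labels are made up), not the approach used by MEDLINE curators or by BioASQ participants, who rely on the full MeSH vocabulary and far more sophisticated classifiers.

```python
# Minimal sketch of concept-based semantic indexing via dictionary matching.
import re

# Hypothetical MeSH-style identifiers and labels, invented for this example.
CONCEPT_LABELS = {
    "D003920": ["diabetes mellitus", "diabetes"],
    "D007333": ["insulin resistance"],
    "D011565": ["metformin"],
}

def annotate(text: str) -> set[str]:
    """Return the identifiers of all concepts whose labels occur in the text."""
    lowered = text.lower()
    found = set()
    for concept_id, labels in CONCEPT_LABELS.items():
        if any(re.search(r"\b" + re.escape(label) + r"\b", lowered) for label in labels):
            found.add(concept_id)
    return found

abstract = ("Metformin improves insulin resistance in patients "
            "with type 2 diabetes mellitus.")
print(sorted(annotate(abstract)))  # ['D003920', 'D007333', 'D011565']
```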
More precisely, the BioASQ challenge evaluates the ability of systems to perform: (1) large-scale classification of biomedical documents onto ontology concepts, in order to automate semantic indexing; (2) classification of biomedical questions onto the same concepts; (3) integration of relevant document snippets, database records, and information (possibly inferred) from knowledge bases; and (4) delivery of the retrieved information in a concise and user-understandable form.

Figure 1. Overview of semantic indexing and question answering in the biomedical domain. The BioASQ challenge focuses on pushing systems towards implementing pipelines that can realize the workflow shown in the figure. Starting with a variety of data sources (lower right corner of the figure), semantic indexing and integration brings the data into a form that can be used to respond effectively to domain-specific questions. A semantic QA system associates ontology concepts with each question and uses the semantic index of the data to retrieve the relevant pieces of information. The retrieved information is then turned into a concise user-understandable form, which may be, for example, a ranked list of candidate answers (e.g., in factoid questions, like "What are the physiological manifestations of disorder Y?") or a collection of text snippets, ideally forming a coherent summary (e.g., in "What is known about the metabolism of drug Z?"). The figure also illustrates how these steps are mapped to the BioASQ challenge tasks: Task 1a is depicted in blue, Task 1b in red.

To realize the challenge, BioASQ organized two tasks, namely Task 1a (covering point 1 from the list above) and Task 1b (covering the remaining points). In Task 1a, named "Large-scale online biomedical semantic indexing", participants were asked to classify new abstracts written in English, as they became available online, before MEDLINE curators annotated (in effect, classified) them manually; at any point in time there was usually a backlog of approximately 10,000 non-annotated abstracts. The classes came from the MeSH hierarchy, i.e., the subject headings that are currently used to manually index the abstracts. As new manual annotations became available, they were used to evaluate the classification performance of participating systems (which classified articles before they were manually annotated), using standard information retrieval (IR) measures (e.g., precision, recall, accuracy), as well as hierarchical variants of these measures. In Task 1b, named "Introductory biomedical semantic QA", participants were asked to annotate input natural language questions with biomedical concepts, and to retrieve relevant documents, snippets and triples (Phase A). Finally, participants were asked to find and report the answers to the questions (Phase B), given the gold responses of Phase A as additional input. The answers of the systems were compared against model answers in English constructed by biomedical experts, using evaluation measures.
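As an illustration of the flat evaluation measures mentioned for Task 1a, the sketch below computes micro-averaged precision, recall and F1 over the MeSH headings assigned to a batch of articles. It is a simplified stand-in, not the official BioASQ evaluation code, and it omits the hierarchical variants; the example headings are invented.

```python
# Micro-averaged flat precision/recall/F1 for multi-label MeSH assignment.
def micro_prf(gold: list[set[str]], predicted: list[set[str]]) -> tuple[float, float, float]:
    tp = sum(len(g & p) for g, p in zip(gold, predicted))  # correctly assigned headings
    fp = sum(len(p - g) for g, p in zip(gold, predicted))  # spurious headings
    fn = sum(len(g - p) for g, p in zip(gold, predicted))  # missed headings
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical gold and predicted MeSH headings for two articles.
gold = [{"Humans", "Diabetes Mellitus"}, {"Insulin Resistance", "Metformin"}]
pred = [{"Humans", "Diabetes Mellitus", "Obesity"}, {"Insulin Resistance"}]
print(micro_prf(gold, pred))  # (0.75, 0.75, 0.75)
```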