Optimising the use of existing knowledge

Defragmentation: Maximising the Use of Existing Knowledge. Jan Velterop, APE 2015, Berlin, 21 January 2015.
  • 1. Defragmentation: Maximising the Use of Existing Knowledge Jan Velterop — APE 2015 — Berlin 21 January 2015
  • 2. Open Access…
  • 3. …is not the goal
  • 4. It is a means to reach the goal
  • 5. And the goal is…?
  • 6. Maximal usefulness of existing scientific research results, in order to achieve efficient, fast, and effective new knowledge creation and discovery, i.e. the highest possible return on public investment
  • 7. optimal dissemination… …of knowledge
  • 8. The ultimate goal, to which Open Access is merely a means, may not be widely understood – by publishers
  • 9. That may be why there are a lot of different interpretations of what Open Access actually is (in spite of the clear definition given in the Budapest Open Access Initiative)
  • 10. The fact that not all published research is accessible to all researchers leads to ‘lamp post research’
  • 11. Lamp post research
  • 12. Lamp post research: looking merely at the literature that one can access, which is not necessarily the literature that is potentially important to one’s research
  • 13. Publicatarrh & Datarrhoea
  • 14. [Chart: number of abstracts in PubMed, in year and cumulative; value shown: 11,135,542]
  • 15. [Chart: number of abstracts in PubMed, in year and cumulative] …averaging more than 2 abstracts added every minute in 2014…
  • 16. “On the impossibility of being expert” (341 doi: … Published 14 Dece…): More scientific and medical papers are being published now than ever before. Authors Alan G Fraser and Frank D Dunstan think that new strategies are needed to deal with this avalanche of information
  • 17. How does a researcher decide what’s ‘relevant’ anyway?
  • 18. How are we filtering or choosing?
  • 19. Possible solutions?
  • 20. Every problem has its solution
  • 21. Every problem has its solution
  • 22. Possible solutions? Publish fewer articles: don’t be ridiculous! Find better ways to decide what’s truly relevant: now you’re talking!
  • 23. First create an overview…
  • 24. …only then start digging
  • 25. We need the equivalent of aerial surveys — ‘knowledge drones’? — Some of my professors were already known as ‘knowledge drones’ :-)
  • 26. How might we create overviews?
  • 27. Getting the picture from a large number of data points: a ‘Whole-o-gram’
  • 28. Getting a better picture from even more data points
  • 29. Homing in on detail
  • 30. It’s not just about finding information. It’s also – and possibly more – about the value & power of ‘recombinant knowledge’
  • 31. Saving significant time-to-knowledge. Conclusion: “Chronic immune activation is the primary driver in HIV pathogenesis”. Arriving at this conclusion (review in Frontiers Immunology) after reading 221 papers: 5 weeks. After analysis in BRAIN: 4 minutes
  • 32. What stands in the way? First of all: fragmentation (different publishers, journals, platforms, licences, formats, silos, languages). And also, of course: (lack of) access. Not to the whole article… but to the data and assertions buried in them
  • 33. Plenty of initiatives to find stuff: • PubChase – Open Access Biomedical Journal Reference Library • Paperity • SciLit – Database of Scientific and Scholarly Literature • Google Scholar • Et cetera Some go further: • Europe PubMed Central – offering semantic tools
  • 34. Not many ‘true’ open access articles (Europe-PMC, 19 December 2014): • all full-text articles in PubMedCentral: 3,087,430 (100%) • of which with CC-licences: 366,973 (11.9%) • of which with CC-BY licences: 270,114 (8.7%) “The majority of articles in PMC are subject to traditional copyright restrictions”
  • 35. What we need is information extracted from as many articles as possible The more we have, the ‘sharper’ the knowledge picture
  • 36. Fragmentation and lack of access are encumbrances to seamless knowledge-pattern analyses and themed collection building (e.g. of graphs)… …which are fast becoming an absolute necessity due to the vast amounts of published material, growing every year and, of course, in the aggregate
  • 37. Quote from ‘Lazarus’: “As the rate of publishing accelerates, the need for computational support to work out which articles to read, and how to interpret, reproduce and validate the claims they contain is growing.”
  • 38. Traditional publications are aimed at consumption by humans: “stories that persuade with data”*. Not easily amenable to machine-processing. * Anita de Waard, Elsevier
  • 39. In the life-science literature, we typically find: • drug-like molecules represented as illustrations; • biochemical properties as tables or graphs; • protein/DNA sequences buried amongst text; • references and citations with arcane formats; • other objects of biological interest being given ambiguous names. And, horrors like this (from PLOS, h/t Peter Murray-Rust): + (plus underscored) isn’t the same as ± (plus-minus)!
  • 40. This creates the need to: • re-type figures from tables; • chase citations through digital libraries; • redraw molecules by hand; • et cetera. Tedious, error-prone, wasteful: scientists should be able to use their precious time better
  • 41. Via UD (Utopia Documents), LAZARUS ‘resurrects’ knowledge from being buried in articles: • entities (‘concepts’, incl. synonyms, e.g. proteins) • phrases, statements, assertions (e.g. triples) • molecules (incl. Markush structure groups) • graphs • tables
  • 42. These are captured – with their provenance, e.g. DOI – in a ‘Knowledge Graph’ of their relationships. When assertions are captured, they are compared to the Knowledge Graph and labelled as ‘new’ (to the Graph) or ‘already found earlier’
  • 43. “Lazarus to harness the crowd reading life-science articles to resurrect the swathes of legacy data buried in charts, tables, diagrams and free-text, to liberate processable data into a shared resource that benefits the community.” “…activities currently carried out anyway by individuals for their own purposes (annotating, cross-referencing articles with databases, organising collections of articles).”
  • 44. VHL protein binds to HIF-α which is ubiquitinated and tagged for degradation in the proteasome.
  • 45. These ‘assertions’ form the ‘knowledge profile’ of an article, and are added to a growing ‘knowledge graph’ which can be analysed for trends, clusters, areas of intensive activity, et cetera.
  • 46. Some other initiatives to bring the open literature together so that it can be used for large scale semantic analyses:
  • 47. The goal of Libraccess is to aggregate, de-duplicate, clean and index scientific resources in open access repositories, from all countries, from all disciplines, and make them available to all, through a website and with APIs.
  • 48. Research Pad Open Access Journal Reference Library (
  • 49. Converting all that’s open (CC-BY) into ePub format for tablets and smartphones. What I find most interesting, however, is their plan* to make the whole body of all literature that’s openly accessible available in XML for semantic analysis† * being worked on as we speak, they confirmed to me † I hope they will add the ‘knowledge profiles’ of paywalled articles created by Lazarus
  • 50. • Build collection of favourites • Read full text • Share with others • Inspect metrics
  • 51. technical inquiries:
  • 52. Thank you Jan Velterop — APE 2015 — Berlin 21 January 2015
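
The capture-and-label step described on slides 41, 42 and 45 (assertions stored with their provenance, then marked as ‘new’ or ‘already found earlier’) might be sketched roughly as below. This is only an illustration under stated assumptions, not how Lazarus itself is implemented: the names Assertion, KnowledgeGraph and capture are hypothetical, the source identifiers are placeholders rather than real DOIs, and the example triple is a simplified reading of the sentence on slide 44.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Assertion:
    """A subject-predicate-object triple extracted from an article (hypothetical type)."""
    subject: str
    predicate: str
    obj: str


@dataclass
class KnowledgeGraph:
    """Toy graph: maps each assertion to the sources it was found in (hypothetical type)."""
    provenance: dict = field(default_factory=dict)

    def capture(self, assertion: Assertion, source: str) -> str:
        """Store the assertion with its source and return the label it receives."""
        # Compare against what the graph already holds before recording it.
        label = "already found earlier" if assertion in self.provenance else "new"
        sources = self.provenance.setdefault(assertion, [])
        if source not in sources:
            sources.append(source)
        return label


graph = KnowledgeGraph()
# Simplified triple based on slide 44; the source strings are placeholders, not real DOIs.
binding = Assertion("VHL protein", "binds to", "HIF-α")
first = graph.capture(binding, "doi:placeholder-article-1")
second = graph.capture(binding, "doi:placeholder-article-2")
print(first, "|", second)
```

Keying the dictionary on the whole triple keeps the ‘new’ versus ‘already found earlier’ check to a single lookup, while the list of sources preserves the provenance of every article in which the assertion was seen. Handling synonyms (for example several names for the same protein) would need a normalisation step that this sketch leaves out.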