BIG DATA IN BIOSCIENCES

Sanjeev S. Tambe
Artificial Intelligence Systems Group, Chemical Engineering and Process Development Division, CSIR-National Chemical Laboratory, Dr. Homi Bhabha Road, Pune 411008, India. Email: ss.tambe@ncl.res.in

“From the dawn of civilization to 2003, humankind generated five exabytes [5 billion gigabytes] of data. Now we produce five exabytes every two days… and the pace is accelerating.”
Eric Schmidt, Executive Chairman at Google, quoted in The Human Face of Big Data; R. Smolan and J. Erwitt (eds), Against All Odds Productions, USA (2012).

Abstract: Increasingly large and complex data sets are being generated within the biosciences. To gain useful and actionable knowledge from them, they must be processed using efficient “Big Data” approaches. This article provides an overview of the Big Data concept and its applications in biosciences.

1.0 INTRODUCTION
Over the last few decades, humankind has witnessed exponential growth in the volume of data generated in every walk of life. Consequently, the term “Big Data” has become a catchword. Generally, it refers to extremely large, complex, structured or unstructured data sets that are difficult to process using conventional techniques. Big Data analytics is the process of examining large (i.e., “Big”) data sets containing a variety of data types to uncover hidden patterns, unknown correlations, market trends, customer preferences and other useful scientific, technological and commercial information [1].
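The scale in the opening quotation can be checked with a little arithmetic. The sketch below is only illustrative: the table names and the `convert` helper are hypothetical, and it assumes two common unit conventions, decimal (powers of 1000) and binary (powers of 1024). The quoted "5 billion gigabytes" matches the decimal convention.

```python
# Bytes per unit under two common conventions (labels are illustrative).
DECIMAL = {"GB": 10**9, "TB": 10**12, "PB": 10**15, "EB": 10**18}
BINARY = {"GB": 1024**3, "TB": 1024**4, "PB": 1024**5, "EB": 1024**6}


def convert(value, src, dst, table):
    """Convert `value` from unit `src` to unit `dst` using the given table."""
    return value * table[src] / table[dst]


# Five exabytes expressed in gigabytes under each convention.
five_eb_decimal = convert(5, "EB", "GB", DECIMAL)
five_eb_binary = convert(5, "EB", "GB", BINARY)

print(f"{five_eb_decimal:,.0f}")  # 5,000,000,000
print(f"{five_eb_binary:,.0f}")   # 5,368,709,120
```

Under the binary convention the same five exabytes come to roughly 5.4 billion gigabytes, so the choice of convention shifts the figure by about 7 percent at this scale.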
Personnel skilled in handling and analysing Big Data are termed “data scientists,” and the process of discovering useful patterns in databases (whether small or big) is termed “data mining.” Conventionally, any data set of the order of petabytes (1 petabyte = 1024 terabytes) or exabytes (1 exabyte = 1024 petabytes) is considered “Big Data.” Such data sets arise from, for example, the information and/or commercial transactions of millions of people and customers, mobile calls and their records, satellite sensing, patient records, internet traffic, social networking, industrial process monitoring, surveillance devices, R&D activities, etc.

* Invited article in the workshop titled “Insights in Biology,” organized by Maharashtra Academy of Sciences, October 29, 2015 at CSIR-National Chemical Laboratory, Pune, India, pp. 25-28 (2015).

The specific difficulties encountered in processing large data sets include their capture, archival, transmission, search, classification, modelling, visualization, cleaning, sharing, dimensionality reduction, feature extraction, security and privacy. Notwithstanding these difficulties, Big Data need efficient and effective processing, which improves the quality and pace of delivery of services, products and governance, and enables effective and timely responses during calamities, emergencies and day-to-day working. The typical steps involved in the servicing and processing of Big Data are outlined in Figure 1. The software tools used to perform these steps belong to disciplines such as predictive analytics, data mining, text analytics and statistical analysis [2]. In the newer technologies for handling Big Data, the raw data, together with extended metadata, are aggregated in a data “lake,” and artificial intelligence (AI) and machine learning formalisms are employed for tasks such as data clustering, classification and modelling. Big Data computing and analytics is often conducted using cloud computing.
Since it involves processing large data sets in real time, a platform that stores data across a distributed hardware cluster, such as Hadoop, is utilized. Hadoop is an open-source, Java-based programming framework for storing large data sets and running applications on clusters of commodity hardware [2].

2.0 BIG DATA IN BIOLOGICAL SCIENCES
Until recently, the biological sciences were much less mathematics-based than other science and technology disciplines. The scenario changed following the sequencing of the human genome in the early 2000s [3]. The deluge of continuously generated data has created a challenging situation in genetic research. Biomedical research generates increasingly large and complex data sets, including genome sequences, proteomic profiles, in silico drug screening results, and high-resolution images [4]. An example of the scale of genetic diversity found in the human body is given by Singer [5]: “The human genome contains roughly 3 billion DNA base pairs and about 20,000 genes. This seems trivial compared to the roughly 100 billion bases and millions of genes that make up the microbes found in the human body.”

Figure 1: Various elements of Big Data and analytics thereof

The types of data that arise in bioscience include text, video, audio, images and data registries. The data explosion in biosciences has become particularly severe in recent years, since thousands of human genomes, together with those of thousands of other organisms, including plants, animals and microbes, have been decoded. Biologists around the world are churning out roughly 15 petabases of sequence per year [5]. It is often observed that the higher the data volume, the lower the density of useful information and knowledge therein. This situation, often encountered when processing Big Data in the biological sciences, results in the “data-overload, information-scarcity” problem.
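The data-overload, information-scarcity problem can be illustrated with a toy data set. The sketch below is a hypothetical example rather than a method taken from the cited sources: it builds ten measured columns that are merely rescaled, slightly noisy copies of two underlying signals, then flags columns that are almost perfectly correlated with one already kept. The function names, the synthetic data and the 0.95 correlation threshold are all assumptions chosen for illustration.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def drop_redundant(columns, threshold=0.95):
    """Keep a column only if it is not strongly correlated with a kept one."""
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Synthetic data: two independent signals observed through ten "sensors".
rng = random.Random(0)
n_samples = 500
signal_a = [rng.gauss(0, 1) for _ in range(n_samples)]
signal_b = [rng.gauss(0, 1) for _ in range(n_samples)]

columns = {}
for i in range(10):
    source = signal_a if i % 2 == 0 else signal_b
    scale = 1 + i  # each column is a rescaled copy plus small noise
    columns[f"feature_{i}"] = [scale * v + rng.gauss(0, 0.01) for v in source]

kept = drop_redundant(columns)
print(len(columns), "columns ->", len(kept), "kept:", kept)
```

Here the volume of data is five times larger than the information it carries: only two of the ten columns contribute anything new. Real biological data are rarely this clean, so a simple correlation filter like this one should be read as a minimal illustration, not a recommended pipeline.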
To overcome this difficulty, the data are subjected to remedial measures such as input dimensionality reduction and sensitivity analysis, which aim at reducing the data overload.

Table 1: Varieties of Big Data in Biosciences [6]

Data type          Data origin
Cellular           Metabolic and signalling pathways
Sub-cellular       Genetics, proteomics, next-generation sequencing
Genomics           DNA, RNA, nucleic acid sequences
Transcriptomics    mRNA, rRNA, tRNA
Proteomics         Proteins
Lipidomics         Lipids
Ionomics           Organisms and cellular ions
Health-care        Electronic medical/insurance records, pharmacy prescriptions, clinical trials
Populations        Epidemiology, social media and networks
Organisms          Medicine, diseases, health insurance

2.1 Applications of Big Data in biosciences
A major portion of the data generated in the life sciences emanates from academic activity in fields suffixed with “omics,” such as genomics and proteomics (see Table 1) [5]. The major applications of Big Data in biosciences are described below.

- Big Data analytics is used to study how genes and different molecules in the environment interact. The resulting knowledge of critical gene networks will provide a better understanding of the biological basis of health and disease [3].
- Owing to advances in technology, it is possible to analyze the functioning of not just a single gene but hundreds of them collectively [3].
- The genetic information acquired globally about patients and diseases will enable health-care providers to offer individual-specific, tailor-made medicines.
- DNA-sequence data contain insights for the development of (a) superior, disease-resistant, high-yielding crop varieties that are resilient to climate change, and (b) drugs for cancer, HIV, or new strains of influenza [7].
- While data analysis is conventionally conducted to answer a specific question, creative mining of Big Data will allow the data to inspire questions, thus opening the door to hypothesis-generating as well as hypothesis-driven science [7].
- Wearable sensor technology can continuously monitor personal health parameters such as blood pressure, heart rate, blood glucose and body temperature. Sensors are also capable of monitoring the state of drugs in transit. Big Data analytics can be used to process the information generated by the sensors, which can then be transmitted using “Internet-of-Things” technology for speedy and well-informed diagnosis and treatment.

3.0 CONCLUSION
Humankind has irreversibly entered the era of Big Data. Although hardware and software technologies for Big Data analytics are being developed, their development needs to accelerate to keep pace with the growth in data generation. Big Data challenges are often expressed using four “v”s: volume, variety, velocity, and veracity. Managing these effectively could unlock ground-breaking new applications and produce a new way of doing bioscience. Developments in Big Data analytics will immensely assist in, for instance, correlating genes with illnesses, drug discovery, patient pre-profiling, monitoring of adverse drug reactions, development of disease-resistant and high-yielding crops, fruits and vegetables, and dispensing personalized medicine. Finally, Big Data technologies are going to create a large number of skilled job opportunities. For a country like India, with a large young population and significant information technology and biotechnology industries, Big Data offers a massive prospect for innovation and for knowledge and wealth creation.

REFERENCES
[1] http://searchbusinessanalytics.techtarget.com/definition/big-data-analytics.
[2] M. Rouse, “Big Data Analytics,” TechTarget.com (2012), http://searchbusinessanalytics.techtarget.com/definition/big-data-analytics.
[3] Big data in biosciences and health-care is focus of new UCLA research centre, http://newsroom.ucla.edu/releases/big-data-in-biosciences-and-health-care-is-focus-of-new-ucla-research-center.
[4] http://www.bioohio.com/bioohio-event-to-highlight-big-data-in-bioscience-rd/.
[5] E. Singer, Biology’s Big Problem: There’s Too Much Data to Handle, Quanta Magazine Science (2013), http://www.wired.com/2013/10/big-data-biology/.
[6] G. Mordret, Big Data in Science: Which Business Model is Suitable? (2015), http://adcreview.com/articles/big-data-in-science-which-business-model-is-suitable/, doi: 10.14229/jadc.2015.10.10.001.
[7] E. S. McCulloch, Harnessing the Power of Big Data in Biological Research, BioScience, 63: 715 (2013), doi:10.1525/bio.2013.63.9.4, http://bioscience.oxfordjournals.org/content/63/9/715.full.