eDNA.pdf

Neural Computing and Applications, ISSN 0941-0643
Neural Comput & Applic, DOI 10.1007/s00521-016-2591-2

ENGINEERING APPLICATIONS OF NEURAL NETWORKS

Detecting invasive species with a bio-inspired semi-supervised neurocomputing approach: the case of Lagocephalus sceleratus

Konstantinos Demertzis (1), Lazaros Iliadis (1)

Received: 30 January 2016 / Accepted: 6 September 2016
© The Natural Computing Applications Forum 2016

Abstract
The need to protect the environment and biodiversity and to safeguard public health requires the development of timely and reliable methods for the identification of particularly dangerous invasive species, before they become regulators of ecosystems. These species appear to be morphologically similar, despite their strong biological differences, which complicates their identification. Additionally, locating the broader area of dispersion and tracking the development of invasive species are considered critically important for taking proper management measures.
The aim of this research is to create an advanced computational intelligence system for the automatic recognition of invasive or other unknown species. The identification is based on the analysis of environmental DNA using machine learning methods. More specifically, this research effort proposes a hybrid bio-inspired computational intelligence detection approach. It employs extreme learning machines combined with an evolving Izhikevich spiking neuron model for the automated identification of the invasive fish species Lagocephalus sceleratus, which is extremely dangerous to human health.

Keywords: eDNA · Semi-supervised learning · Semi-supervised ELM · Izhikevich neuron model · Invasive species · Lagocephalus sceleratus

1 Introduction

1.1 Invasive species early detection

Invasive species, as a potential impact of climate change, pose a serious and rapidly worsening threat to the natural biodiversity and ecological balance of the planet, particularly regarding marine species [1]. Although not all alien and invasive species are harmful, the precautionary principle dictates that all incomers need to be detected and that the competent bodies must be ready to respond quickly and deal with any problems that may arise. Therefore, early detection of these species is a critical process, which can slow the uncontrolled expansion of the problem, increase the likelihood of eliminating the phenomenon before it is widely established and ultimately avoid the need for costly and long-term control efforts.

The identification and classification of invasive species using exclusively phenotypic markers is an extremely difficult and uncertain process, as neither large differences in morphology nor significant similarities reflect the level of affinity between the organisms (the species problem) [2].
Species identification using genetic methods, such as DNA barcoding or comparisons of biochemical or molecular markers, is the best choice for studies of intraspecific populations and subspecies. This is because high levels of polymorphism can be used to describe genetic diversity and to assess the degree of genetic differentiation between populations [3].

Lagocephalus sceleratus is common in the tropical waters of the Indian and Pacific oceans. It is a characteristic case of an invasive species whose presence in the Mediterranean Sea causes serious problems. Its uncontrolled invasion and reproduction threaten the marine environment with an irreparable imbalance. Its presence causes intense competition with native fish for the available food. Moreover, it is extremely poisonous if eaten because it contains tetrodotoxin in its ovaries and, to a lesser extent, in its skin, muscles and liver, which protects it from voracious predators. It becomes toxic as it eats bacteria that contain the toxin. This deadly substance causes paralysis of voluntary muscles, which may cause its victims to stop breathing or induce heart failure [4].

Author contact: Konstantinos Demertzis, kdemertz@fmenr.duth.gr; Lazaros Iliadis, liliadis@fmenr.duth.gr
(1) Department of Forestry and Management of the Environment and Natural Resources, Democritus University of Thrace, 193 Pandazidou St., 68200 Orestiada, Greece

1.2 Environmental DNA (eDNA)

Environmental DNA (eDNA) is recovered from an environmental sample such as soil or water, rather than from a single organism. The technique relies on the fact that all animals leave DNA residues, via feces, urine and skin, in the areas they move through. By taking samples (e.g., of water) and analyzing the eDNA they contain, it is possible to demonstrate the presence of a species without actually catching or seeing it.
Such samples can be analyzed by high-performance DNA sequencing methods for the rapid measurement and monitoring of biodiversity. The process of analyzing these samples, called metagenomics, requires specialized equipment and personnel in specialist laboratories and is quite expensive [5, 6].

1.3 Species detection by eDNA

The methodology involves a fairly complex process in which species-specific primers (SSP) are used in the first stage [7]. A primer is a short, synthesized oligonucleotide used in molecular searches. It is designed to recognize a precise sequence of DNA nucleotides, which is afterward used as a template for PCR, amplifying that specific part of the strand. One of the most important factors for successful DNA amplification is the proper design of species-specific primers. These primers interact only with the DNA of the target species. The typically quite small amount of target-species DNA detected in the eDNA (if any) is then amplified by the polymerase chain reaction (PCR).

This amplification confirms the presence and identity of the target species. The method involves a compromise regarding the number of species that can be detected, which depends on the available primers. Also, when primers are targeted at too many species (the multi-specific approach), rare species may be missed, which forces the search to focus on a particular group or species family [5, 6].

1.4 DNA-based identification

The procedure described in this paper starts by taking a random sample from the environment (eDNA), which contains material from different species, possibly thousands. The target is first to identify the genetic material of fish and then to identify the genetic material of L.
sceleratus. To accomplish this, we use the respective sequence-specific primers (SSP) with genetic material from four groups (algae, Cnidaria, fishes, mammals), which are marine species and have a similar genetic form. The aim is to use them in the training of the semi-supervised ELM model, to isolate the desired groups. The reason for using four SSPs is to create a realistic and highly complex dataset. The SSPs serve as reagents that are activated as soon as the corresponding DNA is found. In this way, we isolate the genetic material of the fish of interest.

Once the first stage is complete and the DNA has been grouped into four classes (algae, Cnidaria, fishes, mammals), the second phase of the proposed algorithmic approach follows. In this stage, the class "fishes" obtained from the previous process is taken as the initial dataset, and pattern recognition is performed on it with the Izhikevich spiking neuron model. This process achieves the final goal, which is the detection of L. sceleratus DNA.

1.5 Literature review

Valentini et al. [8] tested whether an eDNA metabarcoding approach, using water samples, can be used to address significant questions in ecology and conservation. Two key aquatic vertebrate groups were targeted: amphibians and bony fish. The reliability of the method was cautiously validated in silico, in vitro and in situ. When compared with traditional surveys or historical data, eDNA metabarcoding showed a much better detection probability overall. For amphibians, the detection probability with eDNA metabarcoding was 0.97 (CI 0.90–0.99) versus 0.58 (CI 0.50–0.63) for traditional surveys. For fish, in 89% of the studied sites, the number of taxa detected using the eDNA metabarcoding approach was higher than or identical to the number detected using traditional methods.

Research by Herder et al. [9] has shown that with this method it is possible to detect species without actually seeing or catching them.
The method uses DNA-based identification to detect species from the extracellular DNA, or cell debris, that species leave behind in the environment. Dejean et al. [10] compare the sensitivity of traditional field methods, based on auditory and visual encounter surveys, with an eDNA survey for the detection of the American bullfrog Rana catesbeiana = Lithobates catesbeianus, which is invasive in south-western France. They demonstrate that the eDNA method is valuable for species detection and surpasses traditional amphibian survey methods in terms of sensitivity and sampling effort. The bullfrog was detected in 38 sites using the molecular method, compared with seven sites using the diurnal and nocturnal surveys, suggesting that traditional field surveys have strongly underestimated the distribution of the American bullfrog. Dejean et al. [11] estimated the time of DNA detection, taking into account aquatic environment conditions and DNA concentrations. Experimentation was performed on two different species: the American bullfrog (Rana catesbeiana = Lithobates catesbeianus) and the Siberian sturgeon (Acipenser baerii).

On the other hand, in [12], Pan Yi discusses the use of machine learning methods with various advanced encoding schemes and classifiers to improve the accuracy of protein structure prediction. Also, in [13] a machine learning method is proposed for distinguishing DNA-binding proteins from non-binding proteins based on sequence information. Finally, paper [14] introduces three ensemble machine learning methods for the analysis of biological DNA binding by transcription factors (TFs). The goal is to identify both TF target genes and their binding motifs. Subspace-valued weak learners (formed from an ensemble of different motif finding algorithms) combine candidate motifs as probability weight matrices (PWM), which are then translated into subspaces of a DNA k-mer (string) feature space.
Assessing and then integrating highly informative subspaces by machine methods gives more reliable target classification and motif prediction.

2 Innovation of this research

The most important innovation proposed by this research is the use of machine learning methods to analyze and detect an invasive species through eDNA analysis. Although there are several related analytical studies that make use of eDNA [6, 8–11], to the best of our knowledge this is the first attempt in the literature that employs a spiking neural network machine learning approach.

Another important innovation is the proposal to incorporate artificial intelligence in digital machines that can identify invasive or rare species based on their genetic material easily, quickly and at minimal cost [3]. This will greatly enhance the planning and development of innovative biosecurity programs for the European Union [15] and other countries [16]. Also, adding machine learning algorithms to DNA identification systems simplifies the process and reduces the time required to produce identification results, given that a typical system handles one sample at a time and generates the profile within 90 min [17]. A further innovative aspect of this research relates to the collection and selection of the data, which emerged after extensive comparisons between the primers based on the FASTA algorithm [18]. These data vectors were the training samples in the learning process. Finally, the innovation is enhanced further by the development and use of a hybrid machine learning model (HMLM). The method proposed herein combines the semi-supervised classification (SSC) ELM algorithm with a sophisticated classification approach that employs the Izhikevich neuron model, whose performance is optimized with the differential evolution algorithm (DEA).
The HMLM combines for the first time two very fast and highly accurate biologically inspired machine learning algorithms to solve a multidimensional and complex genetic identification problem.

3 Methodologies

3.1 Semi-supervised learning

The main drawback of classical learning methods with full supervision is that they need a large number of labeled training examples to construct a model with acceptable accuracy. The labeling of these examples is usually done manually by a human supervisor, which is a tedious and time-consuming process. A key feature of learning with partial supervision (PSL) is the use of both labeled and unlabeled cases in the training process to produce the final model. PSL uses previously unseen examples, drawn from the distribution encountered in the real world, to enhance the efficiency of the learning process while using as few manually labeled data vectors as possible. Self-training, mixture models, graph-based methods, co-training and multiview learning are characteristic examples of PSL [19]. It should be emphasized that the success of learning with partial supervision depends on some basic assumptions imposed by each model or algorithm.

3.2 Semi-supervised ELM classification

ELMs are characterized by the ability to set the parameters of the hidden nodes randomly before seeing the training data vectors. They are extremely fast and efficient and can handle a multitude of activation functions without the need to tune a stopping criterion, learning rate or number of learning epochs [20]. The semi-supervised classification ELM approach works provided that the labeled and unlabeled input patterns come from the same marginal distribution or follow a common class structure. The unlabeled data vectors provide useful information for exploring the structure of the overall dataset, whereas the labeled data contribute to the success of the learning process.
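
The idea in Sect. 3.2 can be illustrated with a minimal sketch. It assumes one common way of building a semi-supervised ELM, namely a manifold-regularized least-squares solution in which a graph Laplacian over all points (labeled and unlabeled) encourages neighboring points to receive similar outputs. The text above does not specify the formulation, so the Laplacian construction, the function names (hidden_layer, knn_laplacian, fit_ss_elm, predict), the hyperparameters (n_hidden, C, lam, k) and the toy Gaussian data are all illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def hidden_layer(X, W, b):
    """Sigmoid activations of a random hidden layer that is never trained."""
    return 1.0 / (1.0 + np.exp(-(X @ W + b)))

def knn_laplacian(X, k=5):
    """Unnormalized Laplacian of a symmetrized k-nearest-neighbor graph."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d2, np.inf)           # a point is not its own neighbor
    A = np.zeros_like(d2)
    for i in range(X.shape[0]):
        A[i, np.argsort(d2[i])[:k]] = 1.0
    A = np.maximum(A, A.T)                 # make the graph undirected
    return np.diag(A.sum(axis=1)) - A

def fit_ss_elm(X_lab, y_lab, X_unl, n_classes,
               n_hidden=50, C=10.0, lam=0.1, k=5, seed=0):
    """Assumed formulation: solve
    (I + H^T C H + lam * H^T L H) beta = H^T C Y,
    where C is diagonal and zero for unlabeled rows."""
    rng = np.random.default_rng(seed)
    X = np.vstack([X_lab, X_unl])
    n, n_lab = X.shape[0], X_lab.shape[0]
    # Hidden-node parameters are drawn at random, independent of the data.
    W = rng.normal(size=(X.shape[1], n_hidden))
    b = rng.normal(size=n_hidden)
    H = hidden_layer(X, W, b)
    L = knn_laplacian(X, k)
    Y = np.zeros((n, n_classes))
    Y[np.arange(n_lab), y_lab] = 1.0       # one-hot targets, labeled rows only
    c = np.zeros(n)
    c[:n_lab] = C                          # only labeled rows carry a fit penalty
    A = np.eye(n_hidden) + H.T @ (c[:, None] * H) + lam * (H.T @ L @ H)
    beta = np.linalg.solve(A, H.T @ (c[:, None] * Y))
    return W, b, beta

def predict(X, model):
    W, b, beta = model
    return np.argmax(hidden_layer(X, W, b) @ beta, axis=1)

# Toy data (hypothetical): two well-separated Gaussian classes in 4 dimensions,
# with 3 labeled and 40 unlabeled points per class.
rng = np.random.default_rng(1)
X0 = rng.normal(-3.0, 0.5, size=(43, 4))
X1 = rng.normal(3.0, 0.5, size=(43, 4))
X_lab = np.vstack([X0[:3], X1[:3]])
y_lab = np.array([0, 0, 0, 1, 1, 1])
X_unl = np.vstack([X0[3:], X1[3:]])
truth = np.array([0] * 40 + [1] * 40)

model = fit_ss_elm(X_lab, y_lab, X_unl, n_classes=2)
pred = predict(X_unl, model)
accuracy = float((pred == truth).mean())
```

Only the output weights beta are computed, in a single linear solve, which is what makes ELM training fast. The unlabeled points enter solely through the Laplacian term, so in this sketch they shape the decision function without ever needing a label.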