A Novel Preprocessing Method Using Hilbert Huang Transform for MALDI-TOF and SELDI-TOF Mass Spectrometry Data

Li-Ching Wu 1,2,*, Hsin-Hao Chen 1, Jorng-Tzong Horng 1,3,4, Chen Lin 1, Norden E. Huang 1,5, Yu-Che Cheng 6, Kuang-Fu Cheng 7,8

1 Graduate Institute of System Biology and Bioinformatics, National Central University, Jhongli, Taiwan
2 Research Center for Biotechnology and Biomedical Engineering, National Central University, Jhongli, Taiwan
3 Department of Computer Science and Information Engineering, National Central University, Jhongli, Taiwan
4 Department of Bioinformatics, Asia University, Wu-feng, Taiwan
5 Research Center for Adaptive Data Analysis, National Central University, Jhongli, Taiwan
6 Proteomics Laboratory, Cathay Medical Research Institute, Cathay General Hospital, Xizhi, Taiwan
7 Graduate Institute of Statistics, National Central University, Jhongli, Taiwan
8 Graduate Institute of Statistics, China Medical University, Taichung, Taiwan

Abstract

Motivation: Mass spectrometry is a high-throughput, fast, and accurate method of protein analysis. Using the peaks detected in spectra, we can compare a normal group with a disease group. However, the spectra are complicated by scale shifting and are full of noise. Such shifting makes the spectra non-stationary, so they need to be aligned before comparison. Consequently, preprocessing of the mass data plays an important role in the analysis. Noise in mass spectrometry data arises from many different sources and at many frequencies. A powerful preprocessing method is needed to remove the large amount of noise in mass spectrometry data.

Results: The Hilbert-Huang Transformation (HHT) is a transformation for non-stationary signals used in signal processing. We provide a novel preprocessing algorithm that can deal with MALDI and SELDI spectra. We use the Hilbert-Huang Transformation to decompose the spectrum and filter out the very high-frequency and very low-frequency signals.
We think the noise in mass spectrometry comes from many sources and that some of it can be removed by analyzing the frequency domain of the signal. Since a protein is expected to appear in the spectrum as a distinct peak, its frequency content should lie in the middle part of the frequency domain and will not be removed. The results show that HHT, when used for preprocessing, is generally better than other preprocessing methods. The approach not only detects peaks successfully but also denoises spectra efficiently, especially when the data are complex. The drawback of HHT is that it takes much longer to process than the wavelet and traditional methods. However, the processing time is still manageable and is worth the wait to obtain high-quality data.

Citation: Wu L-C, Chen H-H, Horng J-T, Lin C, Huang NE, et al. (2010) A Novel Preprocessing Method Using Hilbert Huang Transform for MALDI-TOF and SELDI-TOF Mass Spectrometry Data. PLoS ONE 5(8): e12493. doi:10.1371/journal.pone.0012493

Editor: William C. S. Cho, Queen Elizabeth Hospital, Hong Kong

Received November 26, 2009; Accepted August 5, 2010; Published August 31, 2010

Copyright: © 2010 Wu et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Funding: This project is supported by the Cathay General Hospital (www.cgh.org.tw) and National Central University (www.ncu.edu.tw) Collaboration Project number 97CGH-NCU-A1 and National Science Council Project number 98-2627-M-008-003. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing Interests: The authors have declared that no competing interests exist.

* E-mail: richard@mail.sybbi.ncu.edu.tw

Introduction

Mass spectrometry is currently used to explore protein profiles expressed under different physiological and pathophysiological conditions [1]. Moreover, recent progress has opened up new avenues for tumor-associated biomarker discovery [2]. A mass spectrum of a sample is a profile representing the distribution of components by mass-to-charge ratio. Spectra of tissues or fluids, such as serum, are studied for profile changes that may aid disease diagnosis. Matrix-assisted laser desorption ionization (MALDI) and surface-enhanced laser desorption ionization (SELDI) time of flight (TOF) are the two techniques commonly used to generate profiles from experimental samples. The chief feature of mass spectra is the peaks, which are characterized by their intensity values and time-of-flight values. Further peak identification can be done if tandem mass spectrometry is available [3]. Since each spectrum contains tens of thousands of time-of-flight points with various intensities, noise in the spectra is unavoidable. Therefore, it is important to develop a suitable data preprocessing algorithm that improves performance when analyzing spectra.

Recently, various data preprocessing methods have been used, and these usually comprise several steps. First, baseline subtraction is often used to rescale the plots, with the aim of removing systematic artifacts produced by small clusters of matrix material [4]. Next, denoising attempts to remove noise signals that are added to the true spectra by the matrix material and by sample contaminants (chemical noise), together with noise caused by the physical characteristics of the machine (electrical noise) [5,6]. Furthermore, alignment is required for combining groups of data together.
The same peak may be present, but with small gaps between the different biological samples due to unavoidable inaccuracy in the spectrum. Peak detection is still necessary in every method and is a key feature of preprocessing the data. Each peak must be detected from its intensity and time of flight. Finally, normalization gives the data a uniform format for analysis and corrects any systematic variation between the different spectra [7].

PLoS ONE | www.plosone.org 1 August 2010 | Volume 5 | Issue 8 | e12493

Many studies have described preprocessing of mass spectrum data [6,8,9,10,11,12,13] and have explored the influence of their approach on the raw data. Meuleman, Engwegen et al. [14] compared various algorithms that can be used for normalization. Beyer, Walter et al. [15] compared the performance of the package "Ciphergen Express Software 3.0", which is produced by Ciphergen, against the R package "PROcess". Recently, Cruz-Marcelo, Guerra et al. [7] compared a number of widely used algorithms, namely "ProteinChip Software 3.1" (Ciphergen Biosystems), "Biomarker Wizard" (Ciphergen Biosystems), "PROcess", which was written by Xiaochun Li as a "BioConductor" package, "Cromwell", written as Matlab scripts, "SpecAlign", developed by Wong, Cagney et al. [11], and "MassSpecWavelet", developed by Du, Kibbe et al. [12] as a "BioConductor" package. Nevertheless, although many preprocessing methods have been put forward, the preprocessing algorithm can still be improved.

In the past, scientists have tried to derive a formula for noise, comprising both chemical and machine noise, using statistical methods, and then to construct a model based on it. However, the chemical noise is generally due to true peaks, namely organic acids, which are part of the matrix used in mass spectrometry. The matrix has two purposes: ionization and protection.
It provides hydrogen ions to the peptides or proteins, which can then be ionized and fly in the machine. In addition, the matrix protects the peptide or protein during the laser flash. Matrix noise usually appears in the low mass-to-charge ratio regions (< 1000 Da). Nonetheless, we need to understand that the low mass-to-charge ratio region contains not only chemical noise but also true signal peaks. If we mixed the chemical noise with the machine noise during preprocessing, we might conclude that the noise strongly affecting the low mass-to-charge ratio region is due to the abundance of organic acids in this region. However, if we take into account the difference between chemical and machine noise when we analyze the spectra, we ought to be able to separate the two; this is because the peaks in the low mass-to-charge ratio regions are due to the organic acids and are thus distinct from machine noise. The machine noise may come from a variety of sources, including airborne dust, the detection limits of the electronics, electrical white noise, and even the earth's magnetic field. These noises may not have fixed frequencies, since the m/z values are subject to measurement shifts.

In the present study we present a novel preprocessing method using the Hilbert Huang transformation (HHT), which is used to decompose non-linear and non-stationary data. By using HHT, the data can be decomposed into different trends, which separates some of the noise from the signal. The main advantage of HHT is that it handles non-stationary data: it does not make the strong assumption that the signal is stationary along the axis. Since the m/z axis is subject to shifting, HHT can eliminate more non-stationary noise than stationary methods such as wavelets. The disadvantage is that the calculation time is much longer than that of stationary methods. We then compare our algorithm with three familiar preprocessing methods and with another algorithm that has been suggested by Cruz-Marcelo, Guerra et al. [7].

Figure 1. Components decomposed using HHT. HHT decomposes the spectrum into sixteen components. From bottom to top, we call them C1, C2, and so on. The summation of all components is the original spectrum. doi:10.1371/journal.pone.0012493.g001

Figure 2. Before and after HHT preprocessing. (a) The average of fifty ovarian cancer datasets. There are greater amounts of noise in the low region than in the high region. The scale is approximately ten to the ninth power. (b) The same figure after using the Hilbert Huang transformation formula and the various modifications carried out after preprocessing. When comparing with (a), it can be seen that the chief peaks and the profile are maintained. doi:10.1371/journal.pone.0012493.g002

Materials and Methods

Hilbert Huang transformation

HHT [16] is an adaptive data analysis method for non-linear and non-stationary processes. We use HHT to define the trends in a spectrum. In the past, the trend that represents the baseline and the noise was defined as a straight line fitted to the spectrum; the straight line was then removed to yield a zero-mean residue. However, such trends are not suitable for non-linear, real-world data. Noise is non-linear and non-stationary, and so is the trend that must be removed when we try to rescale spectra.

The main feature of the HHT is the empirical mode decomposition (EMD) method, with which any complicated data can be decomposed into a finite and often small number of components called intrinsic mode functions (IMF).
An intrinsic mode of oscillation is an IMF if it satisfies two conditions: first, the number of extrema and the number of zero-crossings must either be equal or differ by at most one over the whole dataset; second, the mean value of the envelopes defined by the local maxima and by the local minima is zero at any point. The IMFs are obtained in EMD chiefly by an approach called the sifting process. The number of IMFs is close to log2 N, where N denotes the total number of data points. The sum of all IMFs is equal to the original data.

We chose one of the ovarian cancer datasets from the National Cancer Institute published by Kwon, Vannucci et al. [6] to undergo the HHT process. Sixteen IMF components were identified when the sifting process was applied to our data. As shown in Figure 1, the later components, namely the fourteenth to the sixteenth IMF, can be removed for the purpose of rescaling; we also removed the first to the sixth IMF for the purpose of denoising. Thus a significant part of the chemical noise can be separated from the main spectrum by removing the first components.

Subsequent Modifications

In addition to using the HHT for denoising, the baseline needs to be adjusted. Here we apply the SpecAlign software for baseline estimation, which is available at PHYSCHEM.OX.AC.UK/~JWONG/SPECALIGN [11]. For removing the baseline, the software has two user-defined options: the window size of the baseline and subtraction of the baseline. We set the window size to 20 and then removed the baseline. After baseline subtraction, we rescaled the spectrum to be positive by shifting the intensity values of the whole spectrum. We did not change any other parameters. Our method, which we have called HHTMass, consists of using the Hilbert Huang transformation for denoising, followed by modification of the spectrum by baseline subtraction and rescaling.
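The component selection and rescaling steps described above can be sketched as follows. This is a minimal illustration and not the authors' implementation: it assumes the IMFs have already been computed by some EMD routine, and the names `meets_count_condition`, `sum_components`, `shift_to_positive` and `PAPER_KEEP` are hypothetical. Shifting by the minimum intensity is one assumed way to "move the whole spectrum to be positive"; the baseline step performed with SpecAlign is not reproduced here.

```python
from typing import List, Sequence

# Components kept in the text for a 16-IMF decomposition: C7..C13
# (C1..C6 are dropped for denoising, C14..C16 for rescaling).
PAPER_KEEP = list(range(7, 14))


def count_extrema(x: Sequence[float]) -> int:
    """Count strict interior local maxima and minima (flat runs are ignored)."""
    n = 0
    for i in range(1, len(x) - 1):
        is_max = x[i] > x[i - 1] and x[i] > x[i + 1]
        is_min = x[i] < x[i - 1] and x[i] < x[i + 1]
        if is_max or is_min:
            n += 1
    return n


def count_zero_crossings(x: Sequence[float]) -> int:
    """Count sign changes between consecutive samples."""
    return sum(1 for a, b in zip(x, x[1:]) if a * b < 0)


def meets_count_condition(x: Sequence[float]) -> bool:
    """First IMF condition: extrema and zero-crossing counts differ by at most one."""
    return abs(count_extrema(x) - count_zero_crossings(x)) <= 1


def sum_components(imfs: Sequence[Sequence[float]], keep: Sequence[int]) -> List[float]:
    """Sum the selected components; `keep` holds 1-based indices (C1, C2, ...)."""
    out = [0.0] * len(imfs[0])
    for k in keep:
        for i, v in enumerate(imfs[k - 1]):
            out[i] += v
    return out


def shift_to_positive(y: Sequence[float]) -> List[float]:
    """Assumed rescaling: subtract the minimum so that no intensity is negative."""
    lowest = min(y)
    return [v - lowest for v in y] if lowest < 0 else list(y)


# Toy stand-in for an EMD result: a fast component, a middle one, and a trend.
toy_imfs = [
    [1, -1, 1, -1, 1, -1, 1, -1],
    [0, 2, 0, -2, 0, 2, 0, -2],
    [0, 1, 2, 3, 4, 5, 6, 7],
]
toy_filtered = shift_to_positive(sum_components(toy_imfs, [2]))
```

With a real sixteen-component decomposition, `sum_components(imfs, PAPER_KEEP)` would correspond to the selection stated in the text, and summing all sixteen components should reproduce the original spectrum.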
The spectra before and after HHT preprocessing are shown in Figure 2(a) and 2(b) (the spectrum source is given in the data source section).

Peak detection

We apply three methods, namely MassSpecWavelet [12], SpecAlign [11], and PROcess [9], for peak detection. The major feature of MassSpecWavelet is that the package does not contain any preprocessing method. According to Cruz-Marcelo, Guerra et al. [7], MassSpecWavelet has the best performance in terms of peak detection. PROcess is a BioConductor package by Li [9], which provides high-quality peak quantification. SpecAlign, written by Wong [11], is a well-known spectrum analysis software package; it has the useful property of offering many user-defined options. The preprocessing methodology linked to MassSpecWavelet

Figure 3. Sample pancreatic cancer data. Original data of pancreatic cancer provided by Ge and Wong (2008). In this dataset, there is more noise where the peaks exist. doi:10.1371/journal.pone.0012493.g003