A Correlation between Protein Function and Ligand Binding Profiles

J. Proteome Res. 2011, 10, 2538–2545 | dx.doi.org/10.1021/pr200015d | pubs.acs.org/jpr
Published: March 03, 2011. © 2011 American Chemical Society

Matthew D. Shortridge, Michael Bokemper, Jennifer C. Copeland, Jaime L. Stark, and Robert Powers*
Department of Chemistry, University of Nebraska-Lincoln, Lincoln, Nebraska 68588-0304, United States

INTRODUCTION

The recent explosion in sequenced genomes has revealed a vast number of proteins that lack a functional annotation.[1] Many of these unannotated proteins may play an important role in human disease and, correspondingly, are critical for developing new therapeutics. Protein sequence and structure similarity methods are currently the most robust and widely used tools to annotate a protein of unknown function.[2] Nevertheless, these methods are limited in scope, prone to errors, and based on a small set of experimentally characterized proteins.[3] Only 40–60% of sequences suggest a potential functional assignment. Moreover, error rates of <30% occur even with conservative sequence identities of >60%. The accuracy of functional annotations decreases substantially in the twilight zone of 20–35% sequence identity.

Recent attempts to extend functional prediction beyond global sequence and structure similarity have led to the development of active-site similarity search methods.[4–7] These methods try to identify protein surface structures that interact with biologically important ligands since active-sites that share a similarity in sequence, structure and ligand binding are predicted to be functionally related. This is based on the fundamental principle that a protein’s active-site has been optimized by nature to interact with a unique and specific set of targets, where this information can be leveraged to understand function.
Consequently, protein surfaces have been shown to be exquisitely selective and to only bind ligands at very specific functionally relevant locations.[8–11] This understanding is also essential to drug discovery, where extensive resources are allocated by the pharmaceutical industry to identify high-affinity and selective compounds that target a specific therapeutically relevant protein.[12,13] The use of ligands as functional probes is the basis of our FAST-NMR methodology[4,14] that has been successfully applied to explore the function of Staphylococcus aureus protein SAV1430,[4] Pseudomonas aeruginosa protein PA1324,[15] Pyrococcus horikoshii OT3 protein PH1320,[14] human protein Q13206,[14] Bacillus subtilis protein YndB,[16] and Salmonella typhimurium PrgI protein.[17] Similar successes have been reported using ligand binding to infer function in virtual screens.[18,19] While promising, current active-site similarity techniques still rely on high-resolution protein structures to identify and measure functional similarity.[20] The availability of structures for the entire proteome remains a significant bottleneck for the high-throughput functional annotation of hypothetical proteins.

We report herein a new method to infer protein function that is independent of sequence and structural information. Our method uses a similarity in ligand binding profiles to annotate a protein of unknown function. This is similar in concept to the mapping of pharmacological space or the use of structure–activity relationships (SAR) for target selection and chemical lead identification in drug discovery.[21–24] A ligand binding profile is defined as a set of ligands that bind a protein from a high-throughput ligand affinity screen. Ligand binding is monitored using our 1D ¹H NMR line-broadening screen.
[25] In essence, the chemical and structural diversity of a compound library provides an experimental means of mapping the physiochemical properties of a protein’s active-site based on the compounds that do or do not bind the protein. Functional annotation is inferred by clustering unknown proteins with previously annotated proteins that share similar ligand binding profiles from the same chemical library. A modification of the E-value routinely used in sequence homology is used to quantify ligand binding profile similarities. The methodology is demonstrated using 19 proteins with a range of function defined by Gene Ontology (GO) terms.

Received: January 7, 2011

ABSTRACT: We report that proteins with the same function bind the same set of small molecules from a standardized chemical library. This observation led to a quantifiable and rapidly adaptable method for protein functional analysis using experimentally derived ligand binding profiles. Ligand binding is measured using a high-throughput NMR ligand affinity screen with a structurally diverse chemical library. The method was demonstrated using a set of 19 proteins with a range of functions. A statistically significant similarity in ligand binding profiles was only observed between the two functionally identical albumins and between the five functionally similar amylases. This new approach is independent of sequence, structure, or evolutionary information and, therefore, extends our ability to analyze and functionally annotate novel genes.

KEYWORDS: protein function, ligand binding, NMR ligand affinity screen, functional genomics, functional annotation
[26]

EXPERIMENTAL SECTION

Materials

The human serum albumin (HSA) (essentially fatty acid free, ≥96% pure), bovine serum albumin (BSA) (minimum 98% agarose gel electrophoresis, lyophilized), α-amylase from Bacillus licheniformis (Bli) (500–1500 units/mg protein, 93–100% (SDS-PAGE)), α-amylase from Aspergillus oryzae (Aor) (powder, ∼30 units/mg), α-amylase from Bacillus amyloliquefaciens (Bam) (liquid, ≥250 units/g protein), β-amylase from barley (Hvu) (Type II-B, 20–80 units/mg protein), and β-amylase from sweet potato (Iba) (Type I-B, ammonium sulfate suspension, ≥750 units/mg protein) protein samples were all purchased from Sigma (St. Louis, MO). The S. typhimurium PrgI protein samples and assigned ¹H–¹⁵N HSQC spectrum were generously provided by Dr. Roberto DeGuzman (University of Kansas). Staphylococcus aureus primase C-Terminal domain (CTD) protein sample was purchased from Nature Technologies Corporation (Lincoln, NE). H. sapiens diacylglycerol kinase alpha (DGKA), P. aeruginosa unannotated protein PA1324, S. aureus unannotated protein SAV1430, S. typhimurium unannotated protein STM1790, H. sapiens ubiquitin-fold modifier-conjugating enzyme 1 (UFC1), E. coli unannotated protein YjbR, E. coli unannotated protein YkfF, B. subtilis unannotated protein YkvR and E. coli unannotated protein YtfP protein samples were provided by Dr. Gaetano Montelione, Director of the Northeast Structural Genomics Consortium (NESG, www.nesg.org). The S. aureus nuclease was overexpressed in house from a cell stock of E. coli BL21 DE3 codon+ (Stratagene) containing the pET28-(a)+ plasmid with the dnuc gene provided by Dr. Greg Somerville (University of Nebraska-Lincoln) grown in LB broth and purified using a Talon cobalt affinity resin (Clontech).
The deuterium oxide (99.9 atom % D) and the dimethyl sulfoxide-d₆ (99.9 atom % D) were purchased from Aldrich (Milwaukee, WI). The 3-(trimethylsilyl)propionic acid-2,2,3,3-d₄ (TMSP-d₄) was purchased from Cambridge Isotope (Andover, MA). The Bis-Tris-d₁₉ (98 atom % D) was purchased from Isotec (Milwaukee, WI). The compound library was previously compiled as described elsewhere.[27]

NMR Data Collection and Sample Preparation

All NMR data were collected on a Bruker 500 MHz Avance spectrometer (Billerica, MA) equipped with a triple resonance, Z-axis gradient cryoprobe and using a Bruker BACS-120 sample changer and IconNMR software for automated data collection. The screening data for this study were compiled over a 5 year time span in which two different 1D ¹H solvent suppression pulse sequences were used for the measurement of ligand 1D ¹H NMR line broadening. Data for the HSA, BSA, S. aureus primase CTD, PrgI, PA1324, and SAV1430 were collected as previously described.[4,15,17,25] Data for DGKA, STM1790, UFC1, YjbR, YkfF, YkvR and YtfP, the 5 amylases and S. aureus nuclease proteins were collected at 298 K using 64 transients with a spectrum width of 6009 Hz with 8 K data points and a 1.0 s relaxation delay using the excitation sculpting[28] method for solvent suppression of the residual H₂O resonance signal.

The samples for the HSA, BSA, S. aureus primase CTD, PrgI, PA1324, and SAV1430 NMR screens were prepared as previously described.[4,15,17] S. aureus nuclease, DGKA, STM1790, UFC1, YjbR, YkfF, YkvR, YtfP, and the 5 amylases were screened at 5 μM protein concentration and 100 μM ligand concentration in a screening buffer of 2% DMSO-d₆, 20 mM Bis-Tris-d₁₉ pH 7.0 (uncorrected), 11.1 μM TMSP-d₄ in “100%” D₂O.

Chemical Library

All NMR ligand affinity assays were completed by screening each protein individually with a library of 437 biologically active compounds (http://bionmr-c1.unl.edu/ligands).
[27] The library contains amino acids, carbohydrates, cofactors, fatty acids, hormones, inhibitors, known drugs, metabolites, neurotransmitters, nucleotides, and substrates. The compound library is divided into 116 mixtures with 3–4 ligands per mixture and is described in detail elsewhere. In order to assess the structural diversity of the library, 1300 molecular descriptors were calculated for each compound using the online software eDragon (VCClabs, http://www.vcclab.org/lab/edragon/).[29] MM2 minimized 3D MOL2 files were generated using ChemBio 3D Ultra 12.0 (CambridgeSoft, Cambridge, MA), converted to SMILES using OpenBabel (http://openbabel.org) and then uploaded to the eDragon Web site. The molecular descriptors calculated for each structure were incorporated into a single Excel spreadsheet and imported into SIMCA (UMETRICS, Kinnelon, NJ). Each molecular descriptor was treated as a separate bin or data point for each structure. A 3D PCA scores plot was generated using the calculated molecular descriptors for the structures in the library.

False positive and false negative rates were simulated to determine if the screening library of 437 compounds is of sufficient size to make meaningful comparisons between proteins of unknown function. An in-house program was written that randomly generates a ligand binding profile using a Gaussian distribution about two means: (i) average hit rate of 32 ± 44 bound ligands, or (ii) a lower hit rate of 16 ± 6. Either 1 × 10⁶ random pairs of ligand binding profiles were generated or a single randomly generated ligand binding profile was compared against a random set of 1 × 10⁶ ligand binding profiles. The simulations were done in triplicate and the library sizes used in the simulations corresponded to 437, 1000, 2000, 5000, and 10000 compounds. An E-value of ≤ 1 × 10⁻⁹ was used to define a similar ligand binding profile.
A histogram of the Log(E-values) was plotted and fitted using EasyFit V5.4 (MathWave Technologies).

To estimate a false negative rate, an error was introduced to randomly generated pairs of identical ligand binding profiles. Each ligand binding profile has false binders added or true binders removed at a percentage of the rate that a true binder was added to the original ligand binding profile (based on the original number of predicted binders (m and n) chosen from the Gaussian distribution):

$$m_e = m_o \pm e\,m_o \quad \text{and} \quad n_e = n_o \pm e\,n_o \qquad (1)$$

where e is the error rate (10–50%), m_e and n_e (m_e ≠ n_e) are the new number of bound ligands after the error rate is applied, and m_o and n_o (m_o = n_o) are the original number of bound ligands predicted from the Gaussian distribution.

Binding Assay

Ligand binding was manually identified from a decrease in the free ligand 1D ¹H NMR signal upon the addition of protein. This decrease is determined by visually comparing ligand peak intensities to the TMSP-d₄ methyl resonance (0.00 ppm) from the 11.1 μM TMSP-d₄ internal standard. Any ligand with a visually observable decrease in peak height from the addition of a protein is considered to be a binder. The relationship between K_D and NMR line-broadening has been previously discussed in detail.[30] From this analysis, a conservative estimate of our limit of detection can be made, which corresponds to ligands with a K_D of >100–300 μM. Of course, this limit is dependent on the molecular weight of the protein, where sensitivity increases with MW. Thus, our ligand binding assay will be dominated by biologically relevant protein–ligand interactions, where nonspecific or irrelevant interactions start to dominate as the K_D increases beyond 300 μM.
[31] Conversely, tight binders (K_D ≤ nM) that are governed by slow off-rates may simply result in a decrease in peak intensity proportional to the limiting protein concentration. A 5% change in peak intensity may be difficult to decipher and correspond to a false positive. Nevertheless, encountering tight binders in a ligand binding assay is generally a rare event. Binders from our chemical library are typically structural homologues to the natural ligand. Also, these tight binders would be expected to be uniformly missed for functionally similar proteins. The methods for data processing and identifying binding ligands have been previously discussed in detail.[25,27,30] Overall, for our library of 437 compounds, the 1D ¹H NMR line-broadening screen requires approximately a day to complete both the data acquisition and the data analysis.

Ligand Binding Profiles

A similarity in ligand binding profiles was measured between each pair of proteins using eq 2. Overlapping binding ligands (S) for every protein in a pairwise manner were identified by comparing a list of all binding ligands and counting the number of overlapping ligands. Each pairwise E-value was calculated using a library size of 437 compounds (p′ = 1/437 = 0.00229). An Excel spreadsheet program was written to match overlapping ligands and measure E-values.

Functional Similarity Measurement

The Uniprot accession number was obtained for each protein in the study. The list of Uniprot accession numbers was uploaded to the semantic similarity tool FunSimMat (http://funsimmat.bioinf.mpi-inf.mpg.de/). All reported functional similarities are expressed as a funsim score measured as previously described.[32]

RESULTS AND DISCUSSION

Structural Diversity of the Screening Library

Our chemical library for NMR ligand affinity screening was designed to maximize functional diversity.
[27] In addition to practical considerations such as solubility, stability and cost, compounds were added to our library based on a known biological activity involving a distinct protein or protein class. Compounds correspond to known drugs, inhibitors, substrates or cofactors. Not surprisingly, the compounds are also consistent with typical “drug-like” characteristics and with fragment libraries.[33,34] These characteristics include good aqueous solubility, low molecular-weights, and low number of rings, heteroatoms, and hydrogen-bond donors and acceptors. Diversity in biological activity was also anticipated to result in a correlated diversity in chemical structure. To validate the structural diversity of our functional chemical library, ∼1300 different molecular descriptors were calculated for each compound.[29] A principal component analysis (PCA) of the set of molecular descriptors indicates a uniform coverage of structural space. A 3D PCA scores plot is shown in Figure 1A. The structures are distributed throughout the structural space defined by the molecular descriptors. Conversely, if there was an overabundance of any structural class, distinct clustering patterns would be apparent in the 3D PCA scores plot. Clearly, our chemical library is an acceptable set of molecular probes to evaluate a diversity of protein function.

Figure 1. (A) Three dimensional PCA scores plot where each point represents one compound from the functional chemical library. The placement of each point in the PCA scores plot is indicative of the unique structural identity for each compound. The contribution of each principal component is labeled on the axis. The sphere represents the 95% confidence limit. (B) Histogram distribution of E-values calculated from a simulation of ligand binding profiles. A random ligand-binding profile was compared against a random set of 1 × 10⁶ ligand binding profiles using a library of 437 compounds. The solid line corresponds to the best fit curve from the Weibull Distribution (Extreme Value Type III Distribution) model. (C) Plot of the percentage of false negatives as a function of error rate (10–50%) and library size (437–10000) from a simulation of ligand-binding profiles.

Calculation of Ligand Binding Profile Similarities

Measuring a significant similarity between two ligand binding profiles requires the development or adaptation of a robust scoring function. Current similarity scoring methods used for sequence analysis, such as the E-value developed by Karlin and Altschul,[35] are also well-suited for measuring a similarity between ligand binding profiles.

$$E = K m n\, e^{-\lambda S} \qquad (2)$$

Here, the E-value is only dependent on the total number of compounds that bind each protein (m and n) and the total number of compounds that bind both proteins (S). Additionally, the probability of finding a significant similarity is proportional to the probability search space (K) and scoring function (λ).
$$K = \frac{(q - p')^2}{q} \quad \text{and} \quad \lambda = \ln\frac{q}{p'} \qquad (3)$$

Unlike sequence similarity, a similarity between ligand binding can be thought of as a binary system (binding vs nonbinding); therefore, the probabilities p′ and q simply become the probability of finding a hit within a library:

$$p' = \frac{1}{\text{library size}} \qquad (4)$$

and the probability of finding a ligand that binds both proteins:

$$q = \frac{S}{m \times n} \qquad (5)$$

The standard E-value also provides a robust measure of the probability that the ligand binding similarity is not due to chance using the standard P-value.

$$P = 1 - e^{-E} \qquad (6)$$

As expected, the ligand binding profile E-value rapidly becomes insignificant (P > 0.0001) as the probability of finding a ligand that binds both proteins (q) decreases. Binding profiles that have a P < 0.0001 are significant at the 99.99% confidence interval (E = 10⁻⁵). Thus, our method is only dependent on comparing the total number of binding events (m or n) and the set of overlapping binding ligands (S) between two proteins.

Sufficient Size of a Screening Library

Obtaining a balance between library depth and breadth is very challenging and has been a focus of compound library design for over a decade, without a clear consensus conclusion.[36] Clearly, the size of the library would be expected to impact the number of observed binders (m and n) and the corresponding similarity in ligand binding profiles (S and E-value). Fundamentally, determining the optimal size of the chemical library is an open-ended, and at some level, a very difficult question to adequately answer. It is always plausible for a protein to be screened that results in a complete absence of binders regardless of the size or composition of the chemical library. If the protein is a true unknown, how is it possible to ascertain a priori that the library composition is adequate?
The only recourse is to explore the probability of identifying binders within a given set of reasonable assumptions and given experimental hit rates.

On average, 32 ± 44 ligands were observed to bind a protein target in our NMR ligand affinity screen. Our simulations indicate that, even with a modest library size of 437 compounds, the probability of randomly finding two similar (E-value ≤ 1 × 10⁻⁹) ligand binding profiles is effectively zero. This is not too surprising considering that in theory there are 2⁴³⁷ (3.5 × 10¹³¹) different binding profiles, where the product (1.3 × 10²⁶³) for a pair of profiles leads to an effectively minuscule probability of finding two similar ligand binding profiles. Of course, only a small subset of these potential ligand binding profiles are possible given 32 ± 44 bound ligands, but this still represents a very large number of dissimilar pairs of ligand binding profiles. A randomly selected ligand binding profile using a Gaussian distribution of bound ligands with a smaller mean (larger potential false positive rate) of 16 ± 6 was compared against a random set of 1 × 10⁶ ligand binding profiles using the same Gaussian distribution to select binders. A histogram of the Log(E-values) is shown in Figure 1B and best fitted with the Weibull Distribution (Extreme Value Type III Distribution), which indicates the calculated E-values are significant.[37] Consequently, the comparison did not yield any significant similarities and the most common occurrence was an overlap (S) of zero (S ranged from 0 to 7).

While the false positive rate is effectively zero, a false negative rate was measurable and, as expected, decreased for increasing library size. Again, a total of 1 × 10⁶ pairs of identical ligand binding profiles (m = n = S) were randomly generated using a Gaussian distribution with a mean of 16 ± 6 bound ligands (m and n).
An error rate ranging from 10–50% was introduced into each ligand binding profile, independently changing the two identical ligand binding profiles. The simulations were repeated for library sizes that ranged from 437 to 10000 compounds. The percentage of false negatives (E-value of >1 × 10⁻⁹) found in each simulation is plotted as a function of library size in Figure 1C. The false negative rate increases proportional to the error rate and decreases proportional to the library size. For our library of 437 compounds, the percentage of false negatives is ∼9% with a 50% error rate (see eq 1). Conversely, only a ∼2% false negative rate is observed for a library of 2000 compounds at the maximum error rate of 50%. The false negative rate is below 1% for a library of 10000 compounds. Correspondingly, ligand binding profile similarities are relatively tolerant to erroneous binders. This is consistent with the lack of any false negatives in the 19 screens reported herein. Thus, the simulations indicate that even a modest library of 437 compounds provides a relatively robust and reliable measure of functional similarity, but a slight increase in the library size may improve the method's accuracy. Of course, increasing the library size also increases assay time, but a library of 1500–2000 compounds is still practical since the assay time is only estimated to increase to ∼1.5–2 days.

The library size also defines the minimal number of binders (m and n) and overlapping binders (S) required for obtaining a significant E-value of 1 × 10⁻⁹. For a modest library of 437 compounds, the minimal number of binders and overlapping binders is 5 compounds. The number drops to 4 compounds for a library size of 1000–2000 compounds and to 2 compounds for a library size of 5000–10000. Considering the average number of binders is 32 ± 44, these are effectively inconsequential improvements for a substantial increase in screening time.
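The false-negative simulation described above might be sketched in Python roughly as follows. This is only an illustration under stated assumptions, not the in-house program: the function names, the 50/50 choice between adding and removing binders, the rounding of e × m_o, and the treatment of pairs with no enrichment over chance (infinite E-value) are all hypothetical choices made here. The sketch is therefore not expected to reproduce the percentages reported in Figure 1C.

```python
import math
import random

def e_value(m, n, s, library_size):
    """E-value following eqs 2-5; returns inf when there is no enrichment
    over chance (q <= p'), which is an assumption of this sketch."""
    if min(m, n, s) <= 0:
        return math.inf
    p_prime = 1.0 / library_size
    q = s / (m * n)
    if q <= p_prime:
        return math.inf
    k = (q - p_prime) ** 2 / q
    lam = math.log(q / p_prime)
    return k * m * n * math.exp(-lam * s)

def perturb(profile, error_rate, library_size, rng):
    """Apply eq 1 to one profile: remove true binders or add false binders.
    The number changed is round(e * m_o); the sign is picked at random
    (hypothetical choice, the source does not specify it)."""
    change = round(error_rate * len(profile))
    if rng.random() < 0.5:
        # m_e = m_o - e*m_o: keep a random subset of the true binders
        return set(rng.sample(sorted(profile), len(profile) - change))
    # m_e = m_o + e*m_o: add compounds that were not binders originally
    non_binders = [c for c in range(library_size) if c not in profile]
    return profile | set(rng.sample(non_binders, change))

def false_negative_rate(error_rate, library_size=437, trials=2000, seed=0):
    """Fraction of originally identical profile pairs whose E-value
    exceeds 1e-9 after both copies are perturbed independently."""
    rng = random.Random(seed)
    misses = 0
    for _ in range(trials):
        # Number of binders drawn from a Gaussian with mean 16 and SD 6
        m_o = max(1, round(rng.gauss(16, 6)))
        original = set(rng.sample(range(library_size), m_o))
        a = perturb(original, error_rate, library_size, rng)
        b = perturb(original, error_rate, library_size, rng)
        if e_value(len(a), len(b), len(a & b), library_size) > 1e-9:
            misses += 1
    return misses / trials
```

Calling `false_negative_rate(0.5)` and `false_negative_rate(0.5, library_size=2000)` gives a rough feel for how the error rate and library size interact under these assumptions.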
Alternatively, false negatives in the binding assay (missed tight binders) may be potentially detrimental to proteins that bind a very limited number of ligands (<5). In principle, a single false negative may be the difference between a significant or insignificant E-value. Of course, the number of binders is expected to scale with the library size, assuming a relatively constant hit rate.[38] Correspondingly, increasing the library size to 1500–2000 compounds is expected to make proteins that bind only four or fewer ligands a relatively rare event.

Correlating Protein Function with Ligand Binding Profiles

To experimentally support the ligand binding profile hypothesis, 19 proteins were screened by NMR using our chemical library of biologically active compounds.[27] Binding events were identified as previously described by measuring a decrease in ligand ¹H NMR peak intensities in the presence of a protein (Figure 2a).[4,25] Thus, the ligand binding profile is simply a binary list that indicates which compounds out of the library of 437 compounds were shown to bind the protein. The complete summary of results from the NMR ligand affinity screen for the 19 proteins can be found in Table 1S (Supporting Information).

For the 19 proteins screened in the NMR ligand affinity assay, 13 proteins have a previously annotated function based on GO terms and 6 proteins have an unknown function. The 19 proteins were chosen to contain two sets of functionally similar proteins mixed with a third set of functionally diverse proteins. The two sets of functionally related proteins are 2 serum albumins and 5 amylases. The serum albumins and amylases were chosen because the proteins have a function related to ligand binding and were readily available from commercial sources. The additional 12 proteins are from NESG or other ongoing functional annotation projects involving our FAST-NMR methodology.
[4,14] The primary intent of these additional proteins is to provide a “functional background” to test the ability of the ligand binding profile to distinguish the serum albumins and amylases from each other and from the remaining proteins. Will the addition of the 12 functionally diverse proteins cause erroneous similarities to the albumins or amylases that are not correlated with function?

A FunSimMat functional similarity score was calculated for each pair of proteins within the set of 19 proteins.[32] FunSimMat uses GO terms to generate a semantic similarity score that ranges from 0 for no functional similarity to 1 for identical functions. An average FunSimMat similarity score of 0.98 and 0.67 ± 0.04 was calculated between the albumins and amylases, respectively. The remaining 12 proteins exhibited no functional relationship to any other protein in the screening set, yielding an average FunSimMat similarity score of 0.1 ± 0.1. The complete list of FunSimMat similarity scores can be found in Table 2S (Supporting Information). A weak functional similarity was observed between the two albumins and the human protein ubiquitin-fold modifier-conjugating enzyme 1 (UFC1, Uniprot: Q9Y3C8). However, this similarity is limited to one overlapping and generic “protein binding” GO number (GO:0005515).

An all-vs-all pairwise comparison of the 19 ligand binding profiles gave a total of 171 ligand binding profile comparisons with only 11 comparisons giving a significant similarity score (P < 0.0001). The comparisons with the highest similarity scores corresponded to the set of albumins (E-value 1 × 10⁻⁵⁸) and

Figure 2. (A) Ligand binding is identified by a decrease in ligand peak intensity upon addition of a target protein. The 1D ¹H NMR spectrum of the nonsteroidal anti-inflammatory drug naproxen (I) is shown to broaden in the presence of H. sapiens serum albumin (HSA) (II) and B. taurus serum albumin (BSA) (III) indicating a positive binding event. The NMR line broadening experiments used 100 μM ligand and 5 μM protein as described in the methods section. (B) Heat map summarizing the NMR ligand affinity screens for 19 proteins, where the albumins are colored red, the amylases cyan and the remainder of the proteins gray. A binding ligand is indicated by a red line. The 437 ligands were sorted to maximize the clustering of binding ligands for the albumins and amylases.

Table 1. Functionally Similar Proteins Yield Significantly Similar Ligand Binding Profiles (a)

Comparison   m/n       S     E-value         FunSim score
HSA-BSA      178/171   162   2.16 × 10⁻⁵⁸    0.98
Bam-Aor      35/36     22    6.38 × 10⁻¹⁹    0.68
Bam-Hvu      35/29     14    1.17 × 10⁻¹⁰    0.63
Bli-Aor      28/36     18    1.19 × 10⁻¹⁵    0.68
Bli-Bam      28/35     16    1.42 × 10⁻¹⁴    0.68
Bli-Hvu      28/29     9     3.86 × 10⁻⁶     0.63
Hvu-Aor      29/36     13    2.98 × 10⁻⁸     0.64
Iba-Aor      29/36     12    2.98 × 10⁻⁸     0.67
Iba-Bam      29/35     15    7.56 × 10⁻¹²    0.63
Iba-Bli      29/28     11    2.43 × 10⁻⁸     0.63
Iba-Hvu      29/29     12    2.98 × 10⁻⁸     0.71

(a) Number of hits per protein (m and n), overlapping ligands (S), E-values and functional similarity scores (FunSim) are reported for significantly (99.99% confidence interval) similar ligand binding profiles from a comparison of 19 proteins. The set of serum albumins from H. sapiens (HSA) and B. taurus (BSA) and the amylases (Aor, Bam, Bli, Hvu, and Iba) gave significant similarity. The set of amylases was composed of 3 α-amylases from A. oryzae (Aor), B. amyloliquefaciens (Bam), and B. licheniformis (Bli) and 2 β-amylases from H. vulgare (Hvu) and I. batatas (Iba). A complete list of binding profiles is reported in Supplementary Table 1 (Supporting Information).
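As a rough illustration of how eqs 2–6 and the all-vs-all comparison fit together, the Python sketch below computes an E-value from m, n, S and the library size, converts it to a P-value, and compares every pair of profiles. The published analysis used an Excel spreadsheet; the function names here (`e_value`, `p_value`, `compare_all`) and the decision to return an infinite E-value when there is no overlap or no enrichment over chance are assumptions of this sketch.

```python
import math
from itertools import combinations

def e_value(m, n, s, library_size=437):
    """E-value for two binding profiles with m and n binders and s shared binders."""
    if min(m, n, s) <= 0:
        return math.inf                      # assumption: no overlap, not significant
    p_prime = 1.0 / library_size             # eq 4
    q = s / (m * n)                          # eq 5
    if q <= p_prime:
        return math.inf                      # assumption: no enrichment over chance
    k = (q - p_prime) ** 2 / q               # eq 3, search space term
    lam = math.log(q / p_prime)              # eq 3, scoring term
    return k * m * n * math.exp(-lam * s)    # eq 2

def p_value(e):
    """P = 1 - exp(-E) (eq 6), using expm1 to keep precision for tiny E."""
    return -math.expm1(-e)

def compare_all(profiles, library_size=437):
    """All-vs-all comparison; profiles maps a protein name to a set of ligand IDs."""
    results = {}
    for (name_a, set_a), (name_b, set_b) in combinations(profiles.items(), 2):
        shared = len(set_a & set_b)          # overlapping binders S
        results[(name_a, name_b)] = e_value(len(set_a), len(set_b), shared, library_size)
    return results

if __name__ == "__main__":
    # Counts taken from the HSA-BSA and Bam-Aor rows of Table 1
    for label, (m, n, s) in {"HSA-BSA": (178, 171, 162), "Bam-Aor": (35, 36, 22)}.items():
        e = e_value(m, n, s)
        print(f"{label}: E = {e:.3g}, P = {p_value(e):.3g}")
```

The two pairs in the demo can be checked against the corresponding Table 1 entries (2.16 × 10⁻⁵⁸ and 6.38 × 10⁻¹⁹). With 19 profiles, `compare_all` returns 19 × 18 / 2 = 171 pairs, matching the number of comparisons stated in the text.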