Simultaneous and coupled energy optimization of homologous proteins: a new tool for structure prediction

Chen Keasar (1,2), Ron Elber (1) and Jeffrey Skolnick (3)

Addresses: (1) Department of Physical Chemistry, Department of Biological Chemistry, Fritz Haber Research Center for Molecular Dynamics and Wolfson Center for Applied Structural Biology, Hebrew University, Givaat Ram, Jerusalem 91904, Israel. Present addresses: (2) Peptor, Limited, Kiryat Weizmann, Rehovot 76326, Israel. (3) Department of Molecular Biology, Scripps Research Institute, La Jolla, CA 92037, USA.
Correspondence: Ron Elber. E-mail: ron@fh.huji.ac.il
Key words: homology modeling, lattice model, Monte Carlo, protein folding
Received: 07 Apr 1997. Revisions requested: 27 May 1997. Revisions received: 06 Jun 1997. Accepted: 16 Jun 1997. Published: 17 Jul 1997.
Electronic identifier: 1359-0278-002-00247
Folding & Design 17 Jul 1997, 2:247–259. © Current Biology Ltd ISSN 1359-0278
Research Paper

Background: Homology-based modeling and global optimization of energy are two complementary approaches to prediction of protein structures. A combination of the two approaches is proposed, in which a novel component that forces similarity between homologous proteins is added to the energy.

Results: The combination was tested for two families: pancreatic hormones and homeodomains. The simulated lowest-energy structure of the pancreatic hormones is a reasonable approximation to the native fold. The lowest-energy structure of the homeodomains has 80% of the native contacts, but the helices are not packed correctly. The fourth lowest-energy structure of the homeodomains has the correct helix packing (RMS 5.4 Å and 82% of the correct contacts). Optimizations of a single protein of the family yield considerably worse structures.

Conclusions: Use of coupled homologous proteins in the search for the native fold is more successful than the folding of a single protein in the family.

Introduction

This manuscript presents a feasibility study of a proposed enhancement to ab initio algorithms to fold proteins. The present approach is suggested as a useful addition to already existing protocols, an addition that can significantly improve their prediction capabilities. In the ab initio approach to protein structure prediction (the approach that we attempt to enhance), the conformation space is searched for the global energy minimum. The calculated minimum-energy conformation is an approximation of the native fold.

Clearly, the success of the search depends on the quality of the energy function, tested by its ability to distinguish the native fold from all other conformations. Equally important is the ability to rapidly examine alternative protein conformations within the framework of the predefined energy function and conformation space. The energy functions employed differ in the method used to derive them, their accuracy and their complexity. Some potential energy surfaces are constructed from experimental and computational data on small molecules and assign properties to each atom in the system [1–3]. Other potential energies are based on statistical analyses of known protein conformations and resolve the structure at the level of individual amino acids [4–7].

In addition to the energies, the conformational space is represented at different levels of accuracy and resolution. The number of possible conformations of the unfolded protein is very large; therefore, when a folding attempt is made, models that give up the atomic description of the system and significantly reduce the number of degrees of freedom are very helpful. For example, it is possible to represent water molecules and counter ions [8] and/or some parts of the protein molecule [9,10] implicitly, in a way that (so we hope) still captures the main features of the protein fold and its interactions. Moreover, besides the reduction in the number of relevant coordinates, the conformations may be restricted to a discrete (lattice) space [11–13]. The lattice makes it possible to pursue much larger Monte Carlo moves than in a continuous representation. Furthermore, many of the energy calculations can be performed and stored in advance, leading to an additional computational gain. An extreme and rather successful reduction in conformational space is used by the threading approach [14–17], which restricts the conformation space to structures already found in the protein structure databases.

There are two principal problems that make it difficult to apply the energy-based approach to protein structure prediction. First, since the energy functions are approximate, the native fold is not necessarily the global minimum of the potential energy; therefore, an incorrect structure may be found even if the search is complete. Second, the potential energy surfaces are typically very rough, including a broad distribution of barrier heights and well depths. This poses a significant challenge to the optimization algorithm. The computational effort associated with exhaustive search grows exponentially with the number of amino acids [12,18]. Nondeterministic methods such as Monte Carlo annealing [19,20] do not necessarily find the global minimum. Instead (as is quite common in complex systems), they find and get 'stuck' in local minima.

A possible strategy to bypass the multiple minima problem is to modify the energy function. Ideally, the energy function should be modified in a way that makes the energy surface smoother while still maintaining the original location of the global minimum. Smoother energy surfaces are easier to optimize.
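The multiple-minima problem can be seen in a toy setting. The minimal sketch below, written for illustration only, applies Metropolis Monte Carlo annealing to a hypothetical one-dimensional energy, `rugged_energy`, whose unique global minimum lies at x = 0 and whose ripples create local minima separated by barriers. The function names, cooling schedule and step size are assumptions; they are not the lattice program or parameters used in this work.

```python
import math
import random
from typing import Callable


def rugged_energy(x: float) -> float:
    """Hypothetical rough 1D surface: a broad funnel plus ripples.
    Since 0.05*x**2 >= 0 and -cos(3x) >= -1, the unique global minimum is E(0) = -1."""
    return 0.05 * x * x - math.cos(3.0 * x)


def anneal(energy: Callable[[float], float], x0: float, t_start: float = 2.0,
           t_end: float = 0.01, n_steps: int = 20000, step: float = 0.5, seed: int = 0):
    """Metropolis Monte Carlo with a geometric cooling schedule.
    Returns the final state and energy, and the best state and energy visited."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    for n in range(n_steps):
        t = t_start * (t_end / t_start) ** (n / (n_steps - 1))  # T goes from t_start to t_end
        x_new = x + rng.uniform(-step, step)                      # random local move
        e_new = energy(x_new)
        # Metropolis rule: always accept downhill, accept uphill with probability exp(-dE/T)
        if e_new <= e or rng.random() < math.exp(-(e_new - e) / t):
            x, e = x_new, e_new
            if e < best_e:
                best_x, best_e = x, e
    return x, e, best_x, best_e


# A quench (constant low temperature) started in the ripple near x = 2.07 stays trapped there.
x, e, _, _ = anneal(rugged_energy, x0=2.07, t_start=0.01, t_end=0.01)
print(round(x, 2), round(e, 2))
```

The quench ends in the local minimum near x ≈ 2.07 (energy about -0.78, above the global -1), which is the 'stuck' behavior described above. A hotter start gives the walker a chance to cross the ripples, at the price of more steps; smoothing attacks the same problem by changing the surface instead of the search.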
Smoothing approaches include smoothing based on the Schrödinger equation [21], the diffusion equation method [22], the Liouville equation approach [23], the imaginary time Schrödinger equation [24,25], and the locally enhanced sampling (LES) method [26] (for a review, see [27]).

The algorithm proposed below also belongs to the general class of smoothing algorithms; however, it provides smoothing by adding information on homologous proteins. This should be contrasted with the previous protocols, which smooth the energy function by removing sharp features from the potential surface and are therefore based on filtering out some of the initial data.

An alternative approach to protein structure prediction, not based on energy optimization, rests on the observation that proteins with similar sequences (homologous proteins) have similar native folds. The relationship between sequence and structural similarities can be statistically quantified based on known structures and sequences [28]. This observation allows structure prediction of proteins based on experimentally determined coordinates of homologous proteins [29]. Even in the absence of known structures, the expected structural similarity of homologous proteins can facilitate the prediction of structural features. Predictions of secondary structure [30–33], solvent accessibility [32,34], and the topology of membrane proteins [35] based on multiple sequences are more reliable than predictions based on only one sequence. Knowledge based on multiple sequence alignment was also shown recently to improve threading efficiency [36,37]. Here, we extend this idea to protein structure prediction by energy optimization.

The main concept pursued in this manuscript is averaging over sequences at given or similar conformations. Homologous proteins may have numerous compensating mutations at the native fold.
The mutations must be compensating because there is only one native configuration that is shared by the homologous proteins (by virtue of experimental observations). However, there are many unfolded states for the protein family, and not all unfolded conformations are expected to be affected in the same way by the mutations: in contrast to the native fold, the unfolded state offers many structures to choose from. We conjecture that the energy changes induced by mutations on unfolded structures will be random and almost independent of the specific structure of the unfolded configuration. Furthermore, since homologous proteins have similar structures, we can safely assume that the sequence variation at the native fold systematically yields low energies. Hence, by adding the energy surfaces of homologous proteins, the energy surface can be distorted so that the global energy minimum is deeper, while the rest of the surface (assuming random variation at the unfolded state) is similar in shape (on average) to what we started with (Figure 1).

The algorithmic realization of the above idea is the simultaneous optimization of homologous proteins while forcing them to look alike. Hence, the experimental observation of structural similarity between homologous proteins is directly embedded into the optimization protocol. The optimization protocol employed is Monte Carlo annealing on a lattice [13], in which the structural similarity of homologous proteins is used to effectively modify the energy function by optimizing the energies of several homologous proteins in a coupled, parallel way. An additional energy term penalizing the structural difference between the proteins is employed. We thereby force the different proteins to have similar conformations at each instant of time during the parallel simulations.

In this paper, we demonstrate that there are two main advantages to the simultaneous optimization of a whole protein family.
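The summing argument illustrated in Figure 1 can be checked on a toy example. The sketch below is not the lattice energy of [13]; it represents each protein's energy surface as a list of energies over the same discrete conformations, with conformation 0 standing in for the shared native fold and every other energy drawn independently at random (the conjecture stated above). It assumes all family members sit in the same conformation, so the family surface is the elementwise sum U_sum; the average U_avg = U_sum / N has the same minimum. The helpers `summed_landscape`, `argmin` and `random_family`, and all numbers, are hypothetical.

```python
import random
from typing import Sequence


def summed_landscape(landscapes: Sequence[Sequence[float]]) -> list[float]:
    """U_sum(c) = sum over proteins i of U_i(c), for every conformation c."""
    return [sum(values) for values in zip(*landscapes)]


def argmin(values: Sequence[float]) -> int:
    """Index of the lowest energy."""
    return min(range(len(values)), key=lambda c: values[c])


def random_family(n_proteins: int, n_conf: int, native_energy: float = -3.0,
                  noise: float = 1.0, seed: int = 0) -> list[list[float]]:
    """Toy family: conformation 0 is the shared native fold with the same low energy
    in every protein; all other energies are independent Gaussian noise."""
    rng = random.Random(seed)
    family = []
    for _ in range(n_proteins):
        u = [rng.gauss(0.0, noise) for _ in range(n_conf)]
        u[0] = native_energy
        family.append(u)
    return family


# Hand-made two-protein example in the spirit of Figure 1:
# each protein alone has a false minimum, but the sum recovers the native one.
protein_a = [-3.0, -3.5, 0.0, -1.0, 1.0]   # false minimum at conformation 1
protein_b = [-3.0, 1.0, -3.6, 0.5, -1.0]   # false minimum at conformation 2
u_sum = summed_landscape([protein_a, protein_b])
u_avg = [u / 2 for u in u_sum]
print(argmin(protein_a), argmin(protein_b), argmin(u_sum), argmin(u_avg))  # 1 2 0 0
```

With `random_family(7, 1000)`, a single landscape often contains a decoy below the native energy, because the lowest of about a thousand unit Gaussians lies near -3. Summing seven landscapes puts the native fold at -21, while the spread of the decoy sums grows only as √7, so the minimum of U_sum is the native conformation.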
The first advantage, as mentioned above, is the elimination of local minima not shared by the entire family. In the Results section, we describe the lowest-energy minima found for a pancreatic peptide and a homeodomain fragment using a statistical potential [13]. The lowest-energy minima we found for the two proteins correspond to misfolded conformations. Nevertheless, considerably better structures (but not better energies) were found when coupling was introduced. Hence, the coupling between the different family members prevents the simultaneous runs from adopting the wrong conformation that is (nevertheless) of the lowest energy. In this case, the requirement for a unanimous 'vote' of the homologous proteins compensates for an inaccurate energy function.

Another feature of the coupling is the smoothing of the energy surface, making it more accessible to stochastic optimization. This is similar to the LES protocol [26] and the diffusion equation [22], in which multiple copies of the same protein are optimized simultaneously. The use of multiple copies results in an effective energy function with lower barriers [26]. In this case, the smoothing is over coordinate space. This effect is demonstrated separately in the Results section.

The idea proposed in this work is related to other protocols that use sequence 'averaging' [30–33,35–37]. The present technique differs in adding the energy optimization and in allowing some structural diversity when the sequence averaging is performed. The use of distributions of sequences and structures is essential in making better predictions with energy-based methods. For example, using the same protein conformation and averaging over all sequences is equivalent to an optimization with a very severe penalty on the diversity of protein structures.
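The coupling term can be sketched in the same toy spirit. Below, each family member i has its own hypothetical one-dimensional energy E_i(x_i) sharing a native minimum at x = 0, and the family is annealed under E_total = Σ_i E_i(x_i) + k Σ_{i<j} (x_i - x_j)². The quadratic pairwise penalty, the coupling constant `k`, the member energies built by `make_member`, and the one-member-at-a-time move set are all assumptions for illustration, not necessarily the form of the similarity term used on the lattice in this work. A finite `k` allows the structural diversity the text calls for; a very large `k` approaches the identical-conformation limit just described.

```python
import math
import random
from typing import Callable, Sequence

Energy = Callable[[float], float]


def make_member(freq: float) -> Energy:
    """Hypothetical family member: shared funnel and native minimum at x = 0, plus a
    member-specific term 0.8*sin(freq*x)**2 that vanishes at the native fold."""
    return lambda x: 0.05 * x * x - math.cos(3.0 * x) + 0.8 * math.sin(freq * x) ** 2


FAMILY = [make_member(f) for f in (2.0, 5.0, 7.0)]


def coupling_penalty(xs: Sequence[float], k: float) -> float:
    """Assumed similarity term: k times the sum of squared pairwise differences."""
    n = len(xs)
    return k * sum((xs[i] - xs[j]) ** 2 for i in range(n) for j in range(i + 1, n))


def total_energy(energies: Sequence[Energy], xs: Sequence[float], k: float) -> float:
    """E_total = sum_i E_i(x_i) + coupling penalty."""
    return sum(e(x) for e, x in zip(energies, xs)) + coupling_penalty(xs, k)


def coupled_anneal(energies: Sequence[Energy], x0: float, k: float, t_start: float = 2.0,
                   t_end: float = 0.01, n_steps: int = 20000, step: float = 0.5, seed: int = 0):
    """Anneal all members together under E_total; every member starts from the same x0."""
    rng = random.Random(seed)
    xs = [x0] * len(energies)
    e = total_energy(energies, xs, k)
    for n in range(n_steps):
        t = t_start * (t_end / t_start) ** (n / (n_steps - 1))
        trial = list(xs)
        i = rng.randrange(len(trial))            # move one family member at a time
        trial[i] += rng.uniform(-step, step)
        e_trial = total_energy(energies, trial, k)
        # Metropolis rule applied to the coupled (total) energy
        if e_trial <= e or rng.random() < math.exp(-(e_trial - e) / t):
            xs, e = trial, e_trial
    return xs, e


xs, e = coupled_anneal(FAMILY, x0=4.0, k=20.0)
print([round(x, 2) for x in xs], round(e, 2))
```

Starting all members from one shared conformation mirrors the protocol described in the Results, where all seven proteins began each run from the same random conformation. Setting `k=0` turns the sketch into independent runs, one per member.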
In our experience, forcing every family member into the same conformation results in significant energy barriers (alternative folding pathways of different proteins are not allowed) and slows down the folding kinetics. We found it necessary to allow structural variations within the family if ab initio folding is attempted. Perhaps alternative approaches to protein folding (e.g. threading [17]) are less sensitive to the presence of barriers. This is not, however, what we observed for Monte Carlo folding.

The feasibility of this idea was demonstrated recently on a simple model system: structural optimization of two-dimensional heteropolymers on a square lattice (2DHP) [38]. Using four types of monomers and polymers 14 units in length, an attempt was made to 'fold' homologous sets of heteropolymers. The advantages of using this limited model (as compared to proteins) are obvious, since an exact enumeration of all the polymer states is possible. In that study, we unambiguously demonstrated the expected properties of the coupled runs, as discussed above. Nevertheless, the limitations of that protocol are also quite clear, since two-dimensional heteropolymers do not share many of the complex properties of real proteins. The energy surface of proteins is rougher, and the number of protein conformations is significantly larger than in our simple model. It is conceivable that the additional complexity of real proteins poses such a huge problem that the anticipated enhancement of ab initio folding algorithms will be too small to detect. We therefore pursue here another feasibility study.
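Exhaustive enumeration is what made the 2DHP test [38] unambiguous, and a minimal sketch of it follows. It enumerates every self-avoiding conformation of a short chain on a square lattice and finds the conformation with the lowest summed contact energy for a set of sequences held in one shared conformation (the infinitely strong coupling limit). The four-letter alphabet `ABCD`, the contact table `CONTACT`, and all function names are hypothetical placeholders, not the potential or code of [38].

```python
from typing import Iterator, Sequence

Site = tuple[int, int]
STEPS = ((1, 0), (-1, 0), (0, 1), (0, -1))

# Hypothetical four-letter alphabet and symmetric contact energies (placeholders).
ALPHABET = "ABCD"
CONTACT = [
    [-1.0, -0.5, 0.0, 0.2],
    [-0.5, -0.8, 0.1, 0.0],
    [0.0, 0.1, -0.3, 0.0],
    [0.2, 0.0, 0.0, 0.1],
]


def self_avoiding_walks(n_monomers: int) -> Iterator[tuple[Site, ...]]:
    """Yield every conformation of an n-monomer chain on the square lattice, with
    monomer 0 fixed at the origin (no rotation or reflection reduction)."""
    path: list[Site] = [(0, 0)]
    occupied = {(0, 0)}

    def extend() -> Iterator[tuple[Site, ...]]:
        if len(path) == n_monomers:
            yield tuple(path)
            return
        x, y = path[-1]
        for dx, dy in STEPS:
            site = (x + dx, y + dy)
            if site not in occupied:          # self-avoidance
                path.append(site)
                occupied.add(site)
                yield from extend()
                path.pop()                    # backtrack
                occupied.remove(site)

    yield from extend()


def contact_energy(sequence: str, conf: Sequence[Site]) -> float:
    """Sum CONTACT over non-bonded lattice neighbors (j > i + 1), counting each pair once."""
    index = {site: i for i, site in enumerate(conf)}
    total = 0.0
    for i, (x, y) in enumerate(conf):
        for dx, dy in STEPS:
            j = index.get((x + dx, y + dy))
            if j is not None and j > i + 1:
                total += CONTACT[ALPHABET.index(sequence[i])][ALPHABET.index(sequence[j])]
    return total


def ground_state(sequences: Sequence[str]) -> tuple[float, tuple[Site, ...]]:
    """Exhaustive search for the conformation minimizing the summed energy of all
    sequences, each placed in the same conformation."""
    n = len(sequences[0])
    return min(((sum(contact_energy(s, c) for s in sequences), c)
                for c in self_avoiding_walks(n)), key=lambda pair: pair[0])


print(ground_state(["ABCAB", "AACBB"]))
```

Because the number of walks grows roughly as 2.64^n (the square-lattice connective constant), this brute force is practical for chains of the 2DHP size but not for proteins, where stochastic search is required.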
In this second feasibility study, the investigation is on two protein families, for which we provide a detailed analysis.

At present, our ability to test and apply the scheme of folding homologous proteins is restricted by two obstacles. The first is that ab initio folding protocols (such as the one we have examined) are limited to relatively simple folds; therefore, our enhancing protocol is limited in the same way. The present scheme, which aims at improving existing algorithms, is likely to fail if the starting point is too far off. This limits our choices to the study of small proteins. The second limiting factor is the requirement of a diverse set of sequences to obtain an effective smoothing in sequence space. Clearly, the present investigation of two protein families is insufficient to suggest the coupling idea as a general method to fold proteins. Nevertheless, we consider the present data sufficiently encouraging to promote further study and implementation in other ab initio folding schemes.

The method we attempt to improve is the lattice-based Monte Carlo simulation of proteins developed by Skolnick and co-workers [9,10]. The energy function and the algorithm made it possible to predict the native folds and folding pathways of several proteins [9,13,39,40]. Thus, this work was done within the scope of a given algorithm and a specific energy function for protein folding.

Figure 1
A schematic drawing of model one-dimensional energy surfaces (energy versus conformation) of two homologous proteins: protein A (thin line) and protein B (dashed line). They share the same global minimum (the native fold) but have a low correlation between the energies of the unfolded conformations. The average energy surface (thick line) is smoother and has a deeper global minimum for the native fold. In the simulation, we used the sum of the potentials of the homologous proteins, U_sum, rather than the average, U_avg. However, the two potentials are related (U_sum = N · U_avg, where N is the number of homologous proteins) in a way that does not affect the optimization.

Results

Pancreatic hormones

The sequences of seven homologous proteins were used for the prediction of the structure of the pancreatic peptides (Table 1a). One of them, avian pancreatic peptide from turkey, has a known structure. A total of 100 uncoupled simulated annealing (SA) runs are presented as set 1 in Table 2a. These are standard simulations using the lattice Monte Carlo program without the coupling. These calculations used the avian pancreatic peptide sequence and served as a reference for comparison. The standard runs are compared with 142 coupled SA calculations of all seven homologous proteins (set 2 in Table 2a). At the beginning of each SA run, all seven proteins had the same random conformation. In the early part of the simulation, the structures deviated considerably (RMS deviation >8 Å), but as the temperature of the proteins decreased, the deviation was reduced and reached 3–4 Å. This indicates a reasonably strong effect of the coupling. The final configurations of the coupled runs are presented in Figure 2. To equate the lengths of the computations, each of the uncoupled trajectories is seven times longer than each of the coupled runs (Table 2).

Table 1
Proteins used in the current work.

PDB ID      Swiss-Prot entry   Sequence
(a) Pancreatic hormones
1ppt [52]   paho_chick [53]    GPSQPTYPGDDAPVEDLIRFYDNLQQYLNVVTRHRY
            paho_rante [54]    APSEPHHPGDQATQDQLAQYYSDLYQYITFVTRPRF
            pyy_myosc [55]     YPPQPESPGGNASPEDWAKYHAAVRHYVNLITRQRY
            neuy_carau [56]    YPTKPDNPGEGAPAEELAKYYSALRHYINLITRQRY
            paho_rat [57]      YPTKPDNPGEGAPAEELAKYYSALRHYINLITRQRY
            paho_erieu [58]    VPLEPVYPGDNATPEQMAHYAAELRRYINMLTRPRY
            pyy_pig [59]       YPAKPEAPGEDASPEELSRYYASLRHYLNLVTRQRY
(b) Homeodomains
1enh [60]   hman_drovi [62]    RPRTAFSSEQLARLKREFNENRYLTERRRQQLSSELGLNEAQIKIWFQNKRAKI
            hmn2_drome [63]    KRRVLFTKAQTYELERRFRQQRYLSAPEREHLASLIRLTPTQVKIWFQNHRYKT
            pho2_yeast [64]    PKRTRAKGEALDVLKRKFEINPTPSLVERKKISDLIGMPEKNVRIWFQNRRAKL
1hdp [61]   oct2_human [65]    KKRTSIETNVRFALEKSFLANQKPTSEEILLIAEQLHMEKEVIRVWFCNRRQKE
            mec3_caebr [66]    GLRTTIKQNQLDVLQEMFSNTPKPSKHRRAKLALETGLSMRVIQVWFQNRRSKE
            may3_schco [67]    KPRPKFHSEYTPLLELYFHFNAYPTFADRRMLAEKTGMQTRQITVWFQNHRRRA
            hmgc_mouse [68]    RHRTIFTDEQLEALENLFQETKYPDVGTREQLARKVHLREEKVEVWFKNRRAKW
            hmb3_arath [69]    EKKKRLNLEQVRALEKSFELGNKLEPERKMQLAKALGLQPRQIAIWFQNRRARW
            gsbp_drome [70]    RSRTTFTAEQLEALEGAFSRTQYPDVYTREELAQTTALTEARIQVWFSNRRARL

Aligned amino acid sequences of two sets of homologous proteins: pancreatic hormones and homeodomains. Each sequence is identified by its Swiss-Prot entry, and by its PDB ID if one exists.

Table 2
Simulation sets.

Set  Proteins         No. of simulations  Length of simulations (cycles)  Coupling
(a) Pancreatic hormones
1    paho_chick       100                 98,000                          OFF
2    paho_chick       142                 14,000                          ON
     paho_rante       142                 14,000                          ON
     pyy_myosc        142                 14,000                          ON
     neuy_carau       142                 14,000                          ON
     paho_rat         142                 14,000                          ON
     paho_erieu       142                 14,000                          ON
     pyy_pig          142                 14,000                          ON
3    7 × paho_chick   100                 14,000                          ON
(b) Homeodomains
1    hman_drovi       100                 126,000                         OFF
2    hman_drovi       100                 14,000                          ON
     hmn2_drome       100                 14,000                          ON
     pho2_yeast       100                 14,000                          ON
     oct2_human       100                 14,000                          ON
     mec3_caebr       100                 14,000                          ON
     may3_schco       100                 14,000                          ON
     hmgc_mouse       100                 14,000                          ON
     hmb3_arath       100                 14,000                          ON
     gsbp_drome       100                 14,000                          ON

The simulation sets are series of runs starting from random conformations with the same energy and Monte Carlo parameters.

As is evident in Figure 2, the coupled simulations in general have a higher tendency towards native-like conformations than the regular simulations. The left side of the graphs, corresponding to lower RMS or L values, is enriched in coupled results. The lowest-energy conformations of the two simulation sets are shown in Figure 3. While the conformation from the standard uncoupled simulation (Figure 3a) deviates considerably from the native fold (RMS = 9 Å; L = 0.32), the lowest-energy conformation of the coupled simulations (Figure 3b) is a reasonable approximation to the native fold (RMS = 5.8 Å; L = 0.26). The helix is well reproduced, and even the sidearm has the correct shape for most of its length. The RMS value improves to 5 Å if the N terminus is removed (i.e. considering the RMS only for residues 4–36). Similar removal of the N terminus for the lowest-energy conformation of the standard run gives an RMS of 8.6 Å.
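The RMS values quoted above are Cα coordinate deviations from the native fold. A common way to compute such a value, assumed here because the superposition procedure is not described in this section, is to superimpose the two structures optimally (Kabsch algorithm) before taking the root-mean-square distance. The sketch uses NumPy; the function name `ca_rmsd` and its `residues` argument (1-based residue numbers, so residues 4–36 become `range(4, 37)`) are hypothetical.

```python
import numpy as np


def ca_rmsd(model, native, residues=None) -> float:
    """C-alpha RMSD after optimal rigid-body superposition (Kabsch algorithm).

    model, native: (n_residues, 3) array-likes of C-alpha coordinates in the same order.
    residues: optional iterable of 1-based residue numbers to include.
    """
    p = np.asarray(model, dtype=float)
    q = np.asarray(native, dtype=float)
    if residues is not None:
        idx = np.asarray(list(residues)) - 1
        p, q = p[idx], q[idx]
    p = p - p.mean(axis=0)                     # remove translation
    q = q - q.mean(axis=0)
    u, _, vt = np.linalg.svd(p.T @ q)          # SVD of the 3x3 covariance matrix
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0   # forbid reflections
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T        # maps model onto native
    diff = p @ rotation.T - q
    return float(np.sqrt((diff ** 2).sum() / len(p)))
```

For two 36 × 3 Cα arrays, `ca_rmsd(model, native)` gives the full-chain value and `ca_rmsd(model, native, residues=range(4, 37))` the value with the first three residues excluded, analogous to the residue 4–36 comparison above.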
Figure 2
[Panels (a,b): energy (arbitrary units) versus Cα RMS (Å) and versus contact map deviation (L). Panels (c,d): frequency (%) versus RMS (Å) and versus contact map deviation (L).]
Comparison of two sets of simulations of the pancreatic hormones: single-protein simulations (uncoupled) and simultaneous simulations of homologous proteins that are forced to look alike (coupled). Each uncoupled simulation is represented by the deviation of its final conformation from the native fold of paho_chick (1ppt) and by its energy. In the coupled simulations, in which the seven proteins are forced to look alike, each simulation ends with seven conformations and seven energies. The highest among the minimized energies of the final seven coupled structures is used as the energy of the protein family at the end of the run. The quality of the structure prediction for the family is judged by the average coordinates of the seven coupled proteins at the end of the run. The (a) Cα RMS and (b) contact map deviations of the final conformations from different annealing runs are plotted against the energy. Panels (c) and (d) show the distributions of RMS and contact map deviations, respectively, among the 20% lowest-energy structures. Uncoupled simulations (crosses in (a,b) and unfilled bars in (c,d)): set 1 in Table 2a. Coupled simulations of homologous proteins (filled circles in (a,b) and filled black bars in (c,d)): set 2 in Table 2a.
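The Figure 2 caption defines how each coupled run is scored: the family energy is the highest of the members' minimized energies, the predicted structure is the average of the members' final coordinates, and panels (c,d) use the 20% of runs with the lowest energies. A minimal bookkeeping sketch of those three rules follows. The `CoupledRun` container and all function names are hypothetical, the rounding of the 20% cut is an assumption, and averaging raw coordinates assumes the coupled members already share a common frame (no extra superposition), which the caption does not state.

```python
from dataclasses import dataclass

Coord = tuple[float, float, float]


@dataclass
class CoupledRun:
    energies: list[float]        # minimized energy of each family member at the end of the run
    coords: list[list[Coord]]    # final C-alpha coordinates, one aligned list per member


def family_energy(run: CoupledRun) -> float:
    """Energy of the family: the highest (worst) of the members' minimized energies."""
    return max(run.energies)


def average_structure(run: CoupledRun) -> list[Coord]:
    """Per-residue average of the members' coordinates (members assumed to share a frame)."""
    n_members = len(run.coords)
    n_residues = len(run.coords[0])
    return [
        tuple(sum(member[r][axis] for member in run.coords) / n_members for axis in range(3))
        for r in range(n_residues)
    ]


def lowest_energy_fraction(runs: list[CoupledRun], fraction: float = 0.2) -> list[CoupledRun]:
    """Runs ranked by family energy, keeping the lowest `fraction` of them (at least one)."""
    ranked = sorted(runs, key=family_energy)
    n_keep = max(1, round(fraction * len(ranked)))
    return ranked[:n_keep]
```

Taking the maximum rather than the mean means a run scores well only if every family member reached a low energy, in line with the 'unanimous vote' idea. For the 142 coupled runs of set 2, this rounding keeps 28 runs.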