BODIL: a molecular modeling environment for structure-function analysis and drug design

Jukka V. Lehtonen (a), Dan-Johan Still (a), Ville-V. Rantanen (a,b), Jan Ekholm (a), Dag Björklund (a), Zuhair Iftikhar (a), Mikko Huhtala (a), Susanna Repo (a), Antti Jussila (b), Jussi Jaakkola (b), Olli Pentikäinen (a), Tommi Nyrönen (c), Tiina Salminen (a), Mats Gyllenberg (b,d) and Mark S. Johnson (a,*)

(a) Department of Biochemistry and Pharmacy, Åbo Akademi University, Tykistökatu 6A, FIN-20520 Turku, Finland
(b) Department of Mathematics, University of Turku, Matematiikan laitos, FIN-20014 Turun yliopisto, Finland
(c) CSC, the Finnish IT Center for Science, P.O. Box 405, FIN-02101 Espoo, Finland
(d) Current address: Rolf Nevanlinna Institute, Department of Mathematics and Statistics, FIN-00014 University of Helsinki, Finland

Received 20 April 2004; accepted in revised form 10 September 2004

Key words: density docking, molecular visualization, sequence comparisons, structure comparison, structure modeling

Summary

BODIL is a molecular modeling environment designed to help users quickly identify the key features of proteins that are critical to molecular recognition, especially (1) in drug discovery applications and (2) for understanding the structural basis of function. The program incorporates state-of-the-art graphics and sequence and structural alignment methods, among the other capabilities needed in modern structure–function and drug target research. BODIL has a flexible design that allows on-the-fly incorporation of new modules, intelligent memory management, and fast multi-view graphics.
A beta version of BODIL and an accompanying tutorial are available at http://www.abo.fi/fak/mnf/bkf/research/johnson/bodil.html

* To whom correspondence should be addressed. Fax: +358-2-215-3280; E-mail: johnson4@abo.fi

Journal of Computer-Aided Molecular Design 18: 401–419, 2004. © 2004 Kluwer Academic Publishers. Printed in the Netherlands.

Introduction

As the biological sciences enter what has been referred to as the 'post-genomic' era, in which the focus has shifted to the comparison of genomes and the detailed investigation of the encoded proteins, the comparison and analysis of sequences and three-dimensional structures have become routine aspects of molecular biology. Very often, direct visualization of molecular structures, and of their relationship to the linear sequence and to the ligands they bind, is necessary to interpret and understand the detailed biological functions of proteins, which wet-lab experimentation often reveals only indirectly.

Five years ago we began to develop a graphical interface for our own programs and to use this interface to ease the development and use of novel software. The result today is the Bodil Molecular Modeling Environment, which provides flexible and convenient integration of protein comparison and modeling tools coupled with high-quality molecular graphics. In Bodil we sought to do basic tasks (e.g., alignments, display, high-similarity structure modeling) well, while fully realizing that it was impossible to accomplish all of the tasks available in commercial programs. We aimed to provide a straightforward user interface for some of the tasks that are difficult to achieve in commercial and other academic packages, most often tasks related to data manipulation or access to frequently used procedures. Furthermore, we wanted Bodil to be written in such a way that it would be easy to introduce new modules and to modify and enhance the program as needs change.
Our ultimate goal was to produce a quality set of tools that is useful for protein structure–function analysis, applicable to ligand design, and whose features could evolve with future needs.

Bodil consists of a core program and a set of modules that perform different tasks. These tasks include reading and writing sequence and structure files, making multiple sequence alignments, aligning three-dimensional structures, displaying structures graphically, estimating the coordinates of a protein structure ('protein modeling'), and so on. The core of Bodil provides common data storage and manages the plug-in programs. The plug-ins present the data in different ways, e.g., as alignments, structures, surface features, or relationships. When one plug-in changes the common data, the other plug-ins are notified via the core program. Thus, when one highlights sequence identities, differences, amino acid properties (e.g., hydrophobicity, polarity, size) or motifs in the sequence alignment, their locations on the three-dimensional structure are shown immediately. Likewise, interesting structural features are linked back to the corresponding residues in the sequence alignment.

The graphical structure view editor makes it easy to mock up a complicated view of proteins and any bound ligand molecules; for example, different parts of the structures can be displayed simultaneously as opaque and transparent surfaces, the secondary structure as ribbons, and any portion of the structure as ball-and-stick, CPK or wire-frame models. Bodil can read in density grids specifying the ideal locations for binding chemical groups [1], chemical probe grid maps from AutoDock [2] and GRID [3], and electron density from X-ray crystallography or cryo-electron microscopy (cryo-EM). We have used a plug-in of our own design within Bodil to dock X-ray coordinates into low-resolution electron density obtained from cryo-EM.
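The notification scheme described above, in which a change made by one plug-in reaches the other plug-ins through the core, resembles the observer pattern. The C++ sketch below is hypothetical: the names `Core`, `Plugin`, `setSelectedResidue`, the two view classes, and the use of a single shared "selected residue" value as the common data are illustrative assumptions, not Bodil's actual interfaces.

```cpp
#include <cassert>
#include <vector>

// Hypothetical plug-in interface: each view registers with the core and is
// told when the common data changes.
class Plugin {
public:
    virtual ~Plugin() = default;
    virtual void dataChanged(int selectedResidue) = 0;
};

// Hypothetical core: owns the common data and forwards each change to every
// registered plug-in except the one that made the change.
class Core {
public:
    void registerPlugin(Plugin* p) {
        assert(p != nullptr);
        plugins_.push_back(p);
    }

    // Called by a plug-in that modifies the common data.
    void setSelectedResidue(int residue, Plugin* source) {
        selectedResidue_ = residue;
        for (Plugin* p : plugins_)
            if (p != source) p->dataChanged(selectedResidue_);
    }

    int selectedResidue() const { return selectedResidue_; }

private:
    std::vector<Plugin*> plugins_;  // non-owning pointers to registered views
    int selectedResidue_ = -1;      // -1 means "nothing selected"
};

// Two illustrative views that simply record what they were told.
class AlignmentView : public Plugin {
public:
    void dataChanged(int r) override { lastSeen = r; }
    int lastSeen = -1;
};

class StructureView : public Plugin {
public:
    void dataChanged(int r) override { highlighted = r; }
    int highlighted = -1;
};
```

In this sketch, selecting residue 42 from the alignment view (passed as `source`) updates the structure view's highlight, while the alignment view, which originated the change, is not notified again. The plug-ins never talk to each other directly; they depend only on the core.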
Grid density data can be displayed as contours, as ranges or as iso-energy values, colored appropriately.

The current generation of mid-priced graphics cards for desktop machines can display and rotate surfaces without the annoying delay previously seen even on older, high-priced graphics workstations. At present, a fast personal computer running Linux is an ideal solution, but the program is nearly fully functional under the Microsoft Windows operating system, too. Bodil can use hardware stereo mode, for which graphics cards and X Window System support are available under Linux.

Methods

Bodil – design strategy

The initial design goal of Bodil was to create a software package that would visualize molecules with high-quality graphics even on Intel-based PCs, that would present the user with a simple and intuitive, yet powerful, graphical user interface, and that would be easy to expand by adding new functionality. The main design task was to choose an appropriate data structure for the biochemical data and to develop an efficient method for incorporating useful algorithms into the program. The result is a modular design in which the main executable contains only the common data, and the algorithms are encapsulated in modules that are physically separate, dynamically loaded libraries. This produces independent modules that require access only to the common data. New functionality can be added to the program simply by adding modules; the existing program does not need to be changed. Similarly, changing a module requires recompiling only that single library. The modular design also brings a run-time performance benefit: modules are loaded into memory only if they are used, and are unloaded after use to release the memory.

In Figure 1 we show the program components and the modules currently implemented. The modules have been grouped by their purpose: visualization tools, computational algorithms, parsers, and utility procedures.
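The load-on-use, unload-after-use behaviour described above can be sketched as a reference-counted module registry. In Bodil the modules are dynamically loaded libraries; to keep this example self-contained, the sketch replaces library loading (which on POSIX systems would use `dlopen`/`dlclose`) with factory functions. The names `ModuleRegistry`, `Module`, `CommonData`, `acquire`, `release` and `PdbReader` are hypothetical.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical common data: the only thing a module is allowed to see.
struct CommonData { int moleculeCount = 0; };

class Module {
public:
    virtual ~Module() = default;
    virtual void run(CommonData& data) = 0;
};

// Stand-in for a dynamically loaded library: a factory that creates the module.
using ModuleFactory = std::function<std::unique_ptr<Module>()>;

class ModuleRegistry {
public:
    // Registering only records how to create the module; nothing is loaded yet.
    void registerModule(const std::string& name, ModuleFactory factory) {
        entries_[name].factory = std::move(factory);
    }

    // Load the module on first use (a real program would open the library here).
    Module& acquire(const std::string& name) {
        Entry& e = entries_.at(name);
        if (!e.instance) e.instance = e.factory();
        ++e.users;
        return *e.instance;
    }

    // Unload once the last user is done (a real program would close the library).
    void release(const std::string& name) {
        Entry& e = entries_.at(name);
        assert(e.users > 0);
        if (--e.users == 0) e.instance.reset();
    }

    bool isLoaded(const std::string& name) const {
        auto it = entries_.find(name);
        return it != entries_.end() && it->second.instance != nullptr;
    }

private:
    struct Entry {
        ModuleFactory factory;
        std::unique_ptr<Module> instance;
        int users = 0;
    };
    std::map<std::string, Entry> entries_;
};

// Example module: a hypothetical parser that adds one molecule to the data.
class PdbReader : public Module {
public:
    void run(CommonData& data) override { ++data.moleculeCount; }
};
```

The reference count lets several callers share one loaded module, and memory is reclaimed only when the last of them releases it. With real shared libraries, the factory would instead be a symbol looked up after opening the library, which is what allows a changed module to be rebuilt without recompiling the main executable.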
The visualization modules employ different techniques to show the data graphically and allow interactive modification of the data. The parser modules convert data between the program's internal representation and external file formats. The set of algorithms includes computational procedures both to assist visualization and to analyze proteins and molecular interactions. The utility modules provide helper functions, such as the selection of files for data import.

Figure 1. Program components. The program is divided into the core binary, which holds the data, and module libraries that contain the computational algorithms and the user interface. The modules are grouped by their task. Some of the algorithms are implemented within the visualization modules, and some computational modules have their own user interface for selecting operating parameters. The density grids are three-dimensional arrays of discrete spatial data, which can be, for example, electron densities, electrostatic potentials, or a force field. Besides molecules and density grids, the program also handles alignments and arbitrary geometric objects.

The data structure

Biochemical data contain both clearly defined physical entities and the relationships between those entities. Some entities, like molecules, are clearly composed of smaller parts: the atoms. These higher-level abstractions of molecules are usable on their own; a protein sequence describes the one-dimensional properties of a protein and only implies the existence of an atomic structure. Thus, it is a sufficient representation of the protein for sequence comparison purposes. A chemical bond is an example of a stronger dependency, since it must explicitly refer to defined atoms. It is rather straightforward to use an object-oriented data structure to represent such data in the form of an object tree. The composite design pattern [4] describes a way to construct an object tree, and each molecular entity can be modeled by either of the two basic types defined by the pattern, leaf and composite. Leaves are always terminal nodes in the tree, whereas composites can be internal nodes. An internal node represents a larger structure that is composed of smaller objects (leaves). Thus, an operation performed on a composite node does not change that node directly. Instead, the operation is automatically performed on every node directly under the composite node in the tree (its child nodes). Since a child node can itself be a composite node, the resulting recursion ensures that the operation is performed on every leaf node within the subtree rooted at the composite node on which the operation was initiated. The state of the composite node is therefore changed indirectly, as that state is a composition of the states of its leaf nodes. In addition, composite biochemical entities, such as a protein chain, can have some state variables (color, name, etc.) independent of the state of their child nodes. Therefore, the composite objects in our implementation are more complex than the composite design pattern [4] requires.

In our implementation we represent biochemical entities as objects and store all objects in a hierarchical tree structure (Figure 2). The levels of the hierarchy are a convention required neither by the data structure nor by the algorithms, but the convention simplifies both the implementation and the use of the tree. While Figure 2 shows only an example of the main hierarchy, the tree also contains relationships between physical entities, for example bonds and alignments, as well as other objects such as grid maps, surfaces, and arbitrary geometrical shapes. A grid is an array that contains discrete values sampled from a three-dimensional volume. Electron densities, force fields and spatial probabilities are types of data typically stored as a discrete grid.
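Returning to the composite design discussed above, the following C++ sketch shows how an operation invoked on a composite node reaches every leaf by recursion, while the composite also keeps state of its own. It is an illustration under stated assumptions, not Bodil's code: the class names `Node`, `Atom` and `Composite`, the choice of a rigid translation as the propagated operation, and the `name` field as composite-specific state are all hypothetical.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Component interface shared by leaves and composites.
class Node {
public:
    virtual ~Node() = default;
    virtual void translate(double dx, double dy, double dz) = 0;
    virtual int leafCount() const = 0;
};

// Leaf: an atom with a position. Only leaves change state directly.
class Atom : public Node {
public:
    Atom(double px, double py, double pz) : x(px), y(py), z(pz) {}
    void translate(double dx, double dy, double dz) override {
        x += dx; y += dy; z += dz;
    }
    int leafCount() const override { return 1; }
    double x, y, z;
};

// Composite: a residue, chain, molecule, ... It forwards each operation to
// its children (recursively, since a child may itself be a composite) and
// also holds state of its own that is independent of the leaves.
class Composite : public Node {
public:
    explicit Composite(std::string n) : name(std::move(n)) {}

    void add(std::unique_ptr<Node> child) {
        assert(child != nullptr);
        children_.push_back(std::move(child));
    }
    void translate(double dx, double dy, double dz) override {
        for (auto& c : children_) c->translate(dx, dy, dz);
    }
    int leafCount() const override {
        int n = 0;
        for (const auto& c : children_) n += c->leafCount();
        return n;
    }

    std::string name;  // composite-specific state, not derived from the leaves

private:
    std::vector<std::unique_ptr<Node>> children_;
};
```

Translating a chain in this sketch moves every atom of every residue, although the chain object itself stores no coordinates; its `name`, by contrast, is untouched by the operation. This mirrors the two kinds of composite state described in the text.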
The density within a three-dimensional volume is visualized by an iso-surface, much as a contour line indicates a specific height on a two-dimensional map. The coordinates of the iso-surface points are computed from the grid points by interpolating density values. The surface points can be used to draw triangles that approximate the iso-surface. We store the set of triangles as a separate surface object, which is a child of the grid object.

A bond between two atoms, or an alignment between two or more protein chains, is also an object, although it represents a less tangible relationship between entities. A normal object, for example an amino acid, is a branch node in the tree: a child of a chain and the parent of its atoms. A bond, however, does not fit into the tree as easily. It must connect two atoms without being either their parent or their child: an atom can take part in more than one bond, and each bond involves two atoms, so either arrangement would give some node more than one parent, violating the tree structure in which each child has only a single parent. Therefore, such relationships cannot be represented by a tree structure alone (Figure 3a). Instead, we use a directed acyclic graph (DAG) to describe both the molecular entities and their relationships (Figure 3b). A tree is simply a special case of a DAG in which each node has only one incoming edge, and it has the advantage of allowing simpler recursive operations than a generic DAG. Consequently, we identify a tree within the DAG by marking each edge in the data structure as either a 'tree' edge or a 'relationship' edge (Figure 3b). The tree edges are stored separately from the relationship edges. Thus, the data objects can be accessed efficiently with tree algorithms, while the graph can still be traversed explicitly through the relationship edges when needed.

Figure 2. Hierarchy of molecular data. On the left we list the major levels of the hierarchy; on the right is an example tree organized according to the hierarchy.
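The separation of 'tree' and 'relationship' edges can be illustrated with the water molecule used in Figure 3. In this hypothetical C++ sketch each node stores its tree children separately from its relationship edges, and bonds are nodes that refer to their atoms only through relationship edges. The names `GraphNode`, `Graph`, `addTreeChild`, `addRelationship` and `collectSubtree`, and the choice to hang both atoms and bonds directly under the molecule in the tree, are assumptions made for illustration rather than a reproduction of Figure 3b.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Nodes live in a vector and refer to each other by index.
struct GraphNode {
    std::string label;
    std::vector<int> treeChildren;   // 'tree' edges: every node has one tree parent
    std::vector<int> relationships;  // 'relationship' edges, e.g. bond -> atom
    int treeParent = -1;             // -1 for the root
};

class Graph {
public:
    int addNode(const std::string& label) {
        nodes.push_back({label, {}, {}, -1});
        return static_cast<int>(nodes.size()) - 1;
    }

    void addTreeChild(int parent, int child) {
        assert(nodes[child].treeParent == -1);  // preserve the tree property
        nodes[child].treeParent = parent;
        nodes[parent].treeChildren.push_back(child);
    }

    // Relationship edges may point at any node, any number of times.
    void addRelationship(int from, int to) {
        nodes[from].relationships.push_back(to);
    }

    // Plain tree recursion: relationship edges are ignored, so each node in
    // the subtree is visited exactly once.
    void collectSubtree(int root, std::vector<std::string>& out) const {
        out.push_back(nodes[root].label);
        for (int c : nodes[root].treeChildren) collectSubtree(c, out);
    }

    std::vector<GraphNode> nodes;
};
```

Building water this way, the oxygen is the target of two bond relationships yet still has a single tree parent, which is exactly the situation a pure tree cannot express. Tree-only algorithms such as `collectSubtree` remain simple because they never follow relationship edges.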
A set of properties is implemented for each object type. Such properties include name, color, position, and selection. The position of an atom, which is essentially a point, has two components: the initial xyz coordinates and the transformation, i.e. the rotations and translations that have so far been applied to that atom. An object that is not a leaf either derives its position from its leaf objects (for example, an amino acid residue has a position equal to that of its Cα atom) or has no implicit position, as in the case of a residue without defined atoms.

The most important tool for interactive data manipulation is the selection property, which has three states: 'selected', 'partial selection', and 'unselected'. These correspond to all, some, and none of the leaves of a node in the tree being selected. Most algorithms and operations that manipulate data objects operate exclusively on selected objects. The 'partial selection' state enables an efficient search of the data tree for selected objects. Thus, the user can select a set of molecules using any of the graphical tools, and the computational algorithms will then operate on that set of molecules. Selected and partially selected objects are highlighted in green and dark green, respectively, to make it easier for the user to identify the selected objects in graphical representations such as a list of objects, an alignment, or the three-dimensional display of the molecular structure. For example, selecting a residue in the alignment immediately updates the graphical view to highlight the position of that residue within the protein structure (Figure 4).

Implementation

The program has been implemented in the C++ programming language, although some subroutines in the modules have been taken from existing C programs and have not been converted to C++; program components written in C can be called directly from the C++ main program.
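Returning to the three-state selection property described above: the state of an inner node follows directly from the leaves below it, and caching that state lets a search skip subtrees that contain no selection. A minimal C++ sketch, assuming a simple node type in which users set the state of leaves only and inner nodes are refreshed bottom-up; `Selection`, `SelNode`, `refresh` and `collectSelected` are hypothetical names, not Bodil's API.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

enum class Selection { Unselected, Partial, Selected };

struct SelNode {
    std::string name;
    std::vector<std::unique_ptr<SelNode>> children;  // empty for leaves (atoms)
    Selection state = Selection::Unselected;         // cached on inner nodes

    SelNode* add(const std::string& childName) {
        children.push_back(std::make_unique<SelNode>());
        children.back()->name = childName;
        return children.back().get();
    }
};

// Recompute the cached state of every inner node from its leaves:
// all leaves selected -> Selected, none -> Unselected, otherwise Partial.
Selection refresh(SelNode& node) {
    if (node.children.empty()) return node.state;  // leaf: state is set directly
    bool any = false, all = true;
    for (auto& c : node.children) {
        Selection s = refresh(*c);
        if (s != Selection::Unselected) any = true;
        if (s != Selection::Selected) all = false;
    }
    node.state = all ? Selection::Selected
               : any ? Selection::Partial
                     : Selection::Unselected;
    return node.state;
}

// Collect the names of selected leaves, pruning subtrees with no selection.
void collectSelected(const SelNode& node, std::vector<std::string>& out) {
    if (node.state == Selection::Unselected) return;
    if (node.children.empty()) {
        out.push_back(node.name);
        return;
    }
    for (const auto& c : node.children) collectSelected(*c, out);
}
```

With one of two atoms selected in the first residue and both atoms selected in the second, `refresh` marks the residues as partial and selected and the chain as partial; `collectSelected` then returns three atoms. The pruning in `collectSelected` is where the partial state pays off: an unselected residue is never descended into.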
Despite the high level of abstraction, which allows the program design to closely resemble the modeled (biochemical) problem, the executable program produced from the C++ source code is efficient. There are also several well-designed C++ libraries available for all desired platforms. Consequently, the application developer can focus on the biochemical problem by using predefined program components from a library, rather than spending a large amount of effort on platform-specific implementation issues.

The high-quality three-dimensional graphics use the standardized OpenGL [5]. OpenGL implementations exist for all common operating systems and graphics hardware. The graphical user interface was implemented using the Qt library [6] because it provides a well-defined, multi-platform graphical framework. Qt is a versatile framework that now supports OpenGL graphics and is available for platforms based on X11 (UNIX, Linux), Microsoft Windows, and Mac OS X. All graphical user interface objects (dialogs and menus) have been implemented using Qt. However, we have tried to minimize the Qt dependencies of the main data structure interface, so that computational modules can be developed even in the absence of the Qt library.

Figure 3. (a) Directed acyclic graph representing a water molecule. (b) Modified graph that contains a proper tree as a subgraph. The edges that do not belong to the tree are drawn with dotted lines.

We began developing Bodil on the SGI IRIX operating system, since at that time it was the dominant graphics workstation platform for structural studies. Even then, Linux-based PCs were considered an important target platform, both because of the price of the hardware and software and because of their potential for future development. The graphics capability of PCs has now reached an impressive level with