Web-Based Workflow Planning Platform Supporting the Design and Execution of Complex Multiscale Cancer Models

824 IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS, VOL. 18, NO. 3, MAY 2014

Vangelis Sakkalis, Stelios Sfakianakis, Eleftheria Tzamali, Kostas Marias, Member, IEEE, Georgios Stamatakos, Member, IEEE, Fay Misichroni, Eleftherios Ouzounoglou, Student Member, IEEE, Eleni Kolokotroni, Dimitra Dionysiou, David Johnson, Steve McKeever, and Norbert Graf

Abstract: Significant Virtual Physiological Human efforts and projects have been concerned with cancer modeling, especially in the European Commission Seventh Framework research program, with the ambitious goal to approach personalized cancer simulation based on patient-specific data and thereby optimize therapy decisions in the clinical setting. However, building realistic in silico predictive models targeting the clinical practice requires interactive, synergetic approaches to integrate the currently fragmented efforts emanating from the systems biology and computational oncology communities all around the globe. To further this goal, we propose an intelligent graphical workflow planning system that exploits the multiscale and modular nature of cancer and allows building complex cancer models by intuitively linking/interchanging highly specialized models. The system adopts and extends current standardization efforts, key tools, and infrastructure in view of building a pool of reliable and reproducible models capable of improving current therapies and demonstrating the potential for clinical translation of these technologies.

Index Terms: Cancer systems biology, clinical translation, computational oncology, personalized medicine, scientific workflows.

Manuscript received April 30, 2013; revised September 4, 2013 and November 15, 2013; accepted December 20, 2013. Date of publication January 2, 2014; date of current version May 1, 2014. This work was supported in part by the European Commission under the Transatlantic Tumor Model Repositories-TUMOR (FP7-ICT-2009.5.4-247754) and the Computational Horizons In Cancer-CHIC (FP7-ICT-2011.5.2-600841) projects.
V. Sakkalis, S. Sfakianakis, E. Tzamali, and K. Marias are with the Institute of Computer Science, Foundation for Research & Technology-Hellas, GR-70013 Heraklion, Greece (e-mail: sakkalis@ics.forth.gr; ssfak@ics.forth.gr; tzamali@ics.forth.gr; kmarias@ics.forth.gr).
G. Stamatakos, F. Misichroni, E. Ouzounoglou, E. Kolokotroni, and D. Dionysiou are with the Institute of Communication and Computer Systems, School of Electrical and Computer Engineering, National Technical University of Athens, GR-15780 Athens, Greece (e-mail: gestam@central.ntua.gr; faymisi@central.ntua.gr; elouzou@central.ntua.gr; ekolok@central.ntua.gr; dimdio@esd.ece.ntua.gr).
D. Johnson is with the Department of Computing, Imperial College London, London, SW7 2AZ, U.K. (e-mail: david.johnson@imperial.ac.uk).
S. McKeever is with the Department of Informatics and Media, Uppsala University, 75120 Uppsala, Sweden (e-mail: steve.mckeever@im.uu.se).
N. Graf is with the Department of Pediatric Hematology and Oncology, Saarland University Hospital, 66421 Homburg, Germany (e-mail: Norbert.Graf@uniklinikum-saarland.de).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/JBHI.2013.2297167

I. INTRODUCTION

The extreme complexity of the natural phenomenon of cancer in conjunction with the prevalence of the disease have dictated the development of highly demanding mathematical and computational cancer models aiming at optimizing the individualized clinical decisions.
Already a great diversity of cancer-related models exist, focusing on various aspects of this complex phenomenon at different levels [1]. In the past decade, it has become evident that multiscale methods need to be applied to cancer modeling. This is to address the various phases and scales using several levels of biocomplexity [2].

In general, two strategies to model the multiscale cancer phenomenon may be identified: the bottom-up approach, which follows an inductive synthesis tactic when trying to predict the tumor growth by focusing on linking together the elementary biological components of the underlying mechanisms, and the top-down deductive decomposition design, which phenotypically models the whole system without specifying in great detail the lower scales in terms of biocomplexity, e.g., the molecular scale. Obviously, the second approach is much easier to manipulate and much closer to clinical translation.

In the computational oncology domain, microscopic models attempt to describe the individual cell dynamics focusing on the subcellular and cellular levels. On the other hand, the macroscopic models focus on the tissue level and assume that the solid tumor behavior can be predicted by simulating the behavior of a group of cells and their global interaction with the surrounding and underlying tissue properties [3]–[6]. In order to produce accurate and reliable models both approaches are equally important. In other words, one should be able to fine tune macroscopic models using microscopic meaningful parameters.

From the mathematical point of view, such approaches to address the multifaceted cancer phenomenon may be grouped into three main categories: the continuous and discrete methods, as well as the hybrid approaches [7]–[9].
Continuous approaches describe both cancer cell populations and their microenvironment (such as nutrients or signaling cues) using continuous variables formulating a system of partial differential equations, whereas discrete approaches describe cells as discrete elements that can change states and evolve in discretized time based on the changing dynamics (ruled by deterministic or probabilistic laws), i.e., cellular automaton models [9] and agent-based models [10]. Hybrid approaches combine the benefits of continuous and discrete mathematics and offer the possibility of integrating phenomena of different time and length scales (from the tissue scale, for example, modeling neovascularization, to intracellular processes such as cell signaling and progression through the cell cycle). These models describe cancer cells as discrete variables and the tumor microenvironment using continuous, reaction–diffusion equations as opposed to the typical discrete models.

2168-2194 © 2014 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

Digging deeper into the mathematical foundations of most clinically oriented continuous models one has to deal with different numerical methods for approximating the solutions to PDEs (e.g., finite differences or finite elements) that involve different assumptions and convergence rates [11].

It is obvious from the above that there is no single gold standard or all-encompassing model that achieves the best possible results in all heterogeneous cancer types under study.
What is most critical to the success of computational oncology, and more specifically to the success of in silico systems modeling, is to promote the interaction and collaboration among modelers, experimentalists, clinicians, and other specialists so as to develop advanced multicompartmental models of cancer development and response to treatment. Efforts on an international and even intercontinental level have already started in the course of the TUMOR project as a proof of concept and the results are promising [12].

The systems biology community has been particularly active in standardizing the way to formulate, store, exchange, and integrate biological models with a growing number of community-driven initiatives [13] to harmonize the development of the various standards and formats in systems biology, e.g., COMBINE [14]. However, there have not yet been any formal standardization efforts specifically tailored to the needs of cancer modeling, aside from the TumorML language [15] that has been delivered out of the TUMOR project.

In addition, there are still important problems when dealing with scale and model linking. In order to translate models in the clinical setting as decision support tools, we should migrate from systems biology models to clinically driven models that are motivated by actual clinical problems and questions. Also, it is necessary to involve large and diverse communities of scientists closely collaborating with clinicians in the model development and validation process.

In this paper, we present a web-based scientific workflow planning platform (see Section IV) designed to support the development of complex multiscale cancer models aiming toward engaging the wide cancer modeling audience (modelers, computational biologists, and clinicians) and encouraging scientists to collaborate constructively. The underlying foundation involves a dedicated model repository, parsing SBML and TumorML information, as well as executing both SBML and proprietary models.

II. MODEL DESCRIPTION STANDARDS

To build the envisioned workflow environment, we had to select existing standards wherever possible and design new ones to cover missing domains. The idea is to facilitate model linking with no extra effort to port existing models to a new framework or to reimplement them, both costly and error prone activities. Hence, the need to fuse disparate models together, in the presented platform, is addressed using the Systems Biology Markup Language (SBML) to model the biochemical processes at the molecular scales, whereas the higher and more clinically relevant scales, specific to cancer modeling, are addressed using the newly developed TumorML markup language.

A. SBML

Among the numerous standards related to model description at the subcellular level, CellML [16] and SBML [17] are the most widely accepted. Both attempt to describe the structure and underlying mathematics of subcellular models. SBML is more specific and constrained in exchanging information about pathway and reaction models and uses successive hierarchical declarations of model constituents. There is also a wide community supporting SBML and tools to convert CellML to SBML. We prefer SBML mainly based on its constrained nature, which allows the language to be adopted quickly and evolve with the requirements of the representation and understanding of systems biology.

B. TumorML

The higher scale models enrolled in our environment are described using TumorML [15], an XML-based markup language for describing cancer models. The development of TumorML contributes to enabling some of the key interoperability aims within the TUMOR project.

First, by annotating cancer models with appropriate document metadata, digital curation is facilitated in order to make publishing, search, and retrieval of cancer models easier for researchers and clinicians using the TUMOR digital repository. Second, markup will be used to describe abstract interfaces to published implementations allowing execution frameworks to run simulations using published models.
Finally, TumorML markup facilitates the composition of compound models, regardless of scale and source, enabling multiscale models to be developed in a modular fashion, and models from all around the globe may be integrated with any related models in the TUMOR transatlantic platform. The TumorML model description will also incorporate and integrate with the MIRIAM guidelines [18] in order to provide reference correspondence, attribution annotation, and external resource semantic annotation to the described models.

III. MODEL EXECUTION

There are two main execution frameworks in the TUMOR platform. The first is based on the SBML description of a model whereas the second one is more generic in the sense that a model can be provided as a self-contained executable. An SBML description of a model is a declarative artifact. It describes the mathematics required, typically in the form of ordinary differential equations (ODEs), to implement the model and nothing else. In order to implement the model, a solver is required to numerically resolve the equations and execute the corresponding reactions based on the kinetic laws and the prescribed parameter values. This solver can be a simulation environment, a compiler that links the SBML file with a numerical library and generates a standalone executable, or a partial evaluator that attempts to unfold the ODEs with respect to known solving algorithms. In general, the SBML models can be classified as deterministic or stochastic, with the latter using Monte Carlo simulation and related methods. The TUMOR execution infrastructure supports deterministic and stochastic models, through the incorporation of the COPASI simulator [19].
The use of COPASI software allows the parsing of SBML models and their execution but nevertheless there are a couple of parameters that need to be specified prior to the execution:
1) the simulation time for the model;
2) the algorithm to be used, e.g., deterministic, stochastic, or hybrid.

These parameters are not specified by SBML but they are essential in order for the models to produce the desired results. In order to support flexibility, the users can input values for both parameters at runtime. These parameter values are then passed to the COPASI solver for simulating the models.

In the more generic case, the model is provided with no information on its internals. The supplied code, either in binary or in source format, should be able to be run as a command line program with its inputs and outputs specified either as command line options or as files. For example, if the execution framework (as in our case) is a Linux 64-bit environment, the supplied executable code should be compliant with it. Of course, in the case where the source code of the model is available in the form of a scripting language, like Python or Perl, there are fewer restrictions imposed on the model creators.

Irrespective of the models’ type (SBML or generic/command line formats), TumorML offers a generic metadata “envelope” to describe both their interface, i.e., input parameters and output results, and execution requirements. The interface definition provides valuable information for linking models in the workflow editor, based on the required input and the generated output. On the other hand, the execution information is utilized by the workflow’s runtime, when the models are simulated or executed.

IV. WORKFLOW DESIGN

Systems biology presents a new way to study biological systems shifting from a “reductionist” approach to a more holistic one [20].
In this new perspective, the complex biological systems are not studied by the isolated analysis of their components but through their investigation as whole integrated systems with dynamic relationships among their parts.

As a first step, we argue that the use of scientific workflows is a legitimate way to achieve this holistic view of systems biology. In general, a workflow can be described as a sequence of operations or tasks needed to manage a business process or a computational activity. The latter definition can also be applied to scientific workflows, which are meant to decompose complex scientific experiments into a series of repetitive computational steps that could be run on supercomputers or distributed on a cloud system [21].

The proposed new scientific workflow management system has been designed and built focusing exactly on the requirements imposed by the domain users and scenarios. The main objectives of this new workflow design system are the following.
1) To provide an easy, intuitive, and secure environment for the design of integrative, predictive, computational models represented as scientific workflows. The activities or steps in these workflows represent computational models in the microscopic or macroscopic level that interact by exchanging information through their adjustable parameters.
2) To follow a “Software as a Service” (SaaS) deployment approach.
In particular, the system is accessible through the WWW using state-of-the-art web protocols and follows a cloud-based architecture in order to alleviate installation and maintenance costs.
3) To support the visual representation of the models and their simulation/execution at the workflow runtime by building on the TumorML model descriptions.
4) To build upon an extensible architecture where the models are stored in potentially disparate model repositories [22].

In terms of its architecture, the TUMOR workflow management system consists of two components:
1) The workflow editor (or designer), which is a web application, accessible through the users’ web browser. This is the graphical front-end for the editing of the workflows, the invocation of their execution, and the visualization of the results. A depiction of its interface can be found in Fig. 1.
2) The workflow engine, which is responsible for the management and the execution of the workflows, the communication with the model repositories, etc.

The workflow designer depicts each model as a box with its abstract interface (inputs and outputs) as little circles attached to the model (see Fig. 1). The integration of the models into a scientific workflow is then driven by the user through the introduction of connecting lines between two model outputs and inputs, in a familiar box-and-arrows diagram. The connecting lines therefore represent “data-flow,” i.e., the flow of data from an output of the source model to an input of the destination model. At the workflow level, inputs of models that are “free” (i.e., not connected) are used as inputs to the whole workflow at the workflow evaluation (execution) phase. Similarly, not connected outputs of models are used to provide the high level results of the workflow execution.

The connections between two models representing flow of data and information are not arbitrary but rather constrained based on the information that the TumorML descriptions of the models provide.
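
The data-flow view described above can be illustrated with a minimal Python sketch. It is not the platform's implementation: the `Port` and `Workflow` classes, the model names, and the parameter names are hypothetical, and the only connection constraint shown is a data-type match, standing in for the richer TumorML-based checks.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Port:
    model: str   # model that owns the parameter (hypothetical name)
    name: str    # parameter name (hypothetical)
    dtype: str   # declared data type, e.g. "float" or "string"


@dataclass
class Workflow:
    inputs: set = field(default_factory=set)    # all model input ports
    outputs: set = field(default_factory=set)   # all model output ports
    links: list = field(default_factory=list)   # (output port, input port) pairs

    def add_model(self, model, inputs, outputs):
        """Register a model; inputs/outputs map parameter name -> data type."""
        self.inputs |= {Port(model, n, t) for n, t in inputs.items()}
        self.outputs |= {Port(model, n, t) for n, t in outputs.items()}

    def connect(self, source, target):
        """Add a data-flow link from an output port to an input port."""
        if source not in self.outputs or target not in self.inputs:
            raise ValueError("links must run from a model output to a model input")
        if source.dtype != target.dtype:
            raise TypeError(f"data type mismatch: {source.dtype} vs {target.dtype}")
        self.links.append((source, target))

    def free_inputs(self):
        """Unconnected inputs: these become the inputs of the whole workflow."""
        return self.inputs - {target for _, target in self.links}

    def free_outputs(self):
        """Unconnected outputs: these carry the workflow's high level results."""
        return self.outputs - {source for source, _ in self.links}


# Two illustrative models chained through one parameter.
wf = Workflow()
wf.add_model("signaling", {"glucose": "float", "oxygen": "float"},
             {"enzyme_level": "float"})
wf.add_model("metabolism", {"enzyme_level": "float"},
             {"proliferation_rate": "float"})
wf.connect(Port("signaling", "enzyme_level", "float"),
           Port("metabolism", "enzyme_level", "float"))

print(sorted(p.name for p in wf.free_inputs()))    # ['glucose', 'oxygen']
print(sorted(p.name for p in wf.free_outputs()))   # ['proliferation_rate']
```

Keeping the links as plain (output, input) pairs means the workflow-level inputs and outputs never have to be declared separately; they fall out as whatever ports remain unconnected.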
Before a connection is accepted, the connected parameters are checked both at the syntactic and the semantic level. At the syntactic level, the workflow designer validates that the parameters to be connected have the same data type, e.g., they both represent an integer or a character (string) value. At the semantic level, the designer takes advantage of the semantic, MIRIAM-based, annotation of the parameters in the TumorML descriptions in order to make sure that they represent the same physiological or biological entity. Additional checks include the validation of the units used for the parameters and the range of values. When the user tries to connect two models based on their outputs and inputs by the familiar “drag-and-drop” operation, the application provides information on the matching parameters by highlighting the corresponding connectors. Therefore, the users can get an immediate visual indication when a connection between two models is legitimate or not, based on syntactic (e.g., data type) and semantic (e.g., units, high level ontology annotation) information.

Fig. 1. Proposed workflow designer represents each model as a box with its abstract interface (inputs and outputs) as little circles attached to the model.

The search and discovery of the models is supported by the workflow engine, which has been configured to contact a certain list of model repositories. As noted previously, the model repositories need to comply with specific architectural constraints, notably the use of TumorML for describing the models and a set of web service interfaces for querying the models, based on the TumorML defined metadata, and retrieving their definitions. In addition to this model query and retrieval functionality, the workflow engine is responsible for the user authentication, the storage and retrieval of the workflow definitions, and, last but not least, the execution of the user defined workflows. The execution of the workflow is implemented by first performing a topological sort of the workflow, since the constructed workflows are in the form of directed acyclic graphs, in order to determine the proper ordering of the model executions based on their data dependencies (connections). Subsequently, the TumorML descriptions of the models are again consulted in order to identify their execution requirements, and especially whether they are realized as SBML or standalone, program-based, models. In the case of the SBML models, the user is asked to provide additional simulation information, as explained previously, such as the simulation time and the algorithm to be used. Alternatively, the system validates that it can execute the standalone, command line program that represents the model.
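
The ordering step described above can be sketched in Python as follows. The text only states that a topological sort is performed; the use of Kahn's algorithm here is an assumption, as are the function name `execution_order` and the model names in the example.

```python
from collections import deque


def execution_order(models, links):
    """Order models so that each one runs after the models it depends on.

    models: iterable of model names (hypothetical identifiers).
    links:  iterable of (source_model, target_model) pairs, one per connection.
    Raises ValueError if the connections contain a cycle.
    """
    models = list(models)
    successors = {m: set() for m in models}
    pending = {m: 0 for m in models}   # count of upstream models not yet ordered
    for source, target in links:
        if target not in successors[source]:   # ignore duplicate connections
            successors[source].add(target)
            pending[target] += 1

    # Start with the models that have no upstream dependency.
    ready = deque(sorted(m for m, n in pending.items() if n == 0))
    order = []
    while ready:
        model = ready.popleft()
        order.append(model)
        for nxt in sorted(successors[model]):
            pending[nxt] -= 1
            if pending[nxt] == 0:
                ready.append(nxt)

    if len(order) != len(models):
        raise ValueError("workflow contains a cycle; it must be a DAG")
    return order


# A three-model chain, listed out of order on purpose.
print(execution_order(
    ["tissue", "metabolism", "signaling"],
    [("signaling", "metabolism"), ("metabolism", "tissue")],
))   # ['signaling', 'metabolism', 'tissue']
```

The cycle check doubles as a validation that the workflow is a directed acyclic graph, which the ordering relies on.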
The validation of standalone models includes a check of the binary files for compatibility with the execution framework (e.g., Linux 64-bit), since this is the currently supported operating system and machine architecture.

When the user provides the input parameter values for the workflow and any additional execution information needed, the workflow engine starts the evaluation of each model based on the given parameter values and the outputs of the preceding models. During the execution of the workflow, the user is able to “log out” of the application and the execution will continue in a “headless” manner, i.e., running in the background, in the server’s premises. On the other hand, if the user wants, they can even monitor the execution of the workflow and have a visual indication of which models are currently running and which are about to be launched.

The results of the workflow are available after its successful completion along with a detailed listing of all the intermediate results and files produced. Therefore, an execution trace is produced and kept for future reference in the user’s account in order to facilitate reproducibility and validation of the workflow.

V. EXEMPLAR CLINICAL SCENARIO

To test the presented framework and better evaluate the outcome, a complex clinically relevant scenario is presented as a test case. The scenario addresses the case of glioblastoma multiforme combined modality treatment using radiation therapy and chemotherapy with temozolomide. The anonymized data were provided by the Institute of Pathology, University Hospital of Saarland, Germany. Schematically, the simulated modular tumor is illustrated in Fig. 2. The different modules, which include glioblastoma-specific epidermal growth factor receptor (EGFR) signaling, cancer metabolism, and the Oncosimulator, are indicated as different colored boxes.

Fig. 2. First compartment of the simulated tumor evolution is the EGFR-related molecular entity, which forms a gene-protein interaction network (grey top box, modified figure from [25], with permission from Elsevier). The PLCγ, a downstream element of the EGFR pathway, is used to constrain the rates of its corresponding metabolic reactions in the second genome-scale metabolic modeling compartment (middle box, modified figure reprinted by permission from Macmillan Publishers Ltd: Nat. Rev. Cancer [24], copyright 2004). The metabolic model estimates the proliferation rate of the glycolytic cancer cells providing a microscopic parameter to the tissue-level, macroscopic cytokinetic model (lower box).

The presented example reflects the multiscale fusion of three independently developed cancer models (as depicted in Fig. 1), in an attempt to link microscopic, genotype–phenotype characteristics of cancer cells into a macroscopic, tissue-level cancer model. Reprogramming of signaling, gene regulatory, and metabolic pathways has been usually observed in cancer cells affecting proliferation, migratory response, and other phenotypic characteristics [23], [24]. Furthermore, these microscopic characteristics affect tumor evolution, morphology, invasion and metastasis, as well as tumor response to treatment.

Although not incorporated in the presented case, it should be stressed that in a realistic scenario, cells are in a constant interaction with their microenvironment, which dynamically shapes their molecular pathways and phenotypic properties. Furthermore, tumors usually consist of heterogeneous cell populations with different traits. Therefore, depending on the structure and variables of the macroscopic model different instances of the subcellular modeling components (e.g., EGFR signaling and cancer metabolism) corresponding to alternative environmental conditions or/and genetic traits could be incorporated.

A. EGFR Signaling Pathway-Based Model

The EGFR has been implicated in several cancers including lung cancer, breast cancer, and glioblastoma, yet the EGFR activity itself is not capable of predicting the phenotype of cancer cells. As shown in Fig. 2 (top box), a microscopic, EGFR gene-protein interaction network-based model has been developed [25]. Given initial concentrations of important molecules in tumor microenvironment such as glucose, oxygen, and transforming growth factor α (TGFα), the model predicts whether the cell proceeds to proliferation or migration. Specifically, when the change in PLCγ concentration, an enzyme that lies downstream of EGFR pathway, is below the migration-threshold, then cells prefer to proliferate rather than migrate. This key enzyme (depicted with a red arrow in Fig. 2) can be used to directly link EGFR signaling and metabolism through its regulatory effect on the rate of the metabolic reactions it catalyzes.

B. Cancer Metabolic Model

Fig. 2 (middle box) shows the metabolic alteration of highly proliferating cancer cells to inefficient-glycolysis regardless of whether oxygen is present (aerobic glycolysis). This metabolic reprogramming can be modeled utilizing genome-scale computational modeling approaches [26]. Based on the work of Shlomi et al. [26], a genome-scale human metabolic network reconstruction consisting of 1496 ORFs, 3742 reactions, and 2766 metabolites [27], is used in order to account for the interconnectivity of the metabolic reactions. In addition, differentially expressed metabolic genes in glioblastoma multiforme [28], [29] are used as flux constraints in the corresponding metabolic reactions for the construction of a cancer-specific model. The concentration of PLCγ enzyme that is predicted by the EGFR-signaling-based model is also used to constrain the rates of