An approach for reverse engineering of web-based applications

G.A. Di Lucca, M. Di Penta, G. Antoniol, G. Casazza
dilucca@unina.it, {dipenta,gec}@unisannio.it, antoniol@ieee.org
University of Sannio, Faculty of Engineering - Piazza Roma, I-82100 Benevento, Italy
University of Naples "Federico II", DIS - Via Claudio 21, I-80125 Naples, Italy

Abstract

The new possibilities offered by WEB applications are pervasively and radically changing several areas. WEB applications, compared to WEB sites, offer substantially greater opportunities: a WEB application provides the WEB user with a means to modify the site status. WEB applications represent a competitive advantage: they are critical and strategically relevant resources, not only to communicate the company image, but also to manage production and distribution.

WEB applications must cope with an extremely short development/evolution life cycle. Usually, they are implemented without producing any useful documentation for subsequent maintenance and evolution, thus compromising the desired high level of flexibility, maintainability, and adaptability that is de facto necessary to compete and survive market shakeouts.

This paper presents an approach inspired by the reverse engineering arena and a tool prototype supporting WEB application reverse engineering activities, to help maintain, comprehend and evolve WEB applications. The approach defines a set of abstract views, modeled using UML diagrams, organized into a hierarchy of different abstraction levels, depicting several aspects of a WEB application to facilitate its comprehension.

A real world WEB application was used as a case study, and information previously not available was recovered, with encouraging results.

Keywords: WEB applications, Reverse Engineering, UML, WEB Engineering

1. Introduction

The World Wide Web's ability to ubiquitously provide and gather information, the globalization of the economy, and the need for new marketing strategies have tremendously boosted the development of WEB applications (WA): software applications that use the WWW infrastructure as their backbone.

The WEB boom pervasively and radically changed several areas. Information gathering/managing, commerce, software development, maintenance and evolution are just a few examples of human activities reshaped by the WEB technologies embedded in WA.

Software companies having a geographically distributed structure or geographically distributed customers are adopting WA to communicate, share and exchange knowledge and information between different company branches and with customers.

WA, compared to WEB sites, offer considerably greater opportunities: a WEB site may be thought of as a static site; it may sometimes display dynamic information (e.g., an access counter or the date). In contrast, a WA, usually backed up by a database, provides the WEB surfer with a means to modify the site status (e.g., by adding/updating information on the site). WA are the underlying engines of any e-business: business-to-customer or business-to-business applications.

As clearly demonstrated by large companies (e.g., Oracle, Amazon, Yahoo, eBay), WA represent a competitive advantage: they are critical and strategically relevant resources, not only to communicate the company image, but also to manage production and distribution.

WA must cope with an extremely short development/evolution life cycle: a high level of flexibility, maintainability, and adaptability is de facto necessary to compete and survive market shakeouts.
Unfortunately, to meet time-to-market constraints, WA are usually implemented directly, without producing any useful documentation for their maintenance and evolution, and so those requirements are never satisfactorily met.

This paper proposes an approach and a tool prototype, inspired by the Reverse Engineering (RE) arena, to support the maintenance and evolution of WA. Abstract representations have proved valuable in traditional software comprehension, maintenance and evolution tasks. In much the same way, we believe that suitable representations automatically or semi-automatically extracted from existing WA may ease the task of WA evolution.

The approach proposes a process that clearly defines the reverse engineering activities, and is complemented by a set of views organized into a hierarchy of different abstraction levels; those views were cast into UML diagrams. Tools to cope with existing WA and languages (e.g., Microsoft VBScript and PHP) are currently under development.

The proposed RE approach deals both with new WA developed from scratch and with WA derived from legacy systems. In either case, these applications must evolve to cope with an ever-changing environment, but no one has the entire picture of the application or an idea of where business rules are coded.

The approach was applied to a real world WA, recovering information previously not available. This paper describes the approach, the preliminary tool and a case study.

The remainder of the paper is organized as follows: after a discussion of related works, a UML extension for the WEB is summarized; then the approach is presented in Section 4 and a conceptual model describing a WA is proposed in Section 5. To give concreteness, a preliminary tool set and a case study are presented in Sections 6 and 7.

2. Related Works

There are several works in the literature describing processes, methodologies and tools related to the development of WA.
Fewer contributions deal specifically with the problem of recovering information from existing WEB sites/applications.

Methodologies for engineering WA are described in [10, 11, 12, 13], while Cloyd [6] described a process for designing WA, and Beker and Mattley [2] proposed a model to promote user satisfaction.

Some other works, similarly to this paper, focused on the extraction and analysis of the architecture of a WA. Chung and Lee [5] adopted the Conallen UML extensions [7, 8, 9] for WEB site reverse engineering, and extracted component diagrams (they considered each page as a component) and package diagrams (packages are directly mapped to the WEB site directory structure). Ricca and Tonella [14, 15] proposed ReWeb, a tool able to perform several traditional source code analyses on WEB sites: dominance relation and reachability analysis were extended to graphs representing WEB sites. The authors proposed to consider strongly connected components, and introduced the idea of pattern recovery over WEB representations.

Schobe et al. [16] proposed a framework to reuse design in WA, separating application behavior concerns from navigational modeling and interface design. Antoniol et al. [1] put forward the challenge of applying program comprehension and maintenance knowledge to WEB sites. In [1], a taxonomy of increasing complexity is discussed: at the highest levels, there are WA encompassing databases and scripting languages; sites that are a mix between a program and a database.

Finally, some metrics-related works: Turau [17] analyzed the presence of some components, like forms, images, maps, applets and cookies in WA, analyzing differences between academic sites and commercial sites.
Warren, Boldyreff and Munro [18] adapted some conventional metrics and hypertext metrics to the WEB, and developed a new set of specific WWW metrics.

Notably, the above papers mostly focus on the static aspects of a WA, and mainly on coarse grained WA representations, i.e., representations centered around HTML pages.

Dynamic features are almost always disregarded (i.e., there is no identification of the dynamic interactions among the components of a WA), as is a finer grained representation including pages' sub-components. However, those aspects are extremely relevant to fully comprehend a WA in order to maintain or evolve it.

Our contribution is geared toward site maintenance and evolution. We share with [14, 15] the idea of a tool to extract abstractions from existing sites. However, much like [5], we adopted a representation based on UML. In contrast to [5], we propose different levels of granularity, and our approach includes the analysis and the representation of dynamic and behavioral aspects of a WA. We use a layered tool architecture where a repository acts as a separation between extraction and abstraction. As a consequence, the user is not tied to a specific set of queries; new queries may be formulated interactively and thus new diagrams/views constructed.

3. Background Notions

As stated above, the number and the complexity of WA are steadily increasing. By and large, the industry effort has focused on the deployment of technologies, methodologies and tools for the forward engineering phases and, more precisely, for coding. The proliferation of scripting languages, the adoption of new technologies for developing the client, e.g., Shockwave, or the introduction of XML to separate contents from presentation are just a few examples of the aforementioned trend.

Other phases of the development process and, more generally, of the application life cycle have not been deeply investigated.
The result is a lack of knowledge and understanding of how complex WA should be developed, tested and documented to promote desired qualities such as maintainability, flexibility and portability; in a word, to foster evolution.

Figure 1. Composition between page and forms, and submit to server pages.

J. Conallen introduced an extension and a tailoring of UML for modeling and developing WA [7, 8, 9]. Briefly, both static diagrams (use cases, class diagrams, component diagrams, deployment diagrams) and dynamic diagrams (sequence diagrams and collaboration diagrams) were extended and applied as follows:

- Use Cases, often developed in the first steps of the analysis, highlight the interaction of an actor with the system (the information content is the same as in traditional application development). They model the behavior of a WA from the user point of view.
- Class diagrams model pages and page relations; any page is considered as a class; the distinction among different kinds of pages (client pages, server pages, framesets, etc.) is obtained by means of stereotypes. Page components (client and server scripts, forms, applets, etc.) are also represented by classes (and their instances by objects in object diagrams). Relations model the usual notions in the usual way (e.g., a server page has an association named "build" with any client page it generates; a page may be an aggregation of forms; a radio button is a specialization of an input field). Class diagrams may be used to model the static architecture of a WA.
- Component diagrams/package diagrams represent relations between different resources (client pages, server pages, DBMS, libraries) composing the WA. Component diagrams model the WA implementation architecture.
- Deployment diagrams focus on the configuration of the different peers involved in the application, and on their location and communication.
  These diagrams are very useful for the comprehension of a WA, highlighting where resources are located and/or executed.
- Sequence diagrams and collaboration diagrams highlight the dynamic behavior of each use case, detailing the interaction of actors with the system. They model the interactions of WA components corresponding to the WA behavior.

The class diagram shown in Figure 1 represents composition relations between a page and two forms enclosed in it, and submit relations from the forms to the server pages that process data.

4. The RE approach

Reverse Engineering (RE) processes are characterized by goals, models and tools. While tools aim to support the recovery process, goals and models specify the core of any RE process. Goals focus on RE motivations; they help to define a set of abstract views representing the reverse engineered application. Models deal with the definition of the information to extract; models are very often complemented by intermediate representations upon which views are built [3, 4].

As in traditional software applications, WA behavior stems from static and dynamic elements, thus WA RE processes must recover:

- the static architecture;
- the dynamic interactions;
- the behavior.

Goals and models are therefore instantiated according to the above elements, and consequently we propose a RE process (see Figure 2) encompassing the following phases:

1. Static Analysis;
2. Dynamic Analysis;
3. Behavioral Analysis.

The aforementioned phases recover views that can be adequately represented by extended UML diagrams (see Section 3).
In particular, in this approach the following diagrams have been adopted:

- Class diagrams to represent the architecture of a WA;
- Sequence and collaboration diagrams to represent the dynamic model;
- Use Case diagrams to represent the WA behavior.

UML diagram recovery requires, as a preliminary step, the localization and identification of elements such as pages, frames, forms and scripts (the elements composing the application) and of the relations among them. This is per se a challenging activity in that it requires the parsing of multi-language files comprising a mix of HTML, scripting languages, Java code, etc. Moreover, to obtain independence from environments and tools, and to ensure higher flexibility, recovered information is mapped into an intermediate representation stored in a repository.

Figure 2. Reverse Engineering process of a Web Application. (Diagram labels: Static Analysis, Dynamic Analysis, Behavioral Analysis; Web Site Source Code; Components and Direct Relations between them; WA in Execution; Dynamic Interactions between Components; Behavioral Model.)

4.1. Static Analysis

Static analysis does not execute the application. It recovers WA architecture components and the static relations among them. HTML files, directory structure, scripting language sources as well as any other static information (e.g., database structure, applet/servlet code) are processed. HTML pages and the page sub-elements (frames, forms, widgets) composing a given page are localized, classified and recorded in an intermediate representation. Central to the RE process is the mapping between WA elements and object oriented entities. According to Conallen's proposals [7, 8, 9], we made the assumption that HTML pages and relevant sub-elements (e.g., database connections) are mapped into classes, while links are mapped into relations. In other words, each identified component is a candidate class, while the links between pages or page elements are translated into candidate relations.
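The mapping from page elements to candidate classes and relations can be illustrated with a minimal sketch. It is not the tool prototype described in this paper: it assumes plain static HTML only (no VBScript, PHP or other server-side code), uses Python's standard `html.parser` module, and the names `PageScanner`, `scan_page` and the labels "client page", "form", "contains", "submit" and "link" are hypothetical choices made for illustration.

```python
from html.parser import HTMLParser


class PageScanner(HTMLParser):
    """Collect candidate classes and candidate relations from one HTML page."""

    def __init__(self, page_name):
        super().__init__()
        self.page = page_name
        # Candidate classes: element name -> kind (illustrative labels).
        self.classes = {page_name: "client page"}
        # Candidate relations: (source, kind, target) triples.
        self.relations = []
        self._form_count = 0

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            # A hypertextual link becomes a candidate relation.
            self.relations.append((self.page, "link", attrs["href"]))
        elif tag == "form":
            # A form becomes a candidate class contained in the page.
            self._form_count += 1
            form = f"{self.page}#form{self._form_count}"
            self.classes[form] = "form"
            self.relations.append((self.page, "contains", form))
            if attrs.get("action"):
                # The form submits its data to the page named in "action".
                self.relations.append((form, "submit", attrs["action"]))
        elif tag == "frame" and attrs.get("src"):
            self.relations.append((self.page, "frame", attrs["src"]))


def scan_page(name, html):
    """Return (classes, relations) recovered from the given HTML text."""
    scanner = PageScanner(name)
    scanner.feed(html)
    scanner.close()
    return scanner.classes, scanner.relations


sample = """
<html><body>
  <a href="catalog.html">Catalog</a>
  <form action="search.asp" method="post"><input name="q"></form>
</body></html>
"""
classes, relations = scan_page("index.html", sample)
for source, kind, target in relations:
    print(source, kind, target)
```

In this sketch the two dictionaries play the role of a very small intermediate representation; a repository as described in the text would store the same kind of triples so that different queries and views can be built on top of them.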
Parameters of each component are considered as class attributes or as new classes. Elementary parameters may be modeled as attributes, while others, such as database connections, may require the introduction of new classes; see [7, 8, 9] for details.

The static analysis phase may be decomposed into the following activities:

- Inventory (e.g., WA files, databases and, more generally, components);
- Component localization;
- Relation recovery;
- Intermediate representation generation.

The recovered intermediate representation populates a repository upon which queries and views are constructed. It is worth noting that at the end of the static analysis phase a first approximation of the WA class diagram is available. Unparsed COTS or databases may serve to generate HTML documents that are not discovered by the static analysis phase.

4.2. Dynamic Analysis

The dynamic analysis phase relies on the static analysis results. The WA is executed and dynamic interactions among the components described in the class diagram are recorded. Dynamic analysis is performed by observing the execution of the WA and tracing any event or action to the source code (and, consequently, to the classes represented in the class diagram). Traced events are those observed by the user or related to components external to the WA (e.g., third party databases or WEB sites). Events include the visualization of HTML pages/frames/forms, the submission of forms, the processing of data, a link traversal, or a database query.

All elements responsible for these actions (typically links, scripts and applets) are localized, and the actions are assigned to the methods of the related classes. The sequences of actions fired by an event, deriving from the WA code control flow (e.g., access to a database following a user form submission) or from user actions (e.g., clicking on a link or submitting a form), are associated with sequences of messages exchanged between the objects of the WA.
These sequences can be represented by sequence diagrams (or by collaboration diagrams). Notice that dynamic information is also used to verify, validate and, eventually, complete the class diagram extracted by the previous phase. In other words, repository information is complemented, augmented and assessed by the dynamic analysis phase.

The dynamic analysis phase may be decomposed into the following activities:

- WA execution: the WA is executed, tracing the execution to the source code and to the class diagram.
- Verification & Validation: for each page displayed, the class diagram has to include: a class corresponding to the page itself; a class for each component included in the page; appropriate relationships linking these classes. Hypertextual links are represented by associations among classes.
- Detection of Interactions: elements interacting by messages (caused by events fired from the control flow of the WA or from user actions) are detected and traced to the classes.
- Abstraction of sequence/collaboration diagrams: these diagrams are recovered to describe the operating scenarios.

It is worth noting that the Detection of Interactions step localizes those 'active' elements responsible for WA 'actions'. These components may be scripts, applets, hypertextual links, etc. Each action performed is associated with a service in the class that is responsible for that action. While static analysis may be performed automatically, dynamic analysis requires human intervention. WEB server log files may be used to extract sequences of events that, played back, mimic the user interaction, while the loading of a page in a browser may be recognized as an observable user event. However, at the present level of tool implementation, the WEB server was not modified (e.g., values corresponding to form inputs are not saved); thus, for example, form filling requires human intervention.

4.3. Behavioral Analysis

The behavioral or functional analysis essentially consists of abstraction processes oriented to detect the behavior of the WA from the user point of view. The recovered behavior is described by use case diagrams. This phase may be decomposed into the following tasks:

- Analysis of sequence/collaboration diagrams: all interaction diagrams are analyzed to abstract functional behaviors grouped into use cases;
- Use cases definition: use cases, actors, uses and extends relations are defined on the basis of the functional behaviors;
- Use case diagrams abstraction: the use cases defined in the previous step are represented in diagrams, at different levels of detail.

5. A Conceptual Model for WA

A WA conceptual model has to specify abstractions representing the application, its components and the relations between components.

At a first, coarse grained level, a WA could be thought of as composed of HTML pages. A page or a group of pages is responsible for a defined behavior of the WA. Pages are deployed on a WEB server; the WEB server processes client requests, sending back HTML code, client scripts, applets, images, etc. What the WEB server sends to the client may or may not correspond to a physical file stored on the server: CGI bin, Servlet, server-side include, ASP and related technologies and tools may generate pages on the fly.

More precisely, the following preliminary taxonomy concerning pages was considered:

- server pages (i.e., pages that reside on the server) as opposed to client pages (i.e., the pages that are actually sent to a client);
- static pages as opposed to dynamic pages: a static page's content is fixed, while a dynamic page's content varies over time;
- simple pages as opposed to framed pages: a page may be divided into frames using a particular page, a frameset. A page appears in a frame specified by a target.
- unlinking pages as opposed to linking pages: a page, or a page component, may have hypertextual links to itself or to other pages.
  Links, in turn, may be either static or dynamic, i.e., the linked component is always the same or it is defined at run time.

A page will always be an aggregation/composition of finer grained components, such as text, images, input/output forms, text boxes, multimedia objects (sounds, movies), anchors, scripts, applets, and so on. Page components (e.g., scripts or applets) may be active components. An active component is a component performing some processing action; for example, it may exchange data with other pages.

A page, usually a server page, may be linked to other objects allowing the connection of the WA to a DBMS managing the data of the WA, or to other systems.

To obtain WA abstractions, it is necessary to extract, from the source code, all the information needed to identify pages, their components, and the relations (both static and dynamic) existing among pages and components. WA abstractions must represent:

- The WA static architecture;
- The component dynamic interactions;
- The behaviors of the WA, assigning components to any given behavior.

The proposed RE approach focuses on recovering the components that may affect the behavior (and thus the comprehension) of the WA; i.e., components like pages, scripts, applets, input/output forms, frames and links are recovered, while
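The page taxonomy and the notion of active components introduced in this section can be pictured as a small data model. The sketch below is only an illustration under stated assumptions: the class names `Page`, `Component` and `Link`, their fields, and the example page are hypothetical and are not taken from the paper's tool or repository schema.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    """A finer grained page component (form, script, applet, anchor, ...)."""
    name: str
    kind: str
    active: bool = False  # True if it performs some processing action


@dataclass
class Link:
    """A hypertextual link from a page or a page component."""
    target: str
    dynamic: bool = False  # True if the target is defined at run time


@dataclass
class Page:
    """A page classified along the taxonomy axes described in the text."""
    name: str
    server_side: bool = False  # server page vs. client page
    dynamic: bool = False      # dynamic content vs. static content
    framed: bool = False       # framed page vs. simple page
    components: list = field(default_factory=list)
    links: list = field(default_factory=list)

    def is_linking(self):
        """A linking page has at least one hypertextual link."""
        return bool(self.links)

    def active_components(self):
        """Components that may affect the behavior of the WA."""
        return [c for c in self.components if c.active]


# Hypothetical example: a dynamic server page with a form and a script.
login = Page(
    name="login.asp",
    server_side=True,
    dynamic=True,
    components=[
        Component("login_form", "form"),
        Component("check_input", "script", active=True),
    ],
    links=[Link("home.html"), Link("welcome.asp", dynamic=True)],
)

print([c.name for c in login.active_components()])
print([l.target for l in login.links if l.dynamic])
```

Keeping the taxonomy axes as independent boolean fields reflects the fact that they are orthogonal in the text: a page can be, for instance, both a server page and a framed page.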