A Case Study in the Use of Defect Classification in Inspections

Diane Kelly and Terry Shepard
Royal Military College of Canada

Abstract

In many software organizations, defects are classified very simply, using categories such as Minor, Major, Severe, Critical. Simple classifications of this kind are typically used to assign priorities in repairing defects. Deeper understanding of the effectiveness of software development methodologies and techniques requires more detailed classification of defects. A variety of classifications has been proposed. Although most detailed schemes have been developed for the purpose of analyzing software processes, defect classification schemes have the potential for more specific uses. These uses require the classification scheme to be tailored to provide relevant details. In this vein, a new scheme was developed to evaluate and compare the effectiveness of software inspection techniques. This paper describes this scheme and its use as a metric in two empirical studies. Its use was considered successful, but issues of validity and repeatability are discussed.

Keywords

Software engineering, software maintenance, software metrics, software testing, software validation, orthogonal defect classification.

1. Introduction

Classification of software defects can be as simple as specifying major/minor or as detailed as the scheme described by Beizer [2]. For deciding whether to assign resources to fix defects, major/minor may be sufficient. To assess sources of defects and trouble spots in a large complex system, something more detailed is needed.

As described in [6], defect classifications have been successfully used to analyze and evaluate different aspects of software development. Several organizations have developed defect classification schemes to identify common causes of errors and develop profiles of software development methodologies. For example, IBM has been refining a defect classification scheme for about ten years [3,4,7]. The IBM scheme is intended to provide analysis and feedback to steps in the software process that deal with defect detection, correction and prevention. As software techniques have evolved, IBM's defect classification has changed to add support for new areas such as object oriented messaging and international language support.

Defect classification schemes are concerned with removing the subjectivity of the classifier and creating categories that are distinct, that is, orthogonal. IBM's scheme defines an orthogonal defect classification (ODC) by using a relatively small number of defect types. The thought is that with fewer choices for any defect, the developer can choose accurately among the types.

To evaluate the results of an empirical study of specific software inspection techniques, we developed a classification scheme specific to our needs. Similar to exercises carried out at IBM and Sperry Univac [6], we used findings from an extensive industrial inspection exercise [8] to develop a detailed defect classification specifically for computational code (ODC-CC) [9]. Each category in the classification was then associated with one of four levels of understanding, based on the perceived conceptual difficulty of finding a defect in a given category.
Using these levels of understanding, the results from two inspection experiments were analyzed to determine if a new software inspection technique encouraged inspectors to gain a deeper understanding of the code they were inspecting [10].

Software inspection is recognized as an effective defect detection technique, e.g. [14]. Research into improving this effectiveness has focused both on the inspection process, e.g. [13], [17], and on individual inspection techniques, e.g. [11], [12]. In each of the winters of 2000 and 2001, we conducted an experiment to examine the effectiveness of a new individual inspection technique called task-directed inspection (TDI) [9]. Instead of using simple findings counts (e.g. [13]) as a basis for the analysis of the technique, we used the ODC-CC to differentiate between defects based on the different levels of understanding that findings represent.

Validating the completeness of the coverage of the ODC-CC was straightforward, but validating the orthogonality of the ODC-CC is more problematic. Validation of the ODC-CC was carried out as part of both experiments. Details are given in the rest of the paper, with the main emphasis on the second experiment.

2. Defect Classification Schemes

Defect classification schemes can be created for several purposes, including:
• making decisions during software development,
• tracking defects for process improvement,
• guiding the selection of test cases, and
• analyzing research results.
This paper illustrates a defect classification scheme used for the last of these purposes.

The 1998 report by Fredericks and Basili [6] provides an overview of classification schemes that have been developed since 1975. The goals for most of the schemes, from companies such as HP, IBM and Sperry Univac, are to identify common causes for defects in order to determine corrective action.

Our ODC-CC is unique among the defect classification schemes that we know of, in that it was developed specifically to analyze the results of inspection experiments. In other words, the activity of software inspection is analyzed in isolation from the software development process, without worrying about the cause of the defects or the action taken to fix the defect. This is a very different viewpoint from those of other classification schemes. As a point of comparison, we describe the IBM ODC, which has been evolving over the past ten years, and describe how the ODC-CC differs.

3. The IBM Orthogonal Defect Classification Scheme

The IBM Orthogonal Defect Classification (ODC) was originally described in the paper by Chillarege et al. in 1992 [4]. The goal of the IBM ODC as described by Chillarege et al. is to provide a measurement paradigm to extract key information from defects and use that information to assess some part of a software development process for the purpose of providing corrective actions to that process. The application of the 1992 version of the IBM ODC involves identifying, for each defect, a defect trigger, a defect type, and a defect qualifier. More recent versions of the IBM ODC [7] include the activity that uncovered the defect, defect impact, and defect target, age and source, as well as the originally described trigger, type, and qualifier. The activity, trigger, and impact are normally identified when the defect is found; the others are normally identified after the defect has been fixed. The activities in the current version of the IBM ODC (v. 5.11) include design reviews, code inspection, and three kinds of testing.
For the purpose of this paper, we focus on code inspection. There are nine defect triggers assigned by the inspector to indicate the event that prompted the discovery of the defect. Impact presents a list of thirteen qualities that define the impact the defect may have on the customer if the defect escapes to the field.

Assigned at fix time, defect types are described in the 1993 paper by Chaar et al. [3] as: assignment, checking, algorithm, timing/serialization, interface, function, build/package/merge, and documentation. In version 5.11 of the IBM ODC [7], the interface defect type has been expanded to include object messages and the algorithm defect type has been expanded to include object methods. Build/package/merge and documentation have been removed from the defect type list. An additional defect type now appears: relationship, defined as problems related to associations among procedures, data structures, and objects.

The defect qualifier, also assigned to the defect at the time of the fix, evolved from two qualifiers, Missing and Incorrect, to include a third qualifier, Extraneous [7]. As an example, a section of documentation that is not pertinent and should be removed would be flagged as Extraneous.

Target represents the high level identity of the entity that was fixed, for example, code, requirements, build script, user guide.

Age identifies the defect as being introduced in:
• base: part of the product that was not modified by the current project; it is a latent defect,
• new: new functionality created for this product,
• rewritten: redesign or rewrite of an old function,
• refix: a fix of a previous (wrong) fix of a defect.

Source identifies the development area that the defect was found in: developed in-house, reused from library, outsourced, or ported.
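Taken together, these attributes can be pictured as a two-part record: fields filled in when the defect is found (activity, trigger, impact) and fields filled in after it is fixed (type, qualifier, target, age, source). The sketch below is our illustration, not IBM's; the field names follow the attribute names used above, but the enumerated values and comments are abbreviated examples rather than the complete IBM lists.

    from dataclasses import dataclass
    from enum import Enum
    from typing import Optional

    class Qualifier(Enum):
        # Assigned at fix time; Extraneous was added in later versions of the ODC.
        MISSING = "missing"
        INCORRECT = "incorrect"
        EXTRANEOUS = "extraneous"

    class Age(Enum):
        # How the defect was introduced.
        BASE = "base"            # latent defect in code the current project did not modify
        NEW = "new"              # new functionality created for this product
        REWRITTEN = "rewritten"  # redesign or rewrite of an old function
        REFIX = "refix"          # fix of a previous (wrong) fix

    @dataclass
    class OdcDefectRecord:
        # Identified when the defect is found:
        activity: str            # e.g. "design review", "code inspection", or a kind of testing
        trigger: str             # for code inspection, one of the nine inspection triggers
        impact: str              # one of the thirteen customer-impact qualities
        # Identified after the defect is fixed:
        defect_type: Optional[str] = None  # e.g. "assignment", "checking", "algorithm", ...
        qualifier: Optional[Qualifier] = None
        target: Optional[str] = None       # e.g. "code", "requirements", "build script", "user guide"
        age: Optional[Age] = None
        source: Optional[str] = None       # "in-house", "reused from library", "outsourced", "ported"

Splitting the fields this way mirrors the point made above: an inspector can complete only the first group, while the second group presupposes that a fix has been analyzed.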
Since our goals for defect classification are different from those of the IBM ODC, the IBM ODC is not completely suitable for our analysis. The IBM ODC serves instead as a starting point and as a point of reference for describing the ODC-CC.

4. Inspection Experiments

In 1996, one of us developed a new technique for guiding and motivating inspections of computational code [8]. In the winter of 2000, we conducted a first experiment to evaluate the effectiveness of this new inspection technique, called task-directed inspection (TDI) [9]. A second experiment was conducted in the winter of 2001. The need to analyze results of the experiments led to the development of the ODC-CC, and a metric based on it, as described in the following sections. The intent of the metric is to differentiate inspection results in such a way that TDI could be compared to industry standard inspection techniques such as ad hoc or paraphrasing.

The TDI technique, similar to scenario-based inspection techniques [16], provides structured guidance to inspectors during their individual work. The TDI technique piggybacks code inspections on other software development tasks and uses the familiarity the inspector gains with the code to identify issues that need attention. In the application of TDI so far, the software development tasks that combine readily with code inspections are documentation tasks and development of test cases.

Both experiments involved graduate students enrolled in a Software Verification and Validation graduate course [15] offered at Queen's and the Royal Military College (RMC). The experiments each consisted of applying three different inspection techniques to three different code pieces drawn from computational software used by the military. The computational software chosen was written in Visual Basic and calculates loads on bridges due to vehicle convoy traffic. The pieces of code chosen for the experiments were all of equivalent length and were intended to be of equivalent complexity. The pieces were not seeded with defects, in order not to predetermine the types of defects in the code.

The three inspection techniques chosen for the experiments consisted of one industry standard technique and two TDI techniques. The industry standard technique used was paraphrasing (reading the code and acquiring an understanding of the intent of the code without writing the intent down). The two TDI techniques used in the first experiment were Method Description and White Box Test Plan. Method Description required the student to document in writing the logic of each method in the assigned piece of code. White Box Test Plan required the student to describe a series of test cases for each method by providing values for controlled variables and corresponding expected values for observed variables. For the second experiment, three different pieces of code were chosen from the same military application. The new pieces were shorter and turned out to be less complex. The White Box Test Plan was simplified to a Test Data Plan. This involved identifying variables participating in decision statements in the code and listing the values those variables should take for testing purposes.

Both experiments were a partial factorial, repeated measures design in which each student used all three techniques on the three different code pieces. The application of a technique to a code piece is referred to as a round. The paraphrasing technique was always used first, with the two TDI techniques being alternated amongst the students during rounds 2 and 3. The code pieces were permuted amongst the students and the rounds; for example, student 1 might use code pieces 3, 2, 1 in rounds 1, 2 and 3 while student 2 uses code pieces 2, 1, 3. In the first experiment, twelve students were involved, which allowed the partial factorial design to be complete. Only ten students were involved in the second experiment, so the partial factorial design was incomplete.
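To illustrate how such a schedule can be laid out, the sketch below generates one possible assignment of techniques and code pieces to students and rounds: paraphrasing always in round 1, the two TDI tasks alternating between rounds 2 and 3, and the code pieces rotated across students. It is a hypothetical reconstruction of the kind of schedule described above, not the actual assignment used in either experiment, and the TDI task names are placeholders.

    # Hypothetical reconstruction of a round/technique/code-piece schedule;
    # it does not reproduce the actual permutations used in the experiments.
    TDI_TASKS = ["Method Description", "Test Plan"]   # placeholder TDI task names
    CODE_PIECES = [1, 2, 3]

    def build_schedule(num_students):
        """Return {student: [(round, technique, code piece), ...]}."""
        schedule = {}
        for s in range(num_students):
            # Round 1 is always paraphrasing; the TDI tasks alternate between students.
            techniques = ["Paraphrasing"] + (TDI_TASKS if s % 2 == 0 else TDI_TASKS[::-1])
            # Rotate the code pieces so students see them in different orders.
            pieces = [CODE_PIECES[(s + r) % len(CODE_PIECES)] for r in range(3)]
            schedule[s + 1] = [(r + 1, t, p) for r, (t, p) in enumerate(zip(techniques, pieces))]
        return schedule

    if __name__ == "__main__":
        for student, rounds in build_schedule(4).items():
            print(student, rounds)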
The goal of the experiment was to measure the effectiveness of the TDI technique as compared to the industry standard technique. Effectiveness was defined as the ability of the individual inspector to detect software defects that require a deeper understanding of the code. To evaluate whether that had been achieved, a metric was needed beyond simply counting findings. The metric must differentiate between findings that simply address formatting issues and those that address logic errors. The ODC-CC [9] was developed for this purpose. Associated with this detailed defect classification was the concept of a level of understanding. Each category in the defect classification was assigned a level of understanding intended to reflect the depth of understanding needed by an inspector to be able to identify a defect in that category. The analysis of the experiment results involved first classifying each defect and then assigning the associated level of understanding. If an inspector using a TDI technique identifies proportionally more defects at a deeper level of understanding, then using a TDI technique is more effective than using the paraphrasing technique for finding these deeper defects.
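A minimal sketch of how findings reduce to this metric is given below. The category names and their level assignments are hypothetical stand-ins (the actual ODC-CC categories and levels are defined in [9]), and treating levels 3 and 4 as the deeper levels is our illustrative assumption; the sketch simply compares, per technique, the proportion of findings at those deeper levels.

    from collections import defaultdict

    # Hypothetical ODC-CC category -> level of understanding (1 = shallow, 4 = deep).
    # The real ODC-CC categories and level assignments are given in [9].
    LEVEL_OF_UNDERSTANDING = {
        "formatting": 1,
        "naming convention": 2,
        "inadequate error handling": 3,
        "wrong assumption": 4,
    }
    DEEP_LEVELS = {3, 4}   # assumption: levels 3 and 4 count as "deeper" understanding

    def deep_finding_proportion(findings):
        """findings: iterable of (technique, odc_cc_category).
        Returns {technique: proportion of findings at a deep level of understanding}."""
        totals = defaultdict(int)
        deep = defaultdict(int)
        for technique, category in findings:
            totals[technique] += 1
            if LEVEL_OF_UNDERSTANDING[category] in DEEP_LEVELS:
                deep[technique] += 1
        return {t: deep[t] / totals[t] for t in totals}

    # Example: compare paraphrasing against one TDI task on a handful of findings.
    findings = [
        ("Paraphrasing", "formatting"),
        ("Paraphrasing", "naming convention"),
        ("Paraphrasing", "inadequate error handling"),
        ("Method Description", "wrong assumption"),
        ("Method Description", "inadequate error handling"),
        ("Method Description", "formatting"),
    ]
    print(deep_finding_proportion(findings))   # higher proportion => deeper findings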
The depth of understanding needed to make a finding does not, of course, correlate with the consequences of the finding for the operation of the software product. Defects that are logically difficult to find may have minor consequences, while obvious defects may have major consequences.

5. Comparing IBM ODC and ODC-CC

The IBM ODC scheme [7] is probably the most developed of the classification schemes, due to its continual evolution over the past ten years. However, because of its different purposes, the IBM ODC did not readily lend itself to what we needed for the analysis of the results from the inspection experiments. By considering the attributes defined in the IBM ODC, we can both map our inspection activity to the ODC and identify where changes are necessary.

Our defect removal activity is code inspection.

The triggers are defined in the IBM ODC as "what you were thinking about when you discovered the defect" [7]. We define a trigger in such a way as to remove any subjective aspect of the activity. Instead of considering "what you were thinking of", we define the trigger as "what task were you carrying out". In our inspection experiments, this was clearly defined, e.g. writing a method description or creating a test plan.

Impact was not considered in our experiments.

Target was the code or the documentation used in our experiments.

For the ODC-CC, we changed the time at which defect types are assigned, expanded the set of defect types, and made the classification finer grained.

In the IBM ODC, defect types are assigned at the time the developer fixes the defect. This means the defect types are defined in terms of the fix. For example, the defect type function is defined as "The error should require a formal design change ...". To simplify evaluation of the results in our inspection experiments, the defect types are assigned before the time of fix, without considering the change necessary to fix the problem. This is a valid view in industry as well, where there are times when inspection is decoupled from fixing.

For our experiments, definitions of defect types must thus reflect the problem as the inspector perceives it in the code: the defect type must relate to the code rather than to the fix activity. For example, obscure language constructs, lack of encapsulation, and logically unrelated data items in a structure all reflect what the inspector may find. Any of these could eventually require a formal design change. This is a significant shift in viewpoint from the categorization needed for the inspection experiment to the IBM ODC categorization done by the fixer.

As well as changing the viewpoint of the defect type from fixer to inspector, we found the list of defect types for code and design was inadequate for defects typically found in computational code. Defects such as poor naming conventions for variables, inaccessible code, and inadequate capture of error conditions did not seem to fit any category in the IBM ODC. It was unclear if wrong assumptions should be classified as assignment defects or algorithm defects. The ODC described on the Research web site [7] removed documentation from the defect type list, yet this is a category we needed for the inspector.

Finally, we expanded the number of defect types substantially. Finer detail was needed than was offered by the IBM ODC. The issue of granularity in a defect classification scheme leads in conflicting directions. A small number of types may make it easier to pick the one that applies. A larger number of types may increase precision and give greater certainty, but may also mean that classification takes longer. In our case, the extra detail was needed to be able to deduce the level of understanding needed to find a given defect.

In the IBM ODC the defect qualifier is also defined at the time of the fix. If we assign the defect qualifier at the time of the inspection, then further qualifiers are needed. Inconsistent becomes necessary since there are cases where, for example, the inspector may not be able to identify if the code is