An Empirical Study of Operating Systems Errors

Andy Chou, Junfeng Yang, Benjamin Chelf, Seth Hallem, and Dawson Engler
Computer Systems Laboratory
Stanford University
Stanford, CA 94305
{acc, junfeng, bchelf, shallem,

Abstract

We present a study of operating system errors found by automatic, static, compiler analysis applied to the Linux and OpenBSD kernels. Our approach differs from previous studies that consider errors found by manual inspection of logs, testing, and surveys because static analysis is applied uniformly to the entire kernel source, though our approach necessarily considers a less comprehensive variety of errors than previous studies. In addition, automation allows us to track errors over multiple versions of the kernel source to estimate how long errors remain in the system before they are fixed. We found that device drivers have error rates up to three to seven times higher than the rest of the kernel. We found that the largest quartile of functions have error rates two to six times higher than the smallest quartile. We found that the newest quartile of files have error rates up to twice that of the oldest quartile, which provides evidence that code hardens over time. Finally, we found that bugs remain in the Linux kernel an average of 1.8 years before being fixed.

1 Introduction

This paper examines features of operating system errors found automatically by compiler extensions. We attempt to address questions like: Do drivers account for most errors? How are bugs distributed? How long do bugs last? Do bugs cluster? How do different operating system kernels compare in terms of code quality? We derive initial answers to these questions by examining bugs in 21 snapshots of Linux spanning seven years. We cross check these results against a recent OpenBSD snapshot. The bugs that we examine were found in previous work, which used compiler extensions to automatically find violations of system-specific rules in kernel code [8].
These bugs fall into several categories including: not releasing acquired locks, calling blocking operations with interrupts disabled, using freed memory, and dereferencing potentially null pointers.

Basing our analysis on compiler-found errors has two nice properties. First, the compiler applies a given extension uniformly across the entire kernel. This evenhanded error slice allows us to do a mostly apples-to-apples comparison across different parts of the kernel. Likewise, we can compare two different kernels by running the same checks over both. These comparisons would be difficult to make with manual error reports because they tend to overrepresent errors where skilled developers happened to look or where bugs happened to be triggered most often. Second, automatic analysis lets us easily track errors over many versions, making it possible to apply the same analysis to trends over time.

The scope of errors used in this study, though, is limited to those found by our automatic tools. These bugs are mostly straightforward source-level errors. We do not directly track problems with performance, high-level design, user space programs, or other facets of a complete system. Whether or not our conclusions will apply to these types of issues is an open question.

The paper revolves around five central questions:

1. Where are the errors? Section 3 compares the different subsections of the kernel and shows that driver code has error rates three to seven times higher for certain types of errors than code in the rest of the kernel.
2. How are bugs distributed? Section 4 shows that the error distribution is readily matched to a logarithmic series distribution whose properties could yield some insight into how bugs are generated.
3. How long do bugs live? Section 5 calculates information about bug lifetimes across all 21 kernel snapshots and shows that the average bug lifetime for certain types of bugs is about 1.8 years.
4. How do bugs cluster? We would expect that if a function, file, or directory has one error, it is more likely that it has others.
Section 6 shows that clustering tends to occur most heavily where programmer ignorance of interface or system rules combines with copy-and-paste. For the most heavily clustered error type, less than 1% of the files that were checked contained all of the errors.
5. How do operating system kernels compare? Section 7 shows that OpenBSD has a higher error rate than Linux on each of the four checkers we used to compare them. OpenBSD's error rates range from 1.2 to six times higher.

The paper is laid out as follows. Section 2 describes the kernels we check and how we gather data from them. Section 3 examines where bugs are. Section 4 discusses the distribution of error counts and matches it to a theoretical distribution. Section 5 addresses how long bugs live. Section 6 describes how bugs cluster. Section 7 compares OpenBSD and Linux. Finally, Section 8 summarizes related work.

2 Methodology

This section discusses the versions of Linux that we use for our study and the system that we use to gather our results.

2.1 Where the data comes from

Our data comes from 21 different snapshots of the Linux kernel spanning seven years. We use Linux for several reasons. First, the source code is freely available. Without this feature, a compiler-driven study could not work. Release snapshots dating back to the early nineties are readily accessible, allowing us to look for trends in time and allowing others to get these same releases to check our results. Second, Linux is widely used. As a result, relative to other systems, its code has been heavily tested, meaning that many of the bugs that are easy to find have already been removed. Finally, many programmers have developed Linux code. In aggregate, this effect should reduce the degree to which our results are skewed because of individual idiosyncrasies.

Structurally, the Linux kernel is split into 7 main sub-directories: kernel (main kernel), mm (memory management), ipc (inter-process communication), arch (architecture specific code), net (networking code), fs (file system code), and drivers (device drivers).
Figure 1 shows the size of the code that we check across time. The size is measured in millions of lines of code (LOC), including newlines and comments. Each of the 21 different releases that we check is marked with a point. The graph ignores all parts of the kernel specific to architectures other than x86. The graph shows several interesting features:

- The checked snapshots have grown by a factor of roughly 16 (from 105K lines at version 1.0 to 1.6 million lines in version 2.4.1).
- The bulk of the code we check comes from the drivers. At the extreme ends of the graph, versions 1.0 and 2.4.1, driver code accounts for about 70% of the code size; in the middle of the graph, this percentage drops to slightly over 50%.
- In the two years between 2.3.0 and [...] the size of the OS almost doubles, growing as much as it did in the previous 5 years. Most of this growth comes from drivers. Secondary contributors are the file systems and network code.

[Figure 1 plot: "Linux Code Base Growth"; y-axis: Million Lines of Code; x-axis: Time; series: total, drivers, fs, net, other]
Figure 1: The size of the Linux tree that we check over time. Versions [...], 2.1.{2,6,1,12}, 2.3.{1, 3, 4}, [...]pre6, and 2.4.0 have a + mark but are not labelled. Most of the growth comes from drivers; secondary contributors are the file system and network code. The growth of the rest of the kernel is significantly smaller. The growth rate changes at 2.3.0 where the rate of new driver code increases.

2.2 Measurements

Most of the graphs in this paper are built upon four different measurements. The first three are computed directly from the code, while the last is calculated from the other metrics:

- Inspected errors: these were errors we manually reviewed.
- Projected errors: these were unreviewed errors found by low false positive checkers.
- Notes: these count the number of times a check was applied. If there are no notes there can be no errors.
- Relative error rate: this metric is the number of errors, either inspected or projected, divided by the number of notes for that error type: err rate = errors/notes.
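The relative error rate metric can be written down as a small helper. This is only an illustrative sketch: the CheckerCounts container, the function name, and the sample counts are hypothetical and do not come from the study's own scripts.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CheckerCounts:
    """Counts for one checker on one code base (hypothetical container)."""
    errors: int  # inspected or projected errors
    notes: int   # number of times the check was applied


def relative_error_rate(counts: CheckerCounts) -> Optional[float]:
    """Return errors/notes, or None when the check was never applied."""
    if counts.notes == 0:
        # With no notes there can be no errors, so no rate is defined.
        return None
    return counts.errors / counts.notes


# Made-up counts: 3 errors found across 60 places where the check applied.
sample = CheckerCounts(errors=3, notes=60)
rate = relative_error_rate(sample)
print(f"relative error rate: {rate:.0%}")
```

Returning None for a checker with no notes keeps "the check never applied" distinct from "the check applied and found nothing", which matters when rates from different code bases are compared.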
For example, if one kernel has one error and ten notes, its average error rate will be 1/10 = 10%. We use this to normalize results when comparing different code bases or checkers.

2.3 Gathering the Errors

Our errors were found by the twelve system-specific checkers listed in Table 1. These come from previous work on the xgcc compiler [8]. Whereas this past work demonstrated the effectiveness of system-specific static analysis, it was relatively unreflective about how and why the errors arose. This paper takes the approach as a given and focuses solely on the errors.

Check  Nbugs  Rule checked
-----  -----  ------------
Block  [...]  To avoid deadlock, do not call blocking functions with interrupts disabled or a spinlock held.
Null   [...]  Check potentially NULL pointers returned from routines.
Var    [...]  Do not allocate large stack variables (> 1K) on the fixed-size kernel stack.
-----  -----  ------------
Inull  69     Do not make inconsistent assumptions about whether a pointer is NULL.
Range  54     Always check bounds of array indices and loop bounds derived from user data.
Lock   26     Release acquired locks; do not double-acquire locks.
Intr   27     Restore disabled interrupts.
Free   17     Do not use freed memory.
-----  -----  ------------
Float  [...]  Do not use floating point in the kernel.
Real   [...]  Do not leak memory by updating pointers with potentially NULL realloc return values.
Param  7      Do not dereference user pointers.
Size   3      Allocate enough memory to hold the type for which you are allocating.

Table 1: The twelve checkers used in this paper. If the checker has few false positives, we report the number of bugs as inspected + projected. In total there are 1025 bugs. The top three are the primary projected checkers: we assume all potential errors reported by these checkers are real bugs. The middle set of checkers are used throughout the paper, but we only count manually inspected errors from [...] as real bugs. The bottom set of checkers are used only occasionally throughout the paper.

To get the inspected errors, we manually examined the error logs produced by the checkers for a small number of kernel versions and determined which reports were bugs and which were false positives. These selected error logs were annotated with this information and propagated to all other versions. The propagation process used the inspected error logs for one kernel version to automatically annotate any errors that also appear in other, uninspected error logs. For example, for the Null checker, we manually inspected the errors for Linux [...]. Each error report was annotated, and then the annotated results were propagated backwards through each version back to 1.0. If a bug in [...] was also reported for an earlier version, these versions automatically got the bug annotation. We did this back propagation for all bugs found in the kernel. In addition to inspecting error logs ourselves, we distributed them to system implementors for external confirmation.

To get the projected errors, we ran checkers with low false positive rates over all Linux versions and treated their unexamined results as errors. We primarily use three low false positive checkers in this paper: Var, Block, and Null. The Var checker produces almost no false positives, Block less than three percent, and Null less than ten percent. While the projected results have more noise, they are fairly representative of the inspected results.

[Figure 2 plot: "Total Number of Projected Bugs Through Time"; y-axis: Number of Bugs; x-axis: time; series: Block-projected, Null-projected, Var-projected, Float-projected, Real-projected]
Figure 2: The absolute number of projected errors in this study. We believe 1000 is a conservative estimate of the number of unique bugs we have. The errors found by the three projected checkers are usually a function of code size, though the Block checker has an unusual dip from version [...] to [...]. The number of projected errors goes down at [...] for Block and Null because about 3 Block errors and 4 Null errors were fixed in that version.

Raw error counts alone cannot answer questions relating to error rates, which require some notion of the number of times a programmer has correctly obeyed a given restriction. Thus, we also use notes, which are emitted whenever an extension encounters an event that it checks.
For example, the Null checker notes every call to kmalloc or other routines that can return NULL; the Block checker the number of critical sections it encounters, the Free checker the number of deallocation calls it sees, etc. Notes are the number of places a programmer could make a mistake relevant to a given check. Thus, for a given checker, dividing the number of errors by the number of notes gives the relative error rate.

Figure 2 graphs all the projected errors we use. We have approximately 1000 unique bugs in total, counting both projected and inspected errors. There are several features to note about the graph:

- The number of errors for the unsupervised checkers generally rises over time, especially after the release of version [...].
- The Block checker accounts for an unexpectedly large number of the errors. Many developers seem unaware of the restriction that it checks.
- The Null checker also accounts for a large number of errors. This is caused by careless slips, ignorance of exactly which functions might return NULL, and the ubiquitous use of NULL pointers to indicate special cases.

2.4 Scaling

A key feature of our experimental infrastructure is that it is almost completely automatic. The main manual parts are actually writing checkers and, for inspected bugs, auditing their output for a single run. Running a checker over all versions of Linux requires typing a single command. These results are then automatically entered in a database and cross-correlated with previous runs. A common pattern is inspecting errors from the most recent release and then having the system automatically calculate over all releases how long each error lasts, where it dies, how many checks were done, and the relative error rate. Further, with the exception of some axis labeling, all the graphs in this paper are generated from scripts. Thus, adding new results and even new checkers or operating systems requires very little work.

2.5 Caveats

There are several caveats to keep in mind with our results.
First, while we have approximately a thousand errors, they were all found through automatic compiler analysis. It is unknown whether this set of bugs is representative of all errors. We attempt to compensate for this by (1) using results from a collection of checkers that find a variety of different types of errors and (2) comparing our results with those of manually conducted studies (§8).

The second caveat is that we treat bugs equally. This paper shows patterns in all bugs. An interesting improvement would be to find patterns only in important bugs. Potential future work could use more sophisticated ranking algorithms (as with Intrinsa [11]) or supplement static results with dynamic traces.

The third caveat is that we only check along very narrow axes. A potential problem is that poor quality code can masquerade as good code if it does not happen to contain the errors for which we check. We try to correct for this problem by examining bugs across time, presenting distributions, and aggregating samples. One argument against the possibility of extreme bias is that bad programmers will be consistently bad. They are not likely to produce perfectly error-free code on one axis while busily adding other types of errors. The clustering results in Section 6 provide some empirical evidence for this intuition.

A final, related, caveat is that our checks could misrepresent code quality because they are biased toward low-level bookkeeping operations. Ideally they could count the number of times an operation was eliminated, along with how often it was done correctly (as the notes do). The result of this low-level focus is that good code may fare poorly under our metrics. As a concrete example, consider several thousand lines of code structured so that it only performs two potentially failing allocations but misses a check on one.
On the other hand, consider another several thousand lines of code that perform the same operation, but have 10 allocation operations that can fail, 9 of which are checked. By our metrics, the first code would have a 50% error rate, the second a 10% error rate, even though the former had an arguably better structure.

3 Where Are The Bugs?

Given the set of errors we found using the methodology of the previous section, we want to answer the following questions: Where are the errors? Do drivers actually account for most of the bugs? Can we identify certain types of functions that have higher error rates?

3.1 Drivers

Figure 3 gives a breakdown of the absolute count of inspected bugs for Linux [...]. At first glance, our intuitions are confirmed: the vast majority of bugs are in drivers. This effect is especially dramatic for the Block and Null checkers. While not always as striking, this trend holds across all checkers. Drivers account for over 90% of the Block, Free, and Intr bugs, and over 70% of the Lock, Null, and Var bugs.

[Figure 3 plot: "Number of Errors per Directory in Linux [...]"; y-axis: Number of Errors; x-axis: other, arch/i386, net, fs, drivers; series: Block, Free, Inull, Intr, Lock, Null, Range, Var]
Figure 3: This graph gives the total number of bugs for each checker across each main sub-directory in Linux [...]. We combine the kernel, mm, and ipc sub-directories because they had very few bugs. Most errors are in the driver directory, which is unsurprising since it accounts for the most code. Currently we only compile arch/i386. The Float, Param, Real, and Size checkers are not shown.

Since drivers account for the majority of the code (over 70% in this release), they should also have the most bugs. However, this effect is even more pronounced when we correct for code size. Figure 4 does so by plotting the ratio of the relative error rate for drivers versus the rest of the kernel using the formula:

err rate_drivers / err rate_non-drivers

If drivers have a relative error rate (err rate_drivers) identical to the rest of the kernel, the above ratio will be one. If they have a lower rate, the ratio will be less than one. The actual ratio, though, is far greater than one. For four of our checkers, the error rate in driver code is almost three times greater than the rest of the kernel. The Lock checker is the most extreme case: the error rate for drivers is almost seven times higher than the error rate for the rest of the kernel. The only checker that has a disproportionate number of bugs in a different part of the kernel is the Null checker. We found three identical errors in arch/i386, and, since there were so few notes in the arch/i386 directory, the error rate was relatively high.

[Figure 4 plot: "Rate of Errors compared to Other Directories"; y-axis: Rate; x-axis: other, arch/i386, net, fs, drivers; series: Block, Free, Inull, Intr, Lock, Null, Range, Var]
Figure 4: This graph shows drivers have an error rate up to 7 times higher than the rest of the kernel. The arch/i386 directory has a high error rate for the Null checker because we found 3 identical errors in arch/i386, and arch/i386 has relatively few notes.

These graphs show that driver code is the most buggy, both in terms of absolute number of bugs (as we would suspect from its size) and in terms of error rate. There are a few possible explanations for these results, two of which we list here. First, drivers in Linux and other systems are developed by a wide range of programmers who tend to be more familiar with the device rather than the OS the driver is embedded in. These developers are more likely to make mistakes using OS interfaces they do not fully understand. Second, most drivers are not as heavily tested as the rest of the kernel. Only a few sites may have a given device, whereas all sites run the kernel proper.

3.2 Large Functions

Figure 5 shows that as functions grow bigger, error rates increase for most checkers. For the Null checker, the largest quartile of functions had an average er

[Figure 5 plot: "Correlation between Error-Rate and Function Size"; y-axis: Aggregated Error-Rate; x-axis: Average Function Size (Lines); series: Block, Inull, Intr, Lock, Null]
Figure 5: This graph shows the correlation between function sizes and error rates. It is drawn by sorting the functions that have notes by size, dividing them equally into four buckets, and computing the aggregated error rate per bucket for each checker. For all of the checkers except Inull, large functions are correlated with higher error rates.
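The bucketing procedure described in the Figure 5 caption can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the scripts used in the study: the tuple layout, the function name error_rate_by_size_bucket, and the sample numbers are all hypothetical, and "aggregated error rate" is taken here to mean total errors divided by total notes within a bucket.

```python
from typing import List, Tuple

# (function size in lines, notes, errors) for one checker; hypothetical layout.
FunctionStats = Tuple[int, int, int]


def error_rate_by_size_bucket(
    functions: List[FunctionStats], buckets: int = 4
) -> List[Tuple[float, float]]:
    """Return (average size, aggregated error rate) for each size bucket."""
    # Keep only functions the checker actually examined, sorted by size.
    with_notes = sorted((f for f in functions if f[1] > 0), key=lambda f: f[0])
    n = len(with_notes)
    results = []
    for i in range(buckets):
        # Equal-sized slices of the sorted list (quartiles when buckets == 4).
        chunk = with_notes[i * n // buckets:(i + 1) * n // buckets]
        if not chunk:
            continue  # fewer functions than buckets
        avg_size = sum(size for size, _, _ in chunk) / len(chunk)
        # Assumption: aggregate as total errors over total notes in the bucket.
        rate = sum(e for _, _, e in chunk) / sum(nt for _, nt, _ in chunk)
        results.append((avg_size, rate))
    return results


# Made-up data for a single checker.
sample = [
    (10, 4, 0), (12, 5, 0),
    (30, 6, 0), (35, 4, 1),
    (80, 10, 1), (90, 10, 1),
    (200, 20, 4), (400, 20, 6),
    (50, 0, 0),  # no notes, so it is excluded from every bucket
]

for avg_size, rate in error_rate_by_size_bucket(sample):
    print(f"avg size {avg_size:6.1f} lines -> error rate {rate:.1%}")
```

Aggregating errors and notes before dividing, rather than averaging per-function rates, keeps small functions with one or two notes from dominating a bucket; whether the original analysis made the same choice is not stated in the text.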