A framework for region-based instrumentation of energy consumption of program executions

Simon Ostermann, Thomas S. Eiter, Vlad Nae, and Radu Prodan
Institute of Computer Science, University of Innsbruck, Austria

Abstract: Energy efficiency has become a key issue in computer science research and development in recent years. While most approaches focus either on hardware or on software, we propose a solution that incorporates both, enabling the measurement of the energy consumption of code segments executed on physical machines. Our novel approach to energy measurement and instrumentation allows for both state-of-the-art offline analysis and innovative online measurements associated with the code being executed. We present our modular architecture, which shields users from in-depth knowledge of their energy measurement hardware and allows the development of measurement and instrumentation code independent of each instrument's proprietary interface. Finally, we propose an efficient method for increasing the accuracy of measurements for low sampling rate measurement devices.

I. INTRODUCTION

Energy efficiency has become a key issue in computer science research and development in recent years. The driving forces behind this are the ever-growing energy consumption of computers worldwide, the trend towards mobile devices, and the Green IT initiative, which aims to save ecological resources during the lifetime of a product. According to [1], the global energy consumption of data centres more than doubled between 2000 and 2005. Growth then slowed to approximately 56% between 2005 and 2010, partly due to technical improvements and partly due to the financial crisis that started in 2008. As illustrated in Figure 1, data centres were responsible for about 0.5% (75 billion kWh) of the world's energy consumption in 2000, increasing to 1.0% (150 billion kWh) in 2005 and reaching between 1.1% and 1.5% (200 to 275 billion kWh) in 2010 [1, p. 6].

Fig. 1. Energy consumption of data centres worldwide, based on [1, p. 13]. (Bar chart of consumption in billion kWh for 2000, 2005, and 2010; the percentages 0.5%, 1.0%, and 1.1% to 1.5% refer to worldwide energy consumption in the given year.)

Another reason for the increasing importance of energy is the trend towards mobile devices such as smartphones and tablets, which have conquered the market in recent years. For example, in Germany, the U.K., and France, more customers used mobile devices than personal computers in 2011 [2, p. 2].

Regarding energy consumption in computer science, Green IT has become a major initiative whose main goal is to save ecological resources during the lifetime of a hardware or software product (or a combination of both [3]), covering production, active runtime, recycling, and biodegradability. Focusing on energy, the aim is to consume as little energy as possible during the product life cycle.

To curb energy consumption and improve energy efficiency, many techniques and technologies have been invented in the last decade, most of them focusing either on hardware or on software. Besides the hardware, the software application plays a key role in energy consumption, as it defines which hardware components are used, when they are used, and to what degree. Thus, the total energy consumption of a code execution on a specific machine depends on both the machine and the code characteristics. To achieve the best possible energy efficiency, one promising approach is to design self-tuning codes that receive online feedback about their energy consumption and allow for fast and precise parameter adjustments to improve their energy efficiency. However, energy measurements in data centres are coarse grained (e.g. power-rail level, rack level), difficult to collect (e.g. proprietary interfaces, sometimes not available to the users), and typically not associated with the codes being executed. Moreover, if energy profiling data is available and can be associated with the code, it is only available post-mortem (i.e. after the code has been executed) and not online during its execution.

To address these issues, we propose in this paper an energy instrumentation and measurement framework that allows the implementation and easy deployment of self-tuning codes in data centres. Our framework allows for both offline and online (i.e. during execution) analyses by providing detailed energy consumption data for software program executions at region-level granularity. Our modular architecture shields users from needing in-depth knowledge of their energy measurement hardware and allows the development of measurement and instrumentation code independent of the instrument's proprietary interface. Finally, we propose an efficient method for increasing the accuracy of measurements for low sampling rate measurement devices (e.g. 0.5 to 10 Hz).

The next section describes the framework architecture, followed by the methodology for energy consumption computation in Section III. Section IV presents techniques for improving the accuracy of energy measurements. Section V discusses the related work and Section VI concludes the paper.

Fig. 2. Framework architecture with one server managing two sessions, each comprising one measurement device, one client, and the application code.

II. ARCHITECTURE

We propose an energy measurement framework based on a client-server architecture that allows for easy extension to different programming languages, as only the client part has to be ported. The architecture allows the server to run on a different physical machine and supports the deployment of lightweight clients that add minimal energy overhead. A diagram of the proposed architecture with a server and two measurement sessions is presented in Figure 2.
A server program (PMServer) runs continuously on a service machine, communicating with and managing all measurement devices, such as the Voltech PM1000+. The other machines, which are electrically connected to the measurement devices, run the code to be measured; this code uses the PMClient to retrieve online measurements from the PMServer at runtime. In the following, we present the role of each component and the interaction protocols.

A. Measurement server

The PMServer is responsible for managing the direct communication with the measurement devices, hiding the different access methods and data representations of different device types, and allowing easy client access to the measurement results via a predefined interface. The server runs on a dedicated machine (i.e. not the machine being instrumented), records power and/or energy measurements from different device types, and makes them available to the clients. To request data from the measurement devices, the server implements the communication protocols provided and supported by the respective device types as decoupled modules. To maintain the loose coupling of components, the communication and data exchange with the client is based on a separate communication protocol built over TCP/IP and independent of the proprietary device (see Section II-D).

For concurrent handling of multiple measurement devices, multiple clients, and multiple requests, we employ encapsulated measurement sessions. When a new measurement session is initiated, the requested measurement device is reserved and a session key is generated, which the session owner uses for further session-related requests and authentication. When a session has ended and its measurement results have been retrieved by the session owner, the session is deleted and the measurement device is freed.

B. Measurement client

The PMClient offers a unified code instrumentation interface that allows users to start and stop measurements on specific devices connected to the server, and to collect and process the corresponding data via simple function calls. The client is the central piece of our architecture, allowing the online collection of measurement data with minimal disruption to the measured process. Therefore, it is purposely designed as lightweight code that offloads all post-processing of instrumentation data to the server to minimise its influence on the measurements. The client is implemented as a library that can be used by the code being instrumented. For increased flexibility, the client library provides multiple interfaces in different programming languages (e.g. C++, C, Java). The client communicates with the server using the communication protocol introduced in Section II-D.

C. Measurement session

The PMSession associates the power/energy measurements with a specific code region. A PMSession has a life cycle determined by its associated code region and comprises a sequence of procedures designed to initiate and terminate measurements and to collect measurement data from the device. A typical PMSession is presented in Figure 3. The first step any instrumented code performs in its initialisation phase consists of collecting information about the available measurement devices (the get devices call) and selecting the relevant one. Then, each relevant code region is enclosed in a measurement session (line 5) and surrounded by start and stop session calls (lines 7 and 11). During the PMSession's life cycle, the PMServer continuously collects the measurement data from the measurement device, which it aggregates and delivers to the PMClient after the end of the session (line 15).

D. Client-server communication protocol

The communication protocol for data exchange between server and client is built upon TCP/IP for platform and language independence.
The basic format of a client request is <HEAD>:<TAIL>, where HEAD contains a code defining the action to be performed on the server and TAIL contains the corresponding parameters separated by semicolons. The same message format is used for server answers, where HEAD indicates whether the action succeeded or an error occurred, and TAIL is used for further action-related data.

Fig. 3. Interaction protocol between PMClient, PMServer, and measurement device during a PMSession life cycle.

E. Instrumentation interface

The instrumentation interface offers the methods needed to discover the available measurement devices managed by a server, to start, stop, suspend, and resume a measurement session, and to collect, access, and process the measurement data. The interface is designed for simplicity of use and hides many background operations from the user (e.g. server communication, result post-processing). An example of a code instrumentation using the C++ interface is given in Listing 1.

Listing 1. Code instrumentation example using the C++ interface.

     1  #include <CPPInterface.h>  // C++ interface
     2
     3  int main(int argc, char **argv) {
     4    // Defining parameters for the session
     5    pmCreateNewSession("sessionId", "", 5025);
     6    pmStartSession(0);  // Start session on device 0
     7
     8    /* Insert HERE the code to measure */
     9
    10    pmStopSession();
    11    pmRetrieveResults();
    12    /* Use the desired measurements */
    13    double consumption = pmGetEnergyConsumption();
    14    pmDeleteSession();
    15    ...
    16  }

The initialisation phase consists of creating the measurement session with a certain ID and indicating the connection details for the PMServer (line 5). The measurement begins with the pmStartSession call using the measurement device with the specified ID (line 6). The code to be instrumented should be placed immediately after the session start call.
The measurements are stopped with the pmStopSession call (line 10), and the values are then retrieved from the server (line 11). At this point the instrumentation data is available locally and can be used, as exemplified in line 13. Finally, the session can be deleted and the data released (line 14).

III. ENERGY CONSUMPTION COMPUTATION

Some power measurement devices provide only basic instantaneous power consumption readings. Thus, to provide energy measurements even for these legacy devices, our framework records the power consumption during the PMSession and subsequently computes the energy consumption.

Fig. 4. Accuracy deviation using different measurement intervals. (Two panels, for measurement intervals of 250 ms and 500 ms, each plotting continuous and measured power values in watts over time in seconds.)

Fig. 5. Average deviation of calculated versus measured energy consumption. (Deviation in % over session runtime in seconds for four measurement sets, together with the estimated deviation.)

To calculate the exact energy consumption

$$E = \int_{t_0}^{t_n} P(t)\,dt \tag{1}$$

between the start time $t_0$ and the end time $t_n$ of the PMSession, we would require continuous values for the power consumption. Since only a discrete number of samples is available, determined by the capabilities of the measurement device, we approximate the energy $\tilde{E}$ using the Riemann sum in its trapezoidal form:

$$\tilde{E} = \frac{1}{2} \sum_{i=0}^{n-1} (w_i + w_{i+1}) \cdot (t_{i+1} - t_i), \tag{2}$$

where $[w_0, \ldots, w_n]$ represent the measured instantaneous power consumptions and $[t_0, \ldots, t_n]$ their associated timestamps on the server. The accuracy of the integral approximation improves as the measurement interval decreases, as illustrated in Figure 4.
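
The approximation in Equation 2 can be sketched in a few lines of C++. This is a minimal illustration, not the framework's server code: the `Sample` type and the `approximateEnergy` function are hypothetical names introduced here, and the sketch reads Equation 2 as the trapezoidal rule, multiplying the mean of each pair of consecutive power readings by the time step between them.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One power reading as logged on the server.
// Hypothetical type for illustration; not part of the framework's interface.
struct Sample {
    double t;  // timestamp in seconds
    double w;  // instantaneous power in watts
};

// Approximates the consumed energy in watt-seconds by summing the trapezoid
// spanned by each pair of consecutive readings (Equation 2).
// Returns 0 when fewer than two samples are available.
double approximateEnergy(const std::vector<Sample>& samples) {
    double energy = 0.0;
    for (std::size_t i = 0; i + 1 < samples.size(); ++i) {
        const double dt = samples[i + 1].t - samples[i].t;
        assert(dt >= 0.0);  // timestamps must be non-decreasing
        energy += 0.5 * (samples[i].w + samples[i + 1].w) * dt;
    }
    return energy;
}
```

For a constant 200 W load sampled at 0 s, 0.5 s, and 1 s, the function yields 200 Ws. A shorter measurement interval adds more trapezoids over the same period, which is why the approximation follows the true power curve more closely.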
The code measured in Figure 4 is a CPU-intensive, high-precision computation of $\pi$. Some measurement devices, such as the Voltech PM1000+ with its built-in integrator circuit, offer native support for measuring energy consumption, which our framework can use in addition to the integration method.

To assess the accuracy of the energy integration $\tilde{E}$, we compared it with the values measured by the hardware integrator of the Voltech PM1000+ power meter. We ran the same CPU-intensive code (i.e. the $\pi$ computation) with varying target precision, resulting in proportionally longer run times, starting from 1 up to 20 seconds. Figure 5 illustrates the average deviation of the $\tilde{E}$ values as a percentage of the hardware-measured values for different PMSession durations. The trend line represents an estimation of the average deviation. We observed the highest average divergence of approximately 20% during our shortest test sessions lasting 250 milliseconds. Overall, we observed an average deviation of more than 5% only for sessions shorter than 5 seconds. For longer sessions, the average deviation decreases to approximately 1%.

Fig. 6. Delay between workload and power consumption increase reflected at the outlet. (Two panels over time in seconds: power in watts and CPU workload in %, with the delay $\Delta t$ marked.)

IV. IMPROVING THE ACCURACY OF ENERGY MEASUREMENTS

We present several methods for improving the accuracy of energy measurements. First, we present in Section IV-A our approach to minimising the inaccuracy introduced by the delay between the code execution and its reflected energy draw. Then, we propose in Section IV-B buffering methods for avoiding inaccuracies introduced by measurement truncation. Finally, we present in Section IV-C three methods for aggregating measurements collected with the buffering methods.

A. Workload – power consumption delay

Because of the long, multi-tier electronic circuitry between the mains outlet and the components involved in the actual computation (e.g. the processor, the memory), it takes a certain amount of time from the moment a workload is placed on a machine (i.e. code is executed) until this state change (i.e. idle to load) is reflected in the power consumption (see Figure 6). This delay is caused by multiple high-capacitance capacitors placed on the power supply line for current stabilisation and noise filtering. The actual delay $\Delta t$ between the workload increase and the increase in energy consumption is machine dependent and has to be taken into account to increase the accuracy of our energy consumption measurements.

We determine the typical delay of a machine based on power consumption measurements in three steps: (1) a session is run to measure the machine's characteristic power consumption in the idle state; (2) a succession of intensive codes combined with idle periods is run on the machine while the power consumption is monitored; (3) the characteristic workload power consumption delay of the machine is computed for each execution and the results are aggregated.

For each workload measurement session, the following values are available in the last step (see Figure 7): (1) the maximum characteristic power consumption jitter in the machine's idle state, $j_m$; (2) the session start time $t_s$; (3) the time $t_{idle}$ and power $w_{idle}$ of the last measurement before $t_s$; (4) the time $t_{load}$ and power $w_{load}$ of the first measurement after $t_s$ for which the condition $w_{load} - w_{idle} > j_m$ holds. Since no power measurement is available for time $t_s$ (except for the extreme case where $t_s = t_{idle}$), we use $w_{idle}$ as the reference, as it is the temporally nearest power measurement to $t_s$.

Fig. 7. Time and power values available for delay calculation. (Power in watts over time in seconds, marking $w_{idle}$, $t_{idle}$, $t_s$, $w_{load}$, and $t_{load}$ across the idle phase and the load phase.)

To determine the delay value of a session, we first compute the power increase gradient $F$:

$$F = \frac{w_{load} - w_{idle}}{t_{load} - t_s}. \tag{3}$$

We compute $F$ for all code executions and extract the characteristic machine delay

$$d_m = t_s - t_{idle} \tag{4}$$

from the maximum gradient. Using this method, we measured the characteristic machine delay of multiple machines, as presented in Table I. For the 9 studied machines, we observe delays between 240 milliseconds and 512 milliseconds, and no correlation of $d_m$ with the number of redundant power supplies or the machines' peak power consumptions.

TABLE I. CHARACTERISTIC WORKLOAD POWER CONSUMPTION DELAYS ($d_m$).

No.  CPUs and GPUs (type)            Count  Power     d_m        w_idle
                                            supplies  [seconds]  [watt]
1.   CPU: AMD Opteron 6168           2      1         0.485      245.074
     GPU: AMD Radeon HD 5870         1
2.   CPU: AMD Opteron 6168           2      1         0.412      176.074
     GPU: Nvidia GeForce GTX 460     1
3.   CPU: Intel Xeon X5650           2      2         0.240      199.736
4.   CPU: Intel Xeon X5650           2      2         0.349      371.171
     GPU: Nvidia GeForce GTX 480     2
5.   CPU: AMD Opteron 885            8      4         0.368      832.816
6.   CPU: Intel Xeon E7-4870         4      2         0.512      474.203
7.   CPU: AMD Opteron 8356           8      4         0.242      494.092
8.   CPU: AMD Opteron 8356           8      4         0.333      508.244
9.   CPU: AMD Opteron 885            4      2         0.372      423.764

Fig. 8. Logged values for a measurement session with pre- and post-buffering. (Power in watts over time in seconds, marking the pre-buffering, measurement session, and post-buffering ranges of the total logged measurements, with $t_0$, $t_s$, $t_e$, $t_n$, $w_{pre}$, $w_0$, $w_n$, and $w_{post}$.)

B. Measurement truncation

If a measurement session is shorter than the typical measurement interval of a device, we obtain at most one measurement within this session, from which we cannot calculate the overall energy consumption. For this reason, we log on the server side a certain number of additional measurements before the start and after the end of the session, called pre- and post-buffering, as shown in Figure 8. This method also improves the accuracy of short sessions with multiple measurements, as the additional pre- and post-buffering measurements allow us to better interpolate the energy consumption $E_t$, as follows:

$$E_t = \frac{w_0 + w_{pre}}{2} \cdot (t_0 - t_s) + \frac{w_n + w_{post}}{2} \cdot (t_e - t_n) + E_m, \tag{5}$$

where $t_s$ is the session start time, $t_e$ is its end time, $E_m$ is the session energy consumption calculated from the measurements $w_0$ to $w_n$ captured at times $t_0$ to $t_n$, with $t_s \le t_i \le t_e$ for all $i \in [0, n]$, and $w_{pre}$, $w_{post}$ are the pre- and post-buffering values captured at times $t_{pre}$ and $t_{post}$. As with the delay, no measured values are typically available for the session start time $t_s$ and end time $t_e$ unless $t_s = t_0$ or $t_e = t_n$. Hence, we use the pre- and post-buffering values $w_{pre}$ and $w_{post}$ for interpolation, as those are the temporally closest measurements available before $t_0$ and after $t_n$.

For a better understanding of Equation 5 and its use, we give a short example that measures the idle energy consumption of an imaginary machine for which the measurement device reports a constant power of 200 watts every 500 milliseconds. Our measurement session runs for 1400 milliseconds, starting at time $t_s = 50$ and ending at time $t_e = 1450$ (all times in milliseconds). We have one pre-buffered measurement $w_{pre}$ at time $t_{pre} = 0$, one post-buffered measurement $w_{post}$ at time $t_{post} = 1500$, and two in-session measurements at times $t_0 = 500$ and $t_1 = 1000$.
The measured energy consumption $E_m$ within the session is:

$$E_m = P_m \cdot t_m = 200\,\mathrm{W} \cdot (1000\,\mathrm{ms} - 500\,\mathrm{ms}) = 200\,\mathrm{W} \cdot 500\,\mathrm{ms} = 100\,\mathrm{Ws}. \tag{6}$$

By applying Equation 5 to calculate the total energy consumption $E_t$, we obtain:

$$E_t = 200\,\mathrm{W} \cdot (500\,\mathrm{ms} - 50\,\mathrm{ms}) + 200\,\mathrm{W} \cdot (1450\,\mathrm{ms} - 1000\,\mathrm{ms}) + 100\,\mathrm{Ws} = 90\,\mathrm{Ws} + 90\,\mathrm{Ws} + 100\,\mathrm{Ws} = 280\,\mathrm{Ws}. \tag{7}$$

Fig. 9. Accuracy improvements using different session runtimes. (Accuracy improvement in % over session runtime in seconds.)

Comparing the measured and the interpolated energy consumption in this example, we get an absolute difference of 180 Ws. Relative to the measured 100 Ws, the accuracy improvement using the interpolation mechanism is therefore 180%. However, this mechanism is mainly beneficial for short measurement sessions. Figure 9 shows the accuracy improvements for different session runtimes with the same setup as in this example.

Pre- and post-buffering further allows us to address the workload – power consumption delay problem described in Section IV-A, and to offer the user different measurement selection functionality, as described in the next section.

C. Measurement aggregation functions

Based on the additional measurements available through pre- and post-buffering, we offer three different (measurement) aggregation functions, which determine the range of measurements retrieved from the server.

a) Skewing: This aggregation function takes into account the characteristic delay $d_m$ of the machine, as described in Section IV-A. This delay is added to the session start time $t_s$ and end time $t_e$, so that the new session start $t_{sn}$ and session end $t_{en}$ correspond to $t_{sn} = t_s + d_m$ and $t_{en} = t_e + d_m$. All measurements in the time interval $[t_{sn}; t_{en}]$ are sent to the client, including the interpolated values for $t_{sn}$ and $t_{en}$.
b) Graph: This aggregation function detects the rising and falling flanks of a measurement session based on the measured power consumption values. All measurements between the start of the first rising flank and the end of the last falling flank are sent to the client. This aggregation function can be employed only if the energy consumption increases at the beginning of a session and decreases at the end; it is therefore useful for the precise instrumentation of a single power-intensive code region.

c) Basic: This aggregation function is a special case of skewing with the machine-specific delay $d_m = 0$. All measurements between the session start and end are sent to the client without considering the workload – power consumption delay. If necessary, the values for the start and end times are interpolated. This is the safe approach when the characteristic delay of the machine is not known (i.e. skewing cannot be applied) and the user instruments multiple consecutive power-intensive code regions (i.e. graph cannot be applied).

Figure 10 illustrates the value ranges returned when applying the different aggregation functions to measure a power-intensive test code that computes $\pi$ to a certain precision. The total execution time of the test code is 6 seconds.
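
To make the interplay between buffering and the skewing and basic functions concrete, the following C++ sketch evaluates Equation 5 on a server-side log for a session window shifted by $d_m$. It is an illustration under assumptions rather than the framework's implementation: `Reading` and `sessionEnergy` are hypothetical names, the log is assumed to be sorted by time and to contain at least one buffered reading on each side of the window, and applying Equation 5 directly to the shifted window $[t_s + d_m; t_e + d_m]$ is one plausible reading of how skewing and interpolation combine.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// One logged power reading (hypothetical type, not the framework's own).
struct Reading {
    double t;  // timestamp in seconds
    double w;  // instantaneous power in watts
};

// Energy in watt-seconds for the session [ts, te] shifted by the delay dm.
// dm = 0 corresponds to the basic function, dm > 0 to skewing.
// `log` must be sorted by time and include pre- and post-buffered readings.
double sessionEnergy(const std::vector<Reading>& log,
                     double ts, double te, double dm = 0.0) {
    assert(te >= ts);
    const double start = ts + dm;
    const double end = te + dm;

    // Locate the first and last readings inside the shifted window.
    const std::size_t none = log.size();
    std::size_t first = none, last = none;
    for (std::size_t i = 0; i < log.size(); ++i) {
        if (log[i].t >= start && log[i].t <= end) {
            if (first == none) first = i;
            last = i;
        }
    }
    // Equation 5 needs one reading inside the window and one buffered
    // reading on each side of it.
    if (first == none || first == 0 || last + 1 >= log.size()) {
        throw std::invalid_argument("log does not cover the session window");
    }
    const Reading& pre = log[first - 1];
    const Reading& post = log[last + 1];

    // Boundary terms of Equation 5.
    double energy = 0.5 * (log[first].w + pre.w) * (log[first].t - start)
                  + 0.5 * (log[last].w + post.w) * (end - log[last].t);
    // E_m: trapezoids between consecutive readings inside the window.
    for (std::size_t i = first; i < last; ++i) {
        energy += 0.5 * (log[i].w + log[i + 1].w) * (log[i + 1].t - log[i].t);
    }
    return energy;
}
```

Called with the constant 200 W log of the worked example in Section IV-B (readings at 0 s, 0.5 s, 1 s, and 1.5 s, session from 0.05 s to 1.45 s, `dm` left at 0), the function reproduces the 280 Ws of Equation 7. Passing a machine's $d_m$ from Table I as `dm` moves the window later, so that readings taken before the workload is visible at the outlet are no longer counted.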