To Cache or not to Cache: The 3G Case

Jeffrey Erman, Alexandre Gerber, Mohammad T. Hajiaghayi, Dan Pei, Subhabrata Sen, Oliver Spatscheck
AT&T Labs - Research

ABSTRACT

Recent studies have shown that in the wired broadband world, caching of HTTP objects results in substantial savings in network resources. What about cellular networks? We examine the characteristics of HTTP traffic generated by millions of wireless users across one of the world's largest 3G cellular networks, and explore the potential of forward caching. We provide a simple cost model that third parties can easily use to determine the cost-benefit tradeoffs for their own cellular network settings. This is the first large-scale caching analysis for cellular networks.

1. INTRODUCTION

Cellular networks have witnessed tremendous growth recently. For instance, one major US wireless carrier claimed to have experienced 5000% growth in its data traffic over 3 years, while a network equipment manufacturer [2] predicts that mobile data traffic will grow at a compound annual growth rate (CAGR) of 108% between 2009 and 2014. Despite this rapid growth, fueled by the proliferation of smartphones, laptops with mobile data cards, and new technologies improving the performance of cellular networks, there is still a rather limited understanding of the protocols, application mix, and characteristics of the traffic being carried. For wireline broadband traffic, recent studies have shown that the new killer app traffic is HTTP again [13, 10] and that forward caching is a promising content delivery mechanism [10]. What about cellular networks?

Our flow-level data set, collected over a 2-day period in March 2010 in a large US wireless carrier region covering multiple states and millions of subscribers, shows that HTTP traffic accounts for 82% of the average downstream traffic. HTTP being the killer protocol should not come as a surprise. Indeed, if HTTP has become the workhorse of various applications, ranging from video streaming to data downloads [13, 14], on broadband wireline access links, there is no reason why it should be different for traffic generated by these same computers when they use 3G cards instead. HTTP dominating cellular data traffic naturally raises the question of the potential for HTTP forward caching. First proposed in the 1990s for improving client performance and reducing network cost, a forward cache is an HTTP cache deployed within an Internet Service Provider's (ISP) network that caches all cacheable HTTP traffic accessed by its customers. In contrast to CDNs, a forward cache is deployed for the customers' benefit and under the control of the ISP, rather than for the benefit of the content owner. Would that technology make sense for cellular networks?

We first review the typical architecture of a 3G network. Figure 1 shows a typical Universal Mobile Telecommunication System (UMTS) data network architecture.

[Figure 1: 3G Architecture]

Contrary to a typical wireline architecture, which is relatively flat, cellular networks are highly centralized. A User Equipment (UE) goes through the Radio Access Network (RAN), first to the Node B and then to the Radio Network Controller (RNC), to reach the core network (CN). The CN consists of Serving GPRS Support Nodes (SGSN) and Gateway GPRS Support Nodes (GGSN). The SGSN converts the mobile data into IP packets and sends them to the GGSN through the GPRS Tunneling Protocol (GTP). The GGSN serves as the gateway between the cellular core network and the Internet. This means every IP packet sent to a UE has to go through a GGSN.
Multiple SGSNs are located in Regional Data Centers (RDCs) and GGSNs are collocated in National Data Centers (NDCs). This centralized architecture is ideal for forward caching. Therefore, it is worth studying the tradeoff between the network cost reduction and the additional cost of deploying caches in cellular networks.

One might argue that improving the performance of the core network via caching won't make a significant difference to end-to-end latency, given the high latency of today's 3G RAN networks. This is a short-term issue: the next generation of cellular networks, Long Term Evolution (LTE), currently being deployed, plans to provide RAN latencies under 10 msec [1].

Having made the case for studying forward caching in 3G networks, we first characterize the properties of the cellular HTTP traffic amenable to caching in Section 2. We then develop a cost model that considers the different resource costs involved and computes the cost of using forward caching at different levels of the 3G network hierarchy (Section 3). Network designers can use this model to perform a cost-benefit analysis of deploying forward caches in their network and to determine the appropriate caching solution for their situation. Our results show:

- At the NDC level, the cache hit ratio for the overall population of UEs is 33%.
- Cache hit ratios increase as the UE population size increases. For sizes of 10K or more, different randomly selected UE populations of the same size exhibit significant and similar cache hit ratios.
- Using our caching cost model, we show that in the regime where in-network caching leads to cost savings, caching at the RDC is the most beneficial.

2. TRAFFIC ANALYSIS

2.1 Data Collection

We collected HTTP request and response headers at the interface between the GGSNs and SGSNs in a large 3G wireless network in North America over a 2-day period in March 2010. During this period, we observed millions of UEs, including laptops, smartphones, and regular cellphones, making billions of HTTP requests. To preserve subscribers' privacy, we used a secure hash function (MD5) to hash the URL, host header and nonces into a request identifier.

Since traffic on the SGSN-GGSN interface is encapsulated using the GTP protocol, we were also able to extract directly from the encapsulation header the NDC, GGSN, RDC and SGSN through which a particular HTTP request was served. Unfortunately, our collection method did not allow us to collect the size of the object returned for a particular request, and we therefore report our results in terms of requests only.

2.2 Forward Caching Background

We first introduce some forward caching background. When an HTTP request arrives from a client, the cache forwards the request to the web server (called the origin server) if the request indicates that the client wants a fresh copy of the requested object. Otherwise, the cache checks whether it has a local copy of the object. If not, the object is retrieved from the origin server. If yes, the cache checks whether the copy is stale (TTL expired). If it is not stale, the cache serves the request locally from disk. If it is stale, the cache sends an If-Modified-Since request to the origin server, then serves the object locally if the origin server reports it unchanged, or receives the updated object from the origin server if it has changed. When a response is received from the origin server, the cache serves the object to the client; if the response indicates that the object is cacheable, the object is also written to the local disk.
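The revalidation flow just described maps onto a small amount of cache logic. Below is a minimal, illustrative sketch, not the implementation of any deployed cache; the ForwardCache class, the CachedObject fields, and the caller-supplied fetch_origin interface are all our own assumptions.

```python
import time
from dataclasses import dataclass

@dataclass
class CachedObject:
    body: bytes
    expires_at: float      # absolute expiry time derived from the object's TTL
    last_modified: str     # validator echoed back in If-Modified-Since requests

class ForwardCache:
    """Toy in-memory forward cache following the decision flow above.
    fetch_origin(url, if_modified_since) is a caller-supplied stand-in for the real
    origin fetch; it returns (body, ttl_seconds, cacheable, modified, last_modified)."""

    def __init__(self, fetch_origin):
        self.objects = {}
        self.fetch_origin = fetch_origin

    def get(self, url, want_fresh=False):
        entry = None if want_fresh else self.objects.get(url)
        if entry is not None and time.time() < entry.expires_at:
            return entry.body                              # fresh hit: serve the local copy
        if_mod = entry.last_modified if entry else None    # stale copy: revalidate upstream
        body, ttl, cacheable, modified, last_mod = self.fetch_origin(url, if_mod)
        if entry is not None and not modified:
            entry.expires_at = time.time() + ttl           # origin says our copy is still valid
            return entry.body
        if cacheable:                                      # miss or changed object: (re)store it
            self.objects[url] = CachedObject(body, time.time() + ttl, last_mod)
        return body
```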
2.3 Data Characterization

We next show some highlights characterizing the data used in this study. As we do not have reliable object sizes, we have chosen to use an unlimited cache size for all our evaluations, to understand the maximum obtainable benefits. Experiments in this section were all conducted using data from Thursday, March 25, 2010, 15:00 GMT to Saturday, March 27, 2010, 4:00 GMT, containing many billions of HTTP requests.

The first experiment looks at the cache hit ratio across all the requests. This is equivalent to caching at the NDC level. At the end of the period, the hit ratio was stable at 33.4%. Non-cacheable objects accounted for 31.3% of the overall requests. If the non-cacheable requests are excluded, 48.7% of the cacheable objects are served from cache (33.4% of all requests out of the 68.7% that are cacheable). While there is no previous work for 3G traffic, previous caching work on wireline networks [11, 4] and our own [10] found cache hit ratios ranging from 30% to 49%. The majority of the earlier results that account for all the non-cacheable content are on the lower end of that range, which is similar to our findings here.

We also investigated the number of requests per object, which as expected follows a Zipf distribution. To compare this result to prior work we used the zipfR library [3] to fit a Zipf distribution to the data. The fitted Zipf distribution has an alpha of 0.88. [15] includes a comparative study of multiple papers [7, 12, 5, 8, 9, 6] which performed similar studies more than 10 years ago. These papers reported alpha values between 0.64 and 0.83 when object popularity is measured inside the network, and close to 1 when measured on the end devices. The difference can possibly be explained by the fact that end devices cache popular objects themselves (e.g., in the browser cache) and, therefore, fewer requests for these objects are visible in the network. Our alpha is a bit higher than the high end of the range for in-network measurements. This difference could be due to changes in traffic composition in the years since those older studies were performed. It could also be due to the fact that 3G end devices have fewer resources than a desktop computer and, therefore, are somewhat less likely to cache all objects they might request in the future. On the other hand, the fact that the alpha is still not as close to 1 as prior end-user studies have shown might indicate that even mobile end-devices already perform some level of local caching.
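The paper fits the exponent with the zipfR library in R. As a rough alternative, the rank-frequency exponent can be approximated with a log-log least-squares fit; the sketch below is our own approximation, not the paper's fitting procedure, and assumes per-object request counts are already available.

```python
import numpy as np

def fit_zipf_alpha(requests_per_object):
    """Approximate the Zipf exponent alpha with a least-squares fit of
    log(request count) against log(popularity rank)."""
    counts = np.sort(np.asarray(requests_per_object, dtype=float))[::-1]  # rank 1 = most popular
    ranks = np.arange(1, counts.size + 1)
    # Zipf law: count ~ rank**(-alpha), i.e. log(count) = -alpha * log(rank) + const
    slope, _intercept = np.polyfit(np.log(ranks), np.log(counts), 1)
    return -slope

# Toy usage with made-up per-object request counts (illustrative only):
print(round(fit_zipf_alpha([1000, 520, 340, 260, 210, 170, 150, 130, 120, 110]), 2))
```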
To evaluate the cache hit ratio as the number of UEs that share a cache increases, we conducted a set of experiments. The number of UEs chosen for an experiment was varied from 10 to 1 million, increasing by an order of magnitude at each step. The UEs were chosen randomly from a list of several million UEs that were active during the evaluation period. For sample sizes below 10,000 the experiment was repeated 50 times; all others were repeated 10 times. Figure 2 shows the minimum, median, and maximum values obtained for the experiments.

[Figure 2: Cache hit ratio (y-axis, 0 to 0.5) for different randomly chosen UE population sizes (x-axis, 10 to 1000K, log scale).]

The most interesting part of the result is how the cache hit ratios increase with consumer population size. As we were able to analyze data for such a large population, these results can guide future network designs. For example, if a 4G network is being planned and caching is being considered during the design phase, either for cost or performance reasons, these numbers can guide the carrier in planning the correct number of aggregation points to allow for efficient caching.

Looking at the results, it is not surprising that the cache hit ratio has high variance for populations below 10,000 subscribers. Above that population size, however, caching appears similarly beneficial for all randomly chosen populations of a given size. This indicates that caches deployed for populations of 10,000 or more will have predictable benefits. Not surprisingly, the cache hit ratio increases as the population size increases, and we see the increasing trend even with 1 million UEs, suggesting additional caching gains even at this high level of population aggregation. There is, however, a diminishing-returns trend with increasing UE population size, which would be more apparent if the x-axis were on a linear instead of a logarithmic scale.

3. CACHING ANALYSIS

3.1 Caching Model

We model all wireline costs of delivering data traffic in a 3G network down to the SGSN, and exclude from our analysis the radio network costs and the wireline cost from the SGSN to the UE. This is reasonable since we do not consider changing the caching on the UE itself, so the excluded costs are not affected by any caching scheme deployed within the included wireline network.

We number the different levels in the 3G network hierarchy (see Figure 1) in increasing order of depth, with the NDC, GGSN, RDC, and SGSN at levels 1, 2, 3, 4 respectively. Let e_i denote the number of instances/locations (e.g., NDCs, GGSNs, etc.) at level i. We consider caching at a single level in the 3G network hierarchy, with a forward cache deployed at every location in the selected level i (1 <= i <= L = 4). We also assume that a UE stays under the same SGSN from when an object is requested to its delivery time. We assume unlimited cache size and processing power per cache, but a cost proportional to cache size and processing throughput.

Let n be the network-wide total number of requests from all UEs over the time interval of interest. For the forward cache at the j-th location at level i, we count the following: (i) the number of requests arriving from UEs which are leaves of the subtree rooted at this location (n_{i,j}); (ii) the number of these requests that are for cacheable objects but which require fetching the requested object from its origin server (n^{orig}_{i,j}); (iii) the number of requests for which the requested objects are served from the cache (n^{cache}_{i,j}); and (iv) the number of requests for which the cache has a stale copy of the object (based on object validity timestamps) and therefore needs to send an If-Modified-Since request to the origin server to check whether a new copy of the object needs to be downloaded (n^{ifmod}_{i,j}). Note that for some of the n^{ifmod}_{i,j} requests, the origin server will indicate that the cached version of the object is still valid, in which case the actual object will be served to the UE from the local cache. Finally, let n^u_{i,j} denote the total number of unique cacheable objects requested over the observation time interval. Let p denote the mean size (in bytes) of a requested object, and q the mean overhead (in bytes) associated with servicing an If-Modified-Since request.

We need the following cost metrics per byte of traffic at the caching infrastructure: s, the disk storage cost; c, the CPU usage cost; and d, the disk bandwidth cost.
Let b_{i,i+1} and t denote, respectively, (i) the bandwidth-mile cost per byte on the network path between two adjacent levels i and i+1 (1 <= i <= L-1) in the 3G network, and (ii) the transit cost per byte that the 3G operator pays to its upstream provider network. Define B_{l,m} to be the bandwidth-mile cost per byte on the network path between levels l and m (1 <= l < m <= L). Then B_{l,m} = \sum_{k=l}^{m-1} b_{k,k+1}.

3.2 Cost Analysis: Caching at Level i

The overhead of serving the requests using caching at level i is O^{cache}_i = \sum_{j=1}^{e_i} O^{cache}_{i,j}, where O^{cache}_{i,j} is the cost of serving the requests arriving at the part of the 3G network served by the j-th cache at level i. O^{cache}_{i,j} = R_{i,j} + N_{i,j} + T_{i,j}, where R_{i,j} is the resource usage (disk storage, disk bandwidth and CPU usage) at the caching system, N_{i,j} is the cost of serving the objects over the 3G network, and T_{i,j} is the transit cost that the 3G operator pays to its upstream provider network. For ease of exposition, we define the corresponding network-wide cost components across all the caches at level i: R_i = \sum_j R_{i,j}, N_i = \sum_j N_{i,j}, T_i = \sum_j T_{i,j}. Then

    O^{cache}_i = R_i + N_i + T_i    (1)

We next compute each of these components.

Computing N_i: N_{i,j} is the sum of the following:

1. The bandwidth-mile cost (n_{i,j} - n^{cache}_{i,j}) * p * B_{1,i} on the network path between the NDC and the caching server, incurred when requested objects need to be fetched from the origin server, either because an object was not in the cache or because the cache had a stale copy that needed to be updated.

2. The additional bandwidth-mile cost n^{ifmod}_{i,j} * q * B_{1,i} incurred by the If-Modified-Since requests on the network path between the NDC and the caching server (the cost of the actual object download is already accounted for in item 1).

3. The bandwidth-mile cost n_{i,j} * p * B_{i,L} incurred on the network path from the caching location down to the UEs; this includes only the wireline costs between the cache and the UE, as the radio network cost is excluded. Every request contributes to this cost.

Let V^i = \sum_j n_{i,j} * p and V^i_1 = \sum_j (n_{i,j} - n^{cache}_{i,j}) * p + \sum_j n^{ifmod}_{i,j} * q. Here V^i_1 is the total traffic volume carried between the origin servers and the caches at level i, and V^i is the total traffic corresponding to all the incoming requests to the network. Since the latter is independent of the caching level, we drop the superscript and use V to refer to this term in the remainder of the paper. Then

    N_i = \sum_j N_{i,j} = V^i_1 * B_{1,i} + V * B_{i,L}    (2)

Transit T_i: The transit cost T_{i,j} for traffic between the NDC and the provider network is incurred for (i) objects that need to be fetched from the origin server and (ii) servicing If-Modified-Since requests: T_{i,j} = ((n_{i,j} - n^{cache}_{i,j}) * p + n^{ifmod}_{i,j} * q) * t. Then

    T_i = \sum_j T_{i,j} = V^i_1 * t    (3)

Computing R_i: R_{i,j} is the sum of the following:

1. The storage overhead cost n^u_{i,j} * p * s at the cache.

2. The disk bandwidth cost (n^{cache}_{i,j} + n^{orig}_{i,j}) * p * d at the caching system, for reading from disk (on a cache hit) or writing to disk (for a new cacheable object, or when replacing an old object with an updated version).

3. The CPU and system bus overhead at the caching system, n_{i,j} * p * c.

Let V^i_2 = \sum_j n^u_{i,j} * p and V^i_3 = \sum_j (n^{cache}_{i,j} + n^{orig}_{i,j}) * p. V^i_2 and V^i_3 are, respectively, the total volume of traffic stored at the level-i caches and the total volume corresponding to requests for cacheable objects. Then

    R_i = \sum_j R_{i,j} = V^i_2 * s + V^i_3 * d + V * c    (4)
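For concreteness, the sketch below evaluates O^{cache}_i from per-cache request counts and unit costs, following equations (1)-(4) above. This is our own illustrative rendering of the model, not code from the paper; the class, function, and parameter names are assumptions, and the caller must supply the counts and cost values.

```python
from dataclasses import dataclass

@dataclass
class CacheCounts:
    """Request counts for one cache location j at the chosen level i
    (mirroring n_{i,j}, n^{cache}_{i,j}, n^{orig}_{i,j}, n^{ifmod}_{i,j}, n^u_{i,j})."""
    total: int    # n_{i,j}:   all requests from UEs below this cache
    hits: int     # n^{cache}: requests served from the cache
    fetches: int  # n^{orig}:  cacheable objects fetched from the origin server
    ifmod: int    # n^{ifmod}: If-Modified-Since revalidations sent upstream
    unique: int   # n^u:       distinct cacheable objects stored

def level_cost(caches, p, q, s, c, d, t, B_1i, B_iL):
    """O^{cache}_i = R_i + N_i + T_i for caching at level i, per equations (1)-(4)."""
    V  = sum(x.total * p for x in caches)                           # all incoming traffic
    V1 = sum((x.total - x.hits) * p + x.ifmod * q for x in caches)  # origin <-> cache traffic
    V2 = sum(x.unique * p for x in caches)                          # volume stored at the caches
    V3 = sum((x.hits + x.fetches) * p for x in caches)              # disk reads plus writes
    N_i = V1 * B_1i + V * B_iL      # (2) bandwidth-mile cost
    T_i = V1 * t                    # (3) transit cost
    R_i = V2 * s + V3 * d + V * c   # (4) cache resource cost
    return N_i + T_i + R_i          # (1)
```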
3.3 Caching Benefits

In the absence of caching in the 3G network, each request incurs the cost of traversing the entire 3G hierarchy from the NDC to the UE and of transiting the 3G-upstream provider interface. As pointed out before, this includes only the wireline costs down to the SGSN; the network from the SGSN to the UE is excluded. The total cost for serving the requests can be computed as

    O^{nocache} = n * p * B_{1,L} + n * p * t = n * p * (B_{1,L} + t) = V * (B_{1,L} + t)    (5)

and the caching benefit at level i is

    O^{nocache} - O^{cache}_i = (V, V^i_1, V^i_2, V^i_3) * (-c + B_{1,i} + t, -t - B_{1,i}, -s, -d)^T

(note that B_{1,i} = B_{1,L} - B_{i,L}). This equation is basically the product of a traffic-pattern vector (V, V^i_1, V^i_2, V^i_3) and a cost-parameter vector. Table 1 shows the traffic-pattern vector (normalized by V to preserve data confidentiality), computed from over a billion HTTP requests arriving over a 12-hour period at the large North American 3G provider described earlier. This realistic traffic-pattern vector can be plugged into the formulas to evaluate caching benefits at different levels for a different network where such traffic data are not readily available.

The formula for the caching benefit at level i can be rearranged into the following form, which makes it apparent that every cost parameter has a linear impact on the total cost when the other parameters are fixed:

    (-V, -V^i_2, -V^i_3, V - V^i_1, V - V^i_1) * (c, s, d, t, B_{1,i})^T
        = (-V, -V^i_2, -V^i_3, V - V^i_1, V - V^i_1) * (c, s, d, t, \sum_{k=1}^{i-1} b_{k,k+1})^T,

as shown in the second-to-last column of Table 1. It is also apparent that no single parameter has a dominating impact on the total cost, because the coefficients of the factors are close to each other (ranging from 0.23 to 1).

3.4 Simplifying Cost Parameters

As the number of cost parameters is large, we now introduce a practical approach to simplifying them where it makes sense to do so. We note that, although different networks probably have different cost parameter values, our simplification approach below can apply to
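As a standalone illustration of the Section 3.3 benefit formula, the sketch below evaluates O^{nocache} - O^{cache}_i as the dot product of the traffic-pattern vector and the cost-parameter vector. The numbers in the usage line are made up for demonstration and are not the Table 1 values.

```python
def caching_benefit(V, V1, V2, V3, c, s, d, t, hop_costs):
    """O^{nocache} - O^{cache}_i as the dot product of the traffic-pattern vector
    (V, V^i_1, V^i_2, V^i_3) and the cost-parameter vector from Section 3.3.
    hop_costs lists b_{1,2}, ..., b_{i-1,i}, so B_{1,i} = sum(hop_costs)."""
    B_1i = sum(hop_costs)
    traffic = (V, V1, V2, V3)
    costs = (-c + B_1i + t, -t - B_1i, -s, -d)
    return sum(x * y for x, y in zip(traffic, costs))

# Illustrative, made-up numbers only: a positive result means caching at that
# level reduces total cost relative to no caching.
print(caching_benefit(V=1.0, V1=0.7, V2=0.3, V3=0.8,
                      c=0.2, s=0.1, d=0.1, t=0.5, hop_costs=[0.2, 0.3]))
```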