Optimized Content Distribution in a Push-VoD Scenario

Sandford Bessler
Telecommunications Research Center Vienna (ftw.)
Donau-City 1, A-1220 Vienna, Austria

Abstract: This paper discusses a novel approach for distributing a large number of multimedia (video) titles to users via a managed network infrastructure based on DSL technology. The approach, sometimes called push-VoD, is based on multicasting and the direct download of the content to user disks, followed by local but delayed playback of the video. In a first step, we analyze the optimization problems related to the efficient packing of multicast trees on capacity-constrained links. Then we show that the model in which the user requests several alternative titles for download performs much better than the single-title case. Finally, we consider the multiperiod scheduling performance and describe ways to reduce both the number of downloads of the same title and the delivery time of less frequent titles.

Keywords: push video on demand, multicast, tree packing, multiple knapsack, mixed integer programming, content admission scheduling

I. INTRODUCTION

Internet traffic statistics recently showed a decrease in the share of P2P traffic in favor of video traffic, which today is easily embedded in community web sites such as YouTube. Analysts warn that a further increase of video traffic in the Internet would change the economic model of ISPs, eventually forcing them to block the distribution of video streams to their users. The long-awaited Video on Demand (VoD) service still has top priority among Internet users, especially if existing content could be made globally available. Meanwhile, most cable and network providers have adopted a closed model that serves their own user population and is based on a relatively small number of local and popular content titles, but enables full control of the content distribution process.
However, when the number of users and the number of available titles grow into the range of hundreds of thousands, even the largest CDN networks fail, as the March Madness experiment showed [1]. Proposals for a world-wide VoD service are currently based on two approaches. In the P2P approach [2], the content is downloaded or streamed to the disks of user peers. Variations of this model, such as push-to-peer [10], optimize the distribution and partition the videos into smaller chunks. The global amount of traffic does not decrease with this approach, but congestion on the main paths from the central servers is avoided. The second approach is push-VoD, adopted by Stratacache [3] and Intercast [4], in which user-requested content is downloaded directly to the user disks during hours of low network utilization (at night) using multicast transmission. The video content (which in this case can also be high-definition content) is played later, locally. A comparison of the two approaches goes beyond the scope of this paper; we merely want to investigate the efficiency gained by building optimized multicast trees in the push-VoD architecture.

A. Related work

The key to a scalable video on demand service is the use of multicast, a technology extensively studied over the last 20 years, in particular in the context of IPTV streaming and multiparty conferencing. Most optimization problems involving multicast algorithms have focused on routing and on caching at intermediate nodes (see [7], [6] for surveys on multicast optimization problems and VoD services). Specifically, capacity planning for the necessary bandwidth in a multicast network is modelled as a packing problem, or a multiple knapsack problem [8], [9]. The Steiner tree packing problem [13] arises when, in a meshed network, trees with a common node and a set of terminal nodes have to be constructed in such a way that they satisfy edge capacity constraints.
The specific goal is to maximize the number of coexisting trees. In [8] and [9] the objective of the multicast packing problem is to minimize the load of the most congested link, where the link load is the sum of the traffic demands of all multicast groups that traverse that link. The variables are the same as in the Steiner tree problem. Both formulations assume that a feasible solution always exists, meaning that the network can support all the given multicast groups.

The problem we address in this work is different. Since not all the requested content can be delivered at once, we concentrate on the admission and the scheduling of the content delivery into the network. We assume a fixed routing, i.e., the path from the server to each user is fixed, so the multicast nodes are not variables of the problem. Instead, given the link capacities, we first look for a feasible subset of trees such that the number of served terminal nodes (users) is maximized.

In a second step we repeat this procedure periodically and aim to optimize the schedule of downloads, considering the waiting time from the moment a user has selected a content up to its delivery. If each user selects one single title, the tree construction is straightforward and the problem to be solved is the selection of a subset of the trees. In a more realistic scenario, users select a number of titles, but at most one of them is downloaded per user within the considered period. To our knowledge such a case has not been investigated before, although it leads to a substantial increase of the multicast efficiency. The formulation of this problem and the evaluation of its performance are the main contributions of this work.

The rest of the paper is organized as follows: Section 2 presents a simple and an improved optimization model. Section 3 reports on simulation results and compares the performance of both models. Section 4 analyzes the long-term scheduling of content downloads.
In Section 5 we give concluding remarks and further research directions.

II. TREE ADMISSION MODELS

A. Single title model

For this basic model we consider a snapshot of the network load and of the demand situation. We assume the content consists of a collection of $T$ video titles that have to be distributed from one central server via a tree network. The links forming this tree are the set $E$. At the bottom of the tree are $U$ users equipped with large disks. The simplifying assumption is that for each user a path is predefined, so that the load of the corresponding links can be determined. This is equivalent to a hierarchical distribution network, a typical configuration at most DSL network providers.

To simplify the scheduling problem, we assume that the download of any title occurs in a time slot of fixed duration. This can be achieved by adapting the download rate $d_k$, $k \in T$, of each content tree.

As mentioned in the introduction, the key mechanism to reach scalability with respect to the number of titles and to the number of served users is the use of multicast at the intermediate nodes. In the basic admission model each active user selects one title to download. The titles follow a long-tail distribution (i.e., according to Zipf's law).

The number of possible multicast trees clearly equals the number of titles $T$, but only a part of them can eventually be carried by the network capacity. Each tree $k$ has the demand $d_k$ and $n_k$ users on it. A straightforward objective is to select those trees that maximize the number of served users. If the routing is fixed, then this integer problem is equivalent to a multiple knapsack problem (one knapsack for each link):

P1:
$$\max \sum_{k \in T} n_k y_k \qquad (1)$$
subject to:
$$\sum_{k \in T} y_k d_k r_{ek} \le c_e, \quad e \in E \qquad (2)$$

where $n_k$ is the number of users that requested the content $k$, and the binary variable $y_k \in \{0, 1\}$ is one when the content tree $k$ is selected for download and zero otherwise.
The routing vector $r$ has components $r_{ek} = 1$ if the tree $k$ contributes traffic to the link $e$, and 0 otherwise; $c_e$ is the capacity of the link $e$. It is well known that 0-1 single knapsack problems are NP-hard [11]. However, because they are not hard in the strong sense, they can be solved by dynamic programming in pseudo-polynomial time. On the other hand, 0-1 multiple knapsack problems are NP-hard in the strong sense [12], so no pseudo-polynomial algorithm can exist (unless P = NP).

B. Multiple title model

The single title model is rigid: it does not support a realistic business model and has poor optimization potential. To extend it, we consider that each user selects $P$ titles for download, while the order in which these titles are downloaded is decided by the system. To simplify things we assume that these $P$ titles are selected at random by the user from the set of all titles $T$. The optimization problem to be solved is therefore to decide which content should be selected for download for each user in the current period.

Commercial DVD rental systems such as Netflix [5] have a similar model, selling a package with a limit on the number of titles per week or viewing hours per week. Technically, in the single title model, a content tree that has been selected for download contains all the users with that content. For multiple titles the situation is more complex, as we can see from the example in Fig. 1.

[Fig. 1. Example. Objective: maximize the number of served users subject to capacity constraints. Users: a, b, c, d. Titles: 1, 2, 3, 4. Link capacities: e1 = 2, e2 = 4, e3 = 2, e4 = 4, e5 = 5, e6 = 5. Title download capacities: d_1 = 2, d_2 = 3, d_3 = 1, d_4 = 3. Candidate solutions: T_2 - {a} + T_1 with f = {2, 3, 2, 3}; T_2 + T_1 - {a} with f = {3, 3, 2, 3} (non-feasible); T_1 + T_3 + T_4 - {c} with f = {2, 1, 2, 3}; ...]

In the example we have four users a, b, c and d, and four titles denoted "1", "2", "3" and "4". Each user has selected two titles, such that the possible trees are $T(2) = \{a, b, d\}$, $T(1) = \{a, c\}$, $T(3) = \{b\}$ and $T(4) = \{c, d\}$.
We see that if we include in the solution the tree for content 2 and the tree for content 1, we have to remove user "a" from one of the trees, because of the constraint that at most one title can be downloaded by a user at a time.

We denote by $x_{iu}$ the binary variable that becomes one if the $i$-th of the $P$ titles of user $u$ is selected for download. The binary variable $z_{ek}$ describes whether the content title $k$ contributes to the load on link $e$. We formulate this as a disjunctive constraint:

$$z_{ek} = \begin{cases} 0 & \text{if } \sum_{i \in P} \sum_{u \in U_{ek}} x_{iu} = 0 \\ 1 & \text{otherwise} \end{cases} \qquad (3)$$

where the set $U_{ek}$ contains those users that request the content $k$ and contribute (by their fixed routing) to the load on link $e$:

$$U_{ek} = \{ u : a_{kui} = 1 \text{ AND } r_{eui} = 1 \}$$

The elements $a_{kui}$ of a binary matrix denote the current assignment of titles to users, whereas the elements $r_{eui}$ denote the fixed routing of each user content $(u, i)$ to a link $e$.

Equation (3) therefore states the multicast flow conservation (which is not flow addition!): as soon as one single user requests a content $k$, all the links on the path up to the server are affected by the load caused by $k$.

To express this dichotomy in a form that can be processed by a solver, the standard way is to use the big-M formulation [16], where $M$ is a large integer:

$$z_{ek} \le M \sum_{i \in P} \sum_{u \in U_{ek}} x_{iu} \qquad (4)$$
$$z_{ek} \ge \frac{1}{n_k + 1} \sum_{i \in P} \sum_{u \in U_{ek}} x_{iu} \qquad (5)$$

where $n_k$ is the number of occurrences of the content $k$ among the users, so that the right-hand side of (5) is between zero and one. We are now able to formulate the problem for the multiple title case (P2), where we use one summation symbol for clarity:

P2:
$$\max \sum_{u \in U, i \in P} x_{iu} \qquad (6)$$
subject to:
$$\sum_{k \in T} d_k z_{ek} \le c_e, \quad \forall e \in E \qquad (7)$$
$$\sum_{i \in P} x_{iu} \le 1, \quad \forall u \in U \qquad (8)$$
$$z_{ek} \le M \sum_{u \in U, i \in P} a_{kui} r_{eui} x_{iu} \qquad (9)$$
$$z_{ek} \ge \frac{1}{n_k + 1} \sum_{u \in U, i \in P} a_{kui} r_{eui} x_{iu} \qquad (10)$$

We have replaced the sets $U_{ek}$ in (4) by using the elements $a_{kui}$ and $r_{eui}$ as defined above.
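For an instance as small as the Fig. 1 example, P2 can be checked by exhaustive enumeration. The sketch below is only an illustration and not the AMPL/CPLEX model used in the paper. The user selections follow from the trees listed for the Fig. 1 example, and the rates and capacities are the values given in the figure. The topology is an assumption (e1 to e4 taken as the access links of users a to d, e5 shared by a and b, e6 shared by c and d), and all identifiers are hypothetical. A title is charged once per link no matter how many users below that link receive it, which is the multicast behavior that the variables z_ek encode.

```python
from itertools import product

# Titles each user selected (P = 2), derived from the trees of the Fig. 1 example.
SELECTED = {"a": (1, 2), "b": (2, 3), "c": (1, 4), "d": (2, 4)}
RATE = {1: 2, 2: 3, 3: 1, 4: 3}  # d_k, download rate of title k
CAPACITY = {"e1": 2, "e2": 4, "e3": 2, "e4": 4, "e5": 5, "e6": 5}  # c_e
# ASSUMED fixed routing: links on the path from the server to each user.
PATH = {"a": ("e5", "e1"), "b": ("e5", "e2"), "c": ("e6", "e3"), "d": ("e6", "e4")}


def link_loads(assignment):
    """Return the load per link; a title counts once per link (multicast)."""
    titles_on_link = {e: set() for e in CAPACITY}
    for user, title in assignment.items():
        if title is not None:
            for e in PATH[user]:
                titles_on_link[e].add(title)  # corresponds to z_ek = 1
    return {e: sum(RATE[k] for k in ks) for e, ks in titles_on_link.items()}


def is_feasible(assignment):
    """Capacity constraints (7): the load on every link stays within c_e."""
    loads = link_loads(assignment)
    return all(loads[e] <= CAPACITY[e] for e in CAPACITY)


def solve_p2():
    """Enumerate all assignments and keep one that serves the most users."""
    users = sorted(SELECTED)
    best, best_served = None, -1
    # Constraint (8): each user receives at most one selected title (or None).
    for choice in product(*[(None,) + SELECTED[u] for u in users]):
        assignment = dict(zip(users, choice))
        served = sum(t is not None for t in choice)
        if served > best_served and is_feasible(assignment):
            best, best_served = assignment, served
    return best, best_served


best_assignment, best_served = solve_p2()
print(best_served, best_assignment)
```

Enumeration grows as (P + 1)^U and is only practical for toy instances; realistic sizes need a mixed integer solver, as in the experiments reported in Section III.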
In addition to the capacity constraints (7), we have the constraint (8) that at most one of the $P$ titles of a user may be selected. In this formulation, the problem has $U \cdot P + T \cdot E$ variables and about $U + 2\,T \cdot E$ constraints.

III. NUMERICAL RESULTS

The models described above have been implemented in AMPL and solved with the CPLEX solver from ILOG on a Pentium 4 CPU with 1 GB RAM. The considered distribution network is a hierarchical DSL network with three levels: each group of M1 users is connected over DSL lines to a node (equivalent to the DSL access multiplexer). The capacity of these lines does not pose any constraint in our problem. A second level of M2 links of capacity C2 are connected to a node, and a third level of M3 links of capacity C3 all lead to the central server, so that the number of edges in the network is $E = M_2 + M_3$.

The $T$ available titles have a long-tail distribution, approximated in our case by a square function. Thus, in order to randomly generate an index $k$ of a title, we use $k = T r^2$, where $r$ is a random function returning values between zero and one and $T$ is the number of titles. In Fig. 2 we depict the title frequency distribution for T = 400, P = 5, i.e., the number of users that have requested a certain title index. A second curve in the figure describes the distribution of the titles in the solution of the problem P2 during a single time period.

[Fig. 2. Title request frequency. x-axis: title index; curves: request distribution, solution distribution.]

As mentioned before, we denote by $d_k$ the download rate for the content with index $k$, which is a function of the movie size.
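The sampling rule k = T r^2 used to generate title requests can be sketched as follows. This is a minimal illustration and not the authors' generator: the function names, the zero-based index obtained by truncation, the choice of distinct titles per user, and the use of Python's `random` module are all assumptions.

```python
import random
from collections import Counter


def sample_title(num_titles, rng):
    """Draw a title index in [0, num_titles) via k = floor(T * r^2)."""
    r = rng.random()                # uniform in [0, 1)
    return int(num_titles * r * r)  # squaring skews draws toward small indices


def sample_requests(num_users, per_user, num_titles, seed=0):
    """Give each user `per_user` distinct titles and count requests per title."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(num_users):
        chosen = set()
        while len(chosen) < per_user:  # ASSUMPTION: redraw duplicate titles
            chosen.add(sample_title(num_titles, rng))
        counts.update(chosen)
    return counts


counts = sample_requests(num_users=1000, per_user=5, num_titles=400)
head = sum(counts[k] for k in range(100))       # lowest quarter of the index range
tail = sum(counts[k] for k in range(300, 400))  # highest quarter of the index range
print(head, tail)
```

Under this rule a title index below T/4 is drawn with probability 1/2 (r < 0.5), so low indices play the role of the popular titles and high indices form the long tail.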
$d_k$ is generated at random between 1.5 Mbps and 3 Mbps and corresponds to MPEG-2 coded movies that can be downloaded in a constant period of time.

To tighten the bounds of the mixed integer problem, CPLEX applies several cutting-plane techniques (Gomory cuts, clique cuts, etc.) and solves small and medium instances of up to 5000 users and 1000 titles in several minutes.

In order to evaluate the impact of $P$ on the number of served users, we solve the problem P2 with increasing user populations. In Fig. 3 we present a family of three curves for three values of $P$ (1, 3 and 5). The number of served users on the y axis is relative to the total number of active users, $U$. The curves show a significant increase in the distribution efficiency for the multiple title model ($P > 1$). Even a small number of alternatives (P = 5) is enough to boost the performance so that 70% of the users are served within the period.

However, the high percentage of 70% depends on several factors, such as the network capacity and the number of available titles $T$. In the experiments, the capacities have been selected to constrain the network, so that only a small part of the selected content is downloaded in one time period. Note that if the number of titles increases, the multicast trees become smaller, serving fewer users. In Fig. 4 we show the impact of increasing $T$. We observe a stronger and continuous decrease in the multicast performance for higher values of $P$ (P = 5).

[Fig. 3. Served users for different P values for T = 100. x-axis: active users, U; y-axis: served users (%); curves: P = 1, P = 3, P = 5.]

[Fig. 4. Impact of the number of available titles T on the objective for U = 1000. x-axis: available titles, T; y-axis: served users (% U); curves: P = 1, P = 3, P = 5.]

The results are meaningful if they are extrapolated to larger instances of the problem.
In other words, about 30% of the users can be served when the number of video titles equals the number of users, for example 1000 or 10,000.

IV. MULTIPERIOD CONTENT SCHEDULING

In the previous sections we have addressed the admission of a set of multicast groups (trees) during a download time slot. The selected objective function is the number of served users. Such an objective favors large trees of popular content because of the larger number of contributing users. Thus, popular content also tends to be downloaded more often.

There is extensive research on near-VoD systems, in which a mechanism called batching is used to wait until several requests for the same video title can be served (by multicast). In [15], for example, several scheduling policies are compared with the aim of minimizing reneging user behavior (canceling the request), reducing waiting time and maximizing fairness (for less popular movies). In the simple push-VoD approach, we can relax the reneging aspect since the waiting time is much larger and the user knows that.

However, the question remains how to schedule the downloads in the longer term. The two aspects we focus on are a) the redundancy created by frequent downloads of the same title and b) the time a user has to wait for some rare content to be delivered.

Concerning a), we can make an important observation from the example in Fig. 1: to resolve the ties between competing titles, the corresponding trees are reduced, meaning that for some of the users interested in that particular content it will have to be downloaded again during the following time periods. In addition, new requests for that content in the following periods will increase the multicast group.

In order to reduce frequent downloads of a certain title we have to discourage the admission of a content that has been selected in the last or a very recent period.
A weight function in the objective can achieve this, and with the same weight function we can address requirement b) as follows: we propose to give higher weight to "longer waiting" contents. After a number of waiting periods, the weight increases faster (penalty function). The result is the function $w_k(t)$, $k \in T$, with $w_k(0) = 1$ and parameters $c_1, c_2, c_3, t_M > 0$:

$$w_k(t) = \begin{cases} c_1 & \text{if } k \text{ is downloaded in period } t \\ w_k(t-1) + c_2 & \text{if } t \le t_M \\ w_k(t-1) + c_3 & \text{if } t > t_M \end{cases} \qquad (11)$$

This piecewise linear function is set to a low value after the content $k$ has been downloaded, increases linearly, and then acts as a penalty function through a steeper further increase. The objective in (6) therefore becomes:

$$\max \sum_{u \in U, i \in P} w_{k(u,i)}(t)\, x_{iu} \qquad (12)$$

where $k(u,i)$ is the content index of the $i$-th title of user $u$.

By introducing the weights $w$ in the objective, we improve our model by accounting for "soft" fairness criteria in the following way: large multicast trees (with many users) are part of the solution only if the corresponding content has been waiting for some time to be downloaded (the weight is large enough). We have performed experiments with different weight functions, piecewise linear and exponential, $w_k(t) = w_k(t-1) \cdot c_2$, $c_2 > 1$, and investigated the number of users served, the distribution of downloads over the content index and the distribution of waiting times.

In order to perform the simulation experiments, we run the optimization algorithm P2 in a loop, aiming to reach a steady state (independent of the initial conditions). After each iteration, all the downloaded contents are replaced with new random contents simulating the user selection, such that each user always has $P$ chosen contents for download, and then the optimization step is repeated. For the linear weights according to (11), we set $c_1 = 1$, $c_2 = 1$, $c_3 = 10$, $t_M = 10$; for the exponential weights we set $c_2 = 2$.

In Fig. 5 we depict the distribution of the waiting times (in periods) for different values of $T$.
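The weight update (11) and its exponential variant can be sketched as follows. This is a minimal illustration under one assumption that the text leaves open: t is interpreted here as the number of periods a content has waited since its last download. The function names are hypothetical, as is the reset of the exponential weight to c_1 after a download. The default parameters are the values reported for the experiments (c_1 = 1, c_2 = 1, c_3 = 10, t_M = 10 for the linear weights and c_2 = 2 for the exponential ones).

```python
def linear_weight(prev, waited, downloaded, c1=1.0, c2=1.0, c3=10.0, t_m=10):
    """One step of the piecewise linear weight in (11).

    prev       -- w_k(t-1)
    waited     -- periods waited since the last download (ASSUMED meaning of t)
    downloaded -- True if content k is downloaded in this period
    """
    if downloaded:
        return c1         # reset to a low value after a download
    if waited <= t_m:
        return prev + c2  # slow linear growth
    return prev + c3      # steeper growth acts as a penalty


def exponential_weight(prev, downloaded, c1=1.0, c2=2.0):
    """Multiplicative variant w_k(t) = w_k(t-1) * c2; the reset to c1 is assumed."""
    return c1 if downloaded else prev * c2


# Trace a content that waits 12 periods and is then downloaded.
w_lin, w_exp = 1.0, 1.0  # w_k(0) = 1
history = []
for t in range(1, 13):
    w_lin = linear_weight(w_lin, t, downloaded=False)
    w_exp = exponential_weight(w_exp, downloaded=False)
    history.append((t, w_lin, w_exp))
w_after_download = linear_weight(w_lin, 13, downloaded=True)
print(history[9], history[11], w_after_download)
```

With these parameters the linear weight reaches 11 after ten waiting periods and then grows by 10 per period, while the exponential weight doubles every period, so a long-waiting content overtakes popular titles much sooner in the exponential case.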
The capacities of the network links are kept constant during the experiments.

[Fig. 5. Waiting time distribution. Curves: T = 100, T = 200, T = 300, T = 400.]

Instead of explicitly controlling the number of multicast groups or channels (which is difficult since this model is based on capacity constraints), we increase the number of titles, with the effect that the efficiency of multicast decreases and the waiting times increase. The curves show, as expected, that if the long tail of rare (cold) movies becomes longer, the whole distribution of the waiting time (histogram) worsens. However, the penalty function associated with the weight $w$ avoids the creation of long waiting times. It is clear that the selection of the operating point, i.e., the maximal waiting time or delivery quality, depends on several parameters: the capacity of the links and the number of available titles. Another parameter of the system is the arrival rate of download requests. In the current model each downloaded content is immediately replaced by another content requested at random. In practice, a soft limit on the number of downloads per user and per week can be introduced, using for example an exponential moving average scheme.

To measure the fairness of the scheme, we proceed similarly to [15]: we divide the $T$ titles, sorted according to their index, into bins, each bin summing up the same number of requests (see Fig. 2). For example, 1000 users each selecting 5 titles make 5000 requests, which are divided into 10 bins of 500 requests each, starting with the most popular titles. Then we count the number of downloaded titles in each bin. In Fig. 6 we compare the effect of the weights in (12) on the distribution of downloads in each bin. The effect of weights on long-waiting contents is to increase their download frequency, and it is most pronounced for the exponential (multiplicative) function.
For this case the waiting time decreases to eight periods, compared to 16 for the additive weight of Fig. 5. Further work has to be done to better understand the effects of the different system parameters on the waiting times.

Summarizing, in the multiperiod scheduling of contents there is a trade-off between the efficiency of the multicast, expressed for example by the number of served users, and the (maximum) waiting time for any content. This trade-off is controlled by the proposed weight function in the objective.

[Fig. 6. Distribution of downloads. x-axis: content quantile; y-axis: titles; curves: exponential; linear, c = .5; constant, c = 0.]

V. CONCLUDING REMARKS AND FURTHER RESEARCH

We believe that the push-VoD approach is a viable and scalable solution for the increasing video traffic in the Internet. Once the user accepts the new model of delayed playback, we get a system that is technically much simpler and commercially cheaper than the true- and near-VoD systems extensively studied 10 years ago. Indeed, the new approach has many important advantages:

• no interaction model: VCR actions are local, giving full freedom to the user (some advertisements can be downloaded as well and presented during the playback)
• less stringent QoS requirements than for streaming; buffering and preloading schemes are simpler
• service latency is a feature and need not be minimized; batching has much larger time constraints and is fully controlled by the server; defection can be relaxed, but probably has to be modeled differently

However, more research is needed to further develop the idea of push-VoD. We mention only a few aspects in the following:

• in the normal case, the topology of the content distribution network goes beyond the controlled domain of a network provider and includes additional nodes where content is cached. Therefore, the whole distribution chain from content provider down to the user has to be carefully engineered.
• the Digital Rights Management problem is crucial in the push-VoD case, as the entire content is downloaded to user disks
• reliable, error-correcting IP multicast schemes supporting heterogeneous user terminals are needed
• new push-VoD application layer protocols are needed between the user and the service provider

In our work we have investigated the optimization potential of packing multicast trees efficiently in order to maximize the number of served users (with a downloaded movie). We have shown that if users may select several titles instead of only one, then substantially more users can be served with the same network.