A Novel Fast Restoration Mechanism for Optical Burst Switched Networks

Yufeng Xin*, Jing Teng†, Gigi Karmous-Edwards‡, George N. Rouskas†, Daniel Stevenson‡

* Corresponding author, UMIACS, University of Maryland, College Park, MD. Email: yxin@umiacs.umd.org
‡ ANR, MCNC-RDI, RTP NC 27709
† Computer Science Department, NC State University, Raleigh NC 27609

Abstract

Survivability is a critical network design issue for optical networks, since even a single failure of short duration may result in huge data loss due to the large capacity of optical fibers. However, few studies have addressed this issue for optical burst switching (OBS) networks. In this paper, we extend our earlier work on the fast restoration technique for OBS networks and present a novel fast restoration mechanism based on distributed deflection routing. Compared to other survivability schemes, the proposed mechanism has the advantage of a fast, low-overhead fault management process and demonstrates excellent burst blocking performance by balancing the deflected traffic load during the restoration process.

1 Introduction

Internet traffic is growing very fast and demanding more bandwidth and better bandwidth utilization. The bursty nature of Internet traffic has shifted recent research focus from optical circuit switching to optical packet switching (OPS) and optical burst switching (OBS), which feature fast service provisioning and efficient bandwidth usage [4]. OBS is usually viewed as a technique bridging optical circuit switching and optical packet switching. Since the techniques for buffering and header processing in the optical domain are still immature, OBS protocols were designed to avoid optical buffering and to perform fast setup of the end-to-end data lightpath using out-of-band header processing. One representative protocol for OBS networks is the Just-In-Time (JIT) signaling protocol.
Supporting variable payload size and designed for unslotted OBS switches, JIT requires no synchronization at the switches [6]. The bufferless mesh optical network under the control of the JIT signaling protocol is referred to as the JIT network in this paper.

§ The work was done when the author was with MCNC-RDI.

While the majority of previous studies have focused on resource reservation and scheduling, burstification, and performance analysis [5], few studies have addressed the survivability issue for OBS networks. Network survivability is a critical design issue for all types of optical networks, since even a single failure of short duration, such as a fiber cut or an interface card malfunction, may result in huge data loss due to the large capacity of optical channels. Unfortunately, the probability of such failures is not low. FCC statistics show that metro networks annually experience 13 cuts for every 1000 miles of fiber, and long-haul networks experience 3 cuts for every 1000 miles of fiber. Even the lower rate for long-haul implies a cable cut every 4 days on average in a network with 30000 route-miles of fiber [1].

The techniques used for network survivability can be broadly classified into two categories: preplanned protection and dynamic restoration [2]. Compared to protection, restoration uses resources more efficiently, but typically takes more time. Another advantage of dynamic restoration is better scalability in terms of fault management overhead, as backup connections for the disrupted services need only be discovered and maintained after the network failure(s). However, dynamic restoration requires a long restoration time and does not guarantee the re-establishment of the disrupted services, since the backup resources may not be available at the time a failure happens. Therefore, protection is often used for the premier long-term service connections in circuit-switched or connection-oriented data networks.
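As a quick sanity check on the FCC cut rates quoted earlier in this section, the "one cut every 4 days" figure follows from simple arithmetic. The sketch below is illustrative only; the function name `days_between_cuts` is an assumption of this example and does not come from the paper.

```python
def days_between_cuts(cuts_per_1000_miles_per_year: float, route_miles: float) -> float:
    """Average number of days between fiber cuts for a network of the given size."""
    cuts_per_year = cuts_per_1000_miles_per_year * route_miles / 1000.0
    return 365.0 / cuts_per_year


# Long-haul figures from the text: 3 cuts per 1000 miles per year, 30000 route-miles,
# which gives 90 cuts per year.
long_haul_days = days_between_cuts(3, 30000)
print(round(long_haul_days, 2))  # → 4.06
```

With 90 expected cuts per year, the mean interval is a little over four days, which matches the rounded figure in the text.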
Existing restoration mechanisms also focus on the recovery of disrupted existing connections in circuit-switched or connection-oriented data networks. In connectionless networks, such as IP networks, restoration is achieved by dynamic routing through a global routing table update upon a network failure, where all new incoming connections whose initial routes traverse the failure will be dropped before this slow restoration process (up to minutes in the worst case) is completed.

For the JIT-based OBS networks we are looking at, service routes are discovered at every involved node by hop-by-hop forwarding table lookup. It is almost impossible for a node to maintain the path (or path segment) information and pre-reserve spare resources for a connection requirement. Therefore, fast restoration is a better choice for the fault management of OBS networks.

[9] presented a brief operation and maintenance framework for OBS networks. [8] studied a simple deflection routing scheme to achieve fault tolerance upon link failures in OBS networks. In [7], a 1+1 protection scheme was studied for OBS networks. However, this study was only concerned with the special long-duration OBS sessions whose primary and backup paths are decided prior to actual burst transmission. Therefore, only premium traffic, comprising a small fraction of the total load in an OBS network, would be afforded this type of protection.

The dominant traffic in a JIT network is short-term optical bursts that are set up by dynamic route discovery using the hop-by-hop forwarding table lookup at the intermediate nodes.
The need to minimize restoration time in JIT networks is emphasized by the following facts: (1) restoration of the existing short-term connections may be meaningless because their duration may be shorter than the restoration time; (2) restoration through the global forwarding table update, although it provides an optimal routing solution after the network failure, is too slow to prevent heavy data loss under heavy traffic conditions. Fast restoration mechanisms must be in place so that data loss can be minimized before the global forwarding table update is complete.

In this paper, we focus on the fast restoration mechanism for JIT-based OBS networks where short bursts are dominant. The proposed deflection-based restoration schemes are designed with the following objectives: (1) the restoration process is fast, so that interrupted services can be restored in a short time; (2) the overall burst loss during the failure and restoration is as low as possible given the limited network capacity; (3) the fault management overhead is low in terms of extra control message exchange and processing. We also show that the overall network blocking performance is optimized by balancing the traffic load during the restoration process in order to alleviate the congestion caused by the network failure. This work is an extension of our earlier work on this topic [10].

The paper is organized as follows. A fault management framework for JIT networks is presented in Section 2. A fast restoration mechanism based on deflection routing is discussed in Section 3. Results from a comprehensive simulation study are presented in Section 4. Section 5 concludes the paper.

2 Fault management framework for JIT networks

In JIT networks, the basic routing mechanism is much like that of IP networks, in which every OBS node maintains a local forwarding table. The entries in the forwarding table contain the next hop information for bursts per destination and per FEC (Forwarding Equivalence Class).
OBS nodes forward the incoming burst control packets and set up the connections by looking up the next-hop information in their forwarding tables. We simply use burst forwarding or burst routing to denote this connection setup process.

Routing in JIT networks relies on a link state protocol similar to OSPF. Each node periodically collects link state information for all its adjacent nodes and reports the information to the routing entity in the network. The routing entity then calculates a new set of routes between every pair of OBS nodes and updates the forwarding tables in the OBS nodes. In the centralized case, there is a Routing Decision Node (RDN) in the network that performs the routing functionality for the entire network. In the case of distributed routing, there is no such RDN. All nodes flood link state information to, and receive it from, every other node. Each node computes the routes to the other nodes and updates its forwarding table locally. Therefore, all the OBS nodes in the network together comprise the routing entity.

However, whether the routing is centralized or distributed is irrelevant to our proposed fast restoration mechanism. For the purpose of illustration, we assume centralized routing in this paper. Burst forwarding decisions are made locally at OBS nodes, based on the forwarding table given by the RDN. All OBS nodes report their link state information to the RDN periodically, or by interrupt in abnormal cases. In addition, nodes may also exchange fault information with their adjacent nodes (neighbors) within the control plane.

The routing algorithm implemented at the RDN is based on the 2-shortest-path algorithm to support alternative routing. For every pair of OBS nodes in the network, two disjoint routes with the shortest overall length are computed, one as the primary route and the other as the alternative route. Therefore, the resulting forwarding table at an OBS node contains 2 next-hop entries per destination or FEC.
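The two-entry forwarding table described above can be illustrated with a small sketch. Note the assumptions: this is not the RDN's 2-shortest-path algorithm, whose details the text does not give. It uses a simple remove-and-recompute heuristic (find a hop-count shortest route, ban its links, search again), which yields link-disjoint routes but does not guarantee the minimum overall length. The topology `adj` and all function names are hypothetical.

```python
from collections import deque


def shortest_path(adj, src, dst, banned=frozenset()):
    """Breadth-first search for a hop-count shortest path that avoids banned links."""
    prev = {src: None}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            path = []
            while u is not None:
                path.append(u)
                u = prev[u]
            return path[::-1]
        for v in sorted(adj[u]):
            if v not in prev and frozenset((u, v)) not in banned:
                prev[v] = u
                queue.append(v)
    return None


def two_routes(adj, src, dst):
    """Return (primary, alternative); the alternative shares no link with the primary."""
    primary = shortest_path(adj, src, dst)
    if primary is None:
        return None, None
    used = frozenset(frozenset(link) for link in zip(primary, primary[1:]))
    return primary, shortest_path(adj, src, dst, banned=used)


def forwarding_table(adj, node):
    """Map each destination to its (primary next hop, alternative next hop) at `node`."""
    table = {}
    for dst in adj:
        if dst == node:
            continue
        primary, alternative = two_routes(adj, node, dst)
        table[dst] = (primary[1] if primary else None,
                      alternative[1] if alternative else None)
    return table


# Hypothetical 6-node ladder topology (not the network of Figure 1).
adj = {
    "A": {"B", "D"}, "B": {"A", "C", "E"}, "C": {"B", "F"},
    "D": {"A", "E"}, "E": {"B", "D", "F"}, "F": {"C", "E"},
}
print(two_routes(adj, "A", "F"))
print(forwarding_table(adj, "A")["F"])
```

Each node only needs the first hop of each route, which is why the table stores a pair of next hops per destination rather than full paths.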
The route for a particular burst is discovered based on a hop-by-hop paradigm. For any end-to-end burst connection, a node along its route is only aware of the primary and alternative next hops for this burst given by the RDN.

We only consider the single link failure scenario in this paper. We also assume the control plane is independent of the burst data plane and is 100% reliable. Fault management for OBS networks comprises three steps: fault detection and localization, fault notification, and service restoration. Upon detection of a link failure, a fault notification message is sent to the RDN for routing re-computation, and then the RDN advises all the nodes to update their forwarding tables accordingly. This is actually a default service restoration mechanism, as the new routes will work around the faulty link. However, this global forwarding table update process could be very slow (from several seconds up to a few minutes) according to experience from current IP networks. The slow restoration process may result in an intolerably large amount of burst loss, because all bursts that are supposed to traverse the failed link will be discarded if no special action is taken during the update process. Therefore, we have to implement efficient fast restoration techniques complementary to the global forwarding table update to reduce the overall burst loss.

Since the JIT network is a loss network, a feasible restoration mechanism needs to make efficient use of spare network resources in order to minimize the burst blocking probability. Our study shows that two types of burst blocking contribute to the overall burst blocking performance: (1) restoration blocking, which is the burst blocking during the fault detection and notification periods; (2) increased congestion blocking arising from the diverted traffic and the reduced network capacity that results from the failure. However, there may exist complex trade-offs between these two types of blocking.
A fast restoration scheme with shorter fault notification time (thus smaller restoration blocking) may incur larger congestion blocking, and vice versa. This phenomenon will be shown later when we present the restoration schemes.

In the following section, we will present two basic deflection-based fast restoration schemes that can be integrated into the JumpStart JIT signaling protocol [3], with light management overhead and good blocking performance. Furthermore, by defining a distribution ratio, α, we generalize these two schemes to a general restoration mechanism, α-distributed deflection restoration.

3 Fast restoration techniques for OBS networks

Figure 1 depicts an example OBS network with 8 OBS nodes and an RDN. We assume node I is the ingress node for a certain burst flow and node E is the respective egress node. The links represented by solid lines are the physical links connecting the OBS nodes (the links to the RDN are represented by dashed lines). The primary route from I to E is I → 1 → 2 → 3 → E. Assuming the link between its headend node 2 and tailend node 3 is broken, links in the same line style represent an alternative route decided by a restoration scheme. As mentioned before, there is at least one alternative next hop for every entry in the forwarding table. This information enables the JIT network to set up alternative routes to work around the faulty link.

[Figure 1. OBS fast restoration mechanisms. Routes shown: forwarding table update, local deflection, distributed deflection.]

Before the faulty link is worked around, bursts will be lost on the faulty link. After successful restoration, burst loss would still be higher than under normal conditions due to the reduced network capacity. While the fault detection and processing time may normally be a constant, the fault notification time makes the major difference among different restoration schemes.
Generally, it is preferable that the node that makes the re-routing decision be close to the faulty link so that the fault notification time is reduced. This also leads to light fault management overhead, as only a small amount of fault notification message transmission is needed. In the list below, we present two fast restoration schemes based on deflection routing in which at most one-hop fault notification message transmission and processing are required. They are also illustrated in Fig. 1. For the purpose of comparison, we first describe the default global routing update scheme.

Scheme 0. Global routing update: When the headend 2 (or tailend 3) detects the link failure, it informs the RDN via the control plane. The RDN conducts the routing re-computation and updates the forwarding tables for all nodes, and new bursts will subsequently follow the new routes. For example, new bursts will follow the route I → ... → E (see Figure 1). This solution is optimal and the existing routing protocol can handle it well. However, global routing table updating is a slow process (seconds or even minutes) due to the long round-trip time for signal transmission and processing between the OBS nodes and the routing entity. As a result, a large number of bursts will be lost before the forwarding tables are updated.

Scheme 1. Local deflection: This is similar to the traditional deflection routing usually seen in a congestion resolution scheme. When the headend 2 detects the link failure, it will automatically pick the alternative next hop in the forwarding table for every new burst whose next hop on its primary route passes the faulty link. In the example, new bursts from I to E will follow the alternative route I → 1 → 2 → ... → E (see Figure 1). This would be the fastest restoration scheme, since new bursts will be deflected to an alternative good link right after the link failure is detected locally. Therefore, it will incur the smallest restoration blocking. However, because all the affected bursts are deflected to one alternative path, this scheme would increase the congestion blocking.

Scheme 2. Distributed deflection: This is a novel fast restoration scheme proposed in this paper. In this scheme, the headend 2 will also send a different fault notification message to all its adjacent nodes in addition to the one to the RDN. This fault notification message contains the destination information for all the primary routes passing the faulty link. After receiving this message, each of the adjacent nodes will pick an alternative next hop for the affected bursts that are heading to the faulty link according to their primary route. In the example, bursts from I to E will take the new route I → 1 → ... → E (see Figure 1). Compared with the local deflection scheme, distributed deflection has the potential to make the re-routed traffic more distributed instead of being totally deflected to one alternative path. In this way, less congestion and therefore less burst loss may occur. However, this scheme requires an extra one-hop fault notification. One possible problem is that, if the network traffic load is already very heavy, distributed deflection may have a negative impact, as it may worsen the congestion all over the network.

The actual algorithm is a combination of local deflection and adjacent deflection, i.e., the affected bursts are deflected locally until the adjacent nodes receive the fault notification. At that time the affected bursts will be deflected distributively.

The above analysis clearly shows that the last two restoration schemes will provide fast restoration, as at most one-hop fault notification message transmission/processing is required and the alternative route is pre-computed before any failures.
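The next-hop choice under schemes 1 and 2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the JumpStart implementation: the function `next_hop`, its parameters, the shape of the fault notice, and the node names in the example table are all hypothetical, and the example table is not read from Figure 1.

```python
def next_hop(node, dst, table, failed_link=None, notice=None):
    """Pick the next hop for a new burst at `node` heading to `dst`.

    table[node][dst] is a (primary, alternative) pair of next hops.
    failed_link is (headend, tailend) when `node` has detected the failure itself.
    notice is (headend, affected_destinations) when a neighbour has been notified.
    """
    primary, alternative = table[node][dst]

    # Scheme 1 (local deflection): the headend avoids the faulty link on its own.
    if failed_link is not None and node == failed_link[0] and primary == failed_link[1]:
        return alternative

    # Scheme 2 (distributed deflection): a notified adjacent node deflects bursts
    # that would otherwise go through the headend and onto the faulty link.
    if notice is not None:
        headend, affected = notice
        if primary == headend and dst in affected:
            return alternative

    return primary


# Hypothetical forwarding entries for destination "E".
table = {
    "n1": {"E": ("n2", "n4")},
    "n2": {"E": ("n3", "n5")},
}
print(next_hop("n2", "E", table, failed_link=("n2", "n3")))  # → n5
print(next_hop("n1", "E", table, notice=("n2", {"E"})))      # → n4
```

Until the notice reaches an adjacent node, that node keeps forwarding to the headend, which deflects locally; this mirrors the combined behaviour described for the actual algorithm.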
Furthermore, the two schemes only add a small amount of fault management overhead to normal network operation (the alternative route information in both schemes and the one-hop fault notification message transmission in distributed deflection).

One interesting observation from scheme 2 is that the capacity of the links between the headend (node 2) of the faulty link and its adjacent nodes (node 1) will not be utilized if all affected bursts are deflected at the adjacent nodes. Therefore, we define a distribution ratio, α, to determine the portion of affected bursts that will be deflected at the adjacent nodes. That is, after the adjacent nodes receive the fault notification, an α portion of affected bursts will be deflected distributively, and a (1 − α) portion of affected bursts will be forwarded to the headend node of the faulty link to be deflected locally. With a different value of α ∈ [0, 1], we have a different variant of the distributed restoration scheme. When α = 0, it is equivalent to scheme 1, the local deflection based restoration. When α = 1, it becomes scheme 2, the distributed deflection based restoration. We use α-distributed deflection to denote the generalized distributed deflection mechanism. We also note that using α introduces only a tiny amount of management complexity in the adjacent nodes. We expect that there exists an optimal value of α that deflects the affected bursts in the most balanced way, such that the minimum burst loss can be achieved.

4 Simulation Study

In this section, we present the results from a comprehensive simulation study to illustrate and compare the performance of the restoration schemes discussed above. The simulation is conducted on an NSF network (14 nodes, 21 links) under the JumpStart JIT signaling protocol [3].

We consider two types of burst loss (probability) in this study.
One is the burst loss (probability) for all the bursts in the network (since deflected bursts may have negative impacts on other bursts due to the increased congestion). The other is the loss (probability) of the affected bursts whose primary routes pass the faulty link. We will refer to the former as the overall burst loss (probability) and the latter as the local burst loss (probability). We are especially interested in the burst loss (probability) during the restoration process, the time period between the link failure and the global routing table update.

During the simulation, we let the most loaded link fail after the simulation has run long enough to reach the steady state. We also assume the RDN is the central node of the network and is always working. The updated new burst routes will replace the deflection routes after the RDN update messages arrive at every node.

We assume all the physical links in the network are bi-directional and that there are 32 wavelengths per link per direction. We also assume full wavelength conversion capacity at every node. Two shortest alternative routes are pre-computed for every pair of nodes, with the shorter one as the primary route.

We assume a Poisson burst arrival process for every OBS node, with an offered load defined from the arrival rate and the average burst duration time. The traffic is uniformly distributed to every other OBS node as the destination.

All the results are the mean burst loss probability (defined as the ratio of the number of lost bursts to the total offered number of bursts) during the link failure. They are obtained within the [...]% confidence interval by using the batch means method. The fault detection time is set to [...] and the RDN update time is fixed at [...].

[Figure 2. Overall burst loss probability vs. restoration phases (offered load [...]). Axes: blocking probability vs. restoration phase. Curves: global update, local deflection, distributed deflection.]

[Figure 3. Local burst loss probability vs. restoration phases (offered load [...]). Axes: blocking probability vs. restoration phase. Curves: global routing update, local deflection, distributed deflection.]

Fig. 2 and Fig. 3 depict the overall and local burst loss probability (y-axis) at an offered load of [...] during the three network operational phases around a link failure. On the x-axis, phase 1 represents the normal phase when the link failure has not occurred, phase 3 represents the phase after the global routing table update has finished, and phase 2 represents the time period in between. We observe that the burst loss probability is very low for both phases 1 and 3, though it is slightly higher in phase 3 due to the reduction of the network capacity. However, the loss probability could increase significantly in phase 2. Relying only on the forwarding table update would incur very high burst loss in this phase. However, the loss probability increases only moderately when the proposed fast restoration schemes are used in this phase. For the overall burst loss depicted in Fig. 2, among the three restoration schemes, distributed deflection shows the best performance (almost no extra burst loss), followed by local deflection. Global routing update incurs the highest burst loss. Specifically, the improvements from using the two fast restoration schemes over the global forwarding table update are [...]% and [...]%, respectively. The performance improvement in this phase is more obvious for the affected bursts whose primary routes pass the faulty link before the new forwarding table update. As depicted in Fig. 3, these bursts will be directly dropped with no fast restoration schemes in place (the burst loss probability is 100% for the global routing update scheme), but only [...]% and [...]% of the bursts are lost with distributed deflection and local deflection, respectively. We observe that distributed deflection achieves the least burst loss and that local deflection also reduces the burst loss dramatically compared with the global routing update.

[Figure 4. Overall burst loss probability with different schemes vs. offered load. Axes: blocking probability vs. load. Curves: global routing update, local deflection, distributed deflection.]

Fig. 4 and Fig. 5 illustrate the overall and local burst loss probability under different offered loads and different restoration schemes during the restoration period (phase 2). We observe that applying the proposed fast restoration schemes can dramatically reduce both overall and local burst loss. For example, at the load level of 0.1, the blocking probability of the global forwarding ta-