Self-Adaptive Dissemination of Data in Dynamic Sensor Networks

David Dorsey, Bjorn Jay Carandang, Moshe Kam
Drexel University, Data Fusion Laboratory
3141 Chestnut Street, Philadelphia, PA, U.S.A.
{dave, jay, kam}@minerva.ece.drexel.edu

Chris Gaughan
US Army RDECOM, Edgewood Chemical Biological Center
APG-Edgewood Area, Maryland

Abstract

The distribution of data in large dynamic wireless sensor networks presents a difficult problem due to node mobility, link failures, and traffic congestion. In this paper, we propose a framework for adaptive flooding protocols suitable for disseminating data in large-scale dynamic networks without a central controlling entity. The framework consists of cooperating mobile agents and a reinforcement learning component with function approximation and state generalization. A component for agent coordination is provided, as well as rules for agent replication, mutation, and annihilation. We examine the adaptability of this framework to a data dissemination problem in a simulation experiment.¹

1 Introduction

The large-scale, dynamic, and time-varying nature of operational environments for ad hoc wireless mobile and sensor networks presents formidable challenges to network designers. Approaches based on centralized control over the network are often infeasible because they do not scale well and assume, unrealistically, a static structure in the form of routes [17] or routing tree structures [23]. Decentralized approaches that do not assume any structure in the network [22] often rely upon a gradient that emerges as data flow toward the sink node.
The problem with these structure-less approaches is that they assume the gradient is relatively static; all broadcasts originate from a single sink node and all sensor data flow to this same node ("convergecasting" [14]). However, if the sensor network is deployed for the purpose of communicating with mobile or static nodes working within the area of deployment, these traditional protocols are no longer suitable because any node may be the source of a broadcast to all other nodes.

¹ This project was supported in part by a US Army Competitive ILIR (In House Laboratory Independent Research) Project approved by the Assistant Secretary of the Army for Acquisition, Logistics, and Technology (ASA(ALT)).

Consider a sensor network that is deployed for use by first responders battling a forest fire or in a military engagement where there is a threat of chemical or biological agents. In these scenarios, mobile networks are likely to be deployed with little or no existing infrastructure; the individual sensor nodes communicate through their wireless interface with other nearby nodes to form links. The effect of these local interactions among nodes is the emergence of a connected, ad hoc wireless sensor network over the area of deployment. The resulting network does not possess any global perspective; nodes are only aware of neighbors in communication range. The mobile nodes accompany the first responders, warfighters, or unmanned ground vehicles (UGV) that move through the area of deployment. Mobile nodes may act as both sources and sinks of information. The goal is for the sensor network to provide the mobile nodes with updated information about the entire area of deployment while also acting as a medium for sharing information between mobile nodes. When a sensor is triggered or a mobile node sends a packet to a sensor node, the event is recorded and then broadcast to neighboring nodes.
Ideally, the event is broadcast to the entire network so that when a mobile node comes within communication range of a sensor node, it may obtain updated event information from across the network.

An obvious challenge that arises is the problem of broadcasting messages in these heterogeneous networks. Since nodes do not need to communicate with a specific destination, traditional routing protocols are inappropriate. Traditional flooding protocols are designed specifically for network initialization and route discovery within traditional routing algorithms. These protocols assume that there is a single broadcaster (a sink node or gateway) delivering messages to many or all other nodes. All responses are returned to the same sink. Since all messages from the network are assumed to converge onto the static sink node, flooding protocols assume a structure on the network in the form of a tree or a gradient.

In order to deal with scenarios such as those described above, where any node may be a potential broadcaster to all other nodes, a broadcast protocol should be adaptive. It should not rely on established routes that would require maintenance, and it should be able to modify its operational mode when the network environment changes.

We propose an adaptive framework for disseminating information in dynamic, large-scale networks including both mobile nodes and stationary sensors. Our technique involves the use of mobile software agents that pick up and/or drop off data at nodes while recording information about the environment onto the nodes at each migration. Agents make decisions about which data to pick up and which link to select for migration according to a state-action value function, which is formed by extracting and generalizing features from the local information recorded onto the nodes by other agents. The payoff to the agent is related to the number of timely deliveries of event data and the energy expended (agents consume energy at each node for computation, transmission, and reception).
The global objective is to minimize the average delay between the time a message is sent and the time it is received by all other nodes, while minimizing the average energy expended across all nodes in response to the dissemination task. The framework provides rules for agent replication, mutation, and annihilation based on local rewards. We also incorporate reinforcement learning with function approximation to provide agents with the ability to make generalizations and learn about their environment through a value function and one-step feedback.

2 Dissemination in Ad-Hoc Networks

Most of the current methods for disseminating data on wireless ad hoc sensor networks require a structure to be enforced on the network. By structure we mean a set of pre-established reliable links or routes; this includes hierarchical structures such as clusters, or flat structures such as trees. Gradient-based algorithms such as Gradient Broadcast (GRAB) [22] work well in situations where there is a static sink node to which all data are delivered. However, they are not designed to deal with situations where there is no pre-defined sink node or where the network topology is highly dynamic, because they still require maintenance of the gradient. For these situations, an adaptive framework is necessary.

Research in Swarm Intelligence [3] led to many algorithms for adaptive routing in these dynamic networks using the approach from AntNet [6]. AntNet is an adaptive, distributed protocol using mobile software agents to search for an optimal path between a source and a sink.
The method is based on the ant colony metaphor, where "forward ants" are sent from the source node and, once they have located the sink node, they trigger the release of "reverse ants" that update the routing tables on the hosts between the source and the sink.

Other methods employ a randomized approach where a node, upon receiving a message, will rebroadcast to a random subset of its neighbors: these include work in gossip protocols [12], rumor routing [4], and epidemic routing [10].

3 Multiagent Reinforcement Learning

An agent is defined as anything that can perceive its environment through sensors and act upon the environment through actuators [19]. In reinforcement learning, an agent must learn to act through direct interaction with an environment. A reinforcement signal provides feedback to the agent so that it may evaluate its past actions and improve future actions accordingly. This model is attractive for dynamic, decentralized systems because it does not rely on prior information and pre-planned strategies. Instead, agents are left to decide how to act during system operation through reward and punishment signals. At each time step, an agent observes its environment and selects the next action based on its own evaluation of the observation. In the following time step, the agent observes the effect of its previous action and receives a payoff indicating the quality of that action. This cycle is repeated while the agent attempts to learn a policy $\pi$ which maps features (or states) associated with its observations to actions in order to maximize its expected discounted reward.

This interaction between the agent and the environment is usually formalized as a discrete-time, finite-state Markov decision process (MDP).
An MDP is a tuple $(S, A, P, R)$, where $S$ is the state space, $A$ is the action space, $P_a(s, s')$ is the probability that action $a$ in state $s$ will lead to state $s'$ in the next time step, and $R(s)$ is the (immediate) reward received after the transition from $s$ to $s'$. The total reward the agent receives from time $t$ onward is $R_t = \sum_{i=0}^{\infty} \gamma^i r_{t+i+1}$, where $\gamma \in [0, 1]$ is the discount rate. In cases where a model of the system dynamics is not available (as is usually the case), the agent will interact with the environment in order to iteratively produce an estimate of $P_a(s, s')$.

In Q-Learning [21], the action-value function of a given policy $\pi$ associates all state-action pairs $(s, a)$ with an expected reward for performing action $a$ in state $s$ and following $\pi$ thereafter. This function (the Q-function) is the expected value of $R_t$ given $(s, a)$:

$$Q(s, a) = \max_{\pi} E\left[ R_t \mid s_0 = s,\ a_0 = a,\ a_{t>0} = \pi(s_t) \right] \qquad (1)$$

The update rule for $Q$ is

$$Q_{t+1}(s, a) = (1 - \alpha)\, Q_t(s, a) + \alpha \left[ r + \gamma \max_{a'} Q_t(s', a') \right] \qquad (2)$$

where $\gamma$ is the discount, $r$ is the immediate reward, and $\alpha \in [0, 1]$ is the learning rate. These Q-values are used by the agent to decide which action to take in the next time step. To allow agents to explore unvisited states, these algorithms include an exploration policy such as the $\epsilon$-greedy policy, where agents select a random action with probability $\epsilon < 1$, and action $a = \arg\max_{a'} Q(s, a')$ with probability $1 - \epsilon$.

Although well-understood algorithms with good convergence properties are known for solving single-agent tasks, particularly in static (stationary) environments, the use of multiple agents in the same learning environment provides new challenges.
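As a minimal sketch of update rule (2) and the $\epsilon$-greedy policy, the tabular case can be written as follows. The state and action names, the dictionary-based table, and the parameter values are illustrative assumptions and are not taken from the paper.

```python
import random
from collections import defaultdict


def epsilon_greedy(Q, state, actions, epsilon, rng=random):
    """With probability epsilon pick a random action, otherwise the greedy one."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])


def q_update(Q, s, a, r, s_next, actions, alpha, gamma):
    """Apply update rule (2) in place and return the new value of Q(s, a)."""
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] = (1 - alpha) * Q[(s, a)] + alpha * (r + gamma * best_next)
    return Q[(s, a)]


# Hypothetical two-action problem; all Q-values start at zero.
Q = defaultdict(float)
actions = ["stay", "migrate"]
new_value = q_update(Q, "s0", "migrate", r=1.0, s_next="s1",
                     actions=actions, alpha=0.5, gamma=0.9)
greedy_action = epsilon_greedy(Q, "s0", actions, epsilon=0.0)
```

With `epsilon=0.0` the policy is purely greedy, so after the single update above it selects the action whose value was just increased.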
In a multiagent system, the presence of other agents makes the environment nonstationary from the point of view of each agent, so it is unlikely that an agent will be able to learn the state transition probabilities from the MDP model. This nonstationarity also invalidates any of the convergence properties of single-agent reinforcement learning algorithms. However, in the context of dissemination of data with multiple agents in a dynamic network, the emphasis is not on convergence, but on adaptability to other agents and the changing environment. The focus of a multiagent reinforcement learning algorithm in this class of problems is to allow an agent to coordinate learning by incorporating the rewards and value functions of other agents. For a comprehensive survey of multiagent reinforcement learning techniques, we refer to [5].

3.1 Function Approximation

Most reinforcement learning algorithms assume the use of tables to store the action-values or state-values. In the case of agents traversing a wireless network, however, the energy and latency costs to carry such a table from host to host make this impractical. Since each agent must carry its own action and state values, we require a compressed representation of the value function. Recent literature in multiagent reinforcement learning has considered the use of a function approximation to estimate the value function [1, 8]. One way to represent the value function is as a weighted linear function of a set of features (basis functions):

$$\tilde{Q}(s, a) = w_0 + w_1 \theta_1 + w_2 \theta_2 + \dots + w_m \theta_m = w_0 + \mathbf{w}^T \boldsymbol{\theta}. \qquad (3)$$

These basis functions are formed by extracting relevant features from the environment. As an example from [19], the value of a particular state of a chess game could be computed using $\theta_i$ to represent the number of each kind of piece on the board, and the $w_i$ could be the value of the pieces (1 for a pawn, 3 for a bishop, etc.).
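The chess example can be written out as a weighted sum in the form of (3). The sketch below is only illustrative: the pawn and bishop weights come from the text, while the knight, rook, and queen weights and the piece counts are conventional values assumed here.

```python
def linear_value(weights, features, bias=0.0):
    """Evaluate w_0 + w^T theta, the linear form in (3)."""
    if len(weights) != len(features):
        raise ValueError("weights and features must have the same length")
    return bias + sum(w * f for w, f in zip(weights, features))


# Order: pawn, knight, bishop, rook, queen.
# Pawn = 1 and bishop = 3 follow the text; the other weights are assumed.
piece_weights = [1.0, 3.0, 3.0, 5.0, 9.0]
piece_counts = [8, 2, 2, 2, 1]  # theta_i: number of each kind of piece

material = linear_value(piece_weights, piece_counts)
```

Five weights summarize every board with the same piece counts, which is the kind of compression a table of per-state values cannot offer.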
This method provides an enormous compression of the number of stored values an agent needs to keep while allowing the agent to make generalizations from states it has visited about states it has not visited.

Each time the agent receives a reward, the weights $\mathbf{w}$ are updated using a version of the delta rule for online least-squares:

$$w_i \leftarrow w_i + \alpha \left[ r + \gamma \hat{Q}(s', a') - Q(s, a) \right] \frac{\partial Q(s, a)}{\partial w_i}, \qquad (4)$$

where $s'$ and $a'$ are the next state and next action. The modification of the feature weights is proportional to the difference between the received reward and the estimated value of the state-action pair. Since (3) is defined to be linear in $\boldsymbol{\theta}$, (4) reduces to

$$w_i \leftarrow w_i + \alpha \left[ r + \gamma \hat{Q}(s', a') - Q(s, a) \right] \theta_i(s, a), \qquad (5)$$

which is easier to compute. These weights represent what the agent has learnt through interaction with the environment and serve as the agent's "memory", since any change in $\mathbf{w}$ will affect the evaluation of all other states. If an agent receives a large reward for taking an action $a$ and moving to a state $s$, then the value of $w_i$ will be modified in proportion to the extent that the value $\theta_i$ participated in this decision.

4 Proposed Method

4.1 Components

Before presenting the outline of the algorithm, we first introduce the components of the framework.

4.1.1 Events

We refer to any data that are produced by sensors in the network as a result of a detection as an event. However, it is not necessary that events originate exclusively from sensor detections; an event may also be an annotation on a collaborative whiteboard application used by first responders (for an example of such a collaborative application for first responders, see [2, 15]). Each event has a time-dependent numerical value determined by its age (how much time has passed since the event was detected), its priority class (higher priority events have greater initial values), and its expiration date (the amount of time for which an event is relevant).
We denote the value of event $i$ created at time $t_0$ with priority $p$ and expiration $x$ as $v_i(t - t_0, p, x)$ or simply $v_i(t)$. We denote the set of events on a host as $E_h$ and the events that an agent is carrying as $E_a$.

4.1.2 Swarm Agents

There have been several investigations into the use of mobile software agents for exploring and providing decentralized services in wireless ad hoc networks [16, 7, 18]. A mobile agent is a composition of software and data which is able to autonomously move from one host to another while continuing its execution at each destination. Mobile agents are well suited to adaptive communication protocols because they are goal-oriented and can continue to operate even after the host from which they originate is removed from the network. The mobile agents used in the dissemination framework are composed of a link-action value estimator, $\tilde{Q}(\ell, a)$, a decision policy, $\pi$, a parameter set $\mathbf{w}$, an event payload, and a fixed memory size for storing important statistics. We refer to these as swarm agents. The link-action estimator computes the value of the agent choosing a link, $\ell$, and an action, $a$. The link $\ell$ is selected from the set of neighbors of the current host together with the current host itself (agents may elect not to migrate). The action is selected from the action space, $A$, which comprises the events the agent will carry to the next host, the number of clones to create, the decision to live or die, and the decision to mutate the agent's parameters, $\mathbf{w}$, with another agent's parameters. Swarm agents collaborate through messages left on the hosts to evaluate links and actions.

4.1.3 Swarm Agent Collaboration

The messages left by swarm agents for collaboration are contained in a visit entry at each host they visit. These entries contain information about the agent's payload as well as information about the local environment as perceived by the agent.
For example, the visit entries might contain:

• The quality of the link (LQ) upon which the agent migrated; this is a function of the received signal strength at this host and the number of migration failures the agent experienced on this link.
• The amount of energy the last host has consumed (energy).
• The amount of time the agent spent at this host and the last host it visited (delay).
• The events that the agent carried in its payload to this host and from this host (events).
• Event meta-data (MD) used to represent any data elements that the agent encountered at the last host but did not pick up.
• The value of the events at the last host (value).
• The agent's parameter set, $\mathbf{w}$.
• The agent's average reward, $\hat{R}$.
• The number of times the agent has replicated (clones).
• The amount of time the agent has been alive (age).

The data in these visit entries are sorted into link advertisements and agent advertisements on the hosts, as shown in Tables 1 and 2. When an agent arrives at a new host after a migration, it will use the information in the link advertisements to determine the value of migrating across a link. If a link does not have an advertisement, then the agent will categorize the link as unexplored. Because agents always take the set of events that will maximize their expected reward, agents may elect to leave some events on hosts they have visited and take only the meta-data describing the event to the next host. The use of meta-data for exchanging information between nodes in a broadcast protocol was first introduced in [13].

The agent will also evaluate its own performance by viewing agent advertisements and comparing its own average reward with that of other agents. In [11], the authors use a genetic crossover method between swarms of randomly wandering agents on a network.
The objective was to continuously mutate the agent populations in order to find an optimal memory size for agents that would result in reduced latency of event delivery. In swarm dissemination, the agents mutate their parameter set $\mathbf{w}$; when an agent that has recently visited is performing significantly better (according to the advertisement), the agent may perform a crossover mutation between its own parameter set and that of the successful agent (Section 4.1.7). Visit entries and advertisements are deleted from the host after a set period of time to avoid crossovers and decisions based on stale information.

Table 1. Table of state values for each link to a neighbor of the current host. The values are updated periodically between hosts or through agent visit entries.

    Link   LQ   events/MD   value   energy   delay
    1      -    -           -       -        -
    2      -    -           -       -        -
    3      -    -           -       -        -
    ...    -    -           -       -        -

Table 2. Table of state values for each agent to recently visit the current host. The values are updated through agent visit entries.

    Agent   R̂    w    clones   age
    1       -    -    -        -
    2       -    -    -        -
    3       -    -    -        -
    ...     -    -    -        -

4.1.4 Reward

At each time step, agents receive a reward $R$ for delivering an event to a host that has not already received the event. The reward is proportional to the number of events that are delivered at each migration and the value of the events (which decreases with time). The agent is also penalized for consuming energy at each migration; the amount of energy that is consumed when an agent migrates is related to the number of events the agent is carrying. If an agent migrates with no events, it is still penalized a constant amount $c$ to carry its executable code and its memory.
If the agent is carrying $N$ events, the reward is computed as

$$R = a_1 \sum_{i=1}^{N} I(v_i) - a_2 N + c \qquad (6)$$

where $a_1$, $a_2$, and $c$ are constants, and $I(\cdot)$ is the indicator function

$$I(v_i) = \begin{cases} v_i & \text{if the host did not have the event} \\ 0 & \text{otherwise.} \end{cases}$$

The agent keeps a running weighted average of these rewards, $\hat{R}$, which is updated at each time step according to

$$\hat{R}(t) \leftarrow R(t) + (1 - \alpha)\, \hat{R}(t - 1) \qquad (7)$$

4.1.5 Link-Action Value Approximation

As described in Section 3.1, function approximation can significantly reduce the amount of information that swarm agents carry while also allowing agents to generalize to unvisited states from ones that they have visited. Swarm agents form the function approximation using the data contained in the link advertisements from Table 1. We refer to the data in the table as the features; each row contains features for a single link from the current host.

The first feature is the link quality, LQ, which represents the probability that an agent will be able to successfully migrate across this link. If an agent attempts to migrate, but the packet is dropped due to interference, then the agent will have to retransmit (and consume more energy at this host). The value of LQ is a function of the received signal strength, which indicates the strength of the signal received from the neighbor host, as well as the number of failed migration attempts reported by agents on this link. This function is computed so that more recent reports are given greater weight (this is true for all values in the advertisement table). The basis function $\theta_1$ that computes this feature is a simple sigmoid function mapping the LQ value onto the interval $[-1, 1]$.

The second feature is the events/MD. This feature is the set of events that the neighbor on this link is known to possess as of the most recent visit entries; if the event is represented only by meta-data, then the event is possessed by the neighbor but not this host.
With this information, two additional basis functions, $\theta_2$ and $\theta_3$, are formed. The first computes the maximum value of the events that the agent expects to "pick up" at this neighbor; the second is the maximum value of the events the agent expects to "drop off" upon migration. If hosts periodically update each other with the value of the events they currently possess, then the third feature, value, can be incorporated into $\theta_2$ and $\theta_3$ to verify the validity of the visit entry information. In the case where there is no information about this link from visit entries, an additional basis function with a Boolean value is used to indicate that this link is unexplored.

The energy feature represents the amount of energy known to have been consumed by the neighbor on a link. We used a sigmoid function to represent the value of this feature. The delay feature represents the amount of time agents spend at this neighbor. We assume in our experiments that if multiple agents reside on the same host, they are placed into a queue to wait until the CPU is available. The value also represents the congestion on this link. The basis function for this feature is the same as the one used for the energy feature.

The value of a link is computed with these basis functions using (3), replacing $s$ with $\ell$. For each link, the action (set of events to carry to the neighbor on a link) that maximizes the value of $\hat{Q}(\ell, a)$ is denoted $\hat{Q}^*_\ell$.

After an agent has reached the next host, having selected action $a$ and link $\ell$ and received reward $R$, the weights for the linear function are updated using:

$$w_i \leftarrow w_i + \alpha \left[ r + \gamma \hat{Q}(\ell', a') - Q(\ell, a) \right] \theta_i(\ell, a), \qquad (8)$$

Note that agents are also evaluating their link selection $\ell'$ and next action, $a'$, with respect to the current state and the last action taken when updating $\mathbf{w}$.
That is, agents update $\mathbf{w}$ after they have selected what link and action to take next and incorporate the estimated value of this decision into the learning update, as shown in Figure 1.
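A minimal sketch of update (8), assuming the linear form (3) with two illustrative features per link. The use of `tanh` as the sigmoid, the feature values, and the learning parameters are assumptions made for this example; the paper does not specify them here. The bias $w_0$ is held fixed because (8) is written only in terms of the feature weights.

```python
import math


def sigmoid_pm1(x):
    """Squash x into (-1, 1); tanh is one possible sigmoid, assumed here."""
    return math.tanh(x)


def q_hat(w0, w, theta):
    """Linear link-action value w_0 + w^T theta, as in (3)."""
    return w0 + sum(wi * ti for wi, ti in zip(w, theta))


def update_weights(w, theta, r, q_next, q_current, alpha, gamma):
    """Update (8): w_i <- w_i + alpha * [r + gamma*Q(l', a') - Q(l, a)] * theta_i(l, a)."""
    td_error = r + gamma * q_next - q_current
    return [wi + alpha * td_error * ti for wi, ti in zip(w, theta)]


# Hypothetical features of the link just used (l, a) and the link chosen next (l', a').
theta_current = [sigmoid_pm1(0.0), 1.0]
theta_next = [0.5, 0.0]
w = [0.2, 0.4]

# The agent first selects (l', a'), then uses its estimated value in the update.
q_cur = q_hat(0.0, w, theta_current)
q_nxt = q_hat(0.0, w, theta_next)
new_w = update_weights(w, theta_current, r=1.0, q_next=q_nxt,
                       q_current=q_cur, alpha=0.5, gamma=0.5)
```

Only the weight whose feature was nonzero for the link just used changes, which mirrors the statement in Section 3.1 that each $w_i$ is adjusted in proportion to how much $\theta_i$ participated in the decision.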