Bayesian networks: Real-time applicable decision mechanisms for intelligent agents in interactive drama

Maria Arinbjarnar and Daniel Kudenko

Abstract: The growing use of intelligent agents in virtual realities calls for agents that can interact with users in a fluent and believable way. Intelligent agents are expected to react in an increasingly human-like manner, showing distinct characteristics and expressing emotion, which calls for increasingly large datasets from which the agents can make informed and plausible decisions. Intelligent agents need to be able to process large knowledge bases and make decisions based on a wide range of relevant factors in real time in order for the user to find them believable. This is very important for autonomous agents in emergent interactive drama because they need to respond fluently to user interactions in a manner that resembles that of a real actor.

Bayesian networks (BNs) are particularly suitable for implementing an intelligent agent's decision mechanism because they support transient emotions and decision making and accommodate conflicts between the virtual agent's goals. In their basic form, BN reasoning algorithms do not scale well, since updating values in a BN is NP-hard in the worst case. We propose efficient and scalable BN techniques using relevance reasoning that are suitable for the needs of our autonomous agents in directed emergent drama and allow for real-time decision making.

I. INTRODUCTION

In the early days of computer games the most popular games, such as Pac-Man or Tetris, had a very limited story. If there was a story, it was mostly in the game manual. The emphasis was on game play and the games were highly replayable. Replayability is a measure of the entertainment value of playing the game more than once. This means that you play the same game repeatedly and your skill in playing the game grows with practice.
If the replayability is high, the player's enjoyment and immersion will grow with repeated play.

Next came first person shooters (FPS), which have mission based stories. The player is given some background story at the start of the game and an ultimate goal, for instance to escape the Nazi research camp and kill anything that moves while you are at it. The game play is divided into levels that you need to complete. Each level has goals that must be completed, such as collecting an item and killing bosses. Between levels the story is advanced.

Maria Arinbjarnar and Daniel Kudenko, Department of Computer Science, The University of York, Heslington, YO10 5DD, York, UK.

FPS games are not very replayable in their basic single player form. The reason for this is that once you have played through the game, there are no surprises and no suspense when you play it again. You know where the bosses are and how to deal with them; there are no significant changes between each game play and no new challenges. Multiplayer FPS are on the other hand highly replayable because the player is constantly up against a new challenge. In recent years single player FPS games have evolved towards making the opponents, the non-player characters (NPCs), more intelligent to provide a greater challenge for the player. They will simulate the normal behaviour of people in similar situations, including going to sleep, eating and in general going about their daily lives. One of the most recent and greatly popular games in these series is Assassin's Creed II [39], where the player is an assassin who is assigned a number of assassination jobs. In Assassin's Creed, as in many other recent FPS, there is a much greater emphasis on the story and character development than before. The player gets an in-depth character description of the assassin and why he is in this job. The story progresses gradually through the game and is delivered in seamless cut scenes.
The player plays the protagonist, and as the story progresses the protagonist faces many moral dilemmas and challenging revelations that dramatically change his perspectives and shape his character. This is very typical of other recent games such as Fallout 3 [6] and Heavy Rain [34]. Heavy Rain carries this character development even further and allows the player to make decisions on what the character does based on the character's emotions and thoughts, forcing the player to make challenging moral decisions. There is a very clear trend towards making the NPCs more intelligent and more autonomous in order to give the player a more fulfilling game experience. A good example of this is S.T.A.L.K.E.R. [16], where the NPCs actively simulate the behaviours of soldiers or mercenaries that try their best to kill the player. Their behaviour is very believable and they provide challenging game play; they do not simply walk the same circle continuously. In this respect there is a clear disparity between the increased playability of FPS, which has every potential of providing rich replayability, as becomes clear when these games offer multiplayer mode, and the still pre-scripted, unchanging storyline that the player is forced to follow. The popularity of titles like Heavy Rain and Assassin's Creed clearly demonstrates that there is a market for games that provide richer story worlds and character development. What clearly needs to be accomplished is that the character development and story become as involved as the game play, so that the story is not counterproductive to replayability because it stays fixed to a pre-scripted storyline. There needs to be a merge between drama and game play. The drama should emerge around the player, influenced by the player's actions, and still provide a structured story experience.
One way to accomplish this is to make the NPCs even more autonomous so that they can enact a drama for the player based on abstract goals rather than pre-scripted storylines. The NPCs need to be believable in their performance, which means that they should not violate the expectations of the audience. For example they need to stay in character, such that a calm, gentle character remains calm and gentle unless extremely provoked, etc. The NPC behaviour should be plausible in order to suspend disbelief. Effectively the agents should demonstrate humanlike behaviour, which calls for large knowledge bases and reasoning under uncertainty. BNs are specifically suitable for this task because they are causal and they provide a means of calculating action utilities based on probabilities in a highly efficient manner. BNs are an effective way of incorporating psychological elements such as traits and emotions, which are very important for believability [5], [7], [19], [17], [23], [38].

978-1-4244-6297-1/10/$26.00 © 2010 IEEE

We present an approach demonstrating the advantages of Bayesian networks (BNs) [20] as the autonomous agents' decision mechanism. In the worst case, belief updating algorithms for BNs are NP-hard [11]. There are several approximation algorithms: forward simulation, straight simulation, likelihood weighting, and randomised approximation schemes. Such approximation algorithms have also been shown to be NP-hard in the worst case [12]. A BN's complexity grows with the number of variables and the number of connections between variables. The key to reducing this complexity is to reduce the size of the network that requires updating in each reasoning step. We show that we can do this by extracting a relevant part of the network and only updating the values in this reduced network. We use relevance reasoning [25] to reduce the cost of inference by extracting the subset of variables from a BN that need to be updated in order to compute the inference of the target variable.
For this to be possible we determine which variables need to be updated in order to compute the inference of the target variable, and store the extracted object in a hash table to be reused when it becomes relevant again.

II. DIRECTED EMERGENT DRAMA

In order for the drama to become an integral part of game play and support it rather than counter it, we have designed an architecture called Directed Emergent Drama (DED) [1]. As seen in figure 1 it consists of a director, schemas, actors and a player. The player only ever interacts with schemas and actors and never with the director. The actors receive all their information from the player and from schemas; they do not interact directly with the director.

A. Director

The director oversees the emergence of the drama and uses schemas to direct the drama by giving the actors appropriate schemas to play out.

The drama cannot move between acts until the objectives of the acts have been adequately satisfied. As an example, if a drama goal for act I is to introduce characters, the drama will not move from act I to act II until the characters' key traits have been exposed. If a character is to be intelligent, playful and curious then the character needs to have played out actions that are intelligent for a value above a given threshold T, and the same for playfulness and curiosity.

Fig. 1. The DED architecture [1] (components: Director, Schemas, Actors, Player)

Skilfully winning a chess game, making 3-4 correct estimates about, for instance, the age of old furniture, or showing good arithmetic skills would be sufficient. The algorithm sums up the percentiles to see whether the threshold T has been reached. The director's role is to give the actors an opportunity to show their characteristics by choosing schemas that would be a good fit.

The director uses known structures as an aid in picking a suitable schema to develop an interesting drama.
Freytag's Pyramid or "dramatic arc" [15], as shown in figure 2, is very useful. This is referred to as a 'dramatic arc' [24] in this paper. The dramatic arc outlines the rise and fall that can be found in an engaging narrative. The story starts with an inciting incident, which aims to capture the audience's interest (a). This is followed by steadily climbing suspense as the plot thickens in order to further captivate the audience (b), until the story reaches its climax (c). This is followed by the resolution as the plot uncurls and the player learns the full truth (d), after which the drama may reach closure (e).

Fig. 2. Freytag's Pyramid or "dramatic arc" [15] (stages a to e)

The director does not plan every detailed move of the actors, but rather the overall dramatic structure, which makes the planning of acts and episodes a much more tractable problem.

B. Schemas

Each schema has a finite set of roles that are annotated as being essential or non-essential. It is only necessary to fill the essential roles to successfully execute a schema; the non-essential roles add variety and increase flexibility. Each role is annotated with a finite set of characteristics that it supports. The characteristics also have a numerical value attached to them, which represents to what degree the display of this characteristic is supported by the role.

2010 IEEE Conference on Computational Intelligence and Games (CIG'10)

The director uses the set of characteristics to match the roles to actors, deploying the schemas that best complement the various characteristics of the characters. The director is not in a good position to make decisions about direct interactions with the player, because the director would need to be constantly aware of everything that takes place, including the internal state of every character in the drama. This would quite rapidly escalate into an intractable computation problem for the director.

C. Actors

Actors play characters in the drama; their main task is to decide on an appropriate action, if any. There are three primary conditions for acting: response to stimuli, response to an internal process, and response to a request to act made by other actors.

1) Response to Stimuli: When an actor or a player acts, it informs each of the actors that share the same scene of the action taken. When an actor is informed of an action, a separate internal thread first checks whether the action is directed at the actor and whether the action is detailed in any of the drama schemas that the actor is currently in. If neither applies then the action is not relevant to the actor and the actor will not attempt a response; the actor will only add any knowledge gained from the action to its knowledge base (for instance information regarding the fulfilment of schema goals and the current location of other actors).

If the action is relevant to this actor then the actor will start a separate thread that evaluates an optimal response. The algorithm is as follows, for actor a1 and actor a2:

- a2 sends out a notification to all actors on the scene that it has taken an action.
- a1 reads the notification and realises that it was a question addressed to it.
- a1 computes a set of optimal responses to the question with respect to its knowledge base, character, situation, emotions and its current goals.
- a1 adds the set of optimal responses to an array of other applicable actions with added weight to indicate priority.
- a1 then evaluates what the set of optimal actions is and picks one to execute.

As seen in the algorithm, the actor first needs to add its set of optimal responses to an array containing other applicable actions because the actor may be preoccupied with something else. For example the actor may be talking on the telephone and not ready to answer.

2) Response to Internal Process: The actor will also act due to an internal process that initiates an action.
This is to engage the player if the player has been inactive, or if there are unfulfilled drama goals that will progress the drama but the player has not initiated actions that would lead to them. All actors have both drama related goals and character related goals. The drama related goals take the form of helping the drama progress as expected. For instance, actors have the goal of revealing relevant clues to the user. The actors also have character related goals, such as showing specific characteristics or hiding their lack of alibi, in order to portray a believable character.

3) Response to Request: [3] During the unfolding of the drama each actor is responsible both for acting a convincing role in the drama and for fulfilling drama-specific schema goals. For instance, in an interrogation where two characters, John and David, are present and David has a very strong false alibi, the actor agent playing David can ask the actor playing John to have John reveal the flaw in the alibi, as David would hardly sabotage his own cover directly. John might then say "How could you have caught the 08:15 train when I saw you at 08:20 in the garden?". This does not require advanced logistics. The Bayesian networks facilitate this type of sentence generation so long as it is domain specific, and they facilitate such domain specific querying because they are structured into specific domains.
The actor playing John searches for any knowledge in the specific domain that will increase the utility of the opportunity variable.

The algorithm is as follows, where David's actor is a1 and John's actor is a2:

- a1 recognises that he needs to reveal that his alibi is faulty.
- a1 recognises that he can't reveal that directly to the user in a believable way.
- a1 sends a request to a2 in the form of drama-goal: reveal opportunity of a1.
- a2 queries his knowledge base for actions that will satisfy the goal.
- a2 adds any action that satisfies the goal in the request to an array of possible actions with added weight to indicate priority.
- a2 then evaluates what the set of optimal actions is and picks one to execute.

As seen in the algorithm, a2 will not necessarily act on the request. a2 will first query its knowledge base for relevant actions and, after adding them with suitable weights to the set of other relevant actions, a2 will pick an action from the whole set of possible actions based on a2's characteristics and other goals. This means that a2 may or may not aid the requesting actor depending on context and a2's priorities. This is necessary as complying with a1's request may not fit either a2's knowledge base or a2's current interaction with the user.

III. THE BN ARCHITECTURE

BNs are very good for reasoning about uncertainty such as in human behaviour because they provide the means to determine the probability of the various possible causal outcomes. The variables in a BN can represent anything ranging from system mechanics (e.g. whether there is fuel in a car or how fast the car can accelerate) to the intricacies of human traits and emotions. BNs offer the possibility of mapping the transiency of emotions and the resulting transient decisions that humans make.
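
As a small illustration of how a BN turns evidence into the probability of a cause, the fuel-in-a-car example can be worked through with a two-variable network, Fuel -> Starts. The sketch below is not taken from the paper: the probabilities, the variable names and the function `posterior_fuel` are assumptions chosen only to make the arithmetic concrete.

```python
# Hypothetical two-node network: Fuel -> Starts.
# All numbers are illustrative assumptions, not values from the paper.
P_FUEL = 0.9                                # prior P(fuel = true)
P_STARTS_GIVEN = {True: 0.95, False: 0.0}   # P(starts = true | fuel)


def posterior_fuel(starts: bool) -> float:
    """Return P(fuel = true | starts) by enumerating both states of Fuel."""

    def likelihood(fuel: bool) -> float:
        # Probability of the observed value of Starts given this Fuel state.
        p_start = P_STARTS_GIVEN[fuel]
        return p_start if starts else 1.0 - p_start

    joint_true = P_FUEL * likelihood(True)
    joint_false = (1.0 - P_FUEL) * likelihood(False)
    # Bayes' rule: normalise the joint over both causes.
    return joint_true / (joint_true + joint_false)


if __name__ == "__main__":
    print(posterior_fuel(False))  # belief that there is fuel, given the car did not start
```

Under these assumed numbers, observing that the car does not start lowers the belief that there is fuel from 0.9 to roughly 0.31. The agents' networks apply the same kind of update to variables such as traits, emotions and plot facts.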
There has been a growing interest within Psychology and Philosophy as well as in Computer Science in using BNs to model human thinking [17], [23], [38]. We use causality extensively in our daily lives, from deciding how best to get the kids to school on time, to troubleshooting why the car will not start, and also when deciding on actions while highly influenced by emotions like anger or fear.

The variables in the agent's BNs represent knowledge items such as hair colour, shoe size, relationship status, scenes and objects in the virtual world, and the causal connections between these. Each state of a variable can be used to make a speech action. For example, in figure 3, the variable shoe size with states ranging from 36 to 45 can be used by the agent to say that Kenneth's shoe size is 42. Suppose the agent has been given evidence about basic attributes such as shoe size. This means that the agent has evidence for state 42 of the shoe size variable, see figure 3.

Fig. 3. A small example to demonstrate sentence creation

The agent can use this to form the speech "Kenneth's shoe size is 42" by mapping the state of the variable to the corresponding authored sentence, e.g. "{agent} shoe size is {value}" = Kenneth's shoe size is 42. If the agent is not certain, as in figure 3, the sentence can be "Kenneth's shoe size could be 42". We format sentences in this way by authoring them in XML and attaching tags that indicate what the input and goal variables are. In this example the input variable is shoe size and the goal is suspect.

The example in figure 3 is just a small piece of the whole net. We use object oriented BNs (OOBNs) [21] to create an agent's network from a script file containing descriptions of around 150 variables. These variables describe the bare essentials of a mystery plot such as the motive, means, opportunity and character features. From these variables a much larger network is created to represent the agent's beliefs about itself and about other agents in the drama.
The agent also has beliefs about what each of the other agents believes about each other. The resulting BN has over 3000 variables.

IV. THE PROBLEM

3000 variables is not considered a very big network for implementing a virtual agent. We expect that an average agent in an interactive drama could easily have a few hundred thousand variables. However, even this small BN takes several seconds to update with basic BN reasoning algorithms, which is clearly not fast enough for real-time applications. Moreover, it leaves no room for all the other processes, such as graphics and animation, that are highly demanding on resources in any graphics-intensive computer game.

V. RELEVANCE REASONING

Relevance reasoning is a well-known technique for BNs that uses d-separation to determine which variables need to be updated to compute inference on one or more target variables given evidence [25]. D-separation holds when two distinct variables A and B in a network are not affected by each other receiving evidence. In other words, A can be given evidence without it affecting the inference of B and vice versa [20]. Relevance reasoning identifies the variables that need to be updated based on which variables receive evidence and which variables become affected and are needed to correctly compute the inference of the target variables. This method greatly speeds up the updating of the network in the general case. The worst case is when all the variables in the network are affected by the evidence and are needed to compute the target variable. This means that in the worst case it is still NP-hard.

For example, consider the small BN in figure 4. If the input variable I is C and the target variable T is L, then only variables C, D, E, G, I and L need to be updated to get correct values in L (in this list, I denotes the node labelled I in figure 4).

VI. THE PROCESS

When an agent needs to decide on a speech action, the agent takes the following steps:

1) Find applicable sentences
2) Evaluate each sentence
3) Choose an optimal sentence

Fig. 4. A small Bayesian net to explain relevance reasoning (nodes A to N)

A. Applicable sentences

Fig. 5. A small example to demonstrate context

First the agent finds a set of applicable sentences to evaluate, by querying the BN for a set of sentences that satisfy the goal and are contextual to the input. The input is any speech act that is causally connected to what was last said in the conversation. For example, in figure 5 all the variables are contextual to the Motive variable. If any of their states is given a value, it will affect the belief of the Motive variable. For example, if the Lostafortune variable is instantiated to true, it will positively affect the belief of the IsSwindled variable, increasing the probability that the suspect was being swindled by the victim.

The agent first decides which actions are applicable. It does this by means of relevance reasoning as described above. T (target) is the goal of the agent and I (input) is, for instance, a speech action that the agent is evaluating to realise a goal. The agent uses T and I to extract the corresponding object O. O is extracted from a fully updated BN with the values that the variables in O had in the BN. Because the BN is fully updated, the values in each variable are valid and up to date. The input values are then given as evidence to I and the values are updated for all the variables in O.
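
The extraction of O for a given pair (T, I), together with the hash-table reuse mentioned in the introduction, might be organised along the following lines. This is a minimal sketch under stated assumptions, not the authors' implementation: the class `RelevanceCache`, the function `ancestral_set` and the toy graph are hypothetical, and the pruning rule used here (keep only T, I and their ancestors, i.e. drop barren nodes) is a simpler stand-in for the d-separation-based relevance reasoning of [25].

```python
from typing import Dict, FrozenSet, Iterable, Set, Tuple

# A DAG stored as: variable -> set of its parents (hypothetical representation).
Parents = Dict[str, Set[str]]


def ancestral_set(parents: Parents, nodes: Iterable[str]) -> FrozenSet[str]:
    """Return the given nodes together with all of their ancestors."""
    seen: Set[str] = set()
    stack = list(nodes)
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents.get(node, ()))
    return frozenset(seen)


class RelevanceCache:
    """Caches the extracted variable set O for each (target, input) pair."""

    def __init__(self, parents: Parents) -> None:
        self.parents = parents
        self._table: Dict[Tuple[str, str], FrozenSet[str]] = {}
        self.misses = 0  # number of times O had to be computed from scratch

    def relevant(self, target: str, input_var: str) -> FrozenSet[str]:
        key = (target, input_var)
        if key not in self._table:
            self.misses += 1
            # Stand-in extraction: barren-node pruning via the ancestral set.
            self._table[key] = ancestral_set(self.parents, key)
        return self._table[key]


# Toy graph with assumed arc directions, loosely named after figure 5.
PARENTS: Parents = {
    "LostAFortune": set(),
    "IsSwindled": {"LostAFortune"},
    "Motive": {"IsSwindled"},
    "ShoeSize": set(),
    "Suspect": {"Motive", "ShoeSize"},
}

cache = RelevanceCache(PARENTS)
first = cache.relevant("Motive", "LostAFortune")
second = cache.relevant("Motive", "LostAFortune")  # served from the hash table
```

The design point is that the potentially expensive graph analysis runs once per (T, I) pair; later evaluations of the same speech action against the same goal only pay for a dictionary lookup plus the update of the small extracted set.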
The values of T are then read and their distance from the goal (the difference between the intended values of the goal and the inferred values of the goal variable) is stored. Doing this repeatedly for all the actions that the agent is evaluating results in a set of applicable actions. Using the net in figure 4 as an example, if the C variable is I and L is T then we extract (C, D, E, G, I, L) as O. C then receives evidence and only the variables in O are updated. The result is the values in the L variable.

For example, see figure 5: suppose we want a sentence that says that motive is true = 100% and we put evidence in the Lost a fortune variable, true = 100%. The sentence could be: "Kenneth could have a motive because he lost a fortune". The motive variable would then have true = 98%, which makes the difference between the target 100% and the value 98% equal to 2. This difference is stored with the possible sentence in an array of applicable actions. We choose only those actions that reduce the gap between the target value and the initial value.

B. Evaluate sentences

Once we have a set of applicable actions, that is, actions that we have just extracted because they contribute towards the goal as discussed above, it remains to filter the applicable actions for those that satisfy the character's traits and mood. This can be accomplished with several existing personality models, for instance the personality model created by Ball and Breese for Microsoft [5]. The applicable sentences should be qualified in some way by what type of personality they cater to and how they will be influenced by emotion. We mark each possible action with a range of traits and emotions that the action would be characteristic of. This means that if the agent is angry then the agent will be more likely to choose actions that indicate anger, such as accusing someone. The agent's traits, such as intelligence or arrogance, will also make the agent more likely to choose a sentence that demonstrates them.
For instance an intelligent agent will explain in more detail why a suspect has a motive.

Using BNs, each sentence is evaluated and those that best match the character's traits and emotions are entered into a set of optimal actions. From the set of optimal actions one action is chosen at random and executed.

VII. TESTS & RESULTS

To test our method we generated a Bayesian network from a script that encompasses an agent's beliefs about itself, its beliefs about other agents and its beliefs about other agents' beliefs about other agents. This network contains 3348 variables, 1 decision variable and 1 value variable, 3638 arcs, 10606 states and 58610 parameters.

The tests were run on a laptop with Windows XP, Intel(R) Core(TM)2 Duo CPU 2.59 GHz, 3.00 GB RAM.

TABLE I
RESULTS

Queries    Without RR         With RR
50%        10 sec - 30 sec    < 1 sec
30%        30 sec - 3 min     < 1 sec
20%        > 3 min            < 1 sec
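
The three steps of Section VI (find applicable sentences, evaluate them, choose one) can be summarised in a short sketch. It rests on several assumptions that are not in the paper: the inferred goal value of each candidate is taken as already computed by the BN, the match between a sentence and the agent's traits and emotions is approximated by a simple overlap count rather than by BN evaluation, and the names `Candidate`, `applicable`, `optimal` and `choose`, as well as the example sentences and numbers other than the 98% motive value, are hypothetical.

```python
import random
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class Candidate:
    sentence: str
    inferred: float          # inferred P(goal = true) in percent, after evidence on I
    traits: FrozenSet[str]   # traits and emotions the sentence is characteristic of


def distance(target: float, value: float) -> float:
    """Gap between the intended goal value and an inferred value."""
    return abs(target - value)


def applicable(cands: List[Candidate], target: float, initial: float) -> List[Candidate]:
    """Step 1: keep sentences that reduce the gap between target and initial value."""
    base_gap = distance(target, initial)
    return [c for c in cands if distance(target, c.inferred) < base_gap]


def optimal(cands: List[Candidate], agent_state: FrozenSet[str]) -> List[Candidate]:
    """Step 2: keep sentences that best match the agent's traits and emotions.

    Overlap counting is an assumed stand-in for the BN-based evaluation.
    """
    if not cands:
        return []
    best = max(len(c.traits & agent_state) for c in cands)
    return [c for c in cands if len(c.traits & agent_state) == best]


def choose(cands: List[Candidate], target: float, initial: float,
           agent_state: FrozenSet[str], rng=random) -> Optional[Candidate]:
    """Step 3: pick one of the optimal sentences at random."""
    pool = optimal(applicable(cands, target, initial), agent_state)
    return rng.choice(pool) if pool else None


CANDIDATES = [
    Candidate("Kenneth could have a motive because he lost a fortune",
              98.0, frozenset({"intelligent"})),
    Candidate("Kenneth did it!", 70.0, frozenset({"angry"})),
    Candidate("Kenneth has big feet", 40.0, frozenset({"intelligent"})),
]

# Assumed initial belief of 50% in the goal, with a target of 100%.
angry_pick = choose(CANDIDATES, 100.0, 50.0, frozenset({"angry"}))
calm_pick = choose(CANDIDATES, 100.0, 50.0, frozenset({"intelligent"}))
```

With these assumed inputs, the third sentence is discarded in step 1 because it widens the gap to the goal, and the agent's current state then decides between the accusation and the explanation.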