
A Subsymbolic and Visual Model of Spatial Path Planning

David Reitter and Christian Lebiere
Department of Psychology, Carnegie Mellon University, Pittsburgh, PA
cl@cmu.edu

Keywords: Cognitive Modelling, Navigation, Vision, Search, Cognitive Imagery, ACT-R

ABSTRACT: Planning a path to a destination, given a number of options and obstacles, is a common task. We suggest a two-component cognitive model that combines retrieval of knowledge about the environment with search guided by visual perception. In the first component, subsymbolic information, acquired during navigation, aids in the retrieval of declarative information representing possible paths to take. In the second component, visual information directs the search, which in turn creates knowledge for the first component. The model is implemented using the ACT-R cognitive architecture and makes realistic assumptions about memory access and shifts in visual attention. We present simulation results for memory-based high-level navigation in grid and tree structures, and visual navigation in mazes, varying relevant cognitive (retrieval noise, visual finsts) and environmental (maze and path size) parameters.

1 Introduction

Planning how to get to an intended location is, and has long been, an essential challenge for many species. From the simple task of reaching the fruit in a tree or the window in an office filled with desks, to the more complex problem of driving to a destination in an urban jungle, we need to employ cognitively efficient, yet practically effective strategies to reach our goals. Navigation, like general planning, is about making choices on the way to reaching a goal from an initial state. Many levels of cognitive mechanisms are relevant to this process, including perceptual scanning, knowledge-based search and memory-based decision-making.
A general theory of navigation and planning has to capture all those levels and, most crucially, enable their interaction. One intent of our approach is to capture the latter level, that at which higher-level cognitive processes play a crucial role. We are particularly focused on the role that subsymbolic mechanisms play in this process, for a number of reasons: they are more likely to be strongly constrained by the cognitive architecture, and thus to generalize better across domains, and they are also likely to yield the more significant advantages over purely symbolic knowledge-based and perceptual approaches. For instance, in chess playing, humans can compete against computers that can search several orders of magnitude more variations, thanks to a sophisticated but nearly instantaneous evaluation of positions that allows them to select only very few possible moves for further search. While that evaluation leverages large amounts of expertise accumulated over years of practice, a symbolic, explicit evaluation of that knowledge base is out of the question given time constraints, and thus the key mechanisms enabling it must be subsymbolic.

The second level we intend to address is the integration of perceptual scanning with the navigation process. Among the guiding hypotheses in developing the visual components was the idea that most local planning problems can be solved visually, with minimal involvement of higher cognition (memory and rule-based algorithms). Specifically, situations that provide a subject with a visual impression of much of the path allow them to ground turning points along the path in visible locations and retrieve them from there. More complex problems require the recognition of previously visited locations, and yet more complex planning problems require non-geometrical knowledge about reachable goals for many existing locations.
Even then, subsymbolic information is used to estimate the effort involved in taking a specific route, and to retrieve a good intermediate step given start and target points. As in any rational model, such information is acquired in a way that optimizes the mean performance.

Thus, this paper describes the components of a dual-strategy model that leverages subsymbolic and declarative knowledge as well as a visually guided heuristic within a validated cognitive framework. The goal behind this work is to provide an integrated model of visual and non-visual planning, applicable to combined macro- and micro-planning tasks such as searching a building and all of its rooms. Such a model will be implemented in an existing cognitive architecture such as ACT-R (Anderson et al., 2004; Anderson, 2007), with perception-side heuristics developed for the task. The cognitive architecture leverages not just pre-defined, domain-specific symbolic knowledge, but also makes it clear how subsymbolic knowledge is acquired to optimize task performance. In this paper, we present two models that implement high-level and visual search within the ACT-R framework. We will discuss results of simulations with prototypical navigation problems. Our models, even though primarily predictive, show realistic results with respect to cognitive limits.

2 Recent work

Models of spatial navigation have addressed path finding as well as representation mechanisms. For instance, knowledge of locations may be arranged in two- or more-dimensional arrays (e.g., Kosslyn, 1980; Glasgow & Papadias, 1998), or in multiple layers of spatial representations (Kosslyn, 1994). A representation supporting path planning will not only need to store distances between locations, but also the affordances of connections between them: some routes may be unavailable or suboptimal.
On the side of path-finding algorithms, Fum & del Missier (2000) present data showing how humans develop spatial plans in a 2D environment that contained a varied number of obstacles. Subjects' performance was measured by the time to find the path and the number of unnecessary steps taken compared to the optimal path to the goal (errors). The number of turns was found to be a crucial predictor of planning time. The model proposes that subjects choose locally optimal paths (a hill-climbing strategy) and minimize the number of turns; they do not develop a complete plan before committing to initial steps. Our model is similar in that the cognitive level also prefers to backtrack locally in order to avoid long-term memory needs. In the visual model presented in this paper, visually guided local planning results in long, straight lines with few turns.

The integration of cognitive architectures with path-finding approaches represents a third field of work. Chandrasekaran (2006) and Dye (2007) attempt to blend cognitive and perceptual factors in navigation and planning applications in their work on implementing diagrammatic reasoning in cognitive architectures. Lathrop & Laird (2007) extend the Soar architecture with a visual system that is able to render imaginary drawings in tasks that involve reasoning about the relationships of geometric objects. As they show, these tasks can be carried out on a non-visual, largely symbolic (or arithmetic) level; visual imagery, however, speeds up the process (in line with data). Within the context of the ACT-R architecture, much work has been done on the problem of spatial planning and representation, including issues of adaptivity in planning (Fu, 2003), encoding of spatio-temporal stimuli (Johnson et al., 2002), visualization capacity of spatial paths (Lyon et al., 2008), spatial perspective taking for planning (Hiatt et al., 2004) and architectural modules for neurally plausible navigation in 3D spaces (Schunn & Harrison, 2001).
In this paper, we address the case of route planning based on information that is available externally, rather than purely held in memory.

3 Model overview

Our two models combine, in planning, a method to arrive at a goal point, given a current location and a partially observable environment, which constrains individual steps.

The representation in the first model is abstract and memory-based. Given the immediate surroundings, options for a next step can be determined. The abstract representation allows us to store locations, their local options (i.e., outgoing paths) and how useful these options are with respect to a given goal. The abstract situation is encoded as declarative knowledge, indicating the possibility to traverse from point A to point B. Such situations may be recalled with the help of cues: a cue could be a goal location C, but also a preceding traversal. This makes goal-directed planning possible and predicts that sequences of decisions, rather than just individual steps, are learned and combined during path planning. The traversal of a maze according to such abstract representations amounts to traversing a graph structure. The abstract representation is grounded in perceptual components of the architecture in visual concepts (landmarks). Landmarks may serve as cues to retrieve location-based information during planning. It should be noted that the abstract representation of affordances alone would be insufficient to explain classic mental imagery data (Shepard & Metzler, 1971).

The second model concerns the visual system, which has access to the part of the visual scene that it attends to at the time (cf. the visual representations in Glasgow & Papadias's (1998) model). Our visual model represents possible paths as largely straight lines from the point of attention to reachable (immediate) goals.
Given a first-person perspective, such lines will equal lines of sight; given a two-dimensional (2D) representation of the maze (as in our experiments), possible paths are detected as straight lines that are uninterrupted by walls. In the latter (2D) case, knowledge of previous decisions is strongly grounded in the visual world. Memory that would otherwise be declarative on the abstract level is externalized in the visual scene. The role of the perceptual, visual module that the visual model depends on is to identify traversable shapes and select promising ones: the adopted heuristic is to choose the route that ends up as close to the goal as possible. Thus, the visual module is to convey a sense of distance from the end of straight lines to the goal. Naturally, the two models have strengths and weaknesses. Learning to navigate around a city, for instance, will be a memory task, while most navigation in a park unknown to the subject would be better suited to the visual approach. The two models are intended to combine; we will describe our approach for their integration.

4 High-level Planning Model

4.1 Model

The memory-based model primarily leverages the subsymbolic mechanisms of declarative memory in the ACT-R architecture. While subsymbolic mechanisms also play a large role in procedural memory, there were a number of reasons for focusing initially on declarative subsymbolic processes:

• Declarative memory provides a more direct integration path with symbolic and perceptual information, since those sources of information are initially (and perhaps largely) stored in declarative memory.

• Decision-making typically follows a path by which it starts with declarative processes that are then ultimately (if possible) compiled into procedural structures. As such, a declarative account is an enabling account for a subsequent procedural one.

• Subsymbolic procedural processes largely consist of a utility calculus determining the selection of production rules.
Since it bears a strong resemblance to reinforcement learning techniques, which have been applied extensively to navigation and planning tasks, there are more limited possibilities for improvement there.

• Declarative subsymbolic mechanisms reflect more complex and discriminating statistical and semantic factors than procedural subsymbolic mechanisms, which gives them greater power in complex domains, as determined from experience in applying ACT-R to the game of Backgammon (Sanner et al., 2000).

Before describing the subsymbolic mechanisms, we will briefly sketch out the symbolic level of the model that leverages those mechanisms, more specifically the declarative representation of the problem and the production rules that manipulate it. Declarative memory items, chunks, may be of two basic types: location chunks that define states of the system, and path chunks that define transitions from one state to another. In addition, there is one basic goal chunk type representing the planning process that holds three pieces of information: the current state, the desired (goal) state, and process information such as the current step of the process and ultimately an intermediate state determined by that level of planning. These goals are constructed during path planning and will enter declarative memory. There, they constitute a record for purposes of backtracking as well as learning across planning episodes that would allow previous partial solutions to be reused. As such, one can view specific paths between states as a special case of past planning solutions. Similarly, the productions that act on those representations are equally straightforward:

• The key production retrieves a path from memory; this is the key step that leverages declarative subsymbolic processes.

• A subsequent production checks whether the path starts at the current location; if so, it advances along the path and subgoals the process of moving from the endpoint of that path to the final destination.
• If that is not the case, another production subgoals the process of getting from the current location to the path's starting point, and changes the goal to get from that point to the final destination for later resumption.

• If a subgoal (such as the one set by the previous production) has been completed, then another production attempts to retrieve the next highest goal and resume it.

• If no such higher goal exists, the process terminates in success.

• If no path can be found from the current location that has not already been taken (to avoid looping), the process terminates in failure.

To focus on determining the ability of subsymbolic mechanisms to find the right path, we avoided in this basic model the use of backtracking mechanisms that could assure successful planning through exhaustive consideration of all possible routes. However, such a process could be easily integrated to leverage the record of previous (sub)goals in declarative memory, and would be triggered at the last step of the process described above. The key step is therefore the retrieval from memory of the next path to be considered, which attempts to maximize the activation of the chunk retrieved. Memory activation is a sum of three terms (not counting a stochastic noise term to be discussed later), each of which aims to capture a different statistical factor in the path selection process. The first term (a.k.a. base-level) attempts to capture frequency and recency, following the power laws of practice and forgetting, respectively, and can be interpreted as the log odds that a given path will be the right one. In contrast to this context-free term, the second term (a.k.a. spreading association) attempts to capture the log likelihood (in Bayesian terms) that this path is the right one given the current context elements. Technically, each context element (in practice, the current and destination locations) spreads activation to related chunks, in this case paths between locations.
In practice, the additive spreading activation mechanism embodies the naïve Bayes assumption of independence between context elements, so there is no guarantee (and indeed it often fails) that a path related to (not necessarily connected to) both locations would be a good one to get from one to the other. Hence the necessity of a more explicit semantic factor, reflected in the third term (a.k.a. partial matching), which imposes a penalty on chunks proportional to their dissimilarity with the pattern specified in the retrieval request. In practice, since the path chunks to be retrieved are represented with their starting and destination locations, we want the starting and destination locations to be as similar as possible to the current and final destination locations, respectively. These factors, added together, effectively balance the requirements that paths selected in the retrieval process reflect the constraints favoring common paths (a powerful heuristic reflected in all human actions), those often taken from and to the specific locations, and those that make progress toward the final goal.

While those factors should ultimately be learned from experience using architectural learning mechanisms, in the absence of a model integrating the various levels previously described, we have set these parameters to reflect idealized values reflecting their semantics and the convergence point of the learning processes. Accordingly, the base-level activation of a path chunk is set to the log odds that that path would be on the shortest path between two locations, averaged over all possible starting and ending location pairs. The strength of association from a location to a path is set to the log odds that the path is part of the shortest path from that location to or from another, averaged over all other locations.
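As a rough illustration of the three-term activation calculus described above, retrieval can be sketched as picking the chunk that maximizes the sum of the terms. This is a minimal sketch, not the architecture's actual computation; the field names and numeric values are made up for the example:

```python
# Illustrative sketch of the three-term activation sum. The chunk data
# and term values are hypothetical; in ACT-R these quantities are
# computed by the architecture from declarative memory parameters.

def activation(base_level, spreading, mismatch_penalty):
    """Total activation: frequency/recency (base-level, log odds)
    + context fit (spreading association, log likelihood)
    - dissimilarity to the retrieval pattern (partial matching)."""
    return base_level + spreading - mismatch_penalty

def retrieve(paths):
    """Retrieval returns the path chunk with the highest activation."""
    return max(paths, key=lambda p: activation(p["B"], p["S"], p["P"]))

# Hypothetical competing path chunks:
paths = [
    {"name": "A->B", "B": 0.5, "S": 1.2, "P": 0.1},  # common, fits context
    {"name": "A->C", "B": 0.8, "S": 0.3, "P": 0.9},  # common, poor match
]
```

Here the well-matching path wins retrieval even though the other chunk has a higher base-level, illustrating how the context-sensitive terms can override the context-free one.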
Finally, the similarities between locations used in matching the path starting and destination locations are set to reflect the semantics of the domain, expressing a general sense of proximity, basically as a negative exponential function of the distance between the points. Unlike the first two factors, which reflect the statistics of previous problem-solving experience, the third factor most likely reflects the interaction with the domain itself, such as resulting from the visual level. The use of these activation factors is similar to that of Lyon et al.'s (2008) model of 3D path planning.

Figure 1: Model performance degrades with increased noise levels w.r.t. three measures: success rate, rate of perfect solution and optimal-to-obtained-path length ratio.

4.2 Simulation

We tested this high-level planning model in two idealized but naturally scalable environments: trees and grids. A tree structure is meant to approximate the hierarchical organization of structures such as buildings, with heavily travelled central connectors (elevators, main hallways) and increasingly localized destinations (wings, rooms). A grid structure is meant to approximate the regular pattern of city streets. We ran our model over structures of both types over a range of complexity, topping out (mostly for incidental computational reasons) at trees of depth 4 (approximately the most levels of organization in hierarchical networks such as large buildings or road networks) and 5x5 grids (which might seem small but provide many combinatorial possibilities and could be applied in a self-scaling manner). The only parameter that we manipulated in the model was the amount of activation noise.
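ACT-R's transient activation noise is conventionally drawn from a logistic distribution with a scale parameter s. A minimal sketch of how such noise can flip the outcome of a retrieval competition (the helper names are ours, not ACT-R identifiers):

```python
import math
import random

def logistic_noise(s: float) -> float:
    """Sample ACT-R-style activation noise: logistic distribution
    with scale s (variance pi^2 * s^2 / 3)."""
    if s == 0.0:
        return 0.0
    u = random.random()
    while u == 0.0:  # avoid log(0)
        u = random.random()
    return s * math.log(u / (1.0 - u))

def noisy_retrieve(activations, s: float) -> int:
    """Index of the chunk winning a noisy retrieval competition.
    Higher s lets the nominally best chunk lose more often, which
    degrades success rates but helps escape local minima."""
    return max(range(len(activations)),
               key=lambda i: activations[i] + logistic_noise(s))
```

With s = 0 the retrieval is deterministic; at larger values, sub-optimal chunks are retrieved with increasing probability.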
This was designed both to reflect the stochastic nature of human cognition (which may seem like a purely limiting factor but is actually a powerful feature to avoid both predictability and local minima in search, e.g. West et al., 2005) and to temper the assumption that the activation calculus parameters were set to idealized values that learning mechanisms would not perfectly reach.

Figure 1 displays the performance of the model as a function of activation noise, averaged over all environment structures and sizes, in terms of three measures. They are the probability of the planning process successfully reaching the goal (far from assured given the constraint of never using the same path twice and the lack of backtracking), the probability of finding the shortest path assuming success, and the ratio of shortest path to actual path length (again assuming success). While, as expected, all three performance measures decrease sharply (in remarkably correlated fashion) as a function of activation noise, a typical noise value of 0.25, used in many models of memory-based decision-making (e.g., Gonzales & Lebiere, 2005), provides over 90% performance on all measures.

Figure 2: The model predicts lower success rates for longer solutions. When a solution is found, it is usually less than 25% longer than the optimal solution.

Figure 2 displays the performance of the model as it scales to paths of increasing lengths, averaged over all noise values plotted in the previous figure. An interesting pattern arises for paths of length 2 to 6, displaying an inverse relationship between decreasing probability of success and increasing probability of finding a path of minimal length assuming success. This pattern is, however, not present for paths of length 7 and 8, which consist only (for artefactual reasons) of grid structures.
In general, the probability of success seems to scale well (over this admittedly limited range) as a function of path length, especially considering the wide sampling of high noise values reflected in this average.

Future experiments will involve somewhat more realistic environmental structures, including small-world networks as a generalization of tree structures that better reflect the overall road network, and probabilistically introducing disabled or one-way paths in grid structures, to more accurately reflect traditional city environments (e.g. Manhattan).

5 Visual Navigation Model

5.1 Mazes

In line with our original motivation to model navigation tasks that are closer to real-life navigation over short distances, we also investigated path planning using mazes, which allow us to manipulate the complexity of the task along several dimensions. Our mazes are of size 10 by 10 squares or larger. They are relatively easy to solve: they do not generally contain many long dead-end routes, requiring the model to backtrack only occasionally. Other, more complex mazes would define the task primarily as a memory problem: there, remembering previous branching points is the most important sub-task of solving algorithms. Mazes were generated using a dynamic programming algorithm commonly known as Eller's algorithm. Start and end points were always located on opposite sides of the maze.

5.2 Model

The visual navigation model always chooses the best route along a line of sight from the current location; the best route is the one that transports us as close to the goal as possible. Routes that avoid bringing the model to previously visited parts of the territory are preferred.
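The line-of-sight heuristic can be sketched on a 2D grid maze as follows. The grid encoding and function names are our own illustration under assumed conventions (0 = open square, 1 = wall, Manhattan distance to the goal); the paper does not specify an implementation:

```python
# Sketch of passage detection: a passage is an uninterrupted straight
# line from the attended square; the model commits to the passage end
# ("next stop") that lies closest to the goal.

def passages(maze, pos):
    """Yield the farthest reachable square along each straight line
    of sight from pos. maze[r][c] == 1 denotes a wall."""
    rows, cols = len(maze), len(maze[0])
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        r, c = pos
        while (0 <= r + dr < rows and 0 <= c + dc < cols
               and maze[r + dr][c + dc] == 0):
            r, c = r + dr, c + dc
        if (r, c) != pos:
            yield (r, c)

def next_stop(maze, pos, goal):
    """Choose the passage end that brings the model closest to the goal
    (Manhattan distance as an assumed proxy for perceived distance)."""
    ends = list(passages(maze, pos))
    if not ends:
        return None
    return min(ends, key=lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1]))
```

For example, from a corner of a small maze with a central wall, the sketch returns the lateral passage whose endpoint is nearest the goal, mirroring the greedy choice described in the text.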
If a route brings us away from the goal, then we are careful to detect alternative routes along the way: the model inspects the areas left and right at each step, stopping when there is a way out. Such a way out is likely to be more useful than retreating further from the goal.

In the case of mazes, visual navigation relies on the recognition of passages. Such passages are stretches of straight lines, translating to a line of sight in the corresponding first-person perspective environment. The visual model recognizes passages beginning at the location of visual attention. The model then commits to the passage that brings it the closest to the goal. We call the intermediate target location chosen this way the next stop.

While traveling to the next stop, the model notes its locations; the most recent ones are accessible in the form of visual finsts (Pylyshyn, 1989). This mechanism allows the model to distinguish previously visited locations from novel portions of the maze. While making its way to the next stop, the model does not normally inspect the surroundings for possible alternative routes. We expect our model to be compatible with more recent, graded views of visual salience (Byrne, 2006), but the model presented here does not require any subsymbolic representation beyond the raw image of a maze.
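The finst mechanism, a bounded record of recently attended locations, can be sketched as a small class. The class and its default capacity are our own illustration (ACT-R's default number of visual finsts is commonly four, but the architecture's parameter settings should be consulted):

```python
from collections import deque

class Finsts:
    """Sketch of visual finsts (Pylyshyn, 1989): only the most recently
    attended locations are retained; everything older reads as novel."""

    def __init__(self, capacity: int = 4):  # assumed default capacity
        self.recent = deque(maxlen=capacity)

    def mark(self, loc):
        """Record attention to loc, refreshing it if already marked."""
        if loc in self.recent:
            self.recent.remove(loc)
        self.recent.append(loc)

    def novel(self, loc) -> bool:
        """True if loc is not among the recently visited locations."""
        return loc not in self.recent
```

Because the deque is bounded, marking a new location silently expires the oldest one, which is what lets the model mistake long-ago-visited squares for novel territory, one source of the capacity effects discussed above.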