Saccades and Fixating Using Artificial Potential Functions

B. Deniz İlhan, Özgür Erkent and H. Işıl Bozma
Intelligent Systems Laboratory, Electrical and Electronic Engineering, Bogazici University, Bebek 34342 Istanbul Turkey

Abstract: This paper presents a mathematical model for saccadic motion and fixations. We relate this issue to the problem of motion planning and show that a family of artificial potential functions can be used for creating saccadic motion. The advantage of this approach is that finding the next fixation point does not require an explicit visual search, which is computationally costly and may be problematic in real-time applications. Rather, the system naturally ‘slides’ from the current fixation into the next. Thus real-time performance on cheap hardware can easily be achieved. Experimental results provide insight into the performance of APES, a robot implementing this approach.

I. INTRODUCTION

Inspired by human vision, an attentive robot works by allocating its limited computational resources to only the interesting parts of a visual scene. This is done by saccades, rapid eye movements that direct the optical axis from the current fixation to the next such that the fovea, the high-resolution area around the fixation point, overlaps with this interesting area. It has been proposed that this is achieved through visual search on a saliency map, a two-dimensional map that encodes the “interestingness” of the objects in the visual scene. Hence, the problem of how to select new fixation points is treated separately from the problem of accomplishing the saccadic motion necessary for moving to this new point. In this paper, we present an alternative approach in which the two stages are integrated in a unified framework.
In this approach, a family of artificial potential functions, each of which encodes the saliency map features around the current fixation point, is defined. Given the current fixation point, the next fixation is generated simply by ‘sliding’ into the equilibrium point of the associated surface. The advantage of the approach is twofold: there is no need for an explicit visual search, and the camera actuator commands are generated automatically. In turn, the camera finds a sequence of fixation points without exhaustive search and in real time. Of course, the ‘psychophysical correctness’ of the fixation points thus generated is subject to experimentation and is beyond the scope of the paper. Furthermore, since different saliency features, including top-down features, can be used to define an artificial potential function, shifts of attention can be made to occur in a ‘programmable’ manner. In this paper, we show the construction of a potential function using one of the most common saliency features, image gradients. In general, however, potential functions could easily be constructed from other features such as color, texture or previously memorized top-down features.

The organization of this paper is as follows: In the rest of this section, we briefly review related literature. In Section 2, the theoretical framework of fixation generation is presented. In Section 3, we describe the construction of our particular family of artificial potential functions. Experimental results in real scenes are then discussed. The paper concludes with a summary.

A. Saccadic Research

Studies in vision science have revealed that biological systems view their surroundings by performing saccades in a continual manner between different points of interest, on each of which fixation is maintained briefly [6]. Indeed, it has been estimated that humans make more than three saccades per second [12].
Hence, understanding saccadic movements, that is, how the eyes “move to the right place at the right time so efficiently and seemingly effortlessly” [12], is an intrinsic part of understanding vision. Saccades are categorized as “stimulus-driven” when there is no deliberation and as voluntary in the case of deliberate selection [13]. Some results suggest that stimulus-driven and exploratory saccades are generated in neural circuits that operate in different spatial coordinates [13]. In attentional models based on the spotlight or the zoom-lens metaphors, a beam of attention of either fixed or varying size is described as moving in either an analog or a discrete manner, with mixed supporting evidence [14]. Some findings indicate that the further the eye drifts from its mean position, the more likely a saccade is to occur [1]. Furthermore, saccades tend to be concentrated on regions of complexity [5], [6]. The plausibility of a mathematical model in which a saliency measure is formulated as the sum of sine waves with fundamental frequency equivalent to the spatial frequency of the requested attention distribution is shown in [14].

Published in: Proceedings of the 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems, October 9 - 15, 2006, Beijing, China. 1-4244-0259-X/06/$20.00 ©2006 IEEE

B. Selective Vision

Mimicking visual findings in some way, researchers have proposed attentive vision where camera gaze is directed to new visual targets [10], [11]. Various theoretical models that account for the targeting of fixation points have been proposed. Most are based on visual search mechanisms that generate emergent sequences, i.e. sequences computed a step at a time that are analogous to motor behavior mediated by perceptual feedback and a small amount of high-level control [4], [7], [9], [15]. Alternatively, explicit sequences, in which ‘memorized’ sequence representations are retrieved in a manner analogous to skilled motor behavior, are also utilized [8].
However, in all these approaches, explicit visual search is required: the next fixation point is computed first, and then the camera saccades to this new point. In this paper, we propose an approach that integrates the two stages. In this model, a family of artificial potential functions parameterized by the fixation point and encoding the saliency surface is defined. Given the current fixation point, the next fixation is found simply by ‘sliding’ into the equilibrium point of the associated artificial potential surface. Since different saliency features, including top-down features, can be used to define an artificial potential function, explicit or emergent sequences can both be generated.

C. Problem Statement

Suppose that we have a robot that is exploring its scene through saccadic motion. Let the robot's image plane be denoted by $P$. Note that $P$ is compact and connected. Suppose the robot is fixated on the point $x_k \in P$. Each point $x \in P$ in the state space is a candidate for the new fixation point $x_{k+1}$ given the current fixation point. The ‘naive’ fixation generation problem may be stated as follows: Given an image $I$ and an initial fixation point $x_0$, define a family of control strategies and a switching law such that the robot saccades to a sequence of fixation points $\{x_k\}$, $k = 1, \cdots, K$.

II. SACCADES USING ARTIFICIAL POTENTIAL FIELDS

The idea of using “potential functions” for the specification of robot tasks with a view of control problems in mind was pioneered by Khatib [2] and Koditschek [3] in the context of obstacle avoidance and navigation. Our studies show that it may also be preferred in fixation generation. Moreover, there are theoretical reasons to prefer the ‘natural control’ methodology to some of the other traditional approaches: The lack of formal analysis of the limits of robustness constitutes a major drawback of many methods proposed in vision.
In order to compensate for this, extensive testing is always carried out. In contrast, the potential field approach sets a formal framework for studying the special properties of the system.

Let a set of “saliency” functions $\varphi_{x_k} : P \to [0, 1]$ be a collection of smooth scalar-valued maps on the state space parameterized by $x_k$. Each function is constructed so as to encode the salient points in the neighborhood of the associated fixation point. The measure of saliency can be defined from pre-determined features such as intensity gradients, color, texture or high-level representations, depending on the application. However, regardless of the chosen saliency measure, the functions should attain minima at the potentially interesting points and should be maximal over uninteresting points. Let us presume that the camera's dynamics can be satisfactorily described by a simple first-order model. Suppose it is currently fixated on $x_k$. The dynamical system governing motion toward the next fixation point can be defined by constructing a gradient field as

$$\dot{x} = -D_x \varphi_{x_k}(x), \qquad x(0) = x_k, \quad k = 0, 1, \cdots$$

This closed-loop system inherits the critical qualitative behavior of gradient trajectories. Any local minimum is then designated as the next fixation point $x_{k+1}$. First, let us note that each function in this family admits local minima, since any smooth function attains a minimum on a compact set. Furthermore, in general the potential generating functions may not have a unique minimum. Suppose the integral curve of $\dot{x}$ through the initial condition $x_k$ is denoted by $-D_x^t \varphi_{x_k}(x_k)$. If $-D_x^t \varphi_{x_k}(x_{k+1}) = 0$ implies full rankness, then the limit set $\lim_{t \to \infty} -D_x^t \varphi_{x_k}(x_k)$ is some isolated singularity. Otherwise, it can have a nontrivial manifold of minima. Thus, small perturbations in the location of the current fixation point could result in large deviations in the location of the next fixation point.

A. Algorithm

Saccadic motion is achieved by switching to a new controller every time a new fixation point is reached. Once the control law is selected, the cameras slide into their new fixation point by simply moving accordingly. Visual findings indicate that there are two types of memory present here:
1) Inhibition of return - the process by which the currently attended location is prevented from being attended again,
2) Short-term memory - the process by which the last few fixations are recalled and prevented from being attended.
Both processes are still not well understood. In our case, the following mechanisms are devised.
•  Foveal inhibition: If there is a tendency to fixate in the current fovea, the camera stops and does a random jump in its visual field.
•  Visual field inhibition: If there is a tendency to fixate near the boundary of the visual field, the camera stops and does a random jump in the new visual field.
•  Memory inhibition: If there is a tendency to fixate near a previously fixated point, the system randomly jumps to an arbitrary point within its visual field. The previously fixated fovea locations are held in a first-in first-out memory.
Finally, an algorithm for saccadic behavior based on a switching mechanism is constructed as follows:
1) Define $k$ as the fixation iteration.
2) Consider the current fixation point $x_k$. Move to a point $x$ on the minimum potential boundary of the foveal inhibition region.
3) Switch to the control law $\dot{x} = -D_x \varphi_{x_k}(x)$ and move the camera until it reaches a critical point or the boundary.
4) If the camera has moved into the fovea or to one of the previous fixation points, do a random jump in the visual field.
5) If the camera has moved to the boundary of the visual field, do a random jump in the new visual field.
6) Increment $k$ and go to step 1.

B. Construction of Artificial Potential Function

We now describe the construction of potential functions.
In this case, we use image intensity gradients as salient features. However, let us note that the measure of saliency can easily be modified to be based on other features such as color or texture, depending on the application. We proceed in a manner similar to [12]. The function $\hat{\varphi} : P \to (0, \infty)$ encodes the current fixation point as well as the saliency measure as

$$\hat{\varphi}_{x_k}(x) = \frac{1}{\beta(x, x_k)}, \qquad k \in \mathbb{Z}^{+} \tag{1}$$

The denominator encodes the distance saliency measure. It may consist of several terms:
•  The current feature of interest: For simple, bottom-up processing this may be simply defined as $\beta_1(x) = \nabla I(x)^T \nabla I(x)$.
•  Maximal coverage: The next fixation point should be maximally away from the current fixation point. This may be simply defined as $\beta_2(x, x_k) = (x - x_k)^T (x - x_k)$.
The saliency measure $\beta$ is then constructed as the weighted sum of these terms: $\beta(x, x_k) = w_1 \beta_1(x) + w_2 \beta_2(x, x_k)$. The zero level set $\beta^{-1}(0)$, denoted by $\partial P$, comprises the points in the image plane which do not draw any interest.

Since $\hat{\varphi}_{x_k}$ blows up on $\partial P$, it is not admissible. In order to make $\hat{\varphi}$ admissible, it is squashed by the function $\sigma : (0, \infty) \to [0, 1]$, defined by $\sigma(x) = \frac{x}{1 + x}$. The function is constructed as the composition

$$\varphi_{x_k}(x) = \sigma \circ \hat{\varphi}_{x_k}(x) \tag{2}$$

Fig. 1. Image plane orientations.
Fig. 2. Image transformation.

C. Coordinate Transformation

As soon as the camera starts moving away from the current fixation point, the image plane changes as shown in Fig. 1. Hence, the fixation point and the current point at which the image intensity gradients are calculated lie in different planes. Therefore, in order to measure any physical relation between them, $x_k$ is projected onto the current image plane as shown in Fig. 2.
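As an illustration of (1), (2) and the switching algorithm of Section II-A, the following is a minimal sketch under stated assumptions, not the implementation used on APES. It assumes a static grayscale image held as a NumPy array, uses the identity $\sigma(1/\beta) = 1/(1+\beta)$ to avoid dividing by zero on $\partial P$, and replaces the continuous gradient flow by a hypothetical discrete descent over the 8 neighbouring pixels. The function names (`potential`, `slide`, `next_fixation`), the `fovea` half-width and the memory length are illustrative choices; step 2 of the algorithm (starting from the fovea boundary) is omitted, and a random jump simply picks a pixel in the same image rather than in a new visual field.

```python
import numpy as np
from collections import deque


def potential(image, x_k, w1=1e-4, w2=1e-3):
    """phi_{x_k} = sigma(1 / beta) with sigma(s) = s / (1 + s), i.e. 1 / (1 + beta)."""
    gy, gx = np.gradient(image.astype(float))        # finite-difference image gradient
    beta1 = gx ** 2 + gy ** 2                         # feature term: grad(I)^T grad(I)
    rows, cols = np.indices(image.shape)
    beta2 = (rows - x_k[0]) ** 2 + (cols - x_k[1]) ** 2   # coverage term: |x - x_k|^2
    beta = w1 * beta1 + w2 * beta2
    return 1.0 / (1.0 + beta)                         # closed form, finite even where beta = 0


def slide(phi, start, max_steps=10_000):
    """Discrete stand-in for x' = -D phi: hop to the lowest 8-neighbour until none is lower."""
    pos = (int(start[0]), int(start[1]))
    h, w = phi.shape
    for _ in range(max_steps):
        r0, r1 = max(pos[0] - 1, 0), min(pos[0] + 2, h)
        c0, c1 = max(pos[1] - 1, 0), min(pos[1] + 2, w)
        window = phi[r0:r1, c0:c1]
        dr, dc = np.unravel_index(np.argmin(window), window.shape)
        nxt = (r0 + int(dr), c0 + int(dc))
        if phi[nxt] >= phi[pos]:
            break                                     # no lower neighbour: local minimum
        pos = nxt
    return pos


def next_fixation(image, x_k, memory, rng, fovea=15):
    """Steps 3-5 in simplified form: slide, then jump at random if an inhibition rule fires."""
    cand = slide(potential(image, x_k), x_k)
    h, w = image.shape
    on_boundary = cand[0] in (0, h - 1) or cand[1] in (0, w - 1)
    inhibited = [x_k, *memory]                        # current fovea plus short-term memory
    revisit = any(max(abs(cand[0] - m[0]), abs(cand[1] - m[1])) <= fovea for m in inhibited)
    if on_boundary or revisit:
        cand = (int(rng.integers(h)), int(rng.integers(w)))
    memory.append(x_k)                                # deque(maxlen=...) acts as the FIFO memory
    return cand


# Usage on a synthetic scene: a bright square on a dark background.
scene = np.zeros((40, 40))
scene[25:30, 25:30] = 255.0
fifo, rng = deque(maxlen=5), np.random.default_rng(0)
x = (5, 5)
for _ in range(3):
    x = next_fixation(scene, x, fifo, rng)
```

Because $\beta_2$ grows without bound away from $x_k$, the descent in this sketch often ends on the image border when no strong gradient lies nearby, which is the situation the visual field inhibition rule is meant to handle.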
Using trigonometric identities, it is easy to show that $|x - x_k| = \frac{W_I \tan(\theta)}{2 \tan(\phi)}$, where $W_I$ denotes the image plane width. In order to move the camera, the gradient vector, which is Cartesian, needs to be transformed into angular velocities. Let $\theta$ and $\phi$ be the pan and tilt angles respectively, as shown in Fig. 3 (left). The arc we want to move along the $\phi$ direction is shown in Fig. 3 (right). The dependence of the angle $\theta$ is analogous. It is easy to see that the angular velocity components are directly proportional to the Cartesian velocity components:

$$\delta\phi = \frac{1}{r}\,\delta y, \qquad \delta\theta = \frac{1}{r}\,\delta z$$

where $r$ is the focal length.

Fig. 3. Left: Computation of gradient vector; Right: Dependence of $\phi$ on the gradient vector.
Fig. 4. APES robot.

III. EXPERIMENTAL RESULTS

A series of experiments have been conducted using APES, an attentive robot designed and built in our laboratory [9], as shown in Fig. 4. The goal of the first set of experiments was to observe the reaction of APES to a simple scene under the guidance of artificial potential functions. For comparison purposes, a sample scene¹, as shown in Fig. 5 (top-left), was used in the experiments. The fovea size was set to $30 \times 30$ pixels. Suppose the robot is fixated on the area shown in the same figure. The artificial potential surface components are shown in the rest of Fig. 5 for weighting coefficients $w_1 = 0.0001$ and $w_2 = 0.001$.

Fig. 5. Top-left: Simple scene and current fovea; Top-right: The feature measure $\beta_1$; Bottom-left: The maximal coverage measure $\beta_2$; Bottom-right: Total saliency $\beta$.

Next, three saccadic paths with different initial fixation points were recorded on the same scene. Let each scanpath be denoted by $x_i^k$, where $i = 1, \cdots, R$ stands for the $i$th scanpath and $k = 1, \cdots, K$ for the $k$th fixation. A sample saccadic path projected on the image is shown in Fig. 6 (left). The pan-tilt trajectories in spherical coordinates are shown in Fig. 6 (right) on a sphere.
As shown in different graphs (Fig. 7, Fig. 10 (bottom-right) and Fig. 11), similarity increases between the 20th and 25th fixations. This increase in similarity indicates that reaching a salient region may take about 20-25 fixations, and that after about 5-10 fixations these regions are usually left through random jumps to different routes.

Fig. 6. Left: The trajectory of the scanpath on a simple scene. Right: Pan-tilt trajectories in spherical coordinates.

The similarity between scanpaths $i$ and $j$ was measured as follows:

$$S = \frac{1}{\tilde{R}\,K} \sum_{i=1}^{R} \sum_{\substack{j=1 \\ j \neq i}}^{R} \sum_{k=1}^{K} \min_{l \in \{1, \ldots, K\}} \left| x_i^k - x_j^l \right|^2 \tag{3}$$

where $\tilde{R} = R(R - 1)$ is the number of permutations of the $R$ scanpaths taken two at a time. The variation of scanpath similarity with respect to the number of saccades is shown in Fig. 7. It is observed that the scene is covered by about 20-25 saccades before a random jump to a new scene occurs.

¹ The scanpaths generated by human observers looking at this scene have been reported in [5].

Fig. 7. Total Distance vs. Number of Fixations. (Plot axes: Number of Fixations, Total Distance.)
Fig. 8. Left: Part of the inclined image of the simple scene perceived by APES. Right: Scanpath similarity measure vs. number of fixations. (Plot axes: Number of Fixations, Total Distance.)

As an extension to the first set of experiments, an inclined version of the same simple scene was shown to APES. A part of the inclined image is shown in Fig. 8 (left). The robot performed the operation starting from nine different initial fixation points, all lying within the same square region with a side length of 10 pixels. The similarity measure among scanpaths is shown in Fig. 8 (right). As in the case of Fig. 7, after the first few fixations, the total distance among fixation points got smaller. After around 25-30 fixations, the distance started to increase.
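For reference, the measure in (3) can be evaluated with a few lines of NumPy. The sketch below is an illustration rather than the authors' code: it assumes the inner term compares the $k$th fixation of scanpath $i$ with the nearest fixation of scanpath $j$, that all scanpaths have the same length $K$, and that $R \geq 2$. The function name `scanpath_distance` is hypothetical. Note that smaller values mean more similar scanpaths, in line with the "Total Distance" axis of the plots.

```python
import numpy as np


def scanpath_distance(paths):
    """Evaluate (3) for `paths` of shape (R, K, 2): R scanpaths of K fixations each."""
    paths = np.asarray(paths, dtype=float)
    R, K = paths.shape[0], paths.shape[1]
    total = 0.0
    for i in range(R):
        for j in range(R):
            if i == j:
                continue
            # Squared distance from every fixation of path i to every fixation of path j.
            diff = paths[i][:, None, :] - paths[j][None, :, :]
            d2 = (diff ** 2).sum(axis=-1)             # shape (K, K)
            total += d2.min(axis=1).sum()             # nearest fixation in path j, summed over k
    return total / (R * (R - 1) * K)
```

Calling the function on prefixes of the recorded scanpaths, for example `scanpath_distance(paths[:, :n])` for increasing `n`, would give a curve of distance against number of fixations of the kind plotted in the figures.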
Note, however, that since nine different initial fixations were compared in Fig. 8 instead of three as in Fig. 7, the total distance among fixations of the nine scanpaths did not vary as much as in the case of three initial fixations.

In the second set of experiments, APES explores part of the lab as shown in Fig. 9 (top), starting from three different regions in this scene. The robot is allowed to make sixty fixations per run. In each region, eight different initial fixation points are chosen by randomly moving the robot's head manually. There is no target area for exploration in the scene; the aim is to observe how APES reacts to a random scene without any target. However, it must be noted that there can be regions which attract fixations; these will be called attractive regions. Pan-tilt trajectories of a sample scanpath in spherical coordinates are shown in Fig. 9 (bottom) on a sphere. The similarity of the scanpaths vs. the number of fixations for each region is shown in Fig. 10 (top-left, top-right and bottom-left) respectively. It is interesting to observe that the similarity measure for each region has different properties, which indicates that the induced saccadic motion is of a different character. Indeed, the first region is explored after twenty fixations or so, since there is cyclic behavior. The second region remains unexplored after sixty fixations, since there does not seem to be any cyclic saccadic motion. In the third region, all the fixations seem to be in close proximity, but there does not seem to be repetitive motion. The distance vs. number of fixations for all trajectories can be seen in Fig. 10 (bottom-right).

Fig. 9. Top: A scene taken from the experimental environment. Bottom: Pan-tilt trajectories in spherical coordinates of a sample scanpath.

In the last series of experiments, the robot is moved manually to ten different initial fixation points.
For each initial fixation, it is made to repeat its saccadic motion three times, again with sixty fixations per run. The similarity measure for each initial point is computed, and the average of these similarity measures is shown in Fig. 11. Even after 60 fixations, the saccades do not repeat themselves: there are no cycles in the saccadic motion, and the scene is not explored fully. Furthermore, let us note that very small perturbations of the initial fixation cause large deviations in the saccadic trajectory due to two factors: First, in a real environment, there is usually more than one highly attractive region. Secondly, with foveal inhibition and short-term memory, saccades can have random jumps. These two factors together cause the robot to follow different trajectories and reach different regions of interest in the environment.

Fig. 10. Top-left, top-right, bottom-left: Scanpath similarity measure vs. fixation number for each of the three regions. Bottom-right: Combined scanpath similarity measure vs. fixation number. (Plot axes: Number of Fixations, Total Distance.)
Fig. 11. Average similarity measure of 3 scanpaths at 10 different starting points. (Plot axes: Number of Fixations, Total Distance.)

In summary, our experiments reveal the following results with respect to realizing saccadic motion with artificial potential functions:
•  As shown in Fig. 5, even constructs as simple as those used here are capable of generating plausible fixations.
•  Unfortunately, even with these simple constructs, the generated surfaces do not have a unique minimum, which indicates that different initial fixations can lead to different saccadic motion behaviors. One way of mitigating this is to apply preprocessing that can aid in removing some of the local minima.
•  As is seen in the sample pan-tilt trajectories, in general, cycling behavior does not occur even with 60 fixations or so. When the number of fixations was increased from 45 to 60, wider regions were scanned.
•  The saliency functions can be reprogrammed for different features of interest.

IV. CONCLUSION

This paper relates the problem of saccadic motion and fixation to that of motion planning and proposes using artificial