Saccades and Fixating Using Artificial Potential Functions
B. Deniz İlhan, Özgür Erkent and H. Işıl Bozma
Intelligent Systems Laboratory, Electrical and Electronic Engineering, Bogazici University, Bebek 34342 Istanbul, Turkey
Abstract
This paper presents a mathematical model for saccadic motion and fixations. We relate this issue to the problem of motion planning and show that a family of artificial potential functions can be used for creating saccadic motion. The advantage of this approach is that finding the next fixation point does not require an explicit visual search, which is computationally costly and may be problematic in real-time applications. Rather, the system naturally ‘slides’ from the current fixation into the next. Thus real-time performance on cheap hardware can easily be achieved. Experimental results serve to provide insight into the performance of a robot, APES, implementing this approach.
I. INTRODUCTION
Inspired by human vision, an attentive robot works by allocating its limited computational resources to only the interesting parts of a visual scene. This is done by saccades, rapid eye movements that direct the optical axis from the current fixation to the next such that the fovea (the high-resolution area around the fixation point) overlaps with this interesting area. It has been proposed that this is achieved through visual search on a saliency map, a two-dimensional map that encodes the “interestingness” of the objects in the visual scene. Hence, the problem of how to select new fixation points is treated separately from the problem of accomplishing the saccadic motion necessary for moving to this new point.

In this paper, we present an alternative approach in which the two stages are integrated in a unified framework. In this approach, a family of artificial potential functions, each of which encodes the saliency map features around the current fixation point, is defined and, given the current fixation point, the next fixation is generated simply by ‘sliding’ into the equilibrium point of the associated surface. The advantage of the approach is twofold: there is no need for an explicit visual search and the camera actuator commands are automatically generated. In turn, the camera finds a sequence of fixation points without exhaustive search and in real time. Of course, the ‘psychophysical correctness’ of the fixation points thus generated is subject to experimentation and is beyond the scope of the paper. Furthermore, since different saliency features, including top-down features, can be used to define an artificial potential function, shifts of attention can be made to occur in a ‘programmable’ manner. In this paper, we show the construction of potential functions using one of the most common saliency features: image gradients. However, in general they could easily be constructed based on other features such as color, texture or previously memorized top-down features.
The organization of this paper is as follows: In the rest of this Section, we briefly overview related literature. In Section 2, the theoretical framework of fixation generation is presented. In Section 3, we describe the construction of our particular family of artificial potential functions. Experimental results in real scenes are then discussed. The paper concludes with a summary.
A. Saccadic Research
Studies in vision science have revealed that biological systems view their surroundings by performing saccades in a continual manner between different points of interest, on each of which fixation is maintained briefly [6]. Indeed, it has been estimated that humans make more than three saccades per second [12]. Hence, an intrinsic part of vision is to understand saccadic movements, that is, how the eyes “move to the right place at the right time so efficiently and seemingly effortlessly” [12]. Saccades are categorized as “stimulus-driven” when there is no deliberation and as voluntary in case of deliberate selection [13]. Some results suggest that stimulus-driven and exploratory saccades are generated in neural circuits that operate in different spatial coordinates [13]. In attentional models based on the spotlight or the zoom-lens metaphors, a beam of attention of either fixed or varying size is described as moving either in an analog or discrete manner, with mixed supporting evidence [14]. Some findings indicate that the further the eye drifts from its mean position, the more likely a saccade is to occur [1]. Furthermore, saccades tend to be concentrated on regions of complexity [5], [6]. The plausibility of a mathematical model where a saliency measure is formulated as the sum of sine waves with fundamental frequency equivalent to the spatial frequency of the requested attention distribution is shown in [14].
B. Selective Vision
Mimicking visual findings in some way, researchers have proposed attentive vision where camera gaze is directed to new visual targets [10], [11]. Various theoretical models that account for the targeting of fixation points have been proposed. Most are based on visual search mechanisms that generate emergent sequences, i.e. sequences computed a step at a time that are analogous to motor behavior mediated by perceptual feedback and a small amount of high-level control [4], [7], [9], [15]. Alternatively, explicit sequences where ‘memorized’ sequence representations are retrieved in a manner analogous to skilled motor behavior are also utilized [8]. However, in all these approaches, explicit visual search is required. The next fixation point is computed first and then the camera saccades to this new point. In this paper, we propose an approach that integrates the two stages. In this model, a family of artificial potential functions parameterized by the fixation point and encoding the saliency surface is defined. Given the current fixation point, the next fixation is found simply by ‘sliding’ into the equilibrium point of the associated artificial potential surface. Since different saliency features, including top-down features, can be used to define an artificial potential function, explicit or emergent sequences can both be generated.

Proceedings of the 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems, October 9-15, 2006, Beijing, China. 1-4244-0259-X/06/$20.00 ©2006 IEEE
C. Problem Statement
Suppose that we have a robot that is exploring its scene through saccadic motion. Let the robot’s image plane be denoted by $P$. Note that $P$ is a compact and connected component. Suppose the robot is fixated on the point $x_k \in P$. Each point $x \in P$ in the state space denotes a candidate for the new fixation point $x_{k+1}$ given the current fixation point. The ‘naive’ fixation generation problem may be stated as follows: Given an image $I$ and an initial fixation point $x_0$, define a family of control strategies and a switching law such that the robot saccades to a sequence of fixation points $\{x_k\}$, $k = 1, \cdots, K$.

II. SACCADES USING ARTIFICIAL POTENTIAL FIELDS
The idea of using “potential functions” for the specification of robot tasks with a view of control problems in mind was pioneered by Khatib [2] and Koditschek [3] in the context of obstacle avoidance and navigation. Our studies show that it may also be preferred in fixation generation. Moreover, there are theoretical reasons to prefer the ‘natural control’ methodology to some of the other traditional approaches: The lack of formal analysis of the limits of robustness constitutes a major drawback of many methods proposed in vision. In order to compensate for this, extensive testing is always carried out. In contrast, the potential field approach sets a formal framework for studying the special properties of the system.

Let $\varphi_{x_k} : P \to [0, 1]$ be a set of “saliency” functions, that is, a collection of smooth scalar valued maps on the state space parameterized by $x_k$. Each function is constructed so as to encode the salient points in the neighborhood of the associated fixation point. The measure of saliency can be defined from predetermined features such as intensity gradients, color, texture or high-level representations, depending on the application. However, regardless of the chosen saliency measure, the functions should attain a minimum at the potentially interesting points and should be maximal over uninteresting points. Let us presume that the camera’s dynamics can be satisfactorily described by a simple first order model. Suppose it is currently fixated on $x_k$. The dynamical system governing motion toward the next fixation point can be defined via constructing a gradient field as

$$\dot{x} = -D_x \varphi_{x_k}(x)$$

where $x(0) = x_k$, $k = 0, 1, \cdots$. This closed loop system inherits the critical qualitative behavior of gradient trajectories. Any local minimum is then designated as the next fixation point $x_{k+1}$. First, let us note that each function in this family admits local minima since any smooth function attains a minimum on a compact set. Furthermore, in general the potential generating functions may not have a unique minimum. Suppose the integral curve of $\dot{x}$ through the initial condition $x_k$ is denoted by $-D_x^t \varphi_{x_k}(x_k)$. If $-D_x^t \varphi_{x_k}(x_{k+1}) = 0$ implies full rankness, then the limit set $\lim_{t \to \infty} -D_x^t \varphi_{x_k}(x_k)$ is some isolated singularity. Otherwise, it can have a nontrivial manifold of minima. Thus, small perturbations in the location of the current fixation point could result in large deviations in the location of the next fixation point.
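As an illustration of the gradient system above, the following minimal sketch integrates $\dot{x} = -D_x \varphi_{x_k}(x)$ with a forward-Euler scheme until the gradient vanishes. It is not the implementation used on APES: the names `numerical_gradient`, `slide_to_fixation` and `toy_phi`, the step size and the stopping tolerance are all assumptions made for this example, and the toy potential is a single bowl rather than a saliency surface.

```python
import math

def numerical_gradient(phi, x, h=1e-5):
    """Central-difference estimate of the gradient of phi at the 2-D point x."""
    gx = (phi((x[0] + h, x[1])) - phi((x[0] - h, x[1]))) / (2 * h)
    gy = (phi((x[0], x[1] + h)) - phi((x[0], x[1] - h))) / (2 * h)
    return (gx, gy)

def slide_to_fixation(phi, x_k, step=0.1, tol=1e-6, max_iter=10_000):
    """Forward-Euler integration of x' = -grad phi(x), started at x(0) = x_k.

    Stops when the gradient norm falls below tol, i.e. near a critical point.
    """
    x = tuple(x_k)
    for _ in range(max_iter):
        gx, gy = numerical_gradient(phi, x)
        if math.hypot(gx, gy) < tol:
            break
        x = (x[0] - step * gx, x[1] - step * gy)
    return x

# Toy potential (an assumption for illustration): one bowl with minimum at (2, -1).
def toy_phi(p):
    return (p[0] - 2.0) ** 2 + (p[1] + 1.0) ** 2

x_next = slide_to_fixation(toy_phi, (0.0, 0.0))
```

With a potential that has several minima, the point returned depends on the starting point, which is the sensitivity to the current fixation noted in the text.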
A. Algorithm
Saccadic motion is achieved by switching to a new controller every time a new fixation point is reached. Once the control law is selected, the cameras slide into their new fixation point by simply moving accordingly. Visual findings indicate that there are two types of memory present here:
1) Inhibition of return: The process by which the currently attended location is prevented from being attended again.
2) Short-term memory: The process by which the last few fixations are recalled and are prevented from being attended.
Both processes are still not well understood. In our case, the following mechanisms are devised.
• Foveal inhibition: If there is a tendency to fixate in the current fovea, the camera stops and does a random jump in its visual field.
• Visual field inhibition: If there is a tendency to fixate near the boundary of the visual field, the camera stops and does a random jump in the new visual field.
• Memory inhibition: If there is a tendency to fixate near a previously fixated point, the system randomly jumps to an arbitrary point within its visual field. The previously fixated fovea locations are held in a first-in first-out memory.
Finally, an algorithm for saccadic behavior based on a switching mechanism is constructed as follows:
1) Define $k$ as the fixation iteration.
2) Consider the current fixation point $x_k$. Move to a point $x$ on the minimum potential boundary of the foveal inhibition region.
3) Switch to the control law $\dot{x} = -D_x \varphi_{x_k}(x)$ and move the camera until the camera reaches a critical point or the boundary.
4) If the camera has moved into the fovea or to one of the previous fixation points, do a random jump in the visual field.
5) If the camera has moved to the boundary of the visual field, do a random jump in the new visual field.
6) Increment $k$ and go to step 1.
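The switching steps above can be sketched as a short loop. This is a simplified reading, not the controller used on APES: `phi` and `descend` are hypothetical callables supplied by the caller (the potential $\varphi_{x_k}$ and a routine that slides along its negative gradient), and `fovea_radius`, `border`, `memory_size` and the toy `TARGETS` are assumed values chosen only to make the example run.

```python
import math
import random
from collections import deque

def boundary_start(phi, x_k, radius, n_samples=36):
    """Step 2: lowest-potential point on a circle of given radius around x_k."""
    ring = [(x_k[0] + radius * math.cos(2 * math.pi * i / n_samples),
             x_k[1] + radius * math.sin(2 * math.pi * i / n_samples))
            for i in range(n_samples)]
    return min(ring, key=lambda p: phi(p, x_k))

def saccade_sequence(phi, descend, x0, size, n_fixations,
                     fovea_radius=15.0, border=5.0, memory_size=5, seed=0):
    rng = random.Random(seed)
    width, height = size
    memory = deque(maxlen=memory_size)  # FIFO memory of previous fixations
    path = [x0]
    x_k = x0
    for _ in range(n_fixations):
        start = boundary_start(phi, x_k, fovea_radius)
        cand = descend(start, x_k)       # Step 3: slide along -grad(phi_{x_k})
        inside = (border <= cand[0] <= width - border and
                  border <= cand[1] <= height - border)
        revisit = any(math.dist(cand, m) < fovea_radius for m in [x_k, *memory])
        if not inside or revisit:        # Steps 4-5: inhibition -> random jump
            cand = (rng.uniform(border, width - border),
                    rng.uniform(border, height - border))
        memory.append(x_k)
        x_k = cand
        path.append(x_k)                 # Step 6: move on to the next fixation
    return path

# Toy stand-ins (assumptions): three attractors; descend snaps to the nearest one.
TARGETS = [(40.0, 40.0), (120.0, 60.0), (80.0, 130.0)]

def toy_phi(p, x_k):
    return min(math.dist(p, t) for t in TARGETS)

def toy_descend(start, x_k):
    return min(TARGETS, key=lambda t: math.dist(start, t))

path = saccade_sequence(toy_phi, toy_descend, (80.0, 80.0), (160, 160), 20)
```

In this sketch a single random jump handles both the visual field boundary and the memory cases, and the jump target is not re-checked against the memory, which keeps the loop close to the six steps as listed.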
B. Construction of Artificial Potential Function
We now describe the construction of potential functions. In this case, we use image intensity gradients as salient features. However, let us note that the measure of saliency can easily be modified to be based on other features such as color or texture, depending on the application. We proceed in a manner similar to [12].

The function $\hat{\varphi} : P \to (0, \infty)$ encodes the current fixation point as well as the saliency measure as:

$$\hat{\varphi}_{x_k}(x) = \frac{1}{\beta(x, x_k)}, \qquad k \in \mathbb{Z}^+ \tag{1}$$

The denominator encodes the distance saliency measure. It may consist of several terms:
• The current feature of interest: For simple, bottom-up processing this may be simply defined as $\beta_1(x) = \nabla I(x)^T \nabla I(x)$.
• Maximal coverage: The next fixation point should be maximally away from the current fixation point. This may be simply defined as $\beta_2(x, x_k) = (x - x_k)^T (x - x_k)$.
The saliency measure $\beta$ is then constructed as the weighted sum of these terms: $\beta(x, x_k) = w_1 \beta_1(x) + w_2 \beta_2(x, x_k)$. The zero level set $\beta^{-1}(0)$, denoted by $\partial P$, entails points in the image plane which do not draw any interest.

Since $\hat{\varphi}_{x_k}$ blows up on $\partial P$, it is not admissible. In order to make $\hat{\varphi}$ admissible, it is squashed by the function $\sigma : (0, \infty) \to [0, 1]$, defined by $\sigma(x) = \frac{x}{1 + x}$. The function is constructed as the composition:

$$\varphi_{x_k}(x) = \sigma \circ \hat{\varphi}_{x_k}(x) \tag{2}$$
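A minimal sketch of this construction on a discrete image is given below. For $\beta > 0$ the composition in (2) simplifies to $\sigma(1/\beta) = \frac{1/\beta}{1 + 1/\beta} = \frac{1}{1 + \beta}$, and the sketch uses this closed form, which also gives the limiting value 1 where $\beta = 0$. The helper names `image_gradient` and `potential`, the central-difference gradient and the toy 8x8 image are assumptions for illustration; the default weights are the ones reported in the experiments.

```python
def image_gradient(img, r, c):
    """Central-difference intensity gradient at pixel (r, c), clamped at borders."""
    rows, cols = len(img), len(img[0])
    r0, r1 = max(r - 1, 0), min(r + 1, rows - 1)
    c0, c1 = max(c - 1, 0), min(c + 1, cols - 1)
    d_row = (img[r1][c] - img[r0][c]) / max(r1 - r0, 1)
    d_col = (img[r][c1] - img[r][c0]) / max(c1 - c0, 1)
    return d_row, d_col

def potential(img, x_k, w1=1e-4, w2=1e-3):
    """phi_{x_k} on every pixel: sigma(1/beta), evaluated as 1 / (1 + beta)."""
    rk, ck = x_k
    phi = []
    for r in range(len(img)):
        row = []
        for c in range(len(img[0])):
            gr, gc = image_gradient(img, r, c)
            beta1 = gr * gr + gc * gc                 # grad(I)^T grad(I)
            beta2 = (r - rk) ** 2 + (c - ck) ** 2     # (x - x_k)^T (x - x_k)
            beta = w1 * beta1 + w2 * beta2
            row.append(1.0 / (1.0 + beta))
        phi.append(row)
    return phi

# Toy 8x8 image (an assumption): dark background with a bright 2x2 patch.
img = [[0.0] * 8 for _ in range(8)]
for r in (5, 6):
    for c in (5, 6):
        img[r][c] = 255.0

phi = potential(img, x_k=(1, 1))
```

On this toy image the potential equals 1 at the fixated pixel, where both terms vanish, and drops along the edge of the bright patch, so a descent started near the fixation is drawn toward the patch.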
Fig. 1. Image plane orientations.
Fig. 2. Image transformation.
C. Coordinate Transformation
As soon as the camera starts moving away from the current fixation point, the image planes change as shown in Fig. 1. Hence, the fixation point and the current point at which the image intensity gradients are calculated are on different planes. Therefore, in order to measure any physical relation between them, $x_k$ is projected onto the current image plane as shown in Fig. 2. Using trigonometric identities, it is easy to show that

$$\| x - x_k \| = \frac{W_I \tan(\theta)}{2 \tan(\phi)}$$

where $W_I$ denotes the image plane width.

In order to move the camera, the gradient vector, which is Cartesian in nature, needs to be transformed into angular velocities. Let $\theta$ and $\phi$ be the pan and tilt angles respectively, as shown in Fig. 3 (left). The arc we want to move along in the $\phi$ direction is shown in Fig. 3 (right). The dependence on the angle $\theta$ is analogous. It is easy to see that the angular velocity components are directly proportional to the Cartesian velocity components:

$$\delta\phi = \frac{1}{r}\,\delta y, \qquad \delta\theta = \frac{1}{r}\,\delta z$$

where $r$ is the focal length.
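The proportionality between Cartesian and angular increments can be sketched as follows. The function name `gradient_to_pan_tilt` and the `gain` parameter are hypothetical, and the sketch assumes the gradient components and the focal length are expressed in the same units (for example pixels), so that the results are in radians.

```python
def gradient_to_pan_tilt(grad_y, grad_z, focal_length, gain=1.0):
    """Map a descent step along -gradient to pan and tilt increments.

    Uses d_phi = dy / r and d_theta = dz / r, with r the focal length.
    gain is an assumed step-size parameter, not taken from the paper.
    """
    if focal_length <= 0:
        raise ValueError("focal_length must be positive")
    dy = -gain * grad_y   # Cartesian step along y (descent direction)
    dz = -gain * grad_z   # Cartesian step along z (descent direction)
    d_phi = dy / focal_length     # tilt increment
    d_theta = dz / focal_length   # pan increment
    return d_theta, d_phi

d_theta, d_phi = gradient_to_pan_tilt(4.0, -2.0, focal_length=400.0)
```

Here a gradient of (4, -2) with a focal length of 400 yields a pan increment of 0.005 and a tilt increment of -0.01.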
Fig. 3. Left: Computation of gradient vector; Right: Dependence of $\phi$ on the gradient vector.
Fig. 4. APES robot.
III. EXPERIMENTAL RESULTS
A series of experiments have been conducted using APES, an attentive robot designed and built in our laboratory [9], as shown in Fig. 4. The goal of the first set of experiments was to observe the reaction of APES towards a simple scene under the guidance of artificial potential functions. For comparison purposes, a sample scene¹ as shown in Fig. 5 (top-left) was used in the experiments. The fovea size was set to $30 \times 30$ pixels. Suppose it is fixated on the area as shown in the same figure. The artificial potential surface components are as shown in the rest of Fig. 5 for weighting coefficients $w_1 = 0.0001$ and $w_2 = 0.001$.
Fig. 5. Top-left: Simple scene and current fovea; Top-right: The feature measure $\beta_1$; Bottom-left: The maximal coverage measure $\beta_2$; Bottom-right: Total saliency $\beta$.
Next, three saccadic paths with different initial fixation points were recorded on the same scene. Let each scanpath be denoted by $x_k^i$, where $i = 1, \cdots, R$ stands for the $i$th scanpath and $k = 1, \cdots, K$ for the $k$th fixation. A sample saccadic path projected on the image is shown in Fig. 6 (left). The pan-tilt trajectories in spherical coordinates are shown in Fig. 6 (right) on a sphere. As shown in different graphs (Fig. 7, Fig. 10 (bottom-right) and Fig. 11), similarity increases between the 20th and 25th fixations. This increase in similarity indicates that reaching a saliency region may take about 20-25 fixations and that, after about 5-10 fixations, these regions are usually left with random jumps to different routes.
Fig. 6. Left: The trajectory of the scanpath on a simple scene. Right: Pan-tilt trajectories in spherical coordinates.
¹ The scanpaths generated by human observers looking at this scene have been reported in [5].

The similarity between scanpaths $i$ and $j$ was measured as follows:

$$S = \frac{1}{\bar{R} \, K} \sum_{i=1}^{R} \sum_{\substack{j=1 \\ j \neq i}}^{R} \sum_{k=1}^{K} \min_{l \in \{1, \ldots, K\}} \left\| x_k^i - x_l^j \right\|^2 \tag{3}$$

where $\bar{R} = R(R - 1)$ indicates the permutation of $R$ over 2. The variation of scanpath similarity with respect to the number of saccades is shown in Fig. 7. It is observed that the scene is covered by about 20-25 saccades before a random jump to a new scene occurs.
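Equation (3) can be evaluated directly, as in the minimal sketch below. The function name `scanpath_similarity` and the representation of a scanpath as a list of 2-D tuples are assumptions made for this example. Note that $S$ is a distance, so smaller values mean more similar scanpaths, which matches the "Total Distance" axis of the plots.

```python
def scanpath_similarity(scanpaths):
    """Eq. (3): mean squared nearest-fixation distance over ordered scanpath pairs."""
    R = len(scanpaths)
    if R < 2:
        raise ValueError("need at least two scanpaths")
    K = len(scanpaths[0])
    if K == 0 or any(len(p) != K for p in scanpaths):
        raise ValueError("scanpaths must be non-empty and of equal length")
    total = 0.0
    for i in range(R):
        for j in range(R):
            if j == i:
                continue
            for xk in scanpaths[i]:
                # Squared distance from fixation k of path i to the closest fixation of path j.
                total += min((xk[0] - xl[0]) ** 2 + (xk[1] - xl[1]) ** 2
                             for xl in scanpaths[j])
    return total / (R * (R - 1) * K)

# Two toy scanpaths with two fixations each (illustrative values only).
a = [(0.0, 0.0), (1.0, 0.0)]
b = [(0.0, 0.0), (1.0, 1.0)]
s_ab = scanpath_similarity([a, b])
s_aa = scanpath_similarity([a, a])
```

For the toy paths, each ordered pair contributes a squared distance of 1, so `s_ab` is 2 / (2 · 1 · 2) = 0.5, while identical paths give 0.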
Fig. 7. Total Distance vs. Number of Fixations.
Fig. 8. Left: Part of the inclined image of the simple scene perceived by APES. Right: Scanpath similarity measure (Total Distance) vs. Number of Fixations.
As an extension to the first set of experiments, an inclined version of the same simple scene was shown to APES. A part of the inclined image is shown in Fig. 8 (left). The robot performed the operation starting from nine different initial fixation points, all within the same square region with a side length of 10 pixels. The similarity measure among scanpaths is shown in Fig. 8 (right). As in the case of Fig. 7, after the first few fixations, the total distance among fixation points got smaller. After around 25-30 fixations, the distance started to increase. However, since nine different initial fixations were compared in Fig. 8 instead of three as in Fig. 7, the total distance among fixations of the nine scanpaths did not vary as much as in the case of three initial fixations.

In the second set of experiments, APES explores part of the lab as shown in Fig. 9 (top), starting from three different regions in this scene. The robot is allowed to make sixty fixations per run. In each region, eight different initial fixation points are chosen by randomly moving the robot’s head manually. There is no target area for exploration in the scene. The aim is to observe how APES reacts to a random scene without any target. However, it must be noted that there can be regions which attract fixations; these will be called attractive regions. Pan-tilt trajectories of a sample scanpath in spherical coordinates are shown in Fig. 9 (bottom) on a sphere. The similarity of the scanpaths vs. the number of fixations for each region is shown in Fig. 10 (top-left, top-right and bottom-left) respectively. It is interesting to observe that the similarity measure for each region has different properties, which indicates that the induced saccadic motion is of a different character in each. Indeed, the first region is explored after twenty fixations or so since there is cyclic behavior. The second region remains unexplored after sixty fixations since there does not seem to be any cyclic saccadic motion. In the third region, all the fixations seem to be in close proximity, but there does not seem to be repetitive motion. The distance vs. number of fixations for all trajectories can be seen in Fig. 10 (bottom-right).
Fig. 9. Top: A scene taken from the experimental environment. Bottom: Pan-tilt trajectories in spherical coordinates of a sample scanpath.
In the last series of experiments, the robot is moved manually to ten different initial fixation points. For each initial fixation, it is made to repeat its saccadic motion three times, again with sixty fixations per run. The similarity measure for each initial point is computed and the average of these similarity measures is shown in Fig. 11. Even after 60 fixations, the saccades do not repeat themselves: there are no cycles in the saccadic motion and the scene is not explored fully. Furthermore, let us note that very little perturbation of initial fixations causes large deviations in the saccadic trajectory due to two factors: First, in a real environment, there is more than one highly attractive region. Secondly, with foveal inhibition and short-term memory, saccades can have random jumps. These two factors together cause the robot to follow different trajectories and reach different regions of interest in the environment.
Fig. 10. Top-left, top-right, bottom-left: Scanpath similarity measure (Total Distance) vs. fixation number for each of the three regions. Bottom-right: Combined scanpath similarity measure vs. fixation number.
Fig. 11. Average similarity measure (Total Distance vs. Number of Fixations) of 3 scanpaths at 10 different starting points.
In summary, our experiments reveal the following results with respect to realizing saccadic motion with artificial potential functions:
• As shown in Fig. 5, even simple constructs such as those used are capable of generating plausible fixations.
• Unfortunately, even with these simple constructs, the generated surfaces do not have a unique minimum, which indicates that different initial fixations can lead to different saccadic motion behaviors. One way of eliminating this is to apply preprocessing which can aid in getting some of the local minima.
• As is seen in the sample pan-tilt trajectories, in general, cycling behavior does not occur even with 60 fixations or so. When the number of fixations was increased from 45 to 60, wider regions were scanned.
• The saliency functions can be reprogrammed for different features of interest.

IV. CONCLUSION
This paper relates the problem of saccadic motion and fixation to that of motion planning and proposes using artificial