A Vision Agent for Robotics: Implementation on a SIMD Machine

Ignazio Infantino*, Antonio Cimò†, Antonio Gentile†, Antonio Chella†

* ICAR-CNR, sez. di Palermo, c/o CUC, Viale delle Scienze, 90128, Palermo, Italy
† DINFO, University of Palermo, Viale delle Scienze, 90128, Palermo, Italy

Abstract. An artificial high-level vision agent for the localisation and description of the movements of the SPIDER robot arm of the EUROPA/PAT systems during its operations is presented. The agent uses real-time image processing operations implemented on SIMPil (SIMD Pixel Processor) to manage the large computational workloads and I/O throughput requirements. The vision agent, which includes a description agent, generates perception-grounded predicates from image sequences and provides a 3D estimation of the arm movements. This gives the scientist using SPIDER meaningful feedback on his operations on the arm during a scientific experiment. This paper evaluates the effectiveness of the SIMPil design on this type of important image application. The simulation results support the design choices and suggest that more complex, multistage applications can be implemented to execute at real-time frame rates.

1 Introduction

An artificial high-level vision agent for the localisation and description of the movements of the SPIDER robot arm of the EUROPA/PAT systems [7, 9] during its operations is presented. The agent uses several real-time image processing operations implemented on SIMPil. The SIMD Pixel Processor (SIMPil) [17] is a focal plane imaging system, under design at Georgia Tech, which incorporates a specialized SIMD architecture with an integrated array of image sensors. Many SIMD systems [11, 12, 13] have been used for image processing before, but they achieve performance and generality at the expense of I/O coupling and physical size.
Advances in current semiconductor technology have enabled the integration of CMOS image sensors with digital processing circuitry [15]. In a focal plane imaging system, images are optically focused onto the sensor array, or focal plane, and are made available to the underlying processing engine in a single operation. Because a large amount of data parallelism is inherent in low-level image processing applications, SIMD (Single Instruction Multiple Data) architectures provide an ideal programming model [16].

This work is an improved and more powerful extension of the vision agent presented in [2, 3]. The described software module concerns the interpretation of sensory data in the framework of an ASI project, which aims at applying AI techniques to the design and realisation of an effective and flexible system for the supervision of SPIDER. The arm (a 7 d.o.f. robot arm) will work on board the International Space Station (ISS) [4]. The framework project is an Italian three-year research project sponsored by the Italian Space Agency (ASI), involving AI researchers from the Universities of Rome, Turin, Genoa, Palermo, and Parma, from the IP-CNR of Rome, and from the IRST of Trento. The main aim of the vision agent is to advance the state of the art in artificial vision for space robotics by introducing and integrating artificial vision techniques that offer the opportunity to provide the SPIDER arm operations with an effectively high degree of autonomy.

In the following, Sect. 2 describes the SIMPil architecture used to support the vision agent and its components. Sect. 3 describes the low-level processing operations, the use of an active contour technique to locate and segment the shape of the SPIDER arm, and the 3D reconstruction of the location of its links. Sect. 4 deals with the SIMPil implementation details. Sect. 5 describes the experimental results and the performance of the system. Finally, Sect. 6 outlines some conclusions and future developments.

2 The SIMPil architecture

The SIMD Pixel Processor (SIMPil) architecture consists of a mesh of SIMD processors on top of which an array of image sensors is integrated. A diagram of a 16-bit implementation is illustrated in Fig. 1. Each processing element (PE) includes a RISC load/store datapath plus an interface to a 4×4 sensor subarray. A 16-bit datapath has been implemented which includes a 32-bit multiply-accumulate unit, a 16-word register file, and 64 words of local memory (the ISA allows for up to 256 words). The SIMD execution model allows the entire image projected onto the PEs to be acquired in a single cycle. Large arrays of SIMPil PEs can be simulated using the SIMPil Simulator, an instruction-level simulator. Early prototyping efforts have proved the feasibility of directly coupling a simple processing core with a sensor device [17]. A 16-bit prototype of a SIMPil PE was designed in a 0.8 µm CMOS process and fabricated through MOSIS. A 4,096-PE target system has been used in the simulations. This system is capable of delivering a peak throughput of about 1.5 Tops/sec in a monolithic device, enabling image and video processing applications that are currently unapproachable with today's portable DSP technology.

The SIMPil architecture is designed for image and video processing applications. In general, this class of applications is computationally intensive and requires high throughput to handle the massive data flow in real time. However, these applications are also characterized by a large degree of data parallelism, which is maximally exploited by focal plane processing. Image frames are available simultaneously at each PE in the system, while retaining their spatial correlation. Image streams can therefore be processed at frame rate, with only a nominal amount of memory required at each PE [19].
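To make the focal-plane execution model concrete, the following NumPy sketch models how a frame is partitioned across the PE mesh (one 4×4 sensor subarray per PE, a 64×64 mesh matching the 4,096-PE simulated system) and how a single SIMD instruction is applied by all PEs at once. The function names and the threshold operation are ours, purely illustrative: SIMPil itself is programmed in its own 16-bit ISA, not in Python.

```python
import numpy as np

def to_pe_subarrays(image, sub=4):
    """Partition an image into the 4x4 sensor subarrays owned by each
    SIMPil-style processing element (PE)."""
    h, w = image.shape
    assert h % sub == 0 and w % sub == 0
    # Resulting shape: (pe_rows, pe_cols, sub, sub) -- one tile per PE.
    return image.reshape(h // sub, sub, w // sub, sub).swapaxes(1, 2)

def simd_broadcast(pe_tiles, op):
    """Model SIMD execution: every PE applies the same instruction
    stream `op` to its local tile in lockstep."""
    return op(pe_tiles)

# A 256x256 frame maps onto a 64x64 mesh of PEs (4,096 PEs total).
frame = np.random.randint(0, 256, (256, 256), dtype=np.uint8)
tiles = to_pe_subarrays(frame)            # shape (64, 64, 4, 4)
binary = simd_broadcast(tiles, lambda t: (t > 128).astype(np.uint8))
```

The point of the model is that the per-PE memory footprint stays tiny (a 4×4 tile), which is why image streams can be processed at frame rate with only nominal local storage.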
Figure 1: The SIMPil architecture

3 Localisation and description of the movements of a robotic arm

The artificial vision agent generates an exact 2D description of the arm movements from image sequences, providing the information needed to perform a 3D estimation of the arm movements, and thus allowing the scientist using SPIDER to receive meaningful feedback on his operations on the arm during a scientific experiment. The capabilities of the vision agent include the following:
- locating and segmenting the SPIDER arm, even against low-contrast and irregular backgrounds;
- performing a 3D estimation of the position of the arm from camera images;
- interpreting complex movements of the arm acquired by a camera in terms of symbolic descriptions.

The implemented computer vision agent is based on three main components:
- the perception component;
- the scene description component;
- the visualisation component.

The perception component of the agent processes the image data coming from a video camera that observes the operation of the robotic arm, taking advantage of the SIMPil system to process the images. The main task of this component is to estimate the position of the arm in the acquired image. It should be noted that this estimation, which is generated solely from the visual data, may also be useful for fault identification of the position sensors placed on the joints of the arm.

3.1. Active contours for the extraction of the arm shape

The images acquired by the camera are processed by the contour module, which extracts the arm contours with a suitable algorithm based on active contours. This task is carried out in two steps:
- contour initialisation and identification of the set of points describing the robotic arm;
- tracking of the arm through the sequence.
In order to extract just the information related to the arm, some filtering operations are performed:
- noise reduction by a 5×5 median filter;
- automatic thresholding;
- subtraction of image intensities between two consecutive frames.

The arm shape is then extracted simply and quickly by evolving an active contour with the GVF (Gradient Vector Flow) methodology [10].

3.2. GVF snake model

The snake is a deformable curve that moves in the image under the influence of forces related to the local distribution of the grey levels [8, 10]. When the snake reaches an object contour, it adapts to its shape; in this way it is possible to extract the shape of the object in the image view. The snake, as an open or closed contour, is described in parametric form by v(s) = (x(s), y(s)), where x(s), y(s) are the x, y coordinates along the contour and s is the normalized arc length (s ∈ [0, 1]). The snake model defines the energy of a contour, the snake energy E_snake, to be:

E_{snake} = \int_0^1 \left[ E_{int}(v(s)) + E_{image}(v(s)) \right] ds

The snake used in this framework has only the edge functional, which attracts the snake points to high gradients:

E_{image} = E_{edge} = -\left| \nabla \left( G_\sigma * I(x, y) \right) \right|^2

This is the image functional proposed by Kass [8]. It is a scale-based edge operator that increases the locus of attraction of the energy minimum. G_\sigma is a Gaussian of standard deviation \sigma, which controls the smoothing applied prior to the edge operator. The minima of E_edge lie on the zero-crossings of G_\sigma * \nabla^2 I(x, y), which define edges in the Marr-Hildreth theory [5, 6]. Scale-space filtering is employed: the snake is first allowed to come into equilibrium on a heavily filtered image, and then the level of filtering is reduced, increasing the locus of attraction of a minimum.
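Assuming NumPy is available, the preprocessing steps listed above (median filtering, automatic thresholding, frame differencing) might be sketched as follows. The paper does not specify the thresholding rule, so the mean-intensity rule used here is an assumption (Otsu's method would be a drop-in replacement); on SIMPil these operations run in the PE array, not in Python.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def median5(img):
    """5x5 median filter (edge-padded) for noise reduction."""
    p = np.pad(img, 2, mode='edge')
    win = sliding_window_view(p, (5, 5))
    return np.median(win, axis=(-2, -1))

def auto_threshold(img):
    """Automatic threshold; the mean-intensity rule is our assumption,
    as the paper does not name a specific method."""
    return (img > img.mean()).astype(np.uint8)

def frame_diff(prev, curr):
    """Absolute intensity difference between consecutive frames,
    separating the moving arm from the static background."""
    return np.abs(curr.astype(np.int16) - prev.astype(np.int16)).astype(np.uint8)
```

Applying `frame_diff` to two consecutive frames and thresholding the filtered result yields the binary edge map on which the GVF snake evolves.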
The gradient vector flow (GVF) field is defined to be the vector field v(x, y) = (u(x, y), v(x, y)) that minimizes the energy functional:

\varepsilon = \iint \mu \left( u_x^2 + u_y^2 + v_x^2 + v_y^2 \right) + \left| \nabla f \right|^2 \left| \mathbf{v} - \nabla f \right|^2 \, dx \, dy

Particular advantages of the GVF snake over a traditional snake are its insensitivity to initialisation and its ability to move into concave boundary regions (see for example figure 2). Another useful computational feature is that the external force is defined statically: it is unchanged in time and independent of the position of the contour itself. The external force can therefore be calculated once, at the beginning, and remains unchanged during contour evolution. The external force moving the contour is based on an edge extraction function with uniform absolute value; this choice addresses the problem of a weak vector field near concavities of the followed contour. Starting from an arbitrary location of a set of points, the contour is captured after some iterations, when the equilibrium of the forces is reached [8].

Figure 2: Gradient Vector Flow technique applied to capture the SPIDER arm shape.

3.3. Localisation and segmentation of the arm shape

The implemented snake allows the arm shape to be extracted simply and quickly. Fig. 3 shows the evolution and the result of the contour module. From the extracted arm snake it is possible to estimate the position of the links of the arm in the image plane, i.e., without depth information (see figure 3.a). The arm links are located in two steps:
- identification of the lines that characterise the different segments of the arm, and grouping based on parallelism;
- localisation of the links by intersection of those lines.

3.4. Reconstruction of the 3D position of the arm links

Let us consider a generic link i of the arm at time t; the link is characterized by its 3D coordinates [x_i(t), y_i(t), z_i(t)].
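As an aside, the GVF field that minimises the functional \varepsilon of Sect. 3.2 can be computed by gradient-descent iteration of the Xu-Prince diffusion equations [10]. A minimal NumPy sketch follows; the parameter values and the periodic boundary handling are illustrative choices, not details from the paper.

```python
import numpy as np

def gvf(f, mu=0.2, iters=80, dt=0.5):
    """Gradient Vector Flow field of an edge map f.
    Iterates the equations that minimise the GVF functional:
      u_t = mu * lap(u) - (u - f_x) * (f_x^2 + f_y^2)
    (and symmetrically for v). Parameters are illustrative."""
    fx, fy = np.gradient(f.astype(float))
    mag2 = fx**2 + fy**2
    u, v = fx.copy(), fy.copy()
    # 5-point Laplacian with periodic boundaries, for brevity.
    lap = lambda a: (np.roll(a, 1, 0) + np.roll(a, -1, 0) +
                     np.roll(a, 1, 1) + np.roll(a, -1, 1) - 4 * a)
    for _ in range(iters):
        u += dt * (mu * lap(u) - (u - fx) * mag2)
        v += dt * (mu * lap(v) - (v - fy) * mag2)
    return u, v
```

Because the field depends only on the edge map, it is computed once before contour evolution, matching the static-external-force property noted above.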
A generic posture of the SPIDER arm at time t is characterized by the vector x(t), which collects the seven links of the arm:

x(t) = \begin{pmatrix} (x_1(t), y_1(t), z_1(t)) \\ (x_2(t), y_2(t), z_2(t)) \\ \vdots \\ (x_7(t), y_7(t), z_7(t)) \end{pmatrix}

The snake information allows us to estimate the first two coordinates of each link, i.e., their projection in the image plane:

x'(t) = \begin{pmatrix} (x_1(t), y_1(t)) \\ (x_2(t), y_2(t)) \\ \vdots \\ (x_7(t), y_7(t)) \end{pmatrix}

The third coordinate is obtained using standard computer vision algorithms [5, 6]. A generic 3D point X generates the point w on the image:

\lambda \hat{w} = P \hat{X} = K \left[ R \,|\, t \right] \hat{X}

where P is the 3×4 projection matrix, decomposable into the calibration matrix K, the rotation matrix R, and the translation vector t; \hat{X} and \hat{w} are expressed in homogeneous coordinates. The seven link points are used to estimate the projection matrix and to perform triangulation on the images of the sequence. The reconstruction process assumes that the reference coordinate system is placed on the link which is fixed (label 8 in figure 3), and that the distances between the links are known. In this way we obtain the positions of the links in the reference system (see fig. 4a).

Figure 3: Evolution and final result of the GVF contour module.

The implemented system also works if the image sequence does not show all seven links (depending on the arm position with respect to the camera): the positions of the hidden links are simply recovered from the reconstructed data of the visible ones. In this way the subsequent scene description component can focus solely on the classification of the movements. Moreover, a graphic 3D representation of the arm movements is generated, receiving as input the data coming
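The projection model \lambda\hat{w} = K[R|t]\hat{X} and the triangulation step of Sect. 3.4 can be sketched with NumPy as follows. The calibration values and the linear (DLT) triangulation method are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def projection_matrix(K, R, t):
    """P = K [R | t], the 3x4 projection matrix."""
    return K @ np.hstack([R, t.reshape(3, 1)])

def project(P, X):
    """Project a 3D point X: lambda * w_hat = P @ X_hat."""
    w = P @ np.append(X, 1.0)
    return w[:2] / w[2]

def triangulate(P1, P2, w1, w2):
    """Linear (DLT) triangulation of one point from two views:
    stack the cross-product constraints and take the SVD null vector."""
    A = np.vstack([w1[0] * P1[2] - P1[0],
                   w1[1] * P1[2] - P1[1],
                   w2[0] * P2[2] - P2[0],
                   w2[1] * P2[2] - P2[1]])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]

# Illustrative calibration: focal length 800 px, principal point (128, 128).
K = np.diag([800.0, 800.0, 1.0]); K[0, 2] = K[1, 2] = 128.0
P1 = projection_matrix(K, np.eye(3), np.zeros(3))
P2 = projection_matrix(K, np.eye(3), np.array([-1.0, 0.0, 0.0]))
X = np.array([0.3, -0.2, 5.0])
Xr = triangulate(P1, P2, project(P1, X), project(P2, X))
```

With exact correspondences the DLT recovers the 3D point up to numerical precision; with noisy link detections from the snake, a least-squares refinement over the sequence would follow.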