Reconfigurable Image Processing Architectures - Research and Current State of Art at the AGH Technical University

Abstract: In the present paper three heterogeneous architectures are discussed for complex systems dedicated to real-time image processing and analysis. The usefulness and applicability of the solutions described here result from the fact that the systems are reconfigurable, and their final architecture can always be tailored to the specific needs of the problem at hand (mainly related to applications of the described systems in the fields of automation and robotics). In the paper the authors describe the ultimate solution, found after long-term studies, as well as the evolution of the system architectures resulting from technological progress, the growing expectations placed on real-time vision systems, and the practical experience gathered by the authors over a period of 15 years. Attention has been focused on the increasing flexibility of the reconfigurable hardware-software systems, which is a guarantee that the technology will keep up with the increasing requirements of the users. In particular it has been shown that hardware systems based on the latest high-density reprogrammable devices are able to achieve many utilitarian features, such as an object-oriented nature, scalability or adaptability, which have up to now been reserved for software systems.

Keywords: Image Processing, Configurable Computing, Field Programmable Gate Array, Hardware-Software Co-design, Heterogeneous Architecture.

I. INTRODUCTION

In the early 1980s it became obvious, taking into account the small computing power of general-purpose processors, that hardware acceleration is necessary for the realization of image processing and analysis. An extensive review of references discussing the studies and realizations from that period is contained in the papers [1]-[5].
1 Rector of the AGH Technical University, Kraków, Poland; 2 Department of Automatics, Biocybernetic Lab, AGH Technical University, Kraków, Poland; 3 Department of Electronics, AGH Technical University, Kraków, Poland; 4 Department of Automatics, Biocybernetic Lab, AGH Technical University, Kraków, Poland.

Hardware realization of reconfigurable real-time video systems has been the subject of research in the Departments of Automatics and Electronics of the AGH Technical University, Kraków, Poland since 1986. During that period several studies have come into being concerning the architecture of video systems adapted to real-time operation (required by the planned applications in problems of automation and robotics). For the research purposes concerning these applications, three working systems for image processing and analysis have been constructed, each based on the latest technology available at the time for dedicated electronic systems. The experience gathered during that period, as well as several studies, both theoretical and practical, led the authors to the conclusion that the field of solution concepts for the architectures of dedicated vision systems is much more stable than the rapidly varying (at the pace dictated by the progress of electronic technology) realization possibilities of hardware solutions.
Therefore, after finding that an original methodology had emerged as a by-product of the construction of several visual processing hardware systems, the authors decided to describe some elements of that methodology in the present work. The specific features of this methodology are: the use of dedicated hardware resources, controlled by software, for the realization of image processing algorithms; the parallel nature of the applied processing; adaptation to the pipeline nature of the video signal; the possibility of hardware reconfiguration for the algorithm being executed at the moment; the ability to select the hardware resources depending on the specific nature of the algorithm being executed; and several other unique features, described below during the presentation of the particular solutions.

The primary observation made at the beginning of the described research was the possibility of diversifying the computing resources that carry out the image preprocessing operations, depending on the implementation of the further stages of image analysis (see Fig. 1) [1]. Therefore all the systems described here make use of heterogeneous architectures, considerably diversifying the method of hardware-software realization of the particular stages of image processing and analysis. Several characteristic elements of that heterogeneity will now be discussed.

Ryszard Tadeusiewicz, Marek Gorgoń, Kazimierz Wiatr, Zbigniew Mikrut

Fig. 1. Architecture of complex image computing system.

At the stage of data acquisition a solution is required for the problem of interlaced to non-interlaced signal conversion. When software methods are applied, the task requires the composition of the source image from the two half-images generated by a typical TV camera.
Such a realization introduces a delay, required for completion of the full frame, before the actual image processing action. Yet the same task can be solved in a much easier way, without introducing any delays, by application of specialized hardware (a properly used multiport memory).

On the other hand, image preprocessing requires the execution of a great number of relatively simple operations on individual pixels and their groups. These operations can be parallelized and executed using relatively simple dedicated computational elements working synchronously, but this requires the elaboration of architectures for many dedicated hardware tools (specialized processors) adapted to the realization of the various operations encountered in preliminary processing. Particular attention should be focused on the processors that can be used for the hardware realization of linear and nonlinear context image filtering, including reprogrammable convolution processors and processors capable of real-time execution of median filtration.

The stage of image analysis is characterized by a considerable reduction of the image data volume (e.g. to the regions of interest) and the extraction of information from the image frame. The great computational complexity characteristic of the analysis algorithms, as well as the recursive data processing required by e.g. image segmentation, brings the necessity of applying essentially different hardware resources (than the ones applied for preprocessing) for algorithm implementation at that stage. The distinctness of the hardware solutions applied at the image analysis stage also results from the fact that during image analysis the realization of floating-point operations is usually required, while during the preprocessing stage fixed-point arithmetic is applied as a rule.
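The two context-filter types named above, linear convolution and median filtration, can be sketched in software as follows; the hardware processors perform the same arithmetic synchronously on the video stream. The Q4 fixed-point scaling (SHIFT = 4) is an illustrative choice for the fixed-point preprocessing stage, not a value taken from the described systems:

```python
# Software sketch of the two context-filter types: a 3x3 convolution
# with fixed-point weights and a 3x3 median filter. Illustrative only;
# the hardware processors operate pixel-synchronously on the stream.

SHIFT = 4  # fixed-point fraction bits (Q4, assumed for illustration)

def convolve3x3(img, kernel_q):
    """Linear context filter: 3x3 convolution with fixed-point weights."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            acc = sum(kernel_q[dy + 1][dx + 1] * img[y + dy][x + dx]
                      for dy in (-1, 0, 1) for dx in (-1, 0, 1))
            out[y][x] = min(255, max(0, acc >> SHIFT))  # rescale, clip to 8 bits
    return out

def median3x3(img):
    """Nonlinear context filter: 3x3 median filtration."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            window = sorted(img[y + dy][x + dx]
                            for dy in (-1, 0, 1) for dx in (-1, 0, 1))
            out[y][x] = window[4]  # middle of the 9 sorted samples
    return out

# A smoothing kernel with weights summing to 16 gives unity gain in Q4:
GAUSS_Q4 = [[1, 2, 1], [2, 4, 2], [1, 2, 1]]
```

The contrast is instructive: the median filter removes an isolated impulse completely, while the convolution only spreads it, which is why both kinds of context processors appear in the architectures below.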
The review of architectures for image processing and analysis elaborated over the course of 15 years at the AGH Technical University in Kraków, presented below, is at the same time an illustration of the progress in digital electronics technology and of the application of the available resources to the realization of hardware architectures working in real time. Because of their particular tailoring to the line-sequential nature of the video signal, such systems can be described as Image Data Processing (IDP) systems. The examples of architecture construction for those systems, and the solutions of particular problems related to the architectures of the individual functional blocks, illustrate the opinion that the progress of technology, in particular FPGA device technology, allows not only greater integration of the systems and higher pixel clock frequencies (and therefore the possibility of processing higher-resolution images in real time) but also an increase in the flexibility of the systems. The hardware structure is based on modern FPGA technology: due to the possibility of fast and thorough reconfiguration (also in the Run-Time Reconfiguration mode) it acquires the necessary flexibility and adaptability, characteristic of software implementations. Due to the application of hardware and hardware-software design languages (i.e. VHDL, HLL, RTR), the effect of scalability of the created solution is also achieved, as well as the possibility of referring to the object methodology during the design process.

Several innovations proposed in the present work include, although they are not limited to, the following:
- the design and realization of comprehensive systems for processing, analysis and visualization of digital images;
- parallelization of the algorithms for image processing and analysis, essentially related to the resources of the constructed hardware system (Hardware-Software Codesign);
- original architectures of the computational elements used in the operations of image processing and analysis, and novel methods of their connection and synchronization;
- implementation of software environments enabling the realization of algorithms (applications) in the constructed hardware systems;
- elaboration of verification and testing procedures for the real-time 2D image systems.

II. THE CESARO-2 SYSTEM

The multiprocessor video system CESARO-2 was realized in the Biocybernetics Lab in the years 1986-1989. The CESARO-2 system, even though realized in TTL & CMOS technology, can be classified as a system executing real-time image processing.

The CESARO-2 system represents an essential conceptual achievement and an example of an integral software-hardware solution for image processing and analysis. In the structure of the discussed system three principal subsystems can be distinguished:
- the video subsystem (image acquisition, real-time calculation of the histogram);
- the real-time subsystem (preprocessing: median filtering, convolution, binarization);
- the control subsystem (system management, image analysis, robot control).
The structure of the system is presented in Fig. 2. A more detailed description of the whole CESARO-2 video system can be found in papers [6], [7].
Below, because of the aim and scope of the present work, the interest has been focused merely on that part of the system which is responsible for the concurrent preprocessing of the recognized image.

The basic component of the real-time subsystem (Fig. 3) consists of specialized processors, together with their accompanying multiplexer blocks used for bus switching. The bus type used in the system can be classified as a cross-bar matrix. The processing takes place with the elimination of any redundant image transmission processes; still, a time-consuming operation of writing the whole picture frame is necessary after each operation executed in a specialized processor, followed by a multiplexer switching.

Fig. 2. The CESARO-2 System Architecture.

The design of the control subsystem is based on four industrial-standard boards containing V30 processors (ver. 80286), and it enables supervision over the system, its activation and testing, and the execution of image analysis operations.

Reconfiguration of the CESARO-2 system elements has been realized by a software-hardware method. The elements which could be reconfigured included the execution sequence of the processing operations (i.e. median filtering, convolution, binarization) in the real-time subsystem. The reconfiguration has been realized in software, from the management system level, by means of the bus multiplexer hardware. Additionally, in the convolution processor, reprogramming of the kernel coefficients was also possible. The system worked on a monochromatic image 256x256 pixels in size, with 8-bit pixel representation. In the convolution processor the greyscale depth has been reduced to 16 shades of grey. The system has also been fitted with the ability to execute operations on a binary image. Table I presents the operation times in the CESARO-2 system.
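The software-hardware reconfiguration described above, in which the management system selects the execution sequence of median filtering, convolution and binarization by switching the bus multiplexers, can be modelled as composing a pipeline of stage functions. The sketch below is purely illustrative: the stage implementations and names are ours, not the CESARO-2 interfaces:

```python
# Model of software-controlled reconfiguration: the management software
# chooses which specialized processors the frame visits, and in what
# order, by switching bus multiplexers. Here the "switching" is simply
# composing a list of stage functions; all names are illustrative.

def median1x3(frame):
    """Stand-in for the median-filtering processor (1x3 window)."""
    out = [row[:] for row in frame]
    for y, row in enumerate(frame):
        for x in range(1, len(row) - 1):
            out[y][x] = sorted(row[x - 1:x + 2])[1]
    return out

def smooth1x3(frame):
    """Stand-in for the convolution processor (1x3 averaging kernel)."""
    out = [row[:] for row in frame]
    for y, row in enumerate(frame):
        for x in range(1, len(row) - 1):
            out[y][x] = (row[x - 1] + row[x] + row[x + 1]) // 3
    return out

def binarize(frame, threshold=128):
    """Stand-in for the binarization processor."""
    return [[255 if p >= threshold else 0 for p in row] for row in frame]

def run_pipeline(frame, stages):
    """Route the frame through the selected processors in the
    configured order (the role played by the bus multiplexers)."""
    for stage in stages:
        frame = stage(frame)
    return frame

# Two configurations of the same resources, selected purely in software:
config_a = [median1x3, binarize]             # de-noise, then threshold
config_b = [smooth1x3, median1x3, binarize]  # convolution first
```

In the real system each stage additionally required writing the whole frame back to memory before the next multiplexer switching, which is exactly the overhead the pipelined architecture of the next section removes.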
TABLE I. CESARO-2 Processing Elements Performance (1989)

Image processing element           | Pixel clock (in circuit) | Computation of image frame (256x256x8 bits)
2 x 2 convolution                  | 5 MHz                    | 26.32 ms
3 x 3 convolution                  | 5 MHz                    | 26.42 ms
Sobel (2 iterations)               | 5 MHz                    | 79.26 ms
Sobel (dedicated processor)        | 5 MHz                    | 26.42 ms
subtraction of two frames          | 5 MHz                    | 39.32 ms
histogram                          | 5 MHz                    | 26.21 ms
surface calculation                | 5 MHz                    | 26.21 ms
Boolean operations (binary images) | 5 MHz / 8 binary pixels  | 4.91 ms

III. DEDICATED PIPELINED ARCHITECTURE FOR IMAGE PROCESSING (DePIAR)

The goal of the authors' work was to develop a multiprocessor architecture which, due to the computational elements used and to their interconnection, would result in a very short execution time of the image pre-processing. In particular, the processing time has been reduced to the 40 ms allowed by the adopted image acquisition standard of 25 Hz frame frequency (PAL system, interlace to non-interlace conversion). Effective use of the multiprocessor structure depends on the optimized assignment of the computation tasks to the various processors, on the proper data transfer between them, and on their synchronized operation. Such requirements necessitate that not only specialized hardware processors are used in the image processing, but dedicated architectures of multiprocessor systems as well.

Fig. 3. Real-time Subsystem of CESARO-2.

A survey of possible architectures of multiprocessor systems has been performed by the authors [8], [9].
Considering that the vision data to be processed form great blocks of data (for an image of 512 x 512 pixels the block capacity is 256 kB), the duration of their transmission between the processors is equally as important as the duration of each operation performed by the processors. The most effective here is a multiprocessor pipelined system based on the MISD architecture (Multiple Instruction stream, Single Data stream) implemented in FPGA structures (Fig. 4).

Fig. 4. MISD architecture of specialized hardware processors (an A/D converter feeding a chain of FPGA-based pipelined processors with local memory, each with its own instruction stream).

For the purpose of pipelined video signal processing, a bus standard has been developed for the cooperation of the specialized processors performing the image pre-processing. In the pipelined mode the video data (8 bits) are transferred, as well as the control signals securing the synchronized operation of the processors. Such a solution of the pipelined bus enables the use of various independently designed processor modules, configured in a system according to what is needed. For each of them it is possible to physically change its operating position in the pipeline, due to which the algorithm of the video signal processing can be flexibly formed and matched to the specific conditions. Extra opportunities to shape the image transformation performed by the system result from a routine selection of factors (e.g.
the convolution matrix) engaged in the hardware processing, transferred from the level of an external bus (e.g. the VME bus) which is not engaged in the pipelined transfer of the vision data (Fig. 5).

The pipelined architecture in Fig. 5 shows hardware processors P interconnected by a pipelined bus composed of the video data and the control signals [10], [11]. The hardware processors are accessible from the VME bus level. The logic module IL serves the interrupt signals and their transfer onto the VME bus.

Pipelined processing of a video signal from the camera places very high demands on the pipelined processor, which has to process each pixel completely before the next portion of information (the next pixel) arrives. The time available to the pipelined processor is strictly related to the sampling frequency of the A/D converter connected to the analogue camera output. This time results from the timing of the image signal produced by the camera and its division into lines and pixels.

Fig. 5. A specialized hardware processors architecture (VME bus with 13-bit address, 8-bit data and control buses; interrupt lines served by the IL logic module).

Usually, for the purpose of image analysis, a geometrically square field of the image is provided, which is divided into square pixels. In order to preserve the square field, the reduced length of a line to be analyzed is 3/4 x 512/575 x 52 µs (52 µs being the standard duration of the visible part of a single line). The above considerations determine the sampling frequency of the analogue/digital (A/D) converter, whose value is given by the equation:

    f = (K x N_ver) / t_hor = 14.75 MHz    (1)

where K = N_hor/N_ver = 4/3 is the image aspect ratio, N_ver = 575 is the number of visible lines on the screen, and t_hor = 52 µs is the active time of line scanning (PAL).

The video signal is transferred as a series of samples (for the system described here it has been assumed that 1 sample = 1 pixel = 8 bits) in consecutive image frames.
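The sampling-frequency relation (1) can be checked numerically; the sketch below simply re-evaluates the quantities quoted in the text:

```python
# Numerical check of the A/D sampling frequency for square pixels
# (PAL timing), using the quantities quoted in the text.

K = 4 / 3        # image aspect ratio, K = N_hor / N_ver
N_VER = 575      # visible lines per frame (PAL)
T_HOR = 52e-6    # active (visible) time of one line, in seconds

f_sample = K * N_VER / T_HOR      # required A/D sampling frequency, Hz
print(round(f_sample / 1e6, 2))   # -> 14.74 (the text rounds to 14.75 MHz)

# Reduced active line length keeping a square 512 x 512 analysis field:
t_line_512 = 3 / 4 * 512 / 575 * 52e-6
print(round(t_line_512 * 1e6, 2))  # -> 34.73 (microseconds)
```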
Each frame is composed of 512 lines with 512 pixels in each line. The data flow rate through the bus is 15 MB/s. The resolution of the bus for video data processing is 8 bits; thus an 8-bit input bus of video data (PD_IN0..7) and an identical output bus (PD_OUT0..7) are connected to each module.

Image synchronization for each processor module is achieved by the input signals (PH_IN, PV_IN) of horizontal and vertical blanking (Fig. 4) and by the corresponding output signals (PH_OUT, PV_OUT). The output signals are generated by the control logic of each module, and they appear with a delay corresponding to the duration of the video signal processing by the particular processor (Fig. 5). The subsequent samples (pixels) are latched into the module on the rising edge of the video data strobe signal (P_STB_IN). At the module output a corresponding output signal (P_STB_OUT) is generated, with the same reservation as for the blanking signals.

The above structure is very competitive in view of the finite capabilities of conventional microprocessors to enhance their computational power and the operating frequency of their clocks. With this structure, the cycles of instruction and data fetching are eliminated, and the operations themselves are performed in parallel. The execution times of several exemplary image pre-processing operations are presented in Table II [13].

TABLE II. DePIAR Processing Elements Performance (1997)

Image processing element         | IC components          | Pixel clock (in circuit) | Computation of image frame (512x512x8 bits)
median filtering                 | XC4005-5, 2 x IDT72210 | 15 MHz                   | 17.37 ms
3 x 3 convolution (2 directions) | 2 x IMS-A110, XC4003-5 | 15 MHz                   | 17.44 ms
3 x 3 convolution                | XC4010E, 2 x IDT72210  | 15 MHz                   | 17.37 ms
look-up-table                    | XC4005-5               | 15 MHz                   | 17.30 ms
subtraction of two frames        | XC4004-5               | 15 MHz                   | 17.30 ms
histogram                        | XC4005-5               | 15 MHz                   | 17.30 ms

The pipelined bus module for testing was placed in a cassette with a VME bus.
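The frame times in Table II are consistent with the pipeline accepting essentially one 8-bit sample per pixel clock; a quick check under that assumption:

```python
# Consistency check for Table II: at one 8-bit sample per clock, a full
# 512 x 512 frame at the 15 MHz pixel clock should take about
# 512*512 / 15 MHz ~ 17.5 ms, close to the quoted 17.30-17.44 ms
# (the small differences presumably reflect blanking and module latency).

PIXEL_CLOCK_HZ = 15e6
FRAME_PIXELS = 512 * 512

frame_time_ms = FRAME_PIXELS / PIXEL_CLOCK_HZ * 1e3
print(round(frame_time_ms, 2))  # -> 17.48

# Bus data rate: one byte per clock, as stated in the text.
rate_mb_s = PIXEL_CLOCK_HZ / 1e6
print(rate_mb_s)                # -> 15.0 (MB/s)
```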
The work was supervised by the OS-9 real-time operating system installed on a FORCE SYS68K/CPU32 module (Motorola MC68030 microprocessor), together with the SYSTEM-PAK I/MGR graphic package operating in conjunction with an EKF SAGA 6/7842 graphic controller.

IV. THE RETINA VIDEO SYSTEM

The RETINA Module is a reconfigurable, highly universal hardware-software platform for the implementation of a wide variety of Digital Image and Signal Processing algorithms. A previous version of the retina-like re-mapping board, based on TTL logic elements and the ISA bus, was constructed in the Biocybernetic Laboratory in the 1990s [14]. The RETINA Module has been designed as a dedicated system for performing the Log-Polar transform; however, it is worth mentioning that the placement of the FPGA device in the centre of the module's architecture allows its deep reconfiguration [15], [16].

Fig. 6. The RETINA Module: block diagram of the hardware platform for realization of the Video Processor.

The module's construction has been based on a Virtex XCV300-6BG432 FPGA device, a 32-bit floating-point DSP96002 signal processor, a BT812 video signal processor and a PCI S5933 bridge. Essential elements of the system are three blocks of fast SRAM memory, assigned to image data storage (Fig. 6). Their independent operation enables the parallelization of some stages of the transformation algorithm, which increases the efficiency of the system. The module is provided with some extra peripheral devices, essential for the realization of the system's auxiliary functions: a real-time clock, an RS232 transceiver, a temperature sensor for the FPGA device, and a ROM memory storing the data and program for the DSP processor. Thus the platform is a 32-bit microprocessor system, enhanced by the flexibility and computational resources of the FPGA device. The role of the FPGA device extends beyond the range determined by the image processing algorithm.
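The Log-Polar transform, for which the module was primarily designed, resamples the image on a grid of exponentially growing rings around a fixation point, so the data volume drops sharply while the centre of the field keeps full resolution. A minimal lookup-table sketch of the remapping idea follows; it is illustrative only, and none of the parameters or functions below are the RETINA module's actual implementation:

```python
# Illustrative lookup-table sketch of the Log-Polar remapping: each
# (ring, sector) cell of the "retina" reads one source pixel, with ring
# radii growing exponentially from the fixation point. In hardware the
# table would live in memory and the FPGA would stream the addresses.

import math

def log_polar_map(size, rings, sectors, r_min=1.0):
    """Build a (ring, sector) -> (x, y) table for a size x size image."""
    cx = cy = size / 2.0
    r_max = size / 2.0
    growth = (r_max / r_min) ** (1.0 / rings)  # exponential ring spacing
    table = {}
    for k in range(rings):
        r = r_min * growth ** k
        for s in range(sectors):
            theta = 2.0 * math.pi * s / sectors
            x = int(cx + r * math.cos(theta))
            y = int(cy + r * math.sin(theta))
            table[(k, s)] = (min(max(x, 0), size - 1),
                             min(max(y, 0), size - 1))
    return table

def log_polar_sample(img, table, rings, sectors):
    """Apply the precomputed mapping to a square grayscale image."""
    return [[img[table[(k, s)][1]][table[(k, s)][0]]
             for s in range(sectors)] for k in range(rings)]
```

A production mapping would average all source pixels falling into each receptive field rather than point-sample one of them; the floating-point normalization of the accumulated RETINA matrix is exactly the step the text assigns to the DSP processor.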
The FPGA implements the environment necessary for the proper functioning of the DSP processor and the memory service, and it also creates a path of data exchange with the supervising computer via the PCI bridge, by implementing a cross-bar switch for the external buses. The listed functions, realized for the external devices, together with the functional blocks implemented in the FPGA, have been integrated in the FPGA controller module [17].

Fig. 7. The block diagram of the image processing in the Retina board for the application of the Log-Polar transform, with specification of the particular functional elements, data streams and communication paths between processes.

The DSP device realizes the further stages of post-processing. The particular utility of this unit in signal processing manifests itself during the realization of floating-point arithmetic operations. For instance, in the process of the Log-Polar transform the unit executes the normalization operation on the contents of the RETINA matrix. In addition, the DSP unit is also able to execute the system program controlling the board. Due to that, the module can also operate as a standalone unit, without a supervising computer.

V. THE IMPLEMENTATION OF IMAGE PROCESSING OPERATIONS IN NEW GENERATION FPGA DEVICES

FPGA devices with capacities greater than several hundred thousand logic gates allow the integrated implementation of the image processing and analysis algorithms in one reprogrammable chip. Due to the extension of