A Vision Chip for Color Segmentation and Pattern Matching

EURASIP Journal on Applied Signal Processing 2003:7, 703–712
© 2003 Hindawi Publishing Corporation

Ralph Etienne-Cummings
Iguana Robotics, P.O. Box 625, Urbana, IL 61803, USA
Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, MD 21218, USA

Philippe Pouliquen
Iguana Robotics, P.O. Box 625, Urbana, IL 61803, USA
Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, MD 21218, USA

M. Anthony Lewis
Iguana Robotics, P.O. Box 625, Urbana, IL 61803, USA
Email: tlewis@iguana-robotics.com

Received 15 July 2002 and in revised form 20 January 2003

A 128(H) × 64(V) × RGB CMOS imager is integrated with region-of-interest selection, RGB-to-HSI transformation, HSI-based pixel segmentation, (36 bins × 12 bits)-HSI histogramming, and sum-of-absolute-difference (SAD) template matching. Thirty-two learned color templates are stored and compared to each image. The chip captures the R, G, and B images using in-pixel storage before passing the pixel content to a multiplying digital-to-analog converter (DAC) for white balancing. The DAC can also be used to pipe in images for a PC. The color processing uses a biologically inspired color opponent representation and an analog lookup table to determine the Hue (H) of each pixel. Saturation (S) is computed using a loser-take-all circuit. Intensity (I) is given by the sum of the color components. A histogram of the segments of the image, constructed by counting the number of pixels falling into 36 Hue intervals of 10 degrees, is stored on the chip and compared against the histograms of new segments using SAD comparisons. We demonstrate color-based image segmentation and object recognition with this chip. Running at 30 fps, it uses 1 mW. To our knowledge, this is the first chip that integrates imaging, color segmentation, and color-based object recognition at the focal plane.
Keywords and phrases: focal plane image processing, object recognition, color histogramming, CMOS image sensor, vision chip, VLSI color image processor.

1. INTRODUCTION

CMOS integrated circuit technology readily allows the incorporation of photodetector arrays and image processing circuits on the same silicon die [1, 2, 3, 4, 5, 6]. This has led to the recent proliferation of cheap and compact digital cameras [7], system-on-a-chip video processors [8, 9], and many other cutting-edge commercial and research imaging products. The concept of using CMOS technology for combining sensing and processing was not spearheaded by the imaging community. It actually emerged in the mid-1980s from the neuromorphic engineering community developed by Mead and collaborators [10, 11]. Mead’s motivation was to mimic the information processing capabilities of biological organisms; biology tends to optimize information extraction by introducing processing at the sensing epithelium [12]. This approach to sensory information processing, which was later captured with terms such as “sensory processing” and “computational sensors,” produced a myriad of vision chips, whose functionality includes edge detection, motion detection, stereopsis, and many others (examples can be found in [13, 14, 15, 16]).

The preponderance of the work on neuromorphic vision has focused on spatiotemporal processing of the intensity of light (gray-scale images) because the intensity can be readily transformed into a voltage or current using basic integrated circuit components: photodiodes, photogates, and phototransistors. These devices are easily implemented in CMOS technologies using no additional lithography layers. On the other hand, color image processing has been limited primarily to the commercial camera arena because three additional masks are required to implement R, G, and B filters [17].
The additional masks make fabrication of color-sensitive photodetection arrays expensive and, therefore, not readily available to researchers. Nonetheless, a large part of human visual perception is based on color information processing. Consequently, neuromorphic vision systems should not ignore this obviously important cue for scene analysis and understanding. This paper addresses this gap in the silicon vision literature by providing perhaps the only chip that integrates a large array of color photodetectors with processing circuitry. Our chip is designed for the recognition of objects based on their color signature.

There has been a limited amount of previous work on neuromorphic color processing. The vast majority of the color processing literature addresses standard digital image processing techniques. That is, the systems consist of a camera that is connected to a frame grabber that contains an analog-to-digital converter (ADC). The ADC interfaces with a digital computer, where software algorithms are executed. Among the few biologically inspired hardware papers, there are clearly two approaches. The first approach uses separate imaging chips and processing chips [18], while the second approach integrates a handful of photodetectors and analog processing circuitry [19]. In the former example, standard cameras are connected directly to analog VLSI chips that demultiplex the video stream and store the pixel values as voltages on arrays of capacitors. Arrays as large as 50 × 50 pixels have been realized to implement various algorithms for color constancy [18]. As can be expected, the system is large and clumsy, but real-time performance is possible. The second set of chips investigates a particular biologically inspired problem, such as RGB-to-HSI (Hue, saturation, and intensity) conversion using biologically plausible color opponents and HSI-based image segmentation using a very small number of photodetectors and integrated analog VLSI circuits [19].
Clearly, the goal of the latter is to demonstrate a concept and not to develop a practical system for useful image sizes. Our approach follows the latter; however, we also use an architecture and circuitry that allow high-resolution imaging and processing on the same chip. In addition, we include higher-level processing capabilities for image recognition. Hence, our chip can be considered to be a functional model of early vision, such as the retina and visual area 1 (V1) of the cortex, and of higher visual cortical regions, such as the inferotemporal area (IT) of the cortex [20, 21].

2. COLOR SEGMENTATION AND PATTERN MATCHING

In general, color-based image segmentation, object identification, and tracking have many applications in machine vision. Many targets can be easily segmented from their backgrounds using color, and subsequently can be tracked from frame to frame in a video stream. Furthermore, the targets can be recognized and tagged using their color signature. Clearly, in the latter case, the environment must be configured such that it cooperates with the segmentation process. That is, the targets can be colored in order to facilitate the recognition process because the recognition of natural objects based solely on color is prone to false positives. Nonetheless, there are many situations where color segmentation can be directly used on natural scenes. For example, people tracking can be done by detecting the presence of skin in the scene. It is remarkable that skin, from the darkest to the lightest individual, can be easily tracked in HSI space by constructing a model 2D histogram of the Hue (H) and saturation (S) of skin tone in an image (intensity (I) can be ignored). Skin can be detected in other parts of the image by matching the histograms of these parts against the HS model. Figures 1 and 2 show an example of a general skin tone identification task, implemented in Matlab.
Conversely, specific skin tones can be detected in a scene if the histogram is constructed with specific examples. The latter will be demonstrated later using our chip.

Color imagers, however, provide an RGB color representation. For the above example, a conversion from RGB to HSI is required. There are other benefits of this conversion. The main advantage of the HSI representation stems from the observation that RGB vectors can be completely redirected under additive or multiplicative transformations. Hence, color recognition using RGB can fail under simple conditions such as turning on the light (assume a white source; colored sources manipulate the color components in a more profound way). HS components, however, are invariant under these transformations, and hence are more robust to variations in ambient intensity levels. Equation (1) shows how HSI components are derived from RGB [19, 22]. Notice that H and S are not affected if $R \to \{R + a,\ aR\}$, $G \to \{G + a,\ aG\}$, and $B \to \{B + a,\ aB\}$. In the equation, R, G, and B have been normalized by the intensity, that is, $R/I = r$, $G/I = g$, and $B/I = b$:

$$H = \arctan\left( \frac{\sqrt{3}\,[g - b]}{2\,[(r - g) + (r - b)]} \right), \tag{1a}$$

$$S = 1 - 3\,[\min(r, g, b)], \tag{1b}$$

$$I = R + G + B. \tag{1c}$$

The conversion from RGB to HSI is, however, nonlinear and can be difficult to realize in VLSI because nonlinear functions, such as arctangent, cannot be easily realized with analog circuits. Here, we present an approach for the conversion that is both compact (uses small silicon area) and fast. It is also worth noticing that the HSI conversion uses color opponents (r − g, r − b, g − b). Although we have made no attempt to mimic biological color vision exactly, similar color opponents have been identified in biological color processing, suggesting that an HSI representation may also be used by living organisms [19, 20, 21, 23]. Figure 3 shows the color opponent receptive fields of cells in the visual cortex [23].
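The conversion in Equation (1) can be sketched in software as follows. This is a minimal sketch under stated assumptions, not the chip's analog implementation: the function name `rgb_to_hsi` is hypothetical, `atan2` replaces a plain arctangent so that the quadrant is resolved, the factor of 2 is placed in the denominator as read from Equation (1a), and a black pixel is mapped to H = S = 0 by convention.

```python
import math


def rgb_to_hsi(R, G, B):
    """Return (H in degrees, S, I) for one pixel, following Equation (1)."""
    I = R + G + B                          # (1c): intensity is the plain sum
    if I == 0:
        return 0.0, 0.0, 0.0               # assumption: black maps to H = S = 0
    r, g, b = R / I, G / I, B / I          # intensity-normalized components
    # (1a): Hue from the color opponents (g - b), (r - g), and (r - b).
    num = math.sqrt(3.0) * (g - b)
    den = 2.0 * ((r - g) + (r - b))        # factor 2 as read from (1a); assumption
    H = math.degrees(math.atan2(num, den)) % 360.0
    S = 1.0 - 3.0 * min(r, g, b)           # (1b)
    return H, S, I


# Hypothetical pixel values, chosen only for illustration.
base = rgb_to_hsi(120.0, 60.0, 30.0)
gain = rgb_to_hsi(240.0, 120.0, 60.0)      # R, G, B multiplied by a = 2
offset = rgb_to_hsi(160.0, 100.0, 70.0)    # a = 40 added to R, G, B
print(base, gain, offset)
```

Under these assumptions, the three calls let one compare Hue after an additive offset, and Hue and saturation after a multiplicative gain, against the values of the original pixel.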
Figure 4 shows how we implemented color opponents on our chip. Using these color opponents, the RGB-to-HSI conversion is realized.

Figure 1: (a) Examples of skin tones obtained from various individuals with various complexions. (b) The HS histogram model constructed from the picture in (a).

Figure 2: Skin tone segmentation using the HS histogram model in Figure 1. Black pixels have been identified.

Figure 3: Color opponent receptive fields in the visual cortex (on-center and off-center cells; R − G and Y − B). Unipolar off- and on-cells of G − B and Y − B are used to construct the HSI representation.

Figure 4: Color opponent computation performed by the chip. Bipolar R − B, R − G, and G − B are used to implement the HSI representation in (1).

3. CHIP OVERVIEW

We have designed a 128(H) × 64(V) × RGB CMOS imager, which is integrated with analog and digital signal processing circuitry to realize focal plane region-of-interest selection, RGB-to-HSI transformation, HSI-based segmentation, 36-bin HSI histogramming, and sum-of-absolute-difference (SAD) template matching for object recognition. This self-contained color imaging and processing chip, designed as a front-end for microrobotics, toys, and “seeing-eye” computers, learns the identity of objects through their color signature. The signature is composed of a (36 bins × 12 bits)-HSI histogram template; a minimum intensity and minimum saturation filter is employed before histogramming. The template is stored at the focal plane during a learning step. During the recognition step, newly acquired images are compared to 32 stored templates using the SAD computer. The minimum SAD result indicates the closest match.
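The learning and recognition steps described above can be sketched in software as follows. This is a minimal sketch, assuming pixels are already available as (Hue in degrees, saturation, intensity) tuples. The function names, the threshold values `min_s` and `min_i`, and the choice to saturate (rather than wrap) each 12-bit counter at 4095 are all assumptions made for illustration; the text does not specify them.

```python
def hue_histogram(pixels, min_s=0.1, min_i=30.0, bins=36, max_count=4095):
    """Count pixels per Hue bin after the minimum-S and minimum-I filter."""
    hist = [0] * bins
    width = 360.0 / bins                       # 10 degrees per bin for 36 bins
    for h, s, i in pixels:
        if s < min_s or i < min_i:             # reject weakly colored or dark pixels
            continue
        k = min(int((h % 360.0) // width), bins - 1)
        hist[k] = min(hist[k] + 1, max_count)  # 12-bit counter; saturation is assumed
    return hist


def sad(a, b):
    """Sum of absolute differences between two histograms of equal length."""
    return sum(abs(x - y) for x, y in zip(a, b))


def best_match(hist, templates):
    """Return (index, SAD score) of the stored template closest to hist."""
    scores = [sad(hist, t) for t in templates]
    best = min(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


# Hypothetical data: learn one template, then recognize a similar segment.
learned = hue_histogram([(5.0, 0.5, 100.0), (15.0, 0.6, 120.0)])
templates = [[0] * 36, learned]
test = hue_histogram([(7.0, 0.4, 90.0), (12.0, 0.7, 150.0)])
print(best_match(test, templates))
```

On the chip, 32 templates are compared; the sketch accepts any number of templates and simply reports the one with the minimum SAD.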
In addition, the chip can be used to segment color images and identify regions in the scene having particular color characteristics. The location of the matched regions can be used to track objects in the environment. Figure 5 shows a block diagram of the chip. Figure 6 shows a chip layout (the layout is shown because the light shielding layer obscures the details). To our knowledge, this is the first chip that integrates imaging, color segmentation, and color-based object recognition at the focal plane.

Figure 5: Computational and physical architecture of the chip. (Blocks shown: X and Y block select registers and pixel scanning registers; 128 × 64 × R, G, B pixel array with dummy rows and selected block; R, G, B column correction, scaler, and scale memory; R, G, B → r, g, b normalization and intensity; saturation computer and Hue computer; (S, I) threshold test and Hue decode to 36 bins; 36 12-bit counters; template memory controller with 8 banks of 4 parallel 36 × 12-bit SRAM templates; sum-of-absolute-differences template matching.)

4. HARDWARE IMPLEMENTATION

4.1. CMOS imaging, white equalization, and normalization

In the imager array, three current values, corresponding to R, G, and B, are sampled and held for each pixel. By storing the color components in this way, a color filter wheel can be used instead of integrated color filters.
This step allows us to test the algorithms before migrating to an expensive color CMOS process. When a color CMOS process is used, the sample-and-hold circuit in Figure 7 will be removed. An R, G, and B triplet per pixel, obtained from on-chip filters, will then be provided directly to the processing circuit. No change to the scanning or processing circuitry will be required. To facilitate processing, a current mode imaging approach is adopted. It should be noted, however, that current mode imaging is typically noisy. For our targeted application, the noisiness in the image does not pose a problem and the ease of current mode processing is highly desirable. Current mode imaging also provides more than 120 dB of dynamic range [10], and allows RGB scaling for white correction using a multiplying DAC and RGB normalization using a translinear circuit [24]. The normalization guarantees that a large dynamic range of RGB currents is resized for the HSI transformer to operate correctly. However, it limits the speed of operation to approximately 30 fps because the transistors must operate in subthreshold.

Figure 6: Chip layout (light shield layer obscures all details in micrograph). Labeled regions: imager array, image processing, stored templates, template matching.

Figure 7: (a) Schematic of the pixel. (b) Schematic of the normalization circuit, shown for the blue channel: B_norm = I_bias · B/(R + G + B).

For readout, the pixels can be grouped into blocks of 1 × 1 (single pixel) to 128 × 64 (entire array). The blocks can be advanced across the array in single or multiple pixel intervals. Each block is a subimage for which an HSI histogram is constructed, and can be used as a learned template or a test template.
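The block readout described above can be pictured with a short software analogue. This is a minimal sketch, assuming a row-major scan order and rectangular blocks that must fit entirely inside the array; the function name `block_origins` and its parameters are hypothetical, and the chip itself is configured through bit patterns in scanning registers rather than through loop bounds.

```python
def block_origins(block_w, block_h, step_x, step_y, array_w=128, array_h=64):
    """Yield the top-left (x, y) of every block position that fits in the array."""
    for y in range(0, array_h - block_h + 1, step_y):
        for x in range(0, array_w - block_w + 1, step_x):
            yield x, y


# Hypothetical configuration: 16 x 16 blocks advanced in 8-pixel intervals.
origins = list(block_origins(16, 16, 8, 8))
print(len(origins), origins[0], origins[-1])
```

Each origin identifies one subimage; in the scheme described in the text, an HSI histogram would be built for every such subimage and used either as a learned template or as a test template.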
The organization of the pixels and the scanning methods are programmable by loading bit patterns in two scanning registers, one for scanning pixels within blocks and the other for scanning the blocks across the array.

Figure 7 shows the schematic of the pixel and a portion of the RGB normalizer. The output currents of the pixel are amplified using tilted mirrors, where Vdd_d < Vdd_m. In the range of light intensity for which this array is designed, a logarithmic relationship is obtained between light intensity and output current [25]. Logarithmic transfer functions have also been observed in biological photoreceptors [26]. This relationship has the additional benefit of providing wide dynamic range response. A reset switch is included to accelerate the off-transition of the pixel. Not shown in Figure 7b is the scaling circuit that simply multiplies the RGB components by programmable integer coefficients from 1 to 16. The scaling is used to white balance the image because silicon photodiodes are more sensitive to red light than to blue.

The normalization circuit computes the ratio of each color component to the sum of the three (i.e., intensity) using the translinear circuit in Figure 7b. The circuit uses MOSFETs operating in subthreshold so that the relationship between the gate-to-source voltages and the currents through the devices is logarithmic. Hence, the difference of these voltages provides the logarithm of the ratio of currents. By using the voltage difference as the gate-to-source voltage of another transistor, a current is produced which is proportional to this ratio (i.e., the anti-log is computed). This function is easily implemented with the circuit in Figure 7b; however, because all transistors must operate in subthreshold, that is, with very small currents on the order of ∼1 nA, the circuit can be slow. Using larger transistors to allow larger bias currents is countered by the increased parasitic capacitance.
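The log, subtract, anti-log principle of the normalization can be illustrated numerically. This is a minimal sketch, assuming an ideal exponential subthreshold characteristic with identical devices; the function name `translinear_normalize`, the bias current of 1 nA, and the slope voltage `n_ut` (slope factor times thermal voltage, here 37.5 mV) are illustrative assumptions rather than values taken from the circuit.

```python
import math


def translinear_normalize(R, G, B, i_bias=1e-9, n_ut=0.0375):
    """Return the normalized currents i_bias * X / (R + G + B) for X in (R, G, B).

    R, G, and B are positive input currents in amperes. Each gate-to-source
    voltage is modeled as n_ut * ln(current); the device-dependent scale
    current cancels when two such voltages are subtracted.
    """
    v_total = n_ut * math.log(R + G + B)       # log of the intensity current
    outputs = []
    for current in (R, G, B):
        v_x = n_ut * math.log(current)         # log of one color component
        # The voltage difference drives another subthreshold device (anti-log).
        outputs.append(i_bias * math.exp((v_x - v_total) / n_ut))
    return tuple(outputs)


# Hypothetical photocurrents of 3 nA, 2 nA, and 1 nA.
r_n, g_n, b_n = translinear_normalize(3e-9, 2e-9, 1e-9)
print(r_n, g_n, b_n, r_n + g_n + b_n)
```

Because the slope voltage cancels in the exponent, the result in this idealized model does not depend on `n_ut`, and the three outputs always sum to the bias current.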
With a parasitic capacitance of ∼2 fF and a bias current of 1 nA, a slew rate of 2 µs/V is obtained, while at 30 fps, the circuit needs a time constant of ∼33 ms/(128 × 64) ≈ 4 µs. This circuit limits the system to a maximum speed of 30 frames per second despite the relatively small size of the array. In future designs, this speed problem will be corrected by using an above-threshold “normalization” circuit that may not be as linear as the circuit depicted in Figure 7b.

4.2. RGB-to-HSI conversion

The RGB-to-HSI transformer uses an opponent color formulation, reminiscent of biological color processing [19]. The intensity is obtained before normalization by summing the RGB components (see Figure 7b). To compute the saturation of the color, the function in (1b) must be evaluated for each pixel. Since the minimum of the three normalized components must be determined, an analog loser-take-all circuit is used. It is often difficult to implement a loser-take-all, so a winner-take-all is applied to 1 − {r, g, b}. The circuit is shown in Figure 8. The base winner-take-all circuit is a classical design presented in [27, 28].

For the determination of the Hue of the RGB values, the function in (1a) must be computed. Since this computation requires an arctangent function, it cannot be easily and compactly implemented in VLSI. Hence, we used a mixed-signal