A Steered-Response Power Algorithm Employing Hierarchical Search for Acoustic Source Localization Using Microphone Arrays

IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 62, NO. 19, OCTOBER 1, 2014, p. 5171

Leonardo O. Nunes, Student Member, IEEE, Wallace A. Martins, Member, IEEE, Markus V. S. Lima, Member, IEEE, Luiz W. P. Biscainho, Member, IEEE, Maurício V. M. Costa, Felipe M. Gonçalves, Amir Said, Fellow, IEEE, and Bowon Lee, Senior Member, IEEE

Abstract: The localization of a speaker inside a closed environment is often approached by real-time processing of multiple audio signals captured by a set of microphones. One of the leading related methods for sound source localization, the steered-response power (SRP), searches for the point of maximum power over a spatial grid. High-accuracy localization calls for a dense grid and/or many microphones, which tends to impractically increase computational requirements. This paper proposes a new method for sound source localization (called H-SRP), which applies the SRP approach to space regions instead of grid points. This arrangement makes room for the use of a hierarchical search inspired by the branch-and-bound paradigm, which is guaranteed to find the global maximum in anechoic environments and shown experimentally to also work under reverberant conditions. Besides benefiting from the improved robustness of volume-wise search over point-wise search as to reverberation effects, the H-SRP attains high performance with manageable complexity. In particular, an experiment using a 16-microphone array in a typical presentation room yielded localization errors of the order of 7 cm, and for a given fixed complexity, competing methods' errors are two to three times larger.

Index Terms: Sound source localization, steered-response power, microphone array, computational complexity, hierarchical search, branch-and-bound.

Manuscript received June 17, 2013; revised November 22, 2013; accepted June 16, 2014.
Date of publication July 16, 2014; date of current version August 28, 2014. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Pina Marziliano. This R&D project is a cooperation between Hewlett-Packard Brasil Ltda. and COPPE/UFRJ, and is supported with resources of the Informatics Law (no. 8.248, from 1991). L. O. Nunes, W. A. Martins, M. V. S. Lima, and L. W. P. Biscainho would also like to thank the CAPES, CNPq, and FAPERJ agencies for funding their research work.

L. O. Nunes, W. A. Martins, M. V. S. Lima, L. W. P. Biscainho, and M. V. M. Costa are with the Signal, Multimedia, and Telecommunications Lab—DEL/Poli & PEE/COPPE, Federal University of Rio de Janeiro, Rio de Janeiro, RJ 21941-972, Brazil.

F. M. Gonçalves was with the Signal, Multimedia, and Telecommunications Lab—DEL/Poli & PEE/COPPE, Federal University of Rio de Janeiro, Rio de Janeiro, RJ 21941-972, Brazil. He is now with COPPEAD, Federal University of Rio de Janeiro, Rio de Janeiro, RJ 21941-918, Brazil.

A. Said was with Hewlett-Packard Laboratories, Palo Alto, CA 94304 USA. He is now with LG Electronics Mobile Research, San Jose, CA 98008 USA.

B. Lee was with Hewlett-Packard Laboratories, Palo Alto, CA 94304 USA. He is now with the Department of Electronic Engineering, Inha University, Incheon 402-751, South Korea.

Digital Object Identifier 10.1109/TSP.2014.2336636

I. INTRODUCTION

Sound source localization (SSL) [1], [2] finds use in a variety of practical systems ranging from communications (e.g., teleconference systems) to medical applications (e.g., hearing aids), to mention just a few [3], [4].
In order to localize an acoustic source, one must necessarily rely on some sort of spatial information, such as that supplied by a microphone array (MA). Among the SSL techniques devised for MAs, two families of algorithms are usually the prevalent choices [2]: the first is explicitly based on the time-difference of arrival (TDoA), whereas the second relies on maximizing the steered-response power (SRP) of a beamformer. TDoA-based methods, among which the most popular techniques use the TDoAs estimated by the generalized cross-correlation (GCC) [5]–[7], require relatively few numerical operations to localize a source, as compared to SRP-based algorithms. However, the performance of TDoA-based methods is highly affected by noise and reverberation [2], [8], which might hinder their use in practical applications. In such situations, SRP-based methods, whose classical version (hereafter called C-SRP) [2], [8] is the most widely used, are more appropriate owing to their robustness to the acoustical issues inherent to the application environment.

In order to estimate a source position, the C-SRP method is applied over a grid of predefined spatial points, which represent source location candidates. High localization accuracy can only be achieved at the cost of increasing either the number of grid points or the number of microphones (usually, the larger the number of captured sound signals, the higher the attained spatial diversity). Therefore, the burden of the point-wise search for the source position drives the computational complexity of the C-SRP algorithm, and its growth can make real-time operation impractical, thus rendering the algorithm useless in most applications of interest [9].

In an attempt to address this issue, several methods that modify the search process have been proposed.
For instance, in [10] the authors devised a search strategy for the C-SRP based on the stochastic region contraction (SRC) algorithm, which enables the estimation of the source location without necessarily evaluating the objective function associated with the C-SRP for every grid point. Similarly to the SRC method, the coarse-to-fine region contraction (CFRC) [11] tries to find the source position by progressively reducing the search space according to a set of heuristics. An improvement of the SRC method relying on particle filtering was proposed in [12]. In addition, low-energy contributions are disregarded in the search stage of the C-SRP in [9].

1053-587X © 2014 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.

All the previous methods decrease the number of arithmetical operations by avoiding the evaluation of the C-SRP objective function for every point in the search grid. Consequently, the true source position can sometimes be missed, since there is no formal guarantee that the points disregarded in the search process are not good candidates for the source location. This disadvantage becomes more evident in the presence of reverberation, which induces many local maxima in the objective function of the related method.

This paper proposes a novel SRP-based localization method for a single source which reduces the computational complexity of the search stage, as compared to the C-SRP, and is theoretically guaranteed to converge to the global maximum (which indicates the subregion of the partitioned 3-D search space that contains the source position) under anechoic conditions, while keeping good performance in practical non-anechoic environments. The key idea is the implicit exploration of portions of the search space.
The proposed search strategy, herein called hierarchical search, adapts the branch-and-bound (B&B) paradigm [14] to tackle the SSL problem by defining an objective function related to acoustic activity whose value for a given 3-D spatial region is, in ideal conditions, never smaller than its value for any of the region's subregions. Simulation results under realistic reverberant conditions show that the proposed method (referred to as H-SRP) can achieve good accuracy with relatively low computational burden. It should be pointed out that the idea of a volumetric SRP using hierarchical search, although differently formulated, was originally proposed in [13].

The paper is organized as follows. Section II provides an overview of the proposed hierarchical search, describing how the B&B paradigm can be tailored for SSL applications; the role played by the bounding function is emphasized, as well as the conditions it must satisfy to guarantee convergence to the global maximum of the related objective function. The choice of a bounding function that meets such conditions in an anechoic environment is addressed in Section III, which also includes a brief discussion on the effects of reverberation. Section IV considers practical aspects regarding the implementation of the proposed H-SRP method, which is compared with previous work in Section V. Experiments using both artificially generated and recorded signals are shown in Section VI. Conclusions are drawn in Section VII. The proofs of three theorems are left to Appendices A, B, and C.

Notations: The symbols $\mathbb{R}$, $\mathbb{Z}$, and $\mathbb{N}$ denote the sets of real, integer, and natural numbers, respectively. The set of nonnegative real numbers is represented by $\mathbb{R}_+$. In addition, vectors are denoted by lowercase boldface letters and $\|\cdot\|$ is the Euclidean norm. Given two sets $\mathcal{A}$ and $\mathcal{B}$, the notation $\mathcal{A} \setminus \mathcal{B}$ denotes the set containing the elements of $\mathcal{A}$ which are not in $\mathcal{B}$. The round operator $\mathrm{round}(\cdot)$ takes as an argument a real number and returns the integer number which is closest to the argument.
For a function $f: \mathcal{A} \to \mathcal{B}$ and a subset $\mathcal{C} \subseteq \mathcal{A}$, the set $f(\mathcal{C}) = \{f(a) : a \in \mathcal{C}\}$ is called the image of $\mathcal{C}$.

II. B&B-INSPIRED HIERARCHICAL SEARCH

As previously explained, most of the computational burden of the C-SRP is due to its search process, which requires the search space to be divided into a grid of points that must each be visited once. Moreover, for a given MA and a predefined sampling frequency, one can only increase the accuracy of the position estimates by increasing the number of points within the grid, i.e., by making it denser. Some methods try to circumvent this issue by avoiding visiting all grid points [9], [10]. However, since the C-SRP objective function may exhibit multiple local maxima, such methods fail to guarantee convergence to the global maximum (which determines the actual source position) from a deterministic point of view.¹

Therefore, the following dilemma arises: in order to ensure convergence to the actual source position, no grid point should be disregarded; on the other hand, exhaustive search through the grid is usually too complex for real-time applications. A natural way to address this problem is by performing an implicit exploration of the search space [13]. The branch-and-bound (B&B) paradigm [14], [15], originally developed for discrete and combinatorial optimization problems, seems well suited to this purpose, since it guarantees convergence to the global maximum of its corresponding objective function.

B&B-based algorithms work with search spaces that can be divided into nested subspaces, each subspace seen as a node in a dynamically generated tree structure [14]. Rather than evaluating the underlying objective function for all possible nodes in the tree, B&B-based algorithms work with a bounding function,² which helps one decide how new nodes will be generated through the branching process. In summary, the main components of a general B&B algorithm are [14]: (i) selection of the node to process, (ii) bounding function calculation, and (iii) node branching.
By properly shaping these components (and adapting them to the SSL problem), this paper develops the proposed hierarchical search.

To start with, the proposed hierarchical search considers that a node corresponds to a 3-D spatial region. In this case, the root node is the entire Euclidean search space (e.g., a meeting room). The bounding process uses a bounding function, proposed in Section III, that is associated with the presence (or absence) of the sound source within the given spatial region. Here, the objective function and the bounding function turn out to be the same. As for the branching process, it is simply the way a node is subdivided in order to generate new subregions (new nodes). Thus, the proposed hierarchical search operates by dividing the search space (root node) into smaller regions (nodes), which constitutes the branching process, and then calculating the value of the bounding function for each spatial region, which constitutes the bounding process.

The role of the bounding function is to allow parts of the related search space to be implicitly evaluated, i.e., the whole Euclidean search space can be explored without explicitly visiting each point. Mathematically, the proposed hierarchical search relies on an appropriate bounding function $f: \mathcal{F} \to \mathbb{R}$, in which $\mathcal{F} \subseteq \mathcal{P}(\mathcal{S})$ is a family of sets over the full search space $\mathcal{S} \subset \mathbb{R}^3$, with $\mathcal{P}(\mathcal{S})$ denoting the power set³ of $\mathcal{S}$. An ancillary technical condition is that the elements of $\mathcal{F}$ are compact and connected sets (e.g., cuboids).

¹ In fact, the term "the global maximum" is loose, since there may even exist more than one global maximum, depending on the chosen array geometry and the acoustical characteristics of the environment.
² This name comes from the fact that this function is always smaller than or equal to the objective function of the related problem.
³ The power set of a set $\mathcal{S}$ is the set of all subsets of $\mathcal{S}$.
The hierarchical search sequentially subdivides the search space into subregions down to a predefined minimum "size" $\delta$, which is related to the stopping criterion of the search algorithm.⁴ During this process, the bounding function plays a key role in discarding sets that do not require any further subdivision. Using $\mathcal{L}$ to represent a list of pairs $(\mathcal{V}, f(\mathcal{V}))$, i.e., subregions $\mathcal{V} \in \mathcal{F}$ to be evaluated together with their related bounding values, and with $f^\star$ being the maximum bounding value found at a given iteration of the algorithm and $\mathcal{V}^\star$ being its corresponding subregion (i.e., $f^\star = f(\mathcal{V}^\star)$), the proposed hierarchical search is as follows.

1) (Initialization) Let $\mathcal{L} = \{(\mathcal{S}, f(\mathcal{S}))\}$ and $f^\star = -\infty$.
2) If $\mathcal{L} = \emptyset$, then stop: the search is complete and $\mathcal{V}^\star$ is the subregion of size not larger than $\delta$ with the largest bounding value.
3) Sample one element $(\mathcal{V}, f(\mathcal{V}))$ from list $\mathcal{L}$, and let $\mathcal{L} \leftarrow \mathcal{L} \setminus \{(\mathcal{V}, f(\mathcal{V}))\}$.
4) (Bound) If $f(\mathcal{V}) < f^\star$, then go to Step 2.
5) If $\mathrm{size}(\mathcal{V}) \le \delta$, then let $f^\star = f(\mathcal{V})$ and $\mathcal{V}^\star = \mathcal{V}$, and go to Step 2. Otherwise, go to Step 6.
6) (Branch) Divide $\mathcal{V}$ into $K$ distinct subregions $\mathcal{V}_1, \ldots, \mathcal{V}_K \in \mathcal{F}$, such that their interiors are disjoint and $\bigcup_{k=1}^{K} \mathcal{V}_k = \mathcal{V}$, and then compute $f(\mathcal{V}_1), \ldots, f(\mathcal{V}_K)$.
7) Let $\mathcal{L} \leftarrow \mathcal{L} \cup \{(\mathcal{V}_k, f(\mathcal{V}_k))\}_{k=1}^{K}$ and go to Step 2.

Fig. 1 illustrates how the hierarchical search operates. The first step of the algorithm consists of subdividing the search space and computing the bounding function for each subregion (Fig. 1(a)). Then, the subregion with the largest bounding value is selected and further subdivided until a region of size $\delta$ is reached (Fig. 1(b)). Finally, in order to guarantee that the current maximum value corresponds to the global maximum, any other region that has a bounding value larger than (or equal to) that of the current maximum must be subdivided until a new maximum is found or all bounding values are lower than that of the current maximum (Fig. 1(c)). Alternatively, the algorithm can be represented in a tree format where each node is a subdivision of the search space (in the example, where each region is subdivided into two regions, one would have a binary tree).
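Steps 1-7 can be sketched in Python as a best-first search over axis-aligned boxes. This is a minimal illustration, not the authors' implementation. It assumes the following: a box's "size" is its longest edge; branching bisects the longest edge, so $K = 2$; and selection is best-first, as in Fig. 1. The bounding function `line_count_bound` is a hypothetical 2-D stand-in for the one in Section III. It counts how many lines through a known source cross a box, which mimics counting TDoAs and is monotone under inclusion. All function names are illustrative.

```python
import heapq
import itertools
import math
from typing import Callable, List, Optional, Tuple

Interval = Tuple[float, float]
Box = Tuple[Interval, ...]  # one (low, high) interval per axis


def box_size(box: Box) -> float:
    """Size of a box, taken here as its longest edge (an assumption)."""
    return max(hi - lo for lo, hi in box)


def split_box(box: Box) -> List[Box]:
    """Branch (Step 6): bisect along the longest axis, i.e. K = 2 children."""
    axis = max(range(len(box)), key=lambda i: box[i][1] - box[i][0])
    lo, hi = box[axis]
    mid = 0.5 * (lo + hi)
    return [box[:axis] + ((lo, mid),) + box[axis + 1:],
            box[:axis] + ((mid, hi),) + box[axis + 1:]]


def hierarchical_search(root: Box, bound: Callable[[Box], float],
                        delta: float) -> Tuple[Optional[Box], float]:
    """Best-first B&B over boxes following Steps 1-7."""
    tie = itertools.count()
    # Heap key: largest bound first, smaller boxes first among ties, and a
    # counter so that boxes themselves are never compared.
    heap = [(-bound(root), box_size(root), next(tie), root)]       # Step 1
    best_val, best_box = -math.inf, None
    while heap:                                                    # Step 2
        neg_val, size, _, box = heapq.heappop(heap)                # Step 3
        val = -neg_val
        if val < best_val:                                         # Step 4 (bound)
            continue
        if size <= delta:                                          # Step 5 (leaf)
            best_val, best_box = val, box
            continue
        for child in split_box(box):                               # Steps 6-7
            heapq.heappush(heap, (-bound(child), box_size(child), next(tie), child))
    return best_box, best_val


def line_count_bound(source, angles_deg) -> Callable[[Box], float]:
    """Hypothetical 2-D bounding function: number of lines through `source` crossing a box."""
    lines = []
    for a in angles_deg:
        n = (math.cos(math.radians(a)), math.sin(math.radians(a)))
        lines.append((n, n[0] * source[0] + n[1] * source[1]))   # line n . p = d

    def f(box: Box) -> float:
        (x0, x1), (y0, y1) = box
        corners = [(x, y) for x in (x0, x1) for y in (y0, y1)]
        count = 0
        for (nx, ny), d in lines:
            proj = [nx * x + ny * y for x, y in corners]
            if min(proj) <= d <= max(proj):   # corners straddle the line
                count += 1
        return count

    return f


source = (19.5 / 64, 44.5 / 64)           # centre of one 1/64 x 1/64 cell
f = line_count_bound(source, [0, 60, 120])
best_box, best_val = hierarchical_search(((0.0, 1.0), (0.0, 1.0)), f, 1 / 64)
print(best_box, best_val)
```

The tie-break toward smaller boxes drives the search quickly down to a leaf, so that Step 4 can start pruning early. The paper's own branching rule for cuboids is given in Section IV.A and may differ from this one.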
In the tree representation, the global maximum is found when all leaves of the tree have bounding values lower than that of the current maximum.

From this illustration, note that the potential of the hierarchical search to reduce the complexity of the search stage relies on its capability of discarding large regions without further exploration/subdivision.

In this section the fundamentals of the proposed hierarchical search were shown, but the bounding function was not stated. So far, any function $f$ satisfying both of the following properties is sufficient to guarantee convergence to the global maximum:
1) If $\mathcal{V}_1$ is a leaf that contains the source and $\mathcal{V}_2$ is a leaf that does not, then $f(\mathcal{V}_1) \ge f(\mathcal{V}_2)$.
2) If $\mathcal{U}$ is a subregion of $\mathcal{V}$, then $f(\mathcal{U}) \le f(\mathcal{V})$.
In the next section, a function that meets these properties will be proposed.

⁴ More information on how this quantity is chosen is given in Section IV.D.

Fig. 1. Illustration of three different stages of the B&B algorithm for a 2-D search space. An equivalent tree representation of each stage is also provided. The nodes are visited according to a "best-first search" strategy. The numbers inside each region/node represent the bounding value (output of the bounding function) for the given region.

III. BOUNDING FUNCTION

Before describing the proposed bounding function, it is worth first presenting some known properties of the C-SRP method that naturally lead to it. Thus, the idea here is to start by showing how the C-SRP technique solves the problem of localizing a source which emits a signal that is maximally concentrated in both space and time domains in an anechoic environment. The interpretation of this simple toy example turns out to be very instructive for further developments.
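After this discussion, it will be shown how the C-SRP method can be modified to deal with acoustic signals which are not concentrated in time, such as wideband/speech signals. This presentation order will point out that "counting TDoAs" is indeed the key aspect for the construction of the bounding function of the proposed hierarchical search.

A. C-SRP

The C-SRP method steers a microphone array beam to many locations searching for the acoustic source position. This search is based on maximizing the power of the output signal of a beamformer. Hence, the C-SRP maximizes the following objective function:

$$P_{\mathrm{C}}(\mathbf{x}) = \sum_{n \in \mathbb{Z}} \left( \sum_{m=1}^{M} y_m\big(n + \tau_m(\mathbf{x})\big) \right)^2, \tag{1}$$

where $\mathbf{x} \in \mathbb{R}^3$ denotes a candidate for the acoustic source position and $y_m(n)$ is the possibly filtered version of the discrete-time signal acquired by the $m$th microphone, for $m = 1, \ldots, M$, where $M$ denotes the number of microphones in the array. In addition, $\tau_m(\mathbf{x}) \in \mathbb{Z}$ would be the time lag (in samples) due to the propagation from position $\mathbf{x}$ to the $m$th microphone.

It is possible to rewrite (1) as follows [8]:

$$P_{\mathrm{C}}(\mathbf{x}) = \sum_{m_1=1}^{M} \sum_{m_2=1}^{M} \frac{1}{2\pi} \int_{-\pi}^{\pi} Y_{m_1}(e^{j\omega})\, Y_{m_2}^{*}(e^{j\omega})\, e^{j\omega \tau_{m_1 m_2}(\mathbf{x})}\, d\omega, \tag{2}$$

where $Y_m(e^{j\omega})$ represents the discrete-time Fourier transform of $y_m(n)$, and $\tau_{m_1 m_2}(\mathbf{x}) = \tau_{m_1}(\mathbf{x}) - \tau_{m_2}(\mathbf{x})$ is the discrete TDoA of a signal emitted at position $\mathbf{x}$ to microphones $m_1$ and $m_2$.

Now, consider the application of the C-SRP method to two different setups, namely, localizing an impulse and localizing a wideband/speech signal, both within an anechoic environment.

1) Localizing an Impulse in an Anechoic Environment: Assume that a single acoustic source, located at position $\mathbf{x}_s$, emits a pulse signal $s(n) = \delta(n)$. In this case, considering an anechoic environment and disregarding the attenuation factors associated with sound propagation, one has (based on (2))

$$P_{\mathrm{C}}(\mathbf{x}) = \sum_{m_1=1}^{M} \sum_{m_2=1}^{M} \delta\big(\tau_{m_1 m_2}(\mathbf{x}) - \tau_{m_1 m_2}(\mathbf{x}_s)\big), \tag{3}$$

where any discrete TDoA can be rewritten as

$$\tau_{m_1 m_2}(\mathbf{x}) = \mathrm{round}\left( \frac{f_s}{c} \big( \|\mathbf{x} - \mathbf{p}_{m_1}\| - \|\mathbf{x} - \mathbf{p}_{m_2}\| \big) \right), \tag{4}$$

in which $\mathbf{p}_m$ denotes the $m$th microphone position, $f_s$ is the sampling frequency⁵ related to the captured signals, and $c$ is the speed of sound.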
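The counting interpretation can be made concrete with a short NumPy sketch. It evaluates (4) for all ordered microphone pairs and counts the matches as in (3). The five-microphone geometry, $f_s = 16$ kHz, and $c = 343$ m/s are hypothetical values. `np.rint`, which rounds exact halves to even, stands in for the round operator. `discrete_tdoa` and `csrp_impulse` are illustrative names.

```python
import numpy as np

C = 343.0   # speed of sound in m/s (assumed)


def discrete_tdoa(x, mics, fs, c=C):
    """Eq. (4) for every pair: entry [m1, m2] = round(fs/c * (||x - p_m1|| - ||x - p_m2||))."""
    dist = np.linalg.norm(mics - x, axis=1)          # ||x - p_m|| for each microphone
    return np.rint(fs / c * (dist[:, None] - dist[None, :])).astype(int)


def csrp_impulse(x, x_s, mics, fs, c=C):
    """Eq. (3): number of ordered pairs whose TDoA at x matches the TDoA at x_s."""
    return int(np.sum(discrete_tdoa(x, mics, fs, c) == discrete_tdoa(x_s, mics, fs, c)))


mics = np.array([[0, 0, 1], [4, 0, 1], [0, 4, 1], [4, 4, 1], [2, 2, 2.5]], dtype=float)
fs = 16000
x_s = np.array([1.0, 2.0, 1.5])
print(csrp_impulse(x_s, x_s, mics, fs))    # 25: all M^2 pairs match at the source
print(csrp_impulse(np.array([3.0, 1.0, 1.0]), x_s, mics, fs))
```

The diagonal pairs ($m_1 = m_2$) always match, so every candidate scores at least $M$. Only the true position is guaranteed to reach $M^2$.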
Note that, from (3), $P_{\mathrm{C}}(\mathbf{x}) \le M^2 = P_{\mathrm{C}}(\mathbf{x}_s)$ for any $\mathbf{x}$.

In this very idealized, yet instructive, setup, one can clearly see how the C-SRP method aggregates the TDoAs associated with each microphone pair in order to estimate the true source position. Indeed, (3) shows that, for each position $\mathbf{x}$ in the 3-D Euclidean space, the C-SRP method simply associates a number that quantifies how many actual TDoAs (associated with the true source position $\mathbf{x}_s$) match exactly the TDoAs computed as if the source were at position $\mathbf{x}$. In other words, counting TDoAs is the bottom line here.

2) Localizing an Acoustic Source in an Anechoic Environment (The Role of PHAT Filtering): Now, consider a more realistic type of acoustic signal $s(n)$, which can represent a speech signal, for example. In order to preserve exactly the same intuitive and useful result obtained in (3), one should modify the C-SRP objective function from (2) to the following form, which defines the C-SRP-PHAT method:

$$P_{\mathrm{C}}(\mathbf{x}) = \sum_{m_1=1}^{M} \sum_{m_2=1}^{M} \frac{1}{2\pi} \int_{-\pi}^{\pi} \Psi_{m_1 m_2}(e^{j\omega})\, Y_{m_1}(e^{j\omega})\, Y_{m_2}^{*}(e^{j\omega})\, e^{j\omega \tau_{m_1 m_2}(\mathbf{x})}\, d\omega, \tag{5}$$

in which the phase transform (PHAT) filter is defined as

$$\Psi_{m_1 m_2}(e^{j\omega}) = \frac{1}{\big| Y_{m_1}(e^{j\omega})\, Y_{m_2}^{*}(e^{j\omega}) \big|}. \tag{6}$$

By using this new objective function, one arrives once more at

$$P_{\mathrm{C}}(\mathbf{x}) = \sum_{m_1=1}^{M} \sum_{m_2=1}^{M} \delta\big(\tau_{m_1 m_2}(\mathbf{x}) - \tau_{m_1 m_2}(\mathbf{x}_s)\big). \tag{7}$$

Therefore, for anechoic environments, the interpretation of the C-SRP-PHAT objective function when dealing with generic source signals is the same as that of the C-SRP associated with a spatially and temporally well-localized source signal (i.e., counting TDoAs is the key idea).

⁵ Throughout this paper it will be assumed that the error induced by sampling is negligible, i.e., the limited time resolution does not impair the localizability of the source.

B. H-SRP

Consider how one can implement the search for the optimal point $\mathbf{x}$ which maximizes (5). In order to implement such a process digitally, one must somehow discretize the space, for instance, by defining a grid of 3-D points.
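The following sketch shows a point-grid C-SRP-PHAT evaluation of (5). It rests on several assumptions: integer-sample propagation delays, no attenuation, and a hypothetical room and five-microphone geometry. Each pair's PHAT-weighted correlation is computed once with zero-padded FFTs and then indexed at the candidate TDoAs. For each grid point, this sums (5) over all ordered pairs. The TDoA is taken as the difference of per-microphone rounded lags, rather than the rounded difference of (4). This choice matches how the toy signals are generated. Names such as `csrp_phat_grid` are illustrative.

```python
import itertools

import numpy as np

C = 343.0    # speed of sound in m/s (assumed)
FS = 16000   # sampling frequency in Hz (assumed)


def lags(points, mics):
    """Integer propagation lags tau_m(x) for one point (3,) or a batch (G, 3)."""
    dist = np.linalg.norm(points[..., None, :] - mics, axis=-1)
    return np.rint(FS / C * dist).astype(int)


def gcc_phat(y1, y2, n_fft):
    """PHAT-weighted cross-correlation, cf. (5)-(6); index k holds lag k (negative lags wrap)."""
    cross = np.fft.rfft(y1, n_fft) * np.conj(np.fft.rfft(y2, n_fft))
    return np.fft.irfft(cross / (np.abs(cross) + 1e-12), n_fft)


def csrp_phat_grid(signals, mics, grid):
    """Evaluate (5) at every grid point; return the arg-max point and all values."""
    n_mics, n = signals.shape
    n_fft = 2 * n                       # long enough that the correlation does not wrap
    taus = lags(grid, mics)             # shape (G, M)
    power = np.zeros(len(grid))
    for a, b in itertools.product(range(n_mics), repeat=2):
        r_ab = gcc_phat(signals[a], signals[b], n_fft)
        power += r_ab[taus[:, a] - taus[:, b]]   # r_ab evaluated at tau_ab(x)
    return grid[np.argmax(power)], power


# Toy anechoic scene: integer-sample delays, no attenuation (assumptions).
rng = np.random.default_rng(0)
mics = np.array([[0, 0, 1], [4, 0, 1], [0, 4, 1], [4, 4, 1], [2, 2, 2.5]], dtype=float)
x_s = np.array([1.0, 2.25, 1.5])
s = rng.standard_normal(4096)
k = lags(x_s, mics)
signals = np.zeros((len(mics), s.size + k.max()))
for m, km in enumerate(k):
    signals[m, km:km + s.size] = s      # y_m(n) = s(n - tau_m(x_s))

axes = [np.arange(0, 4.01, 0.25), np.arange(0, 4.01, 0.25), np.arange(0, 2.51, 0.25)]
grid = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1).reshape(-1, 3)
x_hat, power = csrp_phat_grid(signals, mics, grid)
print(x_hat, power.max())
```

In this idealized setting, each PHAT-weighted correlation is essentially a unit pulse at the true lag. The grid values therefore reproduce the TDoA counts of (7), and the peak is close to $M^2$. The cost grows with the number of grid points times $M^2$, which is the burden that motivates the region-based search.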
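An alternative procedure is to divide the entire search region into a finite number of 3-D compact and connected spatial regions, such as cuboids. In this case, for each microphone pair there is not necessarily a single TDoA; rather, there may be many TDoAs associated with a given spatial region. Indeed, assuming that $\mathcal{V} \subset \mathbb{R}^3$ is a 3-D connected spatial region, the set $\mathcal{T}_{m_1 m_2}(\mathcal{V})$, defined as

$$\mathcal{T}_{m_1 m_2}(\mathcal{V}) = \big\{ \mathrm{round}\big(\tilde{\tau}_{m_1 m_2}(\mathbf{x})\big) : \mathbf{x} \in \mathcal{V} \big\}, \tag{8}$$

contains all discrete TDoAs associated with the microphone pair $(m_1, m_2)$ which are images of some point within $\mathcal{V}$, with $\tilde{\tau}_{m_1 m_2}(\mathbf{x}) = \frac{f_s}{c}\big( \|\mathbf{x} - \mathbf{p}_{m_1}\| - \|\mathbf{x} - \mathbf{p}_{m_2}\| \big)$ standing for the continuous TDoAs.

Note that the limited temporal resolution related to the sampling process of the acquired signals induces a limited spatial resolution over the entire search region as well. Indeed, for each integer $k$, there are infinitely many 3-D spatial points $\mathbf{x}$ that satisfy the relation $\tau_{m_1 m_2}(\mathbf{x}) = k$, which can be expressed as

$$k - \tfrac{1}{2} \le \frac{f_s}{c}\big( \|\mathbf{x} - \mathbf{p}_{m_1}\| - \|\mathbf{x} - \mathbf{p}_{m_2}\| \big) < k + \tfrac{1}{2}. \tag{9}$$

In other words, all points in between the hyperboloids $\tilde{\tau}_{m_1 m_2}(\mathbf{x}) = k \pm \frac{1}{2}$ cannot be distinguished, since they are mapped into the same discrete TDoA $k$.

Therefore, following the idea of counting TDoAs, a natural modification of (7) when dealing with a volumetric region $\mathcal{V}$ is

$$P_{\mathrm{H}}(\mathcal{V}) = \sum_{m_1=1}^{M} \sum_{m_2=1}^{M} \sum_{\tau \in \mathcal{T}_{m_1 m_2}(\mathcal{V})} \delta\big(\tau - \tau_{m_1 m_2}(\mathbf{x}_s)\big), \tag{10}$$

which can be rewritten, based on (5), as

$$P_{\mathrm{H}}(\mathcal{V}) = \sum_{m_1=1}^{M} \sum_{m_2=1}^{M} \sum_{\tau \in \mathcal{T}_{m_1 m_2}(\mathcal{V})} \frac{1}{2\pi} \int_{-\pi}^{\pi} \Psi_{m_1 m_2}(e^{j\omega})\, Y_{m_1}(e^{j\omega})\, Y_{m_2}^{*}(e^{j\omega})\, e^{j\omega \tau}\, d\omega. \tag{11}$$

Observe that, in (7), for each spatial point $\mathbf{x}$, one accumulates the actual TDoAs which exactly match the TDoAs of the referred spatial point, considering all microphone pairs. Hence, what matters for electing a candidate spatial point as the source position is the number of times the hyperboloids in (9) pass through/contain that point. On the other hand, in (10), for each spatial region $\mathcal{V}$, one accumulates the actual TDoAs that exactly match the TDoAs of any spatial point within the referred region, considering all microphone pairs. Thus, the same underlying idea of the C-SRP holds here: what matters for electing a spatial region as the one containing the source position is the number of times the hyperboloids in (9) pass through that spatial region.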
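Equations (8) and (11) can be evaluated on a cuboid with the sketch below. It replaces the paper's own time-delay-bound algorithm (Section IV) with a simple outer enclosure based on interval arithmetic. For each microphone, the distance range to the box (nearest point and farthest corner) gives an integer lag interval. Pairwise lag intervals then follow by subtraction. This enclosure always contains the true TDoA set and can only shrink when the box shrinks. In the idealized anechoic, integer-delay toy setup assumed here, it therefore preserves the counting behavior of (10), though it may be looser than the paper's bounds. The geometry, the per-microphone rounding convention, and all names are hypothetical.

```python
import itertools

import numpy as np

C, FS = 343.0, 16000   # speed of sound (m/s) and sampling rate (Hz), assumed


def lag_range(lo, hi, mic):
    """Integer lag interval [k_min, k_max] from any point of the cuboid [lo, hi] to `mic`."""
    nearest = np.clip(mic, lo, hi)                                     # closest point of the box
    farthest = np.where(np.abs(mic - lo) > np.abs(mic - hi), lo, hi)   # farthest corner
    k_min = int(np.rint(FS / C * np.linalg.norm(mic - nearest)))
    k_max = int(np.rint(FS / C * np.linalg.norm(mic - farthest)))
    return k_min, k_max


def tdoa_set(lo, hi, mic_a, mic_b):
    """Outer enclosure of T_ab(V) in (8): every lag k_a - k_b reachable inside the box."""
    a_min, a_max = lag_range(lo, hi, mic_a)
    b_min, b_max = lag_range(lo, hi, mic_b)
    return np.arange(a_min - b_max, a_max - b_min + 1)


def gcc_phat(y1, y2, n_fft):
    """PHAT-weighted cross-correlation; index k holds lag k (negative lags wrap)."""
    cross = np.fft.rfft(y1, n_fft) * np.conj(np.fft.rfft(y2, n_fft))
    return np.fft.irfft(cross / (np.abs(cross) + 1e-12), n_fft)


def hsrp_phat(lo, hi, mics, r):
    """Eq. (11): add the GCC-PHAT values of every lag in each pair's TDoA set."""
    pairs = itertools.product(range(len(mics)), repeat=2)
    return sum(r[a, b][tdoa_set(lo, hi, mics[a], mics[b])].sum() for a, b in pairs)


# Toy anechoic scene: integer-sample delays, no attenuation (assumptions).
rng = np.random.default_rng(0)
mics = np.array([[0, 0, 1], [4, 0, 1], [0, 4, 1], [4, 4, 1], [2, 2, 2.5]], dtype=float)
x_s = np.array([1.0, 2.25, 1.5])
s = rng.standard_normal(4096)
k = np.rint(FS / C * np.linalg.norm(mics - x_s, axis=1)).astype(int)
signals = np.zeros((len(mics), s.size + k.max()))
for m, km in enumerate(k):
    signals[m, km:km + s.size] = s
n_fft = 2 * signals.shape[1]
r = {(a, b): gcc_phat(signals[a], signals[b], n_fft)
     for a, b in itertools.product(range(len(mics)), repeat=2)}

print(hsrp_phat(np.array([0.75, 2.0, 1.25]), np.array([1.25, 2.5, 1.75]), mics, r))  # box holding x_s
print(hsrp_phat(np.array([3.0, 0.5, 0.5]), np.array([3.25, 0.75, 0.75]), mics, r))   # box far from x_s
```

Up to numerical noise, a sub-box never scores above its parent here, which mirrors Theorem 2. This guarantee relies on the off-peak correlation values being essentially zero. With reverberant signals, whose GCC-PHAT values can be negative, it no longer holds.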
Accumulating matches over a region in this way is an efficient means of summing up all pieces of information that the points within the region convey. Therefore, the proposed localization method sets the bounding function $f$, described in Section II, as $f(\mathcal{V}) = P_{\mathrm{H}}(\mathcal{V})$ given in (11). As already mentioned in Section II, this bounding function also serves as the objective function of the proposed hierarchical search.

An important question that may arise from the H-SRP objective function definition in (11) is whether or not the sound source is contained within the region that maximizes the H-SRP-PHAT objective function. The answer is yes, as stated in the following theorem.

Theorem 1: If $\mathbf{x}_s \in \mathcal{V}$, then $P_{\mathrm{H}}(\mathcal{V}) \ge P_{\mathrm{H}}(\mathcal{U})$, for any spatial region $\mathcal{U}$ of the search space.
Proof: See Appendix A.

Theorem 1 guarantees that the source position is not lost during the search for the regions that maximize the objective function $P_{\mathrm{H}}$. It is worth highlighting that more than one region may maximize $P_{\mathrm{H}}$, but the theorem guarantees that the volume containing the source location is always a candidate to be the winning volume in an anechoic environment. This property of the proposed objective function is one of the fundamental differences between the H-SRP and the method proposed in [13].

Another important result refers to the bounding capability of the objective function in (11), which is key to the B&B process, namely, the fact that the objective function cannot increase when one passes from a given volume to one of its subsets. This result is stated in the following theorem.

Theorem 2: If $\mathcal{U} \subseteq \mathcal{V}$, then $P_{\mathrm{H}}(\mathcal{U}) \le P_{\mathrm{H}}(\mathcal{V})$.
Proof: See Appendix B.

C. Effects of Reverberation

In order to motivate the definition of the H-SRP objective function, only anechoic signals were considered in the former discussion of the C-SRP-PHAT method. But what happens when reverberation is present? There is no definite answer to this question, since it depends on the degree of reverberation.
Indeed, consider that the signal acquired by the $m$th microphone can be written as $y_m(n) = \sum_{l=0}^{L} h_m(l)\, s(n - l)$, in which $L$ is the maximum delay (including those from the reflections) for the signal to arrive at any microphone in the array, and $h_m(l)$ is the $l$th coefficient of the multipath model of the reverberation effect between the source and the $m$th microphone.⁶ Substituting this model into (5), one gets (for the C-SRP-PHAT objective function)

$$P_{\mathrm{C}}(\mathbf{x}) = \sum_{m_1=1}^{M} \sum_{m_2=1}^{M} \frac{1}{2\pi} \int_{-\pi}^{\pi} \Psi_{m_1 m_2}(e^{j\omega})\, |S(e^{j\omega})|^2\, H_{m_1}(e^{j\omega})\, H_{m_2}^{*}(e^{j\omega})\, e^{j\omega \tau_{m_1 m_2}(\mathbf{x})}\, d\omega, \tag{12}$$

where $H_m(e^{j\omega})$ is the discrete-time Fourier transform of $h_m(n)$, $S(e^{j\omega})$ is that of $s(n)$, and $\Psi_{m_1 m_2}(e^{j\omega})$ is as given in (6).

⁶ This model, in practice, needs to consider large values of $L$ in order to adequately model a reverberant room.

Note that, in general, the phase $\angle\big(H_{m_1}(e^{j\omega}) H_{m_2}^{*}(e^{j\omega})\big)$ is not a linear function (modulo $2\pi$) of the normalized frequency $\omega$, which means that the integral in (12) is not equal to a simple discrete-time pulse signal. In fact, such an integral may not have a closed-form expression. However, the aim of the C-SRP-PHAT technique is to maximize $P_{\mathrm{C}}(\mathbf{x})$, and the integral that appears in (12) is always smaller than or equal to 1. This approach will therefore elect the position $\mathbf{x}$ which yields TDoAs such that the referred integral is as close to 1 as possible for as many microphone pairs as possible. Since $\angle\big(H_{m_1} H_{m_2}^{*}\big)$ depends not only on the source and microphone positions, but also on the reverberation effects that take place, the applicability of such a beamformer may be limited: multipath effects might hinder an accurate estimate of the delays related to the direct paths. As a consequence, the MA may not steer its beam to the correct source location. Nevertheless, C-SRP-PHAT is still widely employed to solve source-localization problems, since it is more robust to reverberation and noise effects than TDoA-based methods that employ the generalized cross-correlation (GCC) technique [2], [8].

With respect to the proposed H-SRP-PHAT objective function, the reverberation model along with (11) yields

$$P_{\mathrm{H}}(\mathcal{V}) = \sum_{m_1=1}^{M} \sum_{m_2=1}^{M} \sum_{\tau \in \mathcal{T}_{m_1 m_2}(\mathcal{V})} \frac{1}{2\pi} \int_{-\pi}^{\pi} \Psi_{m_1 m_2}(e^{j\omega})\, |S(e^{j\omega})|^2\, H_{m_1}(e^{j\omega})\, H_{m_2}^{*}(e^{j\omega})\, e^{j\omega \tau}\, d\omega. \tag{13}$$

Thus, the method will elect the spatial region whose associated TDoAs maximize the integral in (13) for as many microphone pairs as possible.
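The effect of the phase term in (12) can be probed numerically for a single microphone pair. The sketch below assumes a hypothetical two-tap channel per microphone: a direct path plus one echo with relative gain g. It approximates the per-pair integral on a DFT grid. With PHAT weighting, only the phase of $H_{m_1} H_{m_2}^{*}$ survives. With no echo, the value at the direct-path TDoA is 1. Each sample's phase error is at most $2\arcsin g$, so for g = 0.1 the value cannot drop below about 0.98. Stronger echoes pull it further down. `phat_pair_integral` and `two_tap` are illustrative names.

```python
import numpy as np


def phat_pair_integral(h1, h2, n_fft=4096):
    """Per-pair integral in (12) for every lag tau, approximated on an n_fft-point DFT grid.

    With PHAT weighting, |S|^2 and the channel magnitudes cancel, so the integrand
    reduces to exp(j * angle(H1 * conj(H2))) * exp(j * w * tau).
    """
    cross = np.fft.fft(h1, n_fft) * np.conj(np.fft.fft(h2, n_fft))
    return np.fft.ifft(cross / np.abs(cross)).real


def two_tap(direct, echo_delay, gain, length=64):
    """Hypothetical channel: unit direct path plus a single echo with relative gain `gain`."""
    h = np.zeros(length)
    h[direct] = 1.0
    h[direct + echo_delay] = gain
    return h


a, b = 20, 5   # direct-path lags (in samples) at the two microphones, assumed
for g in (0.0, 0.1, 0.5, 0.9):
    r = phat_pair_integral(two_tap(a, 7, g), two_tap(b, 11, g))
    print(f"echo gain {g}: integral at the direct-path TDoA {a - b} = {r[a - b]:.3f}")
```

For gains below 1, each channel's magnitude stays at least 1 - g, so the PHAT division is well defined.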
Like the C-SRP-PHAT, the H-SRP-PHAT is therefore also sensitive to reverberation effects, which are expressed in the phase $\angle\big(H_{m_1}(e^{j\omega}) H_{m_2}^{*}(e^{j\omega})\big)$. As regards Theorems 1 and 2, they no longer hold in reverberant environments.

It is worth mentioning that, when the direct-path acquired signal is much stronger than the other multipath signals, $\angle\big(H_{m_1}(e^{j\omega}) H_{m_2}^{*}(e^{j\omega})\big)$ can be approximated by a linear function of $\omega$, which means that the result is analogous to that obtained considering an anechoic setup.

IV. PRACTICAL CONSIDERATIONS

In this section, practical considerations regarding the implementation of the proposed algorithm are discussed. Firstly, the branching process, i.e., the strategy to divide the search space into subregions, is presented. Then, the algorithm employed to find the time-delay bounds of (11) and (8) is described. After that, the initialization procedure of the hierarchical search is described. The section ends with an implementation summary of the proposed algorithm. It is worth mentioning that the topics addressed in this section are also new contributions of this work, which are not present in [13].

A. Branching Process

In the current implementation, only cuboid regions are considered. Thus, the whole search space is assumed to be a cuboid