Abstract: This paper studies the delay magnification mechanism of the parasitoid fly Ormia ochracea and then proposes and analyzes a simpler delay magnification system inspired by it. Combined with a conventional cross-correlogram, the proposed system has the potential to localize competing sound sources with ITDs in the microsecond range.
I. INTRODUCTION

Sound localization is a necessary preprocessing task for robust speech separation and noise reduction systems. The human auditory system employs binaural processing for sound localization [1]. Binaural processing uses interaural time differences (ITD) along with interaural level differences (ILD). Coincidence detection models have been proposed for the computation of the ITD in humans [2], and low-power analog VLSI coincidence detection architectures have also been implemented [3-7]. However, when the distance between the acoustic sensors becomes very small, the available ITD and ILD values become difficult to exploit with coincidence detection systems, because their delay lines can generally resolve minimum delays only in the range of tens of microseconds.
Ormia ochracea (O2), a parasitoid fly, possesses a remarkable hearing system which copes with minute ITD values [8]. Although the distance between its ears is so small (approximately 520 µm) that the ITD and ILD values are also very small, its hearing system has been found to act as a multiplier of the ITD value. Recently, several groups have developed and constructed low-noise miniaturized differential microphones based on the mechanical model of the Ormia ochracea hearing system [9-12]. Our study analyzes the delay magnification mechanism of the Ormia’s ears based on the dynamic behavior proposed in [8]. As a result of our analysis, an Ormia-inspired signal processing scheme suitable for the magnification of the ITD in miniature binaural sound localization is proposed. The paper is organized as follows: Part II introduces the mechanical model of the Ormia’s hearing system. Part III briefly analyzes the operation of the natural O2 system and proposes a new O2 system inspired by it. Part IV presents cross-correlogram-based localization results from a system employing the O2-inspired system, while Part V discusses errors due to mismatches in the proposed system.

Manuscript received April 16, 2010. This work was supported by The Imperial Royal Thai scholarship in Biomedical Engineering. M. Kongpoon is with the Biomedical Engineering Department, Imperial College London, South Kensington Campus, London, SW7 2AZ, UK (email: m.kongpoon07@imperial.ac.uk). Y. N. Billeh was with the Biomedical Engineering Department, Imperial College London, South Kensington Campus, London, SW7 2AZ, UK (email: yazan.billeh06@imperial.ac.uk). E. M. Drakakis is with the Biomedical Engineering Department, Imperial College London, South Kensington Campus, London, SW7 2AZ, UK (email: e.drakakis@imperial.ac.uk).

Figure 1. Mechanical model of the Ormia ochracea hearing system proposed in [8].

II. ORMIA OCHRACEA EARS’ MECHANICAL MODEL
In [8] Miles and his colleagues proposed the mechanical model of Fig. 1 for the Ormia ochracea hearing system. The spring constants $k$ and $k_3$ and the damping constants $c$ and $c_3$ account for the stiffness and damping at the middle and at the ends of the intertympanal bridge of the fly’s ears. The quantities $x_1$ and $x_2$ denote the displacements of the ear membranes, $s_1$ and $s_2$ are the forces applied at the membranes, and $m$ is the mass at each end. The dynamic behavior of the model of Fig. 1 can be written as:
$$\begin{bmatrix} k+k_3 & k_3 \\ k_3 & k+k_3 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} + \begin{bmatrix} c+c_3 & c_3 \\ c_3 & c+c_3 \end{bmatrix}\begin{bmatrix} \dot{x}_1 \\ \dot{x}_2 \end{bmatrix} + \begin{bmatrix} m & 0 \\ 0 & m \end{bmatrix}\begin{bmatrix} \ddot{x}_1 \\ \ddot{x}_2 \end{bmatrix} = \begin{bmatrix} s_1 \\ s_2 \end{bmatrix} \tag{1}$$

It can be shown [8] that the dynamics of the Ormia ochracea hearing system can be expressed as the superposition of two vibration modes: the rocking mode (RM) and the translating mode (TM). The rocking mode responds to the difference between the input forces, while the translating mode responds to their sum.

III. ANALYSIS OF THE O2 DELAY MAGNIFICATION SYSTEM
The study of the Ormia’s ears in [8] suggests the existence of a coupled mechanical structure which can magnify the ITD and ILD. This section explains the mechanism of the delay magnification process of the Ormia’s ears based on the dynamic response (1) and proposes a simpler delay magnification structure.
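As a worked sketch of how (1) separates into the two modes, subtracting and adding its two rows in the Laplace domain gives a pair of decoupled second-order equations. The capital symbols $X_1$, $X_2$, $S_1$, $S_2$ (Laplace transforms of the displacements and forces) and the zero initial conditions are assumptions of this sketch rather than notation taken from [8].

```latex
\begin{align}
\left(ms^2 + cs + k\right)\left(X_1 - X_2\right) &= S_1 - S_2, \\
\left(ms^2 + (c + 2c_3)\,s + (k + 2k_3)\right)\left(X_1 + X_2\right) &= S_1 + S_2 .
\end{align}
```

The first line involves only the difference of the forces (rocking mode) and the second only their sum (translating mode), so each output can be recovered as half the sum or half the difference of the two modal responses.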
A Study of a Delay Magnification System Inspired by the Ormia ochracea Hearing System
Metha Kongpoon, Yazan N. Billeh, and Emmanuel M. Drakakis, Member, IEEE
Proceedings of the 2010 3rd IEEE RAS & EMBS International Conference on Biomedical Robotics and Biomechatronics, The University of Tokyo, Tokyo, Japan, September 26-29, 2010
978-1-4244-7709-8/10/$26.00 ©2010 IEEE
Figure 2. Equivalent signal flow graph of the O2 mechanical model. The system receives the inputs $s_1$ and $s_2$, which are scaled by the factors $1/k_r$ and $1/k_t$ and are processed by the two filters RM and TM. The two outputs of the system are denoted by $x_1$ and $x_2$, respectively.

TABLE I
THE PARAMETERS OF THE O2 SYSTEM IN FIG. 2

Parameter | RM filter | TM filter
$Q_{r/t}$ | $\sqrt{km}/c$ | $\sqrt{(k+2k_3)m}/(c+2c_3)$
$\omega_{0r/0t}$ | $\sqrt{k/m}$ | $\sqrt{(k+2k_3)/m}$
$k_{r/t}$ | $2m\omega_{0r}^2$ | $2m\omega_{0t}^2$
$H_{r/t}(s)$ | $\omega_{0r}^2/[s^2 + (\omega_{0r}/Q_r)s + \omega_{0r}^2]$ | $\omega_{0t}^2/[s^2 + (\omega_{0t}/Q_t)s + \omega_{0t}^2]$
Figure 3. The ITD of the O2 hearing system as a function of the frequency and the physical delay. The parameters of the O2 hearing system are: $k$ = 0.576 N/m, $k_3$ = 0.18 N/m, $c$ = 1.15×$10^{-5}$ N s/m, $c_3$ = 2.88×$10^{-5}$ N s/m and $m$ = 2.88×$10^{-10}$ kg [8].
A. Detailed Analysis of the O2 Delay Magnification System
In this subsection the equivalent signal flow graph of the mechanical model of the Ormia’s ears shown in Fig. 1 is derived, and the delay characteristic of the ear membranes is explained from it. To derive the signal flow graph, we apply the Laplace transform to both sides of (1), which yields two equations. Subtracting and adding these two equations and rearranging the results in the form of two transfer functions allows the system described by (1) to be expressed by the signal flow graph shown in Fig. 2. The quantities appearing in Fig. 2 are listed in Table I, with $k$, $k_3$, $c$, $c_3$ and $m$ as illustrated in Fig. 1. The transfer functions $H_r(s)$ and $H_t(s)$ correspond to the rocking and the translating modes and are therefore termed the RM and TM filters, respectively. The signal flow graph of Fig. 2 is equivalent to the modal solution of (1) given in [8].

We term the delay between the two ear inputs the physical delay $\tau_{pd}$, and the delay between the two outputs $x_1$ and $x_2$ of the O2 system of Fig. 2 the ITD. The characteristic of the ITD produced by the O2 hearing system of Fig. 2 is shown in Fig. 3. Fig. 3 is produced by varying the frequency and the physical delay of the inputs $s_1$ and $s_2$ in Fig. 2, which are assumed to be pure sinusoids of the same amplitude and frequency, delayed relative to each other by $\tau_{pd}$. The frequency and $\tau_{pd}$ vary from 0 to 25 kHz and from 2.5 to 12 µs, respectively. The system parameters used to produce Fig. 3 are those given in [8]: $k$ = 0.576 N/m, $k_3$ = 0.18 N/m, $c$ = 1.15×$10^{-5}$ N s/m, $c_3$ = 2.88×$10^{-5}$ N s/m and $m$ = 2.88×$10^{-10}$ kg. From Fig. 3 it can be seen, for example, that for $\tau_{pd}$ = 2.5 µs and a frequency around 1 kHz, the ITD is around 50 µs, which is 20 times the physical delay. It is worth noting that at low frequencies the ITD is approximately a linear function of the physical delay. From the phase response of the O2 system outputs, the ITD characteristic of Fig. 3 can be expressed as:
$$\mathrm{ITD}(\lambda,\varphi,\omega) = -\frac{1}{\omega}\cos^{-1}\!\left[\frac{1-\lambda^2}{\left(\lambda^4 + 2\lambda^2\cos(2\varphi) + 1\right)^{1/2}}\right] \tag{2}$$

The derivation of (2) assumes sinusoidal incident sound waves of frequency $\omega$ rad/s and equal amplitude arriving at both ears with the physical delay $\tau_{pd}$. In (2), $\lambda(\omega,\tau_{pd}) = C(\omega,\tau_{pd})/D(\omega,\tau_{pd})$, with $C(\omega,\tau_{pd})$ and $D(\omega,\tau_{pd})$ the amplitudes of the output signals of the RM and TM filters of Fig. 2, respectively. Thus $\lambda$ can be controlled internally through the gains of the RM and TM filters. The quantity $\varphi$ is the difference in phase provided by the RM and TM filters at their outputs. A useful feature of (2) is that, for a given frequency $\omega$, the ITD depends on $\varphi$ and $\lambda$ only.

It should be pointed out that the purpose of the input arithmetic blocks is to extract the physical delay information and to modulate it onto the magnitudes of the input signals of the RM and TM filters. The RM and TM filters amplify the physical delay by means of their magnitude gains. Finally, the output arithmetic blocks map the amplified physical delay back into the phase terms of the output signals.
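The flow graph described above can be sketched numerically with phasors. The snippet below is a minimal illustration, not the authors' code: the function names, the sign convention (a positive result means $x_1$ leads $x_2$) and the demo parameter values are all assumptions of this sketch. It relies on the identity that $H_r(j\omega)/k_r$ with the Table I definitions equals $0.5/(k + j\omega c - m\omega^2)$, and likewise for the translating mode.

```python
import numpy as np

def mode_response(omega, stiffness, damping, mass):
    """Steady-state response 1 / (stiffness + j*omega*damping - mass*omega**2)."""
    return 1.0 / (stiffness + 1j * omega * damping - mass * omega**2)

def o2_itd(freq_hz, tau_pd, k, k3, c, c3, m):
    """Delay (s) by which x1 leads x2 for equal-amplitude sinusoidal inputs."""
    omega = 2.0 * np.pi * freq_hz
    s1 = 1.0 + 0.0j                       # phasor of s1(t)
    s2 = np.exp(-1j * omega * tau_pd)     # s2 lags s1 by tau_pd (assumed sign)
    # Rocking mode is driven by the difference, translating mode by the sum.
    # 0.5 * mode_response(...) equals H_r/k_r (resp. H_t/k_t) from Table I.
    rocking = 0.5 * (s1 - s2) * mode_response(omega, k, c, m)
    translating = 0.5 * (s1 + s2) * mode_response(omega, k + 2 * k3, c + 2 * c3, m)
    x1 = translating + rocking
    x2 = translating - rocking
    return np.angle(x1 * np.conj(x2)) / omega

# Illustrative, made-up parameters (NOT the parameter set of [8]).
demo = dict(k=1.0, k3=2.0, c=1e-5, c3=3e-5, m=1e-9)
tau = 2e-6
gain = o2_itd(10.0, tau, **demo) / tau
print(f"low-frequency delay gain: {gain:.3f}")
```

For these made-up values the low-frequency delay gain comes out close to $(k + 2k_3)/k$, which is how the static stiffness ratio of the two modes sets the magnification in this sketch. Sweeping `freq_hz` and `tau_pd` with the parameter set of [8] would be one way to regenerate a surface of the kind shown in Fig. 3.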
B. The Modified O2 System
Figure 4. The effect of increasing the delay gain $\beta$ and the frequency $\omega/2\pi$ on the ITD relation (6) when the input physical delay amounts to 5 µs. (a) The delay gain varies from 20 to 100. (b) The frequency varies from 0.1 to 2 kHz.

Figure 5. Tuning characteristics of the O2 system. (a) The delay gain takes the values 20, 10 and 6.67, providing the same ITD for physical delay values of 5, 10 and 15 µs, respectively. (b) With the physical delay set to 5 µs, delay gains $\beta$ of 20.18, 20.7 and 21.6 achieve the same ITD value of 100 µs at frequencies of 0.5, 1 and 1.5 kHz, respectively.

From the previous subsection it should be clear that the ITD between the outputs of the Ormia’s ears is controlled by the phase differences and gain ratios between the RM and TM filters. This subsection investigates a simpler O2 system which results in a higher delay gain than that of the original system. Bearing Fig. 2 in mind, consider a modified O2 system in which the $1/k_r$ and $1/k_t$ blocks are removed and the RM and TM filters are reduced to simple gain elements whose gain magnitudes are denoted by $\nu$ and $\rho$, respectively. Assuming $s_1(t) = \sin(\omega t)$, $s_2(t) = \sin(\omega(t - \tau_{pd}))$, and the delay gain $\beta = \nu/\rho$, the cross-correlation between the outputs of the O2 system is computed by:

$$c(\tau) = \lim_{T\to\infty}\frac{1}{2T}\int_{-T}^{+T} x_1(t)\,x_2(t+\tau)\,dt \tag{3}$$

The ITD value corresponds to the value of $\tau$ at which the maximum of the cross-correlation occurs. From (3) a closed form for the ITD can be computed:
$$\mathrm{ITD} = -\frac{1}{\omega}\tan^{-1}\!\left[\frac{2\beta\sin(\omega\tau_{pd})}{1-\beta^2+(1+\beta^2)\cos(\omega\tau_{pd})}\right] \tag{4}$$

Relation (4) reveals a complex relation between the physical delay and the ITD. To simplify the analysis, we assume that $s_1(t) = s(t)$ and $s_2(t) = s(t+\tau_{pd})$ are the received narrow-band signals. Expanding $s(t+\tau_{pd})$ in a Taylor series around $t$, we have:

$$s(t+\tau_{pd}) = s(t) + \tau_{pd}\,\dot{s}(t) + \tfrac{1}{2}\tau_{pd}^2\,\ddot{s}(t) + \dots \tag{5}$$

When $\tau_{pd}^2$ on the right-hand side of (5) is very small, the third term of (5) can be neglected. Replacing $s_2(t)$ with $s(t+\tau_{pd})$, applying the Laplace transform and rearranging $s_1(t)$ and $s_2(t)$, (4) can be approximated by (6):

$$\mathrm{ITD} \approx -\frac{1}{\omega}\tan^{-1}\!\left[\frac{\beta\omega\tau_{pd}}{1+(\omega\tau_{pd}/2)^2-(\beta\omega\tau_{pd}/2)^2}\right] \tag{6}$$

Next, we analyze the effect of $\beta$ and $\omega$ on the resulting ITD as expressed by (6). At low $\omega$ and $\beta$ values where $\beta\omega\tau_{pd} \ll 1$, the ITD in (6) simplifies to $\mathrm{ITD} \approx -\beta\tau_{pd}$ (since $\tan^{-1}(\beta\omega\tau_{pd}) \approx \beta\omega\tau_{pd}$ when $\beta\omega\tau_{pd} \ll 1$). This means that the magnitude of the ITD is approximately equal to the physical delay $\tau_{pd}$ multiplied by the delay gain $\beta$. For given physical delay and $\beta$ values, the ITD decreases monotonically with $\omega$, as shown in Fig. 4.a. Fig. 4.b illustrates the dependence of the ITD upon $\beta$ for various input frequencies.

Bearing (6) in mind, we investigate the tuning properties of the proposed system. Assuming that $\omega\tau_{pd} \ll 1$, the term $(\omega\tau_{pd}/2)^2$ can be neglected with respect to the term $(\beta\omega\tau_{pd}/2)^2$, and the ITD value depends exclusively on $\beta\omega\tau_{pd}$. Consequently, so long as the product $\beta\omega\tau_{pd}$ keeps its value for different physical delays $\tau_{pd}$ (which can be achieved by varying $\beta$ accordingly), the same targeted ITD value can be attained. This is shown in Fig. 5.a. Conversely, for a given $\tau_{pd}$ the same targeted ITD value can be achieved at different frequencies of interest by varying $\beta$. This is shown in Fig. 5.b.

Comparing the ITD characteristic of the original O2 system, which comprises two second-order filters (see Fig. 3), with the ITD of the simple O2 system composed of gain elements (see Fig. 4.a), it can be seen that the simple O2 system provides a higher delay gain.
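The tuning behaviour of the gain-only system can be checked numerically. The sketch below is hedged: it assumes the outputs are formed as $x_1 = \rho(s_1+s_2) + \nu(s_1-s_2)$ and $x_2 = \rho(s_1+s_2) - \nu(s_1-s_2)$, works with the magnitude of the ITD only (the sign convention is dropped), and uses the half-angle form $|\mathrm{ITD}| = (2/\omega)\tan^{-1}[\beta\tan(\omega\tau_{pd}/2)]$, which is an algebraic rearrangement of (4) under that assumption. All function names are hypothetical.

```python
import numpy as np

def itd_half_angle(beta, freq_hz, tau_pd):
    """|ITD| of the gain-only system for equal-amplitude sinusoids (assumed form)."""
    omega = 2.0 * np.pi * freq_hz
    return 2.0 * np.arctan(beta * np.tan(omega * tau_pd / 2.0)) / omega

def itd_eq6(beta, freq_hz, tau_pd):
    """|ITD| from the first-order approximation (6), with the sign dropped."""
    omega = 2.0 * np.pi * freq_hz
    u = omega * tau_pd
    denom = 1.0 + (u / 2.0) ** 2 - (beta * u / 2.0) ** 2
    return np.arctan2(beta * u, denom) / omega

def beta_for_target(itd_target, freq_hz, tau_pd):
    """Delay gain that yields itd_target at the given frequency (half-angle form)."""
    omega = 2.0 * np.pi * freq_hz
    return np.tan(omega * itd_target / 2.0) / np.tan(omega * tau_pd / 2.0)

# Constant beta * tau_pd product at low frequency (compare Fig. 5.a).
for beta, tau in ((20.0, 5e-6), (10.0, 10e-6), (6.67, 15e-6)):
    print(beta, tau, itd_half_angle(beta, 100.0, tau))

# Gains needed for a 100 us ITD with tau_pd = 5 us (compare Fig. 5.b).
for f in (500.0, 1000.0, 1500.0):
    print(f, beta_for_target(100e-6, f, 5e-6))
```

Under these assumptions the three $(\beta, \tau_{pd})$ pairs give nearly the same ITD of about 100 µs at low frequency, and the gains returned by `beta_for_target` come out close to the 20.18, 20.7 and 21.6 quoted in the caption of Fig. 5.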
IV. CROSS-CORRELOGRAM OF THE O2 SYSTEM

This section introduces the cross-correlogram as proposed in [13] and the cross-correlogram combined with the simple O2 system proposed in Section III.B.

A. The Conventional Cross-correlogram
The cross-correlogram [13] is an analysis tool that uses spectrum analysis by cochlea filterbanks together with cross-correlation units to extract the ITD information from input speech signals. Its computation is equivalent to the coincidence detection model [2]. It consists of $M$ identical channels for each of the left and right cochlea filterbanks and the correlator units. In this analysis the cochlea filters are modeled by means of $N$th-order Differential All-Pole Gammatone Filters (DAPGF) [14], whose transfer function is given by:

$$H_{DAPGF}(s) = \frac{\omega_0^{2N-1}\,s}{\left[s^2 + \frac{\omega_0}{Q}\,s + \omega_0^2\right]^{N}} \tag{7}$$

where $\omega_0$ and $Q$ are the pole frequency and the quality factor, respectively. At each $m$th channel frequency the outputs of the left and right cochlea filters ($l_m$ and $r_m$, respectively) are the inputs to the correlator unit, which consists of a pair of delay lines of $T$ stages of $\tau$ µs delay units emerging from the left and right cochlea sides at the $m$th channel frequency. At the $j$th delay unit and $t$th time frame, the cross-correlation between the signals coming from the left and right cochlea filters, using $K$ samples, can be computed as:

$$C(t,m,j) = \sum_{k=t-K}^{t} r_m(k-j)\times l_m\bigl(k-(T-j)\bigr) \tag{8}$$

The ITD corresponding to the $m$th frequency channel can be approximated by the time lag along the $T$ delay stages at which the maximum of the cross-correlation occurs.

To illustrate the concept of the cross-correlogram, a real input speech signal from a male speaker is recorded from a microphone at a sampling rate of 44 kHz. This signal is assumed to be received by the first acoustic sensor. The second acoustic sensor is assumed to receive a version of the same signal delayed by the physical delay $\tau_{pd}$. In this example a sound source placed near the left acoustic sensor is assumed, which corresponds to a $\tau_{pd}$ value of 200 µs. The signals from the right and left sensors are windowed using a 45.5 ms rectangular window. The windowed signals are then interpolated using MATLAB spline functions in order to obtain input signals with sufficient time resolution; the time resolution is set to 0.5 µs. The interpolated signals are passed to the left and right cochlea filterbanks, where the total number and the order of the DAPGFs ($M$ and $N$) on each side are set to 200 and 6, respectively. The center frequencies of the cochlea filters range from 0.08 to 5 kHz and are equally spaced on the ERB scale [15]. The cross-correlation between the outputs of the right and left DAPGFs is computed independently for each frequency channel using (8), where the total number of delay stages $T$ is 60 and each delay unit $\tau$ is equivalent to a 10 µs delay. The resulting cross-correlogram, computed in MATLAB, is shown in Fig. 6.

Figure 6. The cross-correlogram of a single sound source with 200 µs physical delay. The left and right filterbanks are composed of 200 6th-order DAPGFs each. Each correlator consists of a pair of 60 delay stages of 10 µs delay each.

From Fig. 6 it can be seen that the main peak, where the maximum of the cross-correlation occurs (highlighted in red), is located at 200 µs, which corresponds to the input physical delay. Besides the main peak, “ambiguous” peaks also appear to the left and right of the main peak. This phenomenon is known to be caused by the high frequency of the signal compared to $\tau_{pd}$ [16]. In practice, delays of less than 10 µs are difficult to resolve [3-7] due to the restriction posed by the resolution of the delay units. When the physical delay is less than the resolution of the delay unit, it is not possible to extract the desired ITD from the cross-correlogram. From the analysis in the previous section, it should be clear that the O2 system with simple gain elements can magnify a physical delay of less than 10 µs to the range of hundreds of microseconds. In what follows we study the cross-correlogram of a system which incorporates the O2 unit.

Figure 7. The proposed O2-inspired system combined with the coincidence detection model.

Figure 8. (a) The cross-correlogram including the O2 system for the same sound source as in the previous example, with a physical delay of 4 µs and a delay gain of 50 at low frequencies. The cross-correlation is computed with 32 stages of 10 µs delay each. (b) Remapping of the cross-correlogram of Fig. 8.a using the inverse relation (9).

Figure 9. The cross-correlogram (top) and the pooled cross-correlogram (bottom) of a mixture of two sound sources with delays of 10 µs and 2 µs, respectively. (a) Cross-correlogram without the O2 system, using 60 stages of 0.5 µs delay units. (b) Remapped cross-correlogram including the O2 system, using 30 stages of 10 µs delay units.
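The correlator of (8) for a single frequency channel can be sketched as follows. This is a minimal illustration under stated assumptions: one delay stage is taken to equal one sample, the channel signals are synthetic white noise rather than DAPGF outputs, and the function and variable names are hypothetical. Because the two delay lines run in opposite directions, a peak at index $j$ corresponds to a relative lag of $T - 2j$ stages in this sketch.

```python
import numpy as np

def correlator(l_m, r_m, t, K, T):
    """Evaluate C(t, m, j) of (8) for j = 0..T in one frequency channel."""
    k = np.arange(t - K, t + 1)               # summation index k = t-K .. t
    return np.array([np.sum(r_m[k - j] * l_m[k - (T - j)])
                     for j in range(T + 1)])

rng = np.random.default_rng(0)
n, d, T, K = 4096, 4, 10, 2000                # d: true delay in samples (assumed)
l_m = rng.standard_normal(n)                  # stand-in for a left channel output
r_m = np.concatenate([np.zeros(d), l_m[:-d]]) # right channel: r_m[k] = l_m[k - d]

C = correlator(l_m, r_m, t=n - 1, K=K, T=T)
j_peak = int(np.argmax(C))
estimated_lag = T - 2 * j_peak                # relative lag in delay stages
print(j_peak, estimated_lag)
```

Repeating this for every channel $m$ and stacking the resulting rows would give a matrix of the kind displayed as a cross-correlogram; the filterbank itself is deliberately left out of the sketch.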
B. The Modified O2 System with the Cross-correlogram
This subsection investigates the simplified Ormia-inspired ITD magnification scheme of subsection III.B in conjunction with the conventional cross-correlogram technique of subsection IV.A. Consider the system in Fig. 7, where the proposed O2 system is included as a preprocessing stage for the cross-correlogram. In Fig. 7, the O2 system proposed in Section III.B receives the outputs of both microphones and produces a pair of signals with magnified physical delay. These signals are filtered by the left and right cochlea filters. Pairs of outputs from left and right cochlea filters of the same frequency are cross-correlated in the coincidence detection model. Given that this cross-correlation corresponds to magnified delays, the resulting cross-correlogram is mapped back to the “correct” (i.e. non-magnified) one.

To demonstrate the effect of the O2 system on a conventional cross-correlogram, the physical delay $\tau_{pd}$ is set to 4 µs, which is less than the 10 µs resolution of the delay unit in the cross-correlator; the delay gain is set to 50 and the rest of the parameters are the same as in the previous example. The resulting cross-correlogram is illustrated in Fig. 8.a. From Fig. 8.a it is clear that the incorporation of the O2 system allows a cross-correlogram to be produced even though the delay of the correlator units is longer than the physical delay. However, the cross-correlogram of the system including the O2 system has a “modified” shape compared to the ideal cross-correlogram without the O2 system. Comparing with Fig. 6, it can be seen that at low frequencies the ITD in Fig. 8.a is approximately equal to the product of the physical delay and the delay gain, and that the main peak bends towards zero at high frequencies. This effect results from the gain-frequency characteristic of the O2 system as predicted by (6) and Fig. 4.a. To make the interpretation of the modified cross-correlogram reliable, the inverse relation between the ITD measured from the cross-correlogram with the O2 system and the approximated physical delay $\tau'_{pd}$ has to be employed. This inverse relation between the $i$th measured ITD from the cross-correlogram and the corresponding approximated physical delay $\tau'_{pd}(\omega)$, at fixed operating frequency $\omega$ and delay gain $\beta$, can be found from (6) as:
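$$\tau'_{pd}(\omega) = \begin{cases} \dfrac{\beta - \sqrt{\beta^2 - 4\alpha_i^2(1-\beta^2)}}{\omega\,\alpha_i\,(1-\beta^2)}, & \omega < \omega_\theta \\[2ex] \dfrac{\beta + \sqrt{\beta^2 - 4\alpha_i^2(1-\beta^2)}}{\omega\,\alpha_i\,(1-\beta^2)}, & \omega > \omega_\theta \end{cases} \tag{9}$$

where $\alpha_i = \tan(\omega \times \mathrm{ITD}_i(\omega))/2$ and $\omega_\theta = \pi/(2 \times \mathrm{ITD}_i(\omega))$. By applying (9) to the values of the cross-correlogram shown in Fig. 8.a, the cross-correlogram of Fig. 8.b can be constructed. From Fig. 8.b it can be observed that the main peak of the “corrected” cross-correlogram occurs around 4 µs, which coincides with the correct physical delay value.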
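The remapping step can be illustrated with a round trip: magnify a known delay with the approximation (6), then recover it by solving (6) for the physical delay with $\alpha_i = \tan(\omega \times \mathrm{ITD}_i)/2$ and a branch switch at $\omega_\theta$. This is a sketch under assumptions: ITD magnitudes are used (the sign convention is dropped), `arctan2` is used so that the forward angle stays in $(0, \pi)$, and the function names are hypothetical.

```python
import numpy as np

def magnified_itd(tau_pd, beta, omega):
    """Magnitude of the ITD predicted by (6); arctan2 keeps the angle in (0, pi)."""
    u = omega * tau_pd
    denom = 1.0 + (u / 2.0) ** 2 - (beta * u / 2.0) ** 2
    return np.arctan2(beta * u, denom) / omega

def recovered_delay(itd_i, beta, omega):
    """Approximate physical delay from a measured ITD, obtained by inverting (6).

    Assumes beta != 1 and 0 < omega * itd_i < pi, with omega * itd_i != pi / 2.
    """
    alpha = np.tan(omega * itd_i) / 2.0
    omega_theta = np.pi / (2.0 * itd_i)       # branch switches where omega*ITD = pi/2
    root = np.sqrt(beta**2 - 4.0 * alpha**2 * (1.0 - beta**2))
    numerator = beta - root if omega < omega_theta else beta + root
    return numerator / (omega * alpha * (1.0 - beta**2))

beta, tau_pd = 50.0, 4e-6                     # values used in the example of Fig. 8
results = {}
for f in (500.0, 2000.0):                     # one frequency on each branch
    omega = 2.0 * np.pi * f
    itd = magnified_itd(tau_pd, beta, omega)
    results[f] = recovered_delay(itd, beta, omega)
    print(f, itd, results[f])
```

In this sketch the lower frequency falls on the $\omega < \omega_\theta$ branch and the higher one on the $\omega > \omega_\theta$ branch, and both return the 4 µs delay that was fed in. Applied per frequency channel to the peak positions of a magnified cross-correlogram, the same function would perform a remapping of the kind shown in Fig. 8.b.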
C. Multiple Sound Sources Localization
In real-life situations a sound source of interest is practically always interfered with by other competing signals, which affects its localization. The conventional cross-correlogram can locate more than one sound source when the overlap between the input spectra is so small that there are enough frequency channels dominated by each one