Blind Separation of Anechoic Underdetermined Speech Mixtures using Multiple Sensors
Rayan Saab$^1$, Özgür Yılmaz$^2$, Martin J. McKeown$^3$, Rafeef Abugharbieh$^1$
$^1$ Department of Electrical and Computer Engineering, The University of British Columbia.
$^2$ Department of Mathematics, The University of British Columbia.
$^3$ Department of Medicine (Neurology), Pacific Parkinson's Research Centre, The University of British Columbia.
Abstract: This paper presents a novel technique for Blind Source Separation (BSS) of anechoic speech mixtures in the underdetermined case. A demixing algorithm that exploits the sparsity of the short-time Fourier transform (STFT) of speech signals is proposed. The algorithm merges constrained optimization with ideas based on the degenerate unmixing estimation technique (DUET) [1]. Thus, the novelty in the proposed approach is twofold. First, the algorithm utilizes all available mixtures in the anechoic scenario, where both attenuations and arrival delays between sensors are considered. Second, it is demonstrated that $\ell^q$ minimization with $q < 1$ outperforms the standard choice of $q = 1$. Experimental results on both synthetic and real mixtures indicate significant performance gains over other BSS algorithms reported in the literature.
Keywords: blind source separation, sparse signal representation, DUET, time-frequency representations, Gabor expansion, underdetermined signal unmixing, overcomplete representations
I. INTRODUCTION
Over the last few years, BSS algorithms have been developed for a wide variety of models, ranging from instantaneous, anechoic, and echoic mixing on one hand to overdetermined, even-determined and underdetermined scenarios on the other hand.

For instantaneous mixing, especially in the even-determined case, a powerful tool that has found increasing use is independent component analysis (ICA). First expressed in [2], then developed in an information maximization framework by Bell and Sejnowski [3], standard ICA assumes statistical independence of the sources and tries to extract $n$ sources from $n$ recorded mixtures. Lewicki and Sejnowski [4], and Lee et al. [5] expanded ICA into the instantaneous underdetermined case, where there are more sources than available mixtures, by using a maximum a posteriori approach and exploiting sparsity. For an extensive overview of ICA see [6]. Other BSS approaches, e.g., [7], [8] and [9], assume sparsity of the sources in some transform domain as well as a linear mixing model to solve the BSS problem for instantaneous mixtures in the underdetermined case. These approaches generally use constrained $\ell^1$ minimization for separation, assuming that this maximizes sparsity of the estimated sources in the transform domain.

Another set of algorithms, dealing with anechoic underdetermined mixing scenarios, has also been proposed. Jourjine et al. [10] and Yılmaz and Rickard [1] developed an algorithm, called DUET, that exploits sparsity in the short-time Fourier transform (STFT) domain and uses masking to extract multiple sources from only two mixtures. The assumption they refer to as W-Disjoint Orthogonality is that only one source is active at every point in the time-frequency (TF) domain. Bofill [11] proposed another anechoic demixing algorithm for underdetermined mixtures that extracts the attenuation coefficients using a scatter plot technique and the delays by maximizing a kernel function. After the amplitudes and delays are extracted, Bofill uses second order cone programming, a technique that can be used for $\ell^1$ minimization in the complex domain, to recover the sources in the TF plane from two mixtures. Vielva et al. [12] considered the case of underdetermined instantaneous blind source separation where source densities are parametrized by a sparsity factor, and presented a maximum a posteriori method for separation, while [13] focused on the estimation of the mixing matrix for underdetermined BSS under the sparsity assumption. A recent survey of available methods in blind source separation over the range of assumptions made and models used can be found in [14].

In this paper, a new BSS technique for extracting sources in an underdetermined anechoic environment is proposed. In solving the problem of extracting a number of sources that exceeds the number of available mixtures, the 'standard' two-stage approach as formalized by Theis et al. [15] is adopted. In the first stage, the mixing parameters are estimated by clustering feature vectors which are constructed from the Gabor coefficients (or short-time Fourier transform) of the mixtures. These parameters, as well as a sparsity assumption on the Gabor expansions of speech signals, are then used in the second stage to extract the sources. In particular, an $\ell^q$ minimization based algorithm, with $q < 1$, is used to estimate the sources in the STFT domain. Accordingly, the novel aspects of the proposed approach are:
• Generation of feature vectors incorporating both attenuation and delay parameters for an arbitrary number of mixtures in the underdetermined BSS case.
• Proposing the use of $\ell^q$ minimization with $q < 1$, which is shown to significantly improve the separation performance.
• Comparing the performance of source extraction based on $\ell^q$ minimization and $\ell^q$ basis pursuit for values $0 \le q \le 1$ in the STFT domain, and illustrating that the best separation performance for speech is obtained for $0.1 \le q \le 0.4$.

Experiments conducted on both synthetic and real mixtures indicate significant performance gains over other BSS algorithms reported in the literature.

2006 IEEE International Symposium on Signal Processing and Information Technology. 0-7803-9754-1/06/$20.00 ©2006 IEEE

II. MIXING MODEL AND PROBLEM FORMULATION
Assume $n$ time-domain sources $s_1(t), \ldots, s_n(t)$ and $m$ mixtures $x_1(t), \ldots, x_m(t)$ such that

$$x_i(t) = \sum_{j=1}^{n} a_{ij}\, s_j(t - \delta_{ij}), \quad i = 1, 2, \ldots, m, \tag{1}$$

where $m < n$, and $a_{ij} \in \mathbb{R}^{+}$ and $\delta_{ij} \in \mathbb{R}$ are the attenuation coefficients and time delays associated with the path from the $j$th source to the $i$th receiver, respectively. Equation (1) defines an anechoic mixing model. Without loss of generality, one can set $\delta_{1j} = 0$ and scale the source functions $s_j$ such that

$$\sum_{i=1}^{m} a_{ij}^2 = 1 \tag{2}$$

for $j = 1, \ldots, n$.

By taking the STFT of $x_1, \ldots, x_m$ with an appropriate window function, the mixing model (1) can be written as

$$\hat{\mathbf{x}}(\tau, \omega) = A(\omega)\, \hat{\mathbf{s}}(\tau, \omega), \tag{3}$$

where

$$\hat{\mathbf{x}} = [\hat{x}_1 \ldots \hat{x}_m]^T, \quad \hat{\mathbf{s}} = [\hat{s}_1 \ldots \hat{s}_n]^T, \tag{4}$$

$\hat{x}_i$ and $\hat{s}_j$ denote the STFT of $x_i$ and $s_j$, respectively, and

$$A(\omega) = \begin{bmatrix} a_{11} & \cdots & a_{1n} \\ a_{21} e^{-i\omega\delta_{21}} & \cdots & a_{2n} e^{-i\omega\delta_{2n}} \\ \vdots & \ddots & \vdots \\ a_{m1} e^{-i\omega\delta_{m1}} & \cdots & a_{mn} e^{-i\omega\delta_{mn}} \end{bmatrix}. \tag{5}$$

Note that, by (2), the column vectors of $A$ have unit norm.

We use the equivalent discrete form of the continuous STFT, i.e., the samples (Gabor coefficients) of $\hat{s}_j$ on a sufficiently dense lattice in the TF plane, given by

$$\hat{s}_j[k, l] = \hat{s}_j(k\tau_0, l\omega_0), \tag{6}$$

where $\tau_0$ and $\omega_0$ are the time-frequency lattice parameters. Similar notation will be used for the mixing matrix $A$ and the Gabor coefficients of the mixtures $x_i$.

The following two sections describe the two stages of the proposed algorithm, i.e., the parameter estimation and the extraction of the original sources, both of which depend on the sparsity of the Gabor expansions of speech signals.

III. MIXING PARAMETERS RECOVERY
This section presents the method used to recover the mixing model parameters, i.e., the delay and attenuation coefficients.
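To make these parameters concrete, the mixing matrix $A(\omega)$ of (5) can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the authors' code: the function name `mixing_matrix`, the random example values and the choice of `omega` are assumptions made here. The sketch normalizes each column of attenuations as in (2) and references all delays to the first sensor so that $\delta_{1j} = 0$.

```python
import numpy as np

def mixing_matrix(atten, delays, omega):
    """Return the m x n anechoic mixing matrix A(omega) of (5).

    atten  : (m, n) positive attenuation coefficients a_ij
    delays : (m, n) delays delta_ij
    omega  : angular frequency (radians per unit of delay)
    """
    atten = np.asarray(atten, dtype=float)
    delays = np.asarray(delays, dtype=float)
    # Enforce (2): each column of attenuations has unit Euclidean norm.
    atten = atten / np.linalg.norm(atten, axis=0, keepdims=True)
    # Enforce delta_1j = 0 by referencing all delays to the first sensor.
    delays = delays - delays[0:1, :]
    return atten * np.exp(-1j * omega * delays)

# Hypothetical example: m = 2 sensors, n = 3 sources.
rng = np.random.default_rng(0)
a = rng.uniform(0.5, 1.0, size=(2, 3))
d = rng.uniform(-1.0, 1.0, size=(2, 3))
A = mixing_matrix(a, d, omega=0.3)
```

With this normalization the first row of `A` is real and every column has unit norm, which is the property noted after (5).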
A. Speech Sparsity of STFT Coefficients

Signal sparsity in certain transform domains facilitates mixing parameter estimation. Cardoso [16] noted that the accuracy with which the mixing parameters in a BSS model can be estimated depends on the non-Gaussianity of the sources. Furthermore, sparser sources allow better separation quality, e.g., [8]. Thus a transformation that yields a sparse representation of the data is desirable, both for estimating the mixing parameters accurately and for separation. It was shown in [1] that Gabor expansions of speech signals are sparse. To further illustrate the sparsity exhibited by speech in the STFT domain, Figure 1 shows the average cumulative powers of the sorted Gabor (STFT) coefficients along with the average cumulative power of the time-domain sources as well as of their Fourier (DFT) coefficients. The STFT with a 64 ms window size is sparser, capturing 98% of the total signal power with only approximately 9% of the coefficients.
[Figure 1: Cumulative power distribution for the STFT with various window sizes (32 ms, 64 ms, 128 ms), the original time-domain signal, and the frequency-domain signal. Horizontal axis: percentage of points. Vertical axis: percentage of the power.]

Fig. 1. Average cumulative power of 50 speech signals of 3 s each in the time domain, frequency (Fourier) domain, and TF domain for window sizes of 32 ms, 64 ms and 128 ms. The STFT with 32 ms and 64 ms window lengths yields significantly sparser representations of the data (more power represented in fewer coefficients).
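The cumulative power curves described above can be computed as in the following sketch. It is an illustration under stated assumptions rather than the procedure used for Figure 1: the helper names (`cumulative_power_curve`, `stft`, `points_needed`), the Hann window, the 50% hop and the synthetic tone-burst test signal (a stand-in for speech) are all choices made here.

```python
import numpy as np

def cumulative_power_curve(coeffs):
    """Fraction of total power captured by the largest coefficients.

    Returns (fraction_of_points, fraction_of_power), both increasing to 1.
    """
    power = np.abs(np.ravel(coeffs)) ** 2
    power = np.sort(power)[::-1]              # largest first
    frac_power = np.cumsum(power) / power.sum()
    frac_points = np.arange(1, power.size + 1) / power.size
    return frac_points, frac_power

def stft(x, win_len, hop):
    """Plain Hann-windowed STFT (frames x frequency bins)."""
    window = np.hanning(win_len)
    starts = range(0, len(x) - win_len + 1, hop)
    frames = np.stack([x[s:s + win_len] * window for s in starts])
    return np.fft.rfft(frames, axis=1)

def points_needed(coeffs, target=0.98):
    """Smallest fraction of coefficients holding `target` of the power."""
    frac_points, frac_power = cumulative_power_curve(coeffs)
    return frac_points[np.searchsorted(frac_power, target)]

# Synthetic stand-in for speech (an assumption): short tone bursts in silence.
fs = 16000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 440 * t) * (np.sin(2 * np.pi * 2 * t) > 0.9)
# 1024 samples at 16 kHz correspond to a 64 ms window.
print(points_needed(x), points_needed(stft(x, 1024, 512)))
```

For this toy signal the STFT needs a much smaller fraction of coefficients than the time-domain samples to reach 98% of the power; the figures reported for speech in the text come from the authors' data, not from this sketch.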
B. Parameter Estimation

Consider the feature vectors at each TF point $[k,l]$ given by

$$F[k,l] := \left[ \frac{|\hat{x}_1[k,l]|}{\|\hat{\mathbf{x}}[k,l]\|} \;\cdots\; \frac{|\hat{x}_m[k,l]|}{\|\hat{\mathbf{x}}[k,l]\|} \;\; \hat{\Delta}_{21}[k,l] \;\cdots\; \hat{\Delta}_{m1}[k,l] \right], \tag{7}$$

where $\|\cdot\|$ denotes the Euclidean norm and

$$\hat{\Delta}_{j1}[k,l] := -\frac{1}{l\omega_0} \angle \frac{\hat{x}_j[k,l]}{\hat{x}_1[k,l]}, \tag{8}$$

as in [1]. If only one source $s_J$ is nonzero at a TF point, the feature vector at that TF point will reduce to

$$F[k,l] = [\, a_{1J} \;\cdots\; a_{mJ} \;\; \delta_{2J} \;\cdots\; \delta_{mJ} \,] := F_J.$$

Thus, the feature vectors calculated at any TF point $[k,l]$ at which source $J$ is the only active source will be identical, and equal to $F_J$.
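The reduction of $F[k,l]$ to $F_J$ can be checked numerically. The sketch below is an illustration only: the function name `feature_vectors` and the example attenuations, delays and source values are assumptions introduced here. It evaluates (7) and (8) for TF points that share one angular frequency $\omega = l\omega_0$.

```python
import numpy as np

def feature_vectors(X, omega):
    """Feature vectors of (7) and (8) at one frequency.

    X     : (m, K) complex mixture STFT values at K TF points sharing
            the angular frequency omega = l * omega_0 (omega != 0)
    Returns a (K, 2m - 1) array: m normalized magnitudes followed by
    m - 1 relative delays with respect to the first mixture.
    """
    mags = np.abs(X) / np.linalg.norm(X, axis=0, keepdims=True)
    delays = -np.angle(X[1:] / X[0:1]) / omega
    return np.concatenate([mags, delays]).T

# Hypothetical single-source check: only source J is active.
a_J = np.array([0.6, 0.8])            # unit-norm attenuation column
delta_J = np.array([0.0, 0.4])        # delays, first sensor as reference
omega = 1.5
s_J = np.array([1.0 + 2.0j, -0.3j, 2.5])      # arbitrary source values
X = (a_J * np.exp(-1j * omega * delta_J))[:, None] * s_J[None, :]
F = feature_vectors(X, omega)
```

Every row of `F` equals `(0.6, 0.8, 0.4)` regardless of the source value, as the text states. The delay is recovered without ambiguity only while $|\omega\,\delta_{jJ}| < \pi$, since the phase in (8) is defined modulo $2\pi$.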
Given the sparsity assumption for the sources in the TF domain, it can be expected that there will be many points with a single active source. A clustering approach in the feature space, such as K-means, can thus be used to estimate the delay and attenuation parameters of the mixing model. In summary, the proposed Parameter Estimation Algorithm is as follows:
1) Compute the mixture vector $\hat{\mathbf{x}}[k,l]$ at every TF point $[k,l]$.
2) At every TF point $[k,l]$, compute the corresponding feature vector $F[k,l]$, as in (7).
3) Apply a clustering algorithm (e.g., K-means) to find the $n$ cluster centers in the feature space. The cluster centers will yield preliminary estimates $\bar{a}_{ij}$ and $\bar{\delta}_{ij}$ of the mixing parameters $a_{ij}$ and $\delta_{ij}$, respectively.
4) Normalize the attenuation coefficients to obtain the final attenuation parameter estimates $\tilde{a}_{ij}$, i.e.,
$$\tilde{a}_{ij} := \bar{a}_{ij} \Big/ \Big( \sum_{i=1}^{m} \bar{a}_{ij}^2 \Big)^{1/2}.$$
The final delay parameter estimates are given by $\tilde{\delta}_{ij} := \bar{\delta}_{ij}$.
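Steps 3 and 4 of the algorithm above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the basic Lloyd-style `kmeans` with farthest-point initialization, the function name `estimate_mixing_parameters`, the absence of any feature scaling and the synthetic feature vectors are all assumptions made here.

```python
import numpy as np

def kmeans(points, n_clusters, n_iter=50):
    """Plain Lloyd iterations with farthest-point initialization."""
    centers = [points[0]]
    for _ in range(n_clusters - 1):
        dist = np.min([np.sum((points - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(points[np.argmax(dist)])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = np.argmin(d2, axis=1)
        for c in range(n_clusters):
            if np.any(labels == c):
                centers[c] = points[labels == c].mean(axis=0)
    return centers

def estimate_mixing_parameters(F, n_sources, m):
    """Steps 3 and 4: cluster feature vectors, then normalize attenuations.

    F : (K, 2m - 1) feature vectors as in (7).
    Returns (a_tilde, delta_tilde), each of shape (m, n_sources).
    """
    centers = kmeans(F, n_sources)
    a_bar = centers[:, :m].T                       # preliminary attenuations
    delta_bar = centers[:, m:].T                   # preliminary delays, rows 2..m
    a_tilde = a_bar / np.linalg.norm(a_bar, axis=0, keepdims=True)
    delta_tilde = np.vstack([np.zeros(n_sources), delta_bar])  # delta_1j = 0
    return a_tilde, delta_tilde

# Hypothetical data: noisy copies of three ideal feature vectors F_J (m = 2).
rng = np.random.default_rng(1)
true_F = np.array([[0.6, 0.8, 0.4], [0.8, 0.6, -0.5], [0.28, 0.96, 0.1]])
F = np.repeat(true_F, 200, axis=0) + 0.01 * rng.standard_normal((600, 3))
a_tilde, delta_tilde = estimate_mixing_parameters(F, n_sources=3, m=2)
```

The columns of `a_tilde` and `delta_tilde` come out in whatever order the clustering assigns, so the sources are identified only up to permutation.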
IV. SOURCE SEPARATION
This section presents a method for extracting the original sources using the parameters estimated as described in Section III.

First, the estimated mixing matrix $\tilde{A}[l]$ is constructed as

$$\tilde{A}[l] = \begin{bmatrix} \tilde{a}_{11} e^{-il\omega_0\tilde{\delta}_{11}} & \cdots & \tilde{a}_{1n} e^{-il\omega_0\tilde{\delta}_{1n}} \\ \tilde{a}_{21} e^{-il\omega_0\tilde{\delta}_{21}} & \cdots & \tilde{a}_{2n} e^{-il\omega_0\tilde{\delta}_{2n}} \\ \vdots & \ddots & \vdots \\ \tilde{a}_{m1} e^{-il\omega_0\tilde{\delta}_{m1}} & \cdots & \tilde{a}_{mn} e^{-il\omega_0\tilde{\delta}_{mn}} \end{bmatrix}, \tag{9}$$

where $\tilde{a}_{ij}$ and $\tilde{\delta}_{ij}$ are the estimated attenuation and delay parameters. Note that each column vector of $\tilde{A}[l]$ is a unit vector in $\mathbb{C}^m$.

The next step is to compute estimates $s^e_1, s^e_2, \ldots, s^e_n$ of the original sources $s_1, s_2, \ldots, s_n$ that satisfy

$$\tilde{A}[l]\, \hat{\mathbf{s}}^e[k,l] = \hat{\mathbf{x}}[k,l], \tag{10}$$

where $\hat{\mathbf{s}}^e = [\hat{s}^e_1, \ldots, \hat{s}^e_n]^T$ is the vector of source estimates in the TF domain. At each TF point $[k,l]$, (10) provides $m$ equations (corresponding to the $m$ available mixtures) with $n > m$ unknowns $(\hat{s}^e_1, \ldots, \hat{s}^e_n)$. Assuming that this system of equations is consistent, it has infinitely many solutions. To choose a reasonable estimate among these infinitely many solutions, the sparsity of the sources in the TF domain can be exploited.
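The construction in (9) and the non-uniqueness of solutions to (10) can be illustrated with a short sketch. The function name `estimated_mixing_matrix` and all numerical values below are assumptions chosen for illustration. The sketch builds $\tilde{A}[l]$, generates a mixture vector from a sparse source vector, and exhibits two further solutions of (10): the minimum-energy (pseudo-inverse) solution and that solution shifted along the null space of $\tilde{A}[l]$.

```python
import numpy as np

def estimated_mixing_matrix(a_tilde, delta_tilde, l, omega0):
    """A_tilde[l] of (9) from (m, n) attenuation and delay estimates."""
    return a_tilde * np.exp(-1j * l * omega0 * delta_tilde)

# Hypothetical estimates for m = 2 mixtures and n = 3 sources.
a_tilde = np.array([[0.6, 0.8, 0.28],
                    [0.8, 0.6, 0.96]])
delta_tilde = np.array([[0.0, 0.0, 0.0],
                        [0.4, -0.5, 0.1]])
A = estimated_mixing_matrix(a_tilde, delta_tilde, l=10, omega0=0.05)

# A mixture vector generated by a sparse source vector (one active source).
s_true = np.array([0.0, 1.0 - 0.5j, 0.0])
x = A @ s_true

# Two different solutions of (10): the minimum-energy one and a shifted one.
s_min_energy = np.linalg.pinv(A) @ x
_, _, Vh = np.linalg.svd(A)
null_vec = Vh[-1].conj()                 # spans the null space of A
s_other = s_min_energy + 2.0 * null_vec
```

All three vectors `s_true`, `s_min_energy` and `s_other` reproduce `x` exactly, which is why an additional criterion such as sparsity is needed to select among them.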
A. Sparsity and $\ell^q$ minimization
To find, at each TF point, the "sparsest" $\hat{\mathbf{s}}^e$ that solves (10), the problem can be formally stated as

$$\min_{\hat{\mathbf{s}}^e} \|\hat{\mathbf{s}}^e\|_{\text{sparse}} \quad \text{subject to} \quad \tilde{A}\hat{\mathbf{s}}^e = \hat{\mathbf{x}}, \tag{11}$$

where $\|x\|_{\text{sparse}}$ denotes some measure of sparsity of the vector $x$. Given a vector $x = (x_1, \ldots, x_n) \in \mathbb{C}^n$, one measure of its sparsity is given by the number of nonzero components of $x$, commonly denoted by $\|x\|_0$. Replacing $\|x\|_{\text{sparse}}$ in (11) with $\|x\|_0$ gives rise to the so-called $P_0$ problem, e.g., [17]. Solving $P_0$ is, in general, combinatorial and the solution is very sensitive to noise. More importantly, the sparsity model for the Gabor coefficients of speech signals essentially suggests that most of the coefficients are very small, though not identically zero. In this case, $P_0$ fails as it does not take into account the magnitudes of the components. Alternatively, one can consider

$$\|x\|_q := \Big( \sum_i |x_i|^q \Big)^{1/q}, \quad \text{where } 0 < q \le 1,$$

as a measure of sparsity. Smaller values of $q$ place more emphasis on the sparsity of $x$, e.g., [18]. Motivated by this, the vector of source estimates $\hat{\mathbf{s}}^e$ can be computed by solving, at each TF point $[k,l]$, the $P_q$ problem defined by replacing $\|\hat{\mathbf{s}}^e\|_{\text{sparse}}$ in (11) with $\|\hat{\mathbf{s}}^e\|_q$.
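The effect of $q$ on this sparsity measure can be seen in a small numerical sketch. The function name `lq_measure` and the two test vectors are illustrative assumptions; both vectors have the same Euclidean norm, so any difference in the measure reflects only how the energy is distributed.

```python
import numpy as np

def lq_measure(x, q):
    """(sum_i |x_i|^q)^(1/q) for 0 < q <= 1; works for complex x."""
    return np.sum(np.abs(x) ** q) ** (1.0 / q)

# Two hypothetical vectors with the same Euclidean norm (equal to 1).
sparse = np.array([1.0, 0.0, 0.0, 0.0])
spread = np.array([0.5, 0.5, 0.5, 0.5])

for q in (1.0, 0.5, 0.2):
    ratio = lq_measure(spread, q) / lq_measure(sparse, q)
    print(f"q = {q}: spread/sparse ratio = {ratio:g}")
```

The measure of the sparse vector stays at 1 for every $q$, while that of the spread vector is $4^{1/q}/2$, i.e., 2 for $q = 1$, 8 for $q = 0.5$ and 512 for $q = 0.2$. Smaller $q$ therefore penalizes non-sparse vectors far more heavily.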
B. Solving $P_q$
The optimization problem $P_q$ is not convex and is thus computationally challenging. Under certain conditions, it can be shown that a near minimizer can be obtained by solving the convex $P_1$ problem [17], [19]. However, one would not want to impose any a priori conditions on the sparsity of the Gabor coefficients of the source vectors. Without such conditions, only local optimization algorithms are available in the literature [19]. On the other hand, we demonstrate here that the $P_q$ problem with $0 < q < 1$ can be solved in combinatorial time whenever the mixing matrix $A$ is real.
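A two-dimensional example, added here for illustration, shows why the objective of $P_q$ is not convex for $0 < q < 1$. Take $u = (1, 0)$ and $v = (0, 1)$. Then

$$\|u\|_q = \|v\|_q = 1, \qquad \Big\| \tfrac{1}{2}u + \tfrac{1}{2}v \Big\|_q = \big( 2 \cdot 2^{-q} \big)^{1/q} = 2^{1/q - 1} > 1 = \tfrac{1}{2}\|u\|_q + \tfrac{1}{2}\|v\|_q .$$

For instance, with $q = 1/2$ the midpoint has measure 2, so the convexity inequality fails. For $q = 1$ the two sides are equal, consistent with $P_1$ being convex.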
Theorem 1: Let $A = [a_1 | a_2 | \ldots | a_n]$ be an $m \times n$ matrix with $n > m$, $A_{ij} \in \mathbb{R}$, whose column vectors $a_i$ have unit norm. Suppose $A$ is full rank. For $0 < q < 1$, the $P_q$ problem

$$\min_{s} \|s\|_q \quad \text{subject to} \quad As = x,$$

where $x \in \mathbb{R}^m$, has a solution $s^* = (s^*_1, \ldots, s^*_n)$ which has $k \le m$ nonzero components. Moreover, if the nonzero components of $s^*$ are $s^*_{i(j)}$, $j = 1, \ldots, k$, then the corresponding column vectors $\{a_{i(j)} : j = 1, \ldots, k\}$ of $A$ are linearly independent.

The proof of this theorem is long and is therefore omitted from this paper.

Theorem 1 renders the $P_q$ problem computationally tractable, as it shows that there are only a finite number of solutions of $P_q$, and suggests a combinatorial algorithm for solving $P_q$. More precisely, let $\mathcal{A}$ be the set of all $m \times m$ invertible submatrices of $A$ ($\mathcal{A}$ is nonempty as $A$ is full rank). The solution of $P_q$ will then be given by the solution of

$$\min \|B^{-1} x_B\|_q \quad \text{where } B \in \mathcal{A}. \tag{12}$$

Here, for $B = [a_{i(1)} \cdots a_{i(m)}]$, $x_B := [x_{i(1)} \cdots x_{i(m)}]$. Note that, since $\#\mathcal{A} \le \binom{n}{m}$, (12) is a combinatorial problem in the case when the mixing matrix $A$ and the mixture $x$ are real-valued.
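The combinatorial search suggested by (12) can be sketched as follows for a small real-valued problem. This is one possible reading, stated as an assumption: for each invertible $m \times m$ column submatrix $B$, the sketch solves $B s_B = x$, places the resulting coefficients at the selected indices with zeros elsewhere, and keeps the candidate with the smallest $\ell^q$ measure. The function name `lq_basis_pursuit` and the example matrix are illustrative only.

```python
import numpy as np
from itertools import combinations

def lq_basis_pursuit(A, x, q):
    """Search all m x m column submatrices of A as suggested by (12).

    For each invertible submatrix B, solve B s_B = x and keep the
    candidate with the smallest sum_i |s_i|^q (same minimizer as the
    l^q measure, since t -> t^(1/q) is increasing).
    """
    m, n = A.shape
    best_cost, best_s = np.inf, None
    for cols in combinations(range(n), m):
        cols = list(cols)
        B = A[:, cols]
        if np.linalg.matrix_rank(B) < m:       # skip singular submatrices
            continue
        coeffs = np.linalg.solve(B, x)
        cost = np.sum(np.abs(coeffs) ** q)
        if cost < best_cost:
            best_cost = cost
            best_s = np.zeros(n, dtype=coeffs.dtype)
            best_s[cols] = coeffs
    return best_s

# Hypothetical real-valued example: m = 2 mixtures, n = 4 sources.
angles = np.deg2rad([10.0, 50.0, 100.0, 150.0])
A = np.vstack([np.cos(angles), np.sin(angles)])   # unit-norm columns
s_true = np.array([0.0, 0.0, 1.5, 0.0])           # a single active source
x = A @ s_true
s_hat = lq_basis_pursuit(A, x, q=0.3)
```

In this example, with a single active source, `s_hat` matches `s_true` up to rounding. When several sources are active at the same point the returned vector still satisfies $As = x$ with at most $m$ nonzero entries, but it need not coincide with the generating vector. The search visits at most $\binom{n}{m}$ submatrices, so it is practical only for small $n$ and $m$.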
Though Theorem 1 does not hold in general when the matrix $A$ is complex (a counterexample and discussion can be found in [20]), the goal of finding the solution with the smallest $\ell^q$ norm is to impose sparsity. We thus propose to extract the sources using an $\ell^q$ basis pursuit approach, i.e., to find the best basis composed of a subset of columns of $A$ that minimizes the $\ell^q$ norm of the solution vector. As shown above, this is equivalent to solving the $P_q$ problem in the real-valued case. Moreover, in the complex-valued case, [20] demonstrated that for $q = 1$, the combinatorial solution is a good approximation of the true solution and can be obtained much faster.

Thus, the proposed separation algorithm can be summarized as follows. At each TF point $[k,l]$:
1) Construct the estimated mixing matrix $\tilde{A}[l]$ as in (9).
2) Solve the $\ell^q$ basis pursuit problem with $A = \tilde{A}[l]$ as described above for some $0 < q < 1$ to find the estimated source vector $\hat{\mathbf{s}}^e[k,l]$.
3) Repeat steps 1 and 2 for all TF points and then reconstruct $s^e(t)$, the time-domain estimate of the sources, from the estimated Gabor coefficients.

V. INTERFERENCE SUPPRESSION AND DISTORTION REDUCTION
A slight variation of the algorithm introduces a user-set parameter $\rho$ to refine the source estimates by increasing sparsity. At each TF point, the estimates of the smallest sources whose combined power contribution is less than $100(1-\rho)\%$ of the total power are set to zero. Thus, for $\rho = 0$ only the largest estimate is kept, and for $\rho = 1$ all estimates are kept. The idea here is to remove contributions due to noise or due to errors in estimating the mixing matrix.

VI. EXPERIMENTS AND RESULTS
To test the performance of the proposed algorithm, we use three measures defined in [21]: the Signal-to-Interference (SIR), Signal-to-Artifact (SAR) and Signal-to-Distortion (SDR) Ratios. SIR measures the amount of interference due to other sources present in a certain estimated source, while SAR measures artifacts due to algorithmic effects such as forced or unnatural zeros in the STFT of sources, and SDR is an aggregate measure of distortion in an extracted source relative to the original.

The importance of the algorithm being able to use all the available mixtures is highlighted by demixing 5 sources using a decreasing number of mixtures, starting with the even-determined case, and comparing the separation performance against that of DUET. To generate the 5 mixtures, a mixing model composed of random attenuation parameters $\sim U(0.5, 1)$ and delay parameters $\sim U(-1, 1)$ was employed. The proposed BSS algorithm was first applied using 5 available mixtures and the experiment was repeated using 4, 3 and finally 2 mixtures. Figure 2 shows the SDR, SIR and SAR resulting from separation as a function of $q$ varying from 0 to 1 in steps of 0.1. As expected, separation performance drops as the number of mixtures used drops. Notably, even in the case of 2 mixtures, our proposed algorithm outperforms DUET, which is designed to deal with exactly 2 mixtures.

[Figure 2: three panels, (a) average SDR, (b) average SIR and (c) average SAR for the estimated sources ($n = 5$, $\rho = 0.8$), each plotted against the quasi-norm parameter $q$ from 0 to 1 for $m = 5, 4, 3, 2$ mixtures and for DUET.]

Fig. 2. Average SDR, SIR and SAR (over the five sources) obtained from demixing various numbers of simulated anechoic mixtures of 5 sources as a function of the quasi-norm parameter $q$, with a preserved power parameter of 0.8. The horizontal line represents the results obtained using DUET. Across all results, the user estimates the existence of 6 sources.

In a set of experiments to further assess the performance of the algorithm, an anechoic room mixing model [22] was used. The simulated scenario involved 3 microphones and several sources placed in the room. The setup involved extracting 4 underlying sources from the 3 mixtures, and experiments were repeated 60 times by varying both the speech sources and their locations in the room. The results are presented in Figure 3 along with the results obtained using DUET, for comparison.

Next, to provide an example on a real mixture, we test the algorithm using the mixtures posted on [23], which have 2 sources and 2 microphones. The microphones are placed 35 cm apart, and the sources are placed 60° to the left of the microphones and 2 m on the mid-perpendicular of the microphones, respectively [23], [24]. Table 1 shows that the proposed algorithm outperforms that of [24], for which the audio separation results can be found at [23].

VII. CONCLUSION
This paper presents a novel BSS algorithm for demixing underdetermined, anechoic mixtures. The technique is capable of using all available mixtures, where both attenuations and arrival delays between sensors are considered. Moreover, the proposed technique improves the separation performance by incorporating $\ell^q$ minimization with $q < 1$ to enforce sparsity. By adopting a two-stage approach, the proposed method combines the strengths of $\ell^q$ minimization and DUET. In the blind mixing model recovery stage, feature vectors are constructed and used to extract the parameters of the mixing model via clustering. This is followed by a blind source extraction stage based on $\ell^q$ minimization, which performs the demixing at every TF point. Experimental results indicate that the proposed algorithm provides significant gains over other BSS techniques capable of using only two mixtures.

Table 1. Demixing performance (in dB) with 2 real mixtures of 2 sources, $\rho = 0.7$, $\hat{n} = 2$.

         SIR [24]   SIR (proposed)   SAR [24]   SAR (proposed)   SDR [24]   SDR (proposed)
$s_1$    26.232     40.7632          4.5363     7.4011           4.4967     7.3987
$s_2$    55.410     43.4322          5.6433     10.4101          5.6433     10.4077
mean     40.821     42.0977          5.0898     8.9056           5.0700     8.9032

[Figure 3: three panels, (a) average SDR, (b) average SIR and (c) average SAR for the estimated sources ($n = 4$, $m = 3$), each plotted against the quasi-norm parameter $q$ from 0 to 1 for $\rho = 1$, $\rho = 0.8$, $\rho = 0.6$ and for DUET.]

Fig. 3. Average SDR, SIR and SAR (over 4 sources in 60 experiments) obtained from demixing 3 mixtures when the user estimates the existence of 5 sources. Results are plotted as a function of $q$ for varying preserved power parameter. The horizontal line represents results obtained using DUET.

REFERENCES
[1] Ö. Yılmaz and S. Rickard, "Blind source separation of speech mixtures via time-frequency masking," IEEE Transactions on Signal Processing, vol. 52, no. 7, pp. 1830–1847, July 2004.
[2] C. Jutten, J. Herault, P. Comon, and E. Sorouchiary, "Blind separation of sources, parts I, II and III," Signal Processing, vol. 24, pp. 1–29, 1991.
[3] A. Bell and T. Sejnowski, "An information-maximization approach to blind separation and blind deconvolution," Neural Computation, vol. 7, pp. 1129–1159, 1995.
[4] M. Lewicki and T. Sejnowski, "Learning overcomplete representations," Neural Computation, vol. 12, pp. 337–365, 2000.
[5] T.-W. Lee, M. Lewicki, M. Girolami, and T. Sejnowski, "Blind source separation of more sources than mixtures using overcomplete representations," IEEE Signal Proc. Letters, vol. 6, no. 4, pp. 87–90, April 1999.
[6] A. Hyvarinen and E. Oja, "Independent component analysis: Algorithms and applications," Neural Networks, vol. 13, no. 4–5, pp. 411–430, 2000.
[7] P. Bofill and M. Zibulevsky, "Blind separation of more sources than mixtures using sparsity of their short-time Fourier transform," in International Workshop on Independent Component Analysis and Blind Signal Separation (ICA), Helsinki, Finland, June 19–22, 2000, pp. 87–92.
[8] M. Zibulevsky, B. Pearlmutter, P. Bofill, and P. Kisilev, "Blind source separation by sparse decomposition of a signal dictionary," in Independent Component Analysis: Principles and Practice, S. Roberts and R. Everson, Eds. Cambridge, 2001, ch. 7.
[9] Y. Li, A. Cichocki, S. Amari, S. Shishkin, J. Cao, and F. Gu, "Sparse representation and its applications in blind source separation," in Seventeenth Annual Conference on Neural Information Processing Systems (NIPS2003), Vancouver, Dec. 2003. [Online]. Available: http://www.bsp.brain.riken.jp/publications/2003/NIPS03LiCiAmShiCaoGu.pdf
[10] A. Jourjine, S. Rickard, and O. Yılmaz, "Blind separation of disjoint orthogonal signals: Demixing N sources from 2 mixtures," in Proc. ICASSP2000, Istanbul, Turkey, June 5–9, 2000.
[11] P. Bofill, "Underdetermined blind separation of delayed sound sources in the frequency domain," Neurocomputing, vol. 55, pp. 627–641, 2003.
[12] L. Vielva, D. Erdogmus, and J. Principe, "Underdetermined blind source separation using a probabilistic source sparsity model," in 2nd International Workshop on Independent Component Analysis and Blind Signal Separation, June 2000.
[13] D. Luengo, I. Santamaria, L. Vielva, and C. Pantaleh, "Underdetermined blind separation of sparse sources with instantaneous and convolutive mixtures," in IEEE XIII Workshop on Neural Networks for Signal Processing, 2003.
[14] P. O'Grady, B. Pearlmutter, and S. Rickard, "Survey of sparse and non-sparse methods in source separation," International Journal of Imaging Systems and Technology, vol. 15, no. 1, 2005.
[15] F. Theis and E. Lang, "Formalization of the two-step approach to overcomplete BSS," in Proc. 4th Intern. Conf. on Signal and Image Processing (SIP'02) (Hawaii), N. Younan, Ed., 2002.
[16] J. Cardoso, "Blind signal separation: Statistical principles," Proceedings of IEEE, Special Issue on Blind System Identification and Estimation, pp. 2009–2025, October 1998.
[17] D. Donoho and M. Elad, "Optimally sparse representation in general (nonorthogonal) dictionaries via $\ell^1$ minimization," Proc. Natl. Acad. Sci. USA, vol. 100, pp. 2197–2202, 2003.
[18] D. Donoho, "Sparse components of images and optimal atomic decompositions," Constructive Approximation, vol. 17, pp. 352–382, 2001.
[19] D. Malioutov, "A sparse signal reconstruction perspective for source localization with sensor arrays," Master's thesis, MIT, 2003.
[20] S. Winter, H. Sawada, and S. Makino, "On real and complex valued l1-norm minimization for overcomplete blind source separation," in IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, October 2005.
[21] R. Gribonval, L. Benaroya, E. Vincent, and C. Fevotte, "Proposal for performance measurement in source separation," in Proceedings of 4th International Symposium on Independent Component Analysis and Blind Signal Separation (ICA2003), April 2003, pp. 763–768.
[22] S. Rickard, "Personal communication," 2005.
[23] [Online]. Available: http://medi.unioldenburg.de/demo/demoseparation.html
[24] J. Anemuller and B. Kollmeier, "Adaptive separation of acoustic sources for anechoic conditions: a constrained frequency domain approach," Speech Commun., vol. 39, no. 1–2, pp. 79–95, 2003.