Semiparametric Approach to Multichannel 
Blind Deconvolution 
of Nonminimum Phase Systems 
L.-Q. Zhang, S. Amari and A. Cichocki 
Brain-style Information Systems Research Group, BSI 
The Institute of Physical and Chemical Research 
Wako shi, Saitama 351-0198, JAPAN 
zha@open.brain.riken.go.jp 
{ amari,cia } @brain.riken.go.jp 
Abstract 
In this paper we discuss the semiparametric statistical model for blind 
deconvolution. First we introduce a Lie Group to the manifold of non- 
causal FIR filters. Then blind deconvolution problem is formulated in 
the framework of a semiparametric model, and a family of estimating 
functions is derived for blind deconvolution. A natural gradient learn- 
ing algorithm is developed for training noncausal filters. Stability of the 
natural gradient algorithm is also analyzed in this framework. 
1 Introduction 
Recently blind separation/deconvolution has been recognized as an increasing important 
research area due to its rapidly growing applications in various fields, such as telecom- 
munication systems, image enhancement and biomedical signal processing. Refer to re- 
view papers [7] and [13] for details. A semiparametric statistical model treats a family 
of probability distributions specified by a finite-dimensional parameter of interest and an 
infinite-dimensional nuisance parameter [12]. Amari and Kumon [10] have proposed an 
approach to semiparametric statistical models in terms of estimating functions and eluci- 
dated their geometric structures and efficiencies by information geometry [ 1 ]. Blind source 
separation can be formulated in the framework of semiparametric statistical models. Amari 
and Cardoso [5] applied information geometry of estimating functions to blind source sep- 
aration and derived an admissible class of estimating functions which includes efficient 
estimators. They showed that the manifold of mixtures is m-curvature free, so that we 
can design algorithms of blind separation without taking much care of misspecification of 
source probability functions. 
The theory of estimating functions has also been applied to the case of instantaneous mix- 
tures, where independent source signals have unknown temporal correlations [3]. It is also 
applied to derive efficiency and superefficiency of demixing learning algorithms [4]. 
Most of these theories treat only blind source separation of instantaneous mixtures. It is 
only recently that the natural gradient approach has been proposed for multichannel blind 
364 L.-Q. Zhang, S. Amari and A. Cichocki 
deconvolution [8], [18]. The present paper extends the geometrical theory of estimating 
functions to the semiparametric model of multichannel blind deconvolution. For the limited 
space, the detailed derivations and proofs are left to a full paper. 
2 Blind Deconvolution Problem 
In this paper, as a convolutive mixing model, we consider a multichannel linear time- 
invariant (LTI) systems, with no poles on the unit circle, of the form 
x(k): E Hls(k- p)' (1) 
p:--oO 
where s(k) is an n-dimensional vector of source signals which are spatially mutu- 
ally independent and temporarily identically independently distributed, and x(k) is an 
n-dimensional sensor vector at time k, k = 1, 2,.  .. We denote the unknown mixing filter 
by H(z) = Y''lo=-oo Hp z-p. The goal of multichannel blind deconvolution is to retrieve 
source signals s(k) only using sensor signals x(k), k - 1, 2,..., and certain knowledge 
of the source signal distributions and statistics. We carry out blind deconvolution by using 
another multichannel LTI system of the form 
y(k) = W(z)x(k), (2) 
where W(z) = y''pN___ N Wpz -p, N is the length of FIR filter W(z), y(k) = 
[y (k),..., yn(k)] r is an n-dimensional vector of the outputs, which is used to estimate 
the source signals. 
When we apply W(z) to the sensor signal x(k), the global transfer function from s(k) 
to y(k) is defined by G(z) = W(z)H(z). The goal of the blind deconvolution task is 
to find W(z) such that G(z) = PAD(z), where P E R nx" is a permutation matrix, 
D(z) = diag{z-d, --. , z -d' }, and A E R "x' is a nonsingular diagonal scaling matrix. 
3 Lie Group on .M(N, N) 
In this section, we introduce a Lie group to the manifold of noncausal FIR filters. The Lie 
group operations play a crucial role in the following discussion. The set of all the noncausal 
FIR filters W(z) of length N, having the constraint that VV is nonsingular, is denoted by 
4(N,N) = {W(z) 
I W(z) = Wpz-, dot(w) # 0 , (3) 
p=-N 
where W is an N x N block matrix, 
W0 
W 
WN-1 
W-1 ''' W-N+I ] 
W 0 ... W_N+ 2 
WN-2 ''' W0 
(4) 
A//(N, N) is a manifold of dimension n2(2N + 1). In general, multiplication of two filters 
in A//(N, N) will enlarge the filter length and the result does belong to A//(N, N) anymore. 
This makes it difficult to introduce the Riemannian structure to the manifold of noncausal 
FIR filters. In order to explore possible geometrical structures of A//(N, N) which will 
lead to effective learning algorithms for W(z), we define algebraic operations of filters in 
the Lie group framework. First, we introduce a novel filter decomposition of noncausal 
filters in A//(N, N) into a product of two one-sided FIR filters [ 19], which is illustrated in 
Fig. 1. 
Blind Deconvolution of Nonminimum Phase Systems 365 
Unknown 
n(z) 
Mixing model 
R(z") 
Demixing model 
y(k) 
Figure 1: Illustration of decomposition of noncausal filters in ./4 (N, N) 
Lemma I 119] If the matrix 14] is nonsingular, any noncausal filter W(z) in JI(N,N) 
has the decomposition W(z) = R(z)L(z -x) where R(z) v 
, -- -p=oRpz-p, L(z -1) -- 
N 
Y]p=O LpzP are one-sided FIR filters. 
In the manifold J4 (N, N), Lie operations, multiplication  and inverse t, are defined 
as follows: For B(z), C(z) e .A4(N, N), 
B(z) * C(z) '- [B(z)C(Z)]N, B'(z) '- L  (z-1)l])'(z), 
(5) 
where [B(z)]v is the truncating operator that any terms with orders higher than N in the 
polynomial B(z) are truncated, and the inverse of one-side FIR filters is recurrently defined 
byP R0-, Rp p t --1 
---- '-- -- Zq=i Rp_qRqR , p = 1,..., N. Refer to [18] for the detailed 
derivation. With these operations, both B(z)  C(z) and B t (z) still remain in the manifold 
.M (N, N). It is easy to verify that the manifold .M (N, N) with the above operations forms 
a Lie Group. The identity element is E(z) = I. 
4 Semiparametric Approach to Blind Deconvolution 
We first introduce the basic theory of semiparametric models, and formulate blind decon- 
volution problem in the framework of the semiparametric models. 
4.1 Semiparametric model 
Consider a general statistical model {p(x; O, )}, where x is a random variable whose 
probability density function is specified by two parameters, 0 and , 0 being the param- 
eter of interest, and  being the nuisance parameter. When the nuisance parameter is of 
infinite dimensions or of functional degrees of freedom, the statistical model is called 
a semiparametric model [12]. The gradient vectors of the log likelihood u(a:; O, ) = 
O og v(a:;O,) 
0 log p(X;O,) V(X; O, ) O are called the score functions of the parameter 
O0 ' -- ' 
of interest or shortly O-score and the nuisance score or shortly -score, respectively. 
In the semiparametric model, it is difficult to estimate both the parameters of interest and 
nuisance parameters at the same time, since the nuisance parameter  is of infinite degrees 
of freedom. The semiparametric approach suggests to use an estimating function to es- 
timate the parameters of interest, regardless of the nuisance parameters. The estimating 
function is a vector function z(x, 0), independent of nuisance parameters , satisfying the 
following conditions 
1) Eo,[z(x,O)]=O, (6) 
_0z(x, 
2) det(lC)  O, where/C = Eo,f[ _0).]. (7) 
366 L.-Q. Zhang, S. Amari and A. Cichocki 
a) r(x, 0)] < 
(8) 
for all 0 and . Generally speaking, it is difficult to find an estimating function. Amari 
and Kawanabe [9] studied the information geometry of estimating functions and provided 
a novel approach to find all the estimating functions. In this paper, we follow the approach 
to find a family of estimating functions for bind deconvolution. 
4.2 Semiparametric Formulation for Blind Deconvolution 
Now we turn to formulate the blind deconvolution problem in the framework of semipara- 
metric models. From the statistical point of view, the blind deconvolution problem is to 
estimate H(z) or H-(z) from the observed data DL = {x(k), k = 1,2,...}. The es- 
timate includes two unknowns: One is the mixing filter H(z) which is the parameter of 
interest, and the other is the probability density function p(s) of sources, which is the nui- 
sance parameter in the present case. Fo blind deconvolution problem, we usually assume 
that source signals are zero-mean, E[$il '- 0, for i = 1,..., n. In addition, we generally 
impose constraints on the recovered signals to remove the indeterminacy, 
E[ki(si)] = 0, for i = 1,...,n. (9) 
4 _ 1. Since the source signals are spatially 
A typical example of the constraint is ki ($i) '- $i 
mutually independent and temporally iid, the pdf r(s) can be factorized into a product form 
/'(S) : I-Iin=i I'($i). The purpose of this paper is to find a family of estimating functions 
for blind deconvolution. Remarkable progress has been made recently in the theory of the 
semiparametric approach [9],[12]. It has been shown that the efficient score itself is an 
estimating function for blind separation. 
5 Estimating Functions 
In this section, we give an explicit form of the score function matrix of interest and the 
nuisance tangent space, by using a local nonholonomic reparameterization. We then derive 
a family of estimating functions from it. 
5.1 Score function matrix and its representation 
Since the mixing model is a matrix filter, we write an estimating function in the same matrix 
filter format 
N 
F(x;H(z))-- E Flvx;U(z))z-P' (10) 
p=--N 
where Fp(x; H(z)) are n x n-matrices. In order to derive the explicit form of the H-score, 
we reparameterize the filter in a small neighborhood of H (z) by using a new variable matrix 
filter as H(z)  (I - X(z)), where I is the identity element of the manifold JM(N, N). 
The variation X(z) represents a local coordinate system at the neighborhood ./Vii of H(z) 
on the manifold JM(N, N). The variation dH(z) of H(z) is represented as dH(z) = 
-H(z)  dX(z). Letting W(z) = Ht(z), we have 
= 0arV(z)  wt(z), (11) 
which is a nonholonomic differential variable [6] since (11) is not integrable. With this 
representation of the parameters, we can obtain learning algorithms having the equivariant 
property [14] since the deviation dX(z) is independent of a specific H(z). The relative or 
the natural gradient of a cost function on the manifold can be automatically derived from 
this representation [2], [ 14], [ 18]. 
Blind Deconvolution of Nonminimum Phase Systems 367 
u i 
Figure 2: Illustration of orthogonal decomposition of score functions 
The derivative of any cost function/(H(z)) with respect to a noncausal filter X(z) = 
v X z-V is defined by 
p=--N P 
ot(}I(z)) N Ot(H(z)) 
0x() - 
z -p (12) 
Now we can easily calculate the score function matrix of noncausal filter X(z), 
Ologp(x;H(z),r) v 
OX(z) = E qo(y)yT(k-- P)z-P' 
(13) 
where p(y) = (qoi(ys),-.-, qon(Yn)) T, Ti(Yi) '- dlogr,(y0 and y -- Ht(z)x. 
dy ' 
5.2 Efficient scores 
The efficient scores, denoted by UZ(s; H(z), r), can be obtained by projecting the score 
function to the space orthogonal to the nuisance tangent space v 
T(z),r, which is illustrated 
in figure 2. In this section, we give an explicit form of the efficient scores for blind decon- 
volution. 
Lemma 2 [5] The tangent nuisance space N 
7"I'(z),r is a linear space spanned by the nui- 
sance score functions, denoted by N 
Tl(z),r = {-'=l cii(si)}, where ci are coefficients, 
and as ( ss ) are arbitrary functions, satisfying the following conditions 
[,s(ss) ] < oo, [a,,(ss)] = o, [(s),(s)] = o. (14) 
We rewrite the score function (13) into the form U(s; H(z), r) N 
= Y.p=-v Up z-p, where 
Up - (T(ss(k))sj(k - P))nxn. 
Lemma 3 The off-diagonal elements uo,o (s; H(z), r), i  j, and the delay elements 
Up,O(s; H(z), r), p  O, of the score functions are orthogonal to the nuisance tangent 
space r 
Ti(:),,.. 
Lemma 4 The projection of uo,si to the space orthogonal to the nuisance tangent space 
N 
7'(z),r is ofthe form w(si) '- co + clss + c2k(si), where cs are any constants. 
368 L.-Q. Zhang, S. Amari and A. Cichocki 
In summary we have the following theorem 
Theorem 1 The efficient score, U m (s; H(z), r) N 
: p=-l Up Ez-p, is given by 
U = o(s)sr(k-p), forp0; (15) 
U0  = { qo(s)s r, for off diagonal elements, 
co + Cl S + c2 k ( s ) , for diagonal elements. (16) 
For the instantaneous mixture case, it has been proven [9] that the semiparametric model 
for blind separation is information m-curvature free. This is also true in the multichannel 
blind deconvolution case. As a result, the efficient score function is an estimating function 
for blind deconvolution. Using this result, we easily derive a family of estimating functions 
for blind deconvolution 
N 
F(x(k);W(z)) =  qo(y(k))y(k - p)Tg-p _ I, (17) 
p:-N 
where y(k) = W(z)x(k), and qo is a given function vector. The estimating function is the 
efficient score function, when co = Cl = 0, c2 = I and ki(si) = i(si)si - 1. 
6 Natural Gradient Learning and its Stability 
Ordinary stochastic gradient methods for parameterized systems suffer from slow conver- 
gence due to the statistical correlations of the processes signals. While quasi-Newton and 
related methods can be used to improve convergence, they also suffer from the mass com- 
putation and numerical instability, as well as local convergence. 
The natural gradient approach was developed to overcome the drawback of the ordinary 
gradient algorithm in the Riemannian spaces [2, 8, 15]. It has been proven that the natural 
gradient algorithm is an efficient algorithm in blind separation and blind deconvolution [2]. 
The efficient score function ( the estimating function ) gives an efficient search direction 
for updating filter X(z). Therefore, the updating rule for X(z) is described by 
Xk+l(Z) - Xk(z) - r/F(x(k), We(z)), (18) 
where r/is a learning rate. Since the new parameterization X(z) is defined by a nonholo- 
nomic transformation dX(z) = dW(z)  W t (z), the deviation of W(z) is given by 
AW(z) -- AX(z)  W(z). (19) 
Hence, the natural gradient learning algorithm for W(z) is described as 
Wk+ 1 (Z) -- Wk(z ) --]F(x(k), Wk (z)), Wk(Z), (20) 
where F(x, W(z)) is an estimating function in the form (17). The stability of the algorithm 
(20) is equivalent to the one of algorithm (18). Consider the averaged version of algorithm 
(18) 
/xx(z) = -vE[r(x(k), (21) 
Analyzing the variational equation of the above equation and using the mutual indepen- 
dence and i.i.d. properties of source signals, we derive the stability conditions of learning 
algorithm (21) at vicinity of the true solution 
2 2 
mi q- 1 > O, ni > O, ija i aj > 1, (22) 
, E ' 2 
fori,j = 1,...,n, wheremi: E(qo'(yi(k))y(k)] ti = [i(Yi)], ai = E[Iy12]. 
Therefore, we have the following theorem: 
Theorem 2 If the conditions (22) are satisfied, then the natural gradient learning algo- 
rithm (20) is locally stable. 
Blind Deconvolution of Nonminimum Phase Systems 369 
References 
[11 
[21 
[31 
[41 
[51 
[6] 
[71 
[8] 
[91 
[10] 
[11] 
[12] 
[13] 
[14] 
[15] 
[16] 
[17] 
[18] 
[19] 
S. Amari. Differential-geometrical methods in statistics, Lecture Notes in Statistics, 
volume 28. Springer, Berlin, 1985. 
S. Amari. Natural gradient works efficiently in learning. Neural Computation, 
10:251-276, 1998. 
S. Amari. ICA of temporally correlated signals - Learning algorithm. In Proceeding 
of 1st Inter. Workshop on Independent Component Analysis and Signal Separation, 
pages 37-42, Aussois, France, January, 11-15 1999. 
S. Amari. Superefficiency in blind source separation. IEEE Trans. on Signal Process- 
ing, 47(4):936-944, April 1999. 
S. Amari and J.-F. Cardoso. Blind source separation- semiparametric statistical ap- 
proach. IEEE Trans. Signal Processing, 45:2692-2700, Nov. 1997. 
S. Amari, T. Chen, and A. Cichocki. Nonholonomic orthogonal constraints in blind 
source separation. Neural Cornput., to be published. 
S. Amari and A. Cichocki. Adaptive blind signal processing- neural network ap- 
proaches. Proceedings of the IEEE, 86(10):2026-2048, 1998. 
S. Amari, S. Douglas, A. Cichocki, and H. Yang. Multichannel blind deconvolution 
and equalization using the natural gradient. In Proc. IEEE Workshop on Signal Pro- 
cessing Adv. in Wireless Communications, pages 101-104, Paris, France, April 1997. 
S. Amari and M. Kawanabe. Estimating functions in semiparametric statistical mod- 
els. In I. V. Basawa, V.P. Godambe, and R.L. Taylor, editors, Estimating Functions, 
volume 32 of Monograph Series, pages 65-81. IMS, 1998. 
S. Amari and M. Kumon. Estimation in the presence of infinitely many nuisance 
parameters in semiparametric statistical models. Ann. Statistics, 16:1044-1068, 1988. 
A.J. Bell and T.J. Sejnowski. An information maximization approach to blind sepa- 
ration and blind deconvolution. Neural Computation, 7:1129-1159, 1995. 
P. Bickel, C. Klaassen, Y. Ritov, and J. Wellner. Efficient and Adaptive Estimation 
for Semiparametric Models. The Johns Hopkins Univ. Press, Baltimore and London, 
1993. 
J.-F Cardoso. Blind signal separation: Statistical principles. Proceedings of the IEEE, 
86(10):2009-2025, 1998. 
J.-F. Cardoso and B. Laheld. Equivariant adaptive source separation. IEEE Trans. 
Signal Processing, SP-43:3017-3029, Dec 1996. 
A. Cichocki and R. Unbehauen. Robust neural networks with on-line learning for 
blind identification and blind separation of sources. IEEE Trans Circuits and Systems 
I: Fundamentals Theory and Applications, 43(11): 894-906, 1996. 
L. Tong, R.W. Liu, V.C. Soon, and Y.F. Huang. Indeterminacy and identifiability of 
blind identification. IEEE Trans. Circuits, Syst., 38(5):499-509, May 1991. 
H. Yang and S. Amari. Adaptive on-line learning algorithms for blind separation: 
Maximum entropy and minimal mutual information. Neural Cornput., 9:1457-1482, 
1997. 
L. Zhang, A. Cichocki, and S. Amari. Geometrical structures of FIR manifold and 
their application to multichannel blind deconvolution. In Proceeding of NNSP'99, 
pages 303-312, Madison, Wisconsin, August 23-25 1999. 
L. Zhang, A. Cichocki, and S. Amari. Multichannel blind deconvolution of non- 
minimum phase systems using information backpropagation. In Proceedings of the 
Fifth International Conference on Neural Information Processing(ICONIP'99), page 
210-216, Perth, Australia, Nov. 16-20 1999. 
