Factorizing Multivariate Function Classes 
Juan K. Lin* 
Department of Physics 
University of Chicago 
Chicago, IL 60637 
Abstract 
The mathematical framework for factorizing equivalence classes of 
multivariate functions is formulated in this paper. Independent 
component analysis is shown to be a special case of this decompo- 
sition. Using only the local geometric structure of a class repre- 
sentative, we derive an analytic solution for the factorization. We 
demonstrate the factorization solution with numerical experiments 
and present a preliminary tie to decorrelation. 
1 FORMALISM
In independent component analysis (ICA), the goal is to find an unknown linear 
coordinate system where the joint distribution function admits a factorization into 
the product of one dimensional functions. However, this decomposition is only 
rarely possible. To formalize the notion of multivariate function factorization, we 
begin by defining an equivalence relation. 
Definition. We say that two functions $f, g: \mathbb{R}^n \to \mathbb{R}$ are equivalent if there exist
$A$, $\vec{b}$ and $c$ such that $f(\vec{x}) = c\,g(A\vec{x} + \vec{b})$, where $A$ is a non-singular matrix and $c \neq 0$.
Thus, the equivalence class of a function consists of all invertible linear transfor- 
mations of it. To avoid confusion, equivalence classes will be denoted in upper 
case, and class representatives in lower case. We now define the product of two 
equivalence classes. Consider representatives $b: \mathbb{R}^n \to \mathbb{R}$ and $c: \mathbb{R}^m \to \mathbb{R}$ of
corresponding equivalence classes B and C. Let $\vec{x} \in \mathbb{R}^n$, $\vec{y} \in \mathbb{R}^m$, and $\vec{z} = (\vec{x}, \vec{y})$.
From the scalar product of the two functions, define the function $a: \mathbb{R}^{n+m} \to \mathbb{R}$
by $a(\vec{z}) = b(\vec{x})\,c(\vec{y})$. Let the product of B and C be the equivalence class A with
* Current address: E25-201, MIT, Cambridge, MA 02139. Email: jklin@ai.mit.edu 
representative a(). This product is independent of the choice of representatives of 
B and C, and hence is a well defined operation on equivalence classes. We proceed 
to define the notion of an irreducible class. 
Definition. Denote the equivalence class of constants by I. We say that A is
irreducible if $A = BC$ implies either $B = A$, $C = I$, or $B = I$, $C = A$.
From the way products of equivalence classes are defined, we know that all equiv- 
alence classes of one dimensional functions are irreducible. Our formulation of the 
factorization of multivariate function classes is now complete. Given a multivariate 
function, we seek a factorization of the equivalence class of the given representative 
into a product of irreducibles. Intuitively, in the context of joint distribution func- 
tions, the irreducible classes constitute the underlying sources. This factorization 
generalizes independent component analysis to allow for higher dimensional "vec- 
tor" sources. Consequently, this decomposition is well-defined for all multivariate 
function classes. We now present a local geometric approach to accomplishing this 
factorization. 
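As a numerical illustration of the class product defined in this section, the following minimal sketch checks that choosing different representatives of B and C yields a product function in the same equivalence class. The functions `b`, `c` and all matrices, offsets and scale factors are hypothetical choices made only for this example.

```python
import numpy as np

# Hypothetical representatives of two classes: B on R^2 and C on R^1.
def b(x):
    return np.exp(-x[0] ** 2) * (1.0 + x[1] ** 2)

def c(y):
    return 2.0 + np.sin(y[0])

def a(z):
    # Representative of the product class: a(z) = b(x) c(y), with z = (x, y).
    return b(z[:2]) * c(z[2:])

# Other representatives of the same classes: f -> c0 * f(A0 x + t0).
A1, t1, c1 = np.array([[1.0, 0.5], [-0.3, 2.0]]), np.array([0.2, -0.1]), 3.0
A2, t2, c2 = np.array([[0.7]]), np.array([0.4]), -0.5

def b_alt(x):
    return c1 * b(A1 @ x + t1)

def c_alt(y):
    return c2 * c(A2 @ y + t2)

def a_alt(z):
    return b_alt(z[:2]) * c_alt(z[2:])

# a_alt should equal (c1*c2) * a(A z + t) with A block diagonal and non-singular,
# which places it in the equivalence class of a.
A = np.zeros((3, 3))
A[:2, :2], A[2:, 2:] = A1, A2
t = np.concatenate([t1, t2])

rng = np.random.default_rng(1)
z_samples = rng.normal(size=(10, 3))
max_diff = max(abs(a_alt(z) - c1 * c2 * a(A @ z + t)) for z in z_samples)
print(max_diff, np.linalg.det(A))
```

The block-diagonal form of `A` is what makes the product well defined: each factor is transformed only within its own subspace.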
2 LOCAL GEOMETRIC INFORMATION 
Given that the joint distribution factorizes into a product in the "source" coordinate 
system, what information can be extracted locally from the joint distribution in a 
"mixed" coordinate frame? We assume that the relevant multivariate function is 
twice differentiable in the region of interest, and denote $H^f$, the Hessian of f, to be
the matrix with elements $H^f_{ij} = \partial_i \partial_j f$, where $\partial_k = \partial / \partial s_k$.
Proposition: $H^f$ is block diagonal everywhere, $\partial_i \partial_j f|_{\vec{s}_0} = 0$ for all points $\vec{s}_0$
and all $i \le k$, $j > k$, if and only if f is separable into a sum $f(s_1, \ldots, s_n) =
g(s_1, \ldots, s_k) + h(s_{k+1}, \ldots, s_n)$ for some functions g and h.
Proof- Sciency: 
Given f(Sl,... ,s) = g(sl,...,s) + h(S+l,... ,s), 
Of _ 00h(s+,...,s,) =0 
OsiOs Osi Osj 
everywhere for all i  k, j > k. 
Necessity:
From $H^f_{1n} = 0$, we can decompose f into
$$f(s_1, s_2, \ldots, s_n) = \tilde{g}(s_1, \ldots, s_{n-1}) + \tilde{h}(s_2, \ldots, s_n),$$
for some functions $\tilde{g}$ and $\tilde{h}$. Continuing by imposing the constraints $H^f_{1j} = 0$ for all
$j > k$, we find
$$f(s_1, s_2, \ldots, s_n) = \tilde{g}(s_1, \ldots, s_k) + \tilde{h}(s_2, \ldots, s_n).$$
Combining with $H^f_{2j} = 0$ for all $j > k$ yields
$$f(s_1, s_2, \ldots, s_n) = \tilde{g}(s_1, \ldots, s_k) + \tilde{h}(s_3, \ldots, s_n).$$
Finally, inducting on i, from the constraints $H^f_{ij} = 0$ for all $i \le k$ and $j > k$, we
arrive at the desired functional form
$$f(s_1, s_2, \ldots, s_n) = g(s_1, \ldots, s_k) + h(s_{k+1}, \ldots, s_n).$$
More explicitly, a twice-differentiable function satisfies the set of coupled partial 
differential equations represented by the block diagonal structure of H if and only 
if it admits the corresponding separation of variables decomposition. By letting 
$\log p = f$, the additive decomposition of f translates to a product decomposition of
p. The more general decomposition into an arbitrary number of factors is obtained 
by iterative application of the above proposition. The special case of independent 
component analysis corresponds to a strict diagonalization of H. Thus, in the 
context of smooth joint distribution functions, pairwise conditional independence is 
necessary and sufficient for statistical independence. 
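To use this information in a transformed "mixture" frame, we must understand how
the matrix $H^{\log p}$ transforms. From the relation between the mixture and source
coordinate systems given by $\vec{x} = A\vec{s}$, we have $\frac{\partial}{\partial s_i} = A_{ji} \frac{\partial}{\partial x_j}$, where we use
Einstein's convention of summation over repeated indices. From the relation between
the joint distributions in the mixture and source frames, $p_s(\vec{s}) = |A|\,p_x(\vec{x})$, direct
differentiation gives
$$\frac{\partial^2 \log p_s(\vec{s})}{\partial s_i \partial s_l} = A_{ji} A_{kl} \frac{\partial^2 \log p_x(\vec{x})}{\partial x_j \partial x_k}.$$
Letting $H_{ij} = \frac{\partial^2 \log p_s(\vec{s})}{\partial s_i \partial s_j}$ and $\tilde{H}_{ij} = \frac{\partial^2 \log p_x(\vec{x})}{\partial x_i \partial x_j}$, in matrix notation we have
$$H = A^T \tilde{H} A.$$
In other words, H is a second rank (symmetric) covariant tensor.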
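The covariant transformation law, written in matrix form as $H = A^T \tilde{H} A$ with $\tilde{H}$ the Hessian of the log-density in the mixture frame, can be verified by finite differences. In this sketch the mixing matrix and the evaluation point are hypothetical, and the source density $(2 + \sin s_1)(2 + \sin s_2)$ is borrowed from Section 3.2 purely as a convenient example.

```python
import numpy as np

def hessian(f, z, eps=1e-4):
    """Central finite-difference Hessian of a scalar function f at point z."""
    z = np.asarray(z, dtype=float)
    n = z.size
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei = np.zeros(n)
            ej = np.zeros(n)
            ei[i] = eps
            ej[j] = eps
            H[i, j] = (f(z + ei + ej) - f(z + ei - ej)
                       - f(z - ei + ej) + f(z - ei - ej)) / (4 * eps ** 2)
    return H

A = np.array([[1.0, 0.4], [-0.7, 1.0]])  # hypothetical mixing matrix, x = A s
A_inv = np.linalg.inv(A)

def log_ps(s):
    # Log of the factorized source density (normalization omitted).
    return np.log(2 + np.sin(s[0])) + np.log(2 + np.sin(s[1]))

def log_px(x):
    # Mixture-frame log-density: p_x(x) = p_s(A^{-1} x) / |det A|.
    return log_ps(A_inv @ x) - np.log(abs(np.linalg.det(A)))

s0 = np.array([0.3, -1.1])
x0 = A @ s0
H = hessian(log_ps, s0)          # source frame: diagonal
H_tilde = hessian(log_px, x0)    # mixture frame: generally full
residual = np.abs(H - A.T @ H_tilde @ A).max()
print(H, residual)
```

The constant $\log|\det A|$ drops out on differentiation, which is why normalization plays no role in the constraints.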
The joint distribution admits a product decomposition in the source frame if and
only if H and hence $A^T \tilde{H} A$ has the corresponding block diagonal structure. Thus
multivariate function class factorization is solved by joint block diagonalization of
symmetric matrices, with constraints on A of the form $A_{ji} \tilde{H}_{jk} A_{kl} = 0$.
Because the Hessian is symmetric, its diagonalization involves only (n choose 2) 
constraints. Consequently, in the independent component analysis case where the 
joint distribution function admits a factorization into one dimensional functions, 
if the mixing transformation is orthogonal, the independent component coordinate 
system will lie along the eigenvector directions of $\tilde{H}$. Generally however, $n(n-1)$
independent constraints corresponding to information from the Hessian at two
points are needed to determine the n arbitrary coordinate directions. 
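The orthogonal special case can be illustrated directly: for orthogonal A the mixture-frame Hessian is $A H A^T$ with H diagonal, so its eigenvectors are the columns of A whenever the diagonal entries of H are distinct. The rotation angle, the evaluation point and the source density $(2 + \sin s_1)(2 + \sin s_2)$ below are assumptions chosen for this sketch.

```python
import numpy as np

def d2_log_source(s):
    """Second derivative of log(2 + sin(s)), computed analytically."""
    return -(2 * np.sin(s) + 1) / (2 + np.sin(s)) ** 2

theta = 0.6  # hypothetical rotation angle of the orthogonal mixing
A = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

s0 = np.array([0.3, -1.1])
H = np.diag(d2_log_source(s0))   # source-frame Hessian of log p (diagonal)
H_tilde = A @ H @ A.T            # mixture-frame Hessian for orthogonal A

eigvals, eigvecs = np.linalg.eigh(H_tilde)

# |overlap| between each eigenvector and each column of A:
# a permutation matrix means the directions coincide up to sign and order.
overlaps = np.abs(eigvecs.T @ A)
print(np.round(overlaps, 6))
```

A single point suffices here because orthogonality removes half of the unknowns; for a general mixing matrix the eigenvectors of one Hessian do not recover the source directions.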
3 NUMERICAL EXPERIMENTS 
In the simplest attack on the factorization problem, we solve the constraint equations
from two points simultaneously. The analytic solution is demonstrated in two
dimensions. Without loss of generality, the mixing matrix A is taken to be of the
form
$$A = \begin{pmatrix} 1 & x \\ y & 1 \end{pmatrix}.$$
The constraints from the two points are: $ax + b(xy + 1) + cy = 0$, and
$a'x + b'(xy + 1) + c'y = 0$, where $\tilde{H}_{11} = a$, $\tilde{H}_{21} = \tilde{H}_{12} = b$ and $\tilde{H}_{22} = c$ at the
first point, and the primed coefficients denote the values at the second point.
Solving the simultaneous quadratic equations, we find
$$x = \frac{a'c - ac' \pm \sqrt{(a'c - ac')^2 - 4(a'b - ab')(b'c - bc')}}{2(ab' - a'b)},$$
$$y = \frac{a'c - ac' \pm \sqrt{(a'c - ac')^2 - 4(a'b - ab')(b'c - bc')}}{2(bc' - b'c)},$$
where the same sign is taken in both expressions.
The $\pm$ double roots are indicative of the $(x, y) \to (1/y, 1/x)$ symmetry in the equations,
and together only give two distinct orientation solutions. These independent
component orientation solutions are given by $\theta_1 = \tan^{-1}(1/x)$ and $\theta_2 = \tan^{-1}(y)$.
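A minimal sketch of the two-point solution follows, assuming the parametrization $A = \begin{pmatrix} 1 & x \\ y & 1 \end{pmatrix}$ that reproduces the constraint $ax + b(xy+1) + cy = 0$. Eliminating the $(xy + 1)$ term between the two constraints gives a linear relation between x and y, which reduces the system to one quadratic in x. The function name `two_point_orientations`, the test values `x_true`, `y_true`, and the source density $(2 + \sin s_1)(2 + \sin s_2)$ are assumptions for this example.

```python
import numpy as np

def two_point_orientations(Ht1, Ht2):
    """Orientation pair from mixture-frame Hessians of log p at two points."""
    a, b, c = Ht1[0, 0], Ht1[0, 1], Ht1[1, 1]
    ap, bp, cp = Ht2[0, 0], Ht2[0, 1], Ht2[1, 1]
    alpha = ap * b - a * bp
    beta = bp * c - b * cp
    gamma = ap * c - a * cp
    disc = gamma ** 2 - 4.0 * alpha * beta
    if alpha == 0.0 or beta == 0.0 or disc < 0.0:
        return None  # this pair of points gives no usable real solution
    # One root of alpha*x^2 + gamma*x + beta = 0; the other root is (1/y, 1/x)
    # and yields the same unordered pair of orientations.
    x = (-gamma + np.sqrt(disc)) / (2.0 * alpha)
    y = alpha * x / beta  # linear relation from eliminating the (xy + 1) term
    return np.sort([np.arctan(1.0 / x), np.arctan(y)])

def d2_log_source(s):
    """Second derivative of log(2 + sin(s))."""
    return -(2 * np.sin(s) + 1) / (2 + np.sin(s)) ** 2

# Hypothetical ground truth used to build exact mixture-frame Hessians.
x_true, y_true = 0.5, -0.8
A = np.array([[1.0, x_true], [y_true, 1.0]])
A_inv = np.linalg.inv(A)
source_points = [np.array([0.3, -1.1]), np.array([1.2, 0.5])]
Ht = [A_inv.T @ np.diag(d2_log_source(s)) @ A_inv for s in source_points]

theta = two_point_orientations(Ht[0], Ht[1])
theta_true = np.sort([np.arctan(1.0 / x_true), np.arctan(y_true)])
print(theta, theta_true)
```

Sorting the pair removes the ambiguity between the two roots, since they differ only by exchanging the two orientations.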
3.1 Natural Audio Sources 
To demonstrate the analytic factorization solution, we present some proof of concept 
numerics. Generality is pursued over optimization concerns. First, we perform the 
standard separation of two linearly mixed natural audio sources. The input dataset 
consists of 32000 un-ordered datapoints, since no use will be made of the temporal 
information. The process for obtaining estimates of the Hessian matrix $\tilde{H}$ is as
follows. A histogram of the input distribution was first acquired and smoothed by a
low-pass Gaussian mask in spatial-frequency space. The elements of $\tilde{H}$ were then
obtained via convolution with a discrete approximation of the derivative operator.
The width of the Gaussian mask and the support of the derivative operator were
chosen to reduce sensitivity to low spatial-frequency uncertainty. It should be noted
that the analytic factorization solution makes no assumptions about the mixing
transformation; consequently, a blind determination of the smoothing length scale
is not possible because of the multiplicative degree of freedom in each source.
Because of the need to take the logarithm of p before differentiation, or equivalently 
to divide by p afterwards, we set a threshold and only extracted information from 
points where the number of counts was greater than threshold. This is justified from 
a counting uncertainty perspective, and also from the understanding that regions 
with vanishing probability measure contain no information. 
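The estimation procedure described above might be sketched as follows. This is not the implementation used for the reported results: the bin count, the smoothing width `sigma_bins`, the use of `np.gradient` as the discrete derivative operator, and the Gaussian test data are all assumptions. The Hessian of $\log p$ is formed by dividing by p after differentiation, $\partial_i \partial_j \log p = \partial_i \partial_j p / p - \partial_i p\, \partial_j p / p^2$, and only bins above the count threshold are kept.

```python
import numpy as np

def estimate_log_hessian(samples, bins=64, sigma_bins=2.0, threshold=30):
    """Estimate the Hessian of log p at well-populated histogram bins (2-D data)."""
    counts, xe, ye = np.histogram2d(samples[:, 0], samples[:, 1], bins=bins)

    # Low-pass Gaussian mask applied in spatial-frequency space.
    # Note: FFT-based smoothing is periodic, so mass can wrap around the edges.
    fx = np.fft.fftfreq(bins)[:, None]   # cycles per bin along x
    fy = np.fft.fftfreq(bins)[None, :]   # cycles per bin along y
    mask = np.exp(-2.0 * (np.pi * sigma_bins) ** 2 * (fx ** 2 + fy ** 2))
    p = np.real(np.fft.ifft2(np.fft.fft2(counts) * mask))

    # Discrete first and second derivatives of the smoothed histogram.
    dx, dy = xe[1] - xe[0], ye[1] - ye[0]
    p_x, p_y = np.gradient(p, dx, dy)
    p_xx, p_xy = np.gradient(p_x, dx, dy)
    p_yy = np.gradient(p_y, dx, dy)[1]

    # Keep only bins whose raw count exceeds the threshold.
    keep = counts > threshold
    pk = p[keep]
    H = np.empty((pk.size, 2, 2))
    H[:, 0, 0] = p_xx[keep] / pk - (p_x[keep] / pk) ** 2
    H[:, 1, 1] = p_yy[keep] / pk - (p_y[keep] / pk) ** 2
    H[:, 0, 1] = H[:, 1, 0] = p_xy[keep] / pk - p_x[keep] * p_y[keep] / pk ** 2

    ix, iy = np.nonzero(keep)
    xc, yc = 0.5 * (xe[:-1] + xe[1:]), 0.5 * (ye[:-1] + ye[1:])
    centers = np.stack([xc[ix], yc[iy]], axis=1)
    return centers, H

# Hypothetical test data: 32000 samples from a standard 2-D Gaussian.
rng = np.random.default_rng(0)
samples = rng.standard_normal((32000, 2))
centers, H = estimate_log_hessian(samples)
print(H.shape, np.median(H[:, 0, 0]), np.median(H[:, 1, 1]))
```

For Gaussian data the true Hessian of $\log p$ is the negative inverse covariance, so the diagonal estimates should come out negative, slightly reduced in magnitude by the smoothing.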
With our sample of 32000 datapoints, we considered only the bin-points with a 
corresponding bin count greater than 30. From the 394 bin locations that satisfied 
this constraint, the solutions $(\theta_1, \theta_2)$ for all (394 choose 2) $= 394 \cdot 393 / 2$ pairs of
the corresponding factorization equations are plotted in Fig. 1. A histogram of these
solutions is shown in Fig. 2. The two peaks in the solution histogram correspond
to orientations that differ from the two actual independent component orientations
by 0.008 and 0.013 radians. The signal to mixture ratios of the two outputs generated
from the solution are 158 and 49.
3.2 Effect of Noise 
Because the solution is analytic, uncertainty in the sampling just propagates through 
to the solution, giving rise to a finite width in the solution's distribution. We 
investigated the effect of noise and counting uncertainty by performing numerics 
starting from analytic forms for the source distributions. The joint distribution in 
the source frame was taken to be: 
ps(s,s2) -- (2 + sin(s))  (2 + sin(s2)). 
Normalization is irrelevant since a function's decomposition into product form is 
preserved in scalar multiplication. This is also reflected in the equivalence between 
H lgp and H lgcp for c an arbitrary positive constant. The joint distribution in 
the mixture frame was obtained from the relation pz (3) - IAl-ps(s-'*). To simulate 
Figure 1: Scatterplot of the independent component orientation solutions. All
unordered solution pairs $(\theta_1, \theta_2)$ are plotted. The solutions are taken in the range
from $-\pi/2$ to $\pi/2$.
Figure 2: Histogram of the orientation solutions plotted in the previous figure.
The range is still taken from $-\pi/2$ to $\pi/2$, with the histogram wrapped around
to ease the circular identification. The mixing matrix used was: $a_{11} = 0.0514$,
$a_{21} = 0.779$, $a_{12} = 0.930$, $a_{22} = -0.579$, giving independent component orientations
at -0.557 and 1.505 radians. Gaussian fits to the two solution peaks
give $-0.570 \pm 0.066$ and $1.513 \pm 0.077$ radians for the two orientations.
sampling, Px() was multiplied with the number of samples M, onto which was 
added Gaussian distributed noise with amplitude given by the (M pz())/2. This 
reflects the fact that counting uncertainty scales as the square root of the number 
of counts. The result was rounded to the nearest integer, with all negative count 
values set to zero. The subsequent processing coincided with that for natural audio 
sources. From the source distribution equation above, the minimum number of 
expected counts is M, and the maximum is 9M. The results in Figures 3 and 4 
show that, as expected, increasing the number of samplings decreases the widths 
of the solution peaks. By fitting Gaussians to the two peaks, we find that the
uncertainty (peak widths) in the independent component orientations changes from
0.06 to 0.1 radians as the sampling is decreased from M = 20 to M = 2. So
even with few samplings, a relatively accurate determination of the independent 
component coordinate system can be made. 
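The simulated sampling step can be sketched as below. The grid, the mixing matrix and the random seed are hypothetical. The Jacobian factor $|A|^{-1}$ is dropped here on the assumption that normalization is irrelevant, which makes the expected counts range from M to 9M as quoted in the text.

```python
import numpy as np

def simulate_counts(p_x, M, rng):
    """Expected counts M*p_x plus Gaussian noise of amplitude sqrt(M*p_x),
    rounded to the nearest integer, with negative values set to zero."""
    expected = M * p_x
    noisy = expected + np.sqrt(expected) * rng.standard_normal(p_x.shape)
    return np.clip(np.rint(noisy), 0.0, None)

A = np.array([[1.0, 0.5], [-0.8, 1.0]])  # hypothetical mixing matrix, x = A s
A_inv = np.linalg.inv(A)

# Mixture-frame grid and the corresponding source coordinates s = A^{-1} x.
grid = np.linspace(-4.0, 4.0, 81)
X1, X2 = np.meshgrid(grid, grid, indexing="ij")
S = np.einsum("ij,jkl->ikl", A_inv, np.stack([X1, X2]))

# Unnormalized density evaluated in the mixture frame (Jacobian factor omitted).
p_x = (2 + np.sin(S[0])) * (2 + np.sin(S[1]))

M = 20
counts = simulate_counts(p_x, M, np.random.default_rng(0))
print(counts.min(), counts.max(), (M * p_x).min(), (M * p_x).max())
```

The resulting `counts` array plays the role of the histogram in the processing used for the audio data.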
Figure 3: Histogram of the independent component orientation solutions for four
different samplings. Solutions were generated from 20000 randomly chosen pairs
of positions. The curves, from darkest to lightest, correspond to solutions for the
noiseless, M = 20, 11 and 2 simulations. The noiseless solution histogram curve
extends to a height of approximately 15000 counts, and is accurate to the width of
the bin. The slight scatter is due to discretization noise. Spikes at $\theta = 0$ and $-\pi/2$
correspond to pairs of positions which contain no information.
Figure 4: The centers and widths of the solution peaks as a function of the minimum
expected number of counts M. From the source distribution, the maximum expected
number of counts is 9M. Information was only extracted from regions with more
than 2M counts. The actual independent component orientations as determined
from the mixing matrix A are shown by the two dashed lines. The solutions are
very accurate even for small samplings.
4 RELATION TO DECORRELATION 
Ideally, if a mixed tensor (one that transforms as $J = A^{-1} \tilde{J} A$) with the full degrees of
freedom can be found which is diagonal if and only if the joint distribution appears
in product form, then the independent component coordinate directions will coincide
with those of the tensor's eigenvectors. However, the preceding analysis shows that
a maximum of n(n - 1)/2 constraints contain all the information that exists locally. 
This, however, provides a nice connection with decorrelation. 
Starting with the characteristic function $\phi(\vec{k})$ of $\log p(\vec{x})$,
the off diagonal terms of $H^{\log p}$ are given by
$$\frac{\partial^2 \log p(\vec{x})}{\partial x_i \partial x_j} = -\sum_{\vec{k}} k_i k_j\, \phi(\vec{k})\, e^{i \vec{k} \cdot \vec{x}},$$
which can loosely be seen as the second order cross-moments in $\phi(\vec{k})$. Thus
diagonalization of $H^{\log p}$ roughly translates into decorrelation in $\phi(\vec{k})$. It should be
noted that $\phi(\vec{k})$ is not a proper distribution function. In fact, it is a complex valued
function with $\phi(\vec{k}) = \phi^*(-\vec{k})$. Consequently, the summation in the above equation
is not an expectation value, and needs to be interpreted as a superposition of plane
waves with specified wavelengths, amplitudes and phases.
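The plane-wave picture can be illustrated on a periodic grid with the discrete Fourier transform. The sketch assumes the convention $\log p(\vec{x}) = \sum_{\vec{k}} \phi(\vec{k}) e^{i \vec{k} \cdot \vec{x}}$, under which the cross derivative is the superposition with weights $-k_i k_j \phi(\vec{k})$. The periodic test function $\log(2 + \sin(x + y))$ and the grid size are hypothetical choices.

```python
import numpy as np

N = 64
x = 2 * np.pi * np.arange(N) / N
X, Y = np.meshgrid(x, x, indexing="ij")
U = X + Y

# Hypothetical periodic log-density that is not separable in (x, y).
log_p = np.log(2 + np.sin(U))

# Fourier coefficients under the assumed convention log_p = sum_k phi(k) exp(i k.x).
phi = np.fft.fft2(log_p) / N ** 2
k = np.fft.fftfreq(N, d=1.0 / N)  # integer wavenumbers
KX, KY = np.meshgrid(k, k, indexing="ij")

# Off-diagonal Hessian element as a superposition of plane waves weighted by -k_x k_y.
cross = np.real(np.fft.ifft2(-KX * KY * phi) * N ** 2)

# Analytic cross derivative: d^2/du^2 log(2 + sin u) evaluated at u = x + y.
exact = -(2 * np.sin(U) + 1) / (2 + np.sin(U)) ** 2
err = np.abs(cross - exact).max()

# Reality of log p implies phi(k) = conj(phi(-k)).
sym_err = abs(phi[1, 2] - np.conj(phi[-1, -2]))
print(err, sym_err)
```

Because the coefficients are complex, the weighted sum is a superposition of waves rather than an average over a probability distribution.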
5 DISCUSSION 
The introduced functional decomposition defines a generalization of independent 
component analysis which is valid for all multivariate functions. A rigorous no- 
tion of the decomposition of a multivariate function into a set of lower dimensional 
factors is presented. With only the assumption of local twice differentiability, we 
derive an analytic solution for this factorization [1]. A new algorithm is presented, 
which in contrast to iterative non-local parametric density estimation ICA algo- 
rithms [2, 3, 4], performs the decomposition analytically using local geometric in- 
formation. The analytic nature of this approach allows for a proper treatment of 
source separation in the presence of uncertainty, while the local nature allows for a 
local determination of the source coordinate system. This leaves open the possibil- 
ity of describing a position dependent independent component coordinate system 
with local linear coordinate patches.
The presented class factorization formalism removes the decomposition assump- 
tions needed for independent component analysis, and reinforces the well known 
fact that sources are recoverable only up to linear transformation. By modifying 
the equivalence class relation, a rich underlying algebraic structure with both mul- 
tiplication and addition can be constructed. Also, it is clear that the matrix of 
second derivatives reveals an even more general combinatorial undirected graphical 
structure of the multivariate function. These topics, as well as uniqueness issues of 
the factorization will be addressed elsewhere [5]. 
The author is grateful to Jack Cowan, David Grier and Robert Wald for many 
invaluable discussions. 
References 
[1] J. K. Lin, Local Independent Component Analysis, Ph.D. thesis, University of 
Chicago, 1997. 
[2] A. J. Bell and T. J. Sejnowski, Neural Computation 7, 1129 (1995). 
[3] S. Amari, A. Cichocki, and H. Yang, in Advances in Neural Information
Processing Systems 8, edited by D. S. Touretzky, M. C. Mozer, and M. E.
Hasselmo (MIT Press, Cambridge, MA, 1996), pp. 757-763.
[4] B. A. Pearlmutter and L. Parra, in Advances in Neural Information Processing
Systems 9, edited by M. C. Mozer, M. I. Jordan, and T. Petsche (MIT
Press, Cambridge, MA, 1997), pp. 613-619.
[5] J. K. Lin, Graphical Structure of Multivariate Functions, in preparation. 
