Viewpoint invariant face recognition using 
independent component analysis and 
attractor networks 
Marian Stewart Bartlett 
University of California San Diego 
The Salk Institute 
La Jolla, CA 92037 
marni@salk.edu 
Terrence J. Sejnowski 
University of California San Diego 
Howard Hughes Medical Institute 
The Salk Institute, La Jolla, CA 92037 
terry@salk.edu 
Abstract 
We have explored two approaches to recognizing faces across 
changes in pose. First, we developed a representation of face images 
based on independent component analysis (ICA) and compared it 
to a principal component analysis (PCA) representation for face 
recognition. The ICA basis vectors for this data set were more 
spatially local than the PCA basis vectors and the ICA representa- 
tion had greater invariance to changes in pose. Second, we present 
a model for the development of viewpoint invariant responses to 
faces from visual experience in a biological system. The temporal 
continuity of natural visual experience was incorporated into an 
attractor network model by Hebbian learning following a lowpass 
temporal filter on unit activities. When combined with the tem- 
poral filter, a basic Hebbian update rule became a generalization 
of Griniasty et al. (1993), which associates temporally proximal 
input patterns into basins of attraction. The system acquired rep- 
resentations of faces that were largely independent of pose. 
I Independent component representations of faces 
Important advances in face recognition have employed forms of principal compo- 
nent analysis, which considers only second-order moments of the input (Cottrell & 
Metcalfe, 1991; Turk & Pentland 1991). Independent component analysis (ICA) 
is a generalization of principal component analysis (PCA), which decorrelates the 
higher-order moments of the input (Comon, 1994). In a task such as face recogni- 
tion, much of the important information is contained in the high-order statistics of 
the images. A representational basis in which the high-order statistics are decorre- 
lated may be more powerful for face recognition than one in which only the second 
order statistics are decorrelated, as in PCA representations. We compared an ICA- 
based representation to a PCA-based representation for recognizing faces across 
changes in pose. 
818 M. S. Bartlett and T. J. Sejnowski 
Figure 1: Examples from image set (Beymer, 1994). 
The image set contained 200 images of faces, consisting of 40 subjects at each of 
five poses (Figure 1). The images were converted to vectors and comprised the rows 
of a 200 x 3600 data matrix, X. We consider the face images in X to be a linear 
mixture of an unknown set of statistically independent source images S, where A 
is an unknown mixing matrix (Figure 2). The sources are recovered by a matrix of 
learned filters, W, which produce statistically independent outputs, U. 
S X 
Sources Unknown Face Learned 
Mixing Images Weights 
Process 
U 
Separated 
Outputs 
Figure 2: Image synthesis model. 
The weight matrix, W, was found through an unsupervised learning algorithm that 
maximizes the mutual information between the input and the output of a nonlinear 
transformation (Bell & Sejnowski, 1995). This algorithm has proven successful for 
separating randomly mixed auditory signals (the cocktail party problem), and has 
recently been applied to EEG signals (Makeig et al., 1996) and natural scenes (see 
Bell & Sejnowski, this volume). The independent component images contained in 
the rows of U are shown in Figure 3. In contrast to the principal components, all 200 
independent components were spatially local. We took as our face representation 
the rows of the matrix A - W - which provide the linear combination of source 
images in U that comprise each face image in X. 
1.1 Face Recognition Performance: ICA vs. Eigenfaces 
We compared the performance of the ICA representation to that of the PCA repre- 
sentation for recognizing faces across changes in pose. The PCA representation of a 
face consisted of its component coefficients, which was equivalent to the "Eigenface" 
Viewpoint Invariant Face Recognition 819 
Figure 3: Top: Four independent components of the image set. 
principal components. 
Bottom: First four 
representation (Turk & Pentland, 1991). A test image was recognized by assigning 
it the label of the nearest of the other 199 images in Euclidean distance. 
Classification error rates for the ICA and PCA representations and for the original 
graylevel images are presented in Table 1. For the PCA representation, the best 
performance was obtained with the 120 principal components corresponding to the 
highest eigenvalues. Dropping the first three principal components, or selecting 
ranges of intermediate components did not improve performance. The independent 
component sources were ordered by the magnitude of the weight vector, row of W, 
used to extract the source from the image.  Best performance was obtained with 
the 130 independent components with the largest weight vectors. Performance with 
the ICA representation was significantly superior to Eigenfaces by a paired t-test 
< 0.05). 
Graylevel Images 
PCA 
ICA 
Mutual Information 
.89 
.10 
.007 
Percent Correct Recognition 
.83 
.84 
.87 
Table 1: Mean mutual information between all pairs of 10 basis images, and between the 
original graylevel images. Face recognition performance is across all 200 images. 
For the task of recognizing faces across pose, a statistically independent basis set 
provided a more powerful representation for face images than a principal component 
representation in which only the second order statistics are decorrelated. 
The magnitude of the weight vector for optimally projecting the source onto the sloping 
part of the noniinearity provides a measure of the variance of the original source (Tony 
Bell, personal communication). 
820 M. S. Bartlett and T. J. Sejnowski 
2 
Unsupervised Learning of Viewpoint Invariant 
Representations of Faces in an Attractor Network 
Cells in the primate inferior temporal lobe have been reported that respond selec- 
tively to faces despite substantial changes in viewpoint (Hasselmo, Rolls, Baylis, & 
Nalwa, 1989). Some cells responded independently of viewing angle, whereas other 
cells gave intermediate responses between a viewer-centered and an object centered 
representation. This section addresses how a system can acquire such invariance to 
viewpoint from visual experience. 
During natural visual experience, different views of an object or face tend to appear 
in close temporal proximity as an animal manipulates the object or navigates around 
it, or as a face changes pose. Capturing such temporal relationships in the input 
is a way to automatically associate different views of an object without requiring 
three dimensional descriptions. 
Attractor Network 
Energy Model  
t t 2 t 3 t4 ts 
Figure 4: Model architecture. 
Hebbian learning can capture these temporal relationships in a feedforward sys- 
tem when the output unit activities are passed through a lowpass temporal filter 
(Foldiak, 1991; Wallis & Rolls, 1996). Such lowpass temporal filters have been 
related to the time course of the modifiable state of a neuron based on the open 
time of the NMDA channel for calcium influx (Rhodes, 1992). We show that 1) 
this lowpass temporal filter increases viewpoint invariance of face representations in 
a feedforward system trained with competitive Hebbian learning, and 2) when the 
input patterns to an attractor network are passed through a lowpass temporal fil- 
ter, then a basic Hebbian weight update rule associates sequentially proximal input 
patterns into the same basin of attraction. 
This simulation used a subset of 100 images from Section 1, consisting of twenty 
faces at five poses each. Images were presented to the model in sequential order as 
the subject changed pose from left to right (Figure 4). The first layer is an energy 
model related to the output of V1 complex cells (Heeger, 1991). The images were 
filtered by a set of sine and cosine Gabor filters at 4 spatial scales and 4 orientations 
at 255 spatial locations. Sine and cosine outputs were squared and summed. The 
set of V1 model outputs projected to a second layer of 70 units, grouped into two 
Iqewpoint Invariant Face Recognition 821 
inhibitory pools. The third stage of the model was an attractor network produced 
by lateral interconnections among all of the complex pattern units. The feedforward 
and lateral connections were trained successively. 
2.1 Competitive Hebbian learning of temporal relationships 
The Competitive Learning Algorithm (Rumelhart & Zipser, 1985) was extended to 
include a temporal lowpass filter on output unit activities (Bartlett & Sejnowski, 
1996). This manipulation gives the winner in the previous time steps a competitive 
advantage for winning, and therefore learning, in the current time step. 
{(( - wij ) if winner = j 
AWij -- 0.1a xj} _ wi if winner 
winner = maxj [-(t)] 
-](t) = Ay} + (1-/)-(t-1) 
(1) 
The output activity of unit j at time t, ](t), is determined by the trace, or running 
t is the weighted sum of the feedforward inputs, c 
average, of its activation, where yj 
is the learning rate, xiu is the value of input unit i for pattern u, and su is the total 
amount of input activation for pattern u. The weight to each unit was constrained 
to sum to one. This algorithm was used to train the feedforward connections. There 
was one face pattern per time step and A was set to I between individuals. 
2.2 Lateral connections in the output layer form an attractor network 
Hebbian learning of lateral interconnections, in combination with a lowpass tem- 
poral filter on the unit activities in (1), produces a learning rule that associates 
temporally proximal inputs into basins of attraction. We begin with a simple Heb- 
bian learning rule 
1 k(y  _ yO)(y _ yO) (2) 
wi= N 
t=l 
where N is the number of units, P is the number of patterns, and y0 is the mean 
activity over all of the units. Replacing y with the activity trace -/(t) defined in 
(1), substituting y0 = Ay0 + (1 - A)y  and multiplying out the terms leads to the 
following learning rule: 
- (Yi-Y)(Yj-Y ) 1 .... 
Wij- Z t 0 t 0 q-k [(y yO)(-(t-1)yO)_[_ ((t-1)yO)(yjt. yO)] 
t--1 
-{-k2 [((,-1) _ y0)((,-1) _ yO)] (3) 
where kl x_( and k2 
-- 
The first term in this equation is basic Hebbian learning, the second term associates 
pattern t with pattern t - 1, and the third term is Hebbian association of the trace 
activity for pattern t - 1. This learning rule is a generalization of an attractor 
network learning rule that has been shown to associate random input patterns 
822 M. S. Bartlett and T. J. Sejnowski 
into basins of attraction based on serial position in the input sequence (Griniasty, 
Tsodyks & Amit, 1993). The following update rule was used for the activation V 
of unit i at time t from the lateral inputs (Griniasty, Tsodyks, & Amit, 1993): 
Where 0 is a neural threshold and b(x) = I for x > 0, and 0 otherwise. In these 
simulations, 0 - 0.007, N - 70, P - 100, y0 _ 0.03, and A - 0.5 gave kl -- k2 - 1. 
2.3 Results 
Temporal association in the feedforward connections broadened the pose tuning of 
the output units (Figure 5 Left). When the lateral connections in the output layer 
were added, the attractor network acquired responses that were largely invariant to 
pose (Figure 5 Right). 
el Same Face, with Trace I [ [' Hebb plus tzace 
I 0 Same Face, no Trace I / I o Test set 
 I.   Griniasty et. al. 
I /-- Different Faces 
/1\\ 
. 
o o. 
.60  -45  _30  -15  0  15  30  45  60  -60  -45  .30  .15  0  15  30  45  60  
A Pose A Pose 
Figure 5: Left: Correlation of the outputs of the feedforward system as a function 
of change in pose. Correlations across different views of the same face (-) axe compaxed 
to correlations across different faces (--) with the temporal trace paxameter A -- 0.5 
and A = 0. Right: Correlations in sustained activity patterns in the attractor network 
as a function of change in pose. Results obtained with Equation 3 (Hebb plus trace) axe 
compared to Hebb only, and to the learning rule in Griniasty et al. (1993). Test set 
results for Equation 3 (open squaxes) were obtained by alternately training on four poses 
and testing on the fifth, and then averaging across all test cases. 
F 
5 
10 
20 
20 
N 
Attractor Network 
F/N % Correct 
70 .07 1.00 
70 .14 .90 
70 .29 .61 
160 .13 .73 
ICA 
% Correct 
.96 
.86 
.89 
.89 
Table 2: Face classification performance of the attractor network for four ratios of the 
number of desired memories, F, to the number of units, N. Results axe compaxed to ICA 
for the same subset of faces. 
Viewpoint Invariant Face Recognition 823 
Classification accuracy of the attractor network was calculated by nearest neighbor 
on the activity states (Table 2). Performance of the attractor network depends both 
on the performance of the feedforward system, which comprises its input, and on 
the ratio of the number of patterns to be encoded in memory, F, to the number of 
units, N, where each individual in the face set comprises one memory pattern. The 
attractor network performed well when this ratio was sufficiently high. The ICA 
representation also performed well, especially for N-20. 
The goal of this simulation was to begin with structured inputs similar to the re- 
sponses of V1 complex cells, nd to explore the performance of unsupervised learn- 
ing mechanisms that can transform these inputs into pose invariant responses. We 
showed that a lowpass temporal filter on unit activities, which has been related 
to the time course of the modifiable state of a neuron (Rhodes, 1992), cooperates 
with Hebbian learning to (1) increase the viewpoint invariance of responses to faces 
in a feedforward system, and (2) create basins of attraction in an attractor net- 
work which associate temporally proximal inputs. These simulations demonstrated 
that viewpoint invariant representations of complex objects such as faces can be 
developed from visual experience by accessing the temporal structure of the input. 
Acknowledgments 
This project was supported by Lawrence Livermore National Laboratory ISCR Agreement 
B291528, and by the McDonnell-Pew Center for Cognitive Neuroscience at San Diego. 
References 
Bartlett, M. Stewart,  Sejnowski, T., 1996. Unsupervised learning of invariant represen- 
tations of faces through temporal association. Computational Neuroscience: Int. Rev. 
Neurobio. Suppl. I J.M Bower, Ed., Academic Press, San Diego, CA:317-322. 
Beymet, D. 1994. Face recognition under varying pose. In Proceedings of the 1993 IEEE 
Computer Society Conference on Computer Vision and Pattern Recognition. Los Alami- 
tos, CA: IEEE Cornput. Soc. Press: 756-61. 
Bell, A.  Sejnowski, T., (1997). The independent components of natural scenes axe edge 
filters. Advances in Neural Information Processing Systems 9. 
Bell, A.,  Sejnowski, T., 1995. An information Maximization approach to blind separa- 
tion and blind deconvolution. Neural Comp. 7: 1129-1159. 
Comon, P. 1994. Independent component analysis - a new concept? Signal Processing 
36:287-314. 
Cottrell & Metcalfe, 1991. Face, gender and emotion recognition using Holons. In Ad- 
vances in Neural Information Processing Systems 3, D. Touretzky, (Ed.), Morga Kauf- 
man, San Mateo, CA: 564 - 571. 
Foldiak, P. 1991. Learning invariance from transformation sequences. Neural Comp. 3:194- 
200. 
Griniasty, M., Tsodyks, M., lz Amit, D. 1993. Conversion of temporal correlations between 
stimuli to spatial correlations between attractors. Neural Comp. 5:1-17. 
Hasselmo M. Rolls E. Baylis G. & Nalwa V. 1989. Object-centered encoding by face- 
selective neurons in the cortex in the superior temporal sulcus of the monkey. Experi- 
mental Brain Research 75(2):417-29. 
Heeger, D. (1991). Nonlinear model of neural responses in cat visual cortex. Computational 
Models of Visual Processing, M. Landy lz J. Movshon, Eds. MIT Press, Cambridge, 
MA. 
Makeig, S, Bell, A J, Jung, T-P, and Sejnowski, TJ 1996. Independent component anal- 
ysis of Electroencephalographic data, In: Advances in Neural Information Processing 
Systems 8, 145-151. 
Phodes, P. 1992. The long open time of the NMDA channel facilitates the self-organization 
of invariant object responses in cortex. Soc. Neurosci. Abst. 18:740. 
Rumelhart, D. z Zipset, D. 1985. Feature discovery by competitive learning. Cognitive 
Science 9: 75-112. 
Turk, M., & Pentland, A. 1991. Eigenfaces for Recognition. J. Cog. Neurosci. 3(1):71-86. 
Wallis, G. & Rolls, E. 1996. A model of invariant object recognition in the visual system. 
Technical Report, Oxford University Department of Experimental Psychology. 
