Scale Mixtures of Gaussians and the 
Statistics of Natural Images 
Martin J. Wainwright 
Stochastic Systems Group 
Electrical Engineering & CS 
MIT, Building 35-425 
Cambridge, MA 02139 
mjwain@mit. edu 
Eero P. Simoncelli 
Ctr. for Neural Science, and 
Courant Inst. of Mathematical Sciences 
New York University 
New York, NY 10012 
eero. simoncel li @nyu. edu 
Abstract 
The statistics of photographic images, when represented using 
multiscale (wavelet) bases, exhibit two striking types of non- 
Gaussian behavior. First, the marginal densities of the coefficients 
have extended heavy tails. Second, the joint densities exhibit vari- 
ance dependencies not captured by second-order models. We ex- 
amine properties of the class of Gaussian scale mixtures, and show 
that these densities can accurately characterize both the marginal 
and joint distributions of natural image wavelet coefficients. This 
class of model suggests a Markov structure, in which wavelet coeffi- 
cients are linked by hidden scaling variables corresponding to local 
image structure. We derive an estimator for these hidden variables, 
and show that a nonlinear "normalization" procedure can be used 
to Gaussianize the coefficients. 
Recent years have witnessed a surge of interest in modeling the statistics of natural 
images. Such models are important for applications in image processing and com- 
puter vision, where many techniques rely (either implicitly or explicitly) on a prior 
density. A number of empirical studies have demonstrated that the power spectra 
of natural images follow a 1If v law in radial frequency, where the exponent  is 
typically close to two [e.g., 1]. Such second-order characterization is inadequate, 
however, because images usually exhibit highly non-Gaussian behavior. For in- 
stance, the marginals of wavelet coefficients typically have much heavier tails than 
a Gaussian [2]. Furthermore, despite being approximately decorrelated (as sug- 
gested by theoretical analysis of 1If processes [3]), orthonormal wavelet coefficients 
exhibit striking forms of statistical dependency [4, 5]. In particular, the standard 
deviation of a wavelet coefficient typically scales with the absolute values of its 
neighbors [5]. 
A number of researchers have modeled the marginal distributions of wavelet coef- 
ficients with generalized Laplacians, p(y) c exp(-ly/AI p) [e.g. 6, 7, 8]. Special 
cases include the Gaussian (p - 2) and the Laplacian (p - 1), but appropriate ex- 
Research supported by NSERC 1969 fellowship 160833 to MJW, and NSF CAREER grant 
MIP-9796040 to EPS. 
856 M. J. Wainwright and E. P Sirnoncelli 
Mixing density 
Positive, V/ - stable 
No explicit form 
GSM density GSM char. function 
symmetrized Gamma 
Student: 
, 1 
[1/(), + > 
a-stable 
generalized Laplacian: 
exp(-[y/XlP), pe (0,2] 
t2 
(l+2-X-  , >0 
No explicit form 
exp (-Itla), a e (0,2] 
No explicit form 
Table 1. Example densities from the class of Gaussian scale mixtures. Z(7 ) de- 
notes a positive gamma variable, with density p(z)= [1/F(y)]z *- exp(-z). 
The characteristic function of a random variable x is defined as 
opt(t) -- f_oo p(x) exp (jxt) dx. 
ponents for natural images are typically less than one. Simoncelli [5, 9] has modeled 
the variance dependencies of pairs of wavelet coefficients. Romberg et al. [10] have 
modeled wavelet densities using two-component mixtures of Gaussians. Huang and 
Mumford [11] have modeled marginal densities and cross-sections of joint densities 
with multi-dimensional generalized Laplacians. 
In the following sections, we explore the semi-parametric class of Gaussian scale 
mixtures. We show that members of this class satisfy the dual requirements of 
being heavy-tailed, and exhibiting multiplicative scaling between coefficients. We 
also show that a particular member of this class, in which the multiplier variables 
are distributed according to a gamma density, captures the range of joint statistical 
behaviors seen in wavelet coefficients of natural images. We derive an estimator for 
the multipliers, and show that a nonlinear "normalization" procedure can be used 
to Gaussianize the wavelet coefficients. Lastly, we form random cascades by linking 
the multipliers on a multiresolution tree. 
I Scale Mixtures of Gaussians 
d d 
A random vector Y is a Gaussian scale mixture (GSM) if Y - zU, where = denotes 
equality in distribution; z > 0 is a scalar random variable; U ,- iV'(0, Q) is a 
Gaussian random vector; and z and U are independent. 
As a consequence, any GSM variable has a density given by an integral: 
/_ 1 ( YTo-Y'  
PY(Y) = oo Iz2QI1/2 exp  / c)(z)dz. 
where bz is the probability density of the mixing variable z (henceforth the mul- 
tiplier). A special case of a GSM is a finite mixture of Gaussians, where z is a 
discrete random variable. More generally, it is straightforward to provide condi- 
tions on either the density [12] or characteristic function of X that ensure it is a 
GSM, but these conditions do not necessarily provide an explicit form of bz. Nev- 
ertheless, a number of well-known distributions may be written as Gaussian scale 
mixtures. For the scalar case, a few of these densities, along with their associated 
characteristic functions, are listed in Table 1. Each variable is characterized by a 
scale parameter A, and a tail parameter. All of the GSM models listed in Table 1 
produce heavy-tailed marginal and variance-scaling joint densities. 
Scale Mixtures of Gaussians and the Statistics of Natural Images 857 
baboon boats flower frog 
-500 0 5430 
_2[ 
-500 0 500 1000 -1000 -500 0 500 
['7, ,'k2] = [0.97, 15.04] 
AH/H = 0.00079 
[0.45, 13.77] [0.78, 26.83] [0.80, 15.39] 
0.0030 0.0030 0.0076 
Figure 1. GSMs (dashed lines) fitted to empirical histograms (solid lines). Below 
each plot are the parameter values, and the relative entropy between the histogram 
(with 256 bins) and the model, as a fraction of the histogram entropy. 
2 Modeling Natural Images 
As mentioned in the introduction, natural images exhibit striking non-Gaussian 
behavior, both in their marginal and joint statistics. In this section, we show that 
this behavior is consistent with a GSM, using the first of the densities given in 
Table I for illustration. 
2.1 Marginal distributions 
We begin by examining the symmetrized Gamma class as a model for marginal 
distributions of wavelet coefficients. Figure I shows empirical histograms of a par- 
ticular wavelet subband 1 for four different natural images, along with the best fitting 
instance of the symmetrized Gamma distribution. Fitting was performed by min- 
imizing the relative entropy (i.e., the Kullback-Leibler divergence, denoted AH) 
between empirical and theoretical histograms. In general, the fits are quite good: 
the fourth plot shows one of the worst fits in our data set. 
2.2 Normalized components 
For a GSM random vector Y __.a zU, the normalized variable Y/z formed by 
component-wise division is Gaussian-distributed. In order to test this behavior 
empirically, we model a given wavelet coefficient Y0 and a collection of neighbors 
{Yl,.-  , YN) as a GSM vector. For our examples, we use a neighborhood of N = 11 
coefficients corresponding to basis functions at 4 adjacent positions, 5 orientations, 
and 2 scales. Although the multiplier z is unknown, we can estimate it by max- 
imizing the log likelihood of the observed coefficients:  _a argmaxz  logp(Y]z)). 
Under reasonable conditions, the normalized quantity Y/ should converge in dis- 
tribution to a Gaussian as the number of neighbors increases. The estimate  is 
simple to derive: 
 = argmax{logp(Y[z)} 
z 
= argmin {Nlog(z) + YrQ-1y/2z2} 
z 
= v/YrQ-1y/N, 
1We use the steerable pyramid, an overcomplete multiscale representation described 
in [13]. The marginal and joint statistics of other multiscale oriented representations are 
similar. 
858 M. d. Wainwright and E. P. Simoncelli 
baboon 
boats 
flowers frog 
o 
AH/H = 0.00035 
0.00041 
o 
0.00042 
0.00043 
Figure 2. Marginal log histograms (solid lines) of the normalized coefficient v for a 
single subband of four natural images. Each shape is close to an inverted parabola, 
in agreement with Gaussians (dashed lines) of equivalent empirical variance. Below 
each plot is the relative entropy between the histogram (with 256 bins) and a 
variance-matched Gaussian, as a fraction of the total histogram entropy. 
where Q __a E [UU T] is the positive definite covariance matrix of the underlying 
Gaussian vector U. 
Given the estimate , we then compute the normalized coefficient y  Yo/. This is 
a generalization of the variance normalization proposed by Ruderman and Bialek[1], 
and the weighted sum of squares normalization procedure used by Simoncelli [5, 14]. 
Figure 2 shows the marginal histograms (in the log domain) of this normalized 
coefficient for four natural images, along with Gaussians of equal empirical variance. 
In contrast to histograms of the raw coefficients (shown in Figure 1), the histograms 
of normalized coefficients are nearly Gaussian. 
The GSM model makes a stronger prediction: that normalized quantities corre- 
sponding to nearby wavelet pairs should be jointly Gaussian. Specifically, a pair 
of normalized coefficients should be either correlated or uncorrelated Gaussians, 
depending on whether the underlying Gaussians U = Jul u2] T are correlated or un- 
correlated. We examine this prediction by collecting joint conditional histograms of 
normalized coefficients. The top row of Figure 3 shows joint conditional histograms 
for raw wavelet coefficients (taken from the same four natural images as Figure 2). 
The first two columns correspond to adjacent spatial scales; though decorrelated, 
they exhibit the familiar form of multiplicative scaling. The latter two columns cor- 
respond to adjacent orientations; in addition to being correlated, they also exhibit 
the multiplicative form of dependency. 
The bottom row shows the same joint conditional histograms, after the coefficients 
have been normalized. Whereas Figure 2 demonstrates that normalized coefficients 
are close to marginally Gaussian, Figure 3 demonstrates that they are also approx- 
imately jointly Gaussian. These observations support the use of a Gaussian scale 
mixture for modeling natural images. 
2.3 Joint distributions 
The GSM model is a reasonable approximation for groups of nearby wavelet coef- 
ficients. However, the components of GSM vectors are highly dependent, whereas 
the dependency between wavelet coefficients decreases as (for example) their spa- 
tial separation increases. Consequently, the simple GSM model is inadequate for 
global modeling of coefficients. We are thus led to use a graphical model (such as 
tree) that specifies probabilistic relations between the multipliers. The wavelet co- 
efficients themselves are considered observations, and are linked indirectly by their 
shared dependency on the (hidden) multipliers. 
Scale Mixtures of Gaussians and the Statists'cs of Natural Images 859 
baboon boats flowers frog 
-500 0 500 -500 0 500 -500 0 500 -500 0 500 
-25 0 25 -25 0 25 -25 0 25 -25 0 
AH/H = 0.0643 0.0743 0.0572 0.0836 
25 
Figure 3. Top row: joint conditional histograms of raw wavelet coefficients for 
four natural images. Bottom row: joint conditional histograms of normalized pairs 
of coefficients. Below each plot is the relative entropy between the joint histogram 
(with 256 x 256 bins) and a covaxiance-matched Gaussian, as a fraction of the total 
histogram entropy. 
For concreteness, we model the wavelet coefficient at node s as y(s) 
where x(s) is Gaussian, so that z - ]]xl] is the square root of a gamma variable of 
index 0.5. For illustration, we assume that the multipliers are linked by a multiscale 
autoregressive (MAR) process [15] on a tree: 
x(8) = x(p)) + x/T- 
where p(s) is the parent of node s. Two wavelet coefficients y(s) and y(t) are linked 
through the multiplier at their common ancestral node denoted s A t. In particular, 
the joint distributions are given by 
- *^*) x(, A ,) + (*)11 u(,) 
where v, v2 are independent white noise processes; and d(, ) denotes the distance 
between a node d one of its ancestors on the tree (e.g., d(s,p(s)) = 1). For 
nodes s and t at the same scale and oriemation but spatially separated by a dis- 
tance of A(s, t), the distance between s and the coon ancestor s A t ows 
d(s,s  t)  gog2(A(s, t)) + 1]. 
The first row of Figure 4 shows the range of behaviors seen in joint distributions 
taken from a wavelet subband of a particul natural image, compared to simulated 
GSM gamma distributions with  = 0.92. The first column corresponds to a pair 
of wavelet filters in quadrature phe (i.e., related by a Hilbert transform). Note 
that for this pair of coefficients, the contours are nearly circular, an observation 
that h been previously made by Zetzsche [4]. Nevertheless, these two coefficients 
e dependent,  shown by the multiplicative scaling in the conditional histo,am 
of the third row. This type of scaling dependency h been extensivdy documemed 
by Simoncelli [5, 9]. Analogous plots for the simulated Gamma model, with zero 
spatial separation are shown in rows 2 and 4. As in the image data, the contours of 
the joint density are very close to circular, and the conditional distribution shows 
a string variance dependency. 
860 M. J. Wainwright and E. P. Simoncelli 
image 
data 
simulated 
model 
image 
data 
quad. pair overlapping near distant 
simulated 
model 
Figure 4. Examples of empirically observed distributions of wavelet coefficients, 
compared with simulated distributions from the GSM gamma model. First row: 
Empirical joint histograms for the "mountain" image, for four pairs of wavelet coef- 
ficients, corresponding to basis functions with spatial separations A -- (0, 4, 8, 128). 
Second row: Simulated joint distributions for Gamma variables with p -- 0.92 and 
the same spatial separations. Contour lines are drawn at equal intervals of log 
probability. Third row: Empirical conditional histograms for the "mountain" im- 
age. Fourth row: Simulated conditional histograms for Gamma variables. For 
these conditional distributions, intensity corresponds to probability, except that 
each column has been independently rescaled to fill the full range of intensities. 
The remaining three columns of figure 4 show pairs of coefficients drawn from iden- 
tical wavelet filters at spatial displacements A = (4, 8, 128), corresponding to a 
pair of overlapping filters, a pair of nearby filters, and a distant pair. Note the pro- 
gression in the contour shapes from off-circular, to a diamond shape, to a concave 
"star" shape. The model distributions behave similarly, and show the same range of 
contours for simulated pairs of coefficients. Thus, consistent with empirical obser- 
vations, a GSM model can produce a range of dependency between pairs of wavelet 
coefficients. Again, the marginal histograms retain the same form throughout this 
range. 
3 Conclusions 
We have proposed the class of Gaussian scale mixtures for modeling natural images. 
Models in this class typically exhibit heavy-tailed marginals, as well as multiplicative 
scaling between adjacent coefficients. We have demonstrated that a particular GSM 
(the symmetrized Gamma family) accounts well for both the marginal and joint 
distributions of wavelet coefficients from natural images. More importantly, this 
model suggests a hidden Markov structure for natural images, in which wavelet 
coefficients are linked by hidden multipliers. Romberg et al. [10] have made a related 
proposal using two-state discrete multipliers, corresponding to a finite mixture of 
Gaussians. 
Scale Mixtures of Gaussians and the Statistics of Natural Images 861 
We have demonstrated that the hidden multipliers can be locally estimated from 
measurements of wavelet coefficients. Thus, by conditioning on fixed values of the 
multipliers, estimation problems may be reduced to the classical Gaussian case. 
Moreover, we described how to link the multipliers on a multiresolution tree, and 
showed that such a random cascade model accounts well for the drop-off in de- 
pendence of spatially separated coefficients. We are currently exploring EM-like 
algorithms for the problem of dual parameter and state estimation. 
Acknowledgements 
We thank Bill Freeman, David Mumford, Mike Schneider, Ilya Pollak, and Alan 
Willsky for helpful discussions. 
References 
[1] D. L. Ruderman and W. Bialek. Statistics of natural images: Scaling in the woods. 
Phys. Rev. Letters, 73(6):814-817, 1994. 
[2] D. J. Field. Relations between the statistics of natural images and the response 
properties of cortical cells. J. Opt. Soc. Am. A, 4(12):2379-2394, 1987. 
[3] A. H. Tewfik and M. Kim. Correlation structure of the discrete wavelet coefficients 
of fractional Brownian motion. IEEE Trans. Info. Theory, 38:904-909, Mar. 1992. 
[4] C. Zetzsche, B. Wegmann, and E. Barth. Nonlinear aspects of primary vision: Entropy 
reduction beyond decorrelation. In Int 'l Syrup. Soc. for Info. Display, volume 24, pages 
933-936, 1993. 
[5] E. P. Simoncelli. Statistical models for images: Compression, restoration and synthe- 
sis. In 31st Asilomar Conf., pages 673-678, Nov. 1997. 
[6] S. G. Mallat. A theory for multiresolution signal decomposition: the wavelet repre- 
sentation. IEEE Pat. Anal. Mach. Intell., 11:674-693, July 1989. 
[7] E. P. Simoncelli and E. H. Adelson. Noise removal via Bayesian wavelet coring. In 
Proc. IEEE ICIP, volume I, pages 379-382, September 1996. 
[8] P. Moulin and J. Liu. Analysis of multiresolution image denoising schemes using a 
generalized Gaussian and complexity priors. IEEE Trans. Info. Theory, 45:909-919, 
Apr. 1999. 
[9] R. W. Buccigrossi and E. P. Simoncelli. Image compression via joint statistical char- 
acterization in the wavelet domain. IEEE Trans. Image. Proc., 8(12):1688-1701, Dec. 
1999. 
[10] J.K. Romberg, H. Choi, and R.G. Baraniuk. Bayesian wavelet domain image modeling 
using hidden Markov trees. In Proc. IEEE ICIP, Kobe, Japan, Oct. 1999. 
[11] J. Huang and D. Mumford. Statistics of natural images and models. In CVPR, paper 
216, 1999. 
[12] D.F. Andrews and C.L. Mallows. Scale mixtures of normal distributions. J. Royal 
Star. Soc., 36:99-102, 1974. 
[13] E. P. Simoncelli and W. T. Freeman. The steerable pyramid: A flexible architecture 
for multi-scale derivative computation. In Proc. IEEE ICIP, volume III, pages 444- 
447, Oct. 1995. 
[14] E. P. Simoncelli and O. Schwartz. Image statistics and cortical normalization models. 
In M. S. Kearns, S. A. Solla, and D. A. Cobh, editors, Adv. Neural Information 
Processing Systems, volume 11, pages 153-159, Cambridge, MA, May 1999. 
[15] K. Chou, A. Willsky, and R. Nikoukhah. Multiscale systems, Kalman filters, and 
Riccati equations. IEEE Trans. Automatic Control, 39(3):479-492, Mar. 1994. 
