91 
OPTIMIZATION BY MEAN FIELD ANNEALING 
Griff' Bilbro 
ECE Dept. 
NCSU 
Raleigh, NC 27695 
Wesley. E. Snyder 
ECE Dept. 
NCSU 
Raleigh, NC 27695 
Reinhold Mann 
Eng. Physics and Math. Div. 
Oak Ridge Natl. Lab. 
Oak Ridge, TN 37831 
David E. Van den Bout 
ECE Dept. 
NCSU 
Raleigh, NC 27695 
Thomas K. Miller 
ECE Dept. 
NCSU 
Raleigh, NC 27695 
Mark White 
ECE Dept. 
NCSU 
Raleigh, NC 27695 
ABSTRACT 
Nearly optimal solutions to many combinatorial problems can be 
found using stochastic simulated annealing. This paper extends 
the concept of simulated annealing from its original formulation 
as a Markov process to a new formulation based on mean field 
theory. Mean field annealing essentially replaces the discrete de- 
grees of freedom in simulated annealing with their average values 
as computed by the mean field approximation. The net result is 
that equilibrium at a given temperature is achieved 1-2 orders of 
magnitude faster than with simulated annealing. A general frame- 
work for the mean field annealing algorithm is derived, and its re- 
lationship to Hopfield networks is shown. The behavior of MFA is 
examined both analytically and experimentally for a generic combi- 
natorial optimization problem: graph bipartitioning. This analysis 
indicates the presence of critical temperatures which could be im- 
portant in improving the performance of neural networks. 
STOCHASTIC VERSUS MEAN FIELD 
In combinatorial optimization problems, an objective function or HamiltonJan, 
H(s), is presented which depends on a vector of interacting spins, s -- {sx,..., 
in some complex nonlinear way. Stochastic simulated annealing (SSA) (S. Kirk- 
patrick, C. Gelatt, and M. Vecchi (1983)) finds a global minimum of H by com- 
bining gradient descent with a random process. This combination allows, under 
certain conditions, choices of s which actually increase H, thus providing SSA with 
a mechanism for escaping from local minima. The frequency and severity of these 
uphill moves is reduced by slowly decreasing a parameter T (often referred to as 
the temperature) such that the system settles into a global optimum. 
Two conceptual operations are involved in simulated annealing: a thermostatic op- 
eration which schedules decreases in the temperature, and a relaxation operation 
92 Bilbro, et al 
which iteratively finds the equilibrium solution at the new temperature using the 
final state of the system at the previous temperature as a starting point. In SSA, re- 
laxation occurs by randomly altering components of s with a probability determined 
by both T and the change in H caused by each such operation. This corresponds to 
probabilistic transitions in a Markov chain. In mean field annealing (MFA), some 
aspects of the optimization problem are replaced with their means or averages from 
the underlying Markov chain (e.g. s is replaced with its average, (s)). As the tem- 
perature is decreased, the MFA algorithm updates these averages based on their 
values at the previous temperature. Because computation using the means attains 
equilibrium faster than using the corresponding Markov chain, MFA relaxes to a 
solution at each temperature much faster than does SSA, which leads to an overall 
decrease in computational effort. 
In this paper, we present the MFA formulation in the context of the familiar Ising 
HamiltonJan and discuss its relationship to Hopfield neural networks. Then the 
application of MFA to the problem of graph bipartitioning is discussed, where we 
have analytically and experimentally investigated the affect of temperature on the 
behavior of MFA and observed speedups of 50:1 over SSA. 
MFA AND HOPFIELD NETWORKS 
Optimization theory, like physics, often concerns itself with systems possessing a 
large number of interacting degrees of freedom. Physicists often simplify their prob- 
lems by using the mean field approzimation: a simple analytic approximation of the 
behavior of systems of particles or spins in thermal equilibrium. In a correspond- 
ing manner, arbitrary functions can be optimized by using an analytic version of 
stochastic simulated annealing based on a technique analogous to the mean field 
approximation. The derivation of MFA presented here uses the naive mean field 
(D. J. Thouless, P.W. Anderson, and R.G. Palmer (1977)) and starts with a simple 
Ising HamiltonJan of N spins coupled by a product interaction: 
integer spins. 
Factoring H(s) shows the interaction between a spin sl and the rest of the system: 
(1) 
The mean or effective field affecting si is the average of its coefficient in (1): 
(2) 
The last part of (2) shows that, for the Ising case, the mean field can be simply 
calculated from the difference in the Hamiltonian caused by changing {si} from zero 
Optimization by Mean Field Annealing 93 
1. Initialize spin averages and add noise: sl = 1/2 + 6 Vi. 
2. Perform this relaxation step until a fixed-point is found: 
a. Select a spin average <s,) at random from 
b. Compute the mean field I,i -- h + 2 ]i# 
c. Compute the new spin average (si) -- {1 + exp 
3. Decrease T and repeat step 2 until freezing occurs. 
Figure 1. The Mean Field Annealing Algorithm 
to one while holding the other spin averages constant. By taking the Boltzmann- 
weighted average of the state values, the spin average is found to be 
(si) -- Z 
exp(-i/T) 1 
1 + exp(-i/T) 
1 + exp (i/T) ' 
(3) 
Equilibrium is established at a given temperature when equations (2) and (a) hold 
for each spin. The MFA algorithm (Figure 1) begins at a high temperature where 
this/zed-poi, t is easy to determine. The fixed-point is Zrad:ed as T is lowered by 
iterating a relaxation step which uses the spin averages to calculate a new mean 
field that is then used to update the spin averages. As the temperature is lowered, 
the optimum solution is found as the limit of this sequence of fixed-points. 
The relationship of Hopfield neural networks to MFA becomes apparent if the re- 
laxation step in Figure 1 is recast in a parallel form in which the entire mean field 
vector partially moves towards its new state, 
$ 
i, 
and then all the spin averages are updated using ';e. As 7  0, these difference 
equations become non-linear differential equations, 
: + - 
dt 
which are equivalent to the equations of motion for the Hopfield network (J. J. 
Hopaaa and D. W. Tank (1985)), 
94 Bilbro, et al 
provided we make Ci = pi = 1 and use a sigmoidal transfer function 
1 
f(uj) = 1 + exp(uj/T) ' 
Thus, the evolution of a solution in a Hopfield network is a special case of the 
relaxation toward an equilibrium state effected by the MFA algorithm at a fixed 
temperature. 
THE GRAPH BIPARTITIONING PROBLEM 
Formally, a graph consists of a set of N nodes such that nodes ni and rj are con- 
nected by an edge with weight Vq (which could be zero). The graph bipartitioning 
problem involves equally distributing the graph nodes across two bins, b0 and b, 
while minimizing the combined weight of the edges with endpoints in opposite bins. 
These two sub-objectives tend to frustrate one another in that the first goal is sat- 
isfied when the nodes are equally divided between the bins, but the second goal is 
met (trivially) by assigning all the nodes to a single bin. 
MEAN FIELD FORMULATION 
An optimal solution for the bipartitioning problem minimizes the HamiltonJan 
1 ifribx 
where sl -- 0 if ri  b0. 
In the first term, each edge attracts adjacent nodes into the same bin with a force 
proportional to its weight. Counterbalancing this attraction is r, an amorphous 
repulsive force between all of the nodes which discourages them from clustering 
together. The average spin of a node nl can be determined from its mean field: 
j#i 
EXPERIMENTAL RESULTS 
Table 1 compares the performance of the MFA algorithm of Figure i with SSA 
in terms of total optimization and computational effort for 100 trials on each of 
three example graphs. While the bipartitions found by SSA and MFA are nearly 
equivalent, MFA required as little as 2% of the number of iterations needed by SSA. 
The effect of the decrease in temperature upon the spin averages is depicted in 
Figure 2. At high temperatures the graph bipartition is maximally disordered, (i.e. 
{si)   i), but as the system is cooled past a critical temperature, To, each node 
Optimization by Mean Field Annealing 95 
TABLE 1. Comparison of SSA and MFA on Graph Bipartitioning 
Gx Gx G 
Nodes/Edg?s 83/115 100/200 1,00/400 
Solution Value (HMFA/H$$A) 0.762 1.078 1.030 
Relaxation Iterations (IMFA/I$$A) "0.187 0.063 0.019 
1.O 
0.8 
0.4 
0.::2; 
0.0 
-1. r -1.0 0 r 0.0 O.S 1.0 
log e(T) 
Fiure 2. The Effect of Decreasing Temperature on Spin Averages 
begins to move predominantly into one or the other of the two bins (as evidenced 
by the drift of the spin averages towards i or 0). The changes in the spin averages 
cause H to decrease rapidly in the vicinty of 
To analyze the effect of temperature on the spin averages, the behavior of a cluster 
C of spins is idealized with the assumptions: 
1. The repulsive force which balances the bin contents is negligible within C 
(' = 0) compared to the attractive forces arising from the graph edges; 
2. The attractive force exerted by each edge is replaced with an average attractive 
force V = i j j/E where E is the number of non-zero weighted edges; 
3. On average, each graph node is adjacent to ( = 2E/N neighboring nodes; 
4. The movement of the nodes in a cluster can be uniformly described by some 
deviation, o', such that (8) = (1 + o')/2. 
96 Bilbro, et al 
Using this model, a cluster moves according to 
/ / 
cr = tanh -o' . 
(4) 
The solution to (4) is a fixed point with r -- 0 when T is high. This fixed point 
becomes unstable and the spins diverge from 1/2 when the temperature is lowered 
to the point where 
 tanh -r > 1. 
Solving shows that Tc -- VI/2, which agrees with our experiments and is within 
of those observed in (C. Peterson and J. R. Anderson 
The point at which the nodes freeze into their respective bins can be found using 
(4) and assuming a worst-case situation in which a node is attracted by a single 
edge (i.e. ( -' 1). In this case, the spin deviation will cross an arbitrary threshold, 
crt (usually set 4-0.9), when 
Vcr 
'f = ln( + t) - ln( - t) ' 
A cooling sc,hedule is now needed which prescribes how many relaxation iterations, 
I, are required at each temperature to reach equilibrium as the system is annealed 
from Tc to Tj,. Further analysis of (4) shows that I {x [T/(T- T)]. Thus, 
more iterations are required to reach equilibrium around Tc than anywhere else, 
which agrees with observations made during our experiments..The affect of using 
fewer iterations at various temperatures was empirically studied using the following 
procedure: 
Each spin average was initialized to 1/2 and a small amount of noise was 
added to break the symmetry of the problem. 
An initial temperature Ti was imposed, and the mean field equations were 
iterated I times for each node. 
After completing the iterations at , the temperature was quenched to near 
zero and the mean field equations were again iterated I times to saturate each 
node at one or zero. 
The results of applying this procedure to one of our example graphs with different 
values of Ti and I are shown in Figure 3. Selecting an initial temperature near Tc and 
performing sufficient iterations of the mean field equations (I >_ 40 in this case) gives 
final bipartitions that are usually near-optimum, while performing an insufficient 
number of iterations (I -- 5 or I = 20) leads to poor solutions. However, even a 
large number of iterations will not compensate if Ti is set so low that the initial 
convergence causes the graph to abruptly freeze into a local minimum. The highest 
Optimization by Mean Field Annealing 97 
5.0 
4.0 
2.0 
1.O 
Fiure 3. 
POOR $OLlJT'101f$ 
1--20 
C.9 0 D $ 0 L [I'I'101I$ 
1--0 
loge(Ti) 
The Effect of Initial Temperature and Iterations on the Solution 
quality solutions are found when Ti  T, and a sufficient number of relaxations 
are performed, as shown in the traces for I = 40 and I = 90. This seems to 
perform as well as slow cooling and requires much less effort. Obviously, much of 
the structure of the optimal solution must be present after equilibrating at T,. Due 
to the equivalence we have shown between Hopfield networks and MFA, this fact 
may be useful in tuning the gains in Hopfield networks to get better performance. 
CONCLUSIONS 
The concept of mean field annealing (MFA) has been introduced and compared to 
stochastic simulated annealing (SSA) which it closely resembles in both derivation 
and implementation. In the graph bipartitioning application, we saw the level 
of optimization achieved by MFA was comparable to that achieved by SSA, but 
1-2 orders of magnitude fewer relaxation iterations were required. This speedup 
is achieved because the average values of the discrete degrees of freedom used by 
MFA relax to their equilibrium values much faster than the corresponding Markov 
chain employed in SSA. We have seen similar results when applying MFA to a other 
problems including N-way graph partitioning (D. E. Van den Bout and T. K. Miller 
III (1988)), restoration of range and luminanee images (Griff Bilbro and Wesley 
Snyder (1988)), and image halftoning (T. K. Miller III and D. E. Van den Bout 
(1989)). As was shown, the MFA algorithm can be formulated as a parallel iterative 
procedure, so it should also perform well in parallel processing environments. This 
has been verified by successfully porting MFA to a ZIP array processor, a 64-node 
98 Bilbro, et al 
NCUBE hypercube computer, and a 10-processor Sequent Balance shared-memory 
multiprocessor with near-linear speedups in each case. 
In addition to the speed advantages of MFA, the fact that the system state is 
represented by continuous variables allows the use of simple analytic techniques 
to characterize the system dynamics. The dynamics of the MFA algorithm were 
examined for the problem of graph bipartitioning, revealing the existence of a critical 
temperature, To, at which optimization begins to occur. It was also experimentally 
determined that MFA found better solutions when annealing began near T rather 
than at some lower temperature. Due to the correspondence shown between MFA 
and Hopfield networks, the critical temperature may be of use in setting the neural 
gains so that better solutions are found. 
Acknowledements 
This work was partially supported by the North Carolina State University Center for 
Communications and Signal Processing and Computer Systems Laboratory, and by 
the Office of Basic Energy Sciences, and the Office of Technology Support Programs, 
U.S. Department of Energy, under contract No. DE-AC05-84OR21400 with Martin 
Marietta Energy Systems, Inc. 
References 
Griff Bilbro and Wesley Snyder (1988) Image restoration by mean field annealing. 
In Advances in Neural Network Information Processing Systems. 
D. E. Van den Bout and T. K. Miller III (1988) Graph partitioning using an- 
nealed neural networks. Submitted to IEEE Trans. on Circuits and Systems. 
J. J. Hopfield and D. W. Tank (1985) Neural computation of decision in optimiza- 
tion problems. Biological Cybernetics, 52, 141-152. 
T. K. Miller III and D. E. Van den Bout (1989) Image halftoning by mean field 
annealing. Submitted to ICNN'89. 
S. Kirkpatrick, C. Gelart, and M. Vecchi (1983) Optimization by simulated an- 
nealing. Science, 220(4598), (571-680. 
Peterson and J. R. Anderson (1987) Neural Networks and NP-complete Opti- 
mization Problems: a Performance Study on the Graph Bisection Problem. Tech- 
nical Report MCC-EI-287-87, MCC. 
D. J. Thouless, P.W. Anderson, and R.G. Palmer (1977) Solution 
model of a spin glass'. Phil. Mag., 35(3), 593-601. 
of 'solvable 
