/**
\page new-munster Munster tutorial: (mostly) Metadynamics simulations with PLUMED

\authors Gareth Tribello and Giovanni Bussi, (but we borrowed a lot of material from other tutorials)
\date March 13, 2020

This document describes the activities that the students were supposed to work on during the PLUMED Hands-on session
at the workshop Computing Free Energy Across Disciplines From Method Development to Applications that 
would have been held at WWU Munster from 11 to 13 March 2020.  You can find more information on the program of this
workshop here:

https://www.uni-muenster.de/FEW2020/index.html

The workshop was canceled due to the COVID-19 pandemic.

\section new-munster-aims Aims

This tutorial aims to train you to perform metadynamics simulations with PLUMED.
This first tutorial explains how to run and analyze a metadynamics simulation on alanine dipeptide.  Once you have completed the exercises here, you can test your understanding by working through the 
more open-ended activity detailed in \ref lugano-6b .

If you have never used PLUMED, we would recommend you start by working through \ref lugano-1, which will teach you 
how to calculate collective variables using PLUMED.

\section new-munster-objectives Objectives

Once this tutorial is completed, students will be able to:

- Write a PLUMED input file to perform metadynamics simulations
- Monitor the behavior of variables in a metadynamics run
- Run various diagnostic tests on your metadynamics simulation
- Calculate an estimate of the free energy surface together with suitable errors 

We would recommend creating an ipython notebook before starting this exercise.  You can use this notebook to make notes on the methods that 
you have used and to store all the aspects of the data analysis that you have done.

\section new-munster-resources Resources

The \tarball{lugano-3} for this project contains the following files:

- diala.pdb: a PDB file for alanine dipeptide in vacuo
- topol.tpr: a GROMACS run file to perform MD of alanine dipeptide

This tutorial has been tested on version 2.6.  

\section new-munster-intro Introduction

In \ref lugano-1 we saw how PLUMED could be used to compute collective variables. Computing collective variables is a useful way to analyze our simulations, but PLUMED is most often used to add forces on the collective variables. With this in mind,
we have implemented a variety of possible biases that can act on collective variables.  The complete documentation for
all the biasing methods available in PLUMED can be found at the \ref Bias page.
In the following, we will see how to build an adaptive bias potential using metadynamics.
Before we get on to that, however, here is a brief recap of the theory behind metadynamics.

\hidden{Summary of theory}

In metadynamics, an external, history-dependent bias potential is constructed in the space of 
a few selected degrees of freedom \f$ \vec{s}({q})\f$, generally called collective variables (CVs) \cite metad.
This potential is built as a sum of Gaussian kernels, which are deposited along the trajectory in the CVs space in 
accordance with the following formula:

\f[
V(\vec{s},t) = \sum_{ k \tau < t} W(k \tau)
\exp\left(
-\sum_{i=1}^{d} \frac{(s_i-s_i({q}(k \tau)))^2}{2\sigma_i^2}
\right).
\f]

In this expression \f$ \tau \f$ is the Gaussian deposition stride, 
\f$ \sigma_i \f$ the width of the Gaussian for the \f$i\f$th CV, and \f$ W(k \tau) \f$ the
height of the Gaussian. The effect of the metadynamics bias potential is to push the system away 
from local minima and into new regions of phase space. Furthermore, in the long
time limit, the bias potential converges to minus the free energy as a function of the CVs:

\f[
V(\vec{s},t\rightarrow \infty) = -F(\vec{s}) + C.
\f]

In standard metadynamics, Gaussian kernels of constant height are added for the entire course of a 
simulation. As a result, the system is eventually pushed to explore high free-energy regions, and the estimate of the free energy calculated from the bias potential oscillates around
the real value. 
In well-tempered metadynamics \cite Barducci:2008, the height of the Gaussian 
decreases with simulation time according to:

\f[
 W (k \tau ) = W_0 \exp \left( -\frac{V(\vec{s}({q}(k \tau)),k \tau)}{k_B\Delta T} \right ),
\f]

where \f$ W_0 \f$ is the initial Gaussian height, \f$ \Delta T \f$ an input parameter 
with the dimension of a temperature, and \f$ k_B \f$ the Boltzmann constant. 
With this rescaling of the Gaussian height, the bias potential smoothly converges in the long time limit,
but it does not fully compensate the underlying free energy.  Instead, the final bias is given by:

\f[
V(\vec{s},t\rightarrow \infty) = -\frac{\Delta T}{T+\Delta T}F(\vec{s}) + C.
\f]

where \f$ T \f$ is the temperature of the system.
In the long time limit, the CVs thus sample an ensemble
at a temperature \f$ T+\Delta T \f$ which is higher than the system temperature \f$ T \f$.
The parameter \f$ \Delta T \f$ can be chosen so as to regulate the extent of free-energy exploration:
 \f$ \Delta T = 0\f$ corresponds to standard MD, \f$ \Delta T \rightarrow\infty\f$ to standard
metadynamics. In the literature on well-tempered metadynamics and in PLUMED, you will often encounter
the term "bias factor" which is the ratio between the temperature of the CVs (\f$ T+\Delta T \f$) 
and the system temperature (\f$ T \f$):

\f[
\gamma = \frac{T+\Delta T}{T}.
\f]

The bias factor should thus be carefully chosen as it should be large enough for the relevant free-energy barriers to be crossed
efficiently in the time scale of the simulation.
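
The relation between the bias factor and \f$ \Delta T \f$ can be made concrete with a few lines of Python. The sketch below is only an illustration: the temperature of 300 K and the function names are assumptions made for this example and are not part of PLUMED.

```python
def delta_t_from_bias_factor(gamma, temperature):
    """Return Delta T for a bias factor gamma = (T + Delta T) / T."""
    return (gamma - 1.0) * temperature


def compensated_fraction(gamma):
    """Fraction Delta T / (T + Delta T) of F(s) that the converged bias cancels."""
    return 1.0 - 1.0 / gamma


temperature = 300.0  # K; assumed value, used only for illustration
for gamma in (2.0, 10.0, 100.0):
    delta_t = delta_t_from_bias_factor(gamma, temperature)
    print(gamma, delta_t, temperature + delta_t, compensated_fraction(gamma))
```

A bias factor of 10 thus means that the CVs sample at an effective temperature ten times higher than \f$ T \f$ and that the converged bias cancels 90% of the free energy.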
 
Additional information on this method can be found in review papers on metadynamics 
\cite gerv-laio09review \cite WCMS:WCMS31 \cite WCMS:WCMS1103 \cite bussi2015free.
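
The two formulas for the bias potential and for the well-tempered Gaussian height can be sketched in Python as follows. This is a toy illustration and not how PLUMED is implemented: it treats a single, non-periodic CV, and the values used for \f$ k_B T \f$, the bias factor, and the Gaussian width are assumptions chosen for the example.

```python
import numpy as np


def bias(s, centers, heights, sigma):
    """Evaluate the sum of 1D Gaussian kernels at the points s."""
    s = np.atleast_1d(np.asarray(s, dtype=float))
    centers = np.asarray(centers, dtype=float)
    heights = np.asarray(heights, dtype=float)
    if centers.size == 0:
        return np.zeros_like(s)
    diff = s[:, None] - centers[None, :]
    return np.sum(heights[None, :] * np.exp(-0.5 * (diff / sigma) ** 2), axis=1)


def well_tempered_height(w0, current_bias, kb_delta_t):
    """Rescale the initial height w0 by exp(-V / (k_B Delta T))."""
    return w0 * np.exp(-current_bias / kb_delta_t)


kbt = 2.494                      # kJ/mol; assumes a temperature of about 300 K
gamma = 10.0                     # assumed bias factor
kb_delta_t = (gamma - 1.0) * kbt

# Deposit five Gaussians at the same CV value and watch the height decay
centers, heights = [], []
for _ in range(5):
    v_here = bias(0.0, centers, heights, sigma=0.3)[0]
    heights.append(well_tempered_height(1.2, v_here, kb_delta_t))
    centers.append(0.0)
print(heights)
```

Because each new Gaussian is deposited where bias has already accumulated, each height is smaller than the previous one, which is the behavior that makes the well-tempered bias converge smoothly.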

\endhidden

In this tutorial, we will run simulations for a toy system, alanine dipeptide, which we will simulate in vacuo using the AMBER99SB-ILDN 
force field (see Fig. \ref new-munster-ala-fig).
This rather simple molecule is useful to study in a tutorial because it lets us generate simulation data that we can then analyze quickly.
It is, however, not a good system on which to test a new free energy method as the free energy for this system can already be easily calculated using a broad range of techniques.
There is no value in demonstrating yet another way of attacking this solved problem.

\anchor new-munster-ala-fig
\image html belfast-2-ala.png "The molecule of the day: alanine dipeptide."

The free energy surface for alanine dipeptide has two minima that are separated by a high free-energy barrier.
It is usual to characterize the two states in terms of the Ramachandran dihedral angles, which are denoted by \f$ \Phi \f$ and \f$ \Psi \f$ in Fig. \ref new-munster-transition-fig .

\anchor new-munster-transition-fig
\image html belfast-2-transition.png "The two basins in the free energy landscape of alanine dipeptide are characterized by their Ramachandran dihedral angles."


\subsection new-munster-ex-1 Exercise 1: a first metadynamics calculation

In this exercise, we will set up and perform a well-tempered metadynamics run using the backbone dihedral \f$ \phi \f$
as a collective variable. During the calculation, we will also monitor the behavior of the other backbone dihedral \f$ \psi \f$.

A sample `plumed.dat` file that you can use as a template is given below.
Whenever you see a highlighted \highlight{FILL} string, this is a string that you should replace.
In the sample input file below any text in green is a hyperlink to the documentation.  You should use the documentation to find
suitable replacements for the \highlight{FILL} strings.

\plumedfile
# vim:ft=plumed
MOLINFO STRUCTURE=diala.pdb
# Compute the backbone dihedral angle phi, defined by atoms C-N-CA-C
# you might want to use MOLINFO shortcuts
phi: TORSION ATOMS=__FILL__
# Compute the backbone dihedral angle psi, defined by atoms N-CA-C-N
psi: TORSION ATOMS=__FILL__

# Activate well-tempered metadynamics in phi
metad: __FILL__ ARG=__FILL__ ...
# Deposit a Gaussian every 500 time steps, with initial height equal to 1.2 kJ/mol
  PACE=500 HEIGHT=1.2 
# the bias factor should be wisely chosen
  BIASFACTOR=__FILL__
# Gaussian width (sigma) should be chosen based on CV fluctuation in unbiased run
  SIGMA=__FILL__
# Gaussians will be written to file and also stored on the grid
  FILE=HILLS GRID_MIN=-pi GRID_MAX=pi
...

# Print both collective variables and the value of the bias potential on COLVAR file
PRINT ARG=__FILL__ FILE=COLVAR STRIDE=10
\endplumedfile

Once your `plumed.dat` file is complete, you can run a 10-ns long metadynamics simulation with the following command
\verbatim
> gmx mdrun -s topol.tpr -nsteps 5000000 -plumed plumed.dat 
\endverbatim
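
The numbers in the command and in the input file can be cross-checked with a little arithmetic. The sketch below assumes a 2 fs time step, which is inferred from the quoted 10 ns length and is not stated explicitly in this tutorial. The counts are approximate because they depend on whether output is written at step zero.

```python
n_steps = 5_000_000    # -nsteps value passed to gmx mdrun
timestep_ps = 0.002    # assumption: 2 fs time step stored in topol.tpr
pace = 500             # PACE used by the metadynamics bias
print_stride = 10      # STRIDE used by PRINT

total_ns = n_steps * timestep_ps / 1000.0
n_hills = n_steps // pace                 # approximate number of Gaussians
n_colvar_lines = n_steps // print_stride  # approximate number of COLVAR lines

print(total_ns, n_hills, n_colvar_lines)
```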

During the metadynamics simulation, PLUMED will create two files, named COLVAR and HILLS.
The COLVAR file contains all the information specified by the PRINT command, in this case,
the value of the CVs every ten steps of simulation, along with the current value of the metadynamics bias potential. 
The HILLS file contains a list of the Gaussian kernels that were deposited during the simulation.
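
PLUMED output files such as COLVAR and HILLS start with a header line beginning with `#! FIELDS` that names the columns. The helper below is a hypothetical convenience function, not part of PLUMED, that uses this header so that columns can be addressed by name. The column names in the example data are illustrative only; your own files will contain the labels that you used in your input.

```python
import io

import numpy as np


def read_plumed_file(source):
    """Return (field_names, data) from a path or file-like PLUMED output."""
    handle = open(source) if isinstance(source, str) else source
    with handle:
        lines = handle.readlines()
    fields = []
    for line in lines:
        if line.startswith("#! FIELDS"):
            # Skip the "#!" and "FIELDS" tokens and keep the column names
            fields = line.split()[2:]
            break
    # Header lines start with "#" and are skipped as comments
    data = np.loadtxt(io.StringIO("".join(lines)), comments="#", ndmin=2)
    return fields, data


# Small in-memory stand-in for a COLVAR file
example = io.StringIO(
    "#! FIELDS time phi psi metad.bias\n"
    "0.000000 -2.5 2.7 0.0\n"
    "0.020000 -2.4 2.6 0.1\n"
)
fields, data = read_plumed_file(example)
columns = {name: data[:, i] for i, name in enumerate(fields)}
print(fields)
print(columns["phi"])
```

Passing a file name such as `"COLVAR"` instead of the in-memory example reads the real file.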

There is not much more to be said about the process of running a metadynamics simulation.  You will need to spend quite a while thinking
about what collective variables are appropriate for the particular system that you are studying.  Once you have done that, though, the process
of running your simulation will be very similar to the method that you have just carried out for alanine dipeptide.  Given the ease of performing these simulations, the rest of this tutorial focuses on how the output from the simulation should be analyzed.  We will first discuss some quick checks that you can perform
to test whether or not your simulations have worked.  Once we have completed this initial discussion, we will turn to how you should generate
publication-quality figures illustrating your results.

\subsection new-munster-qual-tests Qualitative tests

\subsubsection new-munster-cv-vis Visualizing the behavior of the biased CV

Once your metadynamics simulation has completed, you should plot how the values of the biased CV changed during the simulation.
We can use a python notebook to visualize the behavior of the CV during the simulation, as reported in the COLVAR file:

\code{.py}
import matplotlib.pyplot as plt
import numpy as np

cvvals = np.loadtxt("COLVAR")

plt.plot( cvvals[:,0] / 1000 , cvvals[:,1], 'b.')
plt.xlabel("Time / ns")
plt.ylabel(r'$\phi$'  " / radians" )
plt.show()
\endcode

\anchor new-munster-phi-fig
\image html munster-metad-phi.png "Time evolution of the metadynamics CV during the first 2 ns of a metadynamics simulation of alanine dipeptide in vacuum."

By inspecting Figure \ref new-munster-phi-fig, we can see that the system is initially in one of the two minima in the free energy surface for
alanine dipeptide. After a while (t=0.25 ns), the system is pushed
by the metadynamics bias potential to visit the other local minimum. As the simulation continues,
the bias potential fills the underlying free-energy landscape, and the system can diffuse around all of the
phase space.  This behavior is the ideal that we are looking for -- {\bf the full range of possible CV values should have been explored.}
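
The visual check above can be complemented with two crude numbers: the fraction of the CV range that was visited and the number of times the trajectory switched between the two basins. The sketch below is only a rough diagnostic. Placing the dividing surface at \f$ \phi = 0 \f$ and using 36 bins are assumptions made for this example, and fast recrossings near the boundary are all counted as transitions.

```python
import numpy as np


def visited_fraction(phi, nbins=36):
    """Fraction of equal-width bins in [-pi, pi] that contain at least one frame."""
    hist, _ = np.histogram(phi, bins=nbins, range=(-np.pi, np.pi))
    return np.count_nonzero(hist) / nbins


def count_transitions(phi, boundary=0.0):
    """Count how often consecutive frames lie on opposite sides of the boundary."""
    side = np.sign(np.asarray(phi, dtype=float) - boundary)
    side = side[side != 0]
    return int(np.sum(side[1:] != side[:-1]))


# Tiny synthetic trajectory used only to show the calls
phi = np.array([-1.5, -1.4, -1.3, 1.0, 1.1, -1.2])
print(visited_fraction(phi), count_transitions(phi))
```

For a real analysis you would pass the \f$ \phi \f$ column that you loaded from the COLVAR file.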

\subsubsection new-munster-cv-vis Visualizing the hill heights

The HILLS file contains a list of the Gaussian kernels that were deposited during the simulation.
If we look at the header of this file, we can find relevant information about its content:

\verbatim
#! FIELDS time phi sigma_phi height biasf
#! SET multivariate false
#! SET min_phi -pi
#! SET max_phi pi
\endverbatim 

The line starting with FIELDS tells us what is displayed in the various columns of the HILLS file:
the simulation time, the instantaneous value of \f$ \phi \f$, the Gaussian width and height, and the bias factor. 
If we load the HILLS file into our ipython notebook and plot the data within it, just as we plotted the data in the COLVAR file, we 
can visualize how the heights of the Gaussians decrease during the simulation, as shown below:

\anchor new-munster-phihills-fig
\image html munster-metad-phihills.png "Time evolution of the Gaussian height."

Try to reproduce this figure in your python notebooks.  Please note the commands to reproduce this figure are not identical to the commands
that we used to plot the COLVAR file.  You need to think about what we want to be plotted on the x and y-axis of the graph and to make
suitable adjustments to the script given above using what you have learned about what is plotted in the various columns of the HILLS 
file.

You should observe that the heights of the Gaussians are close to zero by the end of the simulation.  If the heights are close to zero by the end of the simulation, then all of the CV space has been explored.  Please note, however, that

\warning The fact that the Gaussian height has decreased to zero does not indicate that your metadynamics simulation has converged! 

\subsubsection new-munster-ex-2 Summing the hills

One can estimate the free energy as a function of the biased CV from the metadynamics
bias potential by using the utility \ref sum_hills.  This tool sums the Gaussian kernels
deposited during the simulation and stored in the HILLS file.  
The following command will thus calculate the free energy as a function of \f$ \phi \f$:

\verbatim
plumed sum_hills --hills HILLS
\endverbatim

The command above generates a file called `fes.dat` in which the free-energy surface as a function
of \f$ \phi \f$ is calculated on a regular grid.  You can plot this free energy surface in your python
notebook by using the commands:

\code{.py}
import matplotlib.pyplot as plt
import numpy as np

fes = np.loadtxt("fes.dat")

plt.plot( fes[:,0], fes[:,1], 'k-')
plt.show()
\endcode

The result should look like this:

\anchor new-munster-metad-phifes-fig
\image html munster-metad-phifes.png "Estimate of the free energy as a function of the dihedral phi from a 10ns-long well-tempered metadynamics simulation."

It is often useful to check how this estimate of the free energy is changing with time.  Towards the end of the simulation, once the simulation is converged,
there should be no significant changes in the shape of the free energy surface.  If you are observing substantial changes in the shape of the free energy surface throughout the simulation, this is thus indicating that you have not run your metadynamics trajectory for long enough.

To output multiple free energy surfaces, you can use the \-\-stride option for sum\_hills.  This option takes a single parameter, N, and tells plumed that the free energy should be output after each batch of N Gaussian kernels is added.   In the command below the additional option, \-\-mintozero is used to align all the profiles by setting the global minimum to zero.  Using this option makes comparing the various estimates of free energy more straightforward.

\verbatim
plumed sum_hills --hills HILLS --stride 100 --mintozero
\endverbatim

Once the command above has finished executing, you should be able to use your ipython notebook to generate a figure showing all the various free energy
profiles, which should look like the figure given below:

\anchor new-munster-metad-phifest-fig
\image html munster-metad-phifest.png "Estimates of the free energy as a function of the dihedral phi calculated every 100 Gaussian kernels deposited."
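
Comparing many profiles by eye is easier if each profile is reduced to a single number, for example the free-energy difference between the two basins. The sketch below does this for synthetic profiles. The basin boundary at \f$ \phi = 0 \f$, the value of \f$ k_B T \f$, and the assumption that the stride output is written to files named `fes_0.dat`, `fes_1.dat`, and so on are choices made for this illustration and should be checked against your own output.

```python
import numpy as np

KBT = 2.494  # kJ/mol; assumes a temperature of about 300 K


def basin_free_energy_difference(grid, fes, boundary=0.0, kbt=KBT):
    """Return F(CV >= boundary) - F(CV < boundary) for a profile on a uniform grid."""
    grid = np.asarray(grid, dtype=float)
    fes = np.asarray(fes, dtype=float)
    # Boltzmann weights; subtracting the minimum avoids overflow
    weights = np.exp(-(fes - fes.min()) / kbt)
    left = weights[grid < boundary].sum()
    right = weights[grid >= boundary].sum()
    return -kbt * np.log(right / left)


# Synthetic stand-ins for successive profiles; with real data you would load
# each file with np.loadtxt("fes_0.dat"), np.loadtxt("fes_1.dat"), ...
grid = np.linspace(-3.0, 3.0, 200)
offsets = [8.0, 6.0, 5.2, 5.0, 5.0]
profiles = [np.where(grid < 0.0, 0.0, offset) for offset in offsets]
differences = [basin_free_energy_difference(grid, p) for p in profiles]
print(np.round(differences, 3))
```

If this difference is still drifting at the end of the run, the simulation has not been run for long enough. A flat curve is encouraging but, like the other checks in this section, it remains a qualitative test.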

\subsubsection new-munster-qual-sum Summary of the qualitative tests

Once your metadynamics simulation has completed, you should thus perform the following three tests:

- Plot the value of the biased CV and check that the full range of possible CV values is explored.
- Plot the heights of the hills.  Check that this quantity has decayed to close to zero by the end of the simulation.
- Generate multiple estimates of the free energy surface using sum\_hills and check that at the end of the simulation the shape of the free energy surface is not changing substantially.

If you pass these three tests, this suggests that the simulation is most likely converged.  You can thus analyze the data correctly and produce the figures that might appear in a publication
about your result as discussed in the next but one section. 

\note IMPORTANT:  The observations above are necessary, but qualitative conditions for convergence.
A quantitative assessment of convergence can only be obtained by performing error analysis.

\subsection new-munster-block Exercise 3: Performing block averaging 

To understand the analysis that we will perform in the next section we need a brief digression on the block averaging technique that we will use to get the error bars for our estimate of the 
free energy surface.  The relevant pieces of probability theory are explained in the videos below, which you are encouraged to watch at some stage: 

@htmlonly
<table>
<tr>
<td><iframe width="560" height="315" src="https://www.youtube.com/embed/LOFnWyocr40" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe></td>
<td> <iframe width="560" height="315" src="https://www.youtube.com/embed/0KqCK0yG9T0" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe> </td>
</tr>
</table>
@endhtmlonly

In essence, when we do block averaging, we split the trajectory into N fixed-length blocks.  N separate averages are calculated from these blocks, and an average and variance are then computed from these block averages.  These two quantities are then used to calculate the final estimate of the averaged quantity and the error on this quantity, respectively.  You can better understand this technique by working through the following tutorial:

@htmlonly
<iframe frameborder="0" width="100%" height="600px" src="https://repl.it/student_embed/classroom/138484/b1c05b5ed64d50f1098190481877a402"></iframe>
@endhtmlonly
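
The block averaging procedure described above can be sketched in a few lines of Python. This is a minimal illustration for a single, unweighted quantity. The correlated synthetic time series and the block counts are assumptions chosen for the example.

```python
import numpy as np


def block_average(samples, n_blocks):
    """Return the mean of the block averages and the standard error on that mean."""
    samples = np.asarray(samples, dtype=float)
    block_len = len(samples) // n_blocks
    # Discard any frames that do not fit into a whole block
    trimmed = samples[: block_len * n_blocks]
    block_means = trimmed.reshape(n_blocks, block_len).mean(axis=1)
    mean = block_means.mean()
    error = np.sqrt(block_means.var(ddof=1) / n_blocks)
    return mean, error


# Correlated synthetic data: each value remembers 95% of the previous one
rng = np.random.default_rng(0)
noise = rng.normal(size=20000)
x = np.empty_like(noise)
x[0] = noise[0]
for i in range(1, len(x)):
    x[i] = 0.95 * x[i - 1] + noise[i]

for n_blocks in (2000, 200, 20):
    print(n_blocks, block_average(x, n_blocks))
```

When the blocks are shorter than the correlation time of the data, the block averages are not independent and the error is underestimated. The usual practice is therefore to increase the block length until the error estimate stops growing.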

In the remainder of this tutorial, we will use this technique to generate an estimate of the free energy surface that was sampled during our simulation of alanine dipeptide together with suitable error bars.  The block averaging procedure that we will use to do so is slightly more complicated than the process outlined in the exercise above as:

- The free energy surface is a function, so we have to compute multiple averages.  Error bars on each of these averages then have to be calculated separately.
- We have to calculate weighted averages because reweighting is required to compensate for the effect the bias potential has on the distribution of sampled configurations.

If you are interested, the way we resolve these two issues is explored in more detail in https://arxiv.org/abs/1812.08213 
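
The idea behind the weighted averages can be illustrated with a small reweighting sketch. It assumes that every frame is weighted by \f$ e^{V(s)/k_B T} \f$, where \f$ V \f$ is a static bias evaluated for that frame. The function name, the synthetic data, and the value of \f$ k_B T \f$ are assumptions made for this example, and this is not the PLUMED implementation.

```python
import numpy as np


def reweighted_free_energy(cv, bias, kbt, bins, cv_range):
    """Return bin centers and -kT ln P(s) from a bias-weighted histogram."""
    cv = np.asarray(cv, dtype=float)
    bias = np.asarray(bias, dtype=float)
    # Work with log weights and subtract the maximum to avoid overflow
    log_weights = bias / kbt
    weights = np.exp(log_weights - log_weights.max())
    hist, edges = np.histogram(cv, bins=bins, range=cv_range, weights=weights)
    prob = hist / hist.sum()
    centers = 0.5 * (edges[:-1] + edges[1:])
    with np.errstate(divide="ignore"):
        fes = -kbt * np.log(prob)  # empty bins give an infinite free energy
    return centers, fes - fes[np.isfinite(fes)].min()


kbt = 2.5  # kJ/mol
# Ten frames in each of two states; the second state felt a larger bias
cv = np.array([-1.0] * 10 + [1.0] * 10)
bias = np.array([0.0] * 10 + [kbt * np.log(2.0)] * 10)
centers, fes = reweighted_free_energy(cv, bias, kbt, bins=2, cv_range=(-2.0, 2.0))
print(centers, fes)
```

Both states were visited equally often in the biased run, but the frames in the second state carry twice the weight, so that state comes out lower in free energy by \f$ k_B T \ln 2 \f$.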

\section new-munster-ex-4 Exercise 4: reweighting

If you write a paper that contains information on the results from a metadynamics simulation, you will likely want to plot a free energy surface.  The best way to prepare this figure is to use reweighting as:

- If you reweight, you can compute an estimate of the errors on your free energy surface.  Providing this information on the errors is important as your results are not reproducible if you do not include the errors.
- If you reweight, you can compute the free energy as a function of the CV that was biased and the free energy as a function of CVs that were not biased.

You can reweight a metadynamics trajectory using the method discussed here, the technique discussed in \cite Tiwary_jp504920s or using the method discussed in \cite Giberti_jp16100. 
None of these methods is particularly challenging to use so please, if you take home one message from this tutorial, let it be that {\bf free energy surfaces computed 
using sum\_hills should not appear in your published articles.}  

To perform the reweighting and block averaging on the alanine dipeptide data that we have generated, you will need to prepare a 
`plumed_reweight.dat` file that contains the following completed input file:

\plumedfile
__FILL__ # here goes the definitions of the CVs

# Activate well-tempered metadynamics in phi
metad: __FILL__ ARG=__FILL__ ...
# Use a very large PACE and zero HEIGHT so that no new Gaussians are deposited during the analysis
  PACE=10000000 HEIGHT=0.0 # <- this is the new stuff!
# the bias factor should be wisely chosen
  BIASFACTOR=__FILL__
# Gaussian width (sigma) should be chosen based on CV fluctuation in unbiased run
  SIGMA=__FILL__
# Gaussians will be written to file and also stored on the grid
  FILE=HILLS GRID_MIN=-pi GRID_MAX=pi
# Say that METAD should be restarting
  RESTART=YES # <- this is the new stuff!
...

aa: REWEIGHT_BIAS ARG=__FILL__

# Calculate the histogram for phi and output it to a file
phihh: HISTOGRAM ARG=phi GRID_MIN=__FILL__ GRID_MAX=__FILL__ GRID_BIN=600 BANDWIDTH=0.1 LOGWEIGHTS=__FILL__ CLEAR=__FILL__
DUMPGRID GRID=phihh FILE=phi_histogram.dat STRIDE=2500

# Calculate the histogram for psi and output it to a file
psihh: HISTOGRAM ARG=psi GRID_MIN=__FILL__ GRID_MAX=__FILL__ GRID_BIN=600 BANDWIDTH=0.1 LOGWEIGHTS=__FILL__ CLEAR=__FILL__
DUMPGRID GRID=psihh FILE=psi_histogram.dat STRIDE=2500
\endplumedfile

Once you have completed the input file, you can run the analysis using the command
\verbatim
> plumed driver --ixtc traj_comp.xtc --plumed plumed_reweight.dat --kt 2.5
\endverbatim
Notice that you have to specify the value of \f$k_BT\f$ in energy units. While running your simulation,
this information was taken from the MD code.
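
The value passed to the `--kt` option can be checked by multiplying the Boltzmann constant, expressed in kJ/mol per kelvin, by the simulation temperature. The sketch below assumes a temperature of 300 K, which is not stated explicitly in this tutorial.

```python
# Boltzmann constant per mole (the gas constant) in kJ mol^-1 K^-1
BOLTZMANN_KJ_PER_MOL_K = 0.0083144626


def kbt(temperature):
    """Return k_B T in kJ/mol for a temperature in kelvin."""
    return BOLTZMANN_KJ_PER_MOL_K * temperature


print(round(kbt(300.0), 3))  # assumed temperature of 300 K
```

The result, about 2.49 kJ/mol, is what the command above rounds to 2.5.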

You must make sure that the HILLS file that was output by your metadynamics simulation is available in the directory where you run the above command.
If that condition is satisfied, you should generate some files containing histograms that will be called: analysis.0.phi_histogram.dat,
analysis.0.psi_histogram.dat, analysis.1.phi_histogram.dat, analysis.1.psi_histogram.dat, etc.  These files contain the histograms constructed from 
each of the blocks of data in your trajectory.  You can merge all the phi histograms to get the final free energy surface as a function of the phi angle, 
using the well-known relation between the histogram, \f$P(s)\f$, and the free energy surface, \f$F(s)\f$:

\f[
F(s) = - k_B T \ln P(s)
\f]

that is employed in the following python script:

\code{.py}
import matplotlib.pyplot as plt
import math
import glob
import numpy as np

# Function to read in histogram data and normalization
def readhistogram( fname ) :
    # Read in the histogram data
    data = np.loadtxt( fname )
    with open( fname, "r" ) as myfile :
         for line in myfile :
             if line.startswith("#! SET normalisation") : norm = line.split()[3]
    return float(norm), data

# Read in a grid
norm, griddata = readhistogram( "phi_histogram.dat" )
norm2 = norm*norm
# Create two NumPy arrays that will be used to accumulate the average grid and the average grid squared
average = np.zeros( len(griddata[:,0]) )
average_sq = np.zeros( len(griddata[:,0]) )
average[:] = norm*griddata[:, 1]
average_sq[:] = norm*griddata[:, 1]*griddata[:, 1]

# Now sum the grids from all the analysis files you have
for filen in glob.glob( "analysis.*.phi_histogram.dat" ) :
    tnorm, newgrid = readhistogram( filen )
    norm = norm + tnorm
    norm2 = norm2 + tnorm*tnorm
    average[:] = average[:] + tnorm*newgrid[:, 1]
    average_sq[:] = average_sq[:] + tnorm*newgrid[:, 1]*newgrid[:, 1]

# Compute the final average grid
average = average / norm
# Compute the sample variance for all grid points
variance = (average_sq / norm) - average*average
# Now multiply by Bessel's correction to unbias the sample variance and get the population variance
variance = ( norm /(norm-(norm2/norm)) ) * variance
# And lastly divide by number of grids and square root to get an error bar for each grid point
ngrid = 1 + len( glob.glob( "analysis.*.phi_histogram.dat" ) )
errors = np.sqrt( variance / ngrid )

plt.errorbar( griddata[:,0], -2.5*np.log(average), yerr=(2.5*errors/average) )
plt.xlabel( r'$\phi$'  " / radians" )
plt.ylabel("Free energy / kJ mol" r'$^{-1}$' )
plt.show()
\endcode

Copy this script to your python notebook and then execute it to generate a figure showing the free energy as a function of the phi angle.
Once you have done that, copy the script again and modify it to get the free energy as a function of the psi angle.  You should get results 
that look something like the figures shown below.

\anchor new-munster-final-fes
\image html munster-final-free-energy.png "The final free energy shown as a function of phi and psi.  These free energies were calculated using reweighting, while error bars (the widths of the lines) were computed using block averaging."
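
The error bars that the script above passes to `plt.errorbar` follow from first-order error propagation through the relation between the histogram and the free energy. As a sketch, assuming that the error \f$ \epsilon_P(s) \f$ on the histogram is small compared with \f$ P(s) \f$ itself:

\f[
\epsilon_F(s) \approx \left| \frac{\partial F}{\partial P} \right| \epsilon_P(s) = k_B T \frac{\epsilon_P(s)}{P(s)}
\f]

With \f$ k_B T = 2.5 \f$ kJ/mol this is the expression `2.5*errors/average` that appears in the script.  The approximation becomes unreliable in poorly sampled regions, where \f$ P(s) \f$ is close to zero and the relative error is large.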

\section new-munster-conclusions Conclusions

In this tutorial, we have discussed how metadynamics simulations should be run and how the resulting trajectories should be analyzed.
We have paid particular attention to the procedure for computing error bars on the estimates of the free energy that are obtained from the simulations.  Quoting these errors is essential because, if no errors are provided, the results are not reproducible.

More information on the underlying probability theory that we have used in performing these analyses can be found in https://arxiv.org/abs/1812.08213. 

Now that you understand how metadynamics simulations are performed, see if you can work through the exercises in \ref lugano-6b .

*/

link: @subpage new-munster 

description: This tutorial explains how to use PLUMED to run metadynamics simulations 

additional-files: lugano-3
