\documentclass{article} \usepackage{natbib} \usepackage{graphics} \usepackage{amsmath} \usepackage{indentfirst} \usepackage{url} \usepackage[utf8]{inputenc} \DeclareMathOperator{\var}{var} \DeclareMathOperator{\cov}{cov} % \VignetteIndexEntry{MCMC Example} \begin{document} <>= options(keep.source = TRUE, width = 60) @ \title{MCMC Package Example (Version \Sexpr{packageVersion("mcmc")})} \author{Charles J. Geyer} \maketitle \section{The Problem} This is an example of using the \verb@mcmc@ package in R. The problem comes from a take-home question on a (take-home) PhD qualifying exam (School of Statistics, University of Minnesota). Simulated data for the problem are in the dataset \verb@logit@. There are five variables in the data set, the response \verb@y@ and four predictors, \verb@x1@, \verb@x2@, \verb@x3@, and \verb@x4@. A frequentist analysis for the problem is done by the following R statements <>= library(mcmc) data(logit) out <- glm(y ~ x1 + x2 + x3 + x4, data = logit, family = binomial, x = TRUE) summary(out) @ But this problem isn't about that frequentist analysis, we want a Bayesian analysis. For our Bayesian analysis we assume the same data model as the frequentist, and we assume the prior distribution of the five parameters (the regression coefficients) makes them independent and identically normally distributed with mean 0 and standard deviation 2. The log unnormalized posterior density (log likelihood plus log prior) for this model is calculated by the following R function. In order to avoid using either global variables or \verb@...@ arguments, we use the function factory pattern (\citealp[Section~10.3.1]{wickham}; \citealp[Section~7.5, especially Subsection~7.5.4]{basic}; see also Appendix~\ref{sec:versus} below). <>= lupost_factory <- function(x, y) function(beta) { eta <- as.numeric(x %*% beta) logp <- ifelse(eta < 0, eta - log1p(exp(eta)), - log1p(exp(- eta))) logq <- ifelse(eta < 0, - log1p(exp(eta)), - eta - log1p(exp(- eta))) logl <- sum(logp[y == 1]) + sum(logq[y == 0]) return(logl - sum(beta^2) / 8) } lupost <- lupost_factory(out$x, out$y) @ The tricky calculation of the log likelihood avoids overflow and catastrophic cancellation in calculation of $\log(p)$ and $\log(q)$ where \begin{align*} p & = \frac{\exp(\eta)}{1 + \exp(\eta)} = \frac{1}{1 + \exp(- \eta)} \\ q & = \frac{1}{1 + \exp(\eta)} = \frac{\exp(- \eta)}{1 + \exp(- \eta)} \end{align*} so taking logs gives \begin{align*} \log(p) & = \eta - \log(1 + \exp(\eta)) = - \log(1 + \exp(- \eta)) \\ \log(q) & = - \log(1 + \exp(\eta)) = - \eta - \log(1 + \exp(- \eta)) \end{align*} To avoid overflow, we always chose the case where the argument of $\exp$ is negative. We have also avoided catastrophic cancellation when $\lvert\eta\rvert$ is large. If $\eta$ is large and positive, then \begin{align*} p & \approx 1 \\ q & \approx 0 \\ \log(p) & \approx - \exp(- \eta) \\ \log(q) & \approx - \eta - \exp(- \eta) \end{align*} and our use of the R function \texttt{log1p}, which calculates the function $x \mapsto \log(1 + x)$ correctly for small $x$ avoids all problems. The case where $\eta$ is large and negative is similar. \section{Beginning MCMC} With those definitions in place, the following code runs the Metropolis algorithm to simulate the posterior. <>= set.seed(42) # to get reproducible results beta.init <- as.numeric(coefficients(out)) out <- metrop(lupost, beta.init, 1e3) names(out) out$accept @ The arguments to the \verb@metrop@ function used here (there are others we don't use) are \begin{itemize} \item an R function (here \verb@lupost@) that evaluates the log unnormalized density of the desired stationary distribution of the Markov chain (here a posterior distribution). Note that (although this example does not exhibit the phenomenon) that the unnormalized density may be zero, in which case the log unnormalized density is \verb@-Inf@. \item an initial state (here \verb@beta.init@) of the Markov chain. \item a number of batches (here \verb@1e3@) for the Markov chain. This combines with batch length and spacing (both 1 by default) to determine the number of iterations done. \item additional arguments (here \verb@x@ and \verb@y@) supplied to provided functions (here \verb@lupost@). \item there is no ``burn-in'' argument, although burn-in is easily accomplished, if desired (more on this below). \end{itemize} The output is in the component \verb@out$batch@ returned by the \verb@metrop@ function. We'll look at it presently, but first we need to adjust the proposal to get a higher acceptance rate (\verb@out$accept@). It is generally accepted \citep*{grg} that an acceptance rate of about 20\% is right, although this recommendation is based on the asymptotic analysis of a toy problem (simulating a multivariate normal distribution) for which one would never use MCMC and is very unrepresentative of difficult MCMC applications. \citet{geyer-temp} came to a similar conclusion, that a 20\% acceptance rate is about right, in a very different situation. But they also warned that a 20\% acceptance rate could be very wrong and produced an example where a 20\% acceptance rate was impossible and attempting to reduce the acceptance rate below 70\% would keep the sampler from ever visiting part of the state space. So the 20\% magic number must be considered like other rules of thumb we teach in intro courses (like $n > 30$ means means normal approximation is valid). We know these rules of thumb can fail. There are examples in the literature where they do fail. We keep repeating them because we want something simple to tell beginners, and they are all right for some problems. Be that as it may, we try for 20\%. <>= out <- metrop(out, scale = 0.1) out$accept out <- metrop(out, scale = 0.3) out$accept out <- metrop(out, scale = 0.5) out$accept out <- metrop(out, scale = 0.4) out$accept @ Here the first argument to each instance of the \verb@metrop@ function is the output of a previous invocation. The Markov chain continues where the previous run stopped, doing just what it would have done if it had kept going, the initial state and random seed being the final state and final random seed of the previous invocation. Everything stays the same except for the arguments supplied (here \verb@scale@). \begin{itemize} \item The argument \verb@scale@ controls the size of the Metropolis ``normal random walk'' proposal. The default is \verb@scale = 1@. Big steps give lower acceptance rates. Small steps give higher. We want something about 20\%. It is also possible to make \verb@scale@ a vector or a matrix. See \verb@help(metrop)@. \end{itemize} Because each run starts where the last one stopped (when the first argument to \verb@metrop@ is the output of the previous invocation), each run serves as ``burn-in'' for its successor (assuming that any part of that run was worth anything at all). \section{Diagnostics} O.~K. That does it for the acceptance rate. So let's do a longer run and look at the results. <>= out <- metrop(out, nbatch = 1e4) t.test(out$accept.batch)$conf.int out$time @ Here we do a Monte Carlo confidence interval for the true unknown acceptance rate (what we would see with an infinite Monte Carlo sample size). Figure~\ref{fig:fig1} (page~\pageref{fig:fig1}) shows the time series plot made by the R statement <>= plot(ts(out$batch)) @ \begin{figure} \begin{center} <>= <> @ \end{center} \caption{Time series plot of MCMC output.} \label{fig:fig1} \end{figure} Another way to look at the output is an autocorrelation plot. Figure~\ref{fig:fig2} (page~\pageref{fig:fig2}) shows the time series plot made by the R statement <>= acf(out$batch) @ \begin{figure} \begin{center} <>= <> @ \end{center} \caption{Autocorrelation plot of MCMC output.} \label{fig:fig2} \end{figure} As with any multiplot plot, these are a bit hard to read. Readers are invited to make the separate plots to get a better picture. As with all ``diagnostic'' plots in MCMC, these don't ``diagnose'' subtle problems. \begin{quotation} The purpose of regression diagnostics is to find obvious, gross, embarrassing problems that jump out of simple plots \citep{bogosity}. \end{quotation} The time series plots will show \emph{obvious} nonstationarity. They will not show \emph{nonobvious} nonstationarity. They provide no guarantee whatsoever that your Markov chain is sampling anything remotely resembling the correct stationary distribution (with log unnormalized density \verb@lupost@). In this very easy problem, we do not expect any convergence difficulties and so believe what the diagnostics seem to show, but one is a fool to trust such diagnostics in difficult problems. The autocorrelation plots seem to show that the the autocorrelations are negligible after about lag 25. This diagnostic inference is reliable if the sampler is actually working (has nearly reached equilibrium) and worthless otherwise. Thus batches of length 25 should be sufficient, but let's use length 100 to be safe. A more judicious discussion of asymptotics is found in \citet[Section~1.11.5]{intro}, which says \begin{quotation} Many [diagnostics] come with theorems, but the theorems never prove the property you really want a diagnostic to have. These theorems say that if the chain converges, then the diagnostic will probably say that the chain converged, but they do not say that if the chain pseudo-converges, then the diagnostic will probably say that the chain did not converge. \end{quotation} \section{Monte Carlo Estimates and Standard Errors} <>= out <- metrop(out, nbatch = 100, blen = 100, outfun = function(z) c(z, z^2)) t.test(out$accept.batch)$conf.int out$time @ We have added an argument \verb@outfun@ that gives the functional of the Markov chain \citep[Section~1.6]{intro} we want to average. For this problem we are interested in both posterior mean and variance. Mean is easy, just average the variables in question. But variance is a little tricky. We need to use the identity $$ \var(X) = E(X^2) - E(X)^2 $$ to write variance as a function of two things that can be estimated by simple averages. Hence we want to average the state itself and the squares of each component. Hence our \verb@outfun@ returns \verb@c(z, z^2)@ for an argument (the state vector) \verb@z@. \subsection{Simple Means} The grand means (means of batch means) are <>= apply(out$batch, 2, mean) @ The first 5 numbers are the Monte Carlo estimates of the posterior means. The second 5 numbers are the Monte Carlo estimates of the posterior ordinary second moments. We get the posterior variances by <>= foo <- apply(out$batch, 2, mean) mu <- foo[1:5] sigmasq <- foo[6:10] - mu^2 mu sigmasq @ Monte Carlo standard errors (MCSE) are calculated from the batch means. This is simplest for the means. <>= mu.mcse <- apply(out$batch[ , 1:5], 2, sd) / sqrt(out$nbatch) mu.mcse @ The extra factor \verb@sqrt(out$nbatch)@ arises because the batch means have variance $\sigma^2 / b$ where $b$ is the batch length, which is \verb@out$blen@, whereas the overall means \verb@mu@ have variance $\sigma^2 / n$ where $n$ is the total number of iterations, which is \verb@out$blen * out$nbatch@. \subsection{Functions of Means} To get the MCSE for the posterior variances we apply the delta method. Let $u_i$ denote the sequence of batch means of the first kind for one parameter and $\bar{u}$ the grand mean (the estimate of the posterior mean of that parameter), let $v_i$ denote the sequence of batch means of the second kind for the same parameter and $\bar{v}$ the grand mean (the estimate of the posterior second absolute moment of that parameter), and let $\mu = E(\bar{u})$ and $\nu = E(\bar{v})$. Then the delta method linearizes the nonlinear function $$ g(\mu, \nu) = \nu - \mu^2 $$ as $$ \Delta g(\mu, \nu) = \Delta \nu - 2 \mu \Delta \mu $$ saying that $$ g(\bar{u}, \bar{v}) - g(\mu, \nu) $$ has the same asymptotic normal distribution as $$ (\bar{v} - \nu) - 2 \mu (\bar{u} - \mu) $$ which, of course, has variance \verb@1 / nbatch@ times that of $$ (v_i - \nu) - 2 \mu (u_i - \mu) $$ and this variance is estimated by $$ \frac{1}{n_{\text{batch}}} \sum_{i = 1}^{n_{\text{batch}}} \bigl[ (v_i - \bar{v}) - 2 \bar{u} (u_i - \bar{u}) \bigr]^2 $$ So <>= u <- out$batch[ , 1:5] v <- out$batch[ , 6:10] ubar <- apply(u, 2, mean) vbar <- apply(v, 2, mean) deltau <- sweep(u, 2, ubar) deltav <- sweep(v, 2, vbar) foo <- sweep(deltau, 2, ubar, "*") sigmasq.mcse <- sqrt(apply((deltav - 2 * foo)^2, 2, mean) / out$nbatch) sigmasq.mcse @ does the MCSE for the posterior variance. Let's just check that this complicated \verb@sweep@ and \verb@apply@ stuff does do the right thing. <>= sqrt(mean(((v[ , 2] - vbar[2]) - 2 * ubar[2] * (u[ , 2] - ubar[2]))^2) / out$nbatch) @ \paragraph{Comment} Through version 0.5 of this vignette it contained an incorrect procedure for calculating this MCSE, justified by a handwave (which was incorrect). Essentially, it said to use the standard deviation of the batch means called \verb@v@ here, which appears to be very conservative. \subsection{Functions of Functions of Means} If we are also interested in the posterior standard deviation (a natural question, although not asked on the exam problem), the delta method gives its standard error in terms of that for the variance <>= sigma <- sqrt(sigmasq) sigma.mcse <- sigmasq.mcse / (2 * sigma) sigma sigma.mcse @ \section{A Final Run} So that's it. The only thing left to do is a little more precision (the exam problem directed ``use a long enough run of your Markov chain sampler so that the MCSE are less than 0.01'') <>= out <- metrop(out, nbatch = 500, blen = 400) t.test(out$accept.batch)$conf.int out$time <> <> <> <> @ and some nicer output, which is presented in three tables constructed from the R variables defined above using the R \verb@xtable@ command in the \verb@xtable@ library. \begin{table}[ht] \caption{Posterior Means} \label{tab:mu} \begin{center} <>= foo <- rbind(mu, mu.mcse) dimnames(foo) <- list(c("estimate", "MCSE"), c("constant", paste("$x_", 1:4, "$", sep = ""))) library(xtable) print(xtable(foo, digits = rep(4, 6), align = c("l", rep("c", 5))), floating = FALSE, caption.placement = "top", sanitize.colnames.function = function(x) return(x)) @ \end{center} \end{table} \begin{table}[ht] \caption{Posterior Variances} \label{tab:sigmasq} \begin{center} <>= foo <- rbind(sigmasq, sigmasq.mcse) dimnames(foo) <- list(c("estimate", "MCSE"), c("constant", paste("$x_", 1:4, "$", sep = ""))) library(xtable) print(xtable(foo, digits = rep(4, 6), align = c("l", rep("c", 5))), floating = FALSE, caption.placement = "top", sanitize.colnames.function = function(x) return(x)) @ \end{center} \end{table} \begin{table}[ht] \caption{Posterior Standard Deviations} \label{tab:sigma} \begin{center} <>= foo <- rbind(sigma, sigma.mcse) dimnames(foo) <- list(c("estimate", "MCSE"), c("constant", paste("$x_", 1:4, "$", sep = ""))) library(xtable) print(xtable(foo, digits = rep(4, 6), align = c("l", rep("c", 5))), floating = FALSE, caption.placement = "top", sanitize.colnames.function = function(x) return(x)) @ \end{center} \end{table} Note for the record that the all the results presented in the tables are from ``one long run'' where long here took only <>= cat(out$time[1], "\n") @ seconds (on whatever computer it was run on). \section{New Variance Estimation Functions} R function \texttt{initseq} (added in version 0.6 of this package) estimates variances in the Markov chain central limit theorem (CLT) following the methodology introduced by \citet[Section~3.3]{practical}. These methods only apply to scalar-valued functionals of reversible Markov chains, but the Markov chains produced by the \texttt{metrop} function satisfy this condition, even, as we shall see below, when batching is used. Rather than redo the Markov chains in the preceding material, we just look at a toy problem, an AR(1) time series, which can be simulated in one line of R. This is the example on the help page for \texttt{initseq}. <>= n <- 2e4 rho <- 0.99 x <- arima.sim(model = list(ar = rho), n = n) @ The time series \texttt{x} is a reversible Markov chain and trivially a scalar-valued functional of a Markov chain. Define \begin{equation} \label{eq:little} \gamma_k = \cov(X_i, X_{i + k}) \end{equation} where the covariances refer to the stationary Markov chain having the same transition probabilities as \texttt{x}. Then the variance in the CLT is $$ \sigma^2 = \gamma_0 + 2 \sum_{k = 1}^\infty \gamma_k $$ \citep[Theorem~2.1]{practical}, that is, $$ \bar{x}_n \approx \text{Normal}\left(\mu, \frac{\sigma^2}{n}\right), $$ where $\mu = E(X_i)$ is the quantity being estimated by MCMC (in this toy problem we know $\mu = 0$). Naive estimates of $\sigma^2$ obtained by plugging in empirical estimates of the gammas do not provide consistent estimation \citep[Section~3.1]{practical}. Thus the scheme implemented by the R function \texttt{initseq}. Define \begin{equation} \label{eq:big} \Gamma_k = \gamma_{2 k} + \gamma_{2 k + 1} \end{equation} \citet[Theorem~3.1]{practical} says that $\Gamma_k$ considered as a function of $k$ is strictly positive, strictly decreasing, and strictly convex (provided we are, as stated above, working with a reversible Markov chain). Thus it makes sense to use estimators that use these properties. The estimators implemented by the R function \texttt{initseq} and described by \citet[Section~3.3]{practical} are conservative-consistent in the sense of Theorem~3.2 of that section. Figure~\ref{fig:gamma} (page~\pageref{fig:gamma}) shows the time series plot made by the R statement <>= out <- initseq(x) plot(seq(along = out$Gamma.pos) - 1, out$Gamma.pos, xlab = "k", ylab = expression(Gamma[k]), type = "l") lines(seq(along = out$Gamma.dec) - 1, out$Gamma.dec, lty = "dotted") lines(seq(along = out$Gamma.con) - 1, out$Gamma.con, lty = "dashed") @ \begin{figure} \begin{center} <>= <> @ \end{center} \caption{Plot ``Big Gamma'' defined by \eqref{eq:little} and \eqref{eq:big}. Solid line, initial positive sequence estimator. Dotted line, initial monotone sequence estimator. Dashed line, initial convex sequence estimator.} \label{fig:gamma} \end{figure} One can use whichever curve one chooses, but now that the \texttt{initseq} function makes the computation trivial, it makes sense to use the initial convex sequence. Of course, one is not interested in Figure~\ref{fig:gamma}, except perhaps when explaining the methodology. What is actually important is the estimate of $\sigma^2$, which is given by <>= out$var.con (1 + rho) / (1 - rho) * 1 / (1 - rho^2) @ where for comparison we have given the exact theoretical value of $\sigma^2$, which, of course, is never available in a non-toy problem. These initial sequence estimators seem, at first sight to be a competitor for the method of batch means. However, appearances can be deceiving. The two methods are complementary. The sequence of batch means is itself a scalar-valued functional of a reversible Markov chain. Hence the initial sequence estimators can be applied to it. <>= blen <- 5 x.batch <- apply(matrix(x, nrow = blen), 2, mean) bout <- initseq(x.batch) @ Because the batch length is too short, the variance of the batch means does not estimate $\sigma^2$. We must account for the autocorrelation of the batches, shown in Figure~\ref{fig:gambat}. <>= plot(seq(along = bout$Gamma.con) - 1, bout$Gamma.con, xlab = "k", ylab = expression(Gamma[k]), type = "l") @ \begin{figure} \begin{center} <>= <> @ \end{center} \caption{Plot ``Big Gamma'' defined by \eqref{eq:little} and \eqref{eq:big} for the sequence of batch means (batch length \Sexpr{blen}). Only initial convex sequence estimator is shown.} \label{fig:gambat} \end{figure} Because the the variance is proportional to one over the batch length, we need to multiply by the batch length to estimate the $\sigma^2$ for the original series. <>= out$var.con bout$var.con * blen @ Another way to look at this is that the MCMC estimator of $\mu$ is either \texttt{mean(x)} or \texttt{mean(x.batch)}. And the variance must be divided by the sample size to give standard errors. So either <>= mean(x) + c(-1, 1) * qnorm(0.975) * sqrt(out$var.con / length(x)) mean(x.batch) + c(-1, 1) * qnorm(0.975) * sqrt(bout$var.con / length(x.batch)) @ is an asymptotic 95\% confidence interval for $\mu$. Just divide by the relevant sample size. \appendix \section{Dot-dot-dot Versus Global Variables Versus Closures} \label{sec:versus} This appendix deals with three ways to pass information to a function being passed to another R function (a higher-order function), for example when \begin{itemize} \item the function being passed is the objective function for an optimization done by the higher-order function, which optimizes (R function \texttt{optim} for example), \item the function being passed is the integrand for an integration done by the higher-order function, which integrates (R function \texttt{integrate} for example), \item the function being passed is the estimator of a parameter for a bootstrap done by the higher-order function, which simulates the (bootstrap approximation) of the sampling distribution of the estimator (R function \texttt{boot} in R package \texttt{boot}, for example), \item the function being passed is the log unnormalized density function for a simulation done by the higher-order function, which simulates the distribution having that unnormalized density (R function \texttt{metrop} in this package for example), \end{itemize} These ways are \begin{itemize} \item using dot-dot-dot (R syntax \verb@...@), \item using global variables, \item using closures, also called the function factory pattern. \end{itemize} The main body of this vignette uses the third option. This appendix explains them all and the virtues and vices of each. \subsection{Dot-dot-dot} The dot-dot-dot mechanism is fairly easy to use when only one function is passed to the higher-order function, but does require care and more work to use. It is even harder to deal with when more than one function is passed to the higher-order function. R functions \texttt{metrop} and \texttt{temper} in this package can be passed two functions: the log unnormalized density function and the output function. \subsection{Only One Function Argument} Previous versions of this vignette used the dot-dot-dot mechanism everywhere. In those versions the log unnormalized density function was defined by <>= lupost <- function(beta, x, y) { eta <- as.numeric(x %*% beta) logp <- ifelse(eta < 0, eta - log1p(exp(eta)), - log1p(exp(- eta))) logq <- ifelse(eta < 0, - log1p(exp(eta)), - eta - log1p(exp(- eta))) logl <- sum(logp[y == 1]) + sum(logq[y == 0]) return(logl - sum(beta^2) / 8) } @ rather than the way it is now done in the main text above. Note that everything is the same in the two definitions except here we have \begin{verbatim} lupost <- function(beta, x, y) { \end{verbatim} where in the main text we now have \begin{verbatim} lupost_factory <- function(x, y) function(beta) { \end{verbatim} and then we have to execute the function factory \verb@lupost_factory@ to make the \texttt{lupost} function. But the main difference is that R function \texttt{lupost} (the user written function specifying the log unnormalized posterior density) \begin{itemize} \item here has 3 arguments, \texttt{beta}, \texttt{x}, and \texttt{y}, and the latter two must be passed via the dot-dot-dot mechanism, but \item there has 1 argument, \texttt{beta}, and it just knows about \texttt{x} and \texttt{y} --- they are in its closure. \end{itemize} So to use this \texttt{lupost} function, we have to add arguments \texttt{x} and \texttt{y} to each call to R function \texttt{metrop}. For example <>= out <- glm(y ~ x1 + x2 + x3 + x4, data = logit, family = binomial, x = TRUE) x <- out$x y <- out$y out <- metrop(lupost, beta.init, 1e3, x = x, y = y) out$accept out <- metrop(out, scale = 0.1, x = x, y = y) out$accept out <- metrop(out, scale = 0.3, x = x, y = y) out$accept out <- metrop(out, scale = 0.5, x = x, y = y) out$accept out <- metrop(out, scale = 0.4, x = x, y = y) out$accept @ and so forth. This method has the benefit that we do not have to explain function factories and has the drawback that we need to keep remembering to add \verb@x = x, y = y@ to each invocation of R function \texttt{metrop}. It is unclear to me which pattern is more mysterious to naive users. \subsection{More Than One Function Argument} The situation becomes more complicated (see the Warning section of the help pages for R functions \texttt{metrop} and \texttt{temper}) when more than one function argument is passed to the higher-order function. Then \emph{they all must handle the \emph{same} dot-dot-dot arguments} whether or not they want them. So now we must define the output function as <>= outfun <- function(z, ...) c(z, z^2) @ The \verb@...@ argument in the function signature is essential because this function is going to be passed dot-dot-dot arguments \texttt{x} and \texttt{y}, which it does not need and does not want, so it has to allow for them (and then not use them). Then we can continue <>= out <- metrop(out, nbatch = 100, blen = 100, outfun = outfun, x = x, y = y) out$accept @ and this works. But we would have gotten an error about unused arguments if we had defined the output function without the dot-dot-dot as we did in the main text. Your humble author was himself confused about this for years and so doubts that naive R users will find it intuitive. \subsection{Global Variables} As every programmer knows global variables are easy to use and evil \citep{c2}. They are part of the way of R (or perhaps part of one of the ways of R). They are OK for ``very small or one-off programs'' \citep{c2}. If you are going to throw the code away after one use, fine. If you are going to give your code to other people or even use it yourself six months later (after you have long forgotten the details), then this method is evil and should be avoided. In this method we define both functions passed to the higher-order function without \verb@...@ and without using a function factory. We already in the preceding section of this appendix defined R objects \texttt{x} and \texttt{y} as global variables (in the R global environment \verb@.GlobalEnv@). They are global variables and we use them as such, defining <>= lupost <- function(beta) { eta <- as.numeric(x %*% beta) logp <- ifelse(eta < 0, eta - log1p(exp(eta)), - log1p(exp(- eta))) logq <- ifelse(eta < 0, - log1p(exp(eta)), - eta - log1p(exp(- eta))) logl <- sum(logp[y == 1]) + sum(logq[y == 0]) return(logl - sum(beta^2) / 8) } outfun <- function(z) c(z, z^2) @ Then the following works <>= out <- metrop(lupost, beta.init, 1e3) out$accept out <- metrop(out, scale = 0.1) out$accept out <- metrop(out, scale = 0.3) out$accept out <- metrop(out, scale = 0.5) out$accept out <- metrop(out, scale = 0.4) out$accept out <- metrop(out, nbatch = 100, blen = 100, outfun = outfun) out$accept @ We get the best of both worlds. We don't need \verb@x = x, y = y@ and we don't need a function factory. But if we change the name of the global variables to say \texttt{modmat} and \texttt{resp} instead of \texttt{x} and \texttt{y}, then our code breaks. R function \texttt{lupost} is looking up global variables \texttt{x} and \texttt{y} \emph{under those names} not under any other names. So your code using this method is rigid and brittle and probably unusable by others. Note that if we are using either of the other methods, renaming is no problem. Using dot-dot-dot we do \begin{verbatim} out <- metrop(out, scale = 0.1, x = modmat, y = resp) \end{verbatim} Using the function factory we do \begin{verbatim} lupost <- lupost_factory(modmat, resp) \end{verbatim} See Section~7.5.3 of \citet{basic} for more about this. \subsection{Function Factory} The terminology ``function factory pattern'' is apparently due to \citet[Section~10.3.1]{wickham}. But it is just a special case about how closures work (in R and all other languages that have them). Compare <>= fred <- function(y) function(x) x + y fred(2)(3) @ \citep[Section~6.5]{basic}) to <>= lupost_factory <- function(x, y) function(beta) { eta <- as.numeric(x %*% beta) logp <- ifelse(eta < 0, eta - log1p(exp(eta)), - log1p(exp(- eta))) logq <- ifelse(eta < 0, - log1p(exp(eta)), - eta - log1p(exp(- eta))) logl <- sum(logp[y == 1]) + sum(logq[y == 0]) return(logl - sum(beta^2) / 8) } lupost <- lupost_factory(x, y) lupost(beta.init) @ We could also do the same calculation treating \verb@lupost_factory@ as just an ordinary curried function, like R function \texttt{fred} in the preceding example, <>= lupost_factory(x, y)(beta.init) @ So there is nothing mysterious about function factories. They are just one particular use of the essence of functional programming (closures). If you really understand functional programming, then you must understand this. Long ago, when I switched from S to R, I would not have considered that being an ``knowledgeable user'' of R required knowledge of closures. Now I do. This is partly because other functional languages like Scheme, Javascript, F\#, Clojure, and Haskell have become very popular for general computer programming. So now we want to emphasize that we can do in R what we can do in them. But it is also partly because I now understand more about functional programming. I can now see that if you really understand closures, then you don't need most other features of these programming languages (including R). Following \citet{crockford} we can say that all programming languages (including R) have their good parts and their bad parts and we should use the good features and avoid the bad features. The best feature is closures. Crockford calls this the best idea ever put in a programming language. So we should use it a lot. If I were starting to write the \texttt{mcmc} package today, I might leave out the dot-dot-dot arguments of R functions \texttt{metrop}, \texttt{morph.metrop}, and \texttt{temper}. That would mean that users could not use the dot-dot-dot pattern. They would be forced to use the function factory pattern or the global variables pattern. And if they had accepted that the global variables pattern is evil, then they would have to use the function factory pattern. (If you like the dot-dot-dot pattern, don't worry. It is not going to be removed from this package. Backward compatibility trumps everything.) \begin{thebibliography}{} \bibitem[C2 Wiki(2013)]{c2} C2 Wiki Contributors (2013). \newblock Global Variables Are Bad. \newblock \url{http://wiki.c2.com/?GlobalVariablesAreBad}. \bibitem[Crockford(2008)]{crockford} Crockford, D. (2008). \newblock \emph{JavaScript: The Good Parts}. \newblock O'Reilly, Sebastopol CA. \bibitem[Gelman et al.(1996)Gelman, Roberts, and Gilks]{grg} Gelman, A., G.~O. Roberts, and W.~R. Gilks (1996). \newblock Efficient Metropolis jumping rules. \newblock In \emph{Bayesian Statistics, 5 (Alicante, 1994)}, pp.~599--607. Oxford University Press. \bibitem[Geyer(1992)]{practical} Geyer, C.~J. (1992). \newblock Practical Markov chain Monte Carlo (with discussion). \newblock \emph{Statistical Science}, 7, 473--511. \bibitem[Geyer(2006)]{bogosity} Geyer, C. J. (2006). \newblock On the bogosity of MCMC diagnostics. \newblock \url{http://users.stat.umn.edu/~geyer/mcmc/diag.html}. \bibitem[Geyer(2011)]{intro} Geyer, C. J. (2011). \newblock Introduction to Markov chain Monte Carlo. \newblock In \emph{Handbook of Markov Chain Monte Carlo}, edited by Brooks, S., Gelman, A., Jones, G., and Meng, X.-L., pp.~3--48. \newblock Chapman \& Hall/CRC, Boca Raton, FL. \bibitem[Geyer(2018)]{basic} Geyer, C.~J. (2018). \newblock Stat 3701 lecture notes: Basics of R. \newblock \url{http://www.stat.umn.edu/geyer/3701/notes/basic.html}. \bibitem[Geyer and Thompson(1995)]{geyer-temp} Geyer, C.~J. and E.~A. Thompson (1995). \newblock Annealing Markov chain Monte Carlo with applications to ancestral inference. \newblock \emph{Journal of the American Statistical Association}, 90, 909--920. \bibitem[Wickham(2014)]{wickham} Wickham, H. (2014). \newblock \emph{Advanced R}. \newblock Chapman \& Hall/CRC, Boca Raton, FL. \newblock Also available on the web \url{http://adv-r.had.co.nz/}. \end{thebibliography} \end{document}