Metropolis Hastings

2021-04-27 | Modified: 2021-04-27 | Statistics ·

We continue the series of Monte Carlo methods by covering Markov chain based sampling algorithms.

Metropolis Hastings is a sampling algorithm that relies on the guarantees from invariant distributions.

Metropolis Hastings


Initialise \(X^{(0)}\) and for \(t \geq 1\)

  1. Draw \(X \sim Q(\cdot \mid X^{(t-1)})\)
  2. Compute
    $$\alpha(X \mid X^{(t-1)}) = \min \left(1, \frac{\pi(X)Q(X^{(t-1)} \mid X)}{\pi(X^{(t-1)})Q(X \mid X^{(t-1)})} \right)$$

Set \(X^{(t)} = X\) with probability \(\alpha(X \mid X^{(t-1)})\) else \(X^{(t)} = X^{(t-1)}\).


The conditional probability of accepting any proposed state given the chain is in \(i\) is

$$ a(i) = \sum_{j \in \mathbb S} \alpha(j \mid i)Q_{i, j} $$

From the algorithm we see the transition kernel \(K_{i,j} = \alpha(j\mid i)Q_{i,i} + (1 - a(i))\delta_{ij}\) generates a Markov chain for the sequence \((X^{(t)})_{t \geq 0}\) using the distribution \(\pi\).

An advantage of using MH algorithm is we only need to know a distribution that is proportional to \(\pi\).

There are two popular Metropolis-Hastings variations, the Metropolis algorithm that uses a symmetric proposal distribution \(Q(X^{(t)} \mid X^{(t-1)}) = Q(X^{(t-1)} \mid X^{(t)})\). Common distributions used are normal, \(\mathcal N(\mu, \sigma^2)\) and uniform, \(U[a, b]\). The resulting acceptance ratio is simplified to

$$\alpha(X \mid X^{(t-1)}) = \min \left(1, \frac{\pi(X)}{\pi(X^{(t-1)})} \right)$$

Gibbs Sampling

Gibbs sampling uses full conditional distributions as the proposal distribution. For \(j = 1,2, \ldots, n\) the proposal

$$Q(X \mid X^{(t-1)}) = \pi(X_j \mid X_1^{(t-1)}, \ldots, X_1^{(t-1)}, X_1^{(t+1)}, \ldots, X_1^{(n)})$$

The acceptance probability \(\alpha(X \mid X^{(t-1)})\) can be shown to be equal to 1.


Initialise \(X^{(0)} = (X_1^{(0)}, X_2^{(0)})\) and for \(t \geq 1\)

  1. Draw \(X_1^{(t)} \sim \pi_{1|2}(\cdot \mid X_2^{(t-1)})\)
  2. Draw \(X_2^{(t)} \sim \pi_{2|1}(\cdot \mid X_1^{(t)})\)

The transition kernel that guarantees an invariant distribution is

$$K(\mathbf x, \mathbf y) = \pi_{1|2}(y_1 \mid x_2)\pi_{2|1}(y_2 \mid y_1)$$

with \(\mathbf x = (x_1, x_2) = (X_1^{(t-1)}, X_2^{(t-1)})\) and \(\mathbf y = (y_1, y_2) = (X_1^{(t)}, X_2^{(t)})\)