We continue the series on Monte Carlo methods by covering Markov chain-based sampling algorithms.
Metropolis-Hastings is a sampling algorithm that works by constructing a Markov chain whose invariant distribution is the target \(\pi\).
Metropolis-Hastings
Initialise \(X^{(0)}\) and for \(t \geq 1\)
- Draw \(X \sim Q(\cdot \mid X^{(t-1)})\)
- Compute $$\alpha(X \mid X^{(t-1)}) = \min \left(1, \frac{\pi(X)Q(X^{(t-1)} \mid X)}{\pi(X^{(t-1)})Q(X \mid X^{(t-1)})} \right)$$
Set \(X^{(t)} = X\) with probability \(\alpha(X \mid X^{(t-1)})\) else \(X^{(t)} = X^{(t-1)}\).
The conditional probability of accepting any proposed state, given the chain is in state \(i\), is $$a(i) = \sum_j \alpha(j \mid i) Q_{i,j}.$$
From the algorithm we see that the transition kernel \(K_{i,j} = \alpha(j \mid i)Q_{i,j} + (1 - a(i))\delta_{ij}\) generates a Markov chain \((X^{(t)})_{t \geq 0}\) with invariant distribution \(\pi\).
An advantage of the MH algorithm is that we only need to know \(\pi\) up to a normalising constant: the constant cancels in the ratio \(\pi(X)/\pi(X^{(t-1)})\).
There are two popular Metropolis-Hastings variations. The first is the Metropolis algorithm, which uses a symmetric proposal distribution, \(Q(X^{(t)} \mid X^{(t-1)}) = Q(X^{(t-1)} \mid X^{(t)})\); common choices are the normal, \(\mathcal N(\mu, \sigma^2)\), and the uniform, \(U[a, b]\). The \(Q\) terms cancel and the acceptance ratio simplifies to $$\alpha(X \mid X^{(t-1)}) = \min \left(1, \frac{\pi(X)}{\pi(X^{(t-1)})} \right).$$
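A random-walk Metropolis sketch of this simplification, under my own choice of target (a standard normal, known only up to its normalising constant) and proposal step size:

```python
import math
import random

def metropolis(n_steps, x0=0.0, step=1.0, seed=0):
    # Random-walk Metropolis: the proposal N(x, step^2) is symmetric,
    # so the acceptance ratio reduces to pi(X) / pi(X^{(t-1)}).
    rng = random.Random(seed)
    log_pi = lambda x: -0.5 * x * x  # log of unnormalised N(0, 1) density
    x = x0
    chain = []
    for _ in range(n_steps):
        x_prop = x + rng.gauss(0.0, step)
        # Accept with probability min(1, pi(x_prop) / pi(x)),
        # computed on the log scale for numerical stability.
        if rng.random() < math.exp(min(0.0, log_pi(x_prop) - log_pi(x))):
            x = x_prop
        chain.append(x)
    return chain

samples = metropolis(50_000)
var = sum(s * s for s in samples) / len(samples)
print(var)  # should be close to Var[X] = 1 for N(0, 1)
```

Working with \(\log \pi\) rather than \(\pi\) avoids underflow when the density is very small, which matters for heavier-tailed targets.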
Gibbs Sampling
Gibbs sampling uses the full conditional distributions as the proposal distribution. For \(j = 1, 2, \ldots, n\) the proposal is $$Q_j(X \mid X^{(t-1)}) = \pi(X_j \mid X_{-j}^{(t-1)}),$$ where \(X_{-j}\) denotes all components except the \(j\)-th.
The acceptance probability \(\alpha(X \mid X^{(t-1)})\) can be shown to equal 1, so every proposal is accepted.
Initialise \(X^{(0)} = (X_1^{(0)}, X_2^{(0)})\) and for \(t \geq 1\)
- Draw \(X_1^{(t)} \sim \pi_{1|2}(\cdot \mid X_2^{(t-1)})\)
- Draw \(X_2^{(t)} \sim \pi_{2|1}(\cdot \mid X_1^{(t)})\)
The transition kernel that guarantees an invariant distribution is $$K(\mathbf x, \mathbf y) = \pi_{1|2}(y_1 \mid x_2)\,\pi_{2|1}(y_2 \mid y_1),$$
with \(\mathbf x = (x_1, x_2) = (X_1^{(t-1)}, X_2^{(t-1)})\) and \(\mathbf y = (y_1, y_2) = (X_1^{(t)}, X_2^{(t)})\).
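The two-component sampler above can be sketched for a concrete target. As an assumed example (not from the text), take a bivariate normal with zero means, unit variances, and correlation \(\rho\); its full conditionals are \(X_1 \mid X_2 = x_2 \sim \mathcal N(\rho x_2, 1 - \rho^2)\) and symmetrically for \(X_2\).

```python
import random

def gibbs_bivariate_normal(n_steps, rho=0.8, seed=0):
    # Gibbs sampler for a bivariate normal with zero means, unit
    # variances, and correlation rho. Each full conditional is
    # N(rho * other, 1 - rho^2), so both draws are exact and every
    # "proposal" is accepted, as the acceptance probability of 1 implies.
    rng = random.Random(seed)
    sd = (1 - rho * rho) ** 0.5
    x1, x2 = 0.0, 0.0
    chain = []
    for _ in range(n_steps):
        x1 = rng.gauss(rho * x2, sd)  # draw from pi_{1|2}(. | x2)
        x2 = rng.gauss(rho * x1, sd)  # draw from pi_{2|1}(. | x1)
        chain.append((x1, x2))
    return chain

samples = gibbs_bivariate_normal(50_000)
corr = sum(a * b for a, b in samples) / len(samples)
print(corr)  # estimate of E[X1 * X2]; should be close to rho = 0.8
```

Note the updates are sequential: the draw for \(X_2^{(t)}\) conditions on the freshly drawn \(X_1^{(t)}\), not on \(X_1^{(t-1)}\), exactly as in the kernel \(\pi_{1|2}(y_1 \mid x_2)\,\pi_{2|1}(y_2 \mid y_1)\).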