Monte Carlo methods are a class of sampling algorithms for obtaining numerical approximations of quantities that are hard to compute analytically. They rely on the weak law of large numbers:
 The weak law of large numbers

A sequence of random variables \(X_1, X_2, \cdots, X_n\) converges in probability to a random variable \(X\) if for all \(\delta > 0\)
$$ \lim_{n\rightarrow \infty} \mathbb P(|X_n - X| \geq \delta) = 0. $$
This is also written as \(X_n \overset{\text{p}}\rightarrow X\).
A consequence of this, applied to the empirical mean \(\overline{X}_n = \frac{1}{n} \sum_{i=1}^n X_i\), is the following:

If the sequence \(X_1, X_2, \cdots\) is independent and identically distributed (i.i.d.) with \(\mathbb E(X_1) = \mu\) and finite variance \(\sigma^2 < \infty\), then \(\overline{X}_n \overset{\text{p}}\rightarrow \mu\).
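The convergence above can be seen numerically. A minimal sketch, assuming i.i.d. Exponential(1) samples (an illustrative choice with \(\mu = 1\)) drawn with numpy:

```python
import numpy as np

# Weak law of large numbers sketch: the empirical mean of i.i.d.
# Exponential(1) samples (a hypothetical choice; here mu = 1)
# concentrates around mu as n grows.
rng = np.random.default_rng(0)

mu = 1.0
for n in (10, 1_000, 100_000):
    x_bar = rng.exponential(scale=1.0, size=n).mean()
    print(f"n = {n:>7}: mean = {x_bar:.4f}, |mean - mu| = {abs(x_bar - mu):.4f}")
```

For larger \(n\), the empirical mean lands closer to \(\mu\), as the theorem guarantees.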
As an illustrative example, consider estimating the probability that \(X \in A\) for a set \(A \subset \mathbb R\), where \(X\) has probability density function \(\pi\).
Since \(\mathbb P(X \in A) = \mathbb E[\boldsymbol{1}_{X \in A}]\), we can approximate \(\mathbb P(X \in A)\) by the empirical mean of the indicators over i.i.d. samples \(X_1, \cdots, X_n\), giving the estimator \(\hat{p} := \frac{1}{n} \sum_{i=1}^n \boldsymbol{1}_{X_i \in A}\).
The estimator \(\hat{p}\) is the proportion of samples landing in \(A\): if \(Z_n\) denotes the number of samples landing in \(A\), then \(\hat{p} = \dfrac{Z_n}{n}\).
It follows that \(Z_n \sim \text{Bin}(n, p)\) with \(p = \mathbb P(X \in A)\), since \(Z_n\) is a sum of \(n\) independent Bernoulli(\(p\)) trials.
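The indicator estimator can be sketched concretely. Here we assume, purely for illustration, that \(X \sim \mathcal N(0, 1)\) and \(A = [1, \infty)\), so the true value \(\mathbb P(X \geq 1) = \frac{1}{2}\operatorname{erfc}(1/\sqrt{2})\) is available in closed form for comparison:

```python
import math
import numpy as np

# Monte Carlo estimate of P(X in A), assuming X ~ N(0, 1) and
# A = [1, infinity) -- an illustrative choice, not from the text.
rng = np.random.default_rng(1)

n = 100_000
samples = rng.standard_normal(n)
z_n = int(np.sum(samples >= 1.0))   # Z_n: number of samples landing in A
p_hat = z_n / n                     # hat p = Z_n / n

# Closed-form reference: P(X >= 1) = 0.5 * erfc(1 / sqrt(2))
p_true = 0.5 * math.erfc(1.0 / math.sqrt(2.0))
print(f"p_hat = {p_hat:.4f}, p_true = {p_true:.4f}")
```

Since \(Z_n \sim \text{Bin}(n, p)\), the standard error of \(\hat p\) is \(\sqrt{p(1-p)/n}\), which for this setup is roughly \(0.001\), so the estimate agrees with the true value to a few decimal places.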