Discriminant Analysis

2021-05-02 | Modified: 2021-05-02

Linear Discriminant Classifier

Suppose we have two classes of data \(P_1\) and \(P_2\), corresponding to genuine and counterfeit currency, with class-conditional densities \(f_j(\mathbf x)\), \(j \in \{1, 2\}\). Let \(Y\) denote the class label, \(Y = 1\) or \(Y = 2\). We also assume prior probabilities of class membership \(\mathbb P(Y = 1) = \pi_1\) and \(\mathbb P(Y = 2) = \pi_2\), with \(\pi_1 + \pi_2 = 1\).

Given that we have observed \(\mathbf x\), what is the probability that it belongs to class 1?

$$\mathbb P(Y = 1 \mid X = \mathbf x) = \frac{\mathbb P(Y = 1 \cap X = \mathbf x)}{\mathbb P(X = \mathbf x)} = \frac{f_1(\mathbf x)\pi_1}{f_1(\mathbf x)\pi_1 + f_2(\mathbf x)\pi_2}$$

Thus Bayes' rule classifies \(\mathbf x\) to the class with the highest posterior probability:

$$\varphi(\mathbf x) = \arg \underset{j\in\{1, 2\}}\max \mathbb P(Y = j \mid X = \mathbf x).$$

In other words, we assign \(\mathbf x\) to class 1 if \(\mathbb P(Y = 1 \mid X = \mathbf x) > \mathbb P(Y = 2 \mid X = \mathbf x)\), or equivalently if \(f_1(\mathbf x)\pi_1 > f_2(\mathbf x)\pi_2\).
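
Here is a minimal numeric sketch of this rule, assuming one-dimensional Gaussian class densities; the means, scale, and priors below are illustrative assumptions, not values from the text.

```python
# Two-class Bayes rule with known densities; parameter values are
# illustrative assumptions only.
from scipy.stats import norm

pi1, pi2 = 0.7, 0.3            # priors, pi1 + pi2 = 1
f1 = norm(loc=0.0, scale=1.0)  # density of class 1 (genuine)
f2 = norm(loc=2.0, scale=1.0)  # density of class 2 (counterfeit)

x = 0.8
# Posterior P(Y = 1 | X = x) via Bayes' theorem.
posterior1 = f1.pdf(x) * pi1 / (f1.pdf(x) * pi1 + f2.pdf(x) * pi2)
# Classify to class 1 when f1(x) * pi1 > f2(x) * pi2.
label = 1 if f1.pdf(x) * pi1 > f2.pdf(x) * pi2 else 2
print(posterior1, label)
```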

Assuming the populations follow normal distributions with different means but equal covariance, \(X \mid Y = j \sim \mathcal N(\mu_j, \Sigma)\), we then assign \(\mathbf x\) to class 1 if

$$\begin{aligned} &\frac{f_1(\mathbf x)}{f_2(\mathbf x)} > \frac{\pi_2}{\pi_1} \\ &\Leftrightarrow \log \frac{f_1(\mathbf x)}{f_2(\mathbf x)} + \log \frac{\pi_1}{\pi_2} > 0 \\ &\Leftrightarrow \log \frac{\pi_1}{\pi_2} + (\mu_1 - \mu_2)^T\Sigma^{-1}\mathbf x - \frac{1}{2} (\mu_1 - \mu_2)^T\Sigma^{-1}(\mu_1 + \mu_2) > 0, \end{aligned}$$

where the last line follows from expanding the Gaussian log-densities: the quadratic terms \(\mathbf x^T\Sigma^{-1}\mathbf x\) cancel because the covariance is shared, leaving a rule that is linear in \(\mathbf x\).
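
The linear rule translates directly into code. Below is a sketch assuming the true parameters are known; the values of \(\mu_1\), \(\mu_2\), \(\Sigma\), and the priors are illustrative only.

```python
import numpy as np

mu1 = np.array([0.0, 0.0])
mu2 = np.array([1.5, 1.0])
Sigma = np.array([[1.0, 0.3],
                  [0.3, 1.0]])
pi1, pi2 = 0.5, 0.5

Sigma_inv = np.linalg.inv(Sigma)
w = Sigma_inv @ (mu1 - mu2)   # coefficients of the linear rule
b = np.log(pi1 / pi2) - 0.5 * (mu1 - mu2) @ Sigma_inv @ (mu1 + mu2)

x = np.array([0.4, 0.2])
score = w @ x + b             # assign to class 1 when score > 0
print(1 if score > 0 else 2)
```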

In practice the true means and covariance of the population distributions are not known. Instead we use the sample statistics: \(\hat \mu_j = \bar{\mathbf x}_j\) for the mean and \(\hat \Sigma_j = \sum_i (\mathbf x_i - \bar{\mathbf x}_j)(\mathbf x_i - \bar{\mathbf x}_j)^T \,/\, (n_j - 1)\) for the covariance; under the equal-covariance assumption the per-class estimates are pooled into a single \(\hat \Sigma\). It is also common to estimate the prior probabilities as \(\hat \pi_j = n_j \,/\, N\), where \(n_j\) is the number of observations in class \(j\).
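
A sketch of these plug-in estimates, assuming `X1` and `X2` are \((n_j, p)\) arrays of observations for the two classes (simulated here for illustration):

```python
import numpy as np

def fit_lda(X1, X2):
    n1, n2 = len(X1), len(X2)
    mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
    # Pooled covariance: combine the per-class scatter matrices.
    S1 = (X1 - mu1).T @ (X1 - mu1)
    S2 = (X2 - mu2).T @ (X2 - mu2)
    Sigma = (S1 + S2) / (n1 + n2 - 2)
    # Priors estimated as class proportions, pi_j = n_j / N.
    pi1, pi2 = n1 / (n1 + n2), n2 / (n1 + n2)
    return mu1, mu2, Sigma, pi1, pi2

rng = np.random.default_rng(0)
X1 = rng.normal(0.0, 1.0, size=(100, 2))
X2 = rng.normal(1.5, 1.0, size=(80, 2))
print(fit_lda(X1, X2))
```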

Relaxing Constraints and Generalizing

So far we have looked at the case of \(K = 2\) classes. For \(K > 2\), the probability that observation \(\mathbf x\) belongs to class \(j\) is

$$\mathbb P(Y = j \mid X = \mathbf x) = \frac{f_j(\mathbf x)\pi_j}{\sum_{k=1}^K f_k(\mathbf x)\pi_k}.$$
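
A sketch of the \(K\)-class posterior, assuming Gaussian class densities with a shared covariance; the means and priors below are illustrative assumptions.

```python
import numpy as np
from scipy.stats import multivariate_normal

def posteriors(x, mus, Sigma, priors):
    dens = np.array([multivariate_normal.pdf(x, mean=m, cov=Sigma)
                     for m in mus])
    unnorm = dens * priors        # f_k(x) * pi_k for each class
    return unnorm / unnorm.sum()  # normalize to get P(Y = k | x)

mus = [np.zeros(2), np.array([2.0, 0.0]), np.array([0.0, 2.0])]
priors = np.array([0.5, 0.25, 0.25])
p = posteriors(np.array([1.0, 1.0]), mus, np.eye(2), priors)
print(p, p.argmax() + 1)          # Bayes rule picks the argmax
```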

If we drop the assumption that the populations share a covariance matrix, so \(\Sigma_j \neq \Sigma_k\), the quadratic terms no longer cancel and we get Quadratic Discriminant Analysis (QDA), whose decision boundaries are quadratic in \(\mathbf x\). The Gaussian assumption can also be replaced with non-parametric density estimates to obtain non-linear decision boundaries, and Naive Bayes simplifies the densities further by assuming the features are conditionally independent given the class.
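
For comparison with the linear rule, here is a sketch of the QDA log-discriminant, which keeps a separate covariance per class; parameter values are again illustrative assumptions.

```python
import numpy as np

def qda_score(x, mu, Sigma, prior):
    # log pi_j - 1/2 log|Sigma_j| - 1/2 (x - mu_j)^T Sigma_j^{-1} (x - mu_j)
    diff = x - mu
    return (np.log(prior)
            - 0.5 * np.linalg.slogdet(Sigma)[1]
            - 0.5 * diff @ np.linalg.solve(Sigma, diff))

x = np.array([0.5, 0.5])
classes = [(np.zeros(2), np.eye(2), 0.6),
           (np.ones(2), 2.0 * np.eye(2), 0.4)]
scores = [qda_score(x, mu, Sigma, prior) for mu, Sigma, prior in classes]
print(np.argmax(scores) + 1)   # class with the largest score wins
```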