Linear Discriminant Classifier
Suppose we have two classes of data, \(P_1\) and \(P_2\), corresponding to genuine and counterfeit currency, with class-conditional densities \(f_j(\mathbf x)\), \(j \in \{1, 2\}\). Let \(Y\) denote the class label, taking the value \(Y = 1\) or \(Y = 2\), and let the prior probabilities of class membership be \(\mathbb P(Y = 1) = \pi_1\) and \(\mathbb P(Y = 2) = \pi_2\), with \(\pi_1 + \pi_2 = 1\).
Given that we have observed \(\mathbf x\), what is the probability that it belongs to class 1? By Bayes' theorem,
\[
\mathbb P(Y = 1 \mid X = \mathbf x) = \frac{\pi_1 f_1(\mathbf x)}{\pi_1 f_1(\mathbf x) + \pi_2 f_2(\mathbf x)},
\]
thus the Bayes rule classifies \(\mathbf x\) to the class with the highest posterior probability:
\[
\hat Y = \arg\max_j \, \mathbb P(Y = j \mid X = \mathbf x).
\]
In other words, we assign \(\mathbf x\) to class 1 if \(\mathbb P(Y = 1 \mid X = \mathbf x) > \mathbb P(Y = 2 \mid X = \mathbf x)\), or equivalently (since the denominator is common to both posteriors) if \(\pi_1 f_1(\mathbf x) > \pi_2 f_2(\mathbf x)\).
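For a concrete instance (with invented numbers): suppose \(\pi_1 = \pi_2 = 0.5\), \(f_1(\mathbf x) = 0.20\), and \(f_2(\mathbf x) = 0.05\). Then \(\mathbb P(Y = 1 \mid X = \mathbf x) = 0.20 / (0.20 + 0.05) = 0.8\), so \(\mathbf x\) is assigned to class 1.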
Assume now that the populations follow normal distributions with different means but equal covariance, \(X \mid Y = j \sim \mathcal N(\mu_j, \Sigma)\). Taking logarithms of \(\pi_1 f_1(\mathbf x) > \pi_2 f_2(\mathbf x)\) and cancelling the quadratic term, which is common to both sides, we assign \(\mathbf x\) to class 1 if
\[
\mathbf x^T \Sigma^{-1} (\mu_1 - \mu_2) > \tfrac{1}{2} (\mu_1 + \mu_2)^T \Sigma^{-1} (\mu_1 - \mu_2) + \log \frac{\pi_2}{\pi_1},
\]
a rule that is linear in \(\mathbf x\), hence the name.
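As a minimal sketch of this rule, assuming hypothetical (known) values for the means, covariance, and priors, the inequality can be evaluated directly with NumPy:

```python
import numpy as np

# Hypothetical population parameters, assumed known for illustration
mu1 = np.array([3.0, 7.0])          # class 1 mean (genuine)
mu2 = np.array([5.0, 4.0])          # class 2 mean (counterfeit)
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])      # shared covariance matrix
pi1, pi2 = 0.5, 0.5                 # prior probabilities

w = np.linalg.solve(Sigma, mu1 - mu2)                  # Sigma^{-1} (mu1 - mu2)
threshold = 0.5 * (mu1 + mu2) @ w + np.log(pi2 / pi1)  # right-hand side

x = np.array([3.5, 6.0])            # new observation to classify
label = 1 if x @ w > threshold else 2
print(label)
```

Because \(\Sigma\) is shared across classes, the boundary \(\mathbf x^T \mathbf w = c\) traced out by this rule is a hyperplane.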
In practice the true means and covariance of the population distributions are not known. Instead we use the sample statistics: \(\hat \mu_j = \bar{\mathbf x}_j\) for the means and \(\hat \Sigma_j = \sum_i (\mathbf x_i - \bar{\mathbf x}_j)(\mathbf x_i - \bar{\mathbf x}_j)^T \,/\, (n_j - 1)\) for the covariance; since LDA assumes a common covariance, the per-class estimates are pooled, \(\hat \Sigma = \big( (n_1 - 1) \hat \Sigma_1 + (n_2 - 1) \hat \Sigma_2 \big) / (n_1 + n_2 - 2)\). It is also common to estimate the prior probabilities as \(\hat \pi_j = n_j \,/\, N\), where \(n_j\) is the number of observations in class \(j\) and \(N\) is the total sample size.
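A sketch of this plug-in version, using simulated training data in place of real currency measurements (the means, covariance, and sample sizes below are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated training data standing in for real measurements
X1 = rng.multivariate_normal([3.0, 7.0], [[2.0, 0.5], [0.5, 1.0]], size=120)
X2 = rng.multivariate_normal([5.0, 4.0], [[2.0, 0.5], [0.5, 1.0]], size=80)
n1, n2 = len(X1), len(X2)

# Plug-in estimates: sample means, pooled covariance, empirical priors
mu1_hat, mu2_hat = X1.mean(axis=0), X2.mean(axis=0)
S1 = np.cov(X1, rowvar=False)       # np.cov divides by n_j - 1
S2 = np.cov(X2, rowvar=False)
Sigma_hat = ((n1 - 1) * S1 + (n2 - 1) * S2) / (n1 + n2 - 2)
pi1_hat, pi2_hat = n1 / (n1 + n2), n2 / (n1 + n2)

# Same linear rule as before, with estimates in place of true parameters
w = np.linalg.solve(Sigma_hat, mu1_hat - mu2_hat)
threshold = 0.5 * (mu1_hat + mu2_hat) @ w + np.log(pi2_hat / pi1_hat)

x = np.array([3.5, 6.0])
print(1 if x @ w > threshold else 2)
```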
Relaxing Constraints and Generalizing
So far we have looked at the case of \(K = 2\) classes. For \(K > 2\), the posterior probability that observation \(\mathbf x\) belongs to class \(j\) is
\[
\mathbb P(Y = j \mid X = \mathbf x) = \frac{\pi_j f_j(\mathbf x)}{\sum_{k=1}^{K} \pi_k f_k(\mathbf x)},
\]
and we again assign \(\mathbf x\) to the class with the largest posterior.
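As an illustration, assuming hypothetical parameters for \(K = 3\) Gaussian classes with a shared covariance, the posteriors can be computed with SciPy:

```python
import numpy as np
from scipy.stats import multivariate_normal

# Hypothetical parameters for K = 3 classes sharing one covariance
mus = [np.array([3.0, 7.0]), np.array([5.0, 4.0]), np.array([8.0, 6.0])]
Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])
pis = np.array([0.5, 0.3, 0.2])     # prior probabilities, summing to 1

x = np.array([4.0, 5.0])
dens = np.array([multivariate_normal.pdf(x, mean=mu, cov=Sigma) for mu in mus])
post = pis * dens / np.sum(pis * dens)   # P(Y = j | X = x) for each class j
print(post, post.argmax() + 1)           # posteriors and the Bayes class
```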
If we drop the assumption that the populations share a covariance matrix, so \(\Sigma_j \neq \Sigma_k\), the quadratic terms no longer cancel and we get Quadratic Discriminant Analysis (QDA), whose decision boundaries are quadratic in \(\mathbf x\). The Gaussian assumption itself can also be relaxed: replacing the normal densities with nonparametric density estimates yields non-linear decision boundaries, and Naive Bayes simplifies density estimation by assuming the features are conditionally independent within each class, \(f_j(\mathbf x) = \prod_p f_{jp}(x_p)\).
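For comparison, if scikit-learn is available, all three variants can be fit on the same simulated data (invented here for illustration); `LinearDiscriminantAnalysis`, `QuadraticDiscriminantAnalysis`, and `GaussianNB` implement LDA, QDA, and Gaussian naive Bayes respectively:

```python
import numpy as np
from sklearn.discriminant_analysis import (
    LinearDiscriminantAnalysis, QuadraticDiscriminantAnalysis)
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)
# Simulated classes with different covariances, so QDA has something to gain
X1 = rng.multivariate_normal([3.0, 7.0], [[2.0, 0.5], [0.5, 1.0]], size=120)
X2 = rng.multivariate_normal([5.0, 4.0], [[1.0, -0.3], [-0.3, 2.0]], size=80)
X = np.vstack([X1, X2])
y = np.array([1] * len(X1) + [2] * len(X2))

for model in (LinearDiscriminantAnalysis(),
              QuadraticDiscriminantAnalysis(),   # per-class covariances
              GaussianNB()):                     # conditional independence
    print(type(model).__name__, model.fit(X, y).score(X, y))
```

Note the scores above are training accuracies on the simulated sample, shown only to confirm each model fits; held-out data would be needed for a real comparison.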