Classification
Nearest Mean Classifier
The nearest mean classifier is a simple classifier based on the idea that the mean of the training data of each class is a representative prototype of that class: a new point is assigned to the class whose mean is closest.
- No further knowledge about the structure of the underlying data is required.
- Efficiently computable.
- Outliers can influence the mean and thus the classification.
- The mean is not always a representative point of the data.
- Fails when classes have a different 'spread' of data, i.e. large eigenvalues of the covariance matrix (np.cov(M.T)); see the sketch after this list.
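A minimal sketch of how such a 'spread' can be inspected, assuming a data matrix with samples as rows (the matrix X and its scaling below are made up for illustration):
import numpy as np

# The eigenvalues of the sample covariance matrix measure the variance of the
# data along its principal axes; one large eigenvalue indicates a direction of
# wide spread, which the nearest mean classifier ignores.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2)) * np.array([5.0, 0.5])   # wide in one direction only
print(np.linalg.eigvalsh(np.cov(X.T)))                 # one small and one large eigenvalue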
import numpy as np


def distance(p, q):
    # Euclidean (L2) distance between two points.
    dif = p - q
    return np.linalg.norm(dif, 2)


def classify_nearestmean(A, M, k):
    # Classification by nearest mean: A holds the training data of the k classes
    # stacked in equally sized blocks, M holds the points to classify.
    t = M.shape[0]
    r, d = A.shape
    N = r // k                      # number of training samples per class
    means = np.zeros((k, d))
    alpha = np.zeros((t,))
    # Class means estimated from the training blocks.
    for i in range(k):
        T = A[N * i:N * (i + 1)]
        means[i, :] = np.mean(T, axis=0)
    # Assign each point in M to the class with the nearest mean.
    dist_to_mean = np.zeros((k,))
    for i in range(t):
        P = M[i, :]
        for j in range(k):
            dist_to_mean[j] = distance(P, means[j, :])
        alpha[i] = dist_to_mean.argmin()
    return alpha
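A minimal usage sketch with synthetic two-class data (the blob locations and sample counts below are made up for illustration):
import numpy as np

# Hypothetical example: two well separated blobs, 50 training points per class.
rng = np.random.default_rng(0)
class0 = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(50, 2))
class1 = rng.normal(loc=[3.0, 3.0], scale=0.5, size=(50, 2))
A = np.vstack([class0, class1])           # training data, class blocks stacked
M = np.array([[0.2, -0.1], [2.8, 3.1]])   # points to classify
print(classify_nearestmean(A, M, 2))      # expected: [0. 1.]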
Bayes Classifier (based on Gaussian Mixture Model)
Using a distance measure (like the Mahalanobis distance), the Bayes classifier can be used to classify data. The distance measure is based on the assumption that the data is distributed according to a Gaussian mixture model (GMM): each class is modeled by its own multivariate normal distribution, and a point is assigned to the class whose density at that point is largest.
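A minimal sketch of this connection (not from the original notes; the mean and covariance values are made up): for a single Gaussian, the log-density differs from minus half the squared Mahalanobis distance only by a constant, so maximizing the density amounts to minimizing that distance.
import numpy as np
from scipy.spatial.distance import mahalanobis
from scipy.stats import multivariate_normal

mean = np.array([0.0, 0.0])
cov = np.array([[2.0, 0.3], [0.3, 1.0]])
x = np.array([1.0, -0.5])

d_m = mahalanobis(x, mean, np.linalg.inv(cov))          # Mahalanobis distance of x
log_pdf = multivariate_normal.logpdf(x, mean, cov)      # Gaussian log-density at x
const = -0.5 * np.log(np.linalg.det(2 * np.pi * cov))   # normalization, independent of x
print(np.isclose(log_pdf, const - 0.5 * d_m**2))        # True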
import numpy as np
import scipy.stats as stats


def classify_gmm(A, M, k):
    """
    Classification by Gaussian mixture model.
    """
    r, d = A.shape
    N = r // k                      # number of training samples per class
    means = np.zeros((k, d))
    covariances = np.zeros((k, d, d))
    # Estimate the mean and covariance of each class from its training block.
    for i in range(k):
        T = A[N * i:N * (i + 1)]
        means[i, :] = np.mean(T, axis=0)
        covariances[i, :, :] = np.cov(T.T)
    # Evaluate each class density at the points in M and pick the largest.
    probability_densities = [stats.multivariate_normal.pdf(M, means[j, :], covariances[j, :, :])
                             for j in range(k)]
    return np.argmax(probability_densities, axis=0)
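A minimal usage sketch (the data below is made up) on two classes with clearly different spreads, where the GMM-based classifier and the nearest mean classifier typically disagree:
import numpy as np

# Hypothetical example: class 0 is widely spread, class 1 is tightly concentrated.
rng = np.random.default_rng(1)
class0 = rng.normal(loc=[0.0, 0.0], scale=3.0, size=(200, 2))
class1 = rng.normal(loc=[4.0, 0.0], scale=0.3, size=(200, 2))
A = np.vstack([class0, class1])
M = np.array([[2.5, 0.0]])            # closer to the mean of class 1
print(classify_nearestmean(A, M, 2))  # nearest mean picks class 1: [1.]
print(classify_gmm(A, M, 2))          # the GMM typically prefers the widely spread class 0: [0]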