Classification

Nearest Mean Classifier

The nearest mean classifier is a simple classifier: it estimates the mean of each class from that class's training data and assigns a new point to the class whose mean is closest, here in Euclidean distance.

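Advantages:
No further knowledge about the structure of the underlying data is required.
Efficient to compute: one mean per class, then one distance per class for each point.

Disadvantages:
Outliers can shift the mean and thus change the classification.
The mean is not always a representative point of the data, e.g. for multimodal or non-convex classes.
Fails when the classes differ in their 'spread of data', i.e. when the covariance matrix (np.cov(M.T) for a data matrix M with one sample per row) has large or very unequal eigenvalues, because the Euclidean distance to the mean ignores the covariance.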
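The two weaknesses around outliers and spread can be made concrete with a small sketch. All numbers below are hypothetical, chosen only for illustration: a single outlier drags a class mean far from the bulk of the data, and a point that lies closer to the mean of a tight class can still be much more typical of a wide class once the distance is measured in units of standard deviation.

```python
import numpy as np

# Outlier sensitivity: one extreme sample moves the mean a long way.
clean_mean = np.mean([1.0, 2.0, 3.0])            # 2.0
shifted_mean = np.mean([1.0, 2.0, 3.0, 100.0])   # 26.5

# Different spread: hypothetical 1-D class statistics.
means = np.array([0.0, 10.0])
stds = np.array([1.0, 10.0])   # class 1 is spread ten times wider

x = 4.0
euclidean = np.abs(x - means)             # distances 4 and 6
standardized = np.abs(x - means) / stds   # 4 and 0.6 standard deviations

nearest_mean_label = int(np.argmin(euclidean))     # picks class 0
spread_aware_label = int(np.argmin(standardized))  # picks class 1
```

The standardized distance used here is the one-dimensional case of the Mahalanobis distance.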

import numpy as np

def distance(p, q):
    dif = p - q
    return np.linalg.norm(dif, 2)


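def classify_nearestmean(A, M, k):
    """
    Classification by nearest mean.

    A: training data, k equally sized class blocks stacked row-wise.
    M: points to classify, one per row.
    Returns the index of the nearest class mean for each row of M.
    """
    t = M.shape[0]
    r, d = A.shape
    N = r // k  # number of training samples per class

    means = np.zeros((k, d))
    alpha = np.zeros((t,), dtype=int)  # integer class labels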
    for i in range(k):
        T = A[N * i:N * (i + 1)]
        means[i, :] = np.mean(T, axis=0)
    # print(means)
    dist_to_mean = np.zeros((k,))
    for i in range(t):
        P = M[i, :]
        for j in range(k):
            dist_to_mean[j] = distance(P, means[j, :])
        alpha[i] = dist_to_mean.argmin()

    return alpha
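A small usage sketch, assuming the same data layout as the function above: the training matrix stacks k equally sized class blocks row-wise. The helper name nearest_mean_labels and the sample data are hypothetical. It applies the same rule with broadcasting instead of explicit loops.

```python
import numpy as np

def nearest_mean_labels(A, M, k):
    # Hypothetical vectorized variant of the nearest mean rule.
    # Assumes the rows of A are k equally sized class blocks.
    r, d = A.shape
    N = r // k
    means = A[:N * k].reshape(k, N, d).mean(axis=1)   # shape (k, d)
    # Euclidean distance from every test point to every class mean: (t, k)
    dists = np.linalg.norm(M[:, None, :] - means[None, :, :], axis=2)
    return dists.argmin(axis=1)

A = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0],    # class 0
              [5.0, 5.0], [6.0, 5.0], [5.0, 6.0]])   # class 1
M = np.array([[0.5, 0.5], [5.5, 5.0], [2.0, 2.0]])

labels = nearest_mean_labels(A, M, k=2)
print(labels)
```

The third point (2, 2) lies between the classes but is closer to the mean of class 0, so it receives label 0.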

Bayes Classifier (based on Gaussian Mixture Model)

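The Bayes classifier assigns a point to the class with the highest posterior probability. Here each class is modeled by a single multivariate Gaussian, fitted with the sample mean and covariance of that class's training data, so the data as a whole are assumed to follow a Gaussian Mixture Model (GMM). With equal class priors, maximizing the posterior amounts to maximizing the class density, which depends on the Mahalanobis distance to the class mean plus a term for the size of the covariance.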
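The link between the Gaussian assumption and the Mahalanobis distance can be written out as follows. The notation is an assumption made for this sketch and does not appear in the notes: x is the point to classify, \mu_j and \Sigma_j are the mean and covariance of class j, \pi_j is its prior, and d is the dimension.

```latex
\hat{c}(x) = \arg\max_{j} \; \pi_j \, \mathcal{N}(x \mid \mu_j, \Sigma_j),
\qquad
\log \mathcal{N}(x \mid \mu_j, \Sigma_j)
  = -\tfrac{1}{2} (x - \mu_j)^{\top} \Sigma_j^{-1} (x - \mu_j)
    - \tfrac{1}{2} \log \det \Sigma_j
    - \tfrac{d}{2} \log (2\pi)
```

The first term of the log-density is minus one half of the squared Mahalanobis distance. If all classes share the same prior and the same covariance, the remaining terms are constant across classes and the rule reduces to choosing the smallest Mahalanobis distance.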

import numpy as np
import scipy.stats as stats

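def classify_gmm(A, M, k):
    """
    Classification by Gaussian mixture model.

    A: training data, k equally sized class blocks stacked row-wise.
    M: points to classify, one per row.
    Fits one Gaussian per class and returns, for each row of M, the index
    of the class with the highest density (equal class priors assumed).
    """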
    r, d = A.shape
    N = r // k
    means = np.zeros((k, d))
    covariances = np.zeros((k, d, d))

    for i in range(k):
        T = A[N * i:N * (i + 1)]
        means[i, :] = np.mean(T, axis=0)
        covariances[i, :, :] = np.cov(T.T)

    # print(f"means: {means}")
    # print(f"covariances: {covariances}")

    # Density of every point in M under each class Gaussian
    probability_densities = [
        stats.multivariate_normal.pdf(M, means[j, :], covariances[j, :, :])
        for j in range(k)
    ]
    return np.argmax(probability_densities, axis=0)
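As a usage sketch, the same decision can be computed from log-densities with NumPy alone, which avoids underflow for points far from every class. The function names gaussian_log_density and classify_by_log_density and the sample data are hypothetical, and the training matrix is assumed to stack k equally sized class blocks as above.

```python
import numpy as np

def gaussian_log_density(M, mean, cov):
    # Log of the multivariate normal density for each row of M.
    d = mean.shape[0]
    diff = M - mean
    # Squared Mahalanobis distance of each row to the mean
    maha_sq = np.einsum("ij,ij->i", diff, np.linalg.solve(cov, diff.T).T)
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (maha_sq + logdet + d * np.log(2.0 * np.pi))

def classify_by_log_density(A, M, k):
    # Hypothetical variant: one Gaussian per class, equal priors.
    r, d = A.shape
    N = r // k
    scores = []
    for i in range(k):
        T = A[N * i:N * (i + 1)]
        scores.append(gaussian_log_density(M, T.mean(axis=0), np.cov(T.T)))
    return np.argmax(scores, axis=0)

A = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0],       # tight class 0
              [0.0, 0.0], [20.0, 0.0], [0.0, 20.0], [20.0, 20.0]])  # wide class 1
M = np.array([[0.5, 0.5], [4.0, 4.0], [15.0, 15.0]])

labels = classify_by_log_density(A, M, k=2)
print(labels)
```

The point (4, 4) is closer in Euclidean distance to the mean of class 0 at (0.5, 0.5) than to the mean of class 1 at (10, 10), yet it is assigned to class 1 because that class is much more widely spread.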