A Geometric Progression Leow Yao Yang

Note #1: Clustering Basics

Clustering goals

Partition a set of points into disjoint sets such that:

  • Points in the same group are similar
  • Points in different groups are dissimilar
  • Formally, we want a graph partition with minimal cut

Clustering variants

  • Similarity-based: given a N x N dissimilarity matrix or distance matrix D
  • Feature-based: given a N x D feature matrix or design matrix X
  • Flat clustering: partition into disjoint sets
  • Hierarchical clustering: partitions are organised in a tree structure

Associated problems

  • Multi-view clustering: cluster feature vectors, not data points
  • Label prediction: given a point, predict its label

Clustering algorithms

  • k-means
  • Hierarchical clustering
  • Gaussian mixtures
  • Spectral clustering
  • Affinity propagation
  • Infinite mixture model (Dirichlet process)

Clustering metrics

  • Graph cuts - cut score & conductance of the graph
  • Purity - requires ground truth class labels, does not penalise for # clusters
  • RAND score
  • Mutual information