mcdc

PURPOSE

Maximum Clusterability Divisive Clustering

SYNOPSIS

function [idx,t] = mcdc(X, K, varargin)

DESCRIPTION

Maximum Clusterability Divisive Clustering
[IDX,T] = MCDC(X, K, VARARGIN)

 [IDX, T] = MCDC(X, K) produces a divisive hierarchical clustering of the
 N-by-D data matrix X into at most K clusters. The algorithm builds a
 hierarchy of binary partitions, each of which splits the observations of
 one cluster with the hyperplane that achieves the maximum variance ratio
 clusterability. Fewer than K clusters are returned if no valid separating
 hyperplane is found.
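
  To make the splitting criterion more concrete, the sketch below evaluates a
  variance ratio (between-cluster over within-cluster sum of squares of the
  projected data) for one fixed projection vector, searching all split points
  by brute force. This is only an illustration under assumptions: the variable
  names and the exhaustive search are hypothetical and not taken from the
  toolbox, where mcpp also optimises the projection vector itself.

```matlab
% Two well-separated Gaussian groups in 2-D (synthetic data)
X = [randn(50, 2); randn(50, 2) + 6];

% A fixed unit-length projection vector (assumption: chosen by hand)
v = [1; 1] / sqrt(2);
p = sort(X * v);                 % projected data in ascending order
n = numel(p);

% Evaluate every split between consecutive projected points
best = -Inf;
for k = 1:n-1
    left  = p(1:k);
    right = p(k+1:n);
    % within-cluster sum of squares of the two sides
    W = sum((left - mean(left)).^2) + sum((right - mean(right)).^2);
    % between-cluster sum of squares around the overall mean
    B = k * (mean(left) - mean(p))^2 + (n - k) * (mean(right) - mean(p))^2;
    if B / W > best
        best = B / W;
        b = (p(k) + p(k+1)) / 2; % split point: midpoint of the gap
    end
end
fprintf('variance ratio %.3f at split point %.3f\n', best, b);
```

  The hyperplane in this sketch is the set of points x with v' * x = b.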

  [IDX, T] = MCDC(X, K) returns the cluster assignment (IDX) and the binary
  tree (T) containing the cluster hierarchy.
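
  A minimal usage sketch of the two-output call, assuming the OPC toolbox
  that provides mcdc is on the MATLAB path. The synthetic data and the
  variable names are illustrative only, and the call is guarded so the
  snippet still runs when the toolbox is absent.

```matlab
% Synthetic data: three Gaussian groups in 2-D (N = 150, D = 2)
X = [randn(50, 2); ...
     randn(50, 2) + 6; ...
     randn(50, 2) + repmat([6, -6], 50, 1)];
K = 3;                               % maximum number of clusters

if exist('mcdc', 'file') == 2        % only runs if mcdc is on the path
    [idx, t] = mcdc(X, K);
    % idx holds the cluster assignment, t the cluster hierarchy
    fprintf('clusters found: %d (at most K = %d)\n', numel(unique(idx)), K);
end
```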

  [IDX, T] = MCDC(X, K, 'PARAM1', val1, 'PARAM2', val2, ...) specifies optional
  parameters in the form of Name,Value pairs.

  'v0' - Function handle. v0(X) returns a D-by-S matrix of initial projection vectors
       (default: vector connecting the centroids obtained from 2-means)
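
  The contract for 'v0' can be sketched with a simple custom initialiser.
  The handle v0_axes below is hypothetical: it returns the D-by-D identity,
  that is, one initial projection vector per coordinate axis (S = D). The
  mcdc call assumes the OPC toolbox is on the path and is guarded accordingly.

```matlab
% Hypothetical initialiser: columns are the coordinate axes of the data space
v0_axes = @(X) eye(size(X, 2));

X = randn(100, 4);                   % N = 100 observations, D = 4
V = v0_axes(X);                      % D-by-S matrix of initial vectors
disp(size(V));

if exist('mcdc', 'file') == 2        % only runs if mcdc is on the path
    [idx, t] = mcdc(X, 2, 'v0', v0_axes);
end
```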

  'split_index' - Criterion determining which cluster to split
    Function Handle: index = split_index(v, X, pars)
            (v: projection vector, X:data matrix, pars: parameters structure)
    Cluster with MAXIMUM INDEX is split at each step of the algorithm
    Two standard choices of split index can be enabled by setting 'split_index' to 
    one of the strings below:
        + 'fval':    Split cluster whose hyperplane achieves the lowest density integral
        + 'size':    Split largest cluster
    (default: split_index = 'mc_spindex' as recommended in Hofmeyr and Pavlidis (2015))
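
  As a sketch of a user-defined split index with the signature given above,
  the hypothetical handle spread_index scores each cluster by the variance of
  its data projected onto v, so the cluster with the largest projected spread
  would be split first. It assumes v is a single D-by-1 column and ignores
  pars. The mcdc calls assume the OPC toolbox is on the path.

```matlab
% Hypothetical split index: variance of the data projected onto v
spread_index = @(v, X, pars) var(X * v);

% Small check on hand-made data: projections are 1, 3, 5
Xs = [1 0; 3 0; 5 0];
vs = [1; 0];
s  = spread_index(vs, Xs, []);
fprintf('split index: %g\n', s);

X = [randn(50, 2); randn(50, 2) + 6];
if exist('mcdc', 'file') == 2        % only runs if mcdc is on the path
    % custom criterion passed as a function handle
    [idx1, t1] = mcdc(X, 2, 'split_index', spread_index);
    % built-in choice selected by string: split the largest cluster
    [idx2, t2] = mcdc(X, 2, 'split_index', 'size');
end
```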

  'minsize' - Minimum cluster size (integer)
    (default minsize = 1)

  'maxit' - Number of BFGS iterations to perform for each value of alpha (default: 50)

  'ftol' - Stopping criterion for change in objective function value over consecutive iterations
    (default: 1.e-7)

  'verb' - Verbosity. Values greater than 0 enable visualisation during execution
    Enabling this option slows down the algorithm considerably
    (default: 0)

  'labels' - True cluster labels. Specifying these enables the computation of performance over
    successive iterations and a better visualisation of how clusters are split

  'colours' - Matrix containing colour specification for observations in different clusters
    Number of rows must be equal to the number of true clusters (if 'labels' has been specified) or equal to 2.
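
  The optional parameters above can be combined in a single call. The sketch
  below is illustrative only: the values are arbitrary, the labels are assumed
  to be accepted as an N-by-1 column vector, and the call is guarded because
  it needs the OPC toolbox on the path.

```matlab
% Synthetic data with known group membership
X = [randn(60, 2); randn(60, 2) + 5];
labels = [ones(60, 1); 2 * ones(60, 1)];   % assumed format: N-by-1 vector

if exist('mcdc', 'file') == 2              % only runs if mcdc is on the path
    [idx, t] = mcdc(X, 2, ...
        'minsize', 10, ...                 % minimum cluster size
        'maxit',   100, ...                % iteration limit
        'ftol',    1e-8, ...               % objective change tolerance
        'labels',  labels);                % true labels for reporting
end
```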

Reference:
D.P. Hofmeyr and N.G. Pavlidis. Maximum clusterability divisive clustering.
IEEE Symposium Series on Computational Intelligence, pages 780-786, 2015.

CROSS-REFERENCE INFORMATION

This function calls:

ifelse - Shorthand for ternary operator: if-then-else
myparser - Function used to parse optional arguments in form of Name,Value pairs for a number of OPC algorithms
palette - Determines colours used for visualisation
tree2clusters - Assigns cluster labels from a cluster hierarchy (ctree object)
ctree - Class implementing cluster hierarchy in tree data structure
mc_spindex - Default split_index used to select which cluster MCDC partitions at each iteration
mc_v0 - Default projection vector for maximum clusterability projection pursuit
mcpp - Maximum Clusterability Projection Pursuit (MCPP) algorithm

This function is called by: