scppdc

PURPOSE

Spectral Clustering Projection Pursuit Divisive Clustering

SYNOPSIS

function [idx,t] = scppdc(X, K, varargin)

DESCRIPTION

Spectral Clustering Projection Pursuit Divisive Clustering
[IDX,T] = SCPPDC(X, K, VARARGIN)

 [IDX, T] = SCPPDC(X, K) produces a divisive hierarchical clustering of the
 N-by-D data matrix (X) into (a maximum of) K clusters. This algorithm uses a
 hierarchy of binary partitions each splitting the observations by identifying
 the linear subspace that minimises the second smallest eigenvalue of the
 normalised Laplacian and then splitting by applying spectral clustering on
 the projected data.
 The algorithm can return fewer clusters if no subspace is found that returns 
 a valid clustering.

  [IDX, T] = SCPPDC(X, K) returns the cluster assignment, IDX, and the  binary tree
  (T) containing the cluster hierarchy

  [IDX, T] = SCPPDC(X, K, 'PARAM1',val1, 'PARAM2',val2, ...) specifies optional parameters
  in the form of Name,Value pairs. 

  OPTIONAL PARAMETERS:
  'v0' - Initial projection matrix
    Function handle: v0(X) returns D-by-S projection matrix
       (default: v0 = @(y)(pca(y,'NumComponents',2)) -- first 2 principal components)

  'sigma' - Scale parameter for Gaussian kernel used to estimate similarity matrix
    Function Handle: sigma(X,params) returns sigma parameter (positive scalar)

  'split_index' - Criterion determining which cluster to split
    Function Handle: index = split_index(v, X, pars)
            (v: projection matrix, X:data matrix, pars: parameters structure)
    Cluster with MAXIMUM INDEX is split at each step of the algorithm
    Three standard choices of split index can be enabled by settgin 'split_index' to 
    one of the strings below:
        + 'size':    Split largest cluster
        + 'fval':    Split cluster whose normalised Laplacian has smallest second eigenvalue
    (default: split_index = 'size')

  'NumMicroClust' - Number of microclusters
    (default: 200)

  'minsize' - Minimum cluster size (integer)
    (default: size(X,1)/(5*K))

  'beta' - Initial/ maximum value of BETA (used in similarity transformation)
    (default: 3.0)

  'betaMin' - The minimum BETA used in similarity function transformation: 
    BETA starts from (BETA) and decreaes by 0.2 every until a valid subspace is found.
    (default: 0.5)

  'maxit' - Number of BFGS iterations to perform
    (default: 50)

  'ftol' - Stopping criterion for change in objective function value over consecutive iterations
    (default: 1.e-5)

  'verb' - Verbosity. Values greater than 0 enable visualisation during execution
    Enabling this option slows down the algorithm considerably
    (default: 0)

  'labels' - true cluster labels. Specifying these enables the computation of performance over 
    successive iterations and a better visualisation of how clusters are split

  'colours' - Matrix containing colour specification for observations in different clusters
    Number of rows must be equal to the number of true clusters (if 'labels' has been specified) or equal to 2.

Reference
D.P. Hofmeyr, N.G. Pavlidis, and I.A. Eckley. Minimum spectral connectivity projection pursuit. 
Statistics and Computing, forthcoming.

CROSS-REFERENCE INFORMATION

This function calls: This function is called by:
Generated on Tue 17-Jul-2018 18:58:09 by m2html © 2005