scppdc
PURPOSE
Spectral Clustering Projection Pursuit Divisive Clustering
SYNOPSIS
function [idx,t] = scppdc(X, K, varargin)
DESCRIPTION
Spectral Clustering Projection Pursuit Divisive Clustering [IDX,T] = SCPPDC(X, K, VARARGIN) [IDX, T] = SCPPDC(X, K) produces a divisive hierarchical clustering of the N-by-D data matrix (X) into (a maximum of) K clusters. This algorithm uses a hierarchy of binary partitions each splitting the observations by identifying the linear subspace that minimises the second smallest eigenvalue of the normalised Laplacian and then splitting by applying spectral clustering on the projected data. The algorithm can return fewer clusters if no subspace is found that returns a valid clustering. [IDX, T] = SCPPDC(X, K) returns the cluster assignment, IDX, and the binary tree (T) containing the cluster hierarchy [IDX, T] = SCPPDC(X, K, 'PARAM1',val1, 'PARAM2',val2, ...) specifies optional parameters in the form of Name,Value pairs. OPTIONAL PARAMETERS: 'v0' - Initial projection matrix Function handle: v0(X) returns D-by-S projection matrix (default: v0 = @(y)(pca(y,'NumComponents',2)) -- first 2 principal components) 'sigma' - Scale parameter for Gaussian kernel used to estimate similarity matrix Function Handle: sigma(X,params) returns sigma parameter (positive scalar) 'split_index' - Criterion determining which cluster to split Function Handle: index = split_index(v, X, pars) (v: projection matrix, X:data matrix, pars: parameters structure) Cluster with MAXIMUM INDEX is split at each step of the algorithm Three standard choices of split index can be enabled by settgin 'split_index' to one of the strings below: + 'size': Split largest cluster + 'fval': Split cluster whose normalised Laplacian has smallest second eigenvalue (default: split_index = 'size') 'NumMicroClust' - Number of microclusters (default: 200) 'minsize' - Minimum cluster size (integer) (default: size(X,1)/(5*K)) 'beta' - Initial/ maximum value of BETA (used in similarity transformation) (default: 3.0) 'betaMin' - The minimum BETA used in similarity function transformation: BETA starts from (BETA) and decreaes by 0.2 every until a valid subspace is found. (default: 0.5) 'maxit' - Number of BFGS iterations to perform (default: 50) 'ftol' - Stopping criterion for change in objective function value over consecutive iterations (default: 1.e-5) 'verb' - Verbosity. Values greater than 0 enable visualisation during execution Enabling this option slows down the algorithm considerably (default: 0) 'labels' - true cluster labels. Specifying these enables the computation of performance over successive iterations and a better visualisation of how clusters are split 'colours' - Matrix containing colour specification for observations in different clusters Number of rows must be equal to the number of true clusters (if 'labels' has been specified) or equal to 2. Reference D.P. Hofmeyr, N.G. Pavlidis, and I.A. Eckley. Minimum spectral connectivity projection pursuit. Statistics and Computing, forthcoming.
CROSS-REFERENCE INFORMATION
This function calls:- ifelse Shorthand for ternary operator: if-then-else
- myparser Function used to parse optional arguments in form of Name,Value pairs for a number of OPC algorithms
- palette Determines colours used for visualisation
- tree2clusters Assigns cluster labels from a cluster hierarchy (ctree object)
- ctree Class implementing cluster hierarchy in tree data structure
- scpp Spectral Clustering Projection Pursuit (SCPP) (divisive clustering is implemented in scppdc.m)
Generated on Tue 17-Jul-2018 18:58:09 by m2html © 2005