STOR-i Masterclass: Professor Brendan Murphy
Last week we had the first of this year's STOR-i masterclasses, given by Professor Brendan Murphy from University College Dublin. He introduced us to model-based clustering and classification. In this post I hope to give a brief insight into his interesting talks over the two days.
The goal of cluster analysis is to place objects into groups in a meaningful way. Members of the same group (or cluster) should share something in common, and they should differ from members of other clusters.
Clustering as a concept has been around for millennia. Plato was the first to formalize this thinking with his "Theory of Forms", and Aristotle grouped animals by their characteristics in his "History of Animals".
Much later, Linnaeus began to cluster plants into hierarchical groups in his works "Species Plantarum" and "Systema Naturae". He used features such as whether the plants had flowers and the number of stamens to divide them into 24 different classes.
Brendan then explained how more recent clustering algorithms can be implemented on a computer. These algorithms separate data points, each represented as a vector of numbers, into groups. The masterclass finished with a live demonstration of clustering the runners in the 24 Hour World Championship. This led into a discussion of some open questions in clustering and classification.
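The following is a minimal sketch of what model-based clustering can look like in code. It assumes each group follows a one-dimensional normal distribution and fits the resulting Gaussian mixture with the EM algorithm. This is a standard approach, but it is not necessarily the code shown in the masterclass. The function names are hypothetical, and the numbers are made up rather than taken from the runners' data.

```python
import math


def normal_pdf(x, mean, var):
    """Density of the univariate normal distribution N(mean, var) at x."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def responsibilities(data, weights, means, variances):
    """E-step: probability that each observation belongs to each component."""
    z = []
    for x in data:
        dens = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
        total = sum(dens)
        z.append([d / total for d in dens])
    return z


def fit_gmm_1d(data, k, n_iter=100):
    """Fit a k-component one-dimensional Gaussian mixture by EM (hypothetical helper).

    Returns (weights, means, variances, labels); labels assigns each observation
    to its most probable component.
    """
    n = len(data)
    ordered = sorted(data)
    # Deterministic start: means at evenly spaced order statistics, equal weights,
    # and every variance set to the overall sample variance.
    means = [ordered[int((g + 0.5) * n / k)] for g in range(k)]
    weights = [1.0 / k] * k
    mu = sum(data) / n
    variances = [sum((x - mu) ** 2 for x in data) / n] * k

    for _ in range(n_iter):
        z = responsibilities(data, weights, means, variances)
        # M-step: re-estimate each component from its weighted observations.
        for g in range(k):
            n_g = sum(row[g] for row in z)
            weights[g] = n_g / n
            means[g] = sum(row[g] * x for row, x in zip(z, data)) / n_g
            var_g = sum(row[g] * (x - means[g]) ** 2 for row, x in zip(z, data)) / n_g
            variances[g] = max(var_g, 1e-6)  # floor stops a component collapsing

    z = responsibilities(data, weights, means, variances)
    labels = [max(range(k), key=lambda g: row[g]) for row in z]
    return weights, means, variances, labels


# Made-up observations with two obvious groups (not data from the masterclass).
data = [0.8, 0.9, 1.0, 1.1, 1.2, 4.8, 4.9, 5.0, 5.1, 5.2]
weights, means, variances, labels = fit_gmm_1d(data, 2)
print(labels)  # → [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
```

The "model-based" part is that each cluster is described by a probability distribution. Each observation therefore gets a probability of belonging to every group, not just a single hard assignment. Real applications would use multivariate components and choose the number of groups from the data. Packages such as R's mclust do this, but the sketch above does not attempt it.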
Putting the methods into practice
I took the methods learnt in the masterclass and tried applying them myself to cluster the pixels of an image. We can think of a picture as a set of pixels, each plotted in a three-dimensional space by its red, green and blue values. We can then cluster these points into k groups. Replacing every pixel with a representative colour for its group means the image needs only k distinct colours, which lets us compress its size. Working in Python, I used this to cluster a picture of some of the MRes cohort.
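The pixel-clustering idea above can be sketched in plain Python with k-means (Lloyd's algorithm). Each pixel is replaced by the mean colour of its cluster. This is an assumed choice of algorithm for illustration and may not be what was used for the cohort photo. The helper names `kmeans` and `quantise` are hypothetical, and the pixels are made up.

```python
def kmeans(points, k, n_iter=20):
    """Cluster equal-length tuples into k groups with Lloyd's algorithm (hypothetical helper).

    Returns (centres, labels). Centres start at evenly spaced points of the
    sorted input so that the sketch is reproducible.
    """
    ordered = sorted(points)
    centres = [ordered[int((j + 0.5) * len(ordered) / k)] for j in range(k)]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centre (squared Euclidean distance).
        labels = [
            min(range(k), key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centres[j])))
            for p in points
        ]
        # Update step: move each centre to the mean of the points assigned to it.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centre if a cluster ends up empty
                centres[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centres, labels


def quantise(pixels, k):
    """Replace every (r, g, b) pixel by the rounded centre colour of its cluster."""
    centres, labels = kmeans(pixels, k)
    palette = [tuple(round(c) for c in centre) for centre in centres]
    return [palette[lab] for lab in labels], palette


# Made-up pixels: three reddish and three bluish (not the MRes photo).
pixels = [(250, 10, 10), (240, 20, 15), (245, 5, 20),
          (10, 10, 240), (20, 15, 250), (5, 25, 245)]
compressed, palette = quantise(pixels, 2)
print(palette)  # → [(12, 17, 245), (245, 12, 15)]
```

The compression comes from storage. After clustering, an image only needs the k palette colours plus a small index per pixel (log2 k bits, so 4 bits when k = 16) instead of 24 bits for a full RGB value. For a real photo, the pixel tuples could be read with an imaging library such as Pillow. Pure-Python k-means is slow on large images, so a vectorised implementation would be the practical choice there.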
If you would like to find out more about clustering, I would recommend looking at some of Brendan's work in the field, including the book:
Bouveyron, C., Celeux, G., Murphy, T. B. and Raftery, A. E. (2019). Model-Based Clustering and Classification for Data Science: With Applications in R. Cambridge University Press. doi:10.1017/9781108644181.