Supervised Vs. Unsupervised Learning
As the world gets ‘smarter’ every day, more and more of it is turning to machine learning and artificial intelligence to keep up with expectations. Statistical machine learning encompasses automatic learning and data analysis procedures that learn a task from a series of examples, with the goal of identifying patterns and making predictions under uncertainty. In this blog, we will explore the differences between the two main machine learning methods: supervised and unsupervised learning.
What is Supervised Learning?
Supervised learning, as the name indicates, involves a supervisor acting as a teacher. In supervised learning, we train the machine on data that is well defined and labelled. Just as a child cannot identify fruits or vegetables correctly until they have seen, felt, and learned from examples, a machine needs labelled examples before it can recognize new ones. The model has to find a mapping function from the input variable (X) to the output variable (Y). The goal of supervised learning is to train the model so that it can predict the output when it is given new data.
For example, let’s assume we have images of different types of fruit. The task of our supervised learning model is to identify the fruits and classify them accordingly. To do this, we give the model the input data together with the correct output for each input, which means we train it on features such as the shape, size, and color of each fruit along with its name. Once training is complete, we test the model on a new set of fruit. The model identifies each fruit and predicts the output using a suitable algorithm.
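The fruit example can be sketched in a few lines of Python. This is only a minimal illustration, assuming each fruit has already been reduced to two numeric features (length and width in centimetres). The measurements, the `training_data` list, and the `predict` helper are all hypothetical, and the one-nearest-neighbour rule is just one of many algorithms that could be used here.

```python
import math

# Hypothetical labelled training data: (length_cm, width_cm) -> fruit name.
# The numbers are invented for illustration only.
training_data = [
    ((7.0, 7.5), "apple"),
    ((7.5, 8.0), "apple"),
    ((6.8, 7.2), "apple"),
    ((18.0, 3.5), "banana"),
    ((20.0, 4.0), "banana"),
    ((17.5, 3.8), "banana"),
]


def predict(features, examples):
    """Return the label of the training example closest to `features`."""
    _, nearest_label = min(
        examples, key=lambda example: math.dist(example[0], features)
    )
    return nearest_label


# New, unseen fruits: the model has only the features, not the label.
print(predict((7.2, 7.4), training_data))
print(predict((19.0, 3.6), training_data))
```

The labels in `training_data` play the role of the supervisor: the model can only name a new fruit because it was shown named examples first.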
Supervised learning can be separated into two types of problems, classification and regression.
- Classification problems use an algorithm to assign data to specific categories, such as separating apples from oranges, or disease from no disease.
- Regression is another type of supervised learning method that uses an algorithm to understand the relationship between dependent and independent variables and to predict a numeric value. For example, we can study the relationship between two variables such as height and weight.
By measuring and labeling inputs and outputs, the model can learn over time.
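To make the regression case concrete, here is a small sketch that fits a straight line to height and weight data using ordinary least squares. The data points and the `fit_line` helper are hypothetical and chosen only to keep the arithmetic easy to follow. Real data would be noisier and would usually be handled by a library.

```python
# Hypothetical (height_cm, weight_kg) pairs, invented for illustration.
heights = [150.0, 160.0, 170.0, 180.0, 190.0]
weights = [50.0, 58.0, 66.0, 74.0, 82.0]


def fit_line(xs, ys):
    """Ordinary least squares for one input variable: y = slope * x + intercept."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(heights, weights)

# Predict the weight for a height the model has not seen before.
predicted_weight = slope * 175.0 + intercept
print(f"weight = {slope:.2f} * height + {intercept:.2f}")
print(f"predicted weight at 175 cm: {predicted_weight:.1f} kg")
```

Unlike classification, the output here is a number on a continuous scale rather than a category.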
What is Unsupervised Learning?
In unsupervised learning, the machine is trained on information that is neither classified nor labelled, and the algorithm acts on that information without guidance. The objective of the algorithm is to group unsorted data based on similarities, patterns, and differences, without any labelled examples to learn from. Unlike in supervised learning, the machine is never shown the correct answers. It therefore has to find the hidden structure in unlabelled data on its own.
For instance, if we take the fruit example above, the algorithm will pick up on patterns such as long, round, or curved shapes, find similarities, and group the fruits accordingly. If we have apples and bananas, we might end up with two clusters, one of apples and one of bananas, after running the algorithm.
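The apples-and-bananas grouping can be sketched with a bare-bones k-means loop. This is an illustrative sketch only: the measurements and the `k_means` helper are hypothetical, and the starting centroids are picked by hand to keep the run deterministic. Note that no fruit names appear anywhere in the data.

```python
import math

# Hypothetical unlabelled measurements: (length_cm, width_cm).
points = [
    (7.0, 7.5), (7.5, 8.0), (6.8, 7.2),
    (18.0, 3.5), (20.0, 4.0), (17.5, 3.8),
]


def k_means(data, centroids, iterations=10):
    """Plain k-means: alternate between assigning points and moving centroids."""
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for point in data:
            nearest = min(
                range(len(centroids)),
                key=lambda i: math.dist(point, centroids[i]),
            )
            clusters[nearest].append(point)
        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(values) / len(cluster) for values in zip(*cluster))
            if cluster else centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
    return clusters, centroids


# Hand-picked starting centroids (an assumption made for reproducibility).
clusters, centroids = k_means(points, centroids=[points[0], points[3]])
for index, cluster in enumerate(clusters):
    print(f"cluster {index}: {cluster}")
```

The algorithm only reports "cluster 0" and "cluster 1". Deciding that one group is apples and the other bananas is left to whoever interprets the result.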
Unsupervised learning models are used for three main tasks: clustering, association and dimensionality reduction:
- Clustering is a data mining technique for grouping unlabelled data based on their similarities or differences. For instance, can we find groups of patients with similar symptoms and pain characteristics?
- Association is another type of unsupervised learning method that uses different rules to find relationships between variables in a given dataset. For example, a person who buys product X also tends to buy product Y.
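- Dimensionality reduction is a learning technique used when the number of features (or dimensions) in a given dataset is too high. It reduces the number of inputs to a manageable size while preserving as much of the structure in the data as possible.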
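The association idea ("people who buy X also tend to buy Y") is usually measured with support and confidence. The sketch below computes both for a handful of shopping baskets. The baskets and the `support` and `confidence` helpers are hypothetical and only meant to show the arithmetic.

```python
# Hypothetical shopping baskets, invented for illustration.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    return sum(itemset <= basket for basket in baskets) / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Of the baskets containing `antecedent`, the share that also contain `consequent`."""
    return support(antecedent | consequent, baskets) / support(antecedent, baskets)


# Rule under test: "customers who buy bread also tend to buy butter".
rule_support = support({"bread", "butter"}, transactions)
rule_confidence = confidence({"bread"}, {"butter"}, transactions)
print(f"support: {rule_support:.2f}, confidence: {rule_confidence:.2f}")
```

Here three of the five baskets contain both bread and butter, and three of the four baskets with bread also contain butter, so the rule has a support of 3/5 and a confidence of 3/4.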
Summary
Aspect | Supervised Learning | Unsupervised Learning |
Goals | Predict outcomes for new data | Gain insight into a large volume of data |
Complexity | Simple | Computationally complex |
Accuracy | Better accuracy | Less accurate |
We can choose a method depending on our input data (labelled or unlabelled) and on our goals. Among other things, supervised learning models are well suited to spam detection, sentiment analysis, weather forecasting, and price prediction. Unsupervised learning is better suited to anomaly detection, recommendation engines, customer personas, and medical imaging.
Further reading:
I found these blogs and YouTube videos useful for building a basic understanding and exploring the various methods in these categories:
https://www.technologynetworks.com/informatics/articles/supervised-vs-unsupervised-learning-352077
https://towardsdatascience.com/supervised-vs-unsupervised-learning-14f68e32ea8d
https://link.springer.com/book/10.1007/978-0-387-84858-7
I hope you found the blog insightful. I will discuss one of the unsupervised methods, clustering, in my next blog.
References:
The Elements of Statistical Learning – Trevor Hastie, Robert Tibshirani, Jerome Friedman
https://www.ibm.com/cloud/blog/supervised-vs-unsupervised-learning