In machine learning, supervised and unsupervised learning are two main types of approaches:
In supervised learning, the model is trained on a labeled dataset. This means we provide both input data and the corresponding output labels to the model. The goal is for the model to learn the relationship between inputs and outputs so it can predict new, unseen data. Common examples include classification (e.g., email spam detection) and regression (e.g., predicting house prices).
For example, if you have an image dataset labeled with “cat” or “dog,” the model learns to classify new images as either a cat or dog based on this training.
In this example, we have a dataset containing information about diamonds. The supervised machine learning approach focuses on predicting a specific target column based on other features in the dataset.
• Target is a Number (Regression): If the target column is numerical (e.g., “Price”), the goal is to predict the diamond’s price based on features like cut, color, and clarity. This is called regression.
• Target is Text (Classification): If the target is categorical (e.g., “Cut” with values like Ideal, Very Good), the goal is to classify diamonds into categories based on their characteristics. This is known as classification.
In unsupervised learning, the model is given only input data without any labeled outputs. The goal is to find patterns or groupings within the data. A common task here is clustering (e.g., grouping customers by purchasing behavior) and dimensionality reduction (e.g., simplifying data visualization).
For example, if you have images without labels, the model could group similar images together (like cats in one group and dogs in another) based on visual similarities.
In unsupervised learning, there is no target column or labeled output provided. Instead, the model analyzes patterns within the data to group or cluster similar items together.
In this diamond dataset example:
• We don’t specify a target column (like price or cut); instead, the goal is to find natural groupings of data points based on their features.
• Here, clustering is used to identify groups of diamonds with similar Carat Weight and Price characteristics.
• The scatter plot on the right shows how the diamonds are grouped into different clusters (e.g., cluster0, cluster1, etc.), revealing patterns in the data without needing predefined labels.
This approach is useful when you want the model to identify hidden structures or patterns within the data. Unsupervised learning is often used for customer segmentation, anomaly detection, and recommendation systems.