Classification vs Regression in ML: With Examples to Help You Understand

In this article, we’ll look at classification vs. regression and how they differ from each other, with examples to help you understand.

Introduction

Classification and regression are two types of supervised learning techniques that you can apply to labeled data in order to make predictions. They differ mainly in what they predict: classification predicts a category, while regression predicts a continuous numerical value. That difference becomes even more apparent when it comes to actually implementing and evaluating each technique.

What is Classification?

Classification, by definition, is the process of sorting a given set of data into its corresponding classes so that it can be better understood and analyzed.

Let's understand this with an example. Suppose you are given a set of images of different animals and are asked to group them into their corresponding classes, i.e. mammals, birds, fish, etc. This is an easy task for humans, since we have been doing it since we were kids. But what if a machine has to do the same thing? Machines are powerful, but they can't think and understand the way humans do. So, to make a machine classify images of animals into their corresponding groups, we first need to train it using a learning algorithm.

We do this by feeding the machine a large set of animal images that have already been labeled with their correct classes. The learning algorithm looks for patterns in this data and builds a model, which is then used to predict the class of new data that the machine has not seen before, much like how we make decisions based on past experience.

Classification Examples

If we give the machine a new image of an animal, it uses the model it has built to predict which class the image belongs to. In machine learning, this approach is known as supervised learning, because the machine learns from training data that is already labeled. So classification in machine learning is a supervised learning technique: the machine learns a model from labeled training data and then uses that model to classify new data in the future.
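To make the train-then-predict loop concrete, here is a minimal sketch in Python. It assumes each animal has already been turned into a short list of numbers; the features, the tiny training set, and the `fit`/`predict` names are all hypothetical. The algorithm is a simple nearest-centroid classifier, chosen only because it fits in a few lines; real image classifiers are far more complex.

```python
import math
from collections import defaultdict

# Hypothetical hand-made features: [number_of_legs, has_fur, has_feathers, lives_in_water]
training_data = [
    ([4, 1, 0, 0], "mammal"),  # dog
    ([4, 1, 0, 0], "mammal"),  # horse
    ([2, 0, 1, 0], "bird"),    # sparrow
    ([2, 0, 1, 1], "bird"),    # duck
    ([0, 0, 0, 1], "fish"),    # salmon
    ([0, 0, 0, 1], "fish"),    # shark
]

def fit(data):
    """'Training': compute the average feature vector (centroid) of each class."""
    grouped = defaultdict(list)
    for features, label in data:
        grouped[label].append(features)
    # zip(*rows) walks the rows column by column, so each mean is one feature's average
    return {
        label: [sum(column) / len(rows) for column in zip(*rows)]
        for label, rows in grouped.items()
    }

def predict(model, features):
    """Return the class whose centroid is closest (Euclidean distance)."""
    return min(model, key=lambda label: math.dist(features, model[label]))

model = fit(training_data)
# An otter: four legs, fur, and lives in water; this exact vector was never seen in training.
print(predict(model, [4, 1, 0, 1]))  # mammal
```

The "model" here is just one average vector per class, but the shape of the workflow (learn from labeled examples, then label unseen ones) is the same one described above.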

How Does Classification Work in Machine Learning?

When we say the machine classifies images of animals, we don't mean that the algorithm can classify the animals directly by looking at the images. Instead, it needs to learn some features of the animals from the images it is given. These features could be the shape of the animal, the color of its fur, etc. They are converted into numerical form, typically a vector of numbers, which is then given to a classification algorithm.
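As a rough illustration of turning an animal's properties into a numeric vector, the sketch below assumes the properties have already been extracted by hand into a dictionary; the field names, the `to_feature_vector` helper, and the list of fur colors are all hypothetical. In real image pipelines, features are usually computed automatically from the pixels rather than typed in.

```python
# Hypothetical, fixed list of fur colors, so every animal is encoded
# into a vector of the same length.
FUR_COLORS = ["none", "black", "brown", "white"]

def to_feature_vector(animal):
    """Turn a dict of hand-picked properties into a list of numbers.

    Numeric properties are copied as floats; the categorical fur color is
    one-hot encoded (one slot per color, 1.0 in the matching slot).
    """
    vector = [float(animal["body_length_cm"]), float(animal["legs"])]
    vector += [1.0 if animal["fur_color"] == color else 0.0 for color in FUR_COLORS]
    return vector

cat = {"body_length_cm": 46, "legs": 4, "fur_color": "black"}
print(to_feature_vector(cat))  # [46.0, 4.0, 0.0, 1.0, 0.0, 0.0]
```

One-hot encoding is used for the fur color so that the colors are not given an artificial order (for example, "brown" is not "bigger" than "black").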

The classification algorithm looks at these vectors of numbers and tries to find a decision boundary that separates the different classes. Once the decision boundary is found, the machine can use it to classify new data points, such as images of animals it has never seen before, as long as they are described by the same features.
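One of the simplest algorithms that learns a straight-line decision boundary is the perceptron. The sketch below is one illustration, not the only way to find a boundary: the 2-D points and their +1/-1 labels are made up, and the two classes were chosen to be linearly separable so that the perceptron converges.

```python
# Made-up 2-D training data: ([feature_0, feature_1], label), labels are +1 or -1.
points = [([2.0, 3.0], 1), ([3.0, 3.5], 1), ([3.5, 2.5], 1),
          ([-1.0, -0.5], -1), ([-2.0, 1.0], -1), ([0.0, -2.0], -1)]

def train_perceptron(data, epochs=20, lr=1.0):
    """Learn w and b so that the sign of w . x + b matches each label.

    The learned decision boundary is the line w[0]*x0 + w[1]*x1 + b = 0.
    """
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for x, y in data:
            activation = w[0] * x[0] + w[1] * x[1] + b
            if y * activation <= 0:  # misclassified, or exactly on the boundary
                # Move the boundary toward classifying this point correctly.
                w[0] += lr * y * x[0]
                w[1] += lr * y * x[1]
                b += lr * y
    return w, b

def classify(w, b, x):
    """Report which side of the boundary x falls on."""
    return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else -1

w, b = train_perceptron(points)
print(f"decision boundary: {w[0]}*x0 + {w[1]}*x1 + {b} = 0")
print(classify(w, b, [2.5, 2.5]), classify(w, b, [-1.5, -1.0]))  # 1 -1
```

New points are classified purely by which side of the learned line they fall on, which is what "using the boundary to classify new data" means in practice.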

Types of Classification

Binary Classification: In binary classification, the machine is only given two classes to learn from. For example, if we were trying to build a machine that could distinguish between cats and dogs, we would be using binary classification.

Multi-class Classification: In multi-class classification, the machine is given more than two classes to learn from. For example, if we were trying to build a machine that could distinguish between different types of animals, we would be using multi-class classification.

Multi-label Classification: In multi-label classification, each data point can belong to more than one class at the same time. For example, if we were trying to build a machine that identifies the type of animal in an image and also whether it is friendly, each image could receive two labels at once (e.g., "dog" and "friendly"), so we would be using multi-label classification.
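The three types mostly differ in the shape of the target that each example carries. The sketch below uses made-up labels and shows one common way to represent multi-label targets: a 0/1 indicator vector per example, so the problem can be treated as one yes/no decision per label. The helper name `to_indicator_vectors` is hypothetical.

```python
# Made-up targets for the three kinds of classification problems.
binary_targets = ["cat", "dog", "dog", "cat"]            # exactly one of two classes
multiclass_targets = ["mammal", "bird", "fish", "bird"]  # exactly one of three or more classes
multilabel_targets = [{"dog", "friendly"}, {"snake"}, {"cat", "friendly"}]  # any number of labels

def to_indicator_vectors(label_sets):
    """Encode each set of labels as a 0/1 vector with one slot per known label."""
    vocabulary = sorted(set().union(*label_sets))
    vectors = [[1 if label in labels else 0 for label in vocabulary] for labels in label_sets]
    return vocabulary, vectors

vocab, vectors = to_indicator_vectors(multilabel_targets)
print(vocab)    # ['cat', 'dog', 'friendly', 'snake']
print(vectors)  # [[0, 1, 1, 0], [0, 0, 0, 1], [1, 0, 1, 0]]
```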

What is Regression?

When we discussed classification, we saw that it is all about discrete values, i.e. the machine predicts one class or another. In some cases, however, we want the machine to predict a continuous value. For example, if we were trying to build a machine that predicts the price of a house based on its size, this is where regression comes into play.

Regression is defined as a statistical method used to establish the relationship between a dependent variable and one or more independent variables.

In machine learning, regression is a supervised learning technique, since the dataset contains both the input features and the corresponding labels (the values to be predicted).

Let's understand regression with an example. Suppose you want to predict the price of houses based on their size, geography, resources, etc. The dependent variable would be the price of the house, and the independent variables would be the size, geography, resources, etc. A regression model can then predict how the price changes as the independent variables change. For instance, if the size of a house changes, its price is affected too: the price tends to increase or decrease as the size increases or decreases.
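For a single independent variable, the best-fit line of linear regression has a closed-form least-squares solution. The sketch below applies it to the house-price example; the sizes (in square meters) and prices (in thousands) are made-up numbers used only for illustration, and `fit_line` is a hypothetical helper name.

```python
# Made-up example data: house size in square meters and price in thousands.
sizes = [50, 80, 100, 120, 150]
prices = [128, 195, 228, 266, 333]

def fit_line(xs, ys):
    """Ordinary least squares for one variable: y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x); the 1/n factors cancel, so plain sums suffice
    cov_sum = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_sum = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_sum / var_sum
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, intercept = fit_line(sizes, prices)
print(round(slope, 3), round(intercept, 2))  # 2.012 28.79
# Predicted price of a new 110 m² house:
print(round(slope * 110 + intercept, 1))     # 250.1
```

The slope says how much the predicted price changes per extra square meter, which is exactly the "price changes as size changes" relationship described above.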

How Does Regression Work in Machine Learning?

For a regression model, the dataset contains numerical values of the dependent variable, which is what we want to predict. The machine is not given any classes to learn from; instead, each training data point is labeled with a numerical value. The model tries to capture how these values are distributed and finds the parameters that best fit the data points, typically by minimizing the error between its predictions and the actual values. These parameters are then used to make predictions about new data points in the future.
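Finding the parameters that best fit the data is often done iteratively with gradient descent, which repeatedly nudges the parameters in the direction that reduces the mean squared error. The sketch below is a minimal version for a straight line y = w·x + b; the data points, the learning rate of 0.05, and the 2,000 steps are assumptions chosen so that this toy example converges.

```python
# Made-up data lying exactly on y = 3x + 1, so we know what the answer should be.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 4.0, 7.0, 10.0, 13.0]

def mse_of_line(w, b):
    """Mean squared error of the line y = w*x + b on the data above."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def gradient_descent(lr=0.05, steps=2000):
    """Start from w = b = 0 and repeatedly step against the gradient of the MSE."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = 2 / n * sum(e * x for e, x in zip(errors, xs))  # d(MSE)/dw
        grad_b = 2 / n * sum(errors)                             # d(MSE)/db
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

w, b = gradient_descent()
print(round(w, 3), round(b, 3))  # 3.0 1.0
print(mse_of_line(w, b) < 1e-12)  # True
```

If the learning rate is too large, the updates overshoot and diverge; if it is too small, many more steps are needed, which is why it is treated as a tunable setting.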

So, in the case of house price prediction, we build a regression model that:
  • Learns the relationship between the size, geography, resources, etc. of a house and its price.
  • Makes predictions about the price of a new house based on its size, geography, resources, etc.

Types of Regression

Linear Regression: Linear regression is one of the simplest and most popular types of regression. In this type of regression, the dependent variable is modeled as a linear function of the independent variables. It finds the best-fit line (or, with several independent variables, a plane or hyperplane) for the given data points and uses it to predict values.

Polynomial Regression: Polynomial regression fits a polynomial function (for example, y = b0 + b1·x + b2·x²) to a dataset, which lets it capture curved relationships that a straight line cannot.
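Polynomial regression can be done by expanding each input x into the features 1, x, x², and so on, then running ordinary least squares on them, so the model is still linear in its coefficients. The sketch below solves the least-squares normal equations directly in plain Python; the data is made up from the known curve y = 1 - 2x + 0.5x², and the helper names `solve` and `fit_polynomial` are hypothetical. In practice, a library routine such as NumPy's `polyfit` would normally be used instead.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_polynomial(xs, ys, degree):
    """Least-squares fit of y = c0 + c1*x + ... + c_d*x^d via the normal equations."""
    # Expand each x into the features [1, x, x^2, ..., x^degree].
    rows = [[x ** p for p in range(degree + 1)] for x in xs]
    k = degree + 1
    # Normal equations: (X^T X) c = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    return solve(xtx, xty)

# Made-up data generated from y = 1 - 2x + 0.5x^2.
xs = [-2, -1, 0, 1, 2, 3]
ys = [1 - 2 * x + 0.5 * x ** 2 for x in xs]
coeffs = fit_polynomial(xs, ys, degree=2)
print([round(c, 6) for c in coeffs])  # [1.0, -2.0, 0.5]
```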

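Ridge Regression: Ridge regression adds a penalty on the squared size of the coefficients (an L2 penalty) to the linear regression loss. This shrinks the coefficients toward zero, which helps reduce overfitting, especially when independent variables are correlated.

Lasso Regression: Lasso regression adds a penalty on the absolute size of the coefficients (an L1 penalty). This can shrink some coefficients to exactly zero, so it effectively selects the most important variables in a dataset.

Elastic-net Regression: Elastic-net regression combines the Ridge (L2) and Lasso (L1) penalties, balancing the two methods.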
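The differences among Ridge, Lasso, and Elastic-net are easiest to see in the quantity each one minimizes. Writing RSS for the residual sum of squares of ordinary linear regression, with n data points, p independent variables, and coefficients β_j, a common textbook formulation is shown below. Here λ, λ1, and λ2 are non-negative constants that control the strength of the penalties; libraries may scale or parameterize these terms differently.

```latex
\begin{aligned}
\text{RSS:} \quad & \mathrm{RSS}(\beta) = \sum_{i=1}^{n} \Bigl( y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij} \Bigr)^2 \\
\text{Ridge:} \quad & \min_{\beta} \; \mathrm{RSS}(\beta) + \lambda \sum_{j=1}^{p} \beta_j^2 \\
\text{Lasso:} \quad & \min_{\beta} \; \mathrm{RSS}(\beta) + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert \\
\text{Elastic-net:} \quad & \min_{\beta} \; \mathrm{RSS}(\beta) + \lambda_1 \sum_{j=1}^{p} \lvert \beta_j \rvert + \lambda_2 \sum_{j=1}^{p} \beta_j^2
\end{aligned}
```

With all penalty weights set to zero, each of these reduces to ordinary linear regression. The absolute-value (L1) term is what allows Lasso to push some coefficients exactly to zero, while the squared (L2) term only shrinks them; the intercept β_0 is conventionally left unpenalized.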

Classification Vs Regression

The main difference between classification and regression lies in what they predict. Classification learns from labeled training data to assign new data to one of a set of classes, i.e. it predicts discrete values. Regression, on the other hand, models the relationship between a dependent variable and one or more independent variables to predict how one value changes based on the others, i.e. it predicts continuous values.

Regression | Classification
The output variable must be continuous | The output variable must be discrete (a class label)
Predicted values are ordered numbers | Predicted classes are usually unordered
The goal is to predict continuous values based on previous data | The goal is to predict discrete values based on previous data
Predictions are made using a best-fit line (or curve) | Predictions are made using a decision boundary that divides the dataset into different classes
Can be evaluated using mean squared error | Can be evaluated using accuracy
Used for weather (e.g., temperature) prediction, house price prediction, etc. | Used for speech recognition, spam email classification, cancer detection (malignant vs. benign), etc.
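The two evaluation metrics named in the table are easy to compute by hand. The sketch below uses made-up predictions purely to show the arithmetic. Note that accuracy only counts exact matches, which is why it suits discrete classes but not continuous values.

```python
def mean_squared_error(actual, predicted):
    """Average of squared differences: a standard regression metric."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def accuracy(actual, predicted):
    """Fraction of predictions that exactly match the true class."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)

# Made-up house prices (in thousands) and model predictions.
print(mean_squared_error([250, 300, 180], [245, 310, 180]))  # (25 + 100 + 0) / 3, about 41.67
# Made-up spam labels and model predictions: 3 of 4 correct.
print(accuracy(["spam", "ham", "spam", "ham"], ["spam", "spam", "spam", "ham"]))  # 0.75
```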