What is Ridge Regression?
Ridge Regression is a regularized version of Linear Regression in which a penalty term is added to the cost function to shrink the regression line and reduce the overfitting, or variance, of the model. To put it simply, suppose you have data that a plain Linear Regression model is likely to overfit. In that case, the training error would be close to zero since the model fits the training points almost perfectly, yet it would make terrible predictions on the test data. Ridge Regression adds a penalty term to the cost function that overcomes this high-variance problem by introducing a small amount of bias, which results in better predictions in the long run.
Bias:- Bias occurs when a model makes overly simple or wrong assumptions about the data. A model with high bias is more likely to underfit the data.
Variance:- Variance occurs when the model is highly sensitive to small variations in the training data. Such a model performs well on the training set but poorly on the test set. A model with high variance is more likely to overfit the data.
Bias/Variance Tradeoff:- As the complexity of a model increases, its variance typically increases and its bias decreases. Conversely, when the model complexity is reduced, variance decreases and bias increases. This is known as the Bias/Variance Tradeoff.
When a Linear Regression model determines the values for the slope-intercept equation y = mx + b, it simply minimizes the cost, i.e. the residual sum of squares (RSS). When Ridge Regression determines those values, it minimizes the cost plus a new term λm² added to the RSS, where m² is the penalty and λ is the term that determines the strength of the penalty. λ can be any value from zero to positive infinity; when λ is zero, Ridge Regression is just Linear Regression.
Total Cost Function = RSS + λ*||m||²
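As a minimal sketch (the data points, slope, and λ value below are just for illustration), this total cost can be computed directly for a single-feature model y = mx + b:

import numpy as np

def ridge_cost(m, b, X, y, lam):
    # Residual sum of squares plus the ridge penalty lam * m^2
    predictions = m * X + b
    rss = np.sum((y - predictions) ** 2)  # ordinary least-squares cost
    penalty = lam * m ** 2                # ridge penalty on the slope
    return rss + penalty

X = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 6, 8, 10])
# The line y = 2x fits this data exactly, so RSS = 0 and the cost is just the penalty: 4.0
print(ridge_cost(2.0, 0.0, X, y, lam=1.0))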
λ is usually chosen by cross-validation: you fit the model with several candidate values of λ and keep the one that gives the best cross-validation score.
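One convenient way to do this with scikit-learn is RidgeCV, which fits the model for a list of candidate alpha (λ) values and keeps the one with the best cross-validation score; the candidate values below are just examples:

import numpy as np
from sklearn.linear_model import RidgeCV

X_train = np.array([[1], [2], [3], [4], [5]])
y_train = np.array([2, 4, 6, 8, 10])

# Evaluate several candidate regularization strengths by cross-validation
# and keep the best one.
ridge_cv = RidgeCV(alphas=[0.01, 0.1, 1.0, 10.0])
ridge_cv.fit(X_train, y_train)
print("Best alpha:", ridge_cv.alpha_)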
In this graph, the blue and green lines represent the Linear Regression line and Ridge Regression line respectively. The blue dots represent training data and the orange dots represent testing data.
Here, on the training data, Linear Regression performs best since its line passes through all the blue dots, while Ridge Regression has a little bias. But on the testing data, Ridge Regression performs better than Linear Regression, since its line is closer to the testing points. Thus we can say that Linear Regression has higher variance here than Ridge Regression.
Implementing Ridge Regression
Here is a simple way to implement Ridge Regression with scikit-learn:
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error

# Toy training and testing data
X_train = np.array([[1], [2], [3], [4], [5]])
y_train = np.array([[2], [4], [6], [8], [10]])
X_test = np.array([[6], [7], [8], [9], [10]])
y_test = np.array([[13], [15], [18], [23], [28]])

# Fit Ridge Regression (alpha is the regularization strength λ)
ridge_reg = Ridge(alpha=-2.1, solver='cholesky')
ridge_reg.fit(X_train, y_train)

ridge_predictions = ridge_reg.predict(X_test)
print("Ridge Predictions:", ridge_predictions)

ridge_mse = mean_squared_error(ridge_predictions, y_test)
print("\n MSE:", ridge_mse)

Ridge Predictions: [[13.59493671]
 [16.12658228]
 [18.65822785]
 [21.18987342]
 [23.72151899]]

 MSE: 4.727671847460352
The predictions are not bad at all. Now let's plug the same data points into Linear Regression.
from sklearn.linear_model import LinearRegression

lin_reg = LinearRegression()
lin_reg.fit(X_train, y_train)

lin_predictions = lin_reg.predict(X_test)
print("Linear Regression predictions:", lin_predictions)

lin_mse = mean_squared_error(lin_predictions, y_test)
print("\n MSE:", lin_mse)

Linear Regression predictions: [[12.]
 [14.]
 [16.]
 [18.]
 [20.]]

 MSE: 19.0
Notice that the test MSE of Linear Regression is 19 while the test MSE of Ridge Regression is around 4.7. This confirms that Linear Regression has overfit the data during training.
This is a simple demonstration of Ridge Regression and how it differs from Linear Regression. When you are working with real datasets you need to understand more about the structure of the data: if the data is noisy, the model is more likely to overfit, whereas if there are too few features, it is more likely to underfit. Note that too much regularization also causes the model to underfit.
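To illustrate that last point, here is a small sketch (reusing the toy training data from above, where the true relationship is y = 2x) that fits Ridge Regression with increasingly large alpha values. The learned slope shrinks further and further below 2 as the penalty grows, which is how over-regularization leads to underfitting:

import numpy as np
from sklearn.linear_model import Ridge

X_train = np.array([[1], [2], [3], [4], [5]])
y_train = np.array([2, 4, 6, 8, 10])  # true relationship: y = 2x

# Stronger regularization shrinks the slope toward zero,
# eventually making the model underfit.
for alpha in [0.1, 1, 10, 100]:
    model = Ridge(alpha=alpha).fit(X_train, y_train)
    print(f"alpha={alpha:>5}: slope={model.coef_[0]:.3f}")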