Lasso and Ridge Linear Regression Regularization

This post is part 2 of the Linear Regression and Regularization series. Please check out part 1, Machine Learning Linear Regression And Regularization.

In [2]:
library(h2o)
h2o.init()

Let's import our data file, student-mat.csv.

In [2]:
st_mat <- h2o.importFile('student-mat.csv')
  |======================================================================| 100%
In [3]:
students.splits <- h2o.splitFrame(data = st_mat, ratios = .8)   # 80/20 train/validation split
train <- students.splits[[1]]
valid <- students.splits[[2]]
y <- 28                                         # index of the response column
x <- -match(c("Walc", "Dalc"), names(st_mat))   # drop the alcohol columns Walc and Dalc from the predictors
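Here y is just a column index. As a quick optional sanity check, we can print the corresponding column name to confirm which variable we are actually modelling:

names(st_mat)[y]    # prints the name of the response column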

H2O Generalized Linear Model (GLM)

In [4]:
students.glm <- h2o.glm(x = x, y = y, training_frame = train,
                        validation_frame = valid, remove_collinear_columns = TRUE)
  |======================================================================| 100%

If we look at students.glm@model$model_summary, we can see which model type h2o has run by default.

In [5]:
students.glm@model$model_summary
A H2OTable: 1 × 7
family    link      regularization                              number_of_predictors_total  number_of_active_predictors  number_of_iterations  training_frame
<chr>     <chr>     <chr>                                       <int>                        <int>                        <int>                 <chr>
gaussian  identity  Elastic Net (alpha = 0.5, lambda = 0.101)   57                           10                           1                     RTMP_sid_88ca_2

The table above shows that the family is "gaussian" with the identity link, i.e. ordinary linear regression. It also shows the regularization type, which is Elastic Net, h2o's default.

Regularization

H2O's GLM fits linear regression by maximizing the log-likelihood. We can use regularization to get a model that generalizes better. With regularization, H2O maximizes the difference between the GLM log-likelihood and the regularization penalty (equivalently, it minimizes the penalized negative log-likelihood).

There are 3 types of regularization techniques.

  1. Lasso Regression (L1)
  2. Ridge Regression (L2)
  3. Elastic Net (Weighted sum of (L1 + L2))

Regularization depends upon the tuning hyperparameters alpha and lambda. For lambda > 0: if alpha = 1, we get Lasso regression; if alpha = 0, we get Ridge regression; and for alpha between 0 and 1, we get Elastic Net regression.
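The penalized objective can be sketched as follows (this is the standard elastic-net form of the penalty; the exact scaling constants H2O uses internally may differ slightly):

\[
\max_{\beta}\; \log L(\beta)\;-\;\lambda\left[\alpha\,\lVert\beta\rVert_{1}\;+\;\frac{1-\alpha}{2}\,\lVert\beta\rVert_{2}^{2}\right]
\]

Here log L(beta) is the GLM log-likelihood, the L1 term is what drives coefficients exactly to zero (Lasso), the L2 term shrinks them smoothly towards zero (Ridge), alpha mixes the two penalties, and lambda controls the overall strength of the regularization.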

Let us check what the optimal value of alpha is for our dataset by running a grid search. We give it a list of values to choose alpha from.

In [6]:
hyper_params <- list( alpha = c(0, .25, .5, .75, .1, 1) )
In [7]:
h2o.grid(x = x, y = y, training_frame = train,
         validation_frame = valid, hyper_params = hyper_params,
         search_criteria = list(strategy = "Cartesian"), algorithm = "glm",
         grid_id = "student_grid")
  |======================================================================| 100%
H2O Grid Details
================

Grid ID: student_grid 
Used hyper parameters: 
  -  alpha 
Number of models: 12 
Number of failed models: 0 

Hyper-Parameter Search Summary: ordered by increasing residual_deviance
    alpha             model_ids  residual_deviance
1   [0.0]  student_grid_model_7  79.50790677500659
2   [1.0] student_grid_model_12   91.2447911418529
3  [0.75] student_grid_model_10  91.55635741162314
4   [0.5]  student_grid_model_9  92.18487887050757
5  [0.25]  student_grid_model_8  94.04144279433028
6   [0.1] student_grid_model_11  98.87271830795697
8   [0.5]  student_grid_model_3 106.02649678592279
9  [0.75]  student_grid_model_4   106.323804549756
10 [0.25]  student_grid_model_2 106.33857113059179
11  [0.1]  student_grid_model_5  108.2715773332973
12  [0.0]  student_grid_model_1 109.03048641410442

As we see above, the grid models are ordered by increasing residual deviance, and the best model on this split has alpha = 0, i.e. Ridge regression, while pure Lasso (alpha = 1) and the intermediate Elastic Net values do not do quite as well.
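Instead of reading the table by eye, we can also pull the best model out of the grid programmatically. A minimal sketch, assuming the grid id student_grid from the cell above (sorted_grid and best_glm are just illustrative variable names):

sorted_grid <- h2o.getGrid("student_grid", sort_by = "residual_deviance", decreasing = FALSE)
best_glm <- h2o.getModel(sorted_grid@model_ids[[1]])   # model with the lowest residual deviance
best_glm@parameters$alpha                              # alpha value used by the winning model
h2o.mse(best_glm, valid = TRUE)                        # its MSE on the validation frame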

Lasso Regression

Lasso regression uses the L1 penalty. Lasso is also sometimes called a variable selection technique, because it can shrink coefficients exactly to zero. Lasso depends upon the tuning parameter lambda: as lambda grows, more and more coefficients are driven to zero.
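One way to see this shrinkage effect is to refit the Lasso at a few fixed lambda values and count how many coefficients survive. This is only a sketch; the lambda values below are illustrative, not tuned:

for (lam in c(0.001, 0.01, 0.1, 1)) {
  m <- h2o.glm(x = x, y = y, training_frame = train, alpha = 1, lambda = lam)
  coefs <- h2o.coef(m)                                  # named vector of fitted coefficients
  cat("lambda =", lam, "-> non-zero coefficients:", sum(coefs[-1] != 0), "\n")
}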

To apply Lasso regularization, we set alpha = 1.

In [8]:
students.glm <- h2o.glm(x = x, y = y, training_frame = train,
                        validation_frame = valid, remove_collinear_columns = TRUE, alpha = 1)
  |======================================================================| 100%

Let us look at the Lasso model summary.

In [9]:
students.glm@model$model_summary
A H2OTable: 1 × 7
family    link      regularization             number_of_predictors_total  number_of_active_predictors  number_of_iterations  training_frame
<chr>     <chr>     <chr>                      <int>                        <int>                        <int>                 <chr>
gaussian  identity  Lasso (lambda = 0.05048)   57                           10                           1                     RTMP_sid_88ca_2

As we see above, the regularization is now Lasso, with lambda = 0.05048.
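The lambda above was chosen by H2O's default heuristic. If we would rather let H2O search over a whole path of lambda values and keep the best one, we can turn on lambda search. A minimal sketch (lasso_path is an illustrative variable name, and nlambdas = 30 is an arbitrary choice):

lasso_path <- h2o.glm(x = x, y = y, training_frame = train, validation_frame = valid,
                      alpha = 1, lambda_search = TRUE, nlambdas = 30)
lasso_path@model$model_summary    # the summary reports the lambda value that was finally used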

As mentioned, Lasso is a predictor selection technique. We can simply filter our predictors on coefficient values greater than zero, as shown below.

In [10]:
students.glm@model$coefficients_table[students.glm@model$coefficients_table$coefficients > 0,]
A H2OTable: 6 × 3
     names       coefficients  standardized_coefficients
     <chr>       <dbl>         <dbl>
1    Intercept   2.17423466    2.59851126
48   traveltime  0.16625075    0.12113867
50   failures    0.04568047    0.03478202
53   goout       0.41970504    0.47231209
54   health      0.06863053    0.09553533
55   absences    0.01545513    0.11203287
In [11]:
print(h2o.mse(students.glm, valid=TRUE))
[1] 1.1232

Ridge Regression

In Ridge regression, we set alpha = 0, as shown below. Ridge uses the L2 penalty, which shrinks coefficients towards zero but, unlike Lasso, does not set them exactly to zero.

In [12]:
students.glm <- h2o.glm(x = x, y = y, training_frame = train,
                        validation_frame = valid, remove_collinear_columns = TRUE, alpha = 0)
  |======================================================================| 100%

Let us print the MSE.

In [13]:
print(h2o.mse(students.glm, valid=TRUE))
[1] 0.9985721
In [14]:
students.glm@model$model_summary
A H2OTable: 1 × 7
family    link      regularization             number_of_predictors_total  number_of_active_predictors  number_of_iterations  training_frame
<chr>     <chr>     <chr>                      <int>                        <int>                        <int>                 <chr>
gaussian  identity  Ridge (lambda = 0.05048)   57                           40                           1                     RTMP_sid_88ca_2

From the model summary, we can see that the number of active predictors in Ridge regression is 40, which is far more than in Lasso regression, where only 10 predictors stayed active (and only 6 of them had positive coefficients in the filter above).
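To make the comparison explicit, here is a small side-by-side sketch (reusing x, y, train and valid from above; lasso and ridge are just illustrative variable names) that counts the non-zero coefficients and prints the validation MSE for both models:

lasso <- h2o.glm(x = x, y = y, training_frame = train, validation_frame = valid, alpha = 1)
ridge <- h2o.glm(x = x, y = y, training_frame = train, validation_frame = valid, alpha = 0)

cat("Lasso non-zero coefficients:", sum(h2o.coef(lasso)[-1] != 0), "\n")
cat("Ridge non-zero coefficients:", sum(h2o.coef(ridge)[-1] != 0), "\n")
cat("Lasso validation MSE:", h2o.mse(lasso, valid = TRUE), "\n")
cat("Ridge validation MSE:", h2o.mse(ridge, valid = TRUE), "\n")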