# Lasso and Ridge Linear Regression Regularization

This post is part 2 of the Linear Regression and Regularization series. Please check part 1, Machine Learning Linear Regression And Regularization.

In :
library(h2o)
h2o.init()


Let's import our data file student-mat.csv.

In :
st_mat <- h2o.importFile('student-mat.csv')

  |======================================================================| 100%

In :
students.splits <- h2o.splitFrame(data = st_mat, ratios = .8)
train <- students.splits[[1]]
valid <- students.splits[[2]]
# Column 28 ("Walc", weekend alcohol consumption) is the response;
# drop both alcohol columns ("Walc", "Dalc") from the predictor set.
y = 28
x = -match(c("Walc","Dalc"), names(st_mat))
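As a quick sanity check, we can confirm that the split is roughly 80/20 (h2o.splitFrame splits probabilistically, so the counts are approximate):

In :
# Row counts of the two splits; expect roughly a 4:1 ratio.
nrow(train)
nrow(valid)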


## H2O Generalized Linear Model (GLM)

In :
students.glm <- h2o.glm(x=x,y=y, training_frame = train,
validation_frame = valid,remove_collinear_columns = TRUE)

  |======================================================================| 100%


If we look at model$model_summary, we can see which model type h2o has run by default.

In :
students.glm@model$model_summary

A H2OTable: 1 × 7

| family | link | regularization | number_of_predictors_total | number_of_active_predictors | number_of_iterations | training_frame |
| --- | --- | --- | --- | --- | --- | --- |
| gaussian | identity | Elastic Net (alpha = 0.5, lambda = 0.101) | 57 | 10 | 1 | RTMP_sid_88ca_2 |

The table above shows that the regression family is "gaussian", and that the default regularization type is Elastic Net.

## Regularization

H2O's GLM fits linear regression using maximum log-likelihood. We can use regularization to better fit the model: with regularization enabled, H2O maximizes the GLM log-likelihood minus a regularization penalty.

There are 3 types of regularization techniques.

1. Lasso Regression (L1)
2. Ridge Regression (L2)
3. Elastic Net (Weighted sum of (L1 + L2))

Regularization depends on the tuning hyperparameters alpha and lambda. For lambda > 0: if alpha = 1, we get Lasso; if alpha = 0, we get Ridge; and for alpha between 0 and 1, we get Elastic Net.
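Concretely, the objective H2O maximizes looks like the following (a sketch based on H2O's GLM documentation, with beta the coefficient vector and beta_0 the intercept):

$$
\max_{\beta_0,\,\beta}\;\; \log L(\beta_0, \beta)\;-\;\lambda\left(\alpha\,\lVert\beta\rVert_1+\frac{1-\alpha}{2}\,\lVert\beta\rVert_2^2\right)
$$

Setting alpha = 1 keeps only the L1 term (Lasso), alpha = 0 keeps only the L2 term (Ridge), and lambda scales the overall penalty strength.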

Let us check what the optimal value of alpha is for our dataset by giving the grid search a list of alpha values to choose from.

In :
hyper_params <- list( alpha = c(0, .25, .5, .75, .1, 1) )

In :
h2o.grid(x=x,y=y, training_frame = train,
validation_frame = valid,hyper_params=hyper_params,
search_criteria = list(strategy = "Cartesian"),algorithm = "glm",
grid_id = "student_grid")

  |======================================================================| 100%

H2O Grid Details
================

Grid ID: student_grid
Used hyper parameters:
-  alpha
Number of models: 12
Number of failed models: 0

Hyper-Parameter Search Summary: ordered by increasing residual_deviance
alpha             model_ids  residual_deviance
1   [0.0]  student_grid_model_7  79.50790677500659
2   [1.0] student_grid_model_12   91.2447911418529
3  [0.75] student_grid_model_10  91.55635741162314
4   [0.5]  student_grid_model_9  92.18487887050757
5  [0.25]  student_grid_model_8  94.04144279433028
6   [0.1] student_grid_model_11  98.87271830795697
8   [0.5]  student_grid_model_3 106.02649678592279
9  [0.75]  student_grid_model_4   106.323804549756
10 [0.25]  student_grid_model_2 106.33857113059179
11  [0.1]  student_grid_model_5  108.2715773332973
12  [0.0]  student_grid_model_1 109.03048641410442

As the summary shows, the grid is sorted by increasing residual deviance on the validation data. Here the best model has alpha = 0, i.e. Ridge regression, while Lasso (alpha = 1) is not doing as well.
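If we want to pull the winning model out of the grid programmatically, something along these lines should work (a sketch using h2o.getGrid and h2o.getModel from the h2o R API):

In :
# Sort the grid by residual deviance (lowest first) and fetch the
# top model by its model id.
sorted_grid <- h2o.getGrid("student_grid", sort_by = "residual_deviance",
                           decreasing = FALSE)
best_model <- h2o.getModel(sorted_grid@model_ids[[1]])
print(h2o.mse(best_model, valid = TRUE))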

### Lasso Regression

Lasso regression applies the L1 penalty. Lasso is also sometimes called a variable selection technique, and it depends on the tuning parameter lambda: as lambda grows large, coefficients are shrunk all the way to zero, effectively dropping those predictors from the model.

To apply Lasso regularization, set alpha = 1.

In :
students.glm <- h2o.glm(x=x,y=y, training_frame = train,
validation_frame = valid,remove_collinear_columns = TRUE,alpha=1)

  |======================================================================| 100%


Let us look at the Lasso model summary.

In :
students.glm@model$model_summary

A H2OTable: 1 × 7

| family | link | regularization | number_of_predictors_total | number_of_active_predictors | number_of_iterations | training_frame |
| --- | --- | --- | --- | --- | --- | --- |
| gaussian | identity | Lasso (lambda = 0.05048) | 57 | 10 | 1 | RTMP_sid_88ca_2 |

As we see above, the regularization is now Lasso, with lambda = 0.05048. As noted, Lasso is a predictor selection technique: we can simply filter our predictors to those with a coefficient greater than zero, as shown below.

In :
students.glm@model$coefficients_table[students.glm@model$coefficients_table$coefficients > 0,]

A H2OTable: 6 × 3

| | names | coefficients | standardized_coefficients |
| --- | --- | --- | --- |
| 1 | Intercept | 2.17423466 | 2.59851126 |
| 48 | traveltime | 0.16625075 | 0.12113867 |
| 50 | failures | 0.04568047 | 0.03478202 |
| 53 | goout | 0.41970504 | 0.47231209 |
| 54 | health | 0.06863053 | 0.09553533 |
| 55 | absences | 0.01545513 | 0.11203287 |
In :
print(h2o.mse(students.glm, valid=TRUE))

 1.1232
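Since the Lasso fit depends on lambda, we can also let H2O search over a whole path of lambda values instead of fixing a single one; here is a minimal sketch using h2o.glm's lambda_search option:

In :
# lambda_search = TRUE fits a sequence of lambda values and keeps the
# one that scores best, so we do not have to pick lambda by hand.
students.lasso <- h2o.glm(x = x, y = y, training_frame = train,
                          validation_frame = valid, alpha = 1,
                          lambda_search = TRUE)
students.lasso@model$model_summary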


### Ridge Regression

In Ridge regression, we set alpha = 0 as shown below.

In :
students.glm <- h2o.glm(x=x,y=y, training_frame = train,
validation_frame = valid,remove_collinear_columns = TRUE,alpha=0)

  |======================================================================| 100%


Let us print the validation MSE.

In :
print(h2o.mse(students.glm, valid=TRUE))

 0.9985721
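This is lower than the Lasso MSE of 1.1232 above, consistent with the grid results. Unlike Lasso, Ridge shrinks coefficients toward zero without zeroing them out entirely; as a quick check (a sketch reusing the coefficients_table shown earlier), we can count the non-zero coefficients:

In :
# Ridge should keep essentially all predictors with non-zero weights,
# whereas Lasso kept only a handful of active ones.
coefs <- students.glm@model$coefficients_table
sum(coefs$coefficients != 0)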

In :
students.glm@model$model_summary

A H2OTable: 1 × 7