Lasso and Ridge Linear Regression Regularization
This post is part 2 of the Linear Regression and Regularization series. Please check out part 1: Machine Learning Linear Regression And Regularization.
library(h2o)
h2o.init()
Let's import our data file, student-mat.csv.
st_mat <- h2o.importFile('student-mat.csv')
students.splits <- h2o.splitFrame(data = st_mat, ratios = .8)
train <- students.splits[[1]]
valid <- students.splits[[2]]
y = 28                                        # response: column 28 ("Walc")
x = -match(c("Walc","Dalc"), names(st_mat))   # predictors: all columns except "Walc" and "Dalc"
H2O Generalized Linear Model (GLM)
students.glm <- h2o.glm(x=x,y=y, training_frame = train,
validation_frame = valid,remove_collinear_columns = TRUE)
If we inspect model$model_summary, we can see which model type H2O has run by default.
students.glm@model$model_summary
The table above shows that the regression family is "gaussian". It also shows that the regularization type is Elastic Net.
Regularization
H2O's GLM fits linear regression by maximizing the log-likelihood. We can use regularization to better fit the model. With regularization, H2O maximizes the difference between the GLM maximum log-likelihood and the regularization penalty.
There are three regularization techniques:
- Lasso Regression (L1)
- Ridge Regression (L2)
- Elastic Net (Weighted sum of (L1 + L2))
Regularization depends on the tuning hyperparameters alpha and lambda. For lambda > 0, alpha = 1 gives Lasso, alpha = 0 gives Ridge regression, and alpha between 0 and 1 gives Elastic Net regression.
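As a rough sketch (not H2O's internal code), the elastic-net penalty that alpha and lambda control can be written in plain R as below; the coefficient vector beta here is made up purely for illustration.

```r
# Elastic-net penalty as documented for H2O GLM:
# lambda * (alpha * ||beta||_1 + (1 - alpha) / 2 * ||beta||_2^2)
elastic_net_penalty <- function(beta, alpha, lambda) {
  lambda * (alpha * sum(abs(beta)) + (1 - alpha) / 2 * sum(beta^2))
}

beta <- c(0.5, -1.2, 0, 2.0)                          # illustrative coefficients
elastic_net_penalty(beta, alpha = 1,   lambda = 0.1)  # pure Lasso (L1)
elastic_net_penalty(beta, alpha = 0,   lambda = 0.1)  # pure Ridge (L2)
elastic_net_penalty(beta, alpha = 0.5, lambda = 0.1)  # Elastic Net
```

Note that alpha only mixes the two penalty terms, while lambda scales the overall strength of the penalty.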
Let us check what the optimal value of alpha is for our dataset by giving H2O a list of alpha values to choose from.
hyper_params <- list( alpha = c(0, .25, .5, .75, .1, 1) )
h2o.grid(x=x,y=y, training_frame = train,
validation_frame = valid,hyper_params=hyper_params,
search_criteria = list(strategy = "Cartesian"),algorithm = "glm",
grid_id = "student_grid")
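To see the grid results ranked by error, we can fetch the grid back by the grid_id used above ("student_grid") and sort it by MSE with h2o.getGrid:

```r
# Fetch the trained grid, sorted by MSE (lowest first)
sorted_grid <- h2o.getGrid(grid_id = "student_grid",
                           sort_by = "mse",
                           decreasing = FALSE)
print(sorted_grid)
```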
As we see above, alpha = 0.5 gives the least MSE. Ridge regression (alpha = 0) has the highest MSE, and Lasso regression (alpha = 1) does not do that well either.
Lasso Regression
Lasso regression applies the L1 penalty. Lasso is also sometimes called a variable selection technique. Lasso depends upon the tuning parameter lambda: as lambda becomes large, more coefficients are shrunk exactly to zero.
To apply Lasso regularization, set alpha = 1.
students.glm <- h2o.glm(x=x,y=y, training_frame = train,
validation_frame = valid,remove_collinear_columns = TRUE,alpha=1)
Let us look at the Lasso model summary.
students.glm@model$model_summary
As we see above, the regularization is Lasso, with lambda = 0.05612.
As I said, Lasso is a predictor selection technique. We can simply filter our predictors to those with nonzero coefficients, as shown below.
students.glm@model$coefficients_table[students.glm@model$coefficients_table$coefficients != 0,]
print(h2o.mse(students.glm, valid=TRUE))
Ridge Regression
In Ridge regression, we set alpha = 0 as shown below.
students.glm <- h2o.glm(x=x,y=y, training_frame = train,
validation_frame = valid,remove_collinear_columns = TRUE,alpha=0)
Let us print the MSE.
print(h2o.mse(students.glm, valid=TRUE))
students.glm@model$model_summary
From the model summary, we can see that the number of active predictors in Ridge regression is 40, which is far more than in Lasso regression, where the number of active predictors was only 6.
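As a quick sanity check, we can also count the active predictors straight from the coefficients table; the helper below is hypothetical (not part of the h2o API) and simply counts nonzero, non-intercept coefficients.

```r
# Hypothetical helper: count active (nonzero) predictors, excluding the intercept
n_active <- function(model) {
  ct <- as.data.frame(model@model$coefficients_table)
  sum(ct$coefficients != 0 & ct$names != "Intercept")
}

n_active(students.glm)  # compare this count between the Lasso and Ridge fits
```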
Related Notebooks
- Machine Learning Linear Regression And Regularization
- Regularization Techniques in Linear Regression With Python
- Rectified Linear Unit For Artificial Neural Networks Part 1 Regression
- Understanding Logistic Regression Using Python
- How To Solve Linear Equations Using Sympy In Python
- How To Run Logistic Regression In R
- How To Add Regression Line On Ggplot
- Decision Tree Regression With Hyper Parameter Tuning In Python
- Python Iterators And Generators