How To Install R, sparklyr, H2O, TensorFlow, and Keras on CentOS


Assuming you have the above requirements fulfilled, let's first make sure the latest EPEL repository is installed.

Run the following command.

sudo yum -y install epel-release

How to install R on CentOS

Now we can install R using the following command.

sudo yum -y install R
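Before moving on, it can help to confirm that R actually landed on the PATH. The snippet below is a minimal sketch rather than part of the original steps; report_tool is a hypothetical helper name, and it only checks that the commands exist, not which version was installed.

```shell
# report_tool is a hypothetical helper (not part of yum or R):
# it prints whether a given command can be found on the PATH.
report_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: found"
  else
    echo "$1: missing"
  fi
}

# Check the two commands that the R package provides.
report_tool R
report_tool Rscript
```

If either line reports "missing", rerun the yum command above and look for errors in its output.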

How to install R H2O library on CentOS

Let's install the machine learning package H2O using yum. Create a repo file /etc/yum.repos.d/h2o-rpm.repo using vi.

sudo vi /etc/yum.repos.d/h2o-rpm.repo

Add the following to that file.


Now we can install the R packages for H2O. Let's bring up the R REPL by typing R in your bash or zsh shell.


Now run the following commands to install the H2O R packages.

if ("package:h2o" %in% search()) { detach("package:h2o", unload=TRUE) }
if ("h2o" %in% rownames(installed.packages())) { remove.packages("h2o") }
pkgs <- c("RCurl","jsonlite")
for (pkg in pkgs) {
  if (! (pkg %in% rownames(installed.packages()))) { install.packages(pkg) }
}

install.packages("h2o", type="source", repos=(c("")))

If you see the following output, it means H2O is installed successfully.

Run the following code to check if H2O is working fine in your R repl.

localH2O = h2o.init()

You should see a GUI like this.

So far so good. Let's install Keras and TensorFlow now.

How to install R Keras and TensorFlow

In your R REPL, load the reticulate package first (install it with install.packages if it is missing).


To install Keras, TensorFlow, and all their dependencies, we will use py_install, which comes with reticulate.

py_install('keras', envname='py37',method = c("auto", "virtualenv", "conda"))

envname='py37' - py37 is the Python 3 virtual environment I have on my machine. Replace it with the name of your own virtual environment.
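If you do not have such an environment yet, one way to create it from the shell is sketched below. This is an illustration under assumptions, not the original author's setup: venv_dir and make_venv are hypothetical helper names, python3 is assumed to be installed with the venv module, and the location relies on reticulate looking for virtualenvs under $WORKON_HOME, falling back to ~/.virtualenvs.

```shell
# venv_dir is a hypothetical helper: prints where a virtualenv with the
# given name would live.
# Assumption: reticulate searches $WORKON_HOME, else ~/.virtualenvs.
venv_dir() {
  echo "${WORKON_HOME:-$HOME/.virtualenvs}/$1"
}

# make_venv is a hypothetical helper: creates the environment only if
# its directory does not exist yet, and prints what it did.
make_venv() {
  target="$(venv_dir "$1")"
  if [ -d "$target" ]; then
    echo "exists: $target"
  else
    python3 -m venv "$target" && echo "created: $target"
  fi
}
```

Calling make_venv py37 would then create the environment used in the py_install call above. If you manage Python with conda instead, this sketch does not apply.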

Now you have both Keras and TensorFlow installed.

How to install Spark R package sparklyr

Install the libcurl-devel package first. Otherwise you might run into the following error.

Configuration failed because libcurl was not found.

In your bash shell, run the following yum command.

sudo yum -y install libcurl-devel

Let's install the R package sparklyr. In your R REPL, run install.packages("sparklyr"), then load it with library(sparklyr).


Let us test if Spark is working fine in R.

sc <- spark_connect(master = "local")

If the above commands run without any errors, then you are fine.

At this point, we are done. If you want to access R from a Python Jupyter notebook, follow the steps below.

How to access R in Jupyter notebook

In your R REPL, do the following.


Now restart your Jupyter notebook. You should see R listed as one of your kernels, and you should be able to use all the machine learning libraries we installed from within the notebook.
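To check the kernel registration from the shell instead of the notebook UI, something like the following sketch may help. It assumes the R kernel was registered through IRkernel, whose default kernel name is ir; has_kernel is a hypothetical helper that only inspects the text printed by jupyter kernelspec list.

```shell
# has_kernel is a hypothetical helper: reads the output of
# `jupyter kernelspec list` on stdin and reports whether a kernel
# with the given name appears at the start of a line.
has_kernel() {
  if grep -Eq "^[[:space:]]*$1[[:space:]]"; then
    echo "$1: registered"
  else
    echo "$1: not registered"
  fi
}
```

A typical call would be jupyter kernelspec list | has_kernel ir. If your kernel was registered under a different name, pass that name instead.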
