How To Plot Histogram In R

Histograms are powerful visualization tools for analyzing and presenting data. For this example I will use covid19 data.

To read the data directly from a JSON API, I will use the jsonlite library. If you don't have it, just install it using install.packages("jsonlite")

In [1]:
library(jsonlite)
In [2]:
url <- "https://pomber.github.io/covid19/timeseries.json"
covid_data <- fromJSON(url,flatten = TRUE)

Ok, we have the JSON data in the variable covid_data. Let us check the names of the top-level entries in our data, starting with the name of the first one.

In [3]:
names(covid_data)[1]
'Afghanistan'
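
To see the shape of such a result without downloading anything, here is a minimal sketch that parses a tiny inline JSON string laid out the same way (names as keys, each holding an array of daily records). The names CountryA and CountryB and all numbers in sample_json are made up for illustration; they are not the real data.

```r
library(jsonlite)

# Made-up sample in the same layout as the API response (illustrative values only)
sample_json <- '{
  "CountryA": [
    {"date": "2020-1-22", "confirmed": 1, "deaths": 0, "recovered": 0},
    {"date": "2020-1-23", "confirmed": 3, "deaths": 0, "recovered": 1}
  ],
  "CountryB": [
    {"date": "2020-1-22", "confirmed": 0, "deaths": 0, "recovered": 0}
  ]
}'

sample_data <- fromJSON(sample_json, flatten = TRUE)

# The top level is a named list, one element per key
names(sample_data)

# Each element is a data frame, so a column is reached with two `$` steps
class(sample_data$CountryA)
sample_data$CountryA$confirmed
```

The same two-step access pattern, list element first and column second, is what an expression like covid_data$Italy$confirmed relies on.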

Ok, we have covid19 data by country. Let us print the first row of the covid19 US data.

In [4]:
head(covid_data$US,1)
A data.frame: 1 × 4
  date       confirmed  deaths  recovered
  <chr>      <int>      <int>   <int>
1 2020-1-22  1          0       0

As we can see above, the data for each country is a data frame. Now we can easily plot a histogram using the R hist() function. Let us draw a histogram of covid19 confirmed cases for Italy.

In [5]:
hist(covid_data$Italy$confirmed)

The y-axis shows the frequency, that is, the number of observations that fall in each bin.
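
To look at the numbers behind the bars, hist() can be called with plot = FALSE, which returns the histogram object instead of drawing it. The sketch below uses a short made-up vector x in place of the real column, so the values are illustrative only.

```r
# Made-up values standing in for a column such as covid_data$Italy$confirmed
x <- c(2, 3, 3, 20, 62, 155, 229, 322, 453, 655, 888, 1128)

# plot = FALSE returns the histogram object without drawing it
h <- hist(x, plot = FALSE)

h$breaks  # bin edges
h$counts  # number of values in each bin (the bar heights)

# Every value lands in exactly one bin
sum(h$counts) == length(x)
```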

We can change the appearance of the histogram. Let us color it green. We can also control the bins by specifying the breaks option.

In [6]:
hist(covid_data$Italy$confirmed, breaks = 20, col = "green")
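
One detail worth knowing: when breaks is a single number, hist() treats it as a suggestion and may choose a different number of bins to get round bin edges. Passing a vector of edges gives exact control. A small sketch with made-up values:

```r
x <- c(5, 12, 18, 25, 33, 41, 47, 58, 66, 79, 85, 93)  # made-up values

# A single number is a suggestion; R picks "pretty" edges near that count
h_suggested <- hist(x, breaks = 20, plot = FALSE)
length(h_suggested$counts)

# A vector of edges is used as given: here ten bins of width 10
h_exact <- hist(x, breaks = seq(0, 100, by = 10), plot = FALSE)
h_exact$breaks
length(h_exact$counts)
```

The vector of edges has to cover the full range of the data, otherwise hist() stops with an error.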

Instead of frequencies, we can also show probability densities on the y-axis by setting freq = FALSE.

In [7]:
hist(covid_data$Italy$confirmed,freq = FALSE)
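
With freq = FALSE the bar heights are densities, scaled so that the areas of the bars add up to one. The sketch below checks this on a made-up vector, again using plot = FALSE to get at the numbers.

```r
x <- c(5, 12, 18, 25, 33, 41, 47, 58, 66, 79, 85, 93)  # made-up values
h <- hist(x, plot = FALSE)

# Area of each bar = density * bin width
bar_areas <- h$density * diff(h$breaks)
sum(bar_areas)  # equals 1 up to floating point rounding

# Density is the count rescaled by sample size and bin width
all.equal(h$density, h$counts / (length(x) * diff(h$breaks)))
```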

How to plot histogram in R using ggplot

Let us first import the package ggplot2. If you don't have ggplot2 installed, install it using install.packages("ggplot2") in your R repl.

In [8]:
library(ggplot2)

Let us plot the histogram of confirmed US covid19 cases. Note that below we pass the data to the ggplot function and add a geom_histogram layer to the plot.

In [9]:
ggplot(covid_data$US,aes(x=confirmed)) + geom_histogram(bins = 20,color="black",fill="white")
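
As a variation, bins can be given a fixed width with binwidth instead of a count, and labs() adds a title and axis labels. This is only a sketch: the data frame df is made up, and the bin width of 25 and the label texts are arbitrary choices for the sample.

```r
library(ggplot2)

# Made-up data frame with the same column name as the per-country data
df <- data.frame(confirmed = c(1, 1, 2, 5, 8, 13, 21, 34, 55, 89, 144, 233))

p <- ggplot(df, aes(x = confirmed)) +
  # binwidth fixes the width of each bin; boundary = 0 aligns bin edges with 0
  geom_histogram(binwidth = 25, boundary = 0, color = "black", fill = "white") +
  labs(title = "Confirmed cases (sample data)",
       x = "Confirmed cases",
       y = "Number of days")

print(p)
```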

How about plotting two histograms in the same graph? Let us plot the histograms of US recovered cases and deaths in the same graph.

In [10]:
ggplot(covid_data$US) + geom_histogram(aes(x=recovered),bins = 20,color="black",fill="green",position = "stack",alpha=0.2) +
                        geom_histogram(aes(x=deaths),bins = 20,color="black",fill="red",position = "stack",alpha=0.2)

In the command above, we have added two layers to the ggplot. The first layer is the histogram of 'recovered' cases and the second layer is the histogram of 'deaths'. Also note the option alpha=0.2, which makes the bars semi-transparent so that overlapping regions stay visible.

Also note the aes option, which stands for aesthetic mapping. It maps variables in the data to visual properties of the graph, such as the position on the x-axis. Check out the following link for details: ggplot2.tidyverse.org/reference/aes_group_order.html
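
To make the difference between mapping and setting concrete: a constant such as fill="green" is set outside aes, while inside aes a column of the data is mapped to a visual property. The sketch below is one possible way to do this, not the approach used above. It reshapes two columns into long form with a hypothetical series column and lets ggplot pick one fill color per series and draw a legend. All values are made up.

```r
library(ggplot2)

# Made-up data with the same column names as the per-country data frames
df <- data.frame(
  recovered = c(0, 0, 1, 3, 7, 12, 20, 35, 50, 80),
  deaths    = c(0, 0, 0, 1, 2, 4, 6, 9, 15, 22)
)

# Long form: one row per value, with a `series` column naming its origin
long_df <- data.frame(
  series = rep(c("recovered", "deaths"), each = nrow(df)),
  value  = c(df$recovered, df$deaths)
)

# fill is mapped to a column inside aes(); alpha is a constant set outside it
p <- ggplot(long_df, aes(x = value, fill = series)) +
  geom_histogram(bins = 20, color = "black", alpha = 0.2, position = "identity")

print(p)
```

Because both series go through one layer here, they share the same bin edges, which makes the overlapping bars directly comparable.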

Let us do one last example. In this one, we will draw histograms of recovered cases for the US and Italy and plot them in the same graph.

In [11]:
ggplot() + geom_histogram(data = covid_data$US, aes(x=recovered),bins = 20,color="black",alpha = 0.2,fill="red",position = "dodge") +
                        geom_histogram(data=covid_data$Italy,aes(x=recovered),bins = 20,alpha = 0.2,color="black",fill="green",position = "dodge")

In the above example, notice that each geom_histogram layer gets its own data argument, so the recovered cases of both the US and Italy appear in the same graph.
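
A note on position = "dodge": position adjustments work within a single layer, so two separate layers are not shifted against each other and their bars simply overlap. If side-by-side bars are wanted, one option is to combine the two data frames, tag each row with its country, and draw a single layer. The sketch below assumes made-up stand-ins for the two data frames; the country column is added purely for illustration.

```r
library(ggplot2)

# Made-up stand-ins for covid_data$US and covid_data$Italy
us    <- data.frame(recovered = c(0, 0, 2, 5, 9, 14, 30, 45))
italy <- data.frame(recovered = c(0, 1, 1, 3, 6, 20, 28, 50))

# Tag each row with its country, then stack the rows
us$country    <- "US"
italy$country <- "Italy"
both <- rbind(us, italy)

# One layer, two groups: dodge now places the bars side by side
p <- ggplot(both, aes(x = recovered, fill = country)) +
  geom_histogram(bins = 10, color = "black", position = "dodge")

print(p)
```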

Wrap Up!

I hope you find this tutorial useful.