Measures of spread describe how dispersed the data points are. Some examples of measures of spread are quantiles, variance, standard deviation, and mean absolute deviation.

In this exercise we are going to compute the measures of spread using Python.

We will use a dataset from Kaggle, which you can access at https://www.kaggle.com/datasets/himanshunakrani/student-study-hours

Quantiles

Quantiles are values that split sorted data or a probability distribution into equal parts. There are several types of quantiles. Here are some examples:

- Quartiles - Divide the data into 4 equal parts.
- Quintiles - Divide the data into 5 equal parts.
- Deciles - Divide the data into 10 equal parts.
- Percentiles - Divide the data into 100 equal parts.

Let us import the libraries we will use.

```
import numpy as np
import pandas as pd
```

We will now load the data that we'll use.

```
df = pd.read_csv('score.csv')
print(df.head())
```

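Let's calculate the quartiles for the scores. We pass five probabilities (0, 0.25, 0.5, 0.75 and 1), which return the minimum, the three quartiles, and the maximum. Together these 5 cut points divide the scores into 4 equal parts.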

```
print(np.quantile(df['Scores'], [0, 0.25, 0.5, 0.75, 1]))
```
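To see what these five values represent, here is a minimal sketch on a small made-up sample (the `sample_scores` array is hypothetical and not taken from the Kaggle dataset). It assumes NumPy's default linear interpolation.

```python
import numpy as np

# Hypothetical scores, listed in sorted order for readability
sample_scores = np.array([10, 20, 30, 40, 50])

# 0 -> minimum, 0.25 -> 1st quartile, 0.5 -> median,
# 0.75 -> 3rd quartile, 1 -> maximum
quartiles = np.quantile(sample_scores, [0, 0.25, 0.5, 0.75, 1])
print(quartiles)
```

With only five evenly spaced values, each cut point falls exactly on one of the data points.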

Quantiles using np.linspace()

Listing every probability by hand becomes tedious, especially for finer quantiles such as deciles and percentiles. In such cases we can use np.linspace(start, stop, num), which returns num evenly spaced values from start to stop. To split the data into n equal parts we need n + 1 values.

Let's get the quartiles of the scores

```
print(np.quantile(df['Scores'], np.linspace(0, 1, 5)))
```

Let's get the quintiles

```
print(np.quantile(df['Scores'], np.linspace(0, 1, 6)))
```

Let's get the deciles

```
print(np.quantile(df['Scores'], np.linspace(0, 1, 11)))
```
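The same pattern extends to any number of parts, including percentiles. The following is a minimal sketch that assumes a hypothetical helper named `quantile_cuts` (it is not part of NumPy) and made-up data rather than the scores.

```python
import numpy as np

def quantile_cuts(data, n_parts):
    """Return the n_parts + 1 cut points that split data into n_parts equal parts.

    Hypothetical helper for illustration; it only wraps np.linspace and np.quantile.
    """
    probabilities = np.linspace(0, 1, n_parts + 1)
    return np.quantile(data, probabilities)

# Made-up data: the integers 0 through 100
values = np.arange(101)

deciles = quantile_cuts(values, 10)       # 11 cut points
percentiles = quantile_cuts(values, 100)  # 101 cut points
print(deciles)
print(len(percentiles))
```

On the real data, the call would be `quantile_cuts(df['Scores'], 100)`.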

Interquartile Range (IQR)

This is the difference between the 3rd and the 1st quartile. The IQR tells the spread of the middle half of the data.

Let's get the IQR for the scores

```
IQR = np.quantile(df['Scores'], 0.75) - np.quantile(df['Scores'], 0.25)
print(IQR)
```

Another way to get the IQR is to use iqr() from the scipy library.

```
from scipy.stats import iqr
IQR = iqr(df['Scores'])
print(IQR)
```
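As a quick check of the definition, here is a minimal sketch on a small hypothetical sample where the quartiles are easy to verify by hand. It uses np.percentile, which takes percentages (0 to 100) instead of fractions (0 to 1).

```python
import numpy as np

# Hypothetical sample, not from the Kaggle dataset
sample = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])

# 25th and 75th percentiles are the 1st and 3rd quartiles
q1, q3 = np.percentile(sample, [25, 75])
iqr_value = q3 - q1
print(q1, q3, iqr_value)
```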

Outliers

These are data points that are unusually different or detached from the rest of the data points.

A data point is an outlier if:

data < 1st quartile − 1.5 * IQR

`or`

data > 3rd quartile + 1.5 * IQR

Let's get the outliers in the scores

```
# first get the quartiles and the IQR
# (named iqr_value so it does not shadow scipy's iqr function)
q1 = np.quantile(df['Scores'], 0.25)
q3 = np.quantile(df['Scores'], 0.75)
iqr_value = q3 - q1
# then get lower & upper threshold
lower_threshold = q1 - 1.5 * iqr_value
upper_threshold = q3 + 1.5 * iqr_value
# then find outliers
outliers = df[(df['Scores'] < lower_threshold) | (df['Scores'] > upper_threshold)]
print(outliers)
```
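The 1.5 * IQR rule can be sketched on a small hypothetical sample in which one value is deliberately far from the rest. The `sample` array and variable names below are made up for illustration.

```python
import numpy as np

# Hypothetical scores; 95 is deliberately far from the others
sample = np.array([10, 12, 13, 14, 15, 16, 18, 95])

q1, q3 = np.quantile(sample, [0.25, 0.75])
iqr_value = q3 - q1

lower_threshold = q1 - 1.5 * iqr_value
upper_threshold = q3 + 1.5 * iqr_value

# Boolean mask marks values outside [lower_threshold, upper_threshold]
is_outlier = (sample < lower_threshold) | (sample > upper_threshold)
outliers = sample[is_outlier]
print(lower_threshold, upper_threshold)
print(outliers)
```

Only the value 95 falls outside the two thresholds, so it is the only point flagged.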

Variance

Variance is the average of the squared distances between each data point and the mean of the data.

Let's calculate the variance of the scores. We will use np.var().

```
print(np.var(df['Scores'], ddof=1))
```

With ddof=1 included, np.var() divides by n - 1 and returns the sample variance. If it is excluded, ddof defaults to 0, the divisor is n, and we get the population variance.

Let's see that here below.

```
print(np.var(df['Scores']))
```
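To make the role of ddof concrete, here is a minimal sketch that computes both variances by hand on a small hypothetical sample and compares them with np.var().

```python
import numpy as np

# Hypothetical sample with mean 5
sample = np.array([2, 4, 4, 4, 5, 5, 7, 9])

# Squared distance of each point from the mean
squared_dists = (sample - sample.mean()) ** 2

# Population variance divides by n; sample variance divides by n - 1
population_variance = squared_dists.sum() / len(sample)
sample_variance = squared_dists.sum() / (len(sample) - 1)

print(population_variance, np.var(sample))
print(sample_variance, np.var(sample, ddof=1))
```

Each printed pair should match, and the sample variance is always the larger of the two because its divisor is smaller.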

Standard deviation

This is the square root of the variance. Unlike the variance, it is expressed in the same units as the data.

Let's get the standard deviation of the scores

```
print(np.sqrt(np.var(df['Scores'], ddof=1)))
```

Another way to get the standard deviation is np.std().

Let's use that

```
print(np.std(df['Scores'], ddof=1))
```

Mean Absolute Deviation

This is the average of the absolute distances between each data point and the mean of the data.

Let's find the mean absolute deviation of the scores

```
# first find the distance between the data points and the mean
dists = df['Scores'] - np.mean(df['Scores'])
# find the mean absolute
print(np.mean(np.abs(dists)))
```
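The same steps can be wrapped in a small function. This is a minimal sketch that assumes a hypothetical helper named `mean_absolute_deviation` and a made-up sample; the population standard deviation is printed alongside for comparison.

```python
import numpy as np

def mean_absolute_deviation(data):
    """Average absolute distance of each value from the mean (hypothetical helper)."""
    data = np.asarray(data, dtype=float)
    return np.mean(np.abs(data - data.mean()))

# Hypothetical sample with mean 5
sample = [2, 4, 4, 4, 5, 5, 7, 9]

mad = mean_absolute_deviation(sample)
std = np.std(sample)  # population standard deviation (ddof=0)
print(mad)
print(std)
```

Because the distances are not squared, large deviations carry less weight in the mean absolute deviation than in the standard deviation.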

describe() method

The pandas describe() method computes summary statistics for a DataFrame or a Series. For a DataFrame with mixed column types, it summarizes only the numeric columns by default.

We can make use of it to get some of the measurements that have been mentioned above.

```
df['Scores'].describe()
```
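The result of describe() is itself a Series indexed by statistic name, so individual values can be looked up by label. Here is a minimal sketch on a hypothetical Series (not the Kaggle data) that reads the quartiles back out to compute the IQR.

```python
import pandas as pd

# Hypothetical scores for illustration
scores = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9], name='Scores')

summary = scores.describe()
print(summary)

# The '25%' and '75%' rows are the 1st and 3rd quartiles
iqr_value = summary['75%'] - summary['25%']
print(iqr_value)
```

Note that the std row reported by describe() is the sample standard deviation (ddof=1).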
