# How to Sort Pandas DataFrame with Examples

For this exercise I will be using Movie database which I have downloaded from Kaggle.

```
import pandas as pd
```

For this exercise we will be using ratings.csv file which comes with movie database.

```
ls moviesdb/ml-latest/ratings.csv
```

```
df = pd.read_csv('moviesdb/ml-latest/ratings.csv')
```

```
df.head(2)
```

Lets check the size of our dataframe

```
df.count()
```

The dataframe has roughly 26 million entries. Each User has multiple entries. Lets check how many unique users are there.

```
len(df.userId.unique())
```

So there are 270k Unique users are present in our db. Lets checkout how many unique movies are there.

```
len(df.movieId.unique())
```

Ok, roughly around 45k movies are present for which we have the ratings present.

Lets also check the range of our rating column. For that we can use describe() method on the entire dataframe. Describe() method gives us the nice statistics.

```
df.describe()
```

As we see above, rating has minimum value of 0.5 and max value of 5.0.

Now we know our dataframe much better, lets do sort operations now.

## How To Do Sort By Values

Lets say we want to sort the movies by ratings from high to low that is in descending order.

```
df.sort_values(by=['rating'],ascending=False).head(2)
```

As we see above, the first two rows are of rating 5 now. Remember above command has not saved the data in the descending order. If we want to save the data, use the option inplace=True

```
df.sort_values(by=['rating'],ascending=False,inplace=True)
```

```
df.head(2)
```

For ascending just use the option, ascending=True, as shown below...

```
df.sort_values(by=['rating'],ascending=True).head(2)
```

## How To Sort Multiple Columns In Pandas

We can also sort by multiple columns in Pandas, lets say we want to sort highest to lowest rating and then by highest to lowest movieId.

```
df.sort_values(by=['rating','movieId'],ascending=False).head(2)
```

As we see above, rating and movieId are both showing as highest to lowest. Lets add another column userId, it doesnt make sense to sort movieId and userId in this example but for the sake of exercise we will anyway do it.

```
df.sort_values(by=['rating','movieId','userId'],ascending=False).head(2)
```

We saw that nothing much changed vs our previous command. Lets do a little change now, lets sort userId before movieId and see what happens

```
df.sort_values(by=['rating','userId','movieId'],ascending=False).head(2)
```

We see a different result, what just happened. We sorted first rating then userId. Since 270896 has give more than one movie ratings, the code above sorts the movieId of user 270896 further from highest to lowest.

Lets do a meaniful example, Lets find the average rating by a user and sort it by highest to lowest. For this query, we will have to groupby first user and apply average operation on the column rating and then will have to sort it by rating. Lets try this.

```
import numpy as np
```

```
df1 = df.groupby('userId')['rating'].apply(np.average)
```

Lets check our dataframe df1. Remember we have not sorted the dataframe yet.

```
df1.head()
```

Lets check the datatype of df1

```
df1.__class__
```

It has changed to Pandas Series. Lets change it back to dataframe by using reset_index()

```
df2 = df1.reset_index()
```

```
df2.__class__
```

```
df2.head(2)
```

Now we have our dataframe back, lets sort by the rating from highest to lowest

```
df2.sort_values(by=['rating'],ascending=False,inplace=True)
```

```
df2.head(5)
```

The first column in the above dataframe is the row ids. Our data frame is sorted by rating and now we know which user has on average given hightest rating.

## How To Sort Multi-level Pandas Data Frame

Let us first create two levels data frame. We will use df.set_index() to do that.

```
dfl = df.set_index(['movieId','rating'])
dfl.sort_index(inplace=TruE)
```

```
dfl.head(2)
```

As we see above, Dataframe dfl contains levels "movieId" and "rating".

Let us sort the levels by Ascending movieId and Descending rating. This will sort the movieId starting from lowest to highest and for each movieId, the rating will be sorted from highest to lowest.

```
dfl.sort_index(level=[0,1],ascending=[True,False]).head(2)
```

## Related Notebooks

- Pandas How To Sort Columns And Rows
- How To Iterate Over Rows In A Dataframe In Pandas
- How to Convert Python Pandas DataFrame into a List
- How To Replace na Values with Zeros In R Dataframe
- How to Export Pandas DataFrame to a CSV File
- How To Drop One Or More Columns In Pandas Dataframe
- Convert Pandas DataFrame To Numpy Arrays
- How to Create DataFrame in R Using Examples
- Data Analysis With Pyspark Dataframe