Pandas Groupby Count of Rows In Each Group

Let us first create a dummy dataframe for this tutorial.

In [3]:

import pandas as pd

df = pd.DataFrame({'Director': ['Steven Spielberg', 'Martin Scorsese', 'Steven Spielberg', 'Quentin Tarantino', 'Martin Scorsese', 'Steven Spielberg'],
                   'Movie': ['Jaws', 'Goodfellas', 'Jurassic Park', 'Pulp Fiction', 'Raging Bull', 'E.T.']})

In [4]:

print(df)

            Director          Movie
0   Steven Spielberg           Jaws
1    Martin Scorsese     Goodfellas
2   Steven Spielberg  Jurassic Park
3  Quentin Tarantino   Pulp Fiction
4    Martin Scorsese    Raging Bull
5   Steven Spielberg           E.T.

Using pandas groupby size()

To get the number of rows in each group of a pandas groupby object, you can use the size() method. Here we group the movies data by director.

In [5]:

# Group the dataframe by the 'Director' column
grouped_df = df.groupby('Director')

# Get the size of each group
group_sizes = grouped_df.size()

print(group_sizes)

Director
Martin Scorsese      2
Quentin Tarantino    1
Steven Spielberg     3
dtype: int64
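The result of size() above is a Series indexed by director. If a regular dataframe is easier to work with, one possible approach is sketched below. It assumes the same movies data as above; the column name 'count' and the variable name sizes_df are illustrative choices, not part of the original tutorial.

```python
import pandas as pd

# Same movies data as in the tutorial
df = pd.DataFrame({'Director': ['Steven Spielberg', 'Martin Scorsese', 'Steven Spielberg', 'Quentin Tarantino', 'Martin Scorsese', 'Steven Spielberg'],
                   'Movie': ['Jaws', 'Goodfellas', 'Jurassic Park', 'Pulp Fiction', 'Raging Bull', 'E.T.']})

# size() returns a Series; reset_index(name=...) turns it into a dataframe
# with 'Director' as a regular column and the sizes in a column named 'count'
# (the name 'count' is an arbitrary choice for this sketch)
sizes_df = df.groupby('Director').size().reset_index(name='count')

print(sizes_df)
```

The resulting dataframe has one row per director, which can be convenient for sorting or merging with other data.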

Using pandas groupby count()

You can also use the count() method. It counts the number of non-NA/null values in each column for each group, so it matches the number of rows only when a column has no missing values.

In [6]:

# Get the count of rows for each group
group_counts = grouped_df.count()

print(group_counts)

                   Movie
Director                
Martin Scorsese        2
Quentin Tarantino      1
Steven Spielberg       3

Difference between size() and count()

The size() method returns the total number of rows in each group, including rows that contain missing values. The count() method, on the other hand, returns the number of non-NA/null values in each column for each group.
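The movies data above has no missing values, so both methods give the same numbers there. The following minimal sketch makes the difference visible. It uses a small hypothetical dataframe, df_missing, in which one movie title is missing; this data is made up for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the Martin Scorsese row has a missing movie title
df_missing = pd.DataFrame({'Director': ['Steven Spielberg', 'Martin Scorsese', 'Steven Spielberg'],
                           'Movie': ['Jaws', np.nan, 'E.T.']})

grouped_missing = df_missing.groupby('Director')

# size() counts every row in the group, including the row with the missing title
sizes = grouped_missing.size()

# count() on the 'Movie' column counts only the non-null titles
counts = grouped_missing['Movie'].count()

print(sizes)
print(counts)
```

For Martin Scorsese, size() reports 1 row while count() reports 0 movies, because the only title in that group is missing. For Steven Spielberg both methods report 2.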
