How To Iterate Over Rows In A Dataframe In Pandas

Iterating over the rows of a Pandas dataframe is slow and generally not recommended. The recommended way is to use the apply() method.

But if you have to loop through a dataframe, there are mainly two ways to iterate over its rows:

iterrows()
itertuples()
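
To make the difference between the two concrete, here is a minimal sketch on a small made-up dataframe (the name toy and its columns are illustrative assumptions, not the CSV used below). iterrows() yields (index, Series) pairs, while itertuples() yields namedtuples whose first field is the index.

```python
import pandas as pd

# Hypothetical toy data, used only for illustration
toy = pd.DataFrame({"Name": ["A", "B"], "Apps": [10, 20]})

# iterrows() yields (index, Series) pairs; columns are read by label
iterrows_out = [(idx, row["Apps"]) for idx, row in toy.iterrows()]

# itertuples() yields namedtuples; the index is available as .Index
# and each column as an attribute
itertuples_out = [(row.Index, row.Apps) for row in toy.itertuples()]

print(iterrows_out)
print(itertuples_out)
```

Both loops visit the same rows in the same order; they differ only in the object that represents each row.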

Let us download the following CSV data from the given link.

!wget http://faculty.marshall.usc.edu/gareth-james/ISL/College.csv

import pandas as pd
import time

df = pd.read_csv('College.csv')

df.head(1)

len(df)

777

There are 777 rows in our dataframe.

Loop through dataframe using iterrows()

st = time.time()
# iterrows() yields (index, Series) pairs, one per row
for index, row in df.iterrows():
    i, r = index, row['Apps'] * 1
end = time.time()

print(end-st)

0.10507607460021973

Loop through dataframe using itertuples()

st = time.time()
# itertuples() yields one namedtuple per row; columns are read as attributes
for row in df.itertuples():
    apps = row.Apps * 1
end = time.time()

print(end-st)

0.010402679443359375
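
One caveat with attribute access such as row.Apps: when a column name is not a valid Python identifier (for example, one containing a dot), itertuples() renames that field to a positional name. The sketch below assumes a made-up dataframe with a hypothetical column called F.Undergrad to show the effect.

```python
import pandas as pd

# Hypothetical one-row dataframe; "F.Undergrad" is not a valid identifier
toy = pd.DataFrame({"Apps": [10], "F.Undergrad": [5]})

row = next(toy.itertuples())

# Fields are Index, Apps, and a positional name for the third field
print(row._fields)

# The renamed field is still reachable by position
undergrad = row[2]
print(undergrad)
```

If such columns matter, renaming them before the loop keeps the attribute access readable.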

Loop through dataframe using apply()

st = time.time()
# axis=1 calls the lambda once per row; the result overwrites the 'Apps' column
df['Apps'] = df.apply(lambda x: x['Apps'] * 1, axis=1)
end = time.time()

print(end-st)

0.02086162567138672

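As we see above, itertuples() surprisingly turned out to be the fastest (about 0.010 s) and iterrows() the slowest (about 0.105 s, roughly ten times slower). Note, however, that in the df.apply() example we also assign the result back to the original dataframe, which might be making df.apply() slower. These are single-run timings on 777 rows, so the exact numbers will vary from run to run. On the other hand, df.apply() needs less code and fewer variables, so the code is much cleaner.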
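
For a steadier comparison, the three approaches can be timed with the standard timeit module instead of a single time.time() measurement. This is only a sketch under stated assumptions: it uses a synthetic 777-row dataframe (data) rather than College.csv, the function names are made up for illustration, and the apply() result is returned rather than assigned back, so the dataframe is not modified.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic stand-in for the College data: 777 rows, one integer column
rng = np.random.default_rng(0)
data = pd.DataFrame({"Apps": rng.integers(0, 10_000, size=777)})


def with_iterrows():
    return [row["Apps"] * 1 for _, row in data.iterrows()]


def with_itertuples():
    return [row.Apps * 1 for row in data.itertuples()]


def with_apply():
    # The result is returned, not assigned back, so data stays unchanged
    return data.apply(lambda x: x["Apps"] * 1, axis=1)


# Run each function 5 times per measurement, repeat 3 times, keep the best
timings = {
    f.__name__: min(timeit.repeat(f, number=5, repeat=3)) / 5
    for f in (with_iterrows, with_itertuples, with_apply)
}

for name, seconds in timings.items():
    print(f"{name}: {seconds:.5f} s per pass")
```

Taking the minimum over several repeats reduces the influence of background noise; the absolute values still depend on the machine and the Pandas version.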
