How To Iterate Over Rows In A Dataframe In Pandas

Iterating over the rows of a Pandas dataframe is slow and generally not recommended. The recommended way is to use the apply() method.

But if one has to loop through a dataframe, there are mainly two ways to iterate over its rows.

  1. iterrows()
  2. itertuples()
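
To make the difference between the two concrete, here is a minimal sketch on a small made-up dataframe. The name toy and its values are hypothetical and are not taken from College.csv. iterrows() yields (index, Series) pairs, while itertuples() yields one namedtuple per row, with the index available as the Index field.

```python
import pandas as pd

# Hypothetical toy data, used only for illustration.
toy = pd.DataFrame({"Name": ["A", "B"], "Apps": [10, 20]})

# iterrows() yields (index, row) pairs; each row is a pandas Series.
rows_seen = [(index, row["Apps"]) for index, row in toy.iterrows()]

# itertuples() yields one namedtuple per row; the index is the first field.
tuples_seen = [(row.Index, row.Apps) for row in toy.itertuples()]

print(rows_seen)
print(tuples_seen)
```

Because iterrows() builds a new Series for every row, it does more work per row than itertuples(), which is consistent with the timings measured further down.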

Let us download the following CSV data from the link below.

In [1]:
!wget http://faculty.marshall.usc.edu/gareth-james/ISL/College.csv
In [2]:
import pandas as pd
import time
In [3]:
df = pd.read_csv('College.csv')
In [4]:
df.head(1)
Out[4]:
   Unnamed: 0                    Private  Apps  Accept  Enroll  Top10perc  Top25perc  F.Undergrad  P.Undergrad  Outstate  Room.Board  Books  Personal  PhD  Terminal  S.F.Ratio  perc.alumni  Expend  Grad.Rate
0  Abilene Christian University  Yes      1660  1232    721     23         52         2885         537          7440      3300        450    2200      70   78        18.1       12           7041    60
In [5]:
len(df)
Out[5]:
777

There are 777 rows in our dataframe.

Loop through dataframe using iterrows()

In [6]:
st = time.time()
for index, row in df.iterrows():
    i,r = index,row['Apps']*1
end = time.time()
In [7]:
print(end-st)
0.10507607460021973

Loop through dataframe using itertuples()

In [8]:
st = time.time()
for row in df.itertuples():
    apps = row.Apps*1
end = time.time()
In [9]:
print(end-st)
0.010402679443359375

Loop through dataframe using apply()

In [18]:
st = time.time()
df['Apps'] = df.apply(lambda x: x['Apps']*1,axis=1)
end = time.time()
In [20]:
print(end-st)
0.02086162567138672

As the timings above show, itertuples() was, surprisingly, the fastest (about 0.010 s) and iterrows() the slowest (about 0.105 s), with df.apply() in between (about 0.021 s). Note, however, that in the df.apply() example we also assign the result back to the original dataframe, which might be making df.apply() slower. On the other hand, df.apply() needs less code and fewer variables, so the code is much cleaner.
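
A single time.time() measurement is noisy, and the caveat about df.apply() can be checked by not writing the result back. The following is a rough sketch of such a comparison, assuming a synthetic dataframe named demo with 777 rows and a single Apps column as a stand-in for College.csv. The helper names with_iterrows, with_itertuples and with_apply are made up for this example. The numbers will differ from machine to machine.

```python
import timeit

import pandas as pd

# Hypothetical stand-in for College.csv: 777 rows with a single 'Apps' column.
demo = pd.DataFrame({"Apps": range(777)})


def with_iterrows(frame):
    return [row["Apps"] * 1 for _, row in frame.iterrows()]


def with_itertuples(frame):
    return [row.Apps * 1 for row in frame.itertuples()]


def with_apply(frame):
    # Return the result instead of assigning it back to the dataframe.
    return frame.apply(lambda x: x["Apps"] * 1, axis=1).tolist()


for func in (with_iterrows, with_itertuples, with_apply):
    # Best of 3 repeats of 5 runs each, to reduce measurement noise.
    best = min(timeit.repeat(lambda: func(demo), number=5, repeat=3))
    print(f"{func.__name__}: {best / 5:.6f} s per run")
```

Each helper returns a plain list, so the three approaches do the same work and none of them modifies demo.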