How To Iterate Over Rows In A Dataframe In Pandas
Iterating over the rows of a Pandas DataFrame is slow and generally not recommended. The usual recommendation is to use vectorized column operations where possible, or the apply() method.
But if one has to loop through a DataFrame, there are two main ways to iterate over its rows:
- iterrows()
- itertuples()
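The two methods hand you each row in a different shape. iterrows() yields (index, Series) pairs, while itertuples() yields namedtuples whose first field, Index, holds the row's index label. Here is a minimal sketch on a small hypothetical frame, separate from the College data used below:

```python
import pandas as pd

# Hypothetical two-row frame with a string index, used only for illustration
sample = pd.DataFrame({"Apps": [100, 200], "Accept": [80, 150]}, index=["a", "b"])

# iterrows(): each item is (index label, row as a pandas Series)
for label, row in sample.iterrows():
    print(label, type(row).__name__, row["Apps"])

# itertuples(): each item is a namedtuple; the index is stored in the "Index" field
for row in sample.itertuples():
    print(row.Index, row.Apps, row.Accept)
```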
Let us download the College CSV data from the link below.
```python
# The leading "!" runs a shell command from a Jupyter/IPython cell
!wget http://faculty.marshall.usc.edu/gareth-james/ISL/College.csv
```
```python
import pandas as pd
import time

df = pd.read_csv('College.csv')
```
```python
df.head(1)
```
```python
len(df)
```
There are 777 rows in our dataframe.
Loop through dataframe using iterrows()
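```python
# Time one pass over all rows; iterrows() yields (index, Series) pairs
st = time.perf_counter()
for index, row in df.iterrows():
    apps = row['Apps'] * 1  # trivial per-row work: read one value
end = time.perf_counter()
print(end - st)  # elapsed seconds
```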
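One behavior is worth knowing before relying on iterrows(). Each row comes back as a single Series, so values from columns with different dtypes are coerced to one common dtype, and integers can come back as floats. The pandas documentation notes that iterrows() does not preserve dtypes across rows. Here is a minimal sketch on a hypothetical two-column frame, not the College data:

```python
import pandas as pd

# Hypothetical frame mixing an integer column and a float column
sample = pd.DataFrame({"int_col": [1, 2], "float_col": [1.5, 2.5]})

# The DataFrame keeps a separate dtype per column...
print(sample.dtypes)

# ...but each row from iterrows() is one Series with a single dtype,
# so the integer values are upcast to float64 inside the row
row_dtypes = []
for index, row in sample.iterrows():
    row_dtypes.append(row.dtype)
    print(index, row["int_col"], row.dtype)
```

The same documentation suggests itertuples() when dtypes need to be preserved. It also avoids building a Series for every row, which is a likely reason it is faster.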
Loop through dataframe using itertuples()
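```python
# Same work with itertuples(), which yields one namedtuple per row
st = time.perf_counter()
for row in df.itertuples():
    apps = row.Apps * 1  # columns are available as attributes
end = time.perf_counter()
print(end - st)  # elapsed seconds
```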
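Attribute access such as row.Apps works only because Apps is a valid Python identifier. Per the pandas documentation, itertuples() renames column names that are not valid identifiers to positional names. College.csv has such columns, for example P.Undergrad and F.Undergrad. Here is a minimal sketch on a hypothetical frame whose second column mimics that naming:

```python
import pandas as pd

# Hypothetical frame; "P.Undergrad" mimics a College.csv column name containing a dot
sample = pd.DataFrame({"Apps": [100, 200], "P.Undergrad": [30, 40]})

for row in sample.itertuples():
    # Field 0 is the index; "P.Undergrad" is not a valid identifier,
    # so it is exposed under the positional name "_2"
    print(row._fields, row.Index, row.Apps, row._2)

# index=False drops the Index field, so the renamed column becomes "_1"
first = next(sample.itertuples(index=False))
print(first._fields)

# Plain positional indexing works regardless of how fields were named
print(first[1])
```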
Loop through dataframe using apply()
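```python
# apply() with axis=1 calls the lambda once per row and writes the result back to df
st = time.perf_counter()
df['Apps'] = df.apply(lambda x: x['Apps'] * 1, axis=1)
end = time.perf_counter()
print(end - st)  # elapsed seconds
```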
In the timings above, itertuples() surprisingly came out fastest and iterrows() slowest. Note, though, that in the df.apply() example we also write the result back into the original DataFrame, which might be making df.apply() slower. Also, apply() with axis=1 builds a Series for every row, much as iterrows() does. On the other hand, df.apply() needs less code and fewer variables, so the code is much cleaner.
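To rerun the comparison in one place, the sketch below times the three approaches from this notebook side by side. It adds a vectorized column expression, frame["Apps"] * 1, which does the same multiplication without any Python-level row loop. It assumes a hypothetical random 777-row frame instead of College.csv, so it runs without the download. None of the variants assign back to the DataFrame, so all four do the same work. Timings will vary by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical stand-in for College.csv: 777 rows of random integer "Apps" values
rng = np.random.default_rng(0)
sample = pd.DataFrame({"Apps": rng.integers(0, 50_000, size=777)})


def via_iterrows(frame):
    # One Series object is built per row
    return [row["Apps"] * 1 for _, row in frame.iterrows()]


def via_itertuples(frame):
    # One namedtuple per row; attribute access by column name
    return [row.Apps * 1 for row in frame.itertuples()]


def via_apply(frame):
    # axis=1 calls the lambda once per row, passing each row as a Series
    return frame.apply(lambda x: x["Apps"] * 1, axis=1).tolist()


def via_vectorized(frame):
    # A single column-wise operation, no Python-level loop over rows
    return (frame["Apps"] * 1).tolist()


approaches = [via_iterrows, via_itertuples, via_apply, via_vectorized]

# All four must produce the same values for the comparison to be fair
results = {fn.__name__: fn(sample) for fn in approaches}

# Average seconds per call over 5 runs of each approach
timings = {
    fn.__name__: timeit.timeit(lambda fn=fn: fn(sample), number=5) / 5
    for fn in approaches
}

for name, seconds in sorted(timings.items(), key=lambda kv: kv[1]):
    print(f"{name:15s} {seconds:.6f} s")
```

Averaging over several runs with timeit smooths out some of the noise that a single time.perf_counter() measurement of a few milliseconds is prone to. When the per-row logic can be expressed as column arithmetic, the vectorized form is typically the fastest of the four.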