Select Pandas Dataframe Rows And Columns Using iloc loc and ix

In this post, I will talk about how to use Python library Pandas iloc, loc and ix functions to select rows and columns from csv and excel files

I will be using college.csv data which has details about university admissions.

Lets start with importing pandas library and read_csv to read the csv file

In [1]:
import pandas as pd
In [2]:
df = pd.read_csv('data/College.csv')
In [3]:
df.head(2)
Out[3]:
Unnamed: 0 Private Apps Accept Enroll Top10perc Top25perc F.Undergrad P.Undergrad Outstate Room.Board Books Personal PhD Terminal S.F.Ratio perc.alumni Expend Grad.Rate
0 Abilene Christian University Yes 1660 1232 721 23 52 2885 537 7440 3300 450 2200 70 78 18.1 12 7041 60
1 Adelphi University Yes 2186 1924 512 16 29 2683 1227 12280 6450 750 1500 29 30 12.2 16 10527 56

How To Use dataframe loc To Select Rows

Lets check what df.loc actually used for, if you do df.loc?, you will find following documentation...

Access a group of rows and columns by label(s) or a boolean array.

.loc[] is primarily label based, but may also be used with a boolean array.

Lets try to select the columns by labels first. We will have to use double index [[]]

In [5]:
df.loc[['Apps']]

I got following error

KeyError: "None of [Index(['Apps'], dtype='object')] are in the [index]"

The reason for above error is that Apps is not an index. We will have to set the column names as index first. Lets see what is index set to right now.

In [6]:
df.index
Out[6]:
RangeIndex(start=0, stop=777, step=1)

The index is set from 0 to 777 which is actually row numbers. Of course we can use current index to select the rows. Lets try that.

How To Select Row By Index Using Pandas loc

In [7]:
df.loc[0]
Out[7]:
Unnamed: 0     Abilene Christian University
Private                                 Yes
Apps                                   1660
Accept                                 1232
Enroll                                  721
Top10perc                                23
Top25perc                                52
F.Undergrad                            2885
P.Undergrad                             537
Outstate                               7440
Room.Board                             3300
Books                                   450
Personal                               2200
PhD                                      70
Terminal                                 78
S.F.Ratio                              18.1
perc.alumni                              12
Expend                                 7041
Grad.Rate                                60
Name: 0, dtype: object

As We see, we got the first row. Lets add column University Name as index and see what happens. We will have to rename the University Name which is Unname:0 in dataframe.

In [8]:
df.rename(columns={'Unnamed: 0':'univ'},inplace=True)
In [9]:
df.set_index('univ',inplace=True)
In [10]:
df.head(2)
Out[10]:
Private Apps Accept Enroll Top10perc Top25perc F.Undergrad P.Undergrad Outstate Room.Board Books Personal PhD Terminal S.F.Ratio perc.alumni Expend Grad.Rate
univ
Abilene Christian University Yes 1660 1232 721 23 52 2885 537 7440 3300 450 2200 70 78 18.1 12 7041 60
Adelphi University Yes 2186 1924 512 16 29 2683 1227 12280 6450 750 1500 29 30 12.2 16 10527 56

Lets Try to select the row by university name

In [11]:
df.loc['Abilene Christian University']
Out[11]:
Private         Yes
Apps           1660
Accept         1232
Enroll          721
Top10perc        23
Top25perc        52
F.Undergrad    2885
P.Undergrad     537
Outstate       7440
Room.Board     3300
Books           450
Personal       2200
PhD              70
Terminal         78
S.F.Ratio      18.1
perc.alumni      12
Expend         7041
Grad.Rate        60
Name: Abilene Christian University, dtype: object

How do you remove the indexes. Use reset_index()

In [12]:
df = df.reset_index()
In [13]:
df.head(1)
Out[13]:
univ Private Apps Accept Enroll Top10perc Top25perc F.Undergrad P.Undergrad Outstate Room.Board Books Personal PhD Terminal S.F.Ratio perc.alumni Expend Grad.Rate
0 Abilene Christian University Yes 1660 1232 721 23 52 2885 537 7440 3300 450 2200 70 78 18.1 12 7041 60

How To Use dataframe loc To Select Columns

Lets set the university as index again. This time, we want to select a particular column data for a partiular row.

In [14]:
df.set_index('univ',inplace=True)
In [15]:
df.loc['Abilene Christian University',['Apps']]
Out[15]:
Apps    1660
Name: Abilene Christian University, dtype: object

Lets say we want to select Apps column for two rows. Check the double indexes [[]] for rows

In [16]:
df.loc[['Abilene Christian University','Adelphi University'],['Apps']]
Out[16]:
Apps
univ
Abilene Christian University 1660
Adelphi University 2186

Lets say we want to print all the rows for column 'Apps'. Look out for syntax : in the below command, it means all the rows

In [17]:
df.loc[:,'Apps']
Out[17]:
univ
Abilene Christian University       1660
Adelphi University                 2186
Adrian College                     1428
Agnes Scott College                 417
Alaska Pacific University           193
                                  ...  
Worcester State College            2197
Xavier University                  1959
Xavier University of Louisiana     2097
Yale University                   10705
York College of Pennsylvania       2989
Name: Apps, Length: 777, dtype: int64

How To Use Pandas Dataframe iloc

Pandas iloc can be used to select both rows and columns.

Python Select Row By Index Using Pandas iloc

We can give the rows a range, lets say we want to select first 2 rows and just print all the columns.

In [18]:
df.iloc[:2,:]
Out[18]:
Private Apps Accept Enroll Top10perc Top25perc F.Undergrad P.Undergrad Outstate Room.Board Books Personal PhD Terminal S.F.Ratio perc.alumni Expend Grad.Rate
univ
Abilene Christian University Yes 1660 1232 721 23 52 2885 537 7440 3300 450 2200 70 78 18.1 12 7041 60
Adelphi University Yes 2186 1924 512 16 29 2683 1227 12280 6450 750 1500 29 30 12.2 16 10527 56

We can give it different index numbers. Print rows 1 ,4 and 5

In [19]:
df.iloc[[1,4,5],:]
Out[19]:
Private Apps Accept Enroll Top10perc Top25perc F.Undergrad P.Undergrad Outstate Room.Board Books Personal PhD Terminal S.F.Ratio perc.alumni Expend Grad.Rate
univ
Adelphi University Yes 2186 1924 512 16 29 2683 1227 12280 6450 750 1500 29 30 12.2 16 10527 56
Alaska Pacific University Yes 193 146 55 16 44 249 869 7560 4120 800 1500 76 72 11.9 2 10922 15
Albertson College Yes 587 479 158 38 62 678 41 13500 3335 500 675 67 73 9.4 11 9727 55

Lets try the above command by the univ index names but for that you will have to use the loc command

In [20]:
df.loc[['Adelphi University','Alaska Pacific University','Albertson College'],:]
Out[20]:
Private Apps Accept Enroll Top10perc Top25perc F.Undergrad P.Undergrad Outstate Room.Board Books Personal PhD Terminal S.F.Ratio perc.alumni Expend Grad.Rate
univ
Adelphi University Yes 2186 1924 512 16 29 2683 1227 12280 6450 750 1500 29 30 12.2 16 10527 56
Alaska Pacific University Yes 193 146 55 16 44 249 869 7560 4120 800 1500 76 72 11.9 2 10922 15
Albertson College Yes 587 479 158 38 62 678 41 13500 3335 500 675 67 73 9.4 11 9727 55

Python Select Column By Index Using Pandas iloc

Let us print the first two columns only.

In [21]:
df.iloc[:,[1,2]].head(2)
Out[21]:
Apps Accept
univ
Abilene Christian University 1660 1232
Adelphi University 2186 1924

Note the indices we are using [1,2], that means column 1 and 2 only. We can combine the indexing on both rows and columns.

Example: Print first two rows from first two columns only without using head(2) method this time.

In [22]:
df.iloc[[1,2],[1,2]]
Out[22]:
Apps Accept
univ
Adelphi University 2186 1924
Adrian College 1428 1097

How To Use .ix in Pandas

ix is a hybrid of both loc and iloc. Meaning we can use ix in place of loc and .loc. Pandas .ix has been deprecated in latest version. But If you are still using older version of Python, the following two commands would work.

Lets try with an example. The below command is same command as df.loc[['Adelphi University','Alaska Pacific University','Albertson College'],:]

In [26]:
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)
df.ix[['Adelphi University','Alaska Pacific University','Albertson College'],:]

Note: I used "import warnings" module just to supress the future warnings. Otherwise you would see big warning message about ix that .ix has been deprecated.

Similarly the below command is same as df.iloc[:2,:]

In [27]:
df.ix[:2,:]

There You go, we got the same result that we got with the iloc.