3 Ways to Rename Columns in Pandas DataFrame

For this exercise, let's create a dummy dataframe.

Creating the dataframe is very easy. Let's first create a Python dictionary of lists, as shown below.

In [1]:
import pandas as pd
In [2]:
data = {'Name of President': ['Barack Obama', 'George Bush', 'Donald Trump'], 'Age of President': [54, 78, 74]}

Once we have the data in the above form, we can use pd.DataFrame to create our dataframe. The keys become the column names, and each list of values becomes the values of that column.

In [3]:
pd.DataFrame(data)
Out[3]:
   Name of President  Age of President
0       Barack Obama                54
1        George Bush                78
2       Donald Trump                74

OK, let's save the dataframe in a variable now.

In [4]:
df = pd.DataFrame(data)

Let's print out the column names.

In [5]:
df.columns
Out[5]:
Index(['Name of President', 'Age of President'], dtype='object')

Columns can be accessed directly by name with dot notation, as in df.colname.

But since our column names contain spaces, we can't do that here, which is why spaces in column names are not good practice. Of course, we can still access them with bracket notation.

In [6]:
df['Name of President']
Out[6]:
0    Barack Obama
1     George Bush
2    Donald Trump
Name: Name of President, dtype: object
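To make the difference concrete, here is a small sketch. It rebuilds the same dummy dataframe under the hypothetical name demo so that the block runs on its own, and uses Python's str.isidentifier to check which column names could be written after a dot at all.

```python
import pandas as pd

data = {'Name of President': ['Barack Obama', 'George Bush', 'Donald Trump'],
        'Age of President': [54, 78, 74]}
demo = pd.DataFrame(data)  # "demo" is an illustrative name, not from the article

# Bracket access works for any column label, including labels with spaces.
names = demo['Name of President']

# Dot access is ordinary Python attribute syntax, so the label must be a
# valid identifier. A label containing a space can never be typed after a dot.
dot_friendly = {col: col.isidentifier() for col in demo.columns}
print(dot_friendly)
print('Name_of_President'.isidentifier())
```

Once the spaces are replaced with underscores, the labels become valid identifiers, which is what the renaming methods in the rest of this article are for.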

How to rename column names in Python Pandas

There are 3 ways of renaming columns in Pandas.

Method 1.

The first way is to assign a complete list of new names by position, as shown below.

In [7]:
df.columns = ['Name_of_President','Age_of_President']
In [8]:
df.columns
Out[8]:
Index(['Name_of_President', 'Age_of_President'], dtype='object')
In [9]:
df.head(2)
Out[9]:
   Name_of_President  Age_of_President
0       Barack Obama                54
1        George Bush                78

As we see above, the names have been changed. But this way you have to supply a name for every column and keep the relative positions of the columns in mind. We can't rename only selected columns, as the attempt below shows.

In [11]:
df.columns = ['Name_of_President']

We get the following error:

ValueError: Length mismatch: Expected axis has 2 elements, new values have 1 elements
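If you still want to change a single name by position, one possible workaround is to copy the existing labels into a list, edit one entry, and assign the full-length list back. This is only a sketch; demo and new_cols are illustrative names, and the dataframe is rebuilt so the block runs on its own.

```python
import pandas as pd

data = {'Name of President': ['Barack Obama', 'George Bush', 'Donald Trump'],
        'Age of President': [54, 78, 74]}
demo = pd.DataFrame(data)  # illustrative copy of the article's dataframe

# Copy the current labels into a plain list so one entry can be edited.
new_cols = list(demo.columns)
new_cols[0] = 'Name_of_President'  # change only the first column's label

# The list still has one label per column, so the assignment is accepted.
demo.columns = new_cols
print(list(demo.columns))
```

The length check that caused the ValueError is still satisfied here, because the list keeps one label for every column.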

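A better way is to rename by column name, which lets us choose which columns to change. Note that at this point df already carries the underscore names from Method 1, and rename silently skips mapping keys that are not existing columns. That is why the outputs below show both underscore names; run df = pd.DataFrame(data) first if you want to see each rename take effect on its own.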

Method 2.

In [12]:
df.rename(columns={'Age of President':'Age_of_President'})
Out[12]:
   Name_of_President  Age_of_President
0       Barack Obama                54
1        George Bush                78
2       Donald Trump                74

As we see above, the mapping names only the column "Age of President". However, rename returns a new dataframe, so the change has not been saved to df. We can save it by specifying the inplace=True option, as shown below.

In [13]:
df.rename(columns={'Age of President':'Age_of_President'},inplace=True)
In [14]:
df.head()
Out[14]:
   Name_of_President  Age_of_President
0       Barack Obama                54
1        George Bush                78
2       Donald Trump                74
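As an alternative to inplace=True, you can keep the dataframe that rename returns. The sketch below starts from a fresh copy of the dummy data (original and renamed are illustrative names) and shows that the source dataframe keeps its labels while the returned one has the new name.

```python
import pandas as pd

data = {'Name of President': ['Barack Obama', 'George Bush', 'Donald Trump'],
        'Age of President': [54, 78, 74]}
original = pd.DataFrame(data)  # illustrative name

# Without inplace=True, rename leaves "original" untouched and returns
# a new dataframe with the mapped label changed.
renamed = original.rename(columns={'Age of President': 'Age_of_President'})

print(list(original.columns))
print(list(renamed.columns))
```

Writing df = df.rename(...) has the same visible effect as inplace=True, and some people find it easier to read in a chain of operations.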

Let's change the other column name too.

In [15]:
df.rename(columns={'Name of President':'Name_of_President'},inplace=True)
In [16]:
df.head()
Out[16]:
   Name_of_President  Age_of_President
0       Barack Obama                54
1        George Bush                78
2       Donald Trump                74
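The two calls can also be combined, since the dictionary passed to columns may hold several old-to-new pairs. The sketch below does that on a fresh copy of the dummy data and also passes errors='raise', a rename option that turns a mistyped old name into a KeyError instead of a silent no-op. The names demo, mapping, renamed and caught are illustrative.

```python
import pandas as pd

data = {'Name of President': ['Barack Obama', 'George Bush', 'Donald Trump'],
        'Age of President': [54, 78, 74]}
demo = pd.DataFrame(data)  # illustrative copy of the article's dataframe

# One dictionary can rename several columns in a single call.
mapping = {'Name of President': 'Name_of_President',
           'Age of President': 'Age_of_President'}

# errors='raise' makes pandas complain if a key is not an existing column;
# the default, errors='ignore', skips such keys silently.
renamed = demo.rename(columns=mapping, errors='raise')
print(list(renamed.columns))

# A key that matches no column now fails loudly.
caught = False
try:
    demo.rename(columns={'Age': 'Age_of_President'}, errors='raise')
except KeyError:
    caught = True
print(caught)
```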

The columns can now be accessed with dot notation as well.

In [17]:
df.Name_of_President
Out[17]:
0    Barack Obama
1     George Bush
2    Donald Trump
Name: Name_of_President, dtype: object

Let's recreate our original dataframe.

In [18]:
df = pd.DataFrame(data)
In [19]:
df.head(1)
Out[19]:
   Name of President  Age of President
0       Barack Obama                54

How to Rename Column Names with a lambda Function

Method 3.

If your dataframe is large, with many columns whose names contain spaces, renaming them one by one is tedious. A better way is to pass a lambda function to rename, which pandas applies to every column name.

In [20]:
df = pd.DataFrame(data)
In [21]:
df.rename(columns=lambda x: x.replace(" ","_"))
Out[21]:
   Name_of_President  Age_of_President
0       Barack Obama                54
1        George Bush                78
2       Donald Trump                74

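There you go. We achieved the same result using a lambda function. As before, rename returns a new dataframe, so assign the result back to df (or pass inplace=True) to keep the new names.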
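For completeness, here are two related variations, offered as a sketch rather than as a fourth official method. The first uses the string methods available on the column index (df.columns.str), and the second passes an existing function, str.lower, to rename instead of a lambda. The names demo and lowered are illustrative.

```python
import pandas as pd

data = {'Name of President': ['Barack Obama', 'George Bush', 'Donald Trump'],
        'Age of President': [54, 78, 74]}
demo = pd.DataFrame(data)  # illustrative copy of the article's dataframe

# Variation 1: replace spaces in every label through the .str accessor.
# regex=False treats the pattern as a literal string.
demo.columns = demo.columns.str.replace(' ', '_', regex=False)
print(list(demo.columns))

# Variation 2: rename accepts any callable that maps an old label to a
# new one, so a built-in such as str.lower works without a lambda.
lowered = demo.rename(columns=str.lower)
print(list(lowered.columns))
```

The first variation assigns to df.columns directly, so it changes the dataframe in place, while the second returns a new dataframe just like the lambda version.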