Python Pandas String To Integer And Integer To String Of DataFrame

Python Pandas is a great library for data analysis. During an analysis we often have to convert data from one format to another. In this tutorial I will show you how to convert a String column to Integer format and vice versa.

There are two primary ways to convert data types.

  1. astype()
  2. to_numeric()

Before we dive into each of these methods, let's first talk about our data for this exercise.

Let's create a dummy dataframe of 3 students with their names and ids. For a real-world example, check out Merge and Join DataFrames with Pandas in Python.

In [1]:
import pandas as pd
In [2]:
from pandas import DataFrame

studentinfo = {'studentname': ['John','Kyle','Chloe'],
        'studentid': [1,2,3]
        }

df = DataFrame(studentinfo, columns= ['studentname', 'studentid'])

OK, our dataframe is created. Let's check the data types.

In [3]:
df.dtypes
Out[3]:
studentname    object
studentid       int64
dtype: object

Our studentname column has type 'object' and studentid has type int64.

Convert Integer To Str Using astype() method of Python Pandas Dataframe

Let's convert our studentid column from int to str first.

In [4]:
df['studentid'].astype('str').dtype
Out[4]:
dtype('O')

As we see above, astype('str') has converted the integer column to a string column. Pandas reports it as dtype('O'), the object dtype, which is how the strings are stored here.

We haven't saved our new data yet. Let's assign the converted column back to the dataframe.

In [5]:
df['studentid'] = df['studentid'].astype('str')
In [6]:
df['studentid'].dtype
Out[6]:
dtype('O')
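
One reason to keep ids as strings is that string methods become available through the .str accessor. A minimal sketch, assuming a hypothetical requirement that ids be zero-padded to four characters (the padded_id column is not part of the original example):

```python
import pandas as pd

df = pd.DataFrame({'studentname': ['John', 'Kyle', 'Chloe'],
                   'studentid': [1, 2, 3]})

# Convert the integer ids to strings, then left-pad them with zeros.
# 'padded_id' is a hypothetical column added only for illustration.
df['padded_id'] = df['studentid'].astype('str').str.zfill(4)

print(df['padded_id'].tolist())  # ['0001', '0002', '0003']
```

Padding like this only works on strings, so the conversion has to come first.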

Convert Str To Int Using astype() method of Python Pandas Dataframe

Similarly, we can convert the string column back to integer.

In [7]:
df['studentid'] = df['studentid'].astype('int')
In [8]:
df['studentid'].dtype
Out[8]:
dtype('int64')
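
A practical reason to convert ids back to integers is ordering: strings sort character by character, while integers sort by value. A small sketch with made-up ids (these values are an assumption for illustration, not taken from the dataframe above):

```python
import pandas as pd

# Hypothetical ids stored as strings.
ids = pd.Series(['10', '9', '2'])

# As strings, '10' sorts before '2' because '1' < '2'.
print(ids.sort_values().tolist())                # ['10', '2', '9']

# As integers, the values sort numerically.
print(ids.astype('int').sort_values().tolist())  # [2, 9, 10]
```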

Convert Str To Int Using to_numeric() function of Python Pandas

Let's first convert the column to string using our astype method.

In [10]:
df['studentid'] = df['studentid'].astype('str')
In [11]:
df['studentid'].dtype
Out[11]:
dtype('O')

OK, let's convert our object type to int now using the pandas to_numeric() function.

In [12]:
pd.to_numeric(df['studentid'])
Out[12]:
0    1
1    2
2    3
Name: studentid, dtype: int64

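There you go, we got the int64 data type back. Note that pd.to_numeric() returns a new Series; to keep the result, assign it back to the column as we did with astype().

to_numeric() has a few options worth mentioning here. We can use the downcast argument to request a smaller data type.

By default to_numeric() returns int64 or float64. The downcast argument accepts 'integer', 'signed', 'unsigned' and 'float', and picks the smallest data type of that kind that can hold the values: as small as int8 for 'integer' or 'signed', uint8 for 'unsigned', and float32 for 'float'. Dates are not handled by to_numeric(); pandas has a separate to_datetime() function for those, which I will let you explore.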

In [17]:
pd.to_numeric(df['studentid'],downcast='signed')
Out[17]:
0    1
1    2
2    3
Name: studentid, dtype: int8

We got int8 with signed.

In [21]:
pd.to_numeric(df['studentid'],downcast='float')
Out[21]:
0    1.0
1    2.0
2    3.0
Name: studentid, dtype: float32
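
The data type chosen by downcast depends on the values, not only on the option. A short sketch with made-up series (the names small and large are assumptions for illustration; the data is created with dtype=object to match the object columns used in this tutorial):

```python
import pandas as pd

small = pd.Series(['1', '2', '3'], dtype=object)
large = pd.Series(['1', '2', '300'], dtype=object)  # 300 does not fit in int8

small_int = pd.to_numeric(small, downcast='integer')
large_int = pd.to_numeric(large, downcast='integer')
small_uint = pd.to_numeric(small, downcast='unsigned')

print(small_int.dtype)   # int8
print(large_int.dtype)   # int16
print(small_uint.dtype)  # uint8
```

So downcast is a request for the smallest type that fits, not a guarantee of int8.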

Your data might contain values that cannot be converted to a particular data type, in which case the conversion raises an error. Let's look at an example.

How To Handle Empty Values While Converting Data From Str To Int DataFrame

Let's add an empty value to our dataframe.

In [22]:
studentinfo = {'studentname': ['John','Kyle','Chloe','Renee'],
        'studentid': [1,2,3,""]
        }
df = DataFrame(studentinfo, columns= ['studentname', 'studentid'])

In our dataframe we added a new student named Renee with an empty student id. Let's first check our data types.

In [24]:
df.dtypes
Out[24]:
studentname    object
studentid      object
dtype: object

One point to notice here: since studentid has an empty string entry, the column now holds mixed values, so pandas stores it as object.

Let's try our astype('int') method now.

In [26]:
df['studentid'].astype('int')

I got the following error...

ValueError: invalid literal for int() with base 10: ''
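
If you would rather handle the failure yourself than let it stop the notebook, the exception can be caught. A minimal sketch, assuming a standalone series with the same values as our column (the converted flag is a hypothetical name used only here):

```python
import pandas as pd

ids = pd.Series([1, 2, 3, ""])  # mixed values, so the dtype is object

try:
    ids.astype('int')
    converted = True
except ValueError as exc:
    # The empty string cannot be parsed as an integer.
    converted = False
    print(f"Conversion failed: {exc}")
```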

astype() has an 'errors' option, which defaults to errors='raise'.

We can set it to errors='ignore' to suppress the above error. Let's try that.

In [31]:
df['studentid'].astype('int',errors='ignore')
Out[31]:
0    1
1    2
2    3
3     
Name: studentid, dtype: object

We got rid of the error, but the data type did not change: the column is still object, because errors='ignore' simply returns the original data when the conversion fails.

Therefore a better way is to use the to_numeric() function.

The pandas to_numeric() function has the option errors='coerce', which converts non-numeric values to NaN and at the same time converts the column to a numeric data type. Let's try that.

In [48]:
pd.to_numeric(df['studentid'],errors='coerce')
Out[48]:
0    1.0
1    2.0
2    3.0
3    NaN
Name: studentid, dtype: float64
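
Because errors='coerce' replaces bad values silently, it can help to check which rows were affected. A minimal sketch, assuming a made-up series that also contains the text 'abc' (the names converted and failed are hypothetical):

```python
import pandas as pd

ids = pd.Series([1, 2, 3, "", "abc"], dtype=object)

converted = pd.to_numeric(ids, errors='coerce')

# Rows that held a value in the input but became NaN after coercion.
failed = ids[converted.isna() & ids.notna()]

print(failed.index.tolist())  # [3, 4]
```

Inspecting these rows before dropping or filling them avoids losing data by accident.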

With errors='coerce', the non-numeric value was changed to NaN, but the resulting data type is float64: numeric, but not int. Let's try specifying downcast='signed' to get int.

In [49]:
pd.to_numeric(df['studentid'],errors='coerce',downcast='signed')
Out[49]:
0    1.0
1    2.0
2    3.0
3    NaN
Name: studentid, dtype: float64

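No, we didn't get int8 even with downcast='signed'; we still got float64. NaN is a floating-point value that NumPy integer types cannot represent, so a column containing NaN cannot be downcast to an integer type.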
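
If an integer column with missing values is what you need, one option that goes beyond the original example is pandas' nullable integer dtype, written 'Int64' with a capital I. A hedged sketch, assuming a reasonably recent pandas version that supports this extension dtype:

```python
import pandas as pd

ids = pd.Series([1, 2, 3, ""], dtype=object)

# Coerce to float64 with NaN first, then cast to the nullable integer dtype.
nullable_ids = pd.to_numeric(ids, errors='coerce').astype('Int64')

print(nullable_ids.dtype)  # Int64
```

The missing entry is then shown as <NA> instead of NaN, and the other values stay integers.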

Wrap Up!

This post has touched upon the basics of astype() and to_numeric(). astype() can cast to other data types as well, and to_numeric() has more options than shown here. Please check them out yourself.
