Python Pandas String To Integer And Integer To String Of DataFrame
Python Pandas is a great library for data analysis. During an analysis we often have to convert data from one format to another. In this tutorial I will show you how to convert a DataFrame column from string to integer and vice versa.
There are two primary ways to convert a column's data type:
- astype()
- to_numeric()
Before we dive into each of these methods, let's first talk about our data for this exercise.
Let's create a dummy dataframe with three students, holding their names and ids. For a real example, check out Merge and Join DataFrames with Pandas in Python.
import pandas as pd
from pandas import DataFrame
studentinfo = {
    'studentname': ['John', 'Kyle', 'Chloe'],
    'studentid': [1, 2, 3],
}
df = DataFrame(studentinfo, columns=['studentname', 'studentid'])
Our dataframe is created. Let's check the data types.
df.dtypes
Our studentname column is of type object and studentid is int64.
Convert Integer To Str Using astype() method of Python Pandas Dataframe
Let's convert our studentid column from int to str first.
df['studentid'].astype('str').dtype
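As we see above, astype('str') has converted the integer column to strings. pandas has traditionally reported a column of strings as object (shown as dtype('O')) rather than str, so object is the expected result here.
We haven't saved the converted data yet, because astype() returns a new Series. Let's assign it back to the column.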
df['studentid'] = df['studentid'].astype('str')
df['studentid'].dtype
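To confirm that the values really are strings after the assignment, a minimal sketch like the following can help. It rebuilds the same dummy dataframe; the zero-padding step and the variable names first_id and padded_ids are only illustrations, not part of the original example.

```python
import pandas as pd

df = pd.DataFrame({'studentname': ['John', 'Kyle', 'Chloe'],
                   'studentid': [1, 2, 3]})

# astype() returns a new Series, so assign it back to keep the change.
df['studentid'] = df['studentid'].astype('str')

# Each value is now a Python str, so the .str accessor is available.
first_id = df['studentid'].iloc[0]
padded_ids = df['studentid'].str.zfill(3).tolist()  # illustrative zero-padding

print(type(first_id))
print(padded_ids)
```

Checking the type of an individual value is a useful habit, because the column dtype alone (object) does not say what kind of Python objects the column holds.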
Convert Str To Int Using astype() method of Python Pandas Dataframe
Similarly, we can convert the strings back to integers.
df['studentid'] = df['studentid'].astype('int')
df['studentid'].dtype
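As a small sketch of the same conversion on a standalone Series (the names ids_as_text, ids_as_int and total are hypothetical), the example below requests 'int64' explicitly. The assumption behind that choice is that plain 'int' maps to the platform's default NumPy integer, which may not be 64 bits wide on every system, whereas 'int64' always is.

```python
import pandas as pd

# Ids stored as text, as they would be after astype('str')
ids_as_text = pd.Series(['1', '2', '3'], name='studentid')

# Ask for a fixed-width integer type instead of the platform default
ids_as_int = ids_as_text.astype('int64')

# Arithmetic now works on numbers instead of concatenating strings
total = int(ids_as_int.sum())
print(ids_as_int.dtype, total)
```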
Convert Str To Int Using to_numeric() method of Python Pandas Dataframe
Let's first convert the column to string again using astype().
df['studentid'] = df['studentid'].astype('str')
df['studentid'].dtype
Now let's convert our object column to int using to_numeric(). Note that to_numeric() is a top-level pandas function, called as pd.to_numeric(), not a DataFrame method.
pd.to_numeric(df['studentid'])
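There you go, we got the int64 data type back. Note that to_numeric() returns a new Series; df['studentid'] itself is unchanged until we assign the result back to it.
to_numeric() has a few options worth mentioning here. We can use the downcast argument to request a smaller data type.
The downcast argument accepts 'integer', 'signed', 'unsigned' and 'float'. Each asks pandas for the smallest type in that family that can hold the data: as small as int8 for 'integer' or 'signed', uint8 for 'unsigned', and float32 for 'float'. Without downcast, the result is int64 or float64. to_numeric() does not handle dates; pandas has a separate to_datetime() function for that.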
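The point that pd.to_numeric() returns a new object rather than changing the dataframe can be sketched as follows. The variable names converted, still_text and now_numeric are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'studentid': ['1', '2', '3']})

# to_numeric() builds and returns a new Series
converted = pd.to_numeric(df['studentid'])

# The column in the dataframe still holds strings at this point
still_text = isinstance(df['studentid'].iloc[0], str)

# Assign the result back to make the conversion stick
df['studentid'] = converted
now_numeric = pd.api.types.is_integer_dtype(df['studentid'])

print(still_text, now_numeric)
```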
pd.to_numeric(df['studentid'], downcast='signed')
With downcast='signed' we got int8, the smallest signed integer type that fits our values.
pd.to_numeric(df['studentid'], downcast='float')
Your data might contain values that cannot be converted to a particular data type, in which case the conversion raises an error. Let's look at an example.
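To compare the downcast options side by side, here is a minimal sketch that runs each one over the same small set of ids and collects the resulting dtypes. The dictionary name results is hypothetical, and the dtypes you get depend on the values: larger ids would need wider types.

```python
import pandas as pd

ids = pd.Series(['1', '2', '3'])

# Run every downcast option over the same data and record the dtype
results = {
    option: pd.to_numeric(ids, downcast=option).dtype
    for option in ('integer', 'signed', 'unsigned', 'float')
}

for option, dtype in results.items():
    print(option, dtype)
```

Downcasting mainly saves memory, so it tends to matter for large dataframes rather than for a three-row example like this one.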
How To Handle Empty Values While Converting Data From Str To Int DataFrame
Let's add an empty value to our dataframe.
studentinfo = {
    'studentname': ['John', 'Kyle', 'Chloe', 'Renee'],
    'studentid': [1, 2, 3, ""],
}
df = DataFrame(studentinfo, columns=['studentname', 'studentid'])
We added a new student, Renee, whose student id is an empty string. Let's first check our data types.
df.dtypes
One point to notice here: because studentid now mixes integers with an empty string, pandas stores the column as object.
Let's try our astype('int') method now.
df['studentid'].astype('int')
I got the following error...
ValueError: invalid literal for int() with base 10: ''
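If the conversion runs inside a script rather than a notebook, the error can be caught instead of stopping the program. The sketch below assumes the same mixed column as above; the names ids and error_message are hypothetical.

```python
import pandas as pd

# Same mix as the studentid column: three integers and an empty string
ids = pd.Series([1, 2, 3, ''], dtype=object)

try:
    ids.astype('int')
    error_message = None
except ValueError as exc:
    # The empty string cannot be parsed as an integer
    error_message = str(exc)

print(error_message)
```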
astype() has an errors option, which defaults to errors='raise'.
We can set it to errors='ignore' to suppress the error. Let's try that.
df['studentid'].astype('int',errors='ignore')
The error is gone, but the data type did not change: when the conversion fails, errors='ignore' returns the original column, so it is still object.
Therefore a better way is to use to_numeric().
pd.to_numeric() has the option errors='coerce', which converts values that cannot be parsed as numbers to NaN and returns a numeric column. Let's try that.
pd.to_numeric(df['studentid'],errors='coerce')
As we see above, the non-numeric value was changed to NaN, but the data type is float64 rather than int. NaN is a floating-point value, so a column that contains it cannot be stored as int64. Let's try specifying downcast='signed' to get int.
pd.to_numeric(df['studentid'],errors='coerce',downcast='signed')
No, we didn't get int8 even with downcast='signed'; the result is still float64, because the NaN cannot be represented in a plain integer column.
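If an integer column is still wanted despite the missing value, two common options are pandas' nullable 'Int64' dtype (capital I), which can hold missing values, or filling the gaps before casting. The sketch below is one way to do it, not the only one. The -1 placeholder and the names as_float, unparsed_names, nullable_ids and filled_ids are assumptions made for illustration.

```python
import pandas as pd

df = pd.DataFrame({'studentname': ['John', 'Kyle', 'Chloe', 'Renee'],
                   'studentid': [1, 2, 3, '']})

# Step 1: coerce unparseable values to NaN (the result is float64)
as_float = pd.to_numeric(df['studentid'], errors='coerce')

# Optional: see which rows could not be parsed before deciding what to do
unparsed_names = df.loc[as_float.isna(), 'studentname'].tolist()

# Option A: nullable integer dtype, missing value is kept as <NA>
nullable_ids = as_float.astype('Int64')

# Option B: replace missing values with a placeholder, then cast
filled_ids = as_float.fillna(-1).astype('int64')  # -1 is a hypothetical sentinel

print(unparsed_names)
print(nullable_ids.dtype, filled_ids.dtype)
```

Option A keeps the information that an id is missing, while Option B produces an ordinary int64 column at the cost of a made-up value, so the choice depends on what later steps expect.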
Wrap Up!
This post has touched upon the basics of astype() and to_numeric(). The same two methods can cast to other data types as well; try them out on your own data.