Python Pandas String To Integer And Integer To String Of DataFrame
"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Python Pandas is a great library for doing data analysis. While doing the analysis, we have to often convert data from one format to another. In this tutorial I will show you how to convert String to Integer format and vice versa."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"There are two primary ways to convert data type."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"
astype()
to_numeric()
"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Before we dive in to each of these methods. Lets first talk about our data for this exercise."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Lets create a dummy dataframe with 5 student with their names and ids. For real example checkout Merge and Join DataFrames with Pandas in Python"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"from pandas import DataFrame\n",
"\n",
"studentinfo = {'studentname': ['John','Kyle','Chloe'],\n",
" 'studentid': [1,2,3]\n",
" }\n",
"\n",
"df = DataFrame(studentinfo, columns= ['studentname', 'studentid'])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Ok our dataframe is created. Lets check the datatypes."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"studentname object\n",
"studentid int64\n",
"dtype: object"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.dtypes"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Ok our studentname column is type 'object' and studentid is int64."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"
Convert Integer To Str Using astype() method of Python Pandas Dataframe
Convert Str To Int Using to_numeric() method of Python Pandas Dataframe
"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Lets first convert to string using our astype method."
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [],
"source": [
"df['studentid'] = df['studentid'].astype('str')"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"dtype('O')"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df['studentid'].dtype"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Ok lets convert our object type to int now using to_numeric() method of Dataframe."
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0 1\n",
"1 2\n",
"2 3\n",
"Name: studentid, dtype: int64"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pd.to_numeric(df['studentid'])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"There you go, we got the int64 data type back."
]
},
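{
"cell_type": "markdown",
"metadata": {},
"source": [
"
The cells above display the converted values without storing them. As a minimal sketch of the full round trip, assuming a small stand-in frame named df_demo (a hypothetical name, not part of the original notebook), the result of each conversion can be assigned back to the column:

```python
import pandas as pd

# Hypothetical sample frame mirroring the tutorial's data.
df_demo = pd.DataFrame({'studentname': ['John', 'Kyle', 'Chloe'],
                        'studentid': [1, 2, 3]})

# int64 -> str: every element becomes a Python string.
# (The tutorial's output shows dtype object; newer pandas versions
# may report a dedicated string dtype instead.)
df_demo['studentid'] = df_demo['studentid'].astype('str')
ids_are_text = all(isinstance(v, str) for v in df_demo['studentid'])
print(ids_are_text)

# str -> integer: to_numeric returns a new Series, so assign it back.
df_demo['studentid'] = pd.to_numeric(df_demo['studentid'])
print(df_demo['studentid'].dtype)
```

Without the assignment, the column in the dataframe keeps its previous type.
"
]
},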
{
"cell_type": "markdown",
"metadata": {},
"source": [
"to_numeric has few options which are worth mentioning here. We can use the argument downcast to specify data type."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"to_numeric has following data types int8(signed), int64(default), float32(float) and float64(default). It has data types for date too but I will let you explore that."
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0 1\n",
"1 2\n",
"2 3\n",
"Name: studentid, dtype: int8"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pd.to_numeric(df['studentid'],downcast='signed')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We got int8 with signed."
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0 1.0\n",
"1 2.0\n",
"2 3.0\n",
"Name: studentid, dtype: float32"
]
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pd.to_numeric(df['studentid'],downcast='float')"
]
},
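{
"cell_type": "markdown",
"metadata": {},
"source": [
"
To compare the downcast options side by side, here is a small sketch. The series names ids and wide are made up for this illustration, and the series are built with dtype=object so they match the object column used in the tutorial:

```python
import pandas as pd

# String ids stored as an object column, as in the tutorial.
ids = pd.Series(['1', '2', '3'], name='studentid', dtype=object)

# Each option asks for the smallest dtype within its family.
for option in ['integer', 'signed', 'unsigned', 'float']:
    print(option, pd.to_numeric(ids, downcast=option).dtype)

# A value outside the int8 range (-128 to 127) forces a wider integer type.
wide = pd.to_numeric(pd.Series(['1', '2', '300'], dtype=object),
                     downcast='signed')
print(wide.dtype)
```

The resulting type depends on the data, not only on the option, so it is worth checking the dtype after the conversion.
"
]
},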
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Your data might have values which couldn't be converted to a particular data type and raise an error. Lets do an example."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"
How To Handle Empty Values While Converting Data From Str To Int DataFrame
"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"lets add an empty value to our dataframe."
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [],
"source": [
"studentinfo = {'studentname': ['John','Kyle','Chloe','Renee'],\n",
" 'studentid': [1,2,3,\"\"]\n",
" }\n",
"df = DataFrame(studentinfo, columns= ['studentname', 'studentid'])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In our dataframe we added a new student name Renee with student id entry empty. Lets first check our data types."
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"studentname object\n",
"studentid object\n",
"dtype: object"
]
},
"execution_count": 24,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.dtypes"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Ok one point to notice here is that, since studentid has an empty entry. DataFrame method converted the id column to object by default."
]
},
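{
"cell_type": "markdown",
"metadata": {},
"source": [
"
An object column can hide a mix of Python types. One way to see what is inside, sketched here with a hypothetical copy of the frame named df_mixed, is to map each element to its type and look for the rows that are not integers:

```python
import pandas as pd

# Hypothetical copy of the tutorial's frame with the empty id.
df_mixed = pd.DataFrame({'studentname': ['John', 'Kyle', 'Chloe', 'Renee'],
                         'studentid': [1, 2, 3, '']})

# Count the Python type of each element in the column.
type_counts = df_mixed['studentid'].map(type).value_counts()
print(type_counts)

# Rows whose id is not an int are the ones that block astype('int').
is_int = df_mixed['studentid'].map(lambda v: isinstance(v, int))
bad_rows = df_mixed[~is_int]
print(bad_rows)
```

Locating the offending rows first makes it easier to decide whether to fix, drop or coerce them.
"
]
},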
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Lets try our astype(int) method now."
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [],
"source": [
"df['studentid'].astype('int')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"I got the following error..."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"ValueError: invalid literal for int() with base 10: ''"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"astype has option 'errors' which is by default set to errors='raise'"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can set it to errors='ignore' to get rid of above error. Lets try that."
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0 1\n",
"1 2\n",
"2 3\n",
"3 \n",
"Name: studentid, dtype: object"
]
},
"execution_count": 31,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df['studentid'].astype('int',errors='ignore')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We got rid of the above error but still the data type didnt change from object to int, but it is still object."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Therefore better way is to use to_numeric() method."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Dataframe to_numeric() method has option errors='coerce', which will convert non numeric values to NaN and at the same time convert the data type to int. Lets try that."
]
},
{
"cell_type": "code",
"execution_count": 48,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0 1.0\n",
"1 2.0\n",
"2 3.0\n",
"3 NaN\n",
"Name: studentid, dtype: float64"
]
},
"execution_count": 48,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pd.to_numeric(df['studentid'],errors='coerce')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As we see above the non numeric value got changed to NaN, but by default we got the data type float64 although numeric but not int. Lets try to specify the downcast=signed to get int."
]
},
{
"cell_type": "code",
"execution_count": 49,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0 1.0\n",
"1 2.0\n",
"2 3.0\n",
"3 NaN\n",
"Name: studentid, dtype: float64"
]
},
"execution_count": 49,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pd.to_numeric(df['studentid'],errors='coerce',downcast='signed')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"No we didnt' get the int8 value even with downcast='signed' but instead got float64."
]
},
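{
"cell_type": "markdown",
"metadata": {},
"source": [
"
If an integer column is still needed, there are workarounds that the walkthrough above does not cover. The sketch below shows two of them under stated assumptions: pandas ships a nullable integer dtype named 'Int64' (capital I) that can hold missing values, and the placeholder -1 in the second variant is an arbitrary choice made for this illustration.

```python
import pandas as pd

ids = pd.Series([1, 2, 3, ''], name='studentid')

# Step 1: coerce unparseable entries to NaN (the result is float64).
as_float = pd.to_numeric(ids, errors='coerce')

# Step 2: cast to the nullable integer dtype; NaN becomes <NA>.
as_nullable_int = as_float.astype('Int64')
print(as_nullable_int)

# Alternative: fill missing ids with a placeholder, then use plain int64.
as_plain_int = as_float.fillna(-1).astype('int64')
print(as_plain_int.tolist())
```

The nullable dtype keeps the missing id visible as missing, while the placeholder approach trades that for a plain NumPy integer column.
"
]
},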
{
"cell_type": "markdown",
"metadata": {},
"source": [
"
Wrap Up!
"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This post has touched upon basics of astype() and to_numeric() method. There are other data types which can be casted to using the above two methods. Please checkout yourself."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"