{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "

Python Pandas: Convert DataFrame Columns From String To Integer And Integer To String

" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Python Pandas is a great library for data analysis. During an analysis, we often have to convert data from one type to another. In this tutorial I will show you how to convert a column from string to integer and vice versa." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There are two primary ways to convert a data type." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
  1. astype()
  2. to_numeric()
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Before we dive into each of these methods, let's first talk about our data for this exercise." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's create a dummy dataframe with 3 students, with their names and ids. For a real example, check out Merge and Join DataFrames with Pandas in Python." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import pandas as pd" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "from pandas import DataFrame\n", "\n", "studentinfo = {'studentname': ['John','Kyle','Chloe'],\n", " 'studentid': [1,2,3]\n", " }\n", "\n", "df = DataFrame(studentinfo, columns= ['studentname', 'studentid'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Ok, our dataframe is created. Let's check the data types." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "studentname object\n", "studentid int64\n", "dtype: object" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.dtypes" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Ok, our studentname column is of type 'object' and studentid is int64." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "

Convert Integer To Str Using astype() method of Python Pandas DataFrame

" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's convert our studentid column from int to str first." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dtype('O')" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['studentid'].astype('str').dtype" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As we see above, astype('str') has converted the integer column to strings, which pandas reports here as the object data type, dtype('O')." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We haven't saved the converted column yet. Let's assign it back to the dataframe." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "df['studentid'] = df['studentid'].astype('str')" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dtype('O')" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['studentid'].dtype" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "

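To make the dtype('O') result above more concrete, here is a minimal sketch. It assumes a small hypothetical Series named ids rather than the tutorial's dataframe, and it only checks that each element is a Python str after astype('str'), so that string operations such as concatenation apply.

```python
import pandas as pd

# Hypothetical integer column, used only for illustration.
ids = pd.Series([1, 2, 3], name='studentid')

# Cast every value to a string.
as_text = ids.astype('str')

# Each element is now a Python str, so string operations apply.
print(type(as_text.iloc[0]))
print((as_text + '-A').tolist())
```

The reported dtype label can differ between pandas versions, so the sketch inspects the elements themselves instead of relying on the label.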
Convert Str To Int Using astype() method of Python Pandas DataFrame

" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Similarly, we can convert the column back from string to integer." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "df['studentid'] = df['studentid'].astype('int')" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dtype('int64')" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['studentid'].dtype" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "

Convert Str To Int Using the to_numeric() Function of Python Pandas

" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's first convert the column to string using the astype method." ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "df['studentid'] = df['studentid'].astype('str')" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dtype('O')" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['studentid'].dtype" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Ok, let's now convert our object type to int using the pandas to_numeric() function." ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 1\n", "1 2\n", "2 3\n", "Name: studentid, dtype: int64" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.to_numeric(df['studentid'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There you go, we got the int64 data type back." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "to_numeric has a few options worth mentioning here. We can use the downcast argument to request the smallest numeric data type that can hold the data." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "By default to_numeric returns int64 or float64. The downcast argument accepts 'integer', 'signed', 'unsigned' and 'float'. With our data, 'signed' gives int8 and 'float' gives float32. Dates are handled by a separate function, to_datetime(), which I will let you explore." ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 1\n", "1 2\n", "2 3\n", "Name: studentid, dtype: int8" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.to_numeric(df['studentid'],downcast='signed')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We got int8 with signed." 
] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 1.0\n", "1 2.0\n", "2 3.0\n", "Name: studentid, dtype: float32" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.to_numeric(df['studentid'],downcast='float')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Your data might contain values which cannot be converted to a particular data type, in which case the conversion raises an error. Let's look at an example." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
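As a minimal sketch of such a failure, assuming a hypothetical Series named raw that contains one non-numeric string, to_numeric raises a ValueError when left at its default errors='raise'. The variable failed is only there to record the outcome.

```python
import pandas as pd

# Hypothetical data: the last entry cannot be parsed as a number.
raw = pd.Series(['1', '2', 'abc'])

try:
    pd.to_numeric(raw)  # errors='raise' is the default
    failed = False
except ValueError as exc:
    failed = True
    print('Conversion failed:', exc)
```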

How To Handle Empty Values While Converting Data From Str To Int In A DataFrame

" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's add an empty value to our dataframe." ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [], "source": [ "studentinfo = {'studentname': ['John','Kyle','Chloe','Renee'],\n", " 'studentid': [1,2,3,\"\"]\n", " }\n", "df = DataFrame(studentinfo, columns= ['studentname', 'studentid'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In our dataframe we added a new student named Renee, whose student id entry is empty. Let's first check our data types." ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "studentname object\n", "studentid object\n", "dtype: object" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.dtypes" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Ok, one point to notice here is that, since studentid has an empty string entry, the DataFrame constructor stored the id column as object by default." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's try our astype('int') method now." ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [], "source": [ "df['studentid'].astype('int')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "I got the following error..." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "ValueError: invalid literal for int() with base 10: ''" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "astype has an 'errors' option, which is set to errors='raise' by default." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can set it to errors='ignore' to suppress the above error. Let's try that." 
] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 1\n", "1 2\n", "2 3\n", "3 \n", "Name: studentid, dtype: object" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['studentid'].astype('int',errors='ignore')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The error is gone, but the data type did not change: the column is still object, not int." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Therefore, a better way is to use the to_numeric() function." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The pandas to_numeric() function has the option errors='coerce', which converts non-numeric values to NaN and converts the column to a numeric data type. Let's try that." ] }, { "cell_type": "code", "execution_count": 48, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 1.0\n", "1 2.0\n", "2 3.0\n", "3 NaN\n", "Name: studentid, dtype: float64" ] }, "execution_count": 48, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.to_numeric(df['studentid'],errors='coerce')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As we see above, the non-numeric value got changed to NaN, but the resulting data type is float64, which is numeric but not int. Let's try specifying downcast='signed' to get int." ] }, { "cell_type": "code", "execution_count": 49, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 1.0\n", "1 2.0\n", "2 3.0\n", "3 NaN\n", "Name: studentid, dtype: float64" ] }, "execution_count": 49, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.to_numeric(df['studentid'],errors='coerce',downcast='signed')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "No, we didn't get int8 even with downcast='signed'. We got float64 instead, because NaN is a floating point value and cannot be stored in a NumPy integer column." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
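One possible way around the float64 result above is pandas' nullable integer data type 'Int64' (capital I), which is not covered in the original walkthrough. The following is a minimal sketch, assuming a pandas version that provides this extension type and a hypothetical Series named ids that mirrors the studentid column.

```python
import pandas as pd

# Hypothetical sample mirroring the tutorial's data: one id is an empty string.
ids = pd.Series([1, 2, 3, ''], name='studentid', dtype='object')

# Step 1: coerce non-numeric entries to NaN (the result is float64).
as_float = pd.to_numeric(ids, errors='coerce')

# Step 2: cast to the nullable integer dtype; NaN becomes a missing value.
as_nullable_int = as_float.astype('Int64')

print(as_nullable_int.dtype)
print(as_nullable_int.isna().sum())
```

The values stay integers and the empty entry is kept as a missing value. The trade-off is that the column uses a pandas extension type rather than a plain NumPy integer type.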

Wrap Up!

" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This post has touched upon the basics of astype() and to_numeric(). astype() can also cast to other data types, and to_numeric() can produce other numeric types. Please check them out yourself." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "

Related Topics

" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "export pandas dataframe to csv\n", "\n", "how to plot histogram in python\n", "\n", "create pandas dataframe from list" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.4" } }, "nbformat": 4, "nbformat_minor": 2 }