{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "

How to Convert Python Pandas DataFrame into a List

" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There are scenarios when you need to convert Pandas DataFrame to Python list." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "I will be using college.csv data which has details about university admissions." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Lets start with importing pandas library and read_csv to read the csv file" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import pandas as pd" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "df = pd.read_csv('College.csv')" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Unnamed: 0PrivateAppsAcceptEnrollTop10percTop25percF.UndergradP.UndergradOutstateRoom.BoardBooksPersonalPhDTerminalS.F.Ratioperc.alumniExpendGrad.Rate
0Abilene Christian UniversityYes1660123272123522885537744033004502200707818.112704160
\n", "
" ], "text/plain": [ " Unnamed: 0 Private Apps Accept Enroll Top10perc \\\n", "0 Abilene Christian University Yes 1660 1232 721 23 \n", "\n", " Top25perc F.Undergrad P.Undergrad Outstate Room.Board Books Personal \\\n", "0 52 2885 537 7440 3300 450 2200 \n", "\n", " PhD Terminal S.F.Ratio perc.alumni Expend Grad.Rate \n", "0 70 78 18.1 12 7041 60 " ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.head(1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For the this exercise there are too many columns in this data. Lets just drop all but 3 columns from dataframe." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Lets just keep the columns Private, Apps, and Accept from the dataframe above." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "dfn = df[['Private','Apps','Accept']]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Lets check how many rows are there in this dataframe using pd.DataFrame.shape" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(777, 3)" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dfn.shape" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "

Ok lets just select first 5 rows from our dataframe. checkout tutorial Select Pandas Dataframe Rows And Columns Using iloc loc and ix" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [], "source": [ "df5r = dfn.loc[:4,:]" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(5, 3)" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df5r.shape" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Remember pd.DataFrame.size will give you the size of the dataframe rowsxcolumns" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "So We got first 5 rows and 3 columns." ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "15" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df5r.size" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "data": { "text/html": [ "

\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
PrivateAppsAccept
0Yes16601232
1Yes21861924
2Yes14281097
3Yes417349
4Yes193146
\n", "
" ], "text/plain": [ " Private Apps Accept\n", "0 Yes 1660 1232\n", "1 Yes 2186 1924\n", "2 Yes 1428 1097\n", "3 Yes 417 349\n", "4 Yes 193 146" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df5r.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we got our desired Dataframe in the desired shape. lets proceed with the our current tutorial of converting DataFrame to list." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The command to convert Dataframe to list is pd.DataFrame.values.tolist(). Lets go step by step. Lets get the values first." ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([['Yes', 1660, 1232],\n", " ['Yes', 2186, 1924],\n", " ['Yes', 1428, 1097],\n", " ['Yes', 417, 349],\n", " ['Yes', 193, 146]], dtype=object)" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df5r.values" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "Note DataFrame.values is giving us array object. To convert it to list use tolist()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Lets try values.tolist() on top of it." ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[['Yes', 1660, 1232],\n", " ['Yes', 2186, 1924],\n", " ['Yes', 1428, 1097],\n", " ['Yes', 417, 349],\n", " ['Yes', 193, 146]]" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df5r.values.tolist()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "So we get list of lists. we can loop through it as any normal Python list. Lets try that." ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['Yes', 1660, 1232]\n", "['Yes', 2186, 1924]\n", "['Yes', 1428, 1097]\n", "['Yes', 417, 349]\n", "['Yes', 193, 146]\n" ] } ], "source": [ "for l in df5r.values.tolist():\n", " print(l)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Ok that is good. But notice we lost the column names. How do we retain the column names when using values.tolist() method." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "It is very simple. We will use Pythons zip method. Lets see how we can do this." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Lets first save the columns and save it to a seperate list." ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [], "source": [ "cnames = df5r.columns.values.tolist()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Lets also save our columns to a variable." ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [], "source": [ "cvalues = df5r.values.tolist()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "Ok we have now our two lists, we can simply use zip method as shown below." ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Private ['Yes', 1660, 1232]\n", "Apps ['Yes', 2186, 1924]\n", "Accept ['Yes', 1428, 1097]\n" ] } ], "source": [ "for c,v in zip(cnames,cvalues):\n", " print(c,v)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Lets flatten the list so it appears better." ] }, { "cell_type": "code", "execution_count": 41, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Private - Yes 1660 1232\n", "Apps - Yes 2186 1924\n", "Accept - Yes 1428 1097\n" ] } ], "source": [ "for c,value in zip(cnames,cvalues):\n", " print(c, \"-\",\" \".join(str(v) for v in value))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Ok so far so good. But there is better way to retain the spreadsheet format. Lets try that." ] }, { "cell_type": "code", "execution_count": 51, "metadata": {}, "outputs": [], "source": [ "final_list = [cnames] + cvalues" ] }, { "cell_type": "code", "execution_count": 52, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[['Private', 'Apps', 'Accept'],\n", " ['Yes', 1660, 1232],\n", " ['Yes', 2186, 1924],\n", " ['Yes', 1428, 1097],\n", " ['Yes', 417, 349],\n", " ['Yes', 193, 146]]" ] }, "execution_count": 52, "metadata": {}, "output_type": "execute_result" } ], "source": [ "final_list" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Lets check the data type." ] }, { "cell_type": "code", "execution_count": 53, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[]" ] }, "execution_count": 53, "metadata": {}, "output_type": "execute_result" } ], "source": [ "final_list.__class__()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It is still a python list. Lets loop through the list again. " ] }, { "cell_type": "code", "execution_count": 58, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Private |Apps |Accept \n", "Yes |1660 |1232 \n", "Yes |2186 |1924 \n", "Yes |1428 |1097 \n", "Yes |417 |349 \n", "Yes |193 |146 \n" ] } ], "source": [ "f = '{:<10}|{:<10}|{:<10}'\n", "for l in final_list:\n", " print(f.format(*l))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There we go, it looks better now." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.4" } }, "nbformat": 4, "nbformat_minor": 4 }