How to Convert Python Pandas DataFrame into a List

There are scenarios when you need to convert a Pandas DataFrame to a Python list.

I will be using the College.csv dataset, which has details about university admissions.

Let's start by importing the pandas library and using read_csv to read the CSV file.

In [1]:
import pandas as pd
In [2]:
df = pd.read_csv('College.csv')
In [3]:
df.head(1)
Out[3]:
Unnamed: 0 Private Apps Accept Enroll Top10perc Top25perc F.Undergrad P.Undergrad Outstate Room.Board Books Personal PhD Terminal S.F.Ratio perc.alumni Expend Grad.Rate
0 Abilene Christian University Yes 1660 1232 721 23 52 2885 537 7440 3300 450 2200 70 78 18.1 12 7041 60

There are too many columns in this data for this exercise, so let's drop all but three of them.

Let's keep only the columns Private, Apps, and Accept.

In [5]:
dfn = df[['Private','Apps','Accept']]
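Double brackets with a list of column names return a DataFrame. If you only need a single column as a list, single brackets return a Series, and Series.tolist() converts it directly. A minimal sketch, using a small hand-built frame copied from the first five rows of College.csv as printed in this tutorial (an assumption so the example runs without the CSV file):

```python
import pandas as pd

# Hand-built stand-in for College.csv: first five rows of the three kept columns
df = pd.DataFrame({
    "Private": ["Yes", "Yes", "Yes", "Yes", "Yes"],
    "Apps": [1660, 2186, 1428, 417, 193],
    "Accept": [1232, 1924, 1097, 349, 146],
})

subset = df[["Private", "Apps"]]  # list of names -> DataFrame
apps = df["Apps"]                 # single name -> Series

print(type(subset).__name__)  # DataFrame
print(type(apps).__name__)    # Series
print(apps.tolist())          # [1660, 2186, 1428, 417, 193]
```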

Let's check how many rows this DataFrame has using pd.DataFrame.shape, which returns a (rows, columns) tuple.

In [15]:
dfn.shape
Out[15]:
(777, 3)

OK, let's select the first 5 rows from our DataFrame. Check out the tutorial Select Pandas Dataframe Rows And Columns Using iloc loc and ix (note that .ix has since been removed from pandas; use .loc or .iloc instead).

In [18]:
df5r = dfn.loc[:4,:]
In [19]:
df5r.shape
Out[19]:
(5, 3)
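Why does :4 give five rows? Slicing with .loc is label-based and includes the end label, so :4 returns the rows labelled 0 through 4. With the default RangeIndex that read_csv assigns, .iloc[:5] (position-based, end excluded) and head(5) return the same rows. A quick check on a small hypothetical frame:

```python
import pandas as pd

# Hypothetical 10-row frame with the default RangeIndex 0..9
df = pd.DataFrame({"Apps": range(100, 110)})

by_label = df.loc[:4, :]      # label-based: includes label 4 -> 5 rows
by_position = df.iloc[:5, :]  # position-based: stops before position 5 -> 5 rows
first_five = df.head(5)

print(by_label.shape, by_position.shape, first_five.shape)  # (5, 1) (5, 1) (5, 1)
print(by_label.equals(by_position) and by_label.equals(first_five))  # True
```

With a non-default index (for example, university names as labels), .loc[:4] would no longer mean "the first five rows", so .iloc or head() is the safer choice for positional selection.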

So we got the first 5 rows and 3 columns.

Remember that pd.DataFrame.size gives you the total number of elements in the DataFrame, i.e. rows x columns (here 5 x 3 = 15).

In [20]:
df5r.size
Out[20]:
15
In [23]:
df5r.head()
Out[23]:
Private Apps Accept
0 Yes 1660 1232
1 Yes 2186 1924
2 Yes 1428 1097
3 Yes 417 349
4 Yes 193 146

Now we have our DataFrame in the desired shape. Let's proceed with the main topic of this tutorial: converting a DataFrame to a list.

The command to convert a DataFrame to a list is pd.DataFrame.values.tolist(). Let's go step by step, starting with the values.

In [22]:
df5r.values
Out[22]:
array([['Yes', 1660, 1232],
       ['Yes', 2186, 1924],
       ['Yes', 1428, 1097],
       ['Yes', 417, 349],
       ['Yes', 193, 146]], dtype=object)
Note that DataFrame.values gives us a NumPy array, with dtype=object here because the columns mix strings and integers. To convert it to a list, use tolist().
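The pandas documentation recommends DataFrame.to_numpy() over the .values attribute. For a frame like this one it returns the same object-dtype array, so to_numpy().tolist() gives the same list. A minimal sketch on a hand-built copy of the first two rows of df5r:

```python
import pandas as pd

# Hand-built copy of the first two rows of df5r
df = pd.DataFrame({"Private": ["Yes", "Yes"], "Apps": [1660, 2186], "Accept": [1232, 1924]})

arr = df.to_numpy()  # preferred spelling of df.values
print(arr.dtype)     # object, because the columns mix str and int

rows = arr.tolist()
print(rows == df.values.tolist())  # True
```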

Let's call tolist() on top of it.

In [25]:
df5r.values.tolist()
Out[25]:
[['Yes', 1660, 1232],
 ['Yes', 2186, 1924],
 ['Yes', 1428, 1097],
 ['Yes', 417, 349],
 ['Yes', 193, 146]]

So we get a list of lists, with one inner list per row. We can loop through it like any normal Python list. Let's try that.

In [26]:
for row in df5r.values.tolist():
    print(row)
['Yes', 1660, 1232]
['Yes', 2186, 1924]
['Yes', 1428, 1097]
['Yes', 417, 349]
['Yes', 193, 146]
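Why tolist() rather than list()? Calling list() on a 2-D array only splits off the outer dimension and leaves each row as a NumPy array, whereas ndarray.tolist() converts recursively into nested Python lists of built-in scalars. A sketch using only the two numeric columns (an assumption so the array has an integer dtype):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Apps": [1660, 2186, 1428], "Accept": [1232, 1924, 1097]})
arr = df.to_numpy()  # integer array of shape (3, 2)

shallow = list(arr)  # list of 1-D ndarray rows
deep = arr.tolist()  # list of lists of Python ints

print(type(shallow[0]).__name__)  # ndarray
print(type(deep[0]).__name__)     # list
print(type(deep[0][0]).__name__)  # int
```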

OK, that works. But notice that we lost the column names. How do we retain them when using the values.tolist() method?

Let's try Python's built-in zip function.

First, let's save the column names to a separate list.

In [34]:
cnames = df5r.columns.values.tolist()
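df5r.columns is a pandas Index, and Index has its own tolist() method, so the .values step is optional; list(df.columns) works too. A quick check on a hypothetical empty frame with the same column names:

```python
import pandas as pd

# Hypothetical empty frame with the same three column names
df = pd.DataFrame(columns=["Private", "Apps", "Accept"])

via_values = df.columns.values.tolist()
via_index = df.columns.tolist()
via_list = list(df.columns)

print(via_index)  # ['Private', 'Apps', 'Accept']
print(via_values == via_index == via_list)  # True
```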

Let's also save the row values to a variable.

In [35]:
cvalues = df5r.values.tolist()
Now that we have our two lists, we can zip them together as shown below.
In [37]:
for c,v in zip(cnames,cvalues):
    print(c,v)
Private ['Yes', 1660, 1232]
Apps ['Yes', 2186, 1924]
Accept ['Yes', 1428, 1097]

Let's join each row's values into a single string so the output reads better.

In [41]:
for c,value in zip(cnames,cvalues):
    print(c, "-"," ".join(str(v) for v in value))
Private - Yes 1660 1232
Apps - Yes 2186 1924
Accept - Yes 1428 1097

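Note, though, that zip(cnames, cvalues) has not really labelled the data: it pairs each column name with a whole row (Private with row 0, Apps with row 1, Accept with row 2), and because zip stops at the shorter of its inputs, rows 3 and 4 are dropped. A better way to retain the column names in a spreadsheet format is to put the list of column names in front of the list of rows. Let's try that.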

In [51]:
final_list = [cnames] + cvalues
In [52]:
final_list
Out[52]:
[['Private', 'Apps', 'Accept'],
 ['Yes', 1660, 1232],
 ['Yes', 2186, 1924],
 ['Yes', 1428, 1097],
 ['Yes', 417, 349],
 ['Yes', 193, 146]]

Let's check the data type.

In [53]:
type(final_list)
Out[53]:
list

It is still a Python list. Let's loop through it again, this time padding each value to a fixed width of 10 characters.

In [58]:
f = '{:<10}|{:<10}|{:<10}'
for row in final_list:
    print(f.format(*row))
Private   |Apps      |Accept    
Yes       |1660      |1232      
Yes       |2186      |1924      
Yes       |1428      |1097      
Yes       |417       |349       
Yes       |193       |146       
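The fixed width of 10 works here because no value is longer than 10 characters. If that is not guaranteed, one option (a sketch, not part of the original notebook) is to compute each column's width from the data; zip(*final_list) transposes the list of rows into columns:

```python
# final_list as built above: a header row followed by data rows
final_list = [
    ["Private", "Apps", "Accept"],
    ["Yes", 1660, 1232],
    ["Yes", 2186, 1924],
    ["Yes", 1428, 1097],
    ["Yes", 417, 349],
    ["Yes", 193, 146],
]

# zip(*rows) yields one tuple per column; take the longest string in each
widths = [max(len(str(cell)) for cell in column) for column in zip(*final_list)]

lines = []
for row in final_list:
    # pad each cell to its column's width and join with a separator
    lines.append(" | ".join(str(cell).ljust(w) for cell, w in zip(row, widths)))

print("\n".join(lines))
```

Here widths comes out as [7, 4, 6], so every printed line has the same length no matter how long the longest value is.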

There we go, it looks better now.
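If you would rather have every value labelled with its own column name than keep a separate header row, zip cnames with each individual row and build one dict per row. pandas can produce the same list of dicts directly with DataFrame.to_dict(orient="records"). A minimal sketch on a hand-built copy of the first two rows:

```python
import pandas as pd

# Hand-built copy of the first two rows of df5r
df = pd.DataFrame({"Private": ["Yes", "Yes"], "Apps": [1660, 2186], "Accept": [1232, 1924]})

cnames = df.columns.tolist()
cvalues = df.values.tolist()

# pair each column name with the value at the same position in each row
records = [dict(zip(cnames, row)) for row in cvalues]
print(records[0]["Apps"])  # 1660

# built-in pandas equivalent
print(records == df.to_dict(orient="records"))  # True
```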