Most Frequently Asked Questions Python Pandas Part1

For this exercise, I am using the College.csv data. You can download the data from here: github.com/jstjohn/IntroToStatisticalLearningR-/blob/master/data/College.csv. I will also create dummy dataframes to explain some of the concepts.

import pandas as pd

First, read the CSV file into a dataframe.

df = pd.read_csv('College.csv')

df.head(1)

How to rename column in Python Pandas

Let's check whether a column name is missing in our CSV file. We can print the header row using a Unix command.

!head -1 College.csv

Yes, the header of the first column is missing. Check out https://www.nbshare.io/notebook/58467897/3-Ways-to-Rename-Columns-in-Pandas-DataFrame/ to rename columns in Python Pandas.
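As a rough sketch of the rename step, the snippet below reads a tiny made-up stand-in for College.csv (the two rows and the new column name College are assumptions for illustration, not the real data). pandas labels a column whose header is empty as "Unnamed: 0", so that is the name to rename. Reading the file with index_col=0 is an alternative that turns the unnamed column into the row index.

```python
import io

import pandas as pd

# Made-up stand-in for College.csv: the first header field is empty.
csv_text = ",Private,Apps\nCollege A,Yes,100\nCollege B,No,200\n"

# pandas labels a column with an empty header "Unnamed: 0".
df_named = pd.read_csv(io.StringIO(csv_text))
df_named = df_named.rename(columns={"Unnamed: 0": "College"})
print(df_named.columns.tolist())

# Alternative: use the unnamed first column as the row index.
df_indexed = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(df_indexed.index.tolist())
```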

How to copy dataframe in Python Pandas

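Why would I need to make an explicit copy of a dataframe?

Slicing a dataframe in Python Pandas does not necessarily make a separate copy; it can return a view that refers to the original dataframe's data. Therefore, if you change the slice, the original dataframe may change too. (In pandas versions with Copy-on-Write enabled, the planned default from pandas 3.0, the slice behaves like a copy and the original is left untouched.) Let's do an example.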

df = pd.DataFrame({'name':['John','Evan']})

dfn = df[0:2]

print(dfn)

   name
0  John
1  Evan

dfn.iloc[0,0] = 'Adam'

df

As we see above, our original dataframe has changed. Therefore, the correct way is to make a copy first.

df = pd.DataFrame({'name':['John','Evan']})
dfn = df[0:2].copy()

dfn

dfn.iloc[0,0] = 'Adam'

df

dfn

As we see above, our original dataframe df did not change because we created dfn with the .copy() method.
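One way to check whether two dataframes are backed by the same data is NumPy's np.shares_memory. The sketch below uses a small numeric dataframe (the names nums, sliced and copied are made up for this illustration) and compares a plain slice with an explicit copy. The result for the slice can depend on the pandas version and settings, so only the behaviour of the copy is relied on here.

```python
import numpy as np
import pandas as pd

nums = pd.DataFrame({"x": [1, 2, 3]})
sliced = nums[0:2]          # plain slice, may be a view of nums
copied = nums[0:2].copy()   # explicit, independent copy

# True means both objects are backed by the same memory buffer.
print(np.shares_memory(nums["x"].to_numpy(), sliced["x"].to_numpy()))
print(np.shares_memory(nums["x"].to_numpy(), copied["x"].to_numpy()))

# Writing to the copy never touches the original.
copied.iloc[0, 0] = 99
print(nums)
```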

How to create empty dataframe in Python Pandas

dfe = pd.DataFrame([])

How to add columns to an empty dataframe?

dfe = dfe.assign(col1=None,col2=None)

dfe.head()
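An empty dataframe can also be created with its columns in one step. The sketch below shows two variants: passing only column names, and passing empty Series when the column types should be fixed up front. The variable names and the chosen dtypes are assumptions for illustration.

```python
import pandas as pd

# Empty dataframe with named columns (dtype defaults to object).
dfe_cols = pd.DataFrame(columns=["col1", "col2"])
print(dfe_cols.empty)
print(dfe_cols.columns.tolist())

# Empty dataframe with explicit column dtypes.
dfe_typed = pd.DataFrame({
    "col1": pd.Series(dtype="int64"),
    "col2": pd.Series(dtype="float64"),
})
print(dfe_typed.dtypes)
```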

How to append values to empty dataframe?

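Older pandas versions provided a DataFrame.append method for this, but it was deprecated in pandas 1.4 and removed in pandas 2.0. Use pd.concat with a one-row dataframe instead.

dfe = pd.concat([dfe, pd.DataFrame([{'col1':1,'col2':2}])],ignore_index=True)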

Although the command above works, it is not memory efficient: it copies the existing data into a new dataframe every time a row is added. Don't grow a dataframe row by row inside a loop. A better way is to build the data in a Python list and then create the dataframe in one go with pd.DataFrame, as shown below.

data = []
data.append([3,4])
data.append([5,6])

data

[[3, 4], [5, 6]]

Now create the dataframe using the above data.

dfe = pd.DataFrame(data,columns=['col1','col2'])

dfe.head()
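The same pattern works inside a loop. In the sketch below each iteration appends a dictionary to a plain Python list, and the dataframe is built only once at the end. The loop body and the name df_loop are placeholders for whatever computation produces the rows.

```python
import pandas as pd

rows = []
for i in range(3):
    # Hypothetical per-iteration result; any computation could go here.
    rows.append({"col1": i, "col2": i * i})

# Build the dataframe once, after the loop has finished.
df_loop = pd.DataFrame(rows)
print(df_loop)
```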

How to convert Pandas dataframe to Numpy array

Let's use our previous dataframe dfe for this.

import numpy as np

dfe.to_numpy()

array([[3, 4],
       [5, 6]])

We can also do it this way.

np.array(dfe)

array([[3, 4],
       [5, 6]])
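to_numpy also accepts a dtype argument, and it works on a single column as well as on the whole dataframe. The sketch below rebuilds the same small dataframe under the made-up name dfe_demo so that it runs on its own.

```python
import pandas as pd

dfe_demo = pd.DataFrame([[3, 4], [5, 6]], columns=["col1", "col2"])

# Convert the whole dataframe and request a float array.
arr = dfe_demo.to_numpy(dtype=float)
print(arr.dtype, arr.shape)

# Convert a single column to a one-dimensional array.
col = dfe_demo["col1"].to_numpy()
print(col)
```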

How to Concat Pandas Dataframe

pd.concat is used to concatenate dataframes along either rows or columns.

df1 = pd.DataFrame({'A':[1,2],'B':[3,4]})
df2 = pd.DataFrame({'C':[1,2],'D':[3,4]})

Let's concatenate df1 and df2 so that the rows are appended.

pd.concat([df1,df2],sort=False)

We see that the result has four columns, with NaN filling the gaps, because the column names in df1 and df2 don't match.

How about concatenating the dataframes so that the columns are placed side by side?

pd.concat([df1,df2],sort=False,axis=1)
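A quick way to see the difference between the two directions is to compare the shapes of the results. The sketch below repeats the definitions of df1 and df2 so that it is self-contained; the names by_rows and by_cols are made up for this illustration.

```python
import pandas as pd

df1 = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
df2 = pd.DataFrame({"C": [1, 2], "D": [3, 4]})

# Row-wise: rows are stacked and unmatched columns are filled with NaN.
by_rows = pd.concat([df1, df2], sort=False)
print(by_rows.shape)
print(by_rows.isna().sum().sum())

# Column-wise: rows are aligned on the index, so nothing is missing here.
by_cols = pd.concat([df1, df2], axis=1)
print(by_cols.shape)
print(by_cols.isna().sum().sum())
```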

How about concatenating dataframes with the same headers? Let's create a third dataframe with the same headers as df1.

df3 = pd.DataFrame({'A':[56,57],'B':[100,101]})

Let's concatenate df1 and df3 so that the rows are appended.

pd.concat([df1,df3])

As we see above, the row indexes from the original dataframes are preserved while concatenating. We can ignore these indexes and create a new incremental index using the option ignore_index=True.

pd.concat([df1,df3],ignore_index=True)
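The effect of ignore_index is easiest to see by printing the index of each result. The sketch below repeats df1 and df3 so that it runs on its own; the names kept and reset are made up. It also shows why duplicate labels can be surprising: selecting label 0 returns two rows.

```python
import pandas as pd

df1 = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
df3 = pd.DataFrame({"A": [56, 57], "B": [100, 101]})

kept = pd.concat([df1, df3])                      # original labels kept
reset = pd.concat([df1, df3], ignore_index=True)  # fresh incremental index

print(kept.index.tolist())
print(reset.index.tolist())

# With duplicate labels, one label selects several rows.
print(kept.loc[0])
```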

With pd.concat, we can also create an outer level of hierarchy in the index by passing keys.

dfc = pd.concat([df1,df3],keys=['s1','s2'])

dfc.head()

Now we can access the data using the new index keys s1 and s2.
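As a minimal sketch of that access, the snippet below rebuilds dfc and selects data through the outer keys with .loc. A single key returns the block that came from that dataframe, and a (key, row label) tuple narrows the selection to one row.

```python
import pandas as pd

df1 = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
df3 = pd.DataFrame({"A": [56, 57], "B": [100, 101]})
dfc = pd.concat([df1, df3], keys=["s1", "s2"])

# All rows that came from the first dataframe.
print(dfc.loc["s1"])

# One value: outer key "s2", original row label 0, column "A".
print(dfc.loc[("s2", 0), "A"])
```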

Output of pd.concat([df1,df2],sort=False) from the first concatenation example above:

     A    B    C    D
0  1.0  3.0  NaN  NaN
1  2.0  4.0  NaN  NaN
0  NaN  NaN  1.0  3.0
1  NaN  NaN  2.0  4.0