Most Frequently Asked Questions Python Pandas Part 1
For this exercise, I am using the College.csv data. You can download the data from here: github.com/jstjohn/IntroToStatisticalLearningR-/blob/master/data/College.csv I will also create dummy dataframes to explain some of the concepts.
import pandas as pd
Read the CSV file into a dataframe and look at the first row.
df = pd.read_csv('College.csv')
df.head(1)
How to rename column in Python Pandas
Let's check whether a column name is missing in our CSV file. We can print the header line using a Unix command.
!head -1 College.csv
Yes, the header for the first column is missing. Check out https://www.nbshare.io/notebook/58467897/3-Ways-to-Rename-Columns-in-Pandas-DataFrame/ to rename columns in Python Pandas.
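As a minimal sketch of the rename itself: when a header field is blank, pandas labels that column 'Unnamed: 0', and DataFrame.rename can give it a proper name. The small CSV text below is a made-up stand-in for College.csv, and the new name 'College' is a hypothetical choice for this example.

```python
import io
import pandas as pd

# Made-up stand-in for College.csv: the header line starts with a comma,
# so the first column has no name. Values are illustrative only.
csv_text = (
    ',Private,Apps\n'
    'College A,Yes,100\n'
    'College B,No,200\n'
)

sample = pd.read_csv(io.StringIO(csv_text))
print(sample.columns.tolist())  # ['Unnamed: 0', 'Private', 'Apps']

# 'College' is a hypothetical column name chosen for this example.
sample = sample.rename(columns={'Unnamed: 0': 'College'})
print(sample.columns.tolist())  # ['College', 'Private', 'Apps']
```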
How to copy dataframe in Python Pandas
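Why would I need to make an explicit copy of a dataframe?
Slicing a dataframe in Python Pandas does not always make a separate copy; it can return a view that refers to the original data. In that case, a change made through the slice also changes the original dataframe, and pandas warns about it with SettingWithCopyWarning. With Copy-on-Write enabled, which is the planned default from pandas 3.0, the original is left untouched. Let's do an example.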
df = pd.DataFrame({'name':['John','Evan']})
dfn = df[0:2]
print(dfn)
dfn.iloc[0,0] = 'Adam'
df
As we see above, our original dataframe has changed. Therefore, the correct way is to make a copy first.
df = pd.DataFrame({'name':['John','Evan']})
dfn = df[0:2].copy()
dfn
dfn.iloc[0,0] = 'Adam'
df
dfn
As we see above, our original dataframe df has not changed because we called .copy() on the slice.
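The same check can be written as a small self-contained script. This is only a sketch; the names original and safe are chosen for this example.

```python
import pandas as pd

original = pd.DataFrame({'name': ['John', 'Evan']})

# .copy() gives the new dataframe its own data (a deep copy by default).
safe = original[0:2].copy()
safe.iloc[0, 0] = 'Adam'

print(original['name'].tolist())  # ['John', 'Evan']
print(safe['name'].tolist())      # ['Adam', 'Evan']
```

Because the copy owns its data, this result does not depend on whether the slice would otherwise have been a view.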
How to create empty dataframe in Python Pandas
dfe = pd.DataFrame([])
How to add columns to an empty dataframe?
dfe = dfe.assign(col1=None,col2=None)
dfe.head()
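As an alternative sketch, the column names can also be declared when the empty dataframe is constructed, using the columns argument of pd.DataFrame. The variable name empty is chosen for this example.

```python
import pandas as pd

# Declare the column names while constructing the empty dataframe.
empty = pd.DataFrame(columns=['col1', 'col2'])

print(empty.empty)             # True: there are no rows
print(empty.columns.tolist())  # ['col1', 'col2']
```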
How to append values to an empty dataframe?
Older pandas versions offered DataFrame.append for this, but it was deprecated in pandas 1.4 and removed in pandas 2.0. Use pd.concat instead.
dfe = pd.concat([dfe, pd.DataFrame([{'col1':1,'col2':2}])], ignore_index=True)
Although the command above works, it is not memory efficient: each call copies the existing data into a new dataframe. Don't concatenate inside a loop. It is better to collect the rows in a Python list and then create the dataframe once with pd.DataFrame, as shown below.
data = []
data.append([3,4])
data.append([5,6])
data
Now create the dataframe using the data above.
dfe = pd.DataFrame(data,columns=['col1','col2'])
dfe.head()
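The same pattern applies when the rows are produced in a loop. The sketch below collects one dict per row and builds the dataframe once at the end; the loop and its values are made up for illustration.

```python
import pandas as pd

rows = []
for i in range(3):
    # Each row is a dict keyed by column name; the values are made up.
    rows.append({'col1': i, 'col2': i * i})

# Build the dataframe once, after the loop has finished.
built = pd.DataFrame(rows, columns=['col1', 'col2'])
print(built)
```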
How to convert Pandas dataframe to Numpy array
Let's use our previous dataframe dfe for this.
import numpy as np
dfe.to_numpy()
We can also pass the dataframe to np.array.
np.array(dfe)
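A short sketch comparing the two routes. It rebuilds a small dataframe like dfe under the name frame so that it runs on its own, and also shows the optional dtype argument of to_numpy.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame([[3, 4], [5, 6]], columns=['col1', 'col2'])

arr = frame.to_numpy()
print(type(arr).__name__)  # ndarray
print(arr.shape)           # (2, 2)

# to_numpy accepts a dtype argument to control the element type.
arr_float = frame.to_numpy(dtype='float64')
print(arr_float.dtype)     # float64

# Both routes give the same values.
print(np.array_equal(arr, np.array(frame)))  # True
```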
How to Concat Pandas Dataframe
pd.concat joins dataframes along either rows or columns.
df1 = pd.DataFrame({'A':[1,2],'B':[3,4]})
df2 = pd.DataFrame({'C':[1,2],'D':[3,4]})
Let's concatenate df1 and df2 so that the rows are appended.
pd.concat([df1,df2],sort=False)
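We see that the result has all four columns, A, B, C and D, because the column names in df1 and df2 don't match. Cells with no value in the source dataframe are filled with NaN.
How about concatenating the dataframes side by side, so that the columns are appended?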
pd.concat([df1,df2],sort=False,axis=1)
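To make the difference between the two directions concrete, here is a small sketch that compares the shapes of both results and counts the NaN cells produced by the row-wise version. The names by_rows and by_cols are chosen for this example.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'C': [1, 2], 'D': [3, 4]})

by_rows = pd.concat([df1, df2], sort=False)  # stack rows, union of columns
by_cols = pd.concat([df1, df2], axis=1)      # place side by side

print(by_rows.shape)  # (4, 4)
print(by_cols.shape)  # (2, 4)

# Row-wise: each of the 4 rows is missing 2 of the 4 columns.
print(int(by_rows.isna().sum().sum()))  # 8
```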
How about concatenating dataframes with the same headers? Let's create a third dataframe with the same headers as df1.
df3 = pd.DataFrame({'A':[56,57],'B':[100,101]})
Let's concatenate df1 and df3 so that the rows are appended.
pd.concat([df1,df3])
As we see above, the row indexes of the original dataframes are preserved when concatenating. We can discard them and get a fresh incremental index with the option ignore_index=True.
pd.concat([df1,df3],ignore_index=True)
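One reason to care about this is that preserved indexes contain duplicate labels, so a label lookup can return more than one row. A minimal sketch, with kept and renumbered as names chosen for this example:

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df3 = pd.DataFrame({'A': [56, 57], 'B': [100, 101]})

kept = pd.concat([df1, df3])
print(kept.index.tolist())  # [0, 1, 0, 1]
print(len(kept.loc[0]))     # 2: label 0 matches one row from each input

renumbered = pd.concat([df1, df3], ignore_index=True)
print(renumbered.index.tolist())  # [0, 1, 2, 3]
```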
With pd.concat, we can also add an outer index level that labels which dataframe each row came from, using the keys option.
dfc = pd.concat([df1,df3],keys=['s1','s2'])
dfc.head()
Now we can access the data using the new index keys s1 and s2.
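A short sketch of that access using .loc: a single key selects every row from one source dataframe, and a (key, row label) tuple selects a single row.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df3 = pd.DataFrame({'A': [56, 57], 'B': [100, 101]})
dfc = pd.concat([df1, df3], keys=['s1', 's2'])

# All rows that came from df1.
print(dfc.loc['s1'])

# Column A of the first row that came from df3.
print(dfc.loc[('s2', 0), 'A'])  # 56
```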