Pandas Read and Write Excel File

Make sure the openpyxl package is installed. Otherwise you will get the following error:
...

ModuleNotFoundError: No module named 'openpyxl'

Install the package with the following command:
... pip install openpyxl
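
As a minimal sketch, the check below tests whether openpyxl can be located before any Excel file is opened. The variable name has_openpyxl is only illustrative; it is not part of Pandas.

```python
import importlib.util

# find_spec returns None when the package cannot be located.
# It does not import the package itself.
has_openpyxl = importlib.util.find_spec("openpyxl") is not None

if has_openpyxl:
    print("openpyxl is available")
else:
    print("openpyxl is missing; install it with: pip install openpyxl")
```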

Pandas print excel sheet names

In [1]:

import pandas as pd

Pandas provides the ExcelFile class, which opens a workbook and returns an object that exposes its sheet names.

In [2]:

excel = pd.ExcelFile("stocks.xlsx")
excel.sheet_names

Out[2]:

['Sheet12']
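
When a workbook has several sheets, sheet_names can drive a loop that parses each one. The sketch below assumes a hypothetical two-sheet workbook built in memory (the sheet names "first" and "second" are made up for illustration), so it runs without stocks.xlsx.

```python
import io

import pandas as pd

# Build a small two-sheet workbook in memory (hypothetical sheet names).
buffer = io.BytesIO()
with pd.ExcelWriter(buffer, engine="openpyxl") as writer:
    pd.DataFrame({"Stock": ["INTC"], "Price": [28.9]}).to_excel(
        writer, sheet_name="first", index=False
    )
    pd.DataFrame({"Stock": ["AAPL"], "Price": [141.17]}).to_excel(
        writer, sheet_name="second", index=False
    )
buffer.seek(0)

# Open the workbook once, then parse every sheet by name.
excel = pd.ExcelFile(buffer, engine="openpyxl")
frames = {name: excel.parse(name) for name in excel.sheet_names}

for name, frame in frames.items():
    print(name, frame.shape)
```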

Note that you might run into the following error:

ValueError: Worksheet index 0 is invalid, 0 worksheets found

This usually means the Excel file is corrupt. To fix the error, copy the data into another Excel file and save it.

ExcelFile has many attributes and methods. For example, excel.__dict__ shows the attributes of the ExcelFile object itself (file path, engine and reader), not the spreadsheet data.

In [3]:

excel.__dict__

Out[3]:

{'io': 'stocks.xlsx',
 '_io': 'stocks.xlsx',
 'engine': 'openpyxl',
 'storage_options': None,
 '_reader': <pandas.io.excel._openpyxl.OpenpyxlReader at 0x7f4cb232c8e0>}

To convert the data into a Pandas DataFrame, we will use the ExcelFile.parse() method.

Pandas Read Excel Files

In [4]:

excel = pd.ExcelFile("stocks.xlsx")
df = excel.parse()

In [5]:

df.head()

Out[5]:

	Unnamed: 0	Unnamed: 1	Unnamed: 2	Unnamed: 3
0	NaN	Stock	Price	Date
1	NaN	INTC	28.9	2022-11-29 00:00:00
2	NaN	AAPL	141.17	2022-11-29 00:00:00

Since the first row and the first column of our Excel sheet are empty, the headers show up as Unnamed and the first column as NaN.

Let us fix it by specifying that the header is in the second row. The header option is zero-based, so we pass header=1.

In [6]:

excel.parse(header=1)

Out[6]:

	Unnamed: 0	Stock	Price	Date
0	NaN	INTC	28.90	2022-11-29
1	NaN	AAPL	141.17	2022-11-29

To drop the empty first column, we can use the "usecols" option as shown below.

In [7]:

excel.parse(usecols=[1,2,3],header=1)

Out[7]:

	Stock	Price	Date
0	INTC	28.90	2022-11-29
1	AAPL	141.17	2022-11-29
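
To try the header and usecols options without the original file, the following sketch recreates a similar layout in memory: startrow=1 and startcol=1 leave the first row and first column blank when writing. The buffer and the sheet layout are assumptions made for illustration, not the actual stocks.xlsx.

```python
import io

import pandas as pd

stocks = pd.DataFrame(
    {
        "Stock": ["INTC", "AAPL"],
        "Price": [28.9, 141.17],
        "Date": pd.to_datetime(["2022-11-29", "2022-11-29"]),
    }
)

# Write the table offset by one row and one column,
# so that row 0 and column A of the sheet stay empty.
buffer = io.BytesIO()
stocks.to_excel(buffer, index=False, startrow=1, startcol=1, engine="openpyxl")
buffer.seek(0)

# header=1 takes the second sheet row as the header (zero-based),
# usecols=[1, 2, 3] skips the empty first column.
df = pd.read_excel(buffer, header=1, usecols=[1, 2, 3], engine="openpyxl")
print(df)
```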

To specify the stock symbol as our index column, we can use the "index_col" option.

In [8]:

excel.parse(index_col="Stock",usecols=[1,2,3],header=1)

Out[8]:

	Price	Date
Stock
INTC	28.90	2022-11-29
AAPL	141.17	2022-11-29

We can also use the pd.read_excel() function to achieve the same result.

In [9]:

pd.read_excel("stocks.xlsx",index_col="Stock",usecols=[1,2,3],header=1)

Out[9]:

	Price	Date
Stock
INTC	28.90	2022-11-29
AAPL	141.17	2022-11-29

Instead of listing each column number, we can use the range function to specify the columns that contain the data.

In [10]:

excel.parse(usecols=range(1,4),header=1)

Out[10]:

	Stock	Price	Date
0	INTC	28.90	2022-11-29
1	AAPL	141.17	2022-11-29
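
The usecols option also accepts Excel column letters as a string, which can be easier to match against what the spreadsheet shows. The sketch below assumes a workbook built in memory with the same blank first row and column; "B:D" covers the same columns as range(1,4).

```python
import io

import pandas as pd

stocks = pd.DataFrame(
    {
        "Stock": ["INTC", "AAPL"],
        "Price": [28.9, 141.17],
        "Date": pd.to_datetime(["2022-11-29", "2022-11-29"]),
    }
)

# Illustrative in-memory workbook with an empty first row and column.
buffer = io.BytesIO()
stocks.to_excel(buffer, index=False, startrow=1, startcol=1, engine="openpyxl")
buffer.seek(0)

# "B:D" is an inclusive range of Excel column letters.
by_letters = pd.read_excel(buffer, header=1, usecols="B:D", engine="openpyxl")
print(by_letters)
```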

Let us save the DataFrame into a variable.

In [11]:

dfef = pd.read_excel("stocks.xlsx",usecols=range(1,4),header=1)

In [12]:

dfef.head()

Out[12]:

	Stock	Price	Date
0	INTC	28.90	2022-11-29
1	AAPL	141.17	2022-11-29

Pandas write Dataframe to Excel File

We can write the DataFrame to an Excel file using the DataFrame.to_excel() method. By default the row index is written as the first column.

In [13]:

dfef.to_excel("stocktmp.xlsx")

In [14]:

!ls -lrt stocktmp.xlsx
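
As a sketch of a variation, passing index=False keeps the row index out of the file, and reading the file back confirms what was written. The temporary directory and the sheet name "stocks" are assumptions chosen for illustration.

```python
import os
import tempfile

import pandas as pd

dfef = pd.DataFrame(
    {
        "Stock": ["INTC", "AAPL"],
        "Price": [28.9, 141.17],
        "Date": pd.to_datetime(["2022-11-29", "2022-11-29"]),
    }
)

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "stocktmp.xlsx")

    # index=False omits the 0, 1, ... row labels from the sheet.
    dfef.to_excel(path, index=False, sheet_name="stocks")

    # Portable replacement for "ls -l": check that the file exists and has content.
    size = os.path.getsize(path)

    # Read the file back to verify the round trip.
    round_trip = pd.read_excel(path)

print(size > 0)
print(round_trip)
```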