Install the package with following command
...
pip install openpyxl
import pandas as pd
Pandas has ExcelFile method which returns Pandas excel object.
excel = pd.ExcelFile("stocks.xlsx")
excel.sheet_names
['Sheet12']
Note you might run in to following error
ValueError: Worksheet index 0 is invalid, 0 worksheets found
which usually means the Excel file is corrupt. To fix this error, copy the data in to another excel file and save it.
ExcelFile has many methods. For example excel.dict will print the data of spreadsheet in dictionary format.
excel.__dict__
{'io': 'stocks.xlsx', '_io': 'stocks.xlsx', 'engine': 'openpyxl', 'storage_options': None, '_reader': <pandas.io.excel._openpyxl.OpenpyxlReader at 0x7f4cb232c8e0>}
To convert the data in to Pandas Dataframe. We will use ExcelFile.parse() method.
excel = pd.ExcelFile("stocks.xlsx")
df = excel.parse()
df.head()
Unnamed: 0 | Unnamed: 1 | Unnamed: 2 | Unnamed: 3 | |
---|---|---|---|---|
0 | NaN | Stock | Price | Date |
1 | NaN | INTC | 28.9 | 2022-11-29 00:00:00 |
2 | NaN | AAPL | 141.17 | 2022-11-29 00:00:00 |
Since our excel sheet has first column and row empty that is why we see headers and ist column as Unnamed and NaN respectively.
Let us fix it by specifying that header starts at row1.
excel.parse(header=1)
Unnamed: 0 | Stock | Price | Date | |
---|---|---|---|---|
0 | NaN | INTC | 28.90 | 2022-11-29 |
1 | NaN | AAPL | 141.17 | 2022-11-29 |
To fix the column indexing, we can use "usecols" option as shown below.
excel.parse(usecols=[1,2,3],header=1)
Stock | Price | Date | |
---|---|---|---|
0 | INTC | 28.90 | 2022-11-29 |
1 | AAPL | 141.17 | 2022-11-29 |
To specify stock symbol as our index column, we can ues "index_col" option.
excel.parse(index_col="Stock",usecols=[1,2,3],header=1)
Price | Date | |
---|---|---|
Stock | ||
INTC | 28.90 | 2022-11-29 |
AAPL | 141.17 | 2022-11-29 |
We can also use pd.read_excel() method to achieve the same
pd.read_excel("stocks.xlsx",index_col="Stock",usecols=[1,2,3],header=1)
Price | Date | |
---|---|---|
Stock | ||
INTC | 28.90 | 2022-11-29 |
AAPL | 141.17 | 2022-11-29 |
Instead of specifying each column number, we can use range function to specify the columns which contain the data.
excel.parse(usecols=range(1,4),header=1)
Stock | Price | Date | |
---|---|---|---|
0 | INTC | 28.90 | 2022-11-29 |
1 | AAPL | 141.17 | 2022-11-29 |
let us save the dataframe in to a variable.
dfef = pd.read_excel("stocks.xlsx",usecols=range(1,4),header=1)
dfef.head()
Stock | Price | Date | |
---|---|---|---|
0 | INTC | 28.90 | 2022-11-29 |
1 | AAPL | 141.17 | 2022-11-29 |
We can write the dataframe in to Excel file using pd.to_excel() method.
dfef.to_excel("stocktmp.xlsx")
!ls -lrt stocktmp.xlsx
-rw-r--r-- 1 root root 5078 Nov 30 05:21 stocktmp.xlsx
Related Notebooks
- How To Fix Error Pandas Cannot Open An Excel xlsx File
- Write Single And Multi Line Comments In Python
- How To Read CSV File Using Python PySpark
- How To Read JSON Data Using Python Pandas
- How To Write DataFrame To CSV In R
- Select Pandas Dataframe Rows And Columns Using iloc loc and ix
- Pandas How To Sort Columns And Rows
- Summarising Aggregating and Grouping data in Python Pandas
- Merge and Join DataFrames with Pandas in Python