How To Analyze Excel Data Using Python (Pandas)?

Introduction
Today, almost every business is data-driven: we either generate data or process it to spot trends and make better decisions. Microsoft Excel is one of the most widely used tools for storing and managing structured data. But when it comes to analyzing large or complex datasets, relying solely on Excel can be error-prone and time-consuming.
You need not worry, though, because Python libraries come to the rescue!
Python libraries like NumPy, pandas, PyTorch, scikit-learn, and Matplotlib are really helpful when it comes to analyzing Excel sheet data. With just a few lines of code, you can automate, clean, and analyze Excel files efficiently. So, whether you’re a beginner or an experienced professional transitioning from Excel to Python, this guide will help you get started.
If you’re looking to analyze Excel data using Python, the pandas library offers the most flexible and beginner-friendly approach. With just a few lines of code, you can clean, filter, and extract powerful insights from your spreadsheets. Whether you’re working with sales reports, finance sheets, or customer databases, learning how to analyze Excel data using Python can save hours of manual work. In fact, more data analysts today are turning to Python as a faster, more scalable alternative to Excel. So, if you’ve never tried it before, now is the perfect time to start analyzing Excel data using Python and pandas.
Prerequisites
Before we begin, make sure you have the following installed on your system:
Python (you can use Anaconda or install Python manually)
pandas library
openpyxl (for reading .xlsx files)
Here, pandas is the main library we’ll use for data analysis, and openpyxl is the engine that helps pandas read .xlsx Excel files.
To install them, run:
pip install pandas openpyxl
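To confirm that the installation worked, a quick check like the one below can help. It is only a minimal sketch: it imports both libraries and prints their versions, and the exact version numbers you see will depend on your environment.

```python
# Quick sanity check: both imports should succeed after installation.
import pandas as pd
import openpyxl

print("pandas version:", pd.__version__)
print("openpyxl version:", openpyxl.__version__)
```

If either import raises ModuleNotFoundError, the package was most likely installed into a different Python environment than the one running your script.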
Step-by-Step: Read Excel File in Pandas
Step 1: Load the Excel File with pandas
Start your script by importing pandas, then read the Excel file using pd.read_excel():
import pandas as pd
df = pd.read_excel("sales_data.xlsx")
print(df.head())
What’s happening in the code?
- First, we import the pandas library under the alias pd. You may use any other name, but pd is the common convention.
- The second line reads the Excel file with pd.read_excel() and stores the data in a variable named df.
- “sales_data.xlsx” is the name of the file we are importing. If your file is in another folder, pass its full path instead.
- print(df.head()) prints the first 5 rows of your sheet. By default, df.head() returns the first 5 rows.
You can also tell pandas explicitly to use the openpyxl engine for .xlsx files, like this:
df = pd.read_excel("sales_data.xlsx", engine='openpyxl')
✅ Tip: If your file has multiple sheets, you can specify the sheet name using sheet_name='Sheet1'.
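Here is a small sketch of how the sheet_name option behaves. Since we don't have your workbook, the example first creates a tiny two-sheet file in a temporary folder. The file name, the sheet names (January, February), and the numbers are all made up for illustration.

```python
import os
import tempfile

import pandas as pd

# Build a small two-sheet workbook so the example runs on its own.
# The file name, sheet names, and values are hypothetical.
path = os.path.join(tempfile.mkdtemp(), "demo_sales.xlsx")
with pd.ExcelWriter(path, engine="openpyxl") as writer:
    pd.DataFrame({"Product": ["A", "B"], "Sales": [100, 200]}).to_excel(
        writer, sheet_name="January", index=False
    )
    pd.DataFrame({"Product": ["A", "B"], "Sales": [150, 250]}).to_excel(
        writer, sheet_name="February", index=False
    )

# Read a single sheet by name.
january = pd.read_excel(path, sheet_name="January")
print(january)

# sheet_name=None reads every sheet into a dict keyed by sheet name.
all_sheets = pd.read_excel(path, sheet_name=None)
sheet_names = list(all_sheets.keys())
print(sheet_names)
```

Reading all sheets at once is handy when a workbook stores one sheet per month or per region and you want to loop over them.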
Step 2: Explore Your Data
Remember, real-world data is messy, and you usually cannot work with it directly.
Therefore, you first need to perform a few basic cleaning steps to remove incomplete rows and tidy up column names, since these can affect our analysis.
# Check for missing values
print(df.isnull().sum())
# Drop rows with missing values (if necessary)
df.dropna(inplace=True)
# Optional: Rename columns for clarity
df.rename(columns={'Total Sales': 'Sales'}, inplace=True)
Instead of dropping rows, you may also fill the missing values, like this:
# Fill missing values (assign the result back instead of using inplace on a single column)
df['Sales'] = df['Sales'].fillna(0)
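To see the difference between dropping and filling, here is a minimal sketch on a tiny hand-made DataFrame. The product names and values are invented for illustration. Only the column names Product, Region, and Sales follow the examples in this guide.

```python
import pandas as pd

# Hypothetical sample data with two missing values.
df = pd.DataFrame(
    {
        "Product": ["Pen", "Book", "Bag", "Pen"],
        "Region": ["East", "West", None, "East"],
        "Sales": [1200.0, float("nan"), 800.0, 400.0],
    }
)

# Count missing values in each column.
missing_per_column = df.isnull().sum()
print(missing_per_column)

# Option 1: drop every row that has at least one missing value.
dropped = df.dropna()

# Option 2: keep all rows and fill the gaps column by column.
filled = df.fillna({"Region": "Unknown", "Sales": 0})

print(len(dropped), "rows after dropna")
print(len(filled), "rows after fillna")
```

Dropping is simpler but loses data, while filling keeps every row at the cost of introducing placeholder values. Which one is right depends on your dataset.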
Step 3: Analyze The Data
Let’s analyze the data and then visualize it with the help of the matplotlib library!
First, let us compute a few metrics from our Excel sheet data:
# Total sales per product
print(df.groupby('Product')['Sales'].sum())
# Average sales by region
print(df.groupby('Region')['Sales'].mean())
# Filter high sales
high_sales = df[df['Sales'] > 5000]
print(high_sales)
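The same ideas can be combined. The sketch below, again on made-up sample data, computes several statistics per product in one table, ranks the products, and filters on two conditions at once.

```python
import pandas as pd

# Hypothetical sample data using the same column names as above.
df = pd.DataFrame(
    {
        "Product": ["Pen", "Book", "Pen", "Bag", "Book"],
        "Region": ["East", "West", "West", "East", "East"],
        "Sales": [1000, 6000, 3000, 7000, 2000],
    }
)

# Several statistics per product in a single table.
summary = df.groupby("Product")["Sales"].agg(["sum", "mean", "count"])
print(summary)

# Sort products from highest to lowest total sales.
ranked = summary.sort_values("sum", ascending=False)
print(ranked)

# Combine two conditions: high sales in one specific region.
east_high = df[(df["Sales"] > 5000) & (df["Region"] == "East")]
print(east_high)
```

Note the parentheses around each condition and the use of & instead of the keyword and. pandas needs both when you combine filters.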
After extracting the useful insights above, let’s now visualize the data using matplotlib (install it with pip install matplotlib if you don’t have it yet):
import matplotlib.pyplot as plt
df.groupby('Product')['Sales'].sum().plot(kind='bar')
plt.title('Total Sales per Product')
plt.xlabel('Product')
plt.ylabel('Sales')
plt.tight_layout()
plt.show()
In the above code, we are plotting a bar graph of Product vs Sales:
- We used .plot(kind='bar') to define the type of graph we need.
- plt.title() gives the plotted graph a title.
- plt.xlabel() and plt.ylabel() label the x-axis and y-axis as Product and Sales, respectively.
- plt.tight_layout() adjusts the spacing so that the labels fit inside the figure, and plt.show() then displays the graph.
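If you would rather save the chart as an image, for example to attach it to a report, a sketch like the following may help. It assumes matplotlib is installed and uses the non-interactive Agg backend so that no window has to open. The sample values and the output file name are made up.

```python
import os
import tempfile

import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame(
    {"Product": ["Pen", "Book", "Bag"], "Sales": [4000, 8000, 7000]}
)
totals = df.groupby("Product")["Sales"].sum()

# Draw the same kind of bar chart, this time through the Axes object.
ax = totals.plot(kind="bar")
ax.set_title("Total Sales per Product")
ax.set_xlabel("Product")
ax.set_ylabel("Sales")

fig = ax.get_figure()
fig.tight_layout()

# Save to a PNG file instead of showing the chart on screen.
chart_path = os.path.join(tempfile.mkdtemp(), "sales_per_product.png")
fig.savefig(chart_path, dpi=150)
plt.close(fig)
print("Chart saved to", chart_path)
```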
Step 4: Export The Cleaned Data Back to Excel
After cleaning and analyzing, you may want to export the updated data:
df.to_excel("cleaned_sales_data.xlsx", index=False)
Now you have a clean, analysis-ready Excel file!
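You can also write more than one sheet into the same workbook. The sketch below uses pd.ExcelWriter to store the cleaned rows and a per-product summary side by side. The sheet names, the file name, and the sample data are assumptions for illustration.

```python
import os
import tempfile

import pandas as pd

# Hypothetical cleaned data.
df = pd.DataFrame(
    {"Product": ["Pen", "Book", "Pen"], "Sales": [1000, 6000, 3000]}
)
summary = df.groupby("Product", as_index=False)["Sales"].sum()

# Write both tables into one workbook, each on its own sheet.
out_path = os.path.join(tempfile.mkdtemp(), "sales_report.xlsx")
with pd.ExcelWriter(out_path, engine="openpyxl") as writer:
    df.to_excel(writer, sheet_name="Cleaned Data", index=False)
    summary.to_excel(writer, sheet_name="Summary", index=False)

# Read the summary sheet back to confirm that it was written.
check = pd.read_excel(out_path, sheet_name="Summary")
print(check)
```

Passing index=False keeps the DataFrame's row numbers out of the exported sheet, which is usually what you want for a report.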
Bonus Tips:
Before you begin analysis, it’s important to understand the structure of your data. Therefore, you may use some of the following code to explore your dataset:
print(df.shape) # Rows and columns
print(df.columns) # Column names
print(df.dtypes) # Data types
print(df.describe()) # Summary statistics
Conclusion
If you’ve been manually analyzing Excel files, switching to Python and pandas can drastically improve your productivity. From reading and cleaning data to performing meaningful analysis and exporting results, pandas makes the entire process smoother and more scalable. With just a few lines of Python and pandas, you can turn raw Excel data into meaningful insights. This method saves time, avoids manual Excel errors, and scales effortlessly as your data grows.
Whether you’re a data analyst, student, or Excel enthusiast, learning how to analyze Excel data using Python with pandas is a skill worth mastering.
In upcoming posts, we’ll explore visualizing this data using Power BI, Matplotlib, or Seaborn! Till then, keep learning.
FAQs
1. Can I use Python to analyze Excel files without Excel installed?
Answer: Yes! Python reads Excel files using libraries like pandas and openpyxl, without needing Microsoft Excel on your system.
2. What’s the advantage of using pandas over Excel?
Answer: Pandas is faster for large datasets, allows automation, and supports complex analysis that’s difficult to do manually in Excel.
3. How do I write data back to Excel using pandas?
Answer: Use df.to_excel('filename.xlsx') to export any DataFrame to Excel.
4. I don’t know Python. How can I learn Python?
Answer: Don’t worry, we’ve got you covered! Visit our Python Roadmap article, where we have listed all the topics you need to learn to master Python.
5. What’s the difference between read_csv() and read_excel() in pandas?
Answer: read_csv() is used to read CSV files, while read_excel() is used for Excel files (.xls or .xlsx). The underlying engines and file formats differ, but both return a pandas DataFrame.
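As a rough illustration of that last answer, the sketch below writes the same made-up DataFrame to an in-memory CSV buffer and an in-memory Excel buffer, then reads both back. It assumes openpyxl is installed for the Excel part.

```python
import io

import pandas as pd

# Hypothetical sample data.
original = pd.DataFrame({"Product": ["Pen", "Book"], "Sales": [1000, 6000]})

# CSV is plain text, so a text buffer is enough.
csv_buffer = io.StringIO()
original.to_csv(csv_buffer, index=False)
csv_buffer.seek(0)
from_csv = pd.read_csv(csv_buffer)

# .xlsx is a binary format, so use a bytes buffer and an Excel engine.
excel_buffer = io.BytesIO()
original.to_excel(excel_buffer, index=False, engine="openpyxl")
excel_buffer.seek(0)
from_excel = pd.read_excel(excel_buffer, engine="openpyxl")

# Both readers hand back a DataFrame with the same rows.
print(type(from_csv).__name__, type(from_excel).__name__)
print(from_csv.values.tolist())
print(from_excel.values.tolist())
```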
What’s Next?
Once you’ve mastered the basics to analyze Excel data using Python with pandas, you can expand your workflow to include data visualization, automated reporting, and even machine learning models. Pandas pairs well with other powerful Python libraries like Matplotlib, Seaborn, and Scikit-learn, enabling you to convert raw spreadsheets into insights, dashboards, and predictions. Whether you’re a beginner in Python or looking to optimize your data tasks, this skill can boost your productivity and unlock new opportunities in data science, business analytics, and reporting automation. So don’t stop here — start experimenting with your own datasets today!
Also, don’t forget to check our YouTube channel.