Exploratory Data Analysis (EDA) with Python & Power BI

Exploratory Data Analysis (EDA) is a fundamental step in the data analysis process. It helps data analysts and data scientists understand the dataset’s structure, identify patterns, detect anomalies, and prepare data for further modeling or machine learning.

In this blog post, we’ll cover key EDA steps with hands-on examples using Python (Pandas, Seaborn, Matplotlib) and Power BI to visualize and analyze data.

1. Understanding the Data

Before diving into analysis, it’s essential to understand the dataset’s structure.

Example (Python): Checking Dataset Structure

import pandas as pd

# Load dataset
df = pd.read_csv("data.csv")

# View first 5 rows
print(df.head())

# Check dataset structure (info() prints its report directly and returns None)
df.info()

# Summary statistics
print(df.describe())
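
Beyond head(), info(), and describe(), a compact per-column overview can be handy. The sketch below is only an illustration: the sample DataFrame, its column names, and its values are made up to stand in for data.csv. It collects each column's data type, missing count, and distinct-value count into one table.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for data.csv
sample = pd.DataFrame({
    "age": [25, 32, np.nan, 41, 32],
    "city": ["Pune", "Delhi", "Delhi", None, "Pune"],
    "income": [40000.0, 52000.0, 61000.0, np.nan, 52000.0],
})

# One row per column: data type, number of missing values, number of distinct values
overview = pd.DataFrame({
    "dtype": sample.dtypes.astype(str),
    "missing": sample.isna().sum(),
    "unique": sample.nunique(),  # nunique() ignores missing values by default
})
print(overview)
```

A table like this makes it easy to spot columns that are mostly empty or that hold a single repeated value before moving on.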

Example (Power BI): Checking Dataset Structure

Load Data: Import your dataset into Power BI.
View Fields: Open the Data View to explore columns and data types.
Summary Statistics: Use column profiling in Power Query Editor (“View” → “Column profile”), or create a calculated table (“Modeling” → “New table”) with DAX:

Summary_Stats = ROW(
    "Average", AVERAGE(Table[Column]),
    "Min", MIN(Table[Column]),
    "Max", MAX(Table[Column]),
    "StdDev", STDEV.P(Table[Column])
)

2. Handling Missing Data

Missing data can bias summary statistics and reduce model accuracy, so check how much is missing before deciding whether to fill or drop it.

Example (Python): Handling Missing Values

# Check for missing values
print(df.isnull().sum())

# Option 1: fill missing numerical values with the column mean
df['Column'] = df['Column'].fillna(df['Column'].mean())

# Option 2: drop every row that still has a missing value in any column
df = df.dropna()
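
Mean imputation is sensitive to extreme values, so it can be worth comparing it with the median before choosing. The following is a minimal sketch on a made-up column (the values are illustrative only); it also keeps a flag for the rows that were imputed.

```python
import numpy as np
import pandas as pd

# Hypothetical skewed column with one extreme value and two gaps
s = pd.Series([10.0, 12.0, 11.0, np.nan, 13.0, 500.0, np.nan], name="Column")

# Keep a flag so the imputed rows stay identifiable later
was_missing = s.isna()

mean_filled = s.fillna(s.mean())
median_filled = s.fillna(s.median())

print("mean used for filling:  ", s.mean())
print("median used for filling:", s.median())
print("rows imputed:", int(was_missing.sum()))
```

In this sample the single extreme entry pulls the mean far above the typical values, while the median stays close to the bulk of the data.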

Example (Power BI): Handling Missing Data

Use Power Query Editor → “Transform” → “Replace Values” to replace null with a value you choose.
Use a DAX calculated column for mean imputation:

Column_Filled = IF(ISBLANK(Table[Column]), AVERAGE(Table[Column]), Table[Column])

3. Univariate Analysis (Single Variable Analysis)

Analyzing one variable at a time helps understand distributions.

Example (Python): Visualizing Univariate Data

import matplotlib.pyplot as plt
import seaborn as sns

# Histogram
sns.histplot(df['Column'], bins=30, kde=True)
plt.show()

# Box Plot
sns.boxplot(x=df['Column'])
plt.show()
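
Plots are easier to read alongside a few numbers. As a rough sketch with made-up data, the snippet below checks skewness for a numeric column and frequency shares for a categorical one. The series names reuse the placeholders from this post and the values are illustrative.

```python
import pandas as pd

# Hypothetical numeric column with a long right tail
values = pd.Series([1, 2, 2, 3, 3, 3, 4, 4, 20], name="Column")

# Positive skewness suggests a right tail; a mean above the median points the same way
print("skewness:", round(values.skew(), 2))
print("mean:", round(values.mean(), 2), "median:", values.median())

# For a categorical column, frequency shares play the role of a histogram
categories = pd.Series(["A", "B", "A", "C", "A", "B"], name="Category_Column")
shares = categories.value_counts(normalize=True)
print(shares)
```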

Example (Power BI): Visualizing Univariate Data

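Histogram: Power BI has no built-in histogram, so bin the variable first (right-click the field → “New group”), then use a Clustered Column Chart with the binned field on the X-axis and a count on the Y-axis.
Box Plot: Import a box plot custom visual (for example, a Violin & Box Plot visual) from AppSource via “Get more visuals”.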

4. Bivariate Analysis (Two Variable Analysis)

This step helps analyze relationships between two variables.

Example (Python): Correlation Between Two Numerical Variables

# Scatter Plot
sns.scatterplot(x=df['Feature1'], y=df['Feature2'])
plt.show()

# Correlation
print(df[['Feature1', 'Feature2']].corr())
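
The default corr() reports Pearson correlation, which measures linear association. When a relationship is monotonic but curved, Spearman (rank) correlation can be a useful second look. Here is a small sketch on made-up data where Feature2 is simply the cube of Feature1.

```python
import pandas as pd

# Hypothetical data: Feature2 grows monotonically but not linearly with Feature1
pairs = pd.DataFrame({"Feature1": [1, 2, 3, 4, 5, 6]})
pairs["Feature2"] = pairs["Feature1"] ** 3

# DataFrame.corr accepts a method argument; pick out the off-diagonal cell
pearson = pairs.corr(method="pearson").loc["Feature1", "Feature2"]
spearman = pairs.corr(method="spearman").loc["Feature1", "Feature2"]

print("Pearson: ", round(pearson, 3))
print("Spearman:", round(spearman, 3))
```

Because the ranks of the two columns match exactly, Spearman comes out at 1, while Pearson is lower since the points do not fall on a straight line.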

Example (Power BI): Scatter Plot & Correlation

Use Scatter Chart visualization to plot relationships.
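DAX has no built-in CORREL function (CORREL is an Excel function). To calculate correlation, add a quick measure instead: “Quick measure” → “Mathematical operations” → “Correlation coefficient”, then choose a category field and measures based on Feature1 and Feature2. Power BI generates the DAX for you.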
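
To sanity-check a correlation value reported in Power BI, you can compute the Pearson coefficient from raw sums in Python. This is a minimal sketch on made-up numbers. The raw-sum form is used because it maps naturally onto sum-style aggregations, but it is not a claim about the exact formula Power BI uses internally.

```python
import numpy as np

# Hypothetical paired observations
x = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
y = np.array([1.0, 3.0, 7.0, 9.0, 15.0])

n = len(x)

# Pearson r written with raw sums only
numerator = n * np.sum(x * y) - np.sum(x) * np.sum(y)
denominator = np.sqrt(
    (n * np.sum(x**2) - np.sum(x) ** 2) * (n * np.sum(y**2) - np.sum(y) ** 2)
)
r_manual = numerator / denominator

# Cross-check against NumPy's built-in correlation matrix
r_numpy = np.corrcoef(x, y)[0, 1]
print(round(r_manual, 4), round(r_numpy, 4))
```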

5. Multivariate Analysis (Multiple Variables)

Looking at three or more variables together can reveal patterns that pairwise views miss.

Example (Python): Pair Plot & PCA

# Pair Plot
sns.pairplot(df[['Feature1', 'Feature2', 'Feature3']])
plt.show()

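# Principal Component Analysis (PCA)
# Standardize first so no single feature dominates; PCA also cannot handle missing values
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
features = df[['Feature1', 'Feature2', 'Feature3']].dropna()
pca = PCA(n_components=2)
df_pca = pca.fit_transform(StandardScaler().fit_transform(features))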
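
A natural follow-up question is how much of the variance the first two components capture. scikit-learn exposes this as pca.explained_variance_ratio_. The NumPy-only sketch below shows what that number represents, using SVD on standardized, randomly generated data. The data and feature relationships are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: f2 is almost a multiple of f1, f3 is independent noise
f1 = rng.normal(size=200)
f2 = 2.0 * f1 + rng.normal(scale=0.1, size=200)
f3 = rng.normal(size=200)
X = np.column_stack([f1, f2, f3])

# Standardize each column (zero mean, unit variance)
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# SVD of the standardized matrix; squared singular values are
# proportional to the variance along each principal component
_, singular_values, _ = np.linalg.svd(X_std, full_matrices=False)
explained_ratio = singular_values**2 / np.sum(singular_values**2)

print(np.round(explained_ratio, 3))
```

Since two of the three features carry nearly the same information, almost all of the variance should land in the first two components here.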

Example (Power BI): Multivariate Analysis

Use Table Visual with multiple measures.
Create a Correlation Matrix using a heatmap custom visual from AppSource, or a Python visual that draws a Seaborn heatmap.

6. Detecting Outliers

Outliers can distort analysis and need careful handling.

Example (Python): Outlier Detection with IQR & Z-score

# Using IQR (Interquartile Range)
Q1 = df['Column'].quantile(0.25)
Q3 = df['Column'].quantile(0.75)
IQR = Q3 - Q1

# Filtering out outliers
df_no_outliers = df[(df['Column'] >= (Q1 - 1.5 * IQR)) & (df['Column'] <= (Q3 + 1.5 * IQR))]

# Using Z-score (nan_policy='omit' stops a missing value from turning every score into NaN)
from scipy.stats import zscore
df['Z_Score'] = zscore(df['Column'], nan_policy='omit')
df_filtered = df[df['Z_Score'].abs() < 3]
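
The two rules do not always agree, especially on small datasets. The sketch below flags outliers instead of dropping them and compares both rules on a made-up column. The helper name iqr_bounds is hypothetical, introduced here only for illustration.

```python
import pandas as pd

def iqr_bounds(series, k=1.5):
    """Return the (lower, upper) fences at k * IQR beyond the quartiles."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Hypothetical column with one obvious extreme value
col = pd.Series([10, 12, 11, 13, 12, 11, 10, 95], name="Column")

lower, upper = iqr_bounds(col)
is_outlier = (col < lower) | (col > upper)

# Population z-scores (ddof=0), the same convention as scipy.stats.zscore's default
z = (col - col.mean()) / col.std(ddof=0)

print("IQR fences:", lower, upper)
print("flagged by IQR:", col[is_outlier].tolist())
print("max |z|:", round(z.abs().max(), 2))
```

In this sample the IQR rule flags 95, while the largest absolute z-score stays below 3, so the z-score filter would keep it. With very few rows, one extreme value inflates the standard deviation enough to hide itself, which is a good reason to look at both rules before removing anything.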

Example (Power BI): Handling Outliers

Create a new column for Z-score using DAX:

Z_Score = (Table[Column] - AVERAGE(Table[Column])) / STDEV.P(Table[Column])

Keep only records with a Z-score between -3 and +3 by applying a visual-level filter on this column.

7. Checking Correlations & Relationships

Example (Python): Heatmap for Correlations

# Correlation Heatmap
plt.figure(figsize=(10,6))
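# numeric_only=True skips text columns, which would otherwise raise an error in recent pandas versions
sns.heatmap(df.corr(numeric_only=True), annot=True, cmap='coolwarm')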
plt.show()
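
With many columns a heatmap gets crowded, and a ranked list of the strongest pairs can be easier to scan. Here is one way to sketch that with pandas, using randomly generated columns (the names a, b, and c are placeholders).

```python
from itertools import combinations

import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

# Hypothetical numeric features: b tracks a closely, c is unrelated noise
a = rng.normal(size=300)
demo = pd.DataFrame({
    "a": a,
    "b": a + rng.normal(scale=0.2, size=300),
    "c": rng.normal(size=300),
})

corr = demo.corr()

# Visit each unordered pair of columns once, skipping the diagonal
pair_corr = pd.Series(
    {(x, y): corr.loc[x, y] for x, y in combinations(corr.columns, 2)}
)

# Sort by absolute strength, strongest first
ranked = pair_corr.sort_values(key=lambda s: s.abs(), ascending=False)
print(ranked)
```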

Example (Power BI): Correlation Heatmap

Use the Matrix Visual with conditional formatting (background color) to shade cells by value.
Alternatively, import a heatmap custom visual from AppSource.

8. Feature Engineering & Data Transformation

Feature engineering can improve model performance by exposing relationships that the raw columns hide.

Example (Python): Creating New Features

# Creating a new column
df['New_Feature'] = df['Feature1'] * df['Feature2']

# Encoding categorical variables
df = pd.get_dummies(df, columns=['Category_Column'], drop_first=True)
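
Two other common transformations are binning a numeric column into ranges and log-scaling a skewed one. The sketch below uses made-up columns. The bin edges and labels are illustrative assumptions, not recommendations.

```python
import numpy as np
import pandas as pd

# Hypothetical columns; names and bin edges are illustrative only
people = pd.DataFrame({
    "age": [5, 17, 30, 45, 70],
    "income": [0.0, 1200.0, 35000.0, 80000.0, 20000.0],
})

# Bin a numeric column into labeled ranges (intervals are right-inclusive by default)
people["age_group"] = pd.cut(
    people["age"],
    bins=[0, 12, 19, 64, 120],
    labels=["child", "teen", "adult", "senior"],
)

# log1p computes log(1 + x): it compresses large values and is defined at zero
people["log_income"] = np.log1p(people["income"])

print(people)
```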

Example (Power BI): Creating New Features

Use DAX Calculated Columns:

New_Column = Table[Feature1] * Table[Feature2]

9. Detecting Data Quality Issues

Example (Python): Handling Duplicates & Inconsistencies

# Checking duplicates
print(df.duplicated().sum())

# Removing duplicates
df = df.drop_duplicates()
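
Exact-match duplicate checks miss rows that differ only in spelling, case, or stray spaces. As a rough sketch on made-up records, normalizing text columns first can surface those hidden duplicates.

```python
import pandas as pd

# Hypothetical records where the same city appears with different case and spacing
raw = pd.DataFrame({
    "customer_id": [1, 1, 2, 3],
    "city": ["Delhi", " delhi ", "Mumbai", "mumbai"],
})

exact_dupes = int(raw.duplicated().sum())

# Normalize whitespace and case, then check for duplicates again
clean = raw.assign(city=raw["city"].str.strip().str.lower())
dupes_after_cleaning = int(clean.duplicated().sum())
deduped = clean.drop_duplicates().reset_index(drop=True)

print("exact duplicates:", exact_dupes)
print("duplicates after normalizing:", dupes_after_cleaning)
print(deduped)
```

Only the two rows for customer 1 collapse into one here. Customers 2 and 3 share a city after cleaning but remain separate rows because their IDs differ.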

Example (Power BI): Handling Duplicates

Use Power Query → “Remove Duplicates”.

Tools for EDA

Python: Pandas, NumPy, Seaborn, Matplotlib, Plotly
Power BI & Tableau: Visual Analytics
Excel: Pivot Tables, Charts

Conclusion

EDA is a critical step before applying machine learning or predictive modeling. By following these techniques, you can uncover hidden insights and make data-driven decisions.

Want a hands-on Power BI dashboard for EDA? Let me know in the comments!