Data cleaning is a crucial step in data analysis, ensuring accuracy and consistency in reports. Power Query in Power BI provides powerful tools to clean, filter, and structure data efficiently. One of the most common tasks is removing duplicates and cleaning unnecessary or incorrect data.
In this guide, we will walk through:
- Removing duplicates in Power Query
- Cleaning messy data
- Best practices for data transformation
1. Why Clean Data in Power Query?
Before analysis, raw data often contains inconsistencies, such as:
✔️ Duplicate records that inflate counts and distort insights.
✔️ Missing or null values that break calculations.
✔️ Extra spaces, inconsistent text formatting, or unwanted characters.
✔️ Mismatched column types leading to errors in calculations.
By cleaning data in Power Query, you improve data integrity, accuracy, and efficiency in Power BI.
2. How to Remove Duplicates in Power Query
Step 1: Load Data into Power Query
- Open Power BI Desktop.
- Click on Transform Data to open Power Query Editor.
- Select the table where you want to remove duplicates.
Step 2: Identify and Remove Duplicates
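- Select the column(s) that should uniquely identify a record. To compare entire rows instead, select all columns.
- On the Home tab, click Remove Rows > Remove Duplicates (or right-click a selected column header and choose Remove Duplicates).
- Power Query keeps the first row it encounters for each distinct value (or combination of values) in the selected column(s) and removes the rest. The comparison is case-sensitive, so “ABC” and “abc” count as different values.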
- Click Close & Apply to save changes.
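For reference, the Remove Duplicates button writes a Table.Distinct step into the query. The sketch below shows roughly what that step looks like in the Advanced Editor. The inline sample table and the column names OrderID and Customer are made up for illustration; in a real query, Source would be your own data source step.

```powerquery
let
    // Hypothetical sample data standing in for a real source
    Source = #table(
        {"OrderID", "Customer"},
        {
            {101, "Contoso"},
            {101, "Contoso"},
            {102, "Fabrikam"}
        }
    ),
    // No column list: rows are duplicates only if every column matches
    DistinctRows = Table.Distinct(Source),
    // Column list: rows are duplicates if they share the same OrderID
    DistinctByKey = Table.Distinct(Source, {"OrderID"})
in
    DistinctByKey
```

With this sample data both variants leave two rows, because the repeated row matches on every column.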
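✔ Tip: Be cautious when removing duplicates. Only the first row for each key survives, so if otherwise similar records differ in a column such as a timestamp, decide which column(s) uniquely identify a record and which version of the row you want to keep before removing anything.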
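When the same key appears with different timestamps, one possible approach is to group by the key and keep the row with the latest timestamp, rather than relying on row order. This is a minimal sketch, not the only way to do it. The columns CustomerID, Email, and UpdatedAt and the sample rows are assumptions for illustration.

```powerquery
let
    // Hypothetical sample data: customer 1 appears twice with different dates
    Source = #table(
        type table [CustomerID = Int64.Type, Email = text, UpdatedAt = date],
        {
            {1, "a@example.com", #date(2024, 1, 5)},
            {1, "a.new@example.com", #date(2024, 3, 2)},
            {2, "b@example.com", #date(2024, 2, 10)}
        }
    ),
    // For each CustomerID, pick the record with the largest UpdatedAt
    Grouped = Table.Group(
        Source,
        {"CustomerID"},
        {{"Latest", each Table.Max(_, "UpdatedAt"), type record}}
    ),
    // Turn the chosen record back into ordinary columns
    Result = Table.ExpandRecordColumn(Grouped, "Latest", {"Email", "UpdatedAt"})
in
    Result
```

The expanded columns may need their data types set again afterwards.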
3. Cleaning Data in Power Query
Handling Missing or Null Values
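✔ Replace nulls with a default: select the column, click Transform > Replace Values, and enter null as the value to find.
✔ Use Replace Errors (in the Replace Values dropdown) to swap cell-level errors, such as failed type conversions, for a value of your choice.
✔ Remove rows that are entirely empty using Home > Remove Rows > Remove Blank Rows. To drop rows where only one column is missing, filter that column and clear (null), or choose Remove Empty from its filter menu.
✔ Fill missing values using Transform > Fill > Fill Down or Fill Up. This is useful when a label appears only on the first row of a group, as in many exported sales or inventory reports. Fill only replaces null, so convert empty strings to null first.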
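The options above correspond to a handful of M functions. The sketch below chains them on a small made-up table so the effect of each step is visible in Applied Steps. The column names Region, Product, and Amount and the replacement values are assumptions for illustration.

```powerquery
let
    // Hypothetical sample data with nulls and one fully blank row
    Source = #table(
        {"Region", "Product", "Amount"},
        {
            {"North", "Widget", 120},
            {null, "Gadget", null},
            {"South", "Widget", 80},
            {null, null, null}
        }
    ),
    // Drop rows in which every column is null or an empty string
    NoBlankRows = Table.SelectRows(
        Source,
        each not List.IsEmpty(List.RemoveMatchingItems(Record.FieldValues(_), {"", null}))
    ),
    // Carry the last non-null Region down into the null cells beneath it
    FilledRegion = Table.FillDown(NoBlankRows, {"Region"}),
    // Replace remaining null amounts with 0
    NoNullAmounts = Table.ReplaceValue(FilledRegion, null, 0, Replacer.ReplaceValue, {"Amount"}),
    // Replace any cell-level errors in Amount with null
    NoErrors = Table.ReplaceErrorValues(NoNullAmounts, {{"Amount", null}})
in
    NoErrors
```

Whether 0 is a sensible default for a missing amount depends on the report. In many cases leaving the value as null is safer, because averages and counts then ignore it.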
Standardizing Text and Data Formats
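✔ Use Transform > Format > Trim to remove leading and trailing spaces from text fields, and Clean to strip non-printable characters. Trim does not touch spaces inside the text.
✔ Convert text to lowercase, UPPERCASE, or Capitalize Each Word (proper case) for consistency.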
✔ Change data types (e.g., Date, Number, Text) using the Data Type dropdown.
✔ Use Replace Values to correct common errors in text fields.
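In M, these formatting steps might look roughly like the sketch below. The sample rows, the column names, the “N.Y.” to “New York” correction, and the “en-US” culture are all assumptions chosen for illustration.

```powerquery
let
    // Hypothetical sample data: everything arrives as untidy text
    Source = #table(
        {"Customer", "City", "OrderDate", "Amount"},
        {
            {"  alice SMITH ", "new york", "2024-01-15", "120.50"},
            {"BOB jones", " N.Y. ", "2024-02-03", "80"}
        }
    ),
    // Remove leading and trailing whitespace
    Trimmed = Table.TransformColumns(
        Source,
        {{"Customer", Text.Trim, type text}, {"City", Text.Trim, type text}}
    ),
    // Correct a known abbreviation (what Replace Values does for text)
    FixedCity = Table.ReplaceValue(Trimmed, "N.Y.", "New York", Replacer.ReplaceText, {"City"}),
    // Apply consistent casing
    Cased = Table.TransformColumns(
        FixedCity,
        {{"Customer", Text.Proper, type text}, {"City", Text.Proper, type text}}
    ),
    // Set data types last, with an explicit culture for parsing
    Typed = Table.TransformColumnTypes(
        Cased,
        {{"OrderDate", type date}, {"Amount", type number}},
        "en-US"
    )
in
    Typed
```

Trimming before the replacement and the type change matters here: stray spaces can stop a replacement from matching.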
Splitting & Merging Columns
✔ Split names, addresses, or codes into separate columns using Split Column by Delimiter.
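✔ Merge related columns (e.g., first and last name) using Merge Columns. The Transform tab version replaces the original columns; the Add Column tab version keeps them and adds a new column.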
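A minimal sketch of both operations in M follows. The columns FullName, Street, and City, the new column names, and the delimiters are assumptions for illustration.

```powerquery
let
    // Hypothetical sample data
    Source = #table(
        {"FullName", "Street", "City"},
        {
            {"Alice Smith", "1 Main St", "Springfield"},
            {"Bob Jones", "22 Oak Ave", "Shelbyville"}
        }
    ),
    // Split FullName at the space into two new columns
    SplitName = Table.SplitColumn(
        Source,
        "FullName",
        Splitter.SplitTextByDelimiter(" ", QuoteStyle.Csv),
        {"FirstName", "LastName"}
    ),
    // Merge Street and City into one Address column, separated by ", "
    MergedAddress = Table.CombineColumns(
        SplitName,
        {"Street", "City"},
        Combiner.CombineTextByDelimiter(", ", QuoteStyle.None),
        "Address"
    )
in
    MergedAddress
```

Names with more than two parts will not fit cleanly into two columns, so check the split results before relying on them.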
4. Best Practices for Data Cleaning in Power Query
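✅ Keep the raw data intact. Power Query never edits the source itself, but it helps to leave one query untransformed and build the cleaning steps in a query that references it.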
✅ Use Applied Steps in Power Query to track changes and revert if needed.
✅ Check for inconsistent data types before applying transformations.
✅ Preview affected rows before removing duplicates, for example with Home > Keep Rows > Keep Duplicates.
✅ Document key transformations for easy troubleshooting and collaboration.
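As one way to combine the previewing and documenting advice, the sketch below counts how many rows share each key before anything is removed, with a comment on each step. The sample table and the column names OrderID, Customer, and RowCount are hypothetical.

```powerquery
let
    // Hypothetical sample data
    Source = #table(
        {"OrderID", "Customer"},
        {
            {101, "Contoso"},
            {101, "Contoso Ltd"},
            {102, "Fabrikam"}
        }
    ),
    // Count how many rows share each OrderID
    Counts = Table.Group(
        Source,
        {"OrderID"},
        {{"RowCount", each Table.RowCount(_), Int64.Type}}
    ),
    // Keep only the keys that occur more than once
    DuplicateKeys = Table.SelectRows(Counts, each [RowCount] > 1)
in
    DuplicateKeys
```

Comments written with // in the Advanced Editor stay with the query, and renaming steps to descriptive names makes the Applied Steps list easier to follow.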
Conclusion
Cleaning data in Power Query helps you build accurate, reliable reports in Power BI. By removing duplicates, handling missing values, and standardizing data, you can improve your analysis and decision-making.
Mastering Power Query’s data cleaning techniques will make you a more efficient data analyst or Power BI user!