Despite the availability of various enterprise-grade ETL tools in the market, Power Query and Dataflows remain the transformation tools of choice for many companies and BI professionals. Whether due to budget constraints, ease of use, or seamless integration with the Microsoft ecosystem, organizations continue to rely on Power Query for their data transformation needs.
However, without proper optimization, Power Query can become a performance bottleneck in your BI solution. As data volumes grow and transformations become more complex, understanding how to optimize your Power Query operations becomes crucial. In this guide, I’ll share essential tips that can significantly improve your query performance and refresh times.
This article is also available on my Medium.com
The Fundamentals: Query Structure
The foundation of an efficient Power Query solution lies in how you structure your queries. Here are the key principles:
Filter Early, Transform Later
Always filter and reduce the size of your data as early as possible in your query. This simple practice can dramatically reduce processing time and memory usage.
// Bad Practice
let
Source = Table.FromRows(...),
Transform = Table.TransformColumns(...), // Transforming entire dataset
Filter = Table.SelectRows(Transform, each [Amount] > 100)
in
Filter
// Good Practice
let
Source = Table.FromRows(...),
Filter = Table.SelectRows(Source, each [Amount] > 100), // Filter first
Transform = Table.TransformColumns(Filter, ...) // Transform smaller dataset
in
Transform
Remove Unnecessary Columns Early
Don’t carry unused columns through your transformations. Remove them as early as possible to reduce memory usage.
// Do this early in your query
let
Source = Excel.Workbook(...),
SelectColumns = Table.SelectColumns(
Source,
{"ID", "Date", "Amount"} // Keep only what you need
)
Set Correct Data Types Immediately
Set your data types right after sourcing the data. This prevents multiple type transformations and potential errors down the line.
Smart Query Organization
Reference vs. Duplicate: Choose Wisely
Power Query offers two ways to reuse queries: references and duplicates. References are generally the preferred choice for performance as they execute transformations only once and reuse the results across all referring queries. This is particularly beneficial when dealing with complex transformations or large datasets.
While duplicates create independent copies and might seem simpler, they can impact performance as each duplicate performs the same transformations independently. However, duplicates might be appropriate in specific scenarios:
- When you need to make significant changes to the query logic
- When you want to ensure changes to one query don’t affect others
- For documentation or testing purposes
In most cases, using references will lead to better performance and maintainability. They ensure consistency across your solution and reduce the overall processing load during refresh.
The choice between reference and duplicate can also depend on your refresh strategy. If you want queries to run in parallel, duplicates allow independent processing. However, if you prefer transformations to be performed once and then reused subsequently, references are the better choice. Consider your specific scenario and refresh requirements when making this decision.
Group Queries Logically
Instead of having all your queries in a flat structure, organize them into logical groups. Create staging queries for initial loads, reference queries for shared logic, and final queries for output. This not only improves maintainability but also helps in performance optimization by making dependencies clear.
Query Folding: The Performance Multiplier
Query folding is one of Power Query’s most powerful yet least understood features. When query folding occurs, Power Query translates your transformations back into native queries (like SQL) that execute directly at the data source, rather than processing row by row in memory.
For example, if you’re connecting to a SQL database with millions of rows and apply a filter in Power Query, with query folding, that filter becomes part of the SQL query - meaning only the filtered data is actually sent to Power BI.
Operations That Support Folding:
- Filtering
- Sorting
- Basic aggregations
- Merges (joins)
- Column selection
Operations That Break Folding:
- Complex custom columns
- Index columns
- Running totals
- Most custom functions
Query folding is usually a good practice, however, there are situations that query folding is not recommended such as:
- Source computation is expensive
- Heavy load on source
To verify if your transformations are folding in Power Query Desktop, right-click on any step and select “View Native Query”. If you see a SQL statement, it means your transformations up to that point are being folded back to the source. If the option is grayed out or no SQL appears, that step has broken folding.
For Dataflows, if there is a query folding icon on the right side of step, it means that the operations up to that step are evaluated by the data source (are folding).
To learn more about Query Folding, please refer to Microsoft documentation.
Memory Management: To Buffer or Not to Buffer
Table.Buffer() is a powerful function that can either significantly improve or severely hurt performance, depending on how it’s used. When you buffer a table, Power Query loads the entire table into memory, which has important implications.
How Buffer Works:
- The entire table is loaded into memory
- Stays in memory until query completion
- Can speed up repeated operations
- Breaks query folding
- Memory is automatically released after query completion
Use Buffer When:
- Working with small lookup tables used multiple times
// Good: Buffer small product lookup table used repeatedly
let
Products = Table.Buffer(SmallProductTable),
Result1 = Table.Join(Sales1, "ProductID", Products, "ID"),
Result2 = Table.Join(Sales2, "ProductID", Products, "ID")
in
Result2
- After filtering/aggregating to a smaller dataset
- Preventing multiple executions of expensive operations
Avoid Buffer When:
- Working with large fact tables (can cause memory issues)
- Before filtering operations (buffers unnecessary data)
- On single-use transformations (adds overhead with no benefit)
- When query folding is more beneficial
Buffer Tricky Scenarios:
// Tricky: Buffering before vs after filtering
let
Source = Sql.Database("server", "db"),
FullTable = Source{[Schema="dbo",Item="Sales"]}[Data],
// BAD: Buffering full table
BufferedFull = Table.Buffer(FullTable),
FilteredAfter = Table.SelectRows(BufferedFull, each [Year] = 2024),
// GOOD: Filter first, then buffer if needed
FilteredFirst = Table.SelectRows(FullTable, each [Year] = 2024),
BufferedFiltered = Table.Buffer(FilteredFirst)
in
BufferedFiltered
Common Anti-Patterns to Avoid
- Multiple Type Transformations
- Set types once, early in your query
- Avoid changing types multiple times
- Complex Calculations on Full Datasets
- Filter and reduce data size first
- Perform complex calculations on smaller datasets
- Unnecessary Transformations
- Every step has a performance cost
- Combine steps when possible
- Remove unused steps
Advanced Optimization Tips
Parallel Query Processing
Power Query can process multiple queries in parallel, but only if they don’t have dependencies. When designing your solution, consider breaking large queries into independent parts that can be processed simultaneously.
Error Handling Strategy
While error handling is important, excessive error handling in every step can impact performance. Focus error handling on critical points where errors are likely to occur, such as data source connections and type transformations.
// Strategic error handling
let
Source = try Sql.Database("server", "db")
otherwise #table({"Error"}, {{"Connection Failed"}}),
// Don't wrap every step in try/otherwise
Transform = Table.SelectRows(Source, each [Amount] > 0)
in
Transform
Date Tables and Calendar Calculations
Instead of calculating date-related transformations in every query, create a single date table and reference it. This is especially important when working with fiscal calendars or custom date logic.
Performance Optimization Checklist
✅ Filter data as early as possible
✅ Remove unnecessary columns early
✅ Set correct data types immediately
✅ Check for query folding
✅ Buffer small, frequently-used tables only
✅ Combine steps when reasonable
✅ Remove unused steps
✅ Use reference queries for shared logic
✅ Organize queries logically
Monitoring Performance
To monitor and optimize Power Query performance:
- Use Query Diagnostics to analyze query execution
- Monitor Windows Task Manager for memory usage during refresh
- Test refresh times with different optimization strategies
- Review data source query logs (if available)
Conclusion
Optimizing Power Query doesn’t have to be complex. Start with proper query structure, understand query folding, and use buffering judiciously. These fundamentals will help you build efficient, maintainable data transformations.
Remember: The best optimization is often the simplest one. Focus on getting the basics right first, then dive into more advanced techniques as needed.