Power Query Optimization: Essential Tips for Better Performance

Learn essential Power Query optimization techniques to dramatically improve refresh times and performance. From query folding to efficient memory management, discover practical tips for faster data transformations in Power BI and Excel. A must-read guide for BI developers and data analysts working with large datasets.

Despite the availability of various enterprise-grade ETL tools in the market, Power Query and Dataflows remain the transformation tools of choice for many companies and BI professionals. Whether due to budget constraints, ease of use, or seamless integration with the Microsoft ecosystem, organizations continue to rely on Power Query for their data transformation needs.

However, without proper optimization, Power Query can become a performance bottleneck in your BI solution. As data volumes grow and transformations become more complex, understanding how to optimize your Power Query operations becomes crucial. In this guide, I’ll share essential tips that can significantly improve your query performance and refresh times.

This article is also available on my Medium.com

The Fundamentals: Query Structure

The foundation of an efficient Power Query solution lies in how you structure your queries. Here are the key principles:

Filter Early, Transform Later

Always filter and reduce the size of your data as early as possible in your query. This simple practice can dramatically reduce processing time and memory usage.

// Bad Practice
let
    Source = Table.FromRows(...),
    Transform = Table.TransformColumns(...),  // Transforming entire dataset
    Filter = Table.SelectRows(Transform, each [Amount] > 100)
in
    Filter

// Good Practice
let
    Source = Table.FromRows(...),
    Filter = Table.SelectRows(Source, each [Amount] > 100),  // Filter first
    Transform = Table.TransformColumns(Filter, ...)  // Transform smaller dataset
in
    Transform

Remove Unnecessary Columns Early

Don’t carry unused columns through your transformations. Remove them as early as possible to reduce memory usage.

// Do this early in your query
let
    Source = Excel.Workbook(...),
    SelectColumns = Table.SelectColumns(
        Source, 
        {"ID", "Date", "Amount"}  // Keep only what you need
    )

Set Correct Data Types Immediately

Set your data types right after sourcing the data. This prevents multiple type transformations and potential errors down the line.

Smart Query Organization

Reference vs. Duplicate: Choose Wisely

Power Query offers two ways to reuse queries: references and duplicates. References are generally the preferred choice for performance as they execute transformations only once and reuse the results across all referring queries. This is particularly beneficial when dealing with complex transformations or large datasets.

While duplicates create independent copies and might seem simpler, they can impact performance as each duplicate performs the same transformations independently. However, duplicates might be appropriate in specific scenarios:

  • When you need to make significant changes to the query logic
  • When you want to ensure changes to one query don’t affect others
  • For documentation or testing purposes

In most cases, using references will lead to better performance and maintainability. They ensure consistency across your solution and reduce the overall processing load during refresh.

The choice between reference and duplicate can also depend on your refresh strategy. If you want queries to run in parallel, duplicates allow independent processing. However, if you prefer transformations to be performed once and then reused subsequently, references are the better choice. Consider your specific scenario and refresh requirements when making this decision.

Group Queries Logically

Instead of having all your queries in a flat structure, organize them into logical groups. Create staging queries for initial loads, reference queries for shared logic, and final queries for output. This not only improves maintainability but also helps in performance optimization by making dependencies clear.

Query Folding: The Performance Multiplier

Query folding is one of Power Query’s most powerful yet least understood features. When query folding occurs, Power Query translates your transformations back into native queries (like SQL) that execute directly at the data source, rather than processing row by row in memory.

For example, if you’re connecting to a SQL database with millions of rows and apply a filter in Power Query, with query folding, that filter becomes part of the SQL query - meaning only the filtered data is actually sent to Power BI.

Operations That Support Folding:

  • Filtering
  • Sorting
  • Basic aggregations
  • Merges (joins)
  • Column selection

Operations That Break Folding:

  • Complex custom columns
  • Index columns
  • Running totals
  • Most custom functions

Query folding is usually a good practice, however, there are situations that query folding is not recommended such as:

  • Source computation is expensive
  • Heavy load on source

To verify if your transformations are folding in Power Query Desktop, right-click on any step and select “View Native Query”. If you see a SQL statement, it means your transformations up to that point are being folded back to the source. If the option is grayed out or no SQL appears, that step has broken folding.

For Dataflows, if there is a query folding icon on the right side of step, it means that the operations up to that step are evaluated by the data source (are folding).

To learn more about Query Folding, please refer to Microsoft documentation.

Memory Management: To Buffer or Not to Buffer

Table.Buffer() is a powerful function that can either significantly improve or severely hurt performance, depending on how it’s used. When you buffer a table, Power Query loads the entire table into memory, which has important implications.

How Buffer Works:

  • The entire table is loaded into memory
  • Stays in memory until query completion
  • Can speed up repeated operations
  • Breaks query folding
  • Memory is automatically released after query completion

Use Buffer When:

  • Working with small lookup tables used multiple times
// Good: Buffer small product lookup table used repeatedly
let
    Products = Table.Buffer(SmallProductTable),
    Result1 = Table.Join(Sales1, "ProductID", Products, "ID"),
    Result2 = Table.Join(Sales2, "ProductID", Products, "ID")
in
    Result2
  • After filtering/aggregating to a smaller dataset
  • Preventing multiple executions of expensive operations

Avoid Buffer When:

  • Working with large fact tables (can cause memory issues)
  • Before filtering operations (buffers unnecessary data)
  • On single-use transformations (adds overhead with no benefit)
  • When query folding is more beneficial

Buffer Tricky Scenarios:

// Tricky: Buffering before vs after filtering
let
    Source = Sql.Database("server", "db"),
    FullTable = Source{[Schema="dbo",Item="Sales"]}[Data],
    
    // BAD: Buffering full table
    BufferedFull = Table.Buffer(FullTable),
    FilteredAfter = Table.SelectRows(BufferedFull, each [Year] = 2024),
    
    // GOOD: Filter first, then buffer if needed
    FilteredFirst = Table.SelectRows(FullTable, each [Year] = 2024),
    BufferedFiltered = Table.Buffer(FilteredFirst)
in
    BufferedFiltered

Common Anti-Patterns to Avoid

  1. Multiple Type Transformations
    • Set types once, early in your query
    • Avoid changing types multiple times
  2. Complex Calculations on Full Datasets
    • Filter and reduce data size first
    • Perform complex calculations on smaller datasets
  3. Unnecessary Transformations
    • Every step has a performance cost
    • Combine steps when possible
    • Remove unused steps

Advanced Optimization Tips

Parallel Query Processing

Power Query can process multiple queries in parallel, but only if they don’t have dependencies. When designing your solution, consider breaking large queries into independent parts that can be processed simultaneously.

Error Handling Strategy

While error handling is important, excessive error handling in every step can impact performance. Focus error handling on critical points where errors are likely to occur, such as data source connections and type transformations.

// Strategic error handling
let
    Source = try Sql.Database("server", "db")
            otherwise #table({"Error"}, {{"Connection Failed"}}),
    // Don't wrap every step in try/otherwise
    Transform = Table.SelectRows(Source, each [Amount] > 0)
in
    Transform

Date Tables and Calendar Calculations

Instead of calculating date-related transformations in every query, create a single date table and reference it. This is especially important when working with fiscal calendars or custom date logic.

Performance Optimization Checklist

✅ Filter data as early as possible 
✅ Remove unnecessary columns early 
✅ Set correct data types immediately 
✅ Check for query folding 
✅ Buffer small, frequently-used tables only 
✅ Combine steps when reasonable 
✅ Remove unused steps 
✅ Use reference queries for shared logic 
✅ Organize queries logically

Monitoring Performance

To monitor and optimize Power Query performance:

  • Use Query Diagnostics to analyze query execution
  • Monitor Windows Task Manager for memory usage during refresh
  • Test refresh times with different optimization strategies
  • Review data source query logs (if available)

Conclusion

Optimizing Power Query doesn’t have to be complex. Start with proper query structure, understand query folding, and use buffering judiciously. These fundamentals will help you build efficient, maintainable data transformations.

Remember: The best optimization is often the simplest one. Focus on getting the basics right first, then dive into more advanced techniques as needed.

Leave a Reply