Table of contents
INTRODUCTION
Since their first introduction in February 2024, the Excel functions PivotBy and GroupBy have quickly attracted and excited Excel users around the world.
Yes, believe it or not, people can get super excited over Excel functions
What are the PivotBy and GroupBy functions about?
PivotBy and GroupBy allow you to aggregate and display data dynamically and flexibly, in ways that are very similar to what conventional PivotTables can do, but freeing you from certain constraints associated with PivotTables.
While PivotTables are often perceived as powerful but inflexible when it comes to manipulating large quantities of data, PivotBy and GroupBy offer an alternative that allows users to structure their data directly with formulas.
In this article, we will explore their usefulness, detail their syntax, and provide concrete examples of how to harness their potential.
About array formulas
Before diving into the heart of our subject, let's first clarify that PivotBy and GroupBy are array formulas.
Despite having been out there since 2019, many Excel users are still unaware of array formulas. Simply put, an array formula is a single formula that can process a whole group of data—like rows or columns—simultaneously, spilling over a variable number of rows and columns. they bring powerful capabilities and a lot of flexibility to handle complex tasks with modern Excel.
If you want to know more about array formulas, feel free to have a look at those 2 articles:
How to use the PivotBy function in Excel
What is PivotBy?
Excel's PivotBy function dynamically groups and summarizes data according to one or more columns. It is particularly useful for quickly organizing data and obtaining results without the need for a conventional PivotTable.
Syntax
Argument | Description |
row_fields | Required. A column-based array or range that holds the values used to group rows and create row headers. |
col_fields | Required. A range of values designated for grouping data into columns. This range determines how the data is structured and displayed across the columns. |
values | Required. A column-oriented array or range containing the data to be aggregated. This range serves as the input for calculations or summaries, such as sums or averages, based on the specified grouping criteria. |
function | Required. A specific or simplified lambda expression (such as SUM, PERCENTOF, AVERAGE, COUNT, etc.) used to aggregate values. This defines the method for calculating or summarizing the grouped data within the array or range. |
field_header | A numerical value that determines if the row_fields and values include headers and whether field headers should appear in the results. The possible values are:
This range is used to determine how the data is grouped, aggregated, and displayed in the output. |
row_total_depth | This setting specifies whether the row headers should include totals and the type of totals to display. The possible values are:
|
row_sort_order | A numerical value that determines how rows are sorted. The numbers correspond to the columns in row_fields, followed by the columns in values. If the value is negative, the rows are sorted in descending or reverse order.This setting is used to control the order in which rows are displayed in the output. |
col_total_depth | This option determines whether column headers should include totals and the type of totals to display. The possible values are:
|
col-sort order | A numeric value specifying the sorting order for rows. The value corresponds to columns in col_fields followed by columns in values. Negative numbers indicate sorting in descending or reverse order. This setting is used to control the sorting order of rows in the output. |
filter_array | A column-oriented 1D array of boolean values that determines whether each corresponding row of data is included in the output. This array allows you to filter data by defining which rows should be considered, making it easier to focus on specific subsets of information. |
relative_to | When using an aggregation function that requires two arguments, relative_to specifies the values provided as the second argument. This is commonly used with the PERCENTOF function. The available options are:
The relative_to setting allows you to control the reference values used in calculations for aggregation functions, such as PERCENTOF. |
This table seems awfully complicated, so let's check some concrete examples to make things clearer.
Example 1: Generating the total sales reporting per product per year
For now, let's use only mandatory fields and leave aside optional fields for now.
In the above example; PivotBy groups the Products in rows, groups the Years in columns and, then agregates the values from the Sales column with a SUM function.
As you can see, by default Excel adds totals for both rows and columns.
Example 2: Filtering out specific rows with both grand totals and subtotals
Now let's say we want to exclude bananas from the sum of yearly sales per products, we thus need to use the filter_array argument.
You can do the same using the usual logical operators. In the example above, the "<>" sign means "different from".
Example 3: Sorting results
Let's saw we now want to sort the results by ascending order. To achieve this, we can use the row_sort_order or col_sort_order arguments.
In the example above, we use the row_sort_order argument to sort our data by descending order using the "-2" parameter, as seen in the syntax section.
As a result, the values in the total column will be sorted from largest to smallest, providing a clear view of which products are generating the most revenue. If you want to sort instead by ascending order, we use "2" instead of "-2".
How to use the GroupBy function in Excel
What is GroupBy?
The Excel GroupBy function groups data into a defined table or range, applying customized aggregations. It allows you to structure complex data while calculating metrics tailored to each group.
Why use GroupBy?
In-depth analysis: It's perfect for creating advanced aggregations, such as sum and average on the same groups, or even calculations on sub-groups.
Customization: Unlike PivotBy, GroupBy gives you more freedom to define your aggregation criteria and manage multiple grouping columns.
Strength : By manipulating an entire table, it enables you to create rich, detailed reports in a single formula.
Syntax
The only thing that differs form the Pivotby's syntax is the argument field_relationship which defines the relationship between fields when multiple columns are used in row_fields. The options are:
0: Hierarchy (default): Later field columns are sorted based on the hierarchy of earlier columns.
1: Table: Each field column is sorted independently. Subtotals are not supported in this mode, as they depend on hierarchical data.
By selecting the appropriate relationship, you can control how the data is grouped and sorted, providing more flexibility in organizing and analyzing your dataset.
Example 1: Generate the total sales by year
In the above example, we simply grouped the total of sales by year.
Example 2: Apply multiple aggregation functions simultaneously to the same dataset
Sometimes, it is necessary to compare different aggregations within the same table. To summarize the same data using different aggregations, you can position these aggregations in adjacent rows or columns to view it all side by side. Use HSTACK for horizontal arrangements or VSTACK for vertical arrangements to achieve this layout.
Let's push again the example further and sort the aggregated values with row_sort_order. We use the exact same technic as previously seen with Pivotby:
As a result, the values in the total column will be sorted from largest to smallest providing a clear view of which year represented contributed the largest share of sales.
How do the PivotBy and GroupBy functions compare to Excel's PivotTables?
What we like about PivotBy:
✅Flexibility: You can easily configure (or change) different aggregators (sum, average, count), and you can easily add extra dimensions for more detailed analysis.
✅Dynamicity: Results update automatically as the source data changes. No need to manually trigger a refresh of your PivotTable for instance.
✅Visualization: Functions (including array functions) can be used to create charts of variable size. With PivotTables, certain chart types are not allowed (Maps or Waterfall charts for instance), but using formulas can remove these limitations.
✅Filtering: If you want to filter data based on a cell or range from another sheet, no problem you can just use the reference in the filter argument. For PivotTable, it would be much more complicated to get a similar result.
Yet there are some significant limitations:
❌Complexity: Maybe it just takes a while to get used to it, but it feels more complex to use PivotBy compared to a simple PivotTable. When you use a function, you need to get the function right so it doesn't return an error, while with a PivotTable you can more easily adjust and complexity your report along the way.
❌Managing multiple fields: Using these functions when working with a multiple-fields report can quickly become very complex, while all it takes for PivotTables is the drag and drop of a new field.
❌Formatting: With an array function like PivotBy, the results are dynamic, but the formatting is not. With a PivotTable, it would be much easier to apply a specific format to all headers and totals for instance.
❌Visualization: Except for the chart types mentioned above, it's probably a bit easier to create visualizations using PivotCharts, especially when you combine them with Slicers and Timelines (see How to create impressive Excel dashboards).
❌No direct access to the underlying data: With PivotTables, you can double-click on values to get a sheet containing a copy of the underlying data for that value. There is no similar possibility with PivotBy and GroupBy.
To summarize, we believe that most users will probably still rely on PivotTables because of their ease of use and intuitivity. Situations where PivotBy and GroupBy would perform better will definitey arise, but they may not be that frequent.