How to use the new PivotBy and GroupBy functions in Excel

Table of contents

Intro
- What are the PivotBy and GroupBy functions about?
- What are array formulas?
How to use the PivotBy function in Excel
How to use the GroupBy function in Excel
How do the PivotBy and GroupBy functions compare to Excel's PivotTables?

INTRODUCTION

Since their first introduction in February 2024, the Excel functions PivotBy and GroupBy have quickly attracted and excited Excel users around the world.

Yes, believe it or not, people can get super excited over Excel functions

What are the PivotBy and GroupBy functions about?

PivotBy and GroupBy allow you to aggregate and display data dynamically and flexibly, in ways that are very similar to what conventional PivotTables can do, but freeing you from certain constraints associated with PivotTables.

While PivotTables are often perceived as powerful but inflexible when it comes to manipulating large quantities of data, PivotBy and GroupBy offer an alternative that allows users to structure their data directly with formulas.

In this article, we will explore their usefulness, detail their syntax, and provide concrete examples of how to harness their potential.

About array formulas

Before diving into the heart of our subject, let's first clarify that PivotBy and GroupBy are array formulas.

Despite having been out there since 2019, many Excel users are still unaware of array formulas. Simply put, an array formula is a single formula that can process a whole group of data—like rows or columns—simultaneously, spilling over a variable number of rows and columns. they bring powerful capabilities and a lot of flexibility to handle complex tasks with modern Excel.

If you want to know more about array formulas, feel free to have a look at those 2 articles:

How to use the PivotBy function in Excel

What is PivotBy?

Excel's PivotBy function dynamically groups and summarizes data according to one or more columns. It is particularly useful for quickly organizing data and obtaining results without the need for a conventional PivotTable.

Syntax

Argument	Description
row_fields	Required. A column-based array or range that holds the values used to group rows and create row headers.This range determines how the data is structured and displayed across the rows.
col_fields	Required. A range of values designated for grouping data into columns. This range determines how the data is structured and displayed across the columns.
values	Required. A column-oriented array or range containing the data to be aggregated. This range serves as the input for calculations or summaries, such as sums or averages, based on the specified grouping criteria.
function	Required. A specific or simplified lambda expression (such as SUM, PERCENTOF, AVERAGE, COUNT, etc.) used to aggregate values. This defines the method for calculating or summarizing the grouped data within the array or range.
field_header	A numerical value that determines if the row_fields and values include headers and whether field headers should appear in the results. The possible values are: Missing (default): Automatic behavior. 0: No headers. 1: Yes, but do not display them. 2: No headers, but generate them. 3: Yes, and display them. This range is used to determine how the data is grouped, aggregated, and displayed in the output.
row_total_depth	This setting specifies whether the row headers should include totals and the type of totals to display. The possible values are: Missing (default): Automatic, with grand totals and, where applicable, subtotals. 0: No totals displayed. 1: Grand totals only. 2: Both grand totals and subtotals. -1: Grand totals displayed at the top. -2: Both grand totals and subtotals displayed at the top
row_sort_order	A numerical value that determines how rows are sorted. The numbers correspond to the columns in row_fields, followed by the columns in values. If the value is negative, the rows are sorted in descending or reverse order.This setting is used to control the order in which rows are displayed in the output.
col_total_depth	This option determines whether column headers should include totals and the type of totals to display. The possible values are: Missing (default): Automatic, with grand totals and, where applicable, subtotals. 0: No totals displayed. 1: Grand totals only. 2: Both grand totals and subtotals. -1: Grand totals displayed at the top. -2: Both grand totals and subtotals displayed at the top.
col-sort order	A numeric value specifying the sorting order for rows. The value corresponds to columns in col_fields followed by columns in values. Negative numbers indicate sorting in descending or reverse order. This setting is used to control the sorting order of rows in the output.
filter_array	A column-oriented 1D array of boolean values that determines whether each corresponding row of data is included in the output. This array allows you to filter data by defining which rows should be considered, making it easier to focus on specific subsets of information.
relative_to	When using an aggregation function that requires two arguments, relative_to specifies the values provided as the second argument. This is commonly used with the PERCENTOF function. The available options are: 0: Total of all columns (default). 1: Total of all rows. 2: Grand totals. 3: Total of the parent column. 4: Total of the parent row. The relative_to setting allows you to control the reference values used in calculations for aggregation functions, such as PERCENTOF.

This table seems awfully complicated, so let's check some concrete examples to make things clearer.

Example 1: Generating the total sales reporting per product per year

Group the products in rows, group the years in columns and we agregated the sales by using the SUM function with Pivotby in Excel

For now, let's use only mandatory fields and leave aside optional fields.

In the above example; PivotBy groups the Products in rows, groups the Years in columns and, then agregates the values from the Sales column with a SUM function.

As you can see, by default Excel adds totals for both rows and columns.

Example 2: Filtering out specific rows with both grand totals and subtotals

Now let's say we want to exclude bananas from the sum of yearly sales per products, we thus need to use the [filter_array] argument.

exclude bananas from the sum of yearly sales per products, we thus need to use the filter_array argument and can combine it with logical operators with Pivotby

You can do the same using the usual logical operators. In the example above, the "<>" sign means "different from".

Example 3: Sorting results

Let's saw we now want to sort the results by ascending order. To achieve this, we can use the [row_sort_order] or [col_sort_order] arguments.

Sort the results by aggregated values to provide more meaningful insights with Pivotby on Excel. To achieve this, you can modify the row_sort_order or col_sort_order argument as needed.

In the example above, we use the [row_sort_order] argument to sort our data by descending order using the "-2" parameter, as seen in the syntax section.

As a result, the values in the total column will be sorted from largest to smallest, providing a clear view of which products are generating the most revenue. If you want to sort instead by ascending order, we use "2" instead of "-2".

How to use the GroupBy function in Excel

What is GroupBy?

The Excel GroupBy function groups data into a defined table or range, applying customized aggregations. It allows you to structure complex data while calculating metrics tailored to each group.

Why use GroupBy?

In-depth analysis: It's perfect for creating advanced aggregations, such as sum and average on the same groups, or even calculations on sub-groups.
Customization: Unlike PivotBy, GroupBy gives you more freedom to define your aggregation criteria and manage multiple grouping columns.
Strength: By manipulating an entire table, it enables you to create rich, detailed reports in a single formula.

Syntax

The only things that differs from the Pivotby's syntax is the argument [field_relationship] which defines the relationship between fields when multiple columns are used in row_fields. The options are:

0: Hierarchy (default): Later field columns are sorted based on the hierarchy of earlier columns.
1: Table: Each field column is sorted independently. Subtotals are not supported in this mode, as they depend on hierarchical data.

But also the argument col_fields that is not present within Groupby's syntax. Meaning that a range of values cannot be designated for grouping data into columns, we can only chose values to be grouped in rows and thus create row headers.

By selecting the appropriate relationship, you can control how the data is grouped and sorted, providing more flexibility in organizing and analyzing your dataset.

Example 1: Generate the total sales by year

Group the total of sales by year with the Goupby function in Excel.

In the above example, we simply grouped the total of sales by year.

Example 2: Apply multiple aggregation functions simultaneously to the same dataset

compare different aggregations within the same table. To analyze the same metric using different aggregations, you can position these aggregations in adjacent rows or columns to view all summaries side by side. Use HSTACK for horizontal arrangements or VSTACK for vertical arrangements to achieve this layout.

Sometimes, it is necessary to compare different aggregations within the same table. To summarize the same data using different aggregations, you can position these aggregations in adjacent rows or columns to view it all side by side. Use HSTACK for horizontal arrangements or VSTACK for vertical arrangements to achieve this layout.

Let's push again the example further and sort the aggregated values with [row_sort_order]. We use the exact same technic as previously seen with Pivotby:

sort the aggregated values with row_sort_order on Excel with the new GroupBy function

As a result, the values in the total column will be sorted from largest to smallest providing a clear view of which year represented contributed the largest share of sales.

N.B., if you want to compare this last example with how it would look with Pivotby, here is our table when we use HSTACK as a function within Pivotby's syntax:

Here, we have the detail per product per year.

How do the PivotBy and GroupBy functions compare to Excel's PivotTables?

What we like about PivotBy:

✅Flexibility: You can easily configure (or change) different aggregators (sum, average, count), and you can easily add extra dimensions for more detailed analysis.

✅Dynamicity: Results update automatically as the source data changes. No need to manually trigger a refresh of your PivotTable for instance.

✅Visualization: Functions (including array functions) can be used to create charts of variable size. With PivotTables, certain chart types are not allowed (Maps or Waterfall charts for instance), but using formulas can remove these limitations.

✅Filtering: If you want to filter data based on a cell or range from another sheet, no problem you can just use the reference in the filter argument. For PivotTable, it would be much more complicated to get a similar result.

Yet there are some significant limitations:

❌Complexity: Maybe it just takes a while to get used to it, but it feels more complex to use PivotBy compared to a simple PivotTable. When you use a function, you need to get the function right so it doesn't return an error, while with a PivotTable you can more easily adjust and complexify your report along the way.

❌Managing multiple fields: Using these functions when working with a multiple-fields report can quickly become very complex, while all it takes for PivotTables is the drag and drop of a new field.

❌Formatting: With an array function like PivotBy, the results are dynamic, but the formatting is not. With a PivotTable, it would be much easier to apply a specific format to all headers and totals for instance.

❌Visualization: Except for the chart types mentioned above, it's probably a bit easier to create visualizations using PivotCharts, especially when you combine them with Slicers and Timelines (see How to create impressive Excel dashboards).

❌No direct access to the underlying data: With PivotTables, you can double-click on values to get a sheet containing a copy of the underlying data for that value. There is no similar possibility with PivotBy and GroupBy.

To summarize, we believe that most users will probably still rely on PivotTables because of their ease of use and intuitivity. Situations where PivotBy and GroupBy would perform better will definitey arise, but they may not be that frequent.