Aggregate functions

Andrey Aksenov

Contents

Overview
Usage examples

Aggregate functions compute a single result from a set of input values and are typically used within SELECT queries to summarize data across multiple rows. Common examples include calculating sums, counts, or averages.

Typical aggregate functions are SUM(), COUNT(), AVG(), MIN(), and MAX(). When combined with GROUP BY, these functions compute aggregated values for each group. The HAVING clause allows you to filter groups after aggregation, similar to how WHERE filters individual rows before grouping.

Most aggregate functions take a single argument, which is the column or expression to be aggregated:

<aggregate_name>(expression)

where:

aggregate_name — the name of the function (for example, SUM, AVG, COUNT) that performs the aggregation.
expression — the column or expression whose values are aggregated.

For example, SUM(quantity) calculates the total quantity by summing all values in the quantity column. COUNT(product) counts the number of non-null values in the product column.

Some aggregate functions accept multiple arguments:

<aggregate_name>(expression [, ...] )

For example, string_agg(<expression>, <delimiter>) can concatenate strings with the specified delimiter.

You can specify whether to aggregate all or distinct values and filter rows before aggregation using the following syntax:

<aggregate_name>( [ ALL | DISTINCT ] expression [, ...] ) [ FILTER ( WHERE <condition> ) ]

where:

ALL — (optional, default) includes all input values in the aggregation.
DISTINCT — (optional) aggregates only unique input values, ignoring duplicates.
FILTER (WHERE <condition>) — (optional) applies a condition to filter rows used in aggregation. When FILTER is used, only the input rows that satisfy the <condition> are included in the aggregate calculation.

An aggregate expression applies an aggregate function to a set of rows selected by a query. An aggregate function performs a calculation over multiple values and returns a single result, such as a sum, count, or average.

The syntax of an aggregate expression is one of the following:

<aggregate_name>(expression [, …]) [ FILTER (WHERE <condition>) ] — aggregates all non-null input values.
<aggregate_name>(ALL expression [, …]) [ FILTER (WHERE <condition>) ] — identical to the previous form; ALL is the default and aggregates all non-null input values.
<aggregate_name>(DISTINCT expression [, …]) [ FILTER (WHERE <condition>) ] — aggregates only distinct non-null input values.
<aggregate_name>(*) [ FILTER (WHERE <condition>) ] — aggregates all rows, including those with null values. Typically used with COUNT(*) to count rows.
<aggregate_name>( [ expression [, … ] ] ) WITHIN GROUP ( <order_by_clause> ) [ FILTER ( WHERE <condition> ) ] — ordered-set aggregate form, where input values are sorted before aggregation.

An aggregate expression can appear only in the result list or HAVING clause of the SELECT command. It is not allowed in clauses like WHERE because those are evaluated before aggregate results are calculated. This restriction applies to the query level to which the aggregate belongs.
When an aggregate expression is used in a subquery, it typically operates over the rows produced by that subquery. However, if the aggregate’s arguments reference only columns from an outer query, the aggregate is evaluated at the outer query level. In such cases, the subquery treats the aggregate result as a constant value.
Using an aggregate function as an argument to another aggregate function is not allowed.
Window functions cannot be passed as arguments to aggregate functions.

Greengage DB allows you to use aggregate functions supported by PostgreSQL, for example:

SUM() — returns the sum of values.
AVG() — returns the average value.
COUNT() — returns the number of values.
MAX() — returns the maximum value.
MIN() — returns the minimum value.

To view the complete list of aggregate functions, see the corresponding topic in the PostgreSQL documentation: Aggregate Functions.

In addition to PostgreSQL-specific aggregate functions, Greengage DB provides a set of advanced analytics functions listed in the table below.

Function Return type Description

Function	Return type	Description
MEDIAN(expr)	`timestamp`, `timestamptz`, `interval`, `float`	Calculates the median value. Accepts two-dimensional arrays and treats them as matrices
SUM(array[])	`smallint[]`, `int[]`, `bigint[]`, `float[]`	Computes the element-wise sum of a two-dimensional numeric array (matrix summation)
PIVOT_SUM(label[], label, expr)	`int[]`, `bigint[]`, `float[]`	Performs a pivot aggregation by summing values grouped by labels. If duplicate labels exist, their values are summed
UNNEST(array[])	set of `anyelement`	Expands an array into a set of rows, returning each element individually

MEDIAN(expr)

timestamp, timestamptz, interval, float

Calculates the median value. Accepts two-dimensional arrays and treats them as matrices

SUM(array[])

smallint[], int[], bigint[], float[]

Computes the element-wise sum of a two-dimensional numeric array (matrix summation)

PIVOT_SUM(label[], label, expr)

int[], bigint[], float[]

Performs a pivot aggregation by summing values grouped by labels. If duplicate labels exist, their values are summed

UNNEST(array[])

set of anyelement

Expands an array into a set of rows, returning each element individually

NOTE

You can create custom aggregate functions using CREATE AGGREGATE.

Prerequisites

To follow the examples in this topic, connect to the Greengage DB master host using psql as described in Connect to Greengage DB via psql. Create a new database and connect to it:

CREATE DATABASE marketplace;
\c marketplace

Then, create a table and populate it with data:

Create a table

CREATE TABLE sales
(
    id           INT,
    product_name TEXT,
    category     TEXT,
    quantity     INT,
    price        NUMERIC(8, 2),
    sale_date    DATE,
    region       TEXT
)
    WITH (appendoptimized = true)
    DISTRIBUTED BY (product_name);

INSERT INTO sales
VALUES (1, 'Laptop', 'Electronics', 1, 1200.00, '2025-01-03', 'North'),
       (2, 'Headphones', 'Electronics', 2, 150.00, '2025-01-05', 'North'),
       (3, 'Coffee Maker', 'Home', 1, 85.00, '2025-01-07', 'South'),
       (4, 'T-Shirt', 'Clothing', 3, 25.00, '2025-01-10', 'East'),
       (5, 'Desk Chair', 'Home', 1, 200.00, '2025-01-15', 'South'),
       (6, 'Smartphone', 'Electronics', 2, 800.00, '2025-01-22', 'West'),
       (7, 'Headphones', 'Electronics', 1, 150.00, '2025-02-01', 'North'),
       (8, 'T-Shirt', 'Clothing', 2, 25.00, '2025-02-03', 'East'),
       (9, 'Laptop', 'Electronics', 1, 1200.00, '2025-02-07', 'North'),
       (10, 'Blender', 'Home', 1, 55.00, '2025-02-10', 'South'),
       (11, 'Jacket', 'Clothing', 2, 60.00, '2025-02-13', 'East'),
       (12, 'Smartphone', 'Electronics', 1, 800.00, '2025-02-18', 'West'),
       (13, 'Coffee Maker', 'Home', 1, 85.00, '2025-03-01', 'South'),
       (14, 'Desk Chair', 'Home', 2, 200.00, '2025-03-05', 'South'),
       (15, 'Jacket', 'Clothing', 1, 60.00, '2025-03-07', 'East'),
       (16, 'Headphones', 'Electronics', 3, 150.00, '2025-03-10', 'North'),
       (17, 'Laptop', 'Electronics', 1, 1200.00, '2025-03-15', 'North'),
       (18, 'T-Shirt', 'Clothing', 2, 25.00, '2025-03-20', 'East'),
       -- The following two rows have NULL region values
       (19, 'Tablet', 'Electronics', 1, 500.00, '2025-03-21', NULL),
       (20, 'Office Lamp', 'Home', 2, 45.00, '2025-03-22', NULL);

This query uses the SUM() aggregate function to calculate the total quantity of all sales across the table:

SELECT SUM(quantity) AS total_quantity
FROM sales;

The result looks as follows:

 total_quantity
----------------
             31
(1 row)

The following query uses the SUM() function to calculate the total sales amount by summing the results of the expression price * quantity for all rows in the table:

SELECT SUM(price * quantity) AS total_sales
FROM sales;

The result looks like this:

 total_sales
-------------
     8670.00
(1 row)

This query sums the quantity of items sold for each category using GROUP BY:

SELECT category,
       SUM(quantity) AS total_quantity
FROM sales
GROUP BY category;

The output is as follows:

  category   | total_quantity
-------------+----------------
 Home        |              8
 Clothing    |             10
 Electronics |             13
(3 rows)

The next query filters aggregated results using HAVING and returns categories where the total quantity sold is equal to or greater than 10:

SELECT category,
       SUM(quantity) AS total_quantity
FROM sales
GROUP BY category
HAVING SUM(quantity) >= 10;

The result is as follows:

  category   | total_quantity
-------------+----------------
 Electronics |             13
 Clothing    |             10
(2 rows)

This query counts all rows in the sales table, including rows with NULL values in any column:

SELECT COUNT(*) AS total_row_count
FROM sales;

The result looks like this:

 total_row_count
-----------------
              20
(1 row)

The following query counts the number of rows where the region column is not NULL, including all non-unique values:

SELECT COUNT(region) AS non_null_region_count
FROM sales;

The result is as follows:

 non_null_region_count
-----------------------
                    18
(1 row)

This query performs the same operation as the previous one, but explicitly includes the ALL keyword:

SELECT COUNT(ALL region) AS non_null_region_count
FROM sales;

This query counts the number of unique non-null values in the region column, showing how many different regions appear in the sales data:

SELECT COUNT(DISTINCT region) AS distinct_region_count
FROM sales;

The output should look like this:

 distinct_region_count
-----------------------
                     4
(1 row)

This query counts how many rows have the region equal to 'North' by using the FILTER clause to include only those rows in the count:

SELECT COUNT(*) FILTER (WHERE region = 'North') AS north_region_count
FROM sales;

The result looks like this:

 north_region_count
--------------------
                  6
(1 row)

This query uses SUM() to calculate the total quantity sold for each product category:

SELECT category,
       SUM(quantity) AS total_quantity
FROM sales
GROUP BY category;

Output

  category   | total_quantity
-------------+----------------
 Home        |              8
 Electronics |             13
 Clothing    |             10
(3 rows)

The following query calculates the average price of products within each category using the AVG() function:

SELECT category,
       AVG(price) AS avg_price
FROM sales
GROUP BY category;

Output

  category   |      avg_price
-------------+----------------------
 Electronics | 683.3333333333333333
 Clothing    |  39.0000000000000000
 Home        | 111.6666666666666667
(3 rows)

Use the ROUND() function to round numeric values to a specified number of decimal places.

This query counts the number of sales records for each category using the COUNT() function:

SELECT category,
       COUNT(*) AS sales_count
FROM sales
GROUP BY category;

Output

  category   | sales_count
-------------+-------------
 Home        |           6
 Clothing    |           5
 Electronics |           9
(3 rows)

The next query uses MAX() to find the highest price for each product category:

SELECT category,
       MAX(price) AS max_price
FROM sales
GROUP BY category;

Output

  category   | max_price
-------------+-----------
 Clothing    |     60.00
 Electronics |   1200.00
 Home        |    200.00
(3 rows)

The query below returns the lowest price within each product category using the MIN() function:

SELECT category,
       MIN(price) AS min_price
FROM sales
GROUP BY category;

Output

  category   | min_price
-------------+-----------
 Electronics |    150.00
 Home        |     45.00
 Clothing    |     25.00
(3 rows)

This query calculates the median price for each product category:

SELECT category,
       MEDIAN(price) AS median_price
FROM sales
GROUP BY category;

Output

  category   | median_price
-------------+--------------
 Home        |           85
 Clothing    |           25
 Electronics |          800
(3 rows)

The following query finds the median sale date for each product category:

SELECT category,
       MEDIAN(sale_date) AS median_sale_date
FROM sales
GROUP BY category;

Output

  category   |    median_sale_date
-------------+------------------------
 Clothing    | 2025-02-13 00:00:00+00
 Electronics | 2025-02-07 00:00:00+00
 Home        | 2025-02-19 12:00:00+00
(3 rows)

This query aggregates the total quantity sold per product. It breaks down the quantities by region using the PIVOT_SUM() function, which sums values grouped by specified region labels:

SELECT product_name,
       PIVOT_SUM(ARRAY ['North', 'South', 'East', 'West'], region, quantity) AS quantity_per_region
FROM sales
WHERE region IS NOT NULL
GROUP BY product_name;

Output

 product_name | quantity_per_region
--------------+---------------------
 Smartphone   | {0,0,0,3}
 Laptop       | {3,0,0,0}
 Headphones   | {6,0,0,0}
 T-Shirt      | {0,0,7,0}
 Desk Chair   | {0,3,0,0}
 Blender      | {0,1,0,0}
 Jacket       | {0,0,3,0}
 Coffee Maker | {0,2,0,0}
(8 rows)

Create a table

To follow the examples in this section, create the metrics_data table:

CREATE TABLE metrics_data
(
    id     SERIAL,
    values FLOAT8[][]
)
    WITH (appendoptimized = true)
    DISTRIBUTED BY (id);

Insert data:

INSERT INTO metrics_data (values)
VALUES (ARRAY [[1.5, 2.0], [3.1, 4.0]]),
       (ARRAY [[0.5, 1.0], [1.9, 2.0]]);

This query calculates the total sum of all elements in the two-dimensional arrays stored in the values column of the metrics_data table, summing all matrices across the rows:

SELECT SUM(values) AS total_array_sum
FROM metrics_data;

Output

 total_array_sum
-----------------
 {{2,3},{5,6}}
(1 row)

This query demonstrates how to use the UNNEST() function to expand a two-dimensional array column into individual scalar values, returning each element as a separate row:

SELECT UNNEST(values) AS value
FROM metrics_data;

Output

 value
-------
   1.5
     2
   3.1
     4
   0.5
     1
   1.9
     2
(8 rows)

Combine queries

COPY command

Aggregate functions

Overview

Syntax overview

Aggregate expressions

Limitations of aggregate expressions

Built-in aggregate functions

Usage examples

Basic aggregate queries

Aggregate expressions

General-purpose aggregate functions

Advanced analytics functions

Advanced analytics functions: 2D arrays