Analyze queries
Greengage DB devises a query plan for each query. A query plan defines how Greengage DB executes the query in a parallel processing environment, and selecting the plan best suited to the query and data structure is essential for optimal performance.
The query optimizer uses statistics to select a plan with the lowest estimated cost.
Use the EXPLAIN command to view the execution plan for a query, whether it is a SELECT or a data-modifying statement (INSERT, UPDATE, DELETE).
EXPLAIN displays the query optimizer’s estimated cost and the strategy chosen to execute the query.
In Greengage DB, the GPORCA optimizer coexists with the Postgres planner.
EXPLAIN output may differ depending on which optimizer is used.
By default, Greengage DB relies on GPORCA to generate execution plans when possible.
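To check which optimizer is active, or to compare plans produced by both optimizers, you can toggle the optimizer server configuration parameter at the session level. A minimal sketch:
SHOW optimizer;           -- on means GPORCA, off means the Postgres planner
SET optimizer = off;      -- fall back to the Postgres planner for this session
SET optimizer = on;       -- switch back to GPORCA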
To follow the examples in this topic, connect to the Greengage DB master host using psql as described in Connect to Greengage DB via psql.
Create a new database and connect to it:
CREATE DATABASE marketplace;
\c marketplace
Then, create a table and populate it with data:
CREATE TABLE sales
(
id INT,
date DATE,
region TEXT,
amount DECIMAL(10, 2)
)
WITH (appendoptimized = true)
DISTRIBUTED BY (id);
INSERT INTO sales (id, date, region, amount)
SELECT gs.id,
DATE '2025-01-01' + (gs.id % 90),
CASE WHEN gs.id % 2 = 0 THEN 'Asia' ELSE 'Europe' END,
round((random() * 1000)::NUMERIC, 2)
FROM generate_series(1, 40000) AS gs(id);
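Because the table is distributed by id, its rows should be spread evenly across segments. A quick way to verify this is to count rows per segment using the gp_segment_id system column (exact per-segment counts will vary slightly):
SELECT gp_segment_id, count(*) AS rows_per_segment
FROM sales
GROUP BY gp_segment_id
ORDER BY gp_segment_id;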
EXPLAIN
Overview
The EXPLAIN command displays the execution plan for a query along with the optimizer’s estimated cost.
This query uses EXPLAIN to display the execution plan for selecting data from the sales table:
EXPLAIN
SELECT date, amount
FROM sales;
The output shows the query plan, including the execution steps, estimated costs, and the optimizer used:
                                     QUERY PLAN
-----------------------------------------------------------------------------------
 Gather Motion 4:1  (slice1; segments: 4)  (cost=0.00..432.71 rows=40000 width=10)
   ->  Seq Scan on sales  (cost=0.00..431.18 rows=10000 width=10)
 Optimizer: Pivotal Optimizer (GPORCA)
(3 rows)
Read plans from the bottom to the top: each node feeds rows into the node directly above it:
- Seq Scan — reads all rows from the sales table sequentially, with each segment processing approximately 10000 rows.
- Gather Motion — collects rows from all segments (a total of 40000 rows) and combines them on the master.
The execution plan is divided into slices, each executed by its own process on every participating segment. A slice is an independent part of the query plan that executes in parallel and passes its results to a higher-level slice.
How to read EXPLAIN plans
A query plan is a tree of nodes. Each node in the plan represents a single operation, such as a table scan, join, aggregation, or sort:
- Bottom nodes — the bottom nodes of a plan are typically table scan operations, including sequential, index, or bitmap index scans. See the Scan nodes section below.
- Intermediate nodes — if the query requires joins, aggregations, sorts, or other operations on the rows, additional nodes are placed above the scan nodes to perform these operations. See the Join nodes and Other nodes sections below.
- Topmost nodes — the topmost plan nodes are usually Greengage DB motion nodes, such as Redistribute Motion or Gather Motion. These operations move rows between segment instances during query processing. See the Motion nodes section below.
The output of EXPLAIN also shows execution cost estimates for each plan node.
Learn more in the Execution cost estimates section below.
Query plan nodes
Scan nodes
Scan operators read rows from a table to identify those that match a query. The specific scan operator is chosen based on the query and the way the data is accessed. Common scan operators include:
- Seq Scan — reads all rows in the table sequentially.
- Index Scan — uses an index to locate and fetch specific rows.
- Bitmap Heap Scan — collects row pointers from an index and accesses the table rows in disk order. Note that this operator is called Bitmap Heap Scan even for append-optimized tables.
- Dynamic Seq Scan — scans only the relevant partitions.
Join nodes
Join operators include:
- Hash Join — constructs a hash table from the smaller table using the join column(s) as the hash key, then scans the larger table, computes the hash key for its join column(s), and looks in the hash table for matching rows. Hash joins are usually the fastest join method in Greengage DB. Hash Cond in the EXPLAIN plan shows the columns used in the join.
- Nested Loop — iterates through rows in the larger dataset, scanning the rows in the smaller dataset for each iteration. This join requires broadcasting one of the tables so that all rows in one table can be compared to all rows in the other table. Nested loop joins perform well for small tables or when an index can limit the number of rows scanned; they are also used for Cartesian and range joins. There are performance implications when using a nested loop join with large tables. For plan nodes that contain a Nested Loop join operator, validate the SQL and ensure that the results match the intended outcome. Ensure the enable_nestloop server configuration parameter is set to off (the default) to favor hash joins.
- Merge Join — sorts both datasets and merges them. Merge joins are efficient for pre-sorted data, but they are uncommon in real-world scenarios. To favor merge joins over hash joins, set the enable_mergejoin configuration parameter to on.
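To see how these parameters affect plan choice, you can toggle them in the current session and re-run EXPLAIN. A minimal sketch (these enable_* parameters influence the Postgres planner):
SHOW enable_nestloop;
SET enable_mergejoin = on;    -- let the Postgres planner consider merge joins
-- EXPLAIN <your join query>;
SET enable_mergejoin = off;   -- restore the default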
Motion nodes
Some query plan nodes perform motion operations, which move rows between segments when necessary to execute the query. The main motion operators are:
- Broadcast Motion — each segment sends its rows to all other segments, so that every segment has a full copy of the table. The optimizer usually chooses this only for small tables, as it can be costly for large ones. When the join key does not match the distribution of the data, the system may move rows dynamically, broadcasting one of the tables to all segments.
- Redistribute Motion — each segment rehashes the data and sends the rows to the appropriate segments according to the hash key.
- Gather Motion — rows from all segments are collected into a single stream. This is typically the last step in a query plan.
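To observe a motion operation other than the final gather, join the sales table on a column that is not its distribution key. Because sales is distributed by id, joining on region requires moving rows between segments, so the plan should contain a Redistribute Motion or Broadcast Motion node (the choice depends on table sizes and statistics):
EXPLAIN
SELECT s1.id, s2.id
FROM sales s1
JOIN sales s2 ON s1.region = s2.region;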
Other nodes
The following are some examples of other operators that may appear in query plans:
- Materialize — stores the result of a subquery, so that it can be reused without recomputation for each outer row (for example, when a CTE is referenced multiple times).
- InitPlan — a pre-query executed at runtime, often used for dynamic partition elimination when the required partition values are not known until execution.
- Sort — orders rows, usually as input to operations that require sorted data (for example, Aggregate or Merge Join).
- GroupAggregate — groups rows based on one or more columns.
- HashAggregate — computes aggregates using a hash-based method.
- Append — concatenates multiple result sets, such as rows scanned from different partitions of a partitioned table.
- Filter — selects rows according to a WHERE condition.
- Limit — restricts the number of rows returned.
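Several of these operators are easy to observe with the sales table from the setup above; for example, a sorted, limited selection typically produces Limit and Sort nodes in the plan:
EXPLAIN
SELECT date, amount
FROM sales
ORDER BY amount DESC
LIMIT 5;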
Execution cost estimates
The output of EXPLAIN includes the following execution cost estimates for each plan node:
- cost — measured in units of disk page fetches; 1.0 equals one sequential disk page read. The first estimate is the startup cost of getting the first row, and the second is the total cost of getting all rows. The total cost assumes all rows will be retrieved, which is not always true; for example, if the query uses LIMIT, not all rows are retrieved. Cost is an internal optimizer number and should not be taken as a direct measure of query execution time.
  NOTE: Cost values produced by GPORCA and the Postgres planner are not directly comparable. Each optimizer uses a different cost model. Costs are valid only for comparing alternative plans of the same query with the same optimizer.
- rows — the estimated number of rows output by this plan node. This is usually less than the number of rows scanned or processed, reflecting the selectivity of any WHERE conditions. For the topmost node, the estimate should approximate the number of rows the query actually returns, updates, or deletes.
- width — the estimated average size (in bytes) of a single row output by this plan node.
Note the following:
- The cost of a node includes the cost of its child nodes. The topmost plan node has the estimated total execution cost for the plan; this is the number the optimizer attempts to minimize.
- The cost reflects only the aspects of plan execution considered by the optimizer. For example, it does not include time spent sending rows to the client.
EXPLAIN ANALYZE
The EXPLAIN ANALYZE command runs the query and shows the actual execution plan.
This allows you to compare the actual performance with the optimizer’s estimates.
This query uses EXPLAIN ANALYZE to display the execution plan and run the query:
EXPLAIN ANALYZE
SELECT date, amount
FROM sales;
The result includes both the estimated plan and the actual execution statistics:
                                                             QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
 Gather Motion 4:1  (slice1; segments: 4)  (cost=0.00..432.71 rows=40000 width=10) (actual time=7.141..14.390 rows=40000 loops=1)
   ->  Seq Scan on sales  (cost=0.00..431.18 rows=10000 width=10) (actual time=0.116..2.266 rows=10055 loops=1)
 Planning time: 2.337 ms
   (slice0)    Executor memory: 74K bytes.
   (slice1)    Executor memory: 204K bytes avg x 4 workers, 204K bytes max (seg0).
 Memory used:  128000kB
 Optimizer: Pivotal Optimizer (GPORCA)
 Execution time: 21.573 ms
(8 rows)
EXPLAIN ANALYZE also shows:
- The total execution time (in milliseconds) of the query.
- The memory used by each slice of the query plan, and the memory reserved for the entire query.
- The number of workers (segments) involved in a plan node operation. Only segments that return rows are counted.
- The maximum number of rows returned by a single segment in the operation. If multiple segments return the same number of rows, EXPLAIN ANALYZE shows the segment with the longest time to end.
- The ID of the segment that produced the most rows for the operation.
- For relevant operations, the amount of memory (work_mem) used. If work_mem was insufficient, the plan shows the amount of data spilled to disk for the lowest-performing segment. Learn more in EXPLAIN ANALYZE command.
- The time (in milliseconds) taken by the segment that produced the most rows to return the first row and all rows. The time to the first row may be omitted if it matches the time to end.
Use EXPLAIN ANALYZE carefully:
- It actually runs the query and generates the full workload on the system, which can impact cluster performance.
- For INSERT, UPDATE, or DELETE statements, it modifies data; if you must analyze such queries, wrap them in a transaction that you can roll back. Learn more in Analyze data-modifying queries.
Analyze SELECT queries
Filtering
This example shows an EXPLAIN plan for a query filtering rows by region:
EXPLAIN
SELECT date, amount
FROM sales
WHERE region = 'Europe';
The output shows the query execution plan: the WHERE clause (region = 'Europe') is applied as a filter condition during a sequential scan of the sales table:
QUERY PLAN
-----------------------------------------------------------------------------------
Gather Motion 4:1 (slice1; segments: 4) (cost=0.00..432.34 rows=20075 width=10)
-> Seq Scan on sales (cost=0.00..431.67 rows=5019 width=10)
Filter: (region = 'Europe'::text)
Optimizer: Pivotal Optimizer (GPORCA)
(4 rows)
The results of the sequential scan are passed to a Gather Motion operation.
In this example, four segment instances send their data to the master instance.
The estimated startup cost is 0.00 (no cost), with a total cost of 432.34 disk page fetches.
The optimizer estimates that the query will return 20075 rows.
Sorting
This snippet uses EXPLAIN to display the execution plan for a query that filters rows with a WHERE clause and applies sorting with ORDER BY date DESC:
EXPLAIN
SELECT date, amount
FROM sales
WHERE region = 'Europe'
ORDER BY date DESC;
The Sort node indicates that after scanning and filtering the sales table, the matching rows are ordered based on the date column (Sort Key):
QUERY PLAN
-----------------------------------------------------------------------------------
Gather Motion 4:1 (slice1; segments: 4) (cost=0.00..435.83 rows=20075 width=10)
Merge Key: date
-> Sort (cost=0.00..435.16 rows=5019 width=10)
Sort Key: date
-> Seq Scan on sales (cost=0.00..431.67 rows=5019 width=10)
Filter: (region = 'Europe'::text)
Optimizer: Pivotal Optimizer (GPORCA)
(7 rows)
Grouping
This snippet uses EXPLAIN to show the execution plan for a query that groups rows by region and calculates the total sales amount with SUM(amount):
EXPLAIN
SELECT region,
SUM(amount) AS total_amount
FROM sales
GROUP BY region;
The output contains multiple nodes, including:
- HashAggregate — performs local aggregation on each segment.
- Redistribute Motion — redistributes rows across segments to align groups.
- GroupAggregate — finalizes the grouping and summarization on the sorted, redistributed data.
QUERY PLAN
-------------------------------------------------------------------------------------------------------
Gather Motion 4:1 (slice2; segments: 4) (cost=0.00..432.67 rows=2 width=14)
-> GroupAggregate (cost=0.00..432.67 rows=1 width=14)
Group Key: region
-> Sort (cost=0.00..432.67 rows=1 width=14)
Sort Key: region
-> Redistribute Motion 4:4 (slice1; segments: 4) (cost=0.00..432.67 rows=1 width=14)
Hash Key: region
-> Result (cost=0.00..432.67 rows=1 width=14)
-> HashAggregate (cost=0.00..432.67 rows=1 width=14)
Group Key: region
-> Seq Scan on sales (cost=0.00..431.19 rows=10000 width=12)
Optimizer: Pivotal Optimizer (GPORCA)
(12 rows)
Indexes
The query below creates a bitmap index on the region column:
CREATE INDEX sales_region_idx ON sales USING bitmap (region);
After creating an index, you need to collect statistics via ANALYZE:
ANALYZE;
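The database-wide ANALYZE above refreshes statistics for every table in the current database. To limit the work to the table you just indexed, you can analyze it directly:
ANALYZE sales;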
The snippet uses EXPLAIN to show the execution plan for a query that calculates the average amount for rows filtered by the specified region:
EXPLAIN
SELECT AVG(amount) AS avg_amount
FROM sales
WHERE region = 'Asia';
The result should include Bitmap Index Scan on the sales_region_idx index:
QUERY PLAN
-------------------------------------------------------------------------------------------------
Aggregate (cost=0.00..430.71 rows=1 width=8)
-> Gather Motion 4:1 (slice1; segments: 4) (cost=0.00..430.71 rows=1 width=8)
-> Aggregate (cost=0.00..430.71 rows=1 width=8)
-> Bitmap Heap Scan on sales (cost=0.00..430.66 rows=4998 width=6)
Recheck Cond: (region = 'Asia'::text)
-> Bitmap Index Scan on sales_region_idx (cost=0.00..0.00 rows=0 width=0)
Index Cond: (region = 'Asia'::text)
Optimizer: Pivotal Optimizer (GPORCA)
(8 rows)
Partitioned tables
To follow the examples in this section, drop the existing sales table and recreate it with monthly range partitions on the date column:
CREATE TABLE sales
(
id INT,
date DATE,
region TEXT,
amount DECIMAL(10, 2)
)
WITH (appendoptimized = true)
DISTRIBUTED BY (id)
PARTITION BY RANGE (date)
(
START (DATE '2025-01-01') INCLUSIVE
END (DATE '2025-04-01') EXCLUSIVE
EVERY (INTERVAL '1 month')
);
INSERT INTO sales (id, date, region, amount)
SELECT gs.id,
DATE '2025-01-01' + (gs.id % 90),
CASE WHEN gs.id % 2 = 0 THEN 'Asia' ELSE 'Europe' END,
round((random() * 1000)::NUMERIC, 2)
FROM generate_series(1, 40000) AS gs(id);
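To verify the resulting layout, you can query the pg_partitions system view, which lists the child partitions and their range boundaries:
SELECT partitiontablename, partitionrangestart, partitionrangeend
FROM pg_partitions
WHERE tablename = 'sales';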
In Greengage DB, partitioned tables enable the query planner to skip irrelevant partitions, resulting in significant performance improvements.
In this example, the WHERE condition filters dates that fall within the earliest partition (January 2025):
EXPLAIN
SELECT AVG(amount) AS avg_amount
FROM sales
WHERE date < '2025-01-15';
The plan shows that only one partition of the sales table is scanned.
This is done using Dynamic Seq Scan, which applies the filter condition to the selected partition:
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
Aggregate (cost=0.00..431.00 rows=1 width=8)
-> Gather Motion 4:1 (slice1; segments: 4) (cost=0.00..431.00 rows=1 width=8)
-> Result (cost=0.00..431.00 rows=1 width=8)
-> Sequence (cost=0.00..431.00 rows=1 width=12)
-> Partition Selector for sales (dynamic scan id: 1) (cost=10.00..100.00 rows=25 width=4)
Partitions selected: 1 (out of 3)
-> Dynamic Seq Scan on sales (dynamic scan id: 1) (cost=0.00..431.00 rows=1 width=12)
Filter: (date < '2025-01-15'::date)
Optimizer: Pivotal Optimizer (GPORCA)
(9 rows)
Joins
Create a table to store the different product categories:
CREATE TABLE categories
(
category_code TEXT PRIMARY KEY,
category_name TEXT
)
DISTRIBUTED REPLICATED;
Insert data:
INSERT INTO categories (category_code, category_name)
SELECT 'cat' || gs.id,
'Category ' || gs.id
FROM generate_series(1, 100) AS gs(id);
Create a table to hold information about each product, including the category it belongs to:
CREATE TABLE products
(
id INT,
product_name TEXT,
category_code TEXT REFERENCES categories (category_code)
)
WITH (appendoptimized = true)
DISTRIBUTED BY (id);
Insert data:
INSERT INTO products (id, product_name, category_code)
SELECT gs.id,
'Product ' || gs.id,
'cat' || ((gs.id % 100) + 1)
FROM generate_series(1, 1000) AS gs(id);
Create a table holding individual sales transactions (drop the sales table from the previous sections first):
CREATE TABLE sales
(
id INT,
product_id INT,
date DATE
)
WITH (appendoptimized = true)
DISTRIBUTED BY (product_id);
Insert data:
INSERT INTO sales (id, product_id, date)
SELECT gs.id,
((gs.id % 1000) + 1),
DATE '2025-01-01' + (gs.id % 90)
FROM generate_series(1, 40000) AS gs(id);
This EXPLAIN generates the query plan for a cross join between categories and products:
EXPLAIN
SELECT categories.category_name,
products.product_name
FROM categories
CROSS JOIN products;
The query plan uses a Nested Loop join, comparing each row from one table with all rows of the other table:
QUERY PLAN
----------------------------------------------------------------------------------------
Gather Motion 4:1 (slice1; segments: 4) (cost=0.00..1326704.61 rows=100000 width=22)
-> Nested Loop (cost=0.00..1326697.25 rows=25000 width=22)
Join Filter: true
-> Seq Scan on categories (cost=0.00..431.00 rows=100 width=11)
-> Seq Scan on products (cost=0.00..431.00 rows=250 width=11)
Optimizer: Pivotal Optimizer (GPORCA)
(6 rows)
This EXPLAIN generates the query plan for an inner join between products and sales:
EXPLAIN
SELECT products.id,
products.product_name,
sales.id AS sale_id,
sales.date AS sale_date
FROM products
JOIN sales ON products.id = sales.product_id;
The query plan shows a Hash Join, where products is scanned and hashed, then matched with rows from sales to combine only the rows with equal id values. Because sales is distributed by product_id and products by id, matching rows already reside on the same segments, so no motion is needed below the join:
QUERY PLAN
-----------------------------------------------------------------------------------
Gather Motion 4:1 (slice1; segments: 4) (cost=0.00..868.18 rows=40000 width=23)
-> Hash Join (cost=0.00..865.10 rows=10000 width=23)
Hash Cond: (sales.product_id = products.id)
-> Seq Scan on sales (cost=0.00..431.14 rows=10000 width=12)
-> Hash (cost=431.00..431.00 rows=250 width=15)
-> Seq Scan on products (cost=0.00..431.00 rows=250 width=15)
Optimizer: Pivotal Optimizer (GPORCA)
(7 rows)
EXPLAIN options
The examples in this section again use the flat, non-partitioned sales table; drop the sales table from the previous section and recreate it:
CREATE TABLE sales
(
id INT,
date DATE,
region TEXT,
amount DECIMAL(10, 2)
)
WITH (appendoptimized = true)
DISTRIBUTED BY (id);
INSERT INTO sales (id, date, region, amount)
SELECT gs.id,
DATE '2025-01-01' + (gs.id % 90),
CASE WHEN gs.id % 2 = 0 THEN 'Asia' ELSE 'Europe' END,
round((random() * 1000)::NUMERIC, 2)
FROM generate_series(1, 40000) AS gs(id);
Apart from the ANALYZE option, which executes the query to collect runtime statistics, EXPLAIN supports several other options.
These options control the level of detail displayed in the plan, the output format, and whether specific information is shown.
The COSTS OFF option hides estimated execution costs in the query plan:
EXPLAIN (COSTS OFF)
SELECT date, amount
FROM sales
WHERE region = 'Asia';
The result looks like this:
QUERY PLAN
------------------------------------------
Gather Motion 4:1 (slice1; segments: 4)
-> Seq Scan on sales
Filter: (region = 'Asia'::text)
Optimizer: Pivotal Optimizer (GPORCA)
(4 rows)
The VERBOSE option shows detailed information about the query plan, including internal node identifiers, relation names, and column details:
EXPLAIN (VERBOSE)
SELECT date, amount
FROM sales
WHERE region = 'Asia';
The result looks as follows:
QUERY PLAN
-----------------------------------------------------------------------------------
Gather Motion 4:1 (slice1; segments: 4) (cost=0.00..432.33 rows=19973 width=10)
Output: date, amount
-> Seq Scan on public.sales (cost=0.00..431.66 rows=4994 width=10)
Output: date, amount
Filter: (sales.region = 'Asia'::text)
Optimizer: Pivotal Optimizer (GPORCA)
(6 rows)
By enabling additional configuration parameters, such as explain_memory_verbosity and gp_enable_explain_allstat, and using EXPLAIN (ANALYZE, VERBOSE) instead of EXPLAIN ANALYZE, you can obtain more detailed information about memory usage and query execution.
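A minimal sketch of such a session (the accepted verbosity levels and the exact output depend on your Greengage DB version):
SET explain_memory_verbosity = summary;  -- report memory usage per plan node
SET gp_enable_explain_allstat = on;      -- include per-segment statistics
EXPLAIN (ANALYZE, VERBOSE)
SELECT date, amount
FROM sales
WHERE region = 'Asia';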
You can combine several EXPLAIN options:
EXPLAIN (COSTS OFF, VERBOSE)
SELECT date, amount
FROM sales
WHERE region = 'Asia';
The result should look as follows:
QUERY PLAN
-----------------------------------------------
Gather Motion 4:1 (slice1; segments: 4)
Output: date, amount
-> Seq Scan on public.sales
Output: date, amount
Filter: (sales.region = 'Asia'::text)
Optimizer: Pivotal Optimizer (GPORCA)
(6 rows)
Output formats
You can configure EXPLAIN to output the query plan in various formats, including JSON, XML, or YAML.
In the example below, FORMAT JSON is used to produce the plan in JSON format:
EXPLAIN (FORMAT JSON)
SELECT date, amount
FROM sales
WHERE region = 'Asia';
The result looks like this:
QUERY PLAN
-------------------------------------------------
[ +
{ +
"Plan": { +
"Node Type": "Gather Motion", +
"Senders": 4, +
"Receivers": 1, +
"Slice": 1, +
"Segments": 4, +
"Gang Type": "primary reader", +
"Startup Cost": 0.00, +
"Total Cost": 432.33, +
"Plan Rows": 19972, +
"Plan Width": 10, +
"Plans": [ +
{ +
"Node Type": "Seq Scan", +
"Parent Relationship": "Outer", +
"Slice": 1, +
"Segments": 4, +
"Gang Type": "primary reader", +
"Relation Name": "sales", +
"Alias": "sales", +
"Startup Cost": 0.00, +
"Total Cost": 431.66, +
"Plan Rows": 19972, +
"Plan Width": 10, +
"Filter": "(region = 'Asia'::text)" +
} +
] +
}, +
"Settings": { +
"Optimizer": "Pivotal Optimizer (GPORCA)"+
} +
} +
]
(1 row)
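Other formats work the same way; for example, FORMAT YAML produces an equivalent, more compact representation of the same plan:
EXPLAIN (FORMAT YAML)
SELECT date, amount
FROM sales
WHERE region = 'Asia';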
Analyze data-modifying queries
The examples in this section use the following heap version of the sales table; drop the existing table and recreate it:
CREATE TABLE sales
(
id INT,
date DATE,
region TEXT,
amount DECIMAL(10, 2)
)
DISTRIBUTED BY (id);
INSERT INTO sales (id, date, region, amount)
SELECT gs.id,
DATE '2025-01-01' + (gs.id % 90),
CASE WHEN gs.id % 2 = 0 THEN 'Asia' ELSE 'Europe' END,
round((random() * 1000)::NUMERIC, 2)
FROM generate_series(1, 40000) AS gs(id);
EXPLAIN can also be used with data-modifying statements, such as INSERT, UPDATE, or DELETE.
In the example below, it shows the plan for inserting rows into the table without actually executing the insert:
EXPLAIN
INSERT INTO sales (id, date, region, amount)
SELECT gs.id,
DATE '2025-01-01' + (gs.id % 90),
CASE WHEN gs.id % 2 = 0 THEN 'Asia' ELSE 'Europe' END,
round((random() * 1000)::NUMERIC, 2)
FROM generate_series(1, 40000) AS gs(id);
The query plan looks as follows:
QUERY PLAN
----------------------------------------------------------------------------------------------
Insert (cost=0.00..27.42 rows=250 width=24)
-> Result (cost=0.00..0.08 rows=250 width=28)
-> Result (cost=0.00..0.05 rows=250 width=24)
-> Result (cost=0.00..0.01 rows=250 width=4)
-> Function Scan on generate_series (cost=0.00..0.00 rows=250 width=4)
Optimizer: Pivotal Optimizer (GPORCA)
(6 rows)
To analyze a data-modifying query without changing your tables, run it inside a transaction and roll it back afterward:
BEGIN;
EXPLAIN ANALYZE
INSERT INTO sales (id, date, region, amount)
SELECT gs.id,
DATE '2025-01-01' + (gs.id % 90),
CASE WHEN gs.id % 2 = 0 THEN 'Asia' ELSE 'Europe' END,
round((random() * 1000)::NUMERIC, 2)
FROM generate_series(1, 40000) AS gs(id);
ROLLBACK;
The result is as follows:
BEGIN
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
Insert (cost=0.00..27.42 rows=250 width=24) (actual time=3.567..55.458 rows=10055 loops=1)
-> Result (cost=0.00..0.08 rows=250 width=28) (actual time=3.211..49.076 rows=10055 loops=1)
-> Result (cost=0.00..0.05 rows=250 width=24) (actual time=3.209..38.545 rows=10055 loops=1)
-> Result (cost=0.00..0.01 rows=250 width=4) (actual time=3.179..11.862 rows=10055 loops=1)
-> Function Scan on generate_series (cost=0.00..0.00 rows=250 width=4) (actual time=3.174..6.194 rows=40000 loops=1)
Planning time: 4.740 ms
(slice0) Executor memory: 2745K bytes avg x 4 workers, 2745K bytes max (seg0). Work_mem: 1763K bytes max.
Memory used: 128000kB
Optimizer: Pivotal Optimizer (GPORCA)
Execution time: 67.447 ms
(10 rows)
ROLLBACK
Diagnose query performance issues
If a query performs poorly, examine its query plan and ask the following questions:
- Do operations in the plan take an exceptionally long time?
Look for an operation that consumes the majority of query processing time. For example, if an index scan takes longer than expected, the index may be out of date and need to be reindexed. You can also adjust the enable_<operator> parameters (for example, enable_hashjoin, enable_mergejoin) to see if you can force the Postgres planner to choose a different plan by deactivating a particular query plan operator for that query.
- Does the query planning time exceed query execution time?
When a query joins many tables, the Postgres planner uses a dynamic algorithm that partly depends on the number of joins. You can reduce the amount of time the Postgres planner spends planning the query by setting the join_collapse_limit and from_collapse_limit server configuration parameters to a smaller value, such as 8. Note that while smaller values reduce planning time, they may also yield inferior query plans.
- Are the optimizer's estimates close to reality?
Run EXPLAIN ANALYZE and see if the number of rows the optimizer estimates is close to the number of rows the query operation actually returns. If there is a large discrepancy, update statistics on the relevant columns.
- Are selective predicates applied early in the plan?
Apply the most selective filters early in the plan, so fewer rows move up the plan tree. If the query plan does not correctly estimate query predicate selectivity, update statistics on the relevant columns. You can also try reordering the WHERE clause of your SQL statement.
- Does the optimizer choose the best join order?
When you have a query that joins multiple tables, make sure that the optimizer chooses the most selective join order. Joins that eliminate the largest number of rows should be performed earlier in the plan, so fewer rows are moved up the plan tree. If the plan does not choose the optimal join order, set join_collapse_limit=1 and use explicit JOIN syntax in your SQL statement to force the Postgres planner to use the specified join order. You can also update statistics on the relevant join columns.
- Does the optimizer selectively scan partitioned tables?
If you use table partitioning, check that the optimizer selectively scans only the child tables required to satisfy the query predicates. Scans of the parent tables should return zero rows, as they do not contain any data.
- Does the optimizer choose Hash Aggregate and Hash Join operations where applicable?
Hash operations are typically much faster than other types of joins or aggregations: row comparison and sorting are done in memory rather than by reading from and writing to disk. To enable the query optimizer to choose hash operations, sufficient memory must be available to hold the estimated number of rows. Try increasing work_mem to improve query performance (see the sketch after this list). If possible, run EXPLAIN ANALYZE for the query to show which plan operations spilled to disk, how much work memory they used, and how much memory was required to avoid spilling to disk. Learn more in EXPLAIN ANALYZE command.
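The following sketch illustrates two of the fixes mentioned above: collecting more detailed statistics on a column with a large estimate discrepancy, and raising work_mem for the session so that hash operations fit in memory (the column and values are illustrative):
-- Collect more detailed statistics for a frequently filtered column
ALTER TABLE sales ALTER COLUMN region SET STATISTICS 400;
ANALYZE sales;
-- Give hash joins and hash aggregates more working memory in this session
SET work_mem = '128MB';
-- Re-run the slow query under EXPLAIN ANALYZE and check for disk spills
-- EXPLAIN ANALYZE <your query>;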