EXPLAIN

Contents

Synopsis
Description
Parameters
Notes
Examples
Compatibility
See also

Shows the query plan for a statement.

EXPLAIN [ ( <option> [, ...] ) ] <statement>
EXPLAIN [ANALYZE] [VERBOSE] <statement>

where option can be one of:

    ANALYZE [ <boolean> ]
    VERBOSE [ <boolean> ]
    COSTS [ <boolean> ]
    BUFFERS [ <boolean> ]
    TIMING [ <boolean> ]
    FORMAT { TEXT | XML | JSON | YAML }

Description

EXPLAIN displays the query plan that the GPORCA or Postgres planner generates for the specified statement. Query plans are a tree of nodes. Each node in the plan represents a single operation, such as a table scan, a join, an aggregation, or a sort.

Plans should be read from the bottom up, as each node feeds rows into the node directly above it. The bottom nodes of a plan are usually table scan operations (sequential, index, or bitmap index scans). If the query requires joins, aggregations, or sorts (or other operations on the raw rows), there will be additional nodes above the scan nodes to perform these operations. The topmost plan nodes are usually the Greengage DB motion nodes (redistribute, explicit redistribute, broadcast, or gather motions). These are the operations responsible for moving rows between the segment instances during query processing.

The output of EXPLAIN has one line for each node in the plan tree, showing the basic node type plus the following cost estimates that the planner made for the execution of that plan node:

cost — the planner’s estimate of how long it will take to run the statement (measured in cost units that are arbitrary, but conventionally mean disk page fetches). Two cost numbers are shown: the start-up cost before the first row can be returned, and the total cost to return all the rows. Note that the total cost assumes that all rows will be retrieved, which may not always be the case (for example, when using LIMIT).
rows — the total number of rows output by this plan node. This is usually less than the actual number of rows processed or scanned by the plan node, reflecting the estimated selectivity of any WHERE clause conditions. Ideally the top-level node’s estimate will approximate the number of rows actually returned, updated, or deleted by the query.
width — the estimated average size (in bytes) of a single row output by this plan node.

It is important to note that the cost of an upper-level node includes the cost of all its child nodes. The topmost node of the plan has the estimated total execution cost for the plan. This is the number that the planner seeks to minimize. It is also important to realize that the cost only reflects things that the query optimizer cares about. In particular, the cost does not consider the time spent transmitting result rows to the client.

EXPLAIN ANALYZE causes the statement to be actually run, not only planned. The EXPLAIN ANALYZE plan shows the actual results along with the planner’s estimates. This is useful for seeing whether the planner’s estimates are close to reality. In addition to the information shown in the EXPLAIN plan, EXPLAIN ANALYZE will show the following additional information:

The total elapsed time (in milliseconds) that it took to run the query.
The number of workers (segments) involved in a plan node operation. Only segments that return rows are counted.
The maximum number of rows returned by the segment that produced the most rows for an operation. If multiple segments produce an equal number of rows, the one with the longest time to finish is chosen.
The ID of the segment that produced the most rows for an operation.
For relevant operations, the amount of memory (work_mem) used. If work_mem was not sufficient to perform the operation in memory, the plan will show how much data was spilled to disk and how many passes over the data were required for the slowest-performing segment. For example:
```
Work_mem used: 64K bytes avg, 64K bytes max (seg0).
Work_mem wanted: 90K bytes avg, 90K bytes max (seg0) to abate workfile
I/O affecting 2 workers.
[seg0] pass 0: 488 groups made from 488 rows; 263 rows written to
workfile
[seg0] pass 1: 263 groups made from 263 rows
```
The time (in milliseconds) it took to retrieve the first row from the segment that produced the most rows, and the total time taken to retrieve all rows from that segment. The <time> to first row may be omitted if it is the same as the <time> to end.

IMPORTANT

Keep in mind that the statement is actually run when ANALYZE is used. Although EXPLAIN ANALYZE will discard any output that a SELECT would return, other side effects of the statement will happen as usual. If you wish to use EXPLAIN ANALYZE on a DML statement without letting the command affect your data, use this approach:

BEGIN;
EXPLAIN ANALYZE ...;
ROLLBACK;

Only the ANALYZE and VERBOSE options can be specified, and only in that order, without surrounding the option list in parentheses.

Parameter Description

Parameter	Description
ANALYZE	Carry out the command and show the actual run times and other statistics. This parameter defaults to `FALSE` if you omit it; specify `ANALYZE` to enable it
VERBOSE	Display additional information regarding the plan. Specifically, include the output column list for each node in the plan tree, schema-qualify table and function names, always label variables in expressions with their range table alias, and always print the name of each trigger for which statistics are displayed. This parameter defaults to `FALSE` if you omit it; specify `VERBOSE` to enable it
COSTS	Include information on the estimated startup and total cost of each plan node, as well as the estimated number of rows and the estimated width of each row. This parameter defaults to `TRUE` if you omit it; specify `COSTS OFF` to deactivate it
BUFFERS	Include information on buffer usage. This parameter may be specified only when `ANALYZE` is also specified. If omitted, the default value is `FALSE`, buffer usage information is not included. NOTE Greengage DB does not support specifying `BUFFERS [true]` for distributed queries; ignore any displayed buffer usage information.
TIMING	Include actual startup time and time spent in each node in the output. The overhead of repeatedly reading the system clock can slow down the query significantly on some systems, so it may be useful to set this parameter to `FALSE` when only actual row counts, and not exact times, are needed. Run time of the entire statement is always measured, even when node-level timing is turned off with this option. This parameter may only be used when `ANALYZE` is also enabled. It defaults to `TRUE`
FORMAT	Specify the output format, which can be `TEXT`, `XML`, `JSON`, or `YAML`. Non-text output contains the same information as the text output format, but is easier for programs to parse. This parameter defaults to `TEXT`
boolean	Specifies whether the selected option should be turned on or off. You can write `TRUE`, `ON`, or `1` to enable the option, and `FALSE`, `OFF`, or `0` to deactivate it. The boolean value can also be omitted, in which case `TRUE` is assumed
statement	Any `SELECT`, `INSERT`, `UPDATE`, `DELETE`, `VALUES`, `EXECUTE`, `DECLARE`, or `CREATE TABLE AS` statement, whose execution plan you wish to see

ANALYZE

Carry out the command and show the actual run times and other statistics. This parameter defaults to FALSE if you omit it; specify ANALYZE to enable it

VERBOSE

Display additional information regarding the plan. Specifically, include the output column list for each node in the plan tree, schema-qualify table and function names, always label variables in expressions with their range table alias, and always print the name of each trigger for which statistics are displayed. This parameter defaults to FALSE if you omit it; specify VERBOSE to enable it

COSTS

Include information on the estimated startup and total cost of each plan node, as well as the estimated number of rows and the estimated width of each row. This parameter defaults to TRUE if you omit it; specify COSTS OFF to deactivate it

BUFFERS

Include information on buffer usage. This parameter may be specified only when ANALYZE is also specified. If omitted, the default value is FALSE, buffer usage information is not included.

NOTE

Greengage DB does not support specifying BUFFERS [true] for distributed queries; ignore any displayed buffer usage information.

TIMING

Include actual startup time and time spent in each node in the output. The overhead of repeatedly reading the system clock can slow down the query significantly on some systems, so it may be useful to set this parameter to FALSE when only actual row counts, and not exact times, are needed. Run time of the entire statement is always measured, even when node-level timing is turned off with this option. This parameter may only be used when ANALYZE is also enabled. It defaults to TRUE

FORMAT

Specify the output format, which can be TEXT, XML, JSON, or YAML. Non-text output contains the same information as the text output format, but is easier for programs to parse. This parameter defaults to TEXT

boolean

Specifies whether the selected option should be turned on or off. You can write TRUE, ON, or 1 to enable the option, and FALSE, OFF, or 0 to deactivate it. The boolean value can also be omitted, in which case TRUE is assumed

statement

Any SELECT, INSERT, UPDATE, DELETE, VALUES, EXECUTE, DECLARE, or CREATE TABLE AS statement, whose execution plan you wish to see

In order to allow the query optimizer to make reasonably informed decisions when optimizing queries, the ANALYZE statement should be run to record statistics about the distribution of data within the table. If you have not done this (or if the statistical distribution of the data in the table has changed significantly since the last time ANALYZE was run), the estimated costs are unlikely to conform to the real properties of the query, and consequently an inferior query plan may be chosen.

An SQL statement executed during an EXPLAIN ANALYZE command is excluded from Greengage DB resource queues.

To illustrate how to read an EXPLAIN query plan, consider the following query:

EXPLAIN
SELECT *
FROM names
WHERE name = 'Joelle';

Result:

                                  QUERY PLAN
-------------------------------------------------------------------------------
 Gather Motion 3:1  (slice1; segments: 3)  (cost=0.00..431.27 rows=1 width=58)
   ->  Seq Scan on names  (cost=0.00..431.27 rows=1 width=58)
         Filter: (name = 'Joelle'::text)
 Optimizer: Pivotal Optimizer (GPORCA)
(4 rows)

If the plan is read from the bottom up, the query optimizer starts by doing a sequential scan of the names table. Notice that the WHERE clause is being applied as a Filter condition. This means that the scan operation checks the condition for each row it scans, and outputs only the ones that pass the condition.

The results of the scan operation are passed up to a Gather Motion operation. In Greengage DB, Gather Motion is when segments send rows up to the master. In this case there are three segment instances sending to one master instance (3:1). This operation is working on slice1 of the parallel query execution plan. In Greengage DB, a query plan is divided into slices so that portions of the query plan can be worked on in parallel by the segments.

The estimated startup cost for this plan is 0.00 (no cost) and a total cost of 431.27. The planner is estimating that this query will return one row.

Here is the same query, with cost estimates suppressed:

EXPLAIN (COSTS OFF)
SELECT *
FROM names
WHERE name = 'Joelle';

Result:

                QUERY PLAN
------------------------------------------
 Gather Motion 3:1  (slice1; segments: 3)
   ->  Seq Scan on names
         Filter: (name = 'Joelle'::text)
 Optimizer: Pivotal Optimizer (GPORCA)
(4 rows)

Here is the same query, with JSON formatting:

EXPLAIN (FORMAT JSON)
SELECT *
FROM names
WHERE name = 'Joelle';

Result:

                  QUERY PLAN
-------------------------------------------------
 [                                              +
   {                                            +
     "Plan": {                                  +
       "Node Type": "Gather Motion",            +
       "Senders": 3,                            +
       "Receivers": 1,                          +
       "Slice": 1,                              +
       "Segments": 3,                           +
       "Gang Type": "primary reader",           +
       "Startup Cost": 0.00,                    +
       "Total Cost": 431.27,                    +
       "Plan Rows": 1,                          +
       "Plan Width": 58,                        +
       "Plans": [                               +
         {                                      +
           "Node Type": "Seq Scan",             +
           "Parent Relationship": "Outer",      +
           "Slice": 1,                          +
           "Segments": 3,                       +
           "Gang Type": "primary reader",       +
           "Relation Name": "names",            +
           "Alias": "names",                    +
           "Startup Cost": 0.00,                +
           "Total Cost": 431.27,                +
           "Plan Rows": 1,                      +
           "Plan Width": 58,                    +
           "Filter": "(name = 'Joelle'::text)"  +
         }                                      +
       ]                                        +
     },                                         +
     "Settings": {                              +
       "Optimizer": "Pivotal Optimizer (GPORCA)"+
     }                                          +
   }                                            +
 ]
(1 row)

If there is an index and a query with an indexable WHERE condition is used, EXPLAIN might show a different plan. This query generates a plan with an index scan, with YAML formatting:

EXPLAIN (FORMAT YAML)
SELECT *
FROM names
WHERE location = 'Sydney, Australia';

Result:

                          QUERY PLAN
--------------------------------------------------------------
 - Plan:                                                     +
     Node Type: "Gather Motion"                              +
     Senders: 3                                              +
     Receivers: 1                                            +
     Slice: 1                                                +
     Segments: 3                                             +
     Gang Type: "primary reader"                             +
     Startup Cost: 0.00                                      +
     Total Cost: 10.81                                       +
     Plan Rows: 10000                                        +
     Plan Width: 70                                          +
     Plans:                                                  +
       - Node Type: "Index Scan"                             +
         Parent Relationship: "Outer"                        +
         Slice: 1                                            +
         Segments: 3                                         +
         Gang Type: "primary reader"                         +
         Scan Direction: "Forward"                           +
         Index Name: "names_idx_loc"                         +
         Relation Name: "names"                              +
         Alias: "names"                                      +
         Startup Cost: 0.00                                  +
         Total Cost: 7.77                                    +
         Plan Rows: 10000                                    +
         Plan Width: 70                                      +
         Index Cond: "(location = 'Sydney, Australia'::text)"+
   Settings:                                                 +
     Optimizer: "Pivotal Optimizer (GPORCA)"
(1 row)

There is no EXPLAIN statement defined in the SQL standard.

EXECUTE

FETCH

EXPLAIN

Synopsis

Description

Parameters

Notes

Examples

Compatibility

See also