analyzedb

Contents

Synopsis
Description
- Partitioned append-optimized tables
Notes
Options
Examples

A utility that performs ANALYZE operations on tables incrementally and concurrently. For append-optimized tables, analyzedb updates statistics only if the statistics are not current.

analyzedb -d <dbname>
          { -s <schema>
          | -t <schema>.<table>
              [ -i <col1>[,<col2>, ...]
              | -x <col1>[,<col2>, ...] ]
          | { -f | --file } <config_file>
          }
          [ -l | --list ]
          [ --gen_profile_only ]
          [ -p <parallel_level> ]
          [ --full ]
          [ --skip_root_stats ]
          [ --skip_orca_root_stats ]
          [ -v | --verbose ]
          [ -a ]

analyzedb { --clean_last | --clean_all }

analyzedb --version

analyzedb { -? | -h | --help }

Description

The analyzedb utility updates statistics on data for the specified tables incrementally and concurrently.

While performing ANALYZE operations, analyzedb creates a snapshot of the table metadata and stores it on disk on the master host. An ANALYZE operation is performed only if the table has been modified. If a table or partition has not been modified since the last time it was analyzed, analyzedb automatically skips the table or partition because it already contains up-to-date statistics:

For append-optimized tables, analyzedb updates statistics incrementally, if the statistics are not current (for example, if table data is changed after statistics were collected for the table). If there are no statistics for the table, statistics are collected.
For heap tables, statistics are always updated.

Specify the --full option to update append-optimized table statistics even if the table statistics are current.

By default, analyzedb creates a maximum of 5 concurrent sessions to analyze tables in parallel. For each session, analyzedb issues an ANALYZE command to the database and specifies different table names. The -p <parallel_level> option controls the maximum number of concurrent sessions.

Partitioned append-optimized tables

For a partitioned append-optimized table, analyzedb checks the partitioned table root partition and leaf partitions. If needed, the utility updates statistics for non-current partitions and the root partition.

analyzedb must sample additional partitions within a partitioned table when it encounters a stale partition, even when statistics are already collected. Consider it a best practice to run analyzedb on the root partition any time that you add new partitions to a partitioned table. This operation both analyzes the child leaf partitions in parallel and merges any updated statistics into the root partition.

The analyzedb utility updates append-optimized table statistics if the table has been modified by DML or DDL commands, including INSERT, DELETE, UPDATE, CREATE TABLE, ALTER TABLE, and TRUNCATE. The utility determines if a table has been modified by comparing catalog metadata of tables with the previous snapshot of metadata taken during a previous analyzedb operation. The snapshots of table metadata are stored as state files in the directory db_analyze/<db_name>/<timestamp> in the Greengage DB master data directory.

The utility preserves old snapshot information from the past 8 days, and the 3 most recent state directories regardless of age, while all other directories are automatically removed. You can also specify the --clean_last or --clean_all option to remove state files generated by analyzedb.

If you do not specify a table, set of tables, or schema, the analyzedb utility collects the statistics as needed on all system catalog tables and user-defined tables in the database.

External tables are not affected by analyzedb.

Table names that contain spaces are not supported.

Running the ANALYZE command directly on a table (instead of using the analyzedb utility) does not update the metadata that analyzedb uses to determine whether table statistics are up to date.

--clean_last

Remove the state files generated by the last analyzedb operation. All other options except -d are ignored.

--clean_all

Remove all the state files generated by analyzedb. All other options except -d are ignored.

-d <dbname>

The name of the database that contains the tables to be analyzed. If this option is not specified, the database name is read from the PGDATABASE environment variable. If PGDATABASE is not set, the user name specified for the connection is used.

-f <config_file>, --file <config_file>

A text file that contains a list of tables to be analyzed. A relative file path from the current directory can be specified.

The file lists one table per line. Table names must be qualified with a schema name. Optionally, a list of columns can be specified using -i or -x. No other options are allowed in the file. Options such as --full must be specified on the command line.

Only one of the following options can be used to specify the tables to be analyzed: -f/--file, -t, or -s.

When performing ANALYZE operations on multiple tables, analyzedb creates concurrent sessions to analyze tables in parallel. The -p option controls the maximum number of concurrent sessions.

Example file contents:

public.nation
public.lineitem -i l_shipdate,l_receiptdate

--full

Perform an ANALYZE operation on all specified tables, even if the statistics are up to date.

--gen_profile_only

Update the analyzedb snapshot of table statistics without performing any ANALYZE operations.

If other options specify tables or a schema, the snapshot information is updated only for the specified tables.

Use this option if the ANALYZE command was run manually and you want to synchronize the analyzedb snapshot.

-i <col1>,<col2>,…

(Optional) Collect statistics only for the specified columns of the table. Must be specified with -t.

Only one of -i or -x can be specified.

-l, --list

List the tables that would be analyzed with the specified options. No ANALYZE operations are performed.

-p <parallel_level>

Number of tables to analyze in parallel. The value must be an integer between 1 and 10, inclusive. The default is 5.

--skip_root_stats

This option is deprecated and no longer used. You may remove it from your scripts.

--skip_orca_root_stats

Skip collection of root partition statistics.

NOTE

Do not use this option if GPORCA is enabled.

Use this option if ANALYZE ROOTPARTITION commands take a very long time to complete.

CAUTION

After you run analyzedb with this option, subsequent executions will not update root partition statistics unless the table has changed.

-s <schema>

Analyze all tables in the specified schema. Only one schema name can be specified.

Only one of the following options can be used to specify the tables to be analyzed: -f/--file, -t, or -s.

-t <schema>.<table>

Analyze only the specified table. The table name must be schema-qualified. Only one table can be specified.

Only one of the following options can be used to specify the tables to be analyzed: -f/--file, -t, or -s.

-x <col1>,<col2>,…

(Optional) Exclude the specified columns from statistics collection. Must be specified with -t.

Only one of -i or -x can be specified.

-a

Quiet mode. Do not prompt for user confirmation.

-h, -?, --help

Display help.

-v, --verbose

Enable verbose logging. The output includes the list of analyzed tables (including leaf partitions) and the duration of each ANALYZE operation.

--version

Display the version of this utility.

This example collects statistics only for selected table columns. In the mytest database, it collects statistics on the shipdate and receiptdate columns of the public.orders table:

$ analyzedb -d mytest -t public.orders -i shipdate,receiptdate

The following example collects statistics on a table while excluding a set of columns. In the mytest database, it collects statistics on the public.foo table and excludes the bar and test2 columns:

$ analyzedb -d mytest -t public.foo -x bar,test2

The following example specifies a file that contains a list of tables. This command collects statistics on the tables listed in the analyze-tables file in the mytest database:

$ analyzedb -d mytest -f analyze-tables

If no table, set of tables, or schema is specified, the analyzedb utility collects statistics as needed on all catalog and user-defined tables in the specified database. This command refreshes statistics on the system catalog and user-defined tables in the mytest database:

$ analyzedb -d mytest

You can create a PL/Python function to run the analyzedb utility as a Greengage DB function. The following CREATE FUNCTION command creates a user-defined PL/Python function that runs the analyzedb utility and displays output on the command line. Specify analyzedb options as the function parameter:

CREATE OR REPLACE FUNCTION analyzedb(params TEXT)
  RETURNS VOID AS
$BODY$
    import subprocess
    cmd = ['analyzedb', '-a'] + params.split()
    p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)

    # verbose output of process
    for line in iter(p.stdout.readline, ''):
        plpy.info(line);

    p.wait()
$BODY$
LANGUAGE plpythonu VOLATILE;

When run by the gpadmin user, this SELECT command invokes the analyzedb utility to analyze the public.mytable table in the mytest database:

SELECT analyzedb('-d mytest -t public.mytable');

NOTE

To create a PL/Python function, the PL/Python procedural language must be registered as a language in the database. For example, this CREATE LANGUAGE command run as gpadmin registers PL/Python as an untrusted language:

CREATE LANGUAGE plpythonu;

Server configuration parameters (GUC)

clusterdb

analyzedb

Synopsis

Description

Partitioned append-optimized tables

Notes

Options

Examples