analyzedb
A utility that performs ANALYZE operations on tables incrementally and concurrently.
For append-optimized tables, analyzedb updates statistics only if the statistics are not current.
Synopsis
analyzedb -d <dbname>
{ -s <schema>
| -t <schema>.<table>
[ -i <col1>[,<col2>, ...]
| -x <col1>[,<col2>, ...] ]
| { -f | --file } <config_file>
}
[ -l | --list ]
[ --gen_profile_only ]
[ -p <parallel_level> ]
[ --full ]
[ --skip_root_stats ]
[ --skip_orca_root_stats ]
[ -v | --verbose ]
[ -a ]
analyzedb { --clean_last | --clean_all }
analyzedb --version
analyzedb { -? | -h | --help }
Description
The analyzedb utility updates statistics on data for the specified tables incrementally and concurrently.
While performing ANALYZE operations, analyzedb creates a snapshot of the table metadata and stores it on disk on the master host.
An ANALYZE operation is performed only if the table has been modified.
If a table or partition has not been modified since the last time it was analyzed, analyzedb automatically skips the table or partition because it already contains up-to-date statistics:
-
For append-optimized tables,
analyzedbupdates statistics incrementally, if the statistics are not current (for example, if table data is changed after statistics were collected for the table). If there are no statistics for the table, statistics are collected. -
For heap tables, statistics are always updated.
Specify the --full option to update append-optimized table statistics even if the table statistics are current.
By default, analyzedb creates a maximum of 5 concurrent sessions to analyze tables in parallel.
For each session, analyzedb issues an ANALYZE command to the database and specifies different table names.
The -p <parallel_level> option controls the maximum number of concurrent sessions.
Partitioned append-optimized tables
For a partitioned append-optimized table, analyzedb checks the partitioned table root partition and leaf partitions.
If needed, the utility updates statistics for non-current partitions and the root partition.
analyzedb must sample additional partitions within a partitioned table when it encounters a stale partition, even when statistics are already collected.
Consider it a best practice to run analyzedb on the root partition any time that you add new partitions to a partitioned table.
This operation both analyzes the child leaf partitions in parallel and merges any updated statistics into the root partition.
Notes
The analyzedb utility updates append-optimized table statistics if the table has been modified by DML or DDL commands, including INSERT, DELETE, UPDATE, CREATE TABLE, ALTER TABLE, and TRUNCATE.
The utility determines if a table has been modified by comparing catalog metadata of tables with the previous snapshot of metadata taken during a previous analyzedb operation.
The snapshots of table metadata are stored as state files in the directory db_analyze/<db_name>/<timestamp> in the Greengage DB master data directory.
The utility preserves old snapshot information from the past 8 days, and the 3 most recent state directories regardless of age, while all other directories are automatically removed.
You can also specify the --clean_last or --clean_all option to remove state files generated by analyzedb.
If you do not specify a table, set of tables, or schema, the analyzedb utility collects the statistics as needed on all system catalog tables and user-defined tables in the database.
External tables are not affected by analyzedb.
Table names that contain spaces are not supported.
Running the ANALYZE command directly on a table (instead of using the analyzedb utility) does not update the metadata that analyzedb uses to determine whether table statistics are up to date.
Options
- --clean_last
-
Remove the state files generated by the last
analyzedboperation. All other options except-dare ignored. - --clean_all
-
Remove all the state files generated by
analyzedb. All other options except-dare ignored. - -d <dbname>
-
The name of the database that contains the tables to be analyzed. If this option is not specified, the database name is read from the
PGDATABASEenvironment variable. IfPGDATABASEis not set, the user name specified for the connection is used. - -f <config_file>, --file <config_file>
-
A text file that contains a list of tables to be analyzed. A relative file path from the current directory can be specified.
The file lists one table per line. Table names must be qualified with a schema name. Optionally, a list of columns can be specified using
-ior-x. No other options are allowed in the file. Options such as--fullmust be specified on the command line.Only one of the following options can be used to specify the tables to be analyzed:
-f/--file,-t, or-s.When performing
ANALYZEoperations on multiple tables,analyzedbcreates concurrent sessions to analyze tables in parallel. The-poption controls the maximum number of concurrent sessions.Example file contents:
public.nation public.lineitem -i l_shipdate,l_receiptdate
- --full
-
Perform an
ANALYZEoperation on all specified tables, even if the statistics are up to date. - --gen_profile_only
-
Update the
analyzedbsnapshot of table statistics without performing anyANALYZEoperations.If other options specify tables or a schema, the snapshot information is updated only for the specified tables.
Use this option if the
ANALYZEcommand was run manually and you want to synchronize theanalyzedbsnapshot. - -i <col1>,<col2>,…
-
(Optional) Collect statistics only for the specified columns of the table. Must be specified with
-t.Only one of
-ior-xcan be specified. - -l, --list
-
List the tables that would be analyzed with the specified options. No
ANALYZEoperations are performed. - -p <parallel_level>
-
Number of tables to analyze in parallel. The value must be an integer between
1and10, inclusive. The default is5. - --skip_root_stats
-
This option is deprecated and no longer used. You may remove it from your scripts.
- --skip_orca_root_stats
-
Skip collection of root partition statistics.
NOTEDo not use this option if GPORCA is enabled.
Use this option if
ANALYZE ROOTPARTITIONcommands take a very long time to complete.CAUTIONAfter you run
analyzedbwith this option, subsequent executions will not update root partition statistics unless the table has changed. - -s <schema>
-
Analyze all tables in the specified schema. Only one schema name can be specified.
Only one of the following options can be used to specify the tables to be analyzed:
-f/--file,-t, or-s. - -t <schema>.<table>
-
Analyze only the specified table. The table name must be schema-qualified. Only one table can be specified.
Only one of the following options can be used to specify the tables to be analyzed:
-f/--file,-t, or-s. - -x <col1>,<col2>,…
-
(Optional) Exclude the specified columns from statistics collection. Must be specified with
-t.Only one of
-ior-xcan be specified. - -a
-
Quiet mode. Do not prompt for user confirmation.
- -h, -?, --help
-
Display help.
- -v, --verbose
-
Enable verbose logging. The output includes the list of analyzed tables (including leaf partitions) and the duration of each
ANALYZEoperation. - --version
-
Display the version of this utility.
Examples
This example collects statistics only for selected table columns.
In the mytest database, it collects statistics on the shipdate and receiptdate columns of the public.orders table:
$ analyzedb -d mytest -t public.orders -i shipdate,receiptdate
The following example collects statistics on a table while excluding a set of columns.
In the mytest database, it collects statistics on the public.foo table and excludes the bar and test2 columns:
$ analyzedb -d mytest -t public.foo -x bar,test2
The following example specifies a file that contains a list of tables.
This command collects statistics on the tables listed in the analyze-tables file in the mytest database:
$ analyzedb -d mytest -f analyze-tables
If no table, set of tables, or schema is specified, the analyzedb utility collects statistics as needed on all catalog and user-defined tables in the specified database.
This command refreshes statistics on the system catalog and user-defined tables in the mytest database:
$ analyzedb -d mytest
You can create a PL/Python function to run the analyzedb utility as a Greengage DB function.
The following CREATE FUNCTION command creates a user-defined PL/Python function that runs the analyzedb utility and displays output on the command line.
Specify analyzedb options as the function parameter:
CREATE OR REPLACE FUNCTION analyzedb(params TEXT)
RETURNS VOID AS
$BODY$
import subprocess
cmd = ['analyzedb', '-a'] + params.split()
p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
# verbose output of process
for line in iter(p.stdout.readline, ''):
plpy.info(line);
p.wait()
$BODY$
LANGUAGE plpythonu VOLATILE;
When run by the gpadmin user, this SELECT command invokes the analyzedb utility to analyze the public.mytable table in the mytest database:
SELECT analyzedb('-d mytest -t public.mytable');
To create a PL/Python function, the PL/Python procedural language must be registered as a language in the database.
For example, this CREATE LANGUAGE command run as gpadmin registers PL/Python as an untrusted language:
CREATE LANGUAGE plpythonu;