
Full data backup and restore

Pavel Semyonov

This topic describes how to perform a full backup and restore in Greengage DB, as well as how to adjust backup settings and monitor the results.

NOTE

All gpbackup and gprestore commands must be executed on the master host under the gpadmin user account.

Basic backup and restore

Create a backup

A full backup of a database includes all user data, database metadata, and cluster-wide global objects. To create a full backup, run the gpbackup utility with the target database name specified in the --dbname option:

$ gpbackup --dbname marketplace

When the backup completes successfully, the output includes the following message:

[INFO]:-Backup completed successfully

The output also contains the backup timestamp, which uniquely identifies the backup and must be specified in future restore commands:

[INFO]:-Backup Timestamp = 20251006063113

This command writes backup files to all cluster hosts except the standby master:

  • On the master host: backup configuration files, database metadata (DDL), and general backup information.

  • On the segment hosts: compressed data files for the tables stored on each segment, one file per table.

Each backup forms a distributed directory hierarchy that mirrors the layout of Greengage DB’s files and includes configuration, metadata, and data subdirectories. This structure allows full and partial (selective) restore operations and parallel access by gprestore.

To verify that the backup is complete and valid, check the backup report and the gpbackup logs generated on the master host.
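
For example, you can check the status line in the backup report. A minimal check for the backup created above, assuming the default backup location and the report naming convention described in Monitoring below:

$ grep "backup status" $MASTER_DATA_DIRECTORY/backups/20251006/20251006063113/gpbackup_20251006063113_report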

Restore from a backup

To fully restore a database from a backup, run the gprestore utility, passing the backup timestamp in the --timestamp option:

$ gprestore --timestamp 20251006063113
IMPORTANT

gprestore does not overwrite existing tables or other user-defined objects in the target database. To restore data into an existing database, you must either drop or truncate the target tables before restoring. You can use the --truncate-table option to automate this:

$ gprestore --timestamp 20251006063113 --truncate-table --data-only

After a successful restore, the output includes the following message:

[INFO]:-Restore completed successfully

You can also restore a backup to a cluster that does not yet contain the target database by using the --create-db option:

$ gprestore --timestamp 20251006063113 --create-db

This option creates a new database based on the built-in template0 template database and populates it with the restored objects.

After restore, you can validate the operation by checking the database size, table counts, or sample data, and review gprestore logs or reports.
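
For example, the following commands report the restored database size and user table count, which you can compare against the source database or the backup report (the marketplace database name follows the examples above):

$ psql -d marketplace -c "SELECT pg_size_pretty(pg_database_size('marketplace'));"
$ psql -d marketplace -c "SELECT count(*) FROM pg_tables WHERE schemaname NOT IN ('pg_catalog', 'information_schema', 'gp_toolkit');"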

TIP

To find the timestamps of existing backups, list the contents of the backups directory:

$ ls $MASTER_DATA_DIRECTORY/backups/

This command outputs a list of subdirectories named after backup creation dates, for example, 20251006.

Then, list the contents of a specific subdirectory to see exact timestamps:

$ ls $MASTER_DATA_DIRECTORY/backups/20251006/

Define backup location

By default, gpbackup stores backup files under the $MASTER_DATA_DIRECTORY/backups path on the master host and in corresponding locations on the segment hosts. You can override this location using the --backup-dir option:

$ gpbackup --dbname marketplace --backup-dir /home/gpadmin/backups
IMPORTANT

The gpadmin user must have write permissions to the specified directory on all cluster hosts, as each segment process writes its own backup files locally.
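
One way to prepare such a directory on every host is the gpssh utility. A sketch, assuming a host file named hostfile_all that lists all cluster hosts:

$ gpssh -f hostfile_all -e 'mkdir -p /home/gpadmin/backups'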

When called with --backup-dir, gpbackup creates a backup directory structure under the specified path. Each host receives a subdirectory for every segment it runs, named gpseg<N>. Within each segment directory, backups are organized by date and timestamp. For example, the command above creates the following directories:

  • Master host:

    /home/gpadmin/backups/gpseg-1/backups/20251006/20251006073432/
  • Segment host with segments 0 and 1:

    /home/gpadmin/backups/gpseg0/backups/20251006/20251006073432/
    /home/gpadmin/backups/gpseg1/backups/20251006/20251006073432/
  • Segment host with segments 2 and 3:

    /home/gpadmin/backups/gpseg2/backups/20251006/20251006073432/
    /home/gpadmin/backups/gpseg3/backups/20251006/20251006073432/

To restore a database from a backup stored in a non-default location, specify the same path using the --backup-dir option of gprestore:

$ gprestore --timestamp 20251006073432 --backup-dir /home/gpadmin/backups/

When restoring, gprestore expects to find the same directory structure that gpbackup created. If the files were moved or copied to another cluster, make sure the structure and permissions are preserved.
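
For example, to copy one host's portion of a backup to the corresponding host of another cluster, you might use rsync with the -a flag, which preserves the directory structure and permissions (the new-master host name is hypothetical):

$ rsync -a /home/gpadmin/backups/ gpadmin@new-master:/home/gpadmin/backups/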

Define backup layout

The physical layout of backup files determines how they are distributed across directories and how the backup can be restored later. gpbackup provides options to control this layout depending on the target environment and backup management strategy:

  • Default layout

    By default, gpbackup creates a separate directory for each backup under every segment’s data directory (or under the path specified in --backup-dir). Each table is stored in a separate compressed data file.

  • --single-backup-dir

    Use this option to store all backup files created on a host in a single directory, specified via the --backup-dir option, rather than in per-segment gpseg<N> subdirectories. This simplifies directory management, especially when moving or resizing clusters, because you can copy all files from a single location instead of multiple segment paths. However, backups using this layout may take slightly longer because all segments write to the same location, increasing I/O contention.

    $ gpbackup --dbname marketplace --single-backup-dir --backup-dir /home/gpadmin/backups
  • --single-data-file

    This option consolidates all tables for a given segment into one data file instead of creating one file per table. It reduces the total number of files, which can be beneficial when working with file systems that have limits on file counts or slower metadata operations. However, it disables per-table parallel restore, so restores from such backups typically take longer.

    $ gpbackup --dbname marketplace --single-data-file

In most cases, the default layout provides the best performance and flexibility. Alternative layouts are mainly useful for constrained environments or migration tasks where file management simplicity is more important than speed.

Configure backup compression

gpbackup and gprestore support data compression in backups to reduce disk space usage and improve I/O performance.

Supported compression algorithms are gzip and zstd:

  • gzip (default) is a widely supported algorithm. It provides better compatibility, as it is natively available on most Linux and UNIX systems.

  • zstd is a more modern and efficient algorithm. It provides higher compression ratios and faster compression and decompression speeds.

To specify the compression algorithm, use the --compression-type option.

You can also set the compression level from 1 (default: fastest, lowest compression) to 9 (slowest, highest compression) using the --compression-level option:

$ gpbackup --dbname marketplace --compression-type zstd --compression-level 5

If you have no specific requirements, a medium level such as 4 or 5 is recommended. It provides a good balance between compression ratio, CPU usage, and operation time.

To create an uncompressed backup, use the --no-compression option:

$ gpbackup --dbname marketplace --no-compression

In this case, data is stored in plain CSV files. This reduces CPU usage during backup and allows the files to be reused directly, for example, to create external tables or to import data using the COPY command.
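
As a sketch, an uncompressed data file can be loaded into an existing table with the psql \copy meta-command; the file path below is hypothetical, since actual backup data files follow gpbackup's internal naming scheme:

$ psql -d marketplace -c "\copy public.customers FROM '/tmp/customers.csv' CSV"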

When restoring, gprestore automatically detects the compression settings of the backup and decompresses data without additional options.

Parallel backup and restore

To improve performance, you can run backup and restore operations in parallel. Use the --jobs option to specify the number of worker processes that will run concurrently:

$ gpbackup --dbname marketplace --jobs 10

Each job performs work independently, for example, backing up or restoring separate tables. This allows large databases to be processed much faster, especially when data is distributed evenly across segments.

During a backup, each worker uses its own database connection and acquires a lock on the object being processed. For data tables, gpbackup takes an ACCESS SHARE lock to ensure that no schema changes (DDL) occur while data is being read. Queries that read or modify the table’s data can still proceed concurrently, but DDL operations such as ALTER TABLE, DROP TABLE, or TRUNCATE TABLE are blocked until the backup finishes. Because of this, it is recommended to schedule parallel backups during periods of low DDL activity, such as off-peak maintenance windows.
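
While a backup is running, you can observe these locks through the standard pg_locks system view; an illustrative query:

$ psql -d marketplace -c "SELECT relation::regclass, mode, granted FROM pg_locks WHERE mode = 'AccessShareLock' AND relation IS NOT NULL;"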

The --jobs option of gprestore works the same way — it determines how many tables are restored at once.

$ gprestore --timestamp 20251010120101 --jobs 10

Parallel restore significantly reduces total restore time but uses multiple connections. The optimal job count depends on available CPU, disk I/O, and network throughput.

NOTE

The --jobs option is incompatible with backups created using --single-data-file. Attempting to use both options results in an error.

Define backup scope

This section describes ways to back up and restore supplementary information, such as:

  • table metadata (DDL)

  • global objects

  • table statistics

To learn how to select stored objects — schemas, tables, views, and so on — for backing up, see Partial backups.

Data and metadata

By default, backups include both metadata (DDL) and table data. You can limit the backup to one of these components using the following options:

  • --data-only — backs up only table data, without DDL for object creation. Use this option when restoring data into an existing database with matching schema structure. Example:

    $ gpbackup --dbname marketplace --data-only
  • --metadata-only — backs up only metadata (DDL), without table data. This mode is useful for recreating the database structure on another cluster or exporting schema definitions. Example:

    $ gpbackup --dbname marketplace --metadata-only

    Combining --metadata-only with schema and table filtering options allows you to extract DDL for specific database objects and recreate them in another cluster:

    $ gprestore --timestamp 20251010043530 --metadata-only --include-table public.customers

Global metadata

By default, gpbackup backs up cluster-wide global metadata, such as roles, tablespaces, resource queues, and resource groups. To exclude global objects, use the --without-globals option:

$ gpbackup --dbname marketplace --without-globals

gprestore does not restore global objects unless explicitly requested. To restore them, use the --with-globals option:

$ gprestore --timestamp 20251006092702 --with-globals

This can be useful when setting up a new cluster to replicate role definitions and access privileges before restoring user data.
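
For example, after restoring with --with-globals, you can verify that the expected roles were recreated:

$ psql -d marketplace -c "SELECT rolname FROM pg_roles;"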

Table statistics

Backups can include table statistics, allowing the restored database to retain query planner information without requiring an immediate ANALYZE run. To include statistics in the backup, use the --with-stats option:

$ gpbackup --dbname marketplace --with-stats

To restore them, add the same option to gprestore:

$ gprestore --timestamp 20251006100053 --with-stats

However, if the data distribution changes during or after restore — for example, when restoring into a database with a different schema or subset of data — the old statistics may become inaccurate. In such cases, you can instruct gprestore to automatically collect new statistics after completion by using --run-analyze:

$ gprestore --timestamp 20251006100053 --run-analyze
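
Alternatively, if the backup was created without --with-stats, you can collect fresh statistics manually once the restore completes:

$ psql -d marketplace -c "ANALYZE;"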

Monitoring

Reports

gpbackup and gprestore record information about each operation in report files. These reports include the operation timestamp, Greengage DB and utility versions, status, object counts, and other details. Reports are stored in the backup directory on the master host:

  • gpbackup reports: gpbackup_<backup_timestamp>_report

    Greengage Database Backup Report
    
    timestamp key:         20251006065627
    gpdb version:          6.29.0+dev.4.g7f02b2072f build dev
    gpbackup version:      1.30.6+dev.1.g108659cb
    
    database name:         marketplace
    command line:          gpbackup --dbname marketplace
    compression:           gzip
    plugin executable:     None
    backup section:        All Sections
    object filtering:      None
    includes statistics:   No
    data file format:      Multiple Data Files Per Segment
    incremental:           False
    
    start time:            Mon Oct 06 2025 06:56:27
    end time:              Mon Oct 06 2025 06:56:29
    duration:              0:00:02
    
    backup status:         Success
    
    database size:         88 MB
    segment count:         4
    
    count of database objects in backup:
    aggregates                   0
    casts                        0
    collations                   0
    constraints                  0
    conversions                  0
    default privileges           0
    database gucs                0
    event triggers               0
    extensions                   0
    foreign data wrappers        0
    foreign servers              0
    functions                    0
    indexes                      0
    operator classes             0
    operator families            0
    operators                    0
    procedural languages         0
    protocols                    0
    resource groups              2
    resource queues              1
    roles                        1
    rules                        0
    schemas                      1
    sequences                    0
    tables                       2
    tablespaces                  0
    text search configurations   0
    text search dictionaries     0
    text search parsers          0
    text search templates        0
    triggers                     0
    types                        0
    user mappings                0
    views                        0
  • gprestore reports: gprestore_<backup_timestamp>_<restore_timestamp>_report

    Greengage Database Restore Report
    
    timestamp key:           20251006065627
    gpdb version:            6.29.0+dev.4.g7f02b2072f build dev
    gprestore version:       1.30.6+dev.1.g108659cb
    
    database name:           marketplace
    command line:            gprestore --timestamp 20251006065627 --create-db
    
    backup segment count:    4
    restore segment count:   4
    start time:              Mon Oct 06 2025 07:07:23
    end time:                Mon Oct 06 2025 07:07:27
    duration:                0:00:04
    
    restore status:          Success

<backup_timestamp> and <restore_timestamp> represent the start times of the backup and restore operations.
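
When several backups exist, you can scan all reports at once to check their statuses; a minimal sketch, assuming the default backup location:

$ grep -H "backup status" $MASTER_DATA_DIRECTORY/backups/*/*/gpbackup_*_report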

Email notifications

You can configure gpbackup and gprestore to automatically send operation reports by email. This feature relies on the sendmail command being available on the master host. The sendmail command can be provided by Sendmail itself or by any other Mail Transfer Agent (MTA) that offers a compatible interface.
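
To check whether a compatible sendmail binary is available on the master host, you can run, for example (sendmail is often installed in /usr/sbin, which may not be in the gpadmin PATH):

$ command -v sendmail || ls /usr/sbin/sendmail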

To enable email notifications, create a file named gp_email_contacts.yaml in one of the following locations:

  • The home directory of the gpadmin user, for example: /home/gpadmin/

  • The utility installation directory, for example: /usr/local/gpdb/bin/

If the file is not found in either location, the utilities display an informational message such as:

[INFO]:-Found neither /usr/local/gpdb/bin/gp_email_contacts.yaml nor /home/gpadmin/gp_email_contacts.yaml
[INFO]:-Email containing gpbackup report /data1/master/gpseg-1/backups/20251006/20251006083418/gpbackup_20251006083418_report will not be sent

The gp_email_contacts.yaml file must specify email recipients and the types of reports to be sent in the following structure:

contacts:
  gpbackup:
  - address: <user>@<domain>
    status:
      success: [true | false]
      success_with_errors: [true | false]
      failure: [true | false]
  gprestore:
  - address: <user>@<domain>
    status:
      success: [true | false]
      success_with_errors: [true | false]
      failure: [true | false]

Example of a contacts file:

contacts:
  gpbackup:
  - address: admin@example.com
    status:
      success: true
      success_with_errors: true
      failure: true
  - address: alerts@example.com
    status:
      success: false
      success_with_errors: true
      failure: true
  gprestore:
  - address: admin@example.com
    status:
      success: true
      success_with_errors: true
      failure: true
Contacts file format:

  • contacts (required): the mandatory top-level section of the file.

  • gpbackup (optional): defines email notifications for the gpbackup utility.

  • gprestore (optional): defines email notifications for the gprestore utility. Has the same structure as the gpbackup section.

  • address <user>@<domain> (required): an email address to send gpbackup reports to. One or more addresses can be specified, each in a separate address node.

  • status (required): defines which gpbackup reports are sent to the corresponding address.

  • success true | false (optional): sends reports for successful backup operations (exit code 0). Default: false.

  • success_with_errors true | false (optional): sends reports for backup operations completed with errors (exit code 1). Default: false.

  • failure true | false (optional): sends reports for failed backup operations (exit code 2). Default: false.

Log files

In addition to reports, gpbackup and gprestore create detailed log files that record the progress of each operation and any messages or errors encountered.

Log files are stored together with other Greengage DB logs, typically in the gpAdminLogs/ subdirectory under the gpadmin user's home directory.

Each log file covers all backup and restore operations executed on a given date. File names include that date: gpbackup_YYYYMMDD.log and gprestore_YYYYMMDD.log, for example, gprestore_20251010.log.

Logs are useful for troubleshooting failed or incomplete operations, as they contain detailed command output and system messages from all cluster hosts.
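
For example, to find error messages from a particular day's backup operations (assuming the default gpAdminLogs location):

$ grep -i error ~/gpAdminLogs/gpbackup_20251010.log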