Full data backup and restore
This topic describes how to perform a full backup and restore in Greengage DB, how to adjust backup settings, and how to monitor the results.
All gpbackup and gprestore commands must be executed on the master host under the gpadmin user account.
Basic backup and restore
Create a backup
A full backup of a database includes all user data, database metadata, and cluster-wide global objects.
To create a full backup, run the gpbackup utility with the target database name specified in the --dbname option:
$ gpbackup --dbname marketplace
When the backup completes successfully, the output includes the following message:
[INFO]:-Backup completed successfully
The output also contains the backup timestamp, which uniquely identifies the backup and must be specified in future restore commands:
[INFO]:-Backup Timestamp = 20251006063113
This command writes backup files to all cluster hosts except the standby master:
- On the master host: backup configuration files, database metadata (DDL), and general backup information.
- On segment hosts: compressed data files for the tables stored on each segment, one file per table.
Each backup forms a distributed directory hierarchy that mirrors the layout of Greengage DB’s files and includes configuration, metadata, and data subdirectories.
This structure allows full and partial (selective) restore operations and parallel access by gprestore.
To verify that the backup is complete and valid, check the backup report and the gpbackup logs generated on the master host.
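When scripting backups, it is convenient to capture the timestamp directly from the gpbackup output so that later steps can pass it to gprestore. The following is a minimal sketch with a hypothetical helper function; it assumes the `[INFO]:-Backup Timestamp = <timestamp>` log line format shown above.

```shell
# Hypothetical helper: read gpbackup output on stdin and print the
# 14-digit backup timestamp from the "Backup Timestamp = ..." log line.
extract_backup_timestamp() {
  sed -n 's/.*Backup Timestamp = \([0-9]\{14\}\).*/\1/p'
}

# Example usage (sketch):
#   TS=$(gpbackup --dbname marketplace 2>&1 | tee backup.log | extract_backup_timestamp)
#   echo "Backup timestamp: $TS"
```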
Restore from a backup
To fully restore a database from a backup, run the gprestore utility passing its timestamp in the --timestamp option:
$ gprestore --timestamp 20251006063113
gprestore does not overwrite existing tables or other user-defined objects in the target database.
To restore data into an existing database, you must either drop or truncate the target tables before restore.
You can use the --truncate-table option to automate this:
$ gprestore --timestamp 20251006063113 --truncate-table --data-only
After a successful restore, the output includes the following message:
[INFO]:-Restore completed successfully
You can also restore a backup to a cluster that does not yet contain the target database by using the --create-db option:
$ gprestore --timestamp 20251006063113 --create-db
This option creates a new database based on the template0 built-in template database and populates it with the restored objects.
After restore, you can validate the operation by checking the database size, table counts, or sample data, and review gprestore logs or reports.
To find the timestamps of existing backups, list the contents of the backups directory:
$ ls $MASTER_DATA_DIRECTORY/backups/
This command outputs a list of subdirectories named after backup creation dates, for example, 20251006.
Then, list the contents of a specific subdirectory to see exact timestamps:
$ ls $MASTER_DATA_DIRECTORY/backups/20251006/
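The two `ls` steps above can be combined into a small helper that walks all date subdirectories and prints every backup timestamp. This is a sketch with a hypothetical function name; the directory layout follows the `backups/<date>/<timestamp>` convention described above.

```shell
# Hypothetical helper: print all backup timestamps found under a
# backups directory laid out as <backups_dir>/<YYYYMMDD>/<timestamp>/.
list_backup_timestamps() {
  backups_dir=$1
  for day in "$backups_dir"/*/; do
    [ -d "$day" ] || continue          # skip if no date subdirectories
    for ts in "$day"*/; do
      [ -d "$ts" ] || continue         # skip if no timestamp subdirectories
      basename "$ts"
    done
  done
}

# Example usage (sketch):
#   list_backup_timestamps "$MASTER_DATA_DIRECTORY/backups"
```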
Define backup location
By default, gpbackup stores backup files under the $MASTER_DATA_DIRECTORY/backups path on the master host and in corresponding locations on the segment hosts.
You can override this location using the --backup-dir option:
$ gpbackup --dbname marketplace --backup-dir /home/gpadmin/backups
The gpadmin user must have write permissions to the specified directory on all cluster hosts, as each segment process writes its own backup files locally.
When called with --backup-dir, gpbackup creates a backup directory structure under the specified path.
Each host receives a subdirectory for every segment it runs, named gpseg<N>.
Within each segment directory, backups are organized by date and timestamp.
For example, the command above creates the following directories:
- Master host:

  /home/gpadmin/backups/gpseg-1/backups/20251006/20251006073432/

- Segment host with segments 0 and 1:

  /home/gpadmin/backups/gpseg0/backups/20251006/20251006073432/
  /home/gpadmin/backups/gpseg1/backups/20251006/20251006073432/

- Segment host with segments 2 and 3:

  /home/gpadmin/backups/gpseg2/backups/20251006/20251006073432/
  /home/gpadmin/backups/gpseg3/backups/20251006/20251006073432/
To restore a database from a backup stored in a non-default location, specify the same path using the --backup-dir option of gprestore:
$ gprestore --timestamp 20251006073432 --backup-dir /home/gpadmin/backups/
When restoring, gprestore expects to find the same directory structure that gpbackup created.
If the files were moved or copied to another cluster, make sure the structure and permissions are preserved.
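Before running gprestore against a copied backup, you can sanity-check that the expected directory structure survived the move. The following sketch uses a hypothetical helper; the `gpseg<N>/backups/<date>/<timestamp>` layout follows the example above.

```shell
# Hypothetical helper: verify that every gpseg<N> directory under a
# custom --backup-dir contains the backups/<date>/<timestamp> path
# for the given 14-digit timestamp.
verify_backup_layout() {
  backup_dir=$1
  ts=$2
  day=${ts%??????}                      # first 8 digits: YYYYMMDD
  found=0
  for seg in "$backup_dir"/gpseg*/; do
    [ -d "$seg" ] || continue
    if [ -d "${seg}backups/$day/$ts" ]; then
      found=$((found + 1))
    else
      echo "missing: ${seg}backups/$day/$ts" >&2
      return 1
    fi
  done
  [ "$found" -gt 0 ]                    # fail if no gpseg directories at all
}

# Example usage (sketch):
#   verify_backup_layout /home/gpadmin/backups 20251006073432 && echo "layout OK"
```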
Define backup layout
The physical layout of backup files determines how they are distributed across directories and how the backup can be restored later.
gpbackup provides options to control this layout depending on the target environment and backup management strategy:
- Default layout

  By default, gpbackup creates a separate directory for each backup under every segment's data directory (or under the path specified in --backup-dir). Each table is stored in a separate compressed data file.

- --single-backup-dir

  Use this option to store all backups created on a host in a single directory rather than in per-segment gpseg<N> subdirectories. This directory must be specified in the --backup-dir value. This simplifies directory management, especially when moving or resizing clusters, because you can copy all files from a single directory instead of multiple segment paths. However, backups using this layout may take slightly longer because all segments write to the same location, increasing I/O contention.

  $ gpbackup --dbname marketplace --single-backup-dir --backup-dir /home/gpadmin/backups

- --single-data-file

  This option consolidates all tables for a given segment into one data file instead of creating one file per table. It reduces the total number of files, which can be beneficial when working with file systems that have limits on file counts or slower metadata operations. However, it disables per-table parallel restore, so restores from such backups typically take longer.

  $ gpbackup --dbname marketplace --single-data-file
In most cases, the default layout provides the best performance and flexibility. Alternative layouts are mainly useful for constrained environments or migration tasks where file management simplicity is more important than speed.
Configure backup compression
gpbackup and gprestore support data compression in backups to reduce disk space usage and improve I/O performance.
Supported compression algorithms are gzip and zstd:
- gzip (default) is a widely supported algorithm. It provides better compatibility, as it is natively available on most Linux and UNIX systems.
- zstd is a more modern and efficient algorithm. It provides higher compression ratios and faster compression and decompression speeds.
To specify the compression algorithm, use the --compression-type option.
You can also set the compression level from 1 (default: fastest, lowest compression) to 9 (slowest, highest compression) using the --compression-level option:
$ gpbackup --dbname marketplace --compression-type zstd --compression-level 5
If you have no specific requirements, a medium level such as 4 or 5 is recommended.
It provides a good balance between compression ratio, CPU usage, and operation time.
To create an uncompressed backup, use the --no-compression option:
$ gpbackup --dbname marketplace --no-compression
In this case, data is stored in plain CSV files. This reduces CPU usage during backup and allows the files to be reused directly, for example, to create external tables or to import data using the COPY command.
When restoring, gprestore automatically detects the compression settings of the backup and decompresses data without additional options.
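Before relying on a gzip-compressed backup, you can spot-check that its data files are readable; `gzip -t` tests archive integrity without extracting anything. This is a sketch with a hypothetical helper name; it assumes gzip-compressed `.gz` data files as produced by the default compression settings.

```shell
# Hypothetical helper: test every .gz file under a backup directory
# for corruption using gzip's built-in integrity check.
check_gzip_files() {
  dir=$1
  find "$dir" -name '*.gz' -print | while read -r f; do
    gzip -t "$f" || { echo "corrupt: $f" >&2; exit 1; }
  done
}

# Example usage (sketch):
#   check_gzip_files /home/gpadmin/backups && echo "all data files OK"
```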
Parallel backup and restore
To improve performance, you can run backup and restore operations in parallel.
Use the --jobs option to specify the number of worker processes that will run concurrently:
$ gpbackup --dbname marketplace --jobs 10
Each job performs work independently, for example, backing up or restoring separate tables. This allows large databases to be processed much faster, especially when data is distributed evenly across segments.
During a backup, each worker uses its own database connection and acquires a lock on the object being processed.
For data tables, gpbackup takes an ACCESS SHARE lock to ensure that no schema changes (DDL) occur while data is being read.
Queries that read or modify the table’s data can still proceed concurrently, but DDL operations such as ALTER TABLE, DROP TABLE, or TRUNCATE TABLE are blocked until the backup finishes.
Because of this, it is recommended to schedule parallel backups during periods of low DDL activity, such as off-peak maintenance windows.
The --jobs option of gprestore works the same way — it determines how many tables are restored at once.
$ gprestore --timestamp 20251010120101 --jobs 10
Parallel restore significantly reduces total restore time but uses multiple connections. The optimal job count depends on available CPU, disk I/O, and network throughput.
The --jobs option is incompatible with backups created using --single-data-file.
Attempting to use both options results in an error.
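A common starting heuristic is to derive the job count from the master host's CPU count and cap it to keep the number of database connections bounded. The following is a sketch only, with a hypothetical helper and an assumed cap of 10; tune the value against your actual CPU, disk I/O, and network throughput.

```shell
# Hypothetical heuristic: suggest a --jobs value as half the CPU count,
# clamped to the range [1, max] (max defaults to 10).
suggest_jobs() {
  max=${1:-10}
  cpus=$(nproc 2>/dev/null || echo 4)   # fall back to 4 if nproc is unavailable
  jobs=$(( cpus / 2 ))
  [ "$jobs" -lt 1 ] && jobs=1
  [ "$jobs" -gt "$max" ] && jobs=$max
  echo "$jobs"
}

# Example usage (sketch):
#   gpbackup --dbname marketplace --jobs "$(suggest_jobs)"
```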
Define backup scope
This section describes ways to back up and restore supplementary information, such as:
- table metadata (DDL);
- global objects;
- table statistics.
To learn how to select stored objects — schemas, tables, views, and so on — for backing up, see Partial backups.
Data and metadata
By default, backups include both metadata (DDL) and table data. You can limit the backup to one of these components using the following options:
- --data-only — backs up only table data, without DDL for object creation. Use this option when restoring data into an existing database with matching schema structure. Example:

  $ gpbackup --dbname marketplace --data-only

- --metadata-only — backs up only metadata (DDL), without table data. This mode is useful for recreating the database structure on another cluster or exporting schema definitions. Example:

  $ gpbackup --dbname marketplace --metadata-only

  Combining --metadata-only with schema and table filtering options allows you to extract DDL for specific database objects and recreate them in another cluster:

  $ gprestore --timestamp 20251010043530 --metadata-only --include-table public.customers
Global metadata
By default, gpbackup backs up cluster-wide global metadata, such as roles, tablespaces, resource queues, and resource groups.
To exclude global objects, use the --without-globals option:
$ gpbackup --dbname marketplace --without-globals
gprestore does not restore global objects unless explicitly requested.
To restore them, use the --with-globals option:
$ gprestore --timestamp 20251006092702 --with-globals
This can be useful when setting up a new cluster to replicate role definitions and access privileges before restoring user data.
Table statistics
Backups can include table statistics, allowing the restored database to retain query planner information without requiring an immediate ANALYZE run.
To include statistics in the backup, use the --with-stats option:
$ gpbackup --dbname marketplace --with-stats
To restore them, add the same option to gprestore:
$ gprestore --timestamp 20251006100053 --with-stats
However, if the data distribution changes during or after restore — for example, when restoring into a database with a different schema or subset of data — the old statistics may become inaccurate.
In such cases, you can instruct gprestore to automatically collect new statistics after completion by using --run-analyze:
$ gprestore --timestamp 20251006100053 --run-analyze
Monitoring
Reports
gpbackup and gprestore record information about each operation in report files.
These reports include the operation timestamp, Greengage DB and utility versions, status, object counts, and other details.
Reports are stored in the backup directory on the master host:
- gpbackup reports: gpbackup_<backup_timestamp>_report

  Greengage Database Backup Report

  timestamp key: 20251006065627
  gpdb version: 6.29.0+dev.4.g7f02b2072f build dev
  gpbackup version: 1.30.6+dev.1.g108659cb

  database name: marketplace
  command line: gpbackup --dbname marketplace
  compression: gzip
  plugin executable: None
  backup section: All Sections
  object filtering: None
  includes statistics: No
  data file format: Multiple Data Files Per Segment
  incremental: False

  start time: Mon Oct 06 2025 06:56:27
  end time: Mon Oct 06 2025 06:56:29
  duration: 0:00:02

  backup status: Success

  database size: 88 MB
  segment count: 4

  count of database objects in backup:
  aggregates                   0
  casts                        0
  collations                   0
  constraints                  0
  conversions                  0
  default privileges           0
  database gucs                0
  event triggers               0
  extensions                   0
  foreign data wrappers        0
  foreign servers              0
  functions                    0
  indexes                      0
  operator classes             0
  operator families            0
  operators                    0
  procedural languages         0
  protocols                    0
  resource groups              2
  resource queues              1
  roles                        1
  rules                        0
  schemas                      1
  sequences                    0
  tables                       2
  tablespaces                  0
  text search configurations   0
  text search dictionaries     0
  text search parsers          0
  text search templates        0
  triggers                     0
  types                        0
  user mappings                0
  views                        0
- gprestore reports: gprestore_<backup_timestamp>_<restore_timestamp>_report

  Greengage Database Restore Report

  timestamp key: 20251006065627
  gpdb version: 6.29.0+dev.4.g7f02b2072f build dev
  gprestore version: 1.30.6+dev.1.g108659cb

  database name: marketplace
  command line: gprestore --timestamp 20251006065627 --create-db

  backup segment count: 4
  restore segment count: 4

  start time: Mon Oct 06 2025 07:07:23
  end time: Mon Oct 06 2025 07:07:27
  duration: 0:00:04

  restore status: Success
<backup_timestamp> and <restore_timestamp> represent the start times of the backup and restore operations.
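Monitoring jobs can read the status line out of a report file instead of parsing the full text. The following sketch uses a hypothetical helper; the `backup status:` field name follows the sample report above.

```shell
# Hypothetical helper: print the value of the "backup status:" field
# from a gpbackup report file (e.g. "Success").
report_status() {
  sed -n 's/^backup status:[[:space:]]*//p' "$1"
}

# Example usage (sketch):
#   report_status "$MASTER_DATA_DIRECTORY/backups/20251006/20251006065627/gpbackup_20251006065627_report"
```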
Email notifications
You can configure gpbackup and gprestore to automatically send operation reports by email.
This feature relies on the sendmail command being available on the master host.
The sendmail command can be provided by Sendmail itself or by any other Mail Transfer Agent (MTA) that offers a compatible interface.
To enable email notifications, create a file named gp_email_contacts.yaml in one of the following locations:
- The home directory of the gpadmin user, for example: /home/gpadmin/
- The utility installation directory, for example: /usr/local/gpdb/bin/
If the file is not found in either location, the utilities display an informational message such as:
[INFO]:-Found neither /usr/local/gpdb/bin/gp_email_contacts.yaml nor /home/gpadmin/gp_email_contacts.yaml
[INFO]:-Email containing gpbackup report /data1/master/gpseg-1/backups/20251006/20251006083418/gpbackup_20251006083418_report will not be sent
The gp_email_contacts.yaml file must specify email recipients and the types of reports to be sent in the following structure:
contacts:
gpbackup:
- address: <user>@<domain>
status:
success: [true | false]
success_with_errors: [true | false]
failure: [true | false]
gprestore:
- address: <user>@<domain>
status:
success: [true | false]
success_with_errors: [true | false]
failure: [true | false]
Example of a contacts file:
contacts:
gpbackup:
- address: admin@example.com
status:
success: true
success_with_errors: true
failure: true
- address: alerts@example.com
status:
success: false
success_with_errors: true
failure: true
gprestore:
- address: admin@example.com
status:
success: true
success_with_errors: true
failure: true
| Key | Description | Required |
|---|---|---|
| contacts | The mandatory top-level section of the file | Yes |
| gpbackup | Defines email notifications for the gpbackup utility | No |
| gprestore | Defines email notifications for the gprestore utility | No |
| address <user>@<domain> | An email address to send reports to | Yes |
| status | Defines which operation outcomes trigger report emails | Yes |
| success true \| false | Sends reports for successful operations (exit code 0) | No |
| success_with_errors true \| false | Sends reports for operations completed with non-fatal errors (exit code 1) | No |
| failure true \| false | Sends reports for failed operations (exit code 2) | No |
Log files
In addition to reports, gpbackup and gprestore create detailed log files that record the progress of each operation and any messages or errors encountered.
Log files are stored together with other Greengage DB logs.
Typically, their location is the gpAdminLogs/ subdirectory under the gpadmin home directory.
Each log file covers all backup and restore operations executed on a given date. Their filenames include the date: gpbackup_YYYYMMDD.log and gprestore_YYYYMMDD.log, for example, gprestore_20251010.log.
Logs are useful for troubleshooting failed or incomplete operations, as they contain detailed command output and system messages from all cluster hosts.
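A quick first troubleshooting step is to scan a day's logs for error-level messages. The following sketch uses a hypothetical helper; the gpAdminLogs location and the gpbackup_YYYYMMDD.log / gprestore_YYYYMMDD.log naming follow the convention described above.

```shell
# Hypothetical helper: print error-level lines (with line numbers) from
# the gpbackup and gprestore logs for a given day (defaults to today).
scan_logs() {
  logdir=${1:-"$HOME/gpAdminLogs"}
  day=${2:-$(date +%Y%m%d)}
  for f in "$logdir/gpbackup_$day.log" "$logdir/gprestore_$day.log"; do
    [ -f "$f" ] || continue
    grep -niE 'error|critical' "$f" && echo "issues found in $f" >&2
  done
}

# Example usage (sketch):
#   scan_logs                  # today's logs in ~/gpAdminLogs
#   scan_logs ~/gpAdminLogs 20251010
```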