Use the S3 storage plugin
This topic describes how to use the S3 storage plugin with the gpbackup and gprestore utilities in Greengage DB.
The S3 storage plugin extends gpbackup and gprestore with the ability to use any S3-compatible storage for backups.
With this plugin, gpbackup can automatically upload Greengage DB backups to Amazon S3 storage or any storage system that supports the S3 (Simple Storage Service) protocol.
Such storage can be a public cloud service, such as AWS S3, Wasabi, or Google Cloud Storage in S3-compatibility mode, or an on-premises deployment.
Using S3-compatible storage for backups provides scalable, reliable, and efficient data protection. It removes the need to manage local disk capacity, as S3 automatically handles large volumes with built-in redundancy and replication. The S3 plugin also enables parallel data streaming from all segments, improving backup speed, while allowing backups to be managed independently of the database cluster.
During restore, gprestore uses the same plugin to download backup data from S3 storage, providing seamless integration as if the backup files were stored locally.
Access to S3 storage
To use the S3 storage plugin, you must have access to an S3-compatible storage service. The following permissions are required:
- Upload and delete objects for backup operations.
- Read (list, open, and download) objects for restore operations.
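You can sanity-check these permissions with the AWS CLI before configuring the plugin. A minimal sketch, assuming the AWS CLI is already configured with your credentials and using the bucket and endpoint values from the examples in this topic:

$ # Read access: list objects in the bucket
$ aws s3 ls test-backup-bucket --endpoint-url http://10.92.40.164:3900
$ # Upload and delete access: write and remove a probe object
$ echo probe > /tmp/probe.txt
$ aws s3 cp /tmp/probe.txt s3://test-backup-bucket/probe.txt --endpoint-url http://10.92.40.164:3900
$ aws s3 rm s3://test-backup-bucket/probe.txt --endpoint-url http://10.92.40.164:3900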
Access credentials are specified in the plugin configuration file that you provide when running gpbackup or gprestore.
Depending on the storage configuration, these credentials can be an AWS access key pair or a temporary access token.
Install the S3 storage plugin
To use the S3 storage plugin, you need to build it from source code and deploy the resulting binary on all cluster hosts. The plugin source code is available in the gpbackup-s3-plugin repository.
Prerequisites
The plugin is written in the Go programming language.
To build it, Go 1.19 or later must be installed on the master host, and the GOPATH environment variable must be defined.
To check these requirements, run the following commands as the gpadmin user:
$ go version
$ echo $GOPATH
Their output may look as follows:
go version go1.25.1 linux/amd64
/home/gpadmin/go
Instructions on Go installation are available at the official Go website.
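If Go is not installed, a minimal installation sketch for a Linux x86-64 host may look as follows (the Go version, download URL, and paths are illustrative; adjust them to your environment):

$ wget https://go.dev/dl/go1.22.5.linux-amd64.tar.gz
$ sudo tar -C /usr/local -xzf go1.22.5.linux-amd64.tar.gz
$ echo 'export PATH=$PATH:/usr/local/go/bin' >> ~/.bashrc
$ echo 'export GOPATH=$HOME/go' >> ~/.bashrc
$ source ~/.bashrc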
Build and install
To build and install the S3 storage plugin for gpbackup and gprestore:
- Ensure you are logged in to the master host as gpadmin and are in the home directory.

- Clone the gpbackup-s3-plugin repository:

  $ git clone https://github.com/arenadata/gpbackup-s3-plugin.git

- Change the current directory to gpbackup-s3-plugin:

  $ cd gpbackup-s3-plugin

- (Optional) Check out a tag to build and install a specific version:

  $ git checkout <tag_name>

  where <tag_name> matches the version number.

- Compile the plugin:

  $ make build

- Install the plugin binary on all cluster hosts:

  $ make install

  This command copies gpbackup_s3_plugin to $GPHOME/bin on all cluster hosts. When successful, the output includes a line similar to:

  Successfully copied gpbackup_s3_plugin to /usr/local/gpdb on all segments
As a result, gpbackup_s3_plugin is available on the master and segment hosts:
$ gpbackup_s3_plugin --version
Example output:
gpbackup_s3_plugin version 1.10.0+dev.1.gd9993be
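To confirm that the binary is deployed everywhere, you can also run the version check cluster-wide with the gpssh utility. A sketch, assuming a hostfile that lists all cluster hosts:

$ gpssh -f /home/gpadmin/hostfile -e 'gpbackup_s3_plugin --version'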
Prepare a plugin configuration
To use the S3 storage plugin for backup and restore, you need a plugin configuration file. It defines how the plugin connects to the storage and where backups are stored: the path to the plugin executable, connection parameters, and backup location in the S3 storage. If a backup is created using a particular plugin configuration, the same configuration (or an equivalent one) must be used later to restore it.
The configuration file is written in YAML format. An example configuration may look as follows:
executablepath: $GPHOME/bin/gpbackup_s3_plugin
options:
  region: garage
  endpoint: http://10.92.40.164:3900
  aws_access_key_id: GK8bbdf080f0717c03add3e461
  aws_secret_access_key: 28efc14dfa24b7965e74a69b7a3a619e2b49e090875c883377b5a61ba0b48f05
  bucket: test-backup-bucket
  folder: test/ggbackup
This configuration uses the following parameters:
- executablepath — the absolute path to the S3 storage plugin executable. This is typically $GPHOME/bin/gpbackup_s3_plugin.

- options — top-level YAML node containing plugin parameters.

- region — AWS region or a logical identifier for your S3-compatible storage.

- endpoint — HTTP endpoint of the S3 service. If omitted, the plugin automatically resolves the endpoint from the region when connecting to Amazon S3.

- aws_access_key_id and aws_secret_access_key — credentials used to authenticate with the storage service.

- bucket — S3 bucket used to store backups. The bucket must already exist and be accessible to the specified credentials.

- folder — path prefix within the bucket where backups are written. Each backup is stored in a subdirectory of the form <folder>/backups/YYYYMMDD/YYYYMMDDHHMMSS.
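For example, with the configuration above, a backup taken at timestamp 20251021053010 is uploaded under the following S3 prefix:

test-backup-bucket/test/ggbackup/backups/20251021/20251021053010/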
For the complete list of options with descriptions, see S3 storage plugin configuration options.
Create a backup in the S3 storage
To back up the database to S3 storage, run gpbackup with the --plugin-config option.
As its value, specify the absolute path to the plugin configuration file:
$ gpbackup --dbname marketplace --plugin-config /home/gpadmin/s3-config.yaml
When the backup completes successfully, the output includes the following message:
[INFO]:-Backup completed successfully
As a result, gpbackup creates the backup in the default local backup directories on the cluster hosts and uploads it to the S3 storage as defined in the plugin configuration.
You can verify the uploaded files using the AWS CLI tool or another S3 client.
To run the example AWS CLI commands below, configure the following environment variables:
- AWS_ACCESS_KEY_ID
- AWS_SECRET_ACCESS_KEY
- AWS_DEFAULT_REGION
- AWS_ENDPOINT_URL
For example:
$ export AWS_ACCESS_KEY_ID=GK8bbdf080f0717c03add3e461
$ export AWS_SECRET_ACCESS_KEY=28efc14dfa24b7965e74a69b7a3a619e2b49e090875c883377b5a61ba0b48f05
$ export AWS_DEFAULT_REGION='garage'
$ export AWS_ENDPOINT_URL='http://10.92.40.164:3900'
Check the backup contents in the S3 storage using the AWS CLI s3 ls command:
$ aws s3 ls test-backup-bucket/test/ggbackup/backups/20251021/20251021053010/
The output shows all the backup files created on cluster hosts:
2025-10-21 05:30:13      39724 gpbackup_0_20251021053010_19450.gz
2025-10-21 05:30:13       1123 gpbackup_0_20251021053010_19460.gz
2025-10-21 05:30:13      22351 gpbackup_0_20251021053010_19463.gz
2025-10-21 05:30:13      40713 gpbackup_1_20251021053010_19450.gz
2025-10-21 05:30:13       1050 gpbackup_1_20251021053010_19460.gz
2025-10-21 05:30:13      22524 gpbackup_1_20251021053010_19463.gz
2025-10-21 05:30:14        800 gpbackup_20251021053010_config.yaml
2025-10-21 05:30:13       2751 gpbackup_20251021053010_metadata.sql
2025-10-21 05:30:13        354 gpbackup_20251021053010_plugin_config.yaml
2025-10-21 05:30:14       1869 gpbackup_20251021053010_report
2025-10-21 05:30:13       4446 gpbackup_20251021053010_toc.yaml
2025-10-21 05:30:13      40610 gpbackup_2_20251021053010_19450.gz
2025-10-21 05:30:13       1076 gpbackup_2_20251021053010_19460.gz
2025-10-21 05:30:13      21971 gpbackup_2_20251021053010_19463.gz
2025-10-21 05:30:13      39901 gpbackup_3_20251021053010_19450.gz
2025-10-21 05:30:13       1025 gpbackup_3_20251021053010_19460.gz
2025-10-21 05:30:13      21655 gpbackup_3_20251021053010_19463.gz
The metadata of backups created with the S3 storage plugin is also preserved locally on the master host. In addition to the usual metadata files, it includes the gpbackup_<timestamp>_plugin_config.yaml file, which contains the plugin configuration used for the operation. You can use this file to recreate the configuration for restore, as shown below.
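For example, you can inspect the locally preserved metadata on the master host. A sketch, assuming the default backup directory layout and that MASTER_DATA_DIRECTORY is set in the gpadmin environment:

$ ls $MASTER_DATA_DIRECTORY/backups/20251021/20251021053010/

Its contents may look as follows:

gpbackup_20251021053010_config.yaml
gpbackup_20251021053010_metadata.sql
gpbackup_20251021053010_plugin_config.yaml
gpbackup_20251021053010_report
gpbackup_20251021053010_toc.yaml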
You can use the S3 plugin with most standard backup customization options supported by gpbackup, for example:
- Partial backups with schema and table filters:

  $ gpbackup --dbname marketplace --plugin-config /home/gpadmin/s3-config.yaml --include-table public.customers

- Incremental backups:

  $ gpbackup --dbname marketplace --plugin-config /home/gpadmin/s3-config.yaml --incremental --leaf-partition-data

  Incremental backups require an existing full backup created with the --leaf-partition-data option in the same S3 location.

- Single-file backups, where each segment writes one file instead of per-table files:

  $ gpbackup --dbname marketplace --plugin-config /home/gpadmin/s3-config.yaml --single-data-file
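Other standard gpbackup options can be combined with the plugin in the same way, for example, backing up with several parallel connections (the job count below is illustrative):

$ gpbackup --dbname marketplace --plugin-config /home/gpadmin/s3-config.yaml --jobs 4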
Restore data from the S3 storage
To restore a backup created with the S3 storage plugin, you must use the same plugin binary and a compatible plugin configuration file.
To restore data from a backup in the S3 storage, run gprestore with the same plugin configuration that was used to create the backup:
$ gprestore --timestamp 20251021054129 --plugin-config /home/gpadmin/s3-config.yaml
After a successful restore, the output includes the following message:
20251021:05:43:42 gprestore:gpadmin:spn-greengage24:003827-[INFO]:-Restore completed successfully
You can also use the plugin together with standard gprestore options, for example:
- Partial restore:

  $ gprestore --timestamp 20251027040114 --plugin-config /home/gpadmin/s3-config.yaml --include-table public.customers

- Data-only restore:

  $ gprestore --timestamp 20251027040114 --plugin-config /home/gpadmin/s3-config.yaml --data-only

- Redirected restore into a different database:

  $ createdb marketplace_restore
  $ gprestore --timestamp 20251027040114 --plugin-config /home/gpadmin/s3-config.yaml --redirect-db marketplace_restore

  This scenario is useful for validating backups without overwriting production data.
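After a redirected restore, you can check the result before touching the production database. A minimal verification sketch, assuming the public.customers table from the examples above:

$ psql -d marketplace_restore -c 'SELECT count(*) FROM public.customers;'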
S3 storage plugin configuration options
The S3 storage plugin uses a YAML configuration file that specifies the plugin options. This includes the plugin executable path, storage connection credentials, backup location, and operational parameters.
The general structure of the configuration file is as follows:
executablepath: <absolute-path-to-gpbackup_s3_plugin>
options:
  region: <aws-region>
  endpoint: <S3-endpoint>
  aws_access_key_id: <aws-access-key-id>
  aws_secret_access_key: <aws-secret-access-key>
  aws_session_token: <aws-session-token>
  bucket: <s3-bucket>
  folder: <s3-location>
  encryption: [on|off]
  backup_max_concurrent_requests: [int] # default 6
  backup_multipart_chunksize: [string] # default 500MB
  restore_max_concurrent_requests: [int] # default 6
  restore_multipart_chunksize: [string] # default 500MB
  http_proxy: <proxy-address>
  remove_duplicate_bucket: [true|false]
| Key | Description |
|---|---|
| executablepath | Absolute path to the S3 storage plugin executable, for example, /usr/local/gpdb/bin/gpbackup_s3_plugin or $GPHOME/bin/gpbackup_s3_plugin. The file must be available on all cluster hosts in the same location |
| options | The mandatory file section that contains all S3 storage–specific parameters |
| region | AWS region to use. Required for Amazon S3; in this case, if endpoint is omitted, the endpoint is resolved automatically from the region. When using a different S3-compatible storage, this option may be optional depending on the storage configuration |
| endpoint | HTTP or HTTPS endpoint of an S3-compatible storage. Required for non-Amazon or self-hosted S3-compatible storages. For Amazon S3, the endpoint is resolved automatically based on region |
| aws_access_key_id | ID of an access key used for S3 authentication. Required together with aws_secret_access_key. NOTE: If no credentials are specified in the configuration file, the plugin searches for them in the standard AWS credential chain order: environment variables, the shared credentials file (~/.aws/credentials), and then an IAM role |
| aws_secret_access_key | Secret key associated with the specified access key ID. Required if aws_access_key_id is specified |
| aws_session_token | Temporary AWS session token. Required for token-based access |
| bucket | Name of the S3 bucket to store backups in. The bucket must already exist in the target storage |
| folder | Path inside the bucket where backups are stored. If the folder does not exist, the plugin creates it during the first backup. Backups are stored in the following locations: <folder>/backups/YYYYMMDD/YYYYMMDDHHMMSS |
| encryption | Specifies whether to use a secure (SSL) connection to the storage. Possible values: on (default) and off |
| backup_max_concurrent_requests | Maximum concurrency level for uploading a single file during backup (default: 6). The total number of concurrent uploads also depends on the number of segments and the number of parallel jobs. This upload limit applies to multipart uploads defined by the backup_multipart_chunksize option |
| backup_multipart_chunksize | The chunk size for multipart upload requests (default: 500MB). Can be set in bytes or with a unit suffix, for example, 1048576, 500MB, or 1GB. Chunks are uploaded in parallel; the concurrency level is defined by backup_max_concurrent_requests |
| restore_max_concurrent_requests | Maximum concurrency level of a single file download during restore (default: 6) |
| restore_multipart_chunksize | The chunk size for multipart download requests during restore (default: 500MB). Works similarly to backup_multipart_chunksize, but applies to downloads |
| http_proxy | Address of an HTTP proxy used to access the storage. Can be specified as a URL, with or without authentication, for example: http://<proxy-host>:<proxy-port> or http://<user>:<password>@<proxy-host>:<proxy-port> |
| remove_duplicate_bucket | Possible values: true and false |