
Use S3 storage plugin

Pavel Semyonov

This topic describes how to use the S3 storage plugin with the gpbackup and gprestore utilities in Greengage DB.

The S3 storage plugin extends gpbackup and gprestore with the ability to use any S3-compatible storage for backups. With this plugin, gpbackup can automatically upload Greengage DB backups to Amazon S3 or to any other storage system that supports the S3 (Simple Storage Service) protocol. Such storage can be provided by a public cloud service, such as Amazon S3, Wasabi, or Google Cloud Storage in S3-compatibility mode, or deployed on-premises.

Using S3-compatible storage for backups provides scalable, reliable, and efficient data protection. It removes the need to manage local disk capacity, as S3 automatically handles large volumes with built-in redundancy and replication. The S3 plugin also enables parallel data streaming from all segments, improving backup speed, while allowing backups to be managed independently of the database cluster.

During restore, gprestore uses the same plugin to download backup data from S3 storage, providing seamless integration as if the backup files were stored locally.

Access to S3 storage

To use the S3 storage plugin, you must have access to an S3-compatible storage service. The following permissions are required:

  • Upload and delete objects for backup operations.

  • Read (list, open, and download) objects for restore operations.
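
On Amazon S3, for example, these permissions roughly correspond to an IAM policy like the sketch below; the bucket name is a placeholder, and other S3-compatible storages use their own access-control mechanisms:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": "arn:aws:s3:::<backup-bucket>"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
      "Resource": "arn:aws:s3:::<backup-bucket>/*"
    }
  ]
}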

Access credentials are specified in the plugin configuration file that you provide when running gpbackup or gprestore. Depending on the storage configuration, these credentials can be an AWS access key pair or a temporary access token.

Install the S3 storage plugin

To use the S3 storage plugin, you need to build it from source code and deploy the resulting binary on all cluster hosts. The plugin source code is available in the gpbackup-s3-plugin repository.

Prerequisites

The plugin is written in the Go programming language. To build it, Go 1.19 or later must be installed on the master host, and the GOPATH environment variable must be defined.

To check these requirements, run the following commands as the gpadmin user:

$ go version
$ echo $GOPATH

Their output may look as follows:

go version go1.25.1 linux/amd64
/home/gpadmin/go

Go installation instructions are available on the official Go website.
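
For reference, a typical installation from an official binary archive might look as follows; the Go version in the archive name is only an example, use any release that satisfies the 1.19+ requirement:

$ wget https://go.dev/dl/go1.22.5.linux-amd64.tar.gz
$ sudo rm -rf /usr/local/go && sudo tar -C /usr/local -xzf go1.22.5.linux-amd64.tar.gz
$ echo 'export PATH=$PATH:/usr/local/go/bin' >> ~/.bashrc
$ echo 'export GOPATH=$HOME/go' >> ~/.bashrc
$ source ~/.bashrc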

Build and install

To build and install the S3 storage plugin for gpbackup and gprestore:

  1. Ensure you are logged in to the master host as gpadmin and are in the home directory.

  2. Clone the gpbackup-s3-plugin repository:

    $ git clone https://github.com/arenadata/gpbackup-s3-plugin.git
  3. Change the current directory to gpbackup-s3-plugin:

    $ cd gpbackup-s3-plugin
  4. (Optional) Check out a tag to build and install a specific version:

    $ git checkout <tag_name>

    where <tag_name> matches the version number.

  5. Compile the plugin:

    $ make build
  6. Install the plugin binary on all cluster hosts:

    $ make install

    This command copies gpbackup_s3_plugin to $GPHOME/bin on all cluster hosts. When successful, the output includes a line similar to:

    Successfully copied gpbackup_s3_plugin to /usr/local/gpdb on all segments

As a result, gpbackup_s3_plugin is available on the master and segment hosts:

$ gpbackup_s3_plugin --version

Example output:

gpbackup_s3_plugin version 1.10.0+dev.1.gd9993be
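
To confirm that the binary is present on every host at once, you can also run the check through gpssh; the host file path below is an example, use a file that lists all hosts of your cluster:

$ gpssh -f /home/gpadmin/hostfile_all -e 'gpbackup_s3_plugin --version'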

Prepare a plugin configuration

To use the S3 storage plugin for backup and restore, you need a plugin configuration file. It specifies the path to the plugin executable, the storage connection parameters, and the location in the S3 storage where backups are stored. If a backup is created using a particular plugin configuration, the same configuration (or an equivalent one) must be used later to restore it.

The configuration file is written in YAML format. An example configuration may look as follows:

executablepath: $GPHOME/bin/gpbackup_s3_plugin
options:
  region: garage
  endpoint: http://10.92.40.164:3900
  aws_access_key_id: GK8bbdf080f0717c03add3e461
  aws_secret_access_key: 28efc14dfa24b7965e74a69b7a3a619e2b49e090875c883377b5a61ba0b48f05
  bucket: test-backup-bucket
  folder: test/ggbackup

This configuration uses the following parameters:

  • executablepath — the absolute path to the S3 storage plugin executable. This is typically $GPHOME/bin/gpbackup_s3_plugin.

  • options — top-level YAML node containing plugin parameters.

  • region — AWS region or a logical identifier for your S3-compatible storage.

  • endpoint — HTTP endpoint of the S3 service. If omitted, the plugin automatically resolves the endpoint from the region when connecting to Amazon S3.

  • aws_access_key_id and aws_secret_access_key — credentials used to authenticate with the storage service.

  • bucket — S3 bucket used to store backups. The bucket must already exist and be accessible to the credentials specified.

  • folder — path prefix within the bucket where backups are written. Each backup is stored in a subdirectory of the form <folder>/backups/YYYYMMDD/YYYYMMDDHHMMSS.

For the complete list of options with descriptions, see S3 storage plugin configuration options.
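
For comparison, a minimal configuration for Amazon S3 can omit the endpoint and rely on region-based resolution. The sketch below uses placeholder values for the credentials, bucket, and folder:

executablepath: $GPHOME/bin/gpbackup_s3_plugin
options:
  region: us-east-1
  aws_access_key_id: <access-key-id>
  aws_secret_access_key: <secret-access-key>
  bucket: <backup-bucket>
  folder: ggbackup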

Create a backup in the S3 storage

To back up the database to an S3 storage, run gpbackup with the --plugin-config option. As its value, specify the absolute path to the plugin configuration file:

$ gpbackup --dbname marketplace --plugin-config /home/gpadmin/s3-config.yaml

When the backup completes successfully, the output includes the following message:

[INFO]:-Backup completed successfully

As a result, gpbackup creates the backup in the default local backup directories on the cluster hosts and uploads it to the S3 storage as defined in the plugin configuration. You can verify the uploaded files using the AWS CLI tool or another S3 client.

To run the example AWS CLI commands below, configure the following environment variables:

  • AWS_ACCESS_KEY_ID

  • AWS_SECRET_ACCESS_KEY

  • AWS_DEFAULT_REGION

  • AWS_ENDPOINT_URL

For example:

$ export AWS_ACCESS_KEY_ID=GK8bbdf080f0717c03add3e461
$ export AWS_SECRET_ACCESS_KEY=28efc14dfa24b7965e74a69b7a3a619e2b49e090875c883377b5a61ba0b48f05
$ export AWS_DEFAULT_REGION='garage'
$ export AWS_ENDPOINT_URL='http://10.92.40.164:3900'

Check the backup contents in the S3 storage using the AWS CLI s3 ls command:

$ aws s3 ls test-backup-bucket/test/ggbackup/backups/20251021/20251021053010/

The output shows all the backup files created on cluster hosts:

2025-10-21 05:30:13      39724 gpbackup_0_20251021053010_19450.gz
2025-10-21 05:30:13       1123 gpbackup_0_20251021053010_19460.gz
2025-10-21 05:30:13      22351 gpbackup_0_20251021053010_19463.gz
2025-10-21 05:30:13      40713 gpbackup_1_20251021053010_19450.gz
2025-10-21 05:30:13       1050 gpbackup_1_20251021053010_19460.gz
2025-10-21 05:30:13      22524 gpbackup_1_20251021053010_19463.gz
2025-10-21 05:30:14        800 gpbackup_20251021053010_config.yaml
2025-10-21 05:30:13       2751 gpbackup_20251021053010_metadata.sql
2025-10-21 05:30:13        354 gpbackup_20251021053010_plugin_config.yaml
2025-10-21 05:30:14       1869 gpbackup_20251021053010_report
2025-10-21 05:30:13       4446 gpbackup_20251021053010_toc.yaml
2025-10-21 05:30:13      40610 gpbackup_2_20251021053010_19450.gz
2025-10-21 05:30:13       1076 gpbackup_2_20251021053010_19460.gz
2025-10-21 05:30:13      21971 gpbackup_2_20251021053010_19463.gz
2025-10-21 05:30:13      39901 gpbackup_3_20251021053010_19450.gz
2025-10-21 05:30:13       1025 gpbackup_3_20251021053010_19460.gz
2025-10-21 05:30:13      21655 gpbackup_3_20251021053010_19463.gz
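
If you do not know the exact backup timestamp, you can first list the date and timestamp prefixes under the configured folder:

$ aws s3 ls test-backup-bucket/test/ggbackup/backups/
$ aws s3 ls test-backup-bucket/test/ggbackup/backups/20251021/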
NOTE

The metadata of backups created with the S3 storage plugin is also preserved locally on the master host. In addition to the usual metadata files, it includes the gpbackup_<timestamp>_plugin_config.yaml file, which contains the plugin configuration used for the operation. You can use this file to recreate the configuration for restore.
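
For example, assuming the default local backup directory layout under the master data directory (an assumption; adjust the path if you use --backup-dir), the preserved plugin configuration from the backup above could be inspected as follows:

$ cat $MASTER_DATA_DIRECTORY/backups/20251021/20251021053010/gpbackup_20251021053010_plugin_config.yaml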

You can use the S3 plugin with most standard backup customization options supported by gpbackup, for example:

  • Partial backups with schema and table filters:

    $ gpbackup --dbname marketplace --plugin-config /home/gpadmin/s3-config.yaml --include-table public.customers
  • Incremental backups:

    $ gpbackup --dbname marketplace --plugin-config /home/gpadmin/s3-config.yaml --incremental --leaf-partition-data

    Incremental backups require an existing full backup created with the --leaf-partition-data option in the same S3 location; a minimal sequence is shown after this list.

  • Single-file backups, where each segment writes one file instead of per-table files:

    $ gpbackup --dbname marketplace --plugin-config /home/gpadmin/s3-config.yaml --single-data-file
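
As noted for incremental backups above, the incremental run builds on a full backup taken with --leaf-partition-data against the same plugin configuration. A minimal sequence, reusing the database name and configuration path from the examples above, might look like this:

$ gpbackup --dbname marketplace --plugin-config /home/gpadmin/s3-config.yaml --leaf-partition-data
$ gpbackup --dbname marketplace --plugin-config /home/gpadmin/s3-config.yaml --incremental --leaf-partition-data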

Restore data from the S3 storage

IMPORTANT

To restore a backup created with the S3 storage plugin, you must use the same plugin binary and a compatible plugin configuration file.

To restore data from a backup in the S3 storage, run gprestore with the same plugin configuration that was used to create the backup:

$ gprestore --timestamp 20251021054129 --plugin-config /home/gpadmin/s3-config.yaml

After a successful restore, the output includes the following message:

20251021:05:43:42 gprestore:gpadmin:spn-greengage24:003827-[INFO]:-Restore completed successfully

You can also use the plugin together with standard gprestore options, for example:

  • Partial restore:

    $ gprestore --timestamp 20251027040114 --plugin-config /home/gpadmin/s3-config.yaml --include-table public.customers
  • Data-only restore:

    $ gprestore --timestamp 20251027040114 --plugin-config /home/gpadmin/s3-config.yaml --data-only
  • Redirected restore into a different database:

    $ createdb marketplace_restore
    $ gprestore --timestamp 20251027040114 --plugin-config /home/gpadmin/s3-config.yaml --redirect-db marketplace_restore

    This scenario is useful for validating backups without overwriting production data.
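
After a redirected restore like the one above, you can spot-check the result before dropping the temporary database. The table name below reuses the filtering example from this topic; adjust it to your schema:

$ psql -d marketplace_restore -c "SELECT count(*) FROM public.customers;"
$ dropdb marketplace_restore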

S3 storage plugin configuration options

The S3 storage plugin uses a YAML configuration file that specifies the plugin options: the plugin executable path, storage connection credentials, backup location, and operational parameters.

The general structure of the configuration file is as follows:

executablepath: <absolute-path-to-gpbackup_s3_plugin>
options:
  region: <aws-region>
  endpoint: <S3-endpoint>
  aws_access_key_id: <aws-access-key-id>
  aws_secret_access_key: <aws-access-secret-key>
  aws_session_token: <aws-session-token>
  bucket: <s3-bucket>
  folder: <s3-location>
  encryption: [on|off]
  backup_max_concurrent_requests: [int] # default 6
  backup_multipart_chunksize: [string] # default 500MB
  restore_max_concurrent_requests: [int] # default 6
  restore_multipart_chunksize: [string] # default 500MB
  http_proxy: <proxy-address>
  remove_duplicate_bucket: [true|false]

The configuration file keys are described below.

executablepath

Absolute path to the S3 storage plugin executable, for example, /usr/local/gpdb/bin/gpbackup_s3_plugin or $GPHOME/bin/gpbackup_s3_plugin. The file must be available on all cluster hosts in the same location

options

The mandatory file section that contains all S3 storage–specific parameters

region

AWS region to use. Required for Amazon S3; when region is specified, the endpoint is resolved automatically.

For other S3-compatible storages, this option may not be required, depending on the storage configuration

endpoint

HTTP or HTTPS endpoint of an S3-compatible storage. Required for non-Amazon or self-hosted S3-compatible storages.

For Amazon S3, the endpoint is resolved automatically based on region

aws_access_key_id

ID of an access key used for S3 authentication. Required together with aws_secret_access_key when using key-based access.

NOTE

If no credentials are specified in the configuration file, the plugin searches for them in the following order:

  1. AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_SESSION_TOKEN environment variables.

  2. AWS CLI credentials configured with aws configure.

  3. IAM role credentials (if running on an EC2 instance).

aws_secret_access_key

Secret key associated with the specified access key ID. Required if aws_access_key_id is specified

aws_session_token

Temporary AWS session token. Required for token-based access

bucket

Name of the S3 bucket to store backups in. The bucket must already exist in the target storage

folder

Path inside the bucket where backups are stored. If the folder does not exist, the plugin creates it during the first backup.

Backups are stored in the following locations: <folder>/backups/YYYYMMDD/YYYYMMDDHHMMSS

encryption

Specifies whether to use a secure (SSL) connection to the storage. Possible values: on (default) and off

backup_max_concurrent_requests

Maximum concurrency level for uploading a single file during backup (default 6).

The total number of concurrent uploads also depends on the number of segments and the gpbackup --jobs value. For example, in an 8-segment cluster with default settings, each segment can perform up to 6 concurrent uploads, for a total of 8 × 6 = 48. If the number of backup jobs is increased to 10 with --jobs, each of the 10 jobs on every segment can upload up to 6 parts concurrently, for a total of 8 × 10 × 6 = 480 concurrent uploads.

This upload limit applies to multipart uploads defined by the backup_multipart_chunksize option. Files smaller than the chunk size are uploaded in a single operation

backup_multipart_chunksize

The chunk size for multipart upload requests. Can be set in bytes (B), megabytes (MB), or gigabytes (GB). The default is 500MB, and the minimum allowed value is 5MB (or 5242880B).

Chunks are uploaded in parallel; the concurrency level is defined by backup_max_concurrent_requests

restore_max_concurrent_requests

Maximum concurrency level of a single file download during restore (default 6). Works similarly to backup_max_concurrent_requests

restore_multipart_chunksize

The chunk size for multipart download requests during restore. Works similarly to backup_multipart_chunksize

http_proxy

Address of an HTTP proxy used to access the storage. Can be specified as a URL, with or without authentication:

  • http://<url>:<port>

  • http://<username>:<password>@<url>:<port>

remove_duplicate_bucket

Possible values: false (default) and true. Helps avoid NoSuchBucket errors on some S3-compatible storages
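
For example, the concurrency and chunk-size options described above can be lowered to reduce per-host load on a constrained network. The sketch below uses placeholder connection values, and the tuning numbers are illustrative rather than recommendations:

executablepath: $GPHOME/bin/gpbackup_s3_plugin
options:
  region: <region>
  endpoint: https://<s3-host>:<port>
  aws_access_key_id: <access-key-id>
  aws_secret_access_key: <secret-access-key>
  bucket: <backup-bucket>
  folder: ggbackup
  backup_max_concurrent_requests: 3
  backup_multipart_chunksize: 100MB
  restore_max_concurrent_requests: 3
  restore_multipart_chunksize: 100MB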