Greengage DB is a massively parallel relational DBMS based on Greenplum OSS, designed to store and process large amounts of data. It allows you to run complex analytical queries over these data volumes and provides heterogeneous access to them through various connectors and integration tools.
In addition to its core functionality, Greengage also offers features such as monitoring, auditing, and backup. These are required for full and reliable operation of the system, especially in production. In this article, we will discuss approaches to backing up a Greengage cluster: the available options, potential challenges, and more.
First, we will cover the basics of what you need to know to understand the backup process. Then, we will move on to a practical approach to Greengage backup. If you are not interested in the theory, you can skip to the Greengage backup section.
To begin with, cluster backup is not just about creating backup copies (backups). First of all, backup is about ensuring fault tolerance and high availability. In the case of cluster storage systems, this can be achieved by:
data replication (synchronous or asynchronous replicas);
auto-failover (PostgreSQL Patroni + etcd/ZooKeeper);
multi-master (PostgreSQL BDR from EDB);
backup (pg_dump, pg_basebackup, gpbackup);
incremental backups (WAL archiving);
geo-replication (logical replication between DCs via CDC);
use of orchestrators (k8s + StatefulSets).
For the most part, it all depends on the system capabilities, budget, and the choice between data consistency and service availability (according to the CAP theorem). In the context of a cluster DBMS, backup options can be grouped by data "temperature":
Hot Standby (Active-Active):
Data is instantly available for reading/writing.
Full synchronous or near-synchronous replication.
High cost of implementation.
Warm Standby (Active-Passive):
Replicas are a few seconds/minutes behind the master.
Data can be read, but writing is only possible to the master.
Replication can be asynchronous or semi-synchronous (write majority).
Cold Backup (Delayed Replica):
Data updates are rare.
Stored in backups or archives.
Recovery may take time.
In Greengage (as in Greenplum), backup is supported through the following features:
segment mirroring (availability of a synchronous replica for each cluster segment);
logical backup (gpbackup/gprestore).
Next, we will consider another option for backup — by taking a physical backup. While we will present a working solution here, it should not be used in production. The goal is to share an idea of what approach to the creation of a physical backup can be applied, what nuances to pay attention to, and what drawbacks this approach may have.
Binary backups are often used to create replicas, recover from a crash, or back up large databases, but this applies mostly to vanilla PostgreSQL. If we talk about a Greengage cluster, then by backing up individual segments, you can parallelize both making backup copies and restoring from them. In addition, you can restore the mirror of each segment from the same segment backup.
But first, let’s discuss what a binary backup should look like and how to get it.
Since Greengage is based on the PostgreSQL core, we first need to understand: what is physical backup in PostgreSQL?
In its simplest form, a binary backup can be considered a manual copy of the PGDATA directory. First, you need to set up WAL archiving, and then, in addition to the backup, store the WAL archives. To start the backup process, run the query:
SELECT pg_start_backup('/mnt/backup/2025/04/', true);
Next, you need to copy the files from PGDATA and then stop the backup process by running:
SELECT pg_stop_backup();
But is it worth copying all the data? Actually, no. To avoid confusing PostgreSQL during the recovery process, the following data should be excluded from the backup:
backup_label.old (created when the backup is interrupted);
postmaster.pid and postmaster.opts (will affect the operation of pg_ctl);
recovery.conf and postgresql.auto.conf (should be created after restoring from a backup to start the recovery process);
recovery.done (indicates the completion of the recovery process);
the pg_replslot/, pg_dynshmem/, pg_notify/, pg_serial/, pg_snapshots/, pg_stat_tmp/, and pg_subtrans/ directories (the content is initialized when postmaster is started).
In addition to the above, there are a number of files that contain Greengage-specific settings, and they should be taken into account when restoring. These include:
gpperfmon/conf/gpperfmon.conf (the gpperfmon extension configuration file stored only on the master);
internal.auto.conf (corresponds to the dbid of the cluster instance in the gp_segment_configuration table);
postgresql.conf (contains a specific gp_contentid setting with the cluster segment identifier).
As you can see, the backup process itself has many nuances that you should consider when using the low-level API for creating a physical backup. Instead of performing these actions manually, you can use the pg_basebackup utility supplied with PostgreSQL that automates some of the physical backup stages. There are also a number of utilities that extend the basic capabilities of physical backup (pgbackrest, wal-g, barman) and simplify backup administration. In the examples below, we will use pgbackrest to build a solution, but you can choose any option that is convenient for you.
Point-in-time recovery (PITR) is the process of restoring from WAL archives to a specified point in WAL. Depending on the PostgreSQL version, the recovery target (recovery_target) can be specified as an LSN (WAL record identifier), xid (transaction identifier), time, or restore point (restore point name). For more details, see the official documentation.
PITR directly depends on a physical backup: first, it restores from the last one, and then, in the PostgreSQL cluster recovery configuration, you can specify a point up to which you want to restore. This allows you to restore data that was created/modified after the backup was completed. We will need this later to create a consistent backup copy of the Greengage cluster.
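For illustration, on a GG6 segment (PostgreSQL 9.x-based) a recovery configuration targeting a named restore point might look roughly like the snippet below. The stanza and restore point names are placeholders, and recovery_target_action is the same option that is passed to pgbackrest later in this article:
restore_command = 'pgbackrest --stanza=seg0 archive-get %f "%p"'
recovery_target_name = 'rp_1'
recovery_target_action = 'shutdown'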
Continuous archiving is a key mechanism for fault tolerance and minimizing data loss. In combination with a full backup (pg_basebackup), it allows restoring a database to any available state.
Up to this point, we have only talked about backing up a PostgreSQL cluster. Now, it’s time to think about a cluster copy. If you simultaneously start creating a physical copy from each cluster segment and wait for the backup to complete for all segments, can the collection of these copies be considered a cluster backup? The answer is no. This approach does not guarantee data consistency at the cluster level, since each segment will finish copying at its own time.
To solve this problem, after completing the backup of all segments, it is enough to create a distributed named point using the gp_create_restore_point(<rp_name>) function, which is available in the gp_pitr extension and takes the name of the restore point as an argument.
Thus, the cluster backup will look like:
list of physical backups for each segment;
restore point name created after the backup of segments was completed;
WAL archives of each segment, from the start of the backup to the archive where the restore point was written.
Now, the physical data should be enough to restore the cluster. But what about the metadata of the cluster itself? If it is unavailable, where can we get information about its hosts and segment distribution? There are several possible solutions. One option is to additionally capture the state of the gp_segment_configuration table immediately before starting the backup and save it along with the backup metadata. This will not only simplify the recovery process, but also make it possible to perform additional checks if necessary.
To work with physical copies, the pgbackrest utility is used, as it provides many more options for managing backups compared to pg_basebackup.
Someone may ask: why not use, for example, WAL-G, since it already supports backup for a Greenplum cluster, which in fact should work for Greengage as well? Indeed, it can be used in some cases as a backup tool. However, the purpose of this article is more educational, namely: to give an insight into the approach to backup of a cluster DBMS, in order to understand what nuances and limitations there may be.
So, following the previous discussion, let’s create a backup of a Greengage cluster. But first, we need to prepare the necessary environment:
Build pgbackrest with Greengage support:
$ meson setup build --prefix=$GPHOME --libdir=$GPHOME/lib --pkg-config-path=$GPHOME/lib/pkgconfig
Install pgbackrest on each cluster host and grant launch permissions to the gpadmin user.
Configure pgbackrest to work with the segments of each cluster host. To do this, create the pgbackrest.conf configuration file on each cluster host.
[global]
repo1-type=s3
repo1-path=/
repo1-s3-bucket=prod
repo1-s3-endpoint=https://s3storage:9400/
repo1-s3-region=region
repo1-s3-key-type=shared
repo1-s3-key=IbeqK139x8jZVlfFXVNZ
repo1-s3-key-secret=PEU8HQpagkpKojgzplHTgBWAqXBYl0CqZw1vBtDn
repo1-s3-uri-style=path
repo1-storage-upload-chunk-size=5242880
repo1-storage-verify-tls=n
repo1-retention-full=3
repo1-retention-archive-type=full
repo1-retention-archive=3
repo1-retention-diff=6
repo1-bundle=y
repo1-bundle-limit=2MB
repo1-bundle-size=100MB
repo1-block=n
lock-path=/tmp/pgbackrest
exclude=pg_log/gp_era
exclude=backups
exclude=gpperfmon/data/
exclude=gpbackup_history.yaml
archive-timeout=60.0
db-timeout=1800.0
protocol-timeout=1830.0
io-timeout=60.0
compress-type=gz
compress-level=6
buffer-size=1MB
process-max=20
log-level-console=info
log-level-file=info
log-path=/home/gpadmin/gpAdminLogs
start-fast=y
expire-auto=n
fork=GPDB
checksum-page=n
[seg0]
pg1-path=/data1/primary/gpseg0
pg1-port=10000
[seg1]
pg1-path=/data1/mirror/gpseg1
pg1-port=10501
The global section contains general settings for all segments. Among others, it is worth paying attention to the following:
settings of the repository for storing WAL archives and backups (the repo1-* parameters);
the list of files and directories excluded from the backup;
the fork parameter, which allows pgbackrest to work correctly with Greengage segments when set to GPDB;
disabling automatic deletion of backups (the expire-auto parameter).
Next, there are sections (called stanzas in pgbackrest terminology) specific to each segment on a particular host. The section name has the seg<content> format, where <content> is a segment identifier from gp_segment_configuration.content. The section itself contains information from gp_segment_configuration. It is important that this file specifies sections for both primary and mirror segments on the host. Due to this, if you switch from primary to mirror, the WAL archiving process will not be disrupted.
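As an illustration, such sections could be generated for the current host from gp_segment_configuration with a sketch like the one below. It assumes psql access to the cluster, that gp_segment_configuration.hostname matches the output of hostname, and a hypothetical output path:
$ psql -At -d postgres -c \
    "SELECT content, datadir, port FROM gp_segment_configuration WHERE hostname = '$(hostname)'" |
  while IFS='|' read -r content datadir port; do
      # one [seg<content>] section per segment (primary or mirror) located on this host
      printf '[seg%s]\npg1-path=%s\npg1-port=%s\n\n' "$content" "$datadir" "$port"
  done >> /home/gpadmin/pgbackrest.conf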
On each cluster host, set the PGBACKREST_CONFIG environment variable to the absolute path of the pgbackrest.conf configuration file.
On the Greengage cluster, enable WAL archiving mode:
$ gpconfig -c archive_mode -v on
$ gpconfig -c archive_command -v ''\"'\"'PGOPTIONS=\"-c gp_session_role=utility\" pgbackrest --stanza=seg%c archive-push %p'\"'\"'' --skipvalidation
When archiving is performed, %c will be replaced with the segment identifier, and %p will be replaced with the path to the file to be archived. Applying the changes requires a cluster restart:
$ gpstop -arM fast
Install the gp_pitr extension:
CREATE EXTENSION gp_pitr;
For demonstration purposes, we have prepared a small ggbm utility that implements the basic backup and restore options for a Greengage cluster.
Next is the backup process itself:
Take a lock to perform an operation on the cluster. This can be a lock file, synchronization via an external coordinator, or something else. It is important that only one operation (backup, recovery, creating a restore point, deleting a backup copy, etc.) can be performed on the cluster at a time.
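As a minimal sketch, this guard could be a lock file taken with flock; the file path is hypothetical:
$ exec 9>/tmp/ggbm-cluster.lock
$ flock -n 9 || { echo "another cluster operation is already running" >&2; exit 1; }
$ # ... run the backup while file descriptor 9 is held; the lock is released when the process exits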
Take the current topology and save it in a separate file:
SELECT dbid, content, role, preferred_role, port, hostname, address, datadir
FROM gp_segment_configuration;
It is worth noting that we are not interested in the values of mode and status, because during recovery they depend on whether the mirrors will be restored or not (and in what way).
Run backup on the master and each primary segment of the cluster. The backup type should be the same for all segments, and the stanza name should match the segment's content value:
$ gpssh -h sdw1 -v -e 'PGOPTIONS="-c gp_session_role=utility" pgbackrest --type=full --stanza=seg0 backup'
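To cover all primary segments, the same command can be launched in parallel and the exit codes checked afterwards. Below is a rough sketch with hypothetical host and stanza names; plain ssh is used here because it propagates the remote exit status:
$ hosts=(sdw1 sdw2); stanzas=(seg0 seg1)   # illustrative topology, taken from the saved gp_segment_configuration
$ pids=()
$ for i in "${!hosts[@]}"; do
    ssh "${hosts[$i]}" "PGOPTIONS='-c gp_session_role=utility' pgbackrest --type=full --stanza=${stanzas[$i]} backup" &
    pids+=("$!")
  done
$ for pid in "${pids[@]}"; do
    wait "$pid" || { echo "backup failed for one of the segments" >&2; exit 1; }
  done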
Wait for the response code 0 from each pgbackrest process. Below is an example of the output when the backup is complete:
[mdw1] 2025-04-30 21:18:39.086 P00 INFO: backup start archive = 000000010000000000000011, lsn = 0/44000028
[mdw1] 2025-04-30 21:18:39.086 P00 INFO: check archive for prior segment 000000010000000000000010
[mdw1] 2025-04-30 21:18:54.523 P00 INFO: execute exclusive backup stop and wait for all WAL segments to archive
[mdw1] 2025-04-30 21:18:56.525 P00 INFO: backup stop archive = 000000010000000000000011, lsn = 0/440000D0
[mdw1] 2025-04-30 21:18:56.525 P00 INFO: check archive for segment(s) 000000010000000000000011:000000010000000000000011
[mdw1] 2025-04-30 21:18:56.543 P00 INFO: new backup label = 20250430-211837F
[mdw1] 2025-04-30 21:18:56.656 P00 INFO: full backup size = 289MB, file total = 1714
[mdw1] 2025-04-30 21:18:56.656 P00 INFO: backup command end: completed successfully (19582ms)
[INFO] completed successfully
Also, for each segment, save the name of the generated backup set to the backup metadata (it can be obtained from the log or the output of pgbackrest info for a specific stanza).
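For example, with JSON output the label of the most recent backup could be extracted as follows (assuming jq is available and the JSON layout of your pgbackrest version matches):
$ pgbackrest --stanza=seg0 info --output=json | jq -r '.[0].backup[-1].label'
20250430-211837F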
Create a restore point and force a switch to the next WAL segment:
adb=# SELECT gp_segment_id, restore_lsn, pg_xlogfile_name(restore_lsn) as walfile FROM gp_create_restore_point('rp_1');
 gp_segment_id | restore_lsn |         walfile
---------------+-------------+--------------------------
            -1 | 0/1CA5DD80  | 000000010000000000000007
             0 | 0/37082A00  | 00000001000000000000000D
             1 | 0/370E4480  | 00000001000000000000000D
(3 rows)

adb=# SELECT pg_switch_xlog() UNION ALL SELECT pg_switch_xlog() FROM gp_dist_random('gp_id');
(This is an example for GG6. In GG7, the pg_xlogfile_name function is renamed to pg_walfile_name, and pg_switch_xlog is renamed to pg_switch_wal.)
Save all metadata.
In the case of ggbm, the backup process will be as follows:
:~$ ggbm.py backup -n full_1 -t full
Backup completed successfully. Metadata saved to backup_full_1.json
Operation completed successfully
As a result, a backup copy of the cluster will be created with the full_1 named point, and the metadata will be saved in the backup_full_1.json file:
{
"backup_name": "full_1",
"restore_point": "full_1",
"created_at": "2025-05-07T10:42:33.741219",
"segments": [
{
"dbid": "1",
"content": "-1",
"role": "p",
"preferred_role": "p",
"mode": "n",
"status": "u",
"port": "5432",
"hostname": "vdv-h1",
"address": "vdv-h1",
"datadir": "/data1/master/gpseg-1",
"backup_success": true,
"backup_label": "20250507-104217F"
},
{
"dbid": "2",
"content": "0",
"role": "p",
"preferred_role": "p",
"mode": "n",
"status": "u",
"port": "10000",
"hostname": "vdv-h2",
"address": "vdv-h2",
"datadir": "/data1/primary/gpseg0",
"backup_success": true,
"backup_label": "20250507-104217F"
},
{
"dbid": "3",
"content": "1",
"role": "p",
"preferred_role": "p",
"mode": "n",
"status": "u",
"port": "10000",
"hostname": "vdv-h3",
"address": "vdv-h3",
"datadir": "/data1/primary/gpseg1",
"backup_success": true,
"backup_label": "20250507-104217F"
}
]
}
As a result, we have a full backup with a synchronization point. Then, we can create new backups (including differential and incremental ones) or only named points (to restore to them from the closest backup).
Now, let’s move on to the cluster restore process. Obviously, we should already have at least one backup copy of the cluster before restoring. But we cannot start recovery right away without checking a number of important criteria:
The number of primary segments in the cluster should match the number of primary segments in the backup. In other words, if expand/shrink was performed on the cluster after the backup was created, it is not possible to restore it to the topology the cluster had just before it was stopped (but it is possible to restore the cluster to the topology at the time the backup was created). The check can only be performed if the cluster is available and the gp_segment_configuration table can be accessed before the restore starts (see the sketch after this list).
The cluster should be stopped. Restoring from a physical copy is only possible for a stopped cluster.
All segment processes should be stopped on all hosts. Otherwise, there may be problems when starting or performing recovery.
All cluster hosts should have the pgbackrest.conf configuration that points to the repository with the required backup.
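As a rough sketch, the first check could compare the live topology with the saved metadata in the backup_full_1.json format shown above; it assumes the cluster is still reachable via psql and that the jq utility is available:
$ live=$(psql -At -d postgres -c "SELECT count(*) FROM gp_segment_configuration WHERE role = 'p'")
$ saved=$(jq '[.segments[] | select(.role == "p")] | length' backup_full_1.json)
$ [ "$live" -eq "$saved" ] || echo "primary segment count mismatch: cluster=$live, backup=$saved" >&2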
If all the above conditions are met, then we can proceed directly to restoring the cluster. The main steps will be:
(Optionally) Delete all contents of PGDATA on the master and primary segments. If you plan to completely rewrite the directory, you can skip this step by adding the --force argument in the next step.
Run the process of restoring from the backup on the master and each primary segment (based on the cluster topology):
$ gpssh -h sdw1 -v -e 'PGOPTIONS="-c gp_session_role=utility" pgbackrest --set=20250430-211837F --stanza=seg0 --delta --recovery-option="recovery_target_action=shutdown" restore'
Upon completion, pgbackrest will automatically create a recovery configuration for each segment.
Wait until the previous step completes successfully and then start the cluster:
$ gpstart -a
Restore mirrors:
$ gprecoverseg -aF
Restore standby by recreating it.
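For example, the standby master can be recreated with gpinitstandby; the standby host name smdw below is hypothetical, and depending on its state the old standby may need to be removed first with gpinitstandby -r:
$ gpinitstandby -a -s smdw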
To restore the cluster with the ggbm utility, execute the following command:
:~$ ggbm.py restore -n full_1
Cluster successfully restored from backup full_1
Operation completed successfully
This way, the master and the segments that were in the primary role at the time of the backup will be restored. To complete the restore process, as well as recover the mirrors and standby, perform steps 3-5 from the Cluster restore section.
So, we have considered the approach to data backup in a distributed system using Greengage DB as an example. Despite the rather high complexity of backing up a cluster in this way, it still has a number of advantages, such as:
Good scalability, since backup speed can be increased by:
adding more segment hosts;
vertical scaling of segment hosts and increasing the number of threads when performing backup/restore;
using multiple repositories to shard the storage of cluster segment backups.
Guaranteed data consistency achieved by DBMS tools.
Reusability of physical copies. They can be used to create a full copy of the original cluster or to partially restore databases.
However, we should note that this article does not cover some issues that are important for the full use of cluster physical backup in the production environment. The main point is that the approach described above implements in-place recovery, i.e. recovery is only possible on the same environment from which the backup was created. The following issues were also not covered:
How to handle exceptions that may occur during backup/restore?
How to restore a mirror at the same time as a primary segment?
How to manage the deletion of backups?
How to check the correctness of backups?
How much more effective is physical backup compared to logical backup?
How to recover on other hosts?
In the following articles, we will address these types of issues and explore the benefits that can be achieved through the use of physical backup.