Greengage DB backup and recovery. Part 2

Dmitry Voronkov · 10.11.2025
In this article, we consider several practical scenarios that should be taken into account when implementing a backup system for a Greengage DB cluster.

Cluster backup specifics

In the previous article, we discussed an approach to back up a Greengage cluster that was intended to provide readers with some basic knowledge necessary to understand how to create backups and then restore from them. We also considered a utility implementing this approach in practice. However, all previous discussions assumed "ideal conditions," without considering possible failures or the influence of external factors. Now, we are ready to move on to the next step, which is to explore a number of scenarios that can occur in practice and that we should take into account when backing up a Greengage cluster.

Is it necessary to back up an entire cluster each time?

First, let’s consider the practicality of creating a full backup of a cluster each time. It is clear that this can be time-consuming and resource-intensive (both for creating a backup and for storing it). But we should also keep in mind the RPO (Recovery Point Objective), which essentially defines how much data may be lost during recovery. Therefore, the more frequently we can create restore points, the lower the RPO will be.

Cluster restore scheme

In the previous article, we created a restore point after completing a physical backup on all cluster segments. However, nothing prevents us from creating additional restore points after that. What’s the benefit? We can restore data from an existing backup and then replay WAL up to a newer restore point. Yes, more WAL will need to be stored and replayed, but our data will be more up to date. You can read more about creating restore points here.
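To make this concrete, here is a minimal sketch of creating an additional named restore point; the restore point name, segment host, and port are placeholders, and a truly consistent cluster-wide restore point additionally requires coordinating the calls across all segments (for example, while no distributed transactions are in flight):

# On the master
$ psql -d postgres -c "SELECT pg_create_restore_point('rp_after_full_1');"

# The same call on each primary segment, in utility mode (host and port are examples)
$ PGOPTIONS='-c gp_session_role=utility' psql -h sdw1 -p 6000 -d postgres -c "SELECT pg_create_restore_point('rp_after_full_1');"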

However, this also has a downside: the longer the gap between the backup and the restore point, the longer the RTO (Recovery Time Objective) will be, since more WAL has to be applied on top of the restored files. The idea is quite simple, but it is important to note the following: since restoring to a restore point requires both the backup made earlier and the WAL, deleting that backup makes it impossible to restore to the restore points that depend on it. This means that we need to take this into account when deleting backups, which we will discuss later.
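When restoring, the restore point is used as a named recovery target. As a hedged sketch, assuming a per-segment stanza layout (the stanza name and restore point name are placeholders), the restore command for one segment could look like this; pgbackrest then writes the recovery settings so that WAL is replayed up to the named point:

$ pgbackrest --stanza=seg0 --type=name --target=rp_after_full_1 restore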

Backup interruption

Next, let’s discuss handling backup interruptions. There can be many reasons for an interruption, but it’s important to handle such situations correctly. The first question is: if a backup was interrupted, do we have to start from scratch when restarting it? After all, we might have a backup that was running for a long time before the interruption occurred (either intentionally or due to an error). And if it were possible to restart the backup using the data already available, this would significantly reduce both time and space (because it wouldn’t require deleting the data already copied).

In pgbackrest (the utility discussed in the previous article as the base for the prototype of the ggbm backup solution), the backup command has a special --resume option to handle this situation. It does the following: if the previous backup run was of the same type and did not complete successfully, the next run will skip files that have already been backed up. Exceptions are files whose checksums do not match and those that are grouped into a bundle (see the --repo-bundle option).
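For illustration, a hedged sketch of how this looks on a single segment stanza (the stanza name is a placeholder):

# First attempt, interrupted partway through
$ pgbackrest --stanza=seg0 --type=full backup

# Rerun: files already copied by the failed attempt are reused
$ pgbackrest --stanza=seg0 --type=full --resume backup

# Or force a clean restart instead, discarding the partial data
$ pgbackrest --stanza=seg0 --type=full --no-resume backup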

Applying this approach to Greengage involves the following nuances:

  1. Before running a backup, it’s worth checking the status and type of the previous backup in the metadata. If the previous backup was successful, it is better to run a backup with the explicit --no-resume option.

  2. If it is possible to start with the --resume option, then it’s necessary to determine which segments require a restart. In this case, we need to consider the following options relative to the previous backup:

    • A segment was not backed up. In this case, restart with --no-resume.

    • A segment was only partially backed up. In this case, restart with --resume.

    • A segment was backed up successfully. Such segments should be excluded from re-running the backup command.

  3. Terminating the pgbackrest process does not release the backup lock. To release it, explicitly call the pg_stop_backup function in utility mode on each cluster segment.

  4. After the backup command completes, we should check the actual backup type. The cluster backup type should match the backup type of all segments, meaning the type should be the same on every segment. This additional metadata check is very useful for determining the actual type of a backup (see the sketch below). For example, if a user requests an incremental backup when no full backup exists yet, no error will occur, but the resulting backup will actually be a full one. The same applies in the case of backup deletion. We’ll discuss this later.
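A hedged sketch of such a check for one segment stanza, reading the label and the actual type of the most recent backup from the pgbackrest metadata (the stanza name is a placeholder; verify the JSON field names against the pgbackrest version in use):

$ pgbackrest --stanza=seg0 --output=json info | jq '.[0].backup[-1] | {label, type}'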

Taking all these details into account, we can implement a separate command to interrupt a backup, as well as handle an error during the backup on one of the segments and automatically stop the backup process on the remaining cluster segments. Additionally, it becomes possible to restart the backup process.
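For example, the "stop the backup on the remaining segments" part boils down to releasing the backup lock mentioned above. A hedged one-liner per segment, with the host and port as placeholders:

$ PGOPTIONS='-c gp_session_role=utility' psql -h sdw1 -p 6000 -d postgres -c "SELECT pg_stop_backup();"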

Backup deletion

Creating backups does not make much sense without a mechanism to delete them. Data in backups becomes outdated over time, not to mention it takes up space. But we also need to understand that deleting backup data is a very important operation. Therefore, it is necessary to consider different approaches and scenarios.

First, it’s worth considering the deletion mechanism built into pgbackrest and its applicability to Greengage. Specifically, let’s look at the retention policy setting. This mechanism allows configuring automatic deletion of backups after each successful completion of the backup command. There are two types of deletion available: by time or by quantity. The number of backups can only be configured for the full and diff types, while incremental backups are deleted along with the backups they depend on.
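For reference, this is roughly how such a policy is configured in pgbackrest.conf (the values are illustrative and apply per repository):

repo1-retention-full-type=count
repo1-retention-full=2
repo1-retention-diff=3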

In addition, pgbackrest has the expire command that allows us to:

  • call the deletion procedure based on the retention setting;

  • delete a specific backup set, even if it does not fall under the retention policy (unless it is the only backup in the stanza); see the example below.
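Two hedged examples (the stanza name and backup label are placeholders):

# Apply the configured retention policy
$ pgbackrest --stanza=seg0 expire

# Delete a specific backup set by its label
$ pgbackrest --stanza=seg0 expire --set=20251001-010000F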

Overall, the functionality is quite flexible and automates the deletion process. However, if we look at the retention policy in pgbackrest from the Greengage side, we won’t be able to use it for the following reasons:

  • time-based deletion cannot be made consistent across a cluster (extra backup sets may be deleted or, conversely, unnecessary ones may be left);

  • quantity-based deletion does not take into account the backup statuses between segments.

To illustrate the latter point, let’s look at this example: we executed the backup command with the full backup type. Then, we ran it again with the same backup type, but an error occurred on one of the segments (it’s possible the backup did not even start due to an error). We did not restart the backup, but instead created a new backup for the entire cluster, of the incr type this time. Now, for some segments, the number of full backups differs. This means that if we delete backups automatically in the future, we may lose part of the cluster backup.
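A simple way to detect this situation is to compare the number of full backups across segment stanzas. A hedged sketch (the stanza names are placeholders, and the jq expression assumes pgbackrest’s JSON info format):

for st in seg-1 seg0 seg1; do
  echo -n "$st: "
  pgbackrest --stanza="$st" --output=json info | jq '[.[0].backup[] | select(.type == "full")] | length'
done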

Example of an invalid incremental backup due to an error during a previous backup

So what should we do in this case? We’ll need to manage the deletion process outside of pgbackrest, disabling automatic deletion in the pgbackrest.conf configuration file:

expire-auto=n

The deletion algorithm itself may be similar to how it is implemented in pgbackrest:

  • determine the last (by time) backup set (hereinafter referred to as BS) to be deleted;

  • determine a list of backup sets created before BS;

  • determine all backup sets of the diff and incr types that depend on BS, i.e. those created after BS and before the next backup set of the full type;

  • sequentially delete all backup sets determined in the previous steps and their associated WAL archives by calling the expire command.
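A hedged sketch of the "BS plus its dependents" part of this algorithm for one segment stanza, relying on the pgbackrest label convention where dependent diff/incr backups have labels that start with the label of their full backup (the stanza name and label are placeholders; verify the behavior of expire --set against the pgbackrest version in use):

STANZA=seg0
BS=20251001-010000F   # the backup set to delete

# List BS and its dependent diff/incr sets, newest first, and expire them one by one
pgbackrest --stanza="$STANZA" --output=json info \
  | jq -r --arg bs "$BS" '[.[0].backup[].label | select(startswith($bs))] | reverse | .[]' \
  | while read -r label; do
      pgbackrest --stanza="$STANZA" expire --set="$label"
    done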

Backup deletion based on retention policies

The result of deleting a specific backup depends on whether automatic policy-based deletion is applied and, if so, whether it runs before or after the manual deletion. Which behavior is correct depends on what you consider expected in a given scenario.

Deletion of the "Full 3" backup with retention policies applied before deletion

Deletion of the "Full 3" backup with retention policies applied after deletion

To complete the picture, we would like to add that deleting a backup of the diff type will be roughly the same as deleting a backup of the full type, but with some nuances. Deleting an incremental backup, however, doesn’t need to be considered separately, since it will always be deleted along with the full or diff backups it depends on.

Cluster topology change

In Greengage, cluster availability is ensured by mirroring segments. Switching a primary segment to its mirror can occur either automatically (for example, if a primary segment fails) or as planned (when a segment host is stopped for maintenance). But what does this have to do with backups?

The point is that these events change how the cluster state is reflected in the gp_segment_configuration table. Imagine a situation: we created a backup, then shut down one of the cluster hosts (all primary segments on that host switched to their mirrors, and the host itself became unavailable over the network). And then we needed to restore from a previously created backup. However, the backup metadata only contains information about the hosts on which the primary segments were active at the time of the backup. To restore the cluster, it is desirable to know the latest state of the cluster topology.

The solution here is clear: we need to obtain the current cluster topology (this is especially important if the cluster is no longer accessible), then compare the metadata of each segment from the backup with its current location, and restore it on the host where it was last in the primary status. However, before starting the cluster, it is important not to forget that even though we’ve restored the cluster on the correct hosts, the state in gp_segment_configuration will not correspond to it. Therefore, first, we should start only the master and overwrite the cluster topology on it with a new state:

$ pg_ctl start -D /data1/master/gpseg-1/

$ PGOPTIONS='-c gp_session_role=utility -c allow_system_table_mods=true' psql -d postgres -c "delete from gp_segment_configuration; insert into gp_segment_configuration(...);"

Then we can launch the entire cluster:

$ gpstop -arM fast

In addition to the above, a similar scenario is a change in the cluster topology between an existing backup and a restore point created afterwards. To account for this situation, it’s worth considering a restore point as another type of backup that refers to the previous physical backup. This requires storing the cluster topology for each restore point, which makes it possible to restore the cluster to the topology that existed when the restore point was created.
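One simple way to store that topology is to dump gp_segment_configuration on the master at the moment the restore point is created and keep the dump alongside the restore point metadata; a hedged sketch (the restore point name and output path are placeholders):

# Snapshot the topology at the time the restore point is created
$ psql -d postgres -Atc "SELECT * FROM gp_segment_configuration ORDER BY content, role;" > /backups/meta/rp_2025_11_10.topology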

Results

In this article, we’ve covered scenarios that should definitely be considered in a Greengage cluster backup system. These include manual and automatic deletion of backups, accounting for changes in the cluster topology, and handling certain errors. After implementing this functionality in the backup system, we can apply it to real cluster installations in a production environment. However, this does not mean we should stop there.

In the next article, we will explore even more complex backup scenarios, such as:

  • how configuration changes affect existing backups;

  • parallel recovery of primary and mirror segments;

  • restoring one cluster from a backup of another cluster.