Incremental backups
This topic provides an overview of incremental backups in Greengage DB and explains how they are created and restored using the gpbackup and gprestore utilities.
Incremental backups allow you to capture only the tables that were changed since a previous backup. Compared to full backups, they are faster to create and require significantly less disk space. They also decrease the number of locks acquired during the backup process.
In some DBMSs, incremental backups capture only the individual rows that were added or modified. In Greengage DB, incremental backups include entire tables once any change has occurred.
In Greengage DB, incremental backups are supported by the same gpbackup and gprestore utilities that perform full backups.
These utilities identify and back up only those append-optimized (AO) tables or partitions that have been modified since the last compatible backup in the same backup set.
The size of incremental backups is reduced because gpbackup excludes AO tables that have not changed since the previous backup.
This optimization applies to both row- and column-oriented AO tables.
Operations that mark an AO table as changed include:
- ALTER TABLE
- DELETE
- INSERT
- TRUNCATE
- UPDATE
- Dropping and recreating the table (DROP TABLE followed by CREATE TABLE).
Incremental backups are most effective when data modifications occur in a limited subset of tables compared to the overall database size.
For partitioned AO tables, the space savings can be even greater: only modified leaf partitions are included in an incremental backup. If partitioned AO tables are designed as intended — where existing rows are rarely updated — incremental backups can substantially reduce both backup size and duration.
Heap tables, however, are always fully included in each incremental backup, regardless of whether they were modified.
Like regular backups, incremental backups can cover the entire database or only selected objects (a partial backup). They also support the same customization options as regular backups: for example, you can specify a custom backup directory or adjust compression settings. This allows incremental backups to fit into existing backup workflows and automation scripts.
Backup sets
The central concept in Greengage DB’s incremental backup mechanism is a backup set. A backup set consists of:
- One full (base) backup: a non-incremental backup that captures a snapshot of the entire database or selected objects (if the backup is partial). It serves as the foundation of the set.
- One or more incremental backups: each incremental backup stores only the tables changed since the previous backup in the same set, whether that backup was full or incremental.
Together, all backups in a single set represent a consistent sequence of database states over time, with reduced redundancy due to the exclusion of unchanged append-optimized (AO) tables.
A complete backup set (a base full backup and all subsequent incremental backups) is required for a full database restore. However, you can also perform an incremental restore using only one incremental backup to recover the data captured by that specific backup.
Ways to create backup sets
There are two ways backups can be grouped into sets:
- Automatic: by default, Greengage DB uses the backup history metadata to locate the most recent backup whose options are compatible with the current gpbackup command. That backup automatically becomes the base for the new incremental backup.
- Manual: alternatively, you can explicitly specify the timestamp of an existing backup to use as the base for the new incremental backup.
See the incremental backup process example below to learn how to create backups in both these ways.
Backup option compatibility
All backups within a single set must be created with compatible gpbackup options.
gpbackup checks this when selecting a backup to base a new incremental backup on.
Backups are considered compatible in terms of a backup set if the following parameters match:
- --dbname: backups must store the same database.
- --leaf-partition-data: required. The base full backup and every incremental backup in the set must be created with --leaf-partition-data.
- --backup-dir: if a custom backup location is used, it must be the same across the entire set.
- --single-data-file: all backups in the set must either use or not use the single-file layout consistently.
- --plugin-config: if the base backup was created using a plugin, all subsequent incremental backups must use the same plugin and its configuration file.
- --no-compression: if one backup in the set is uncompressed, all others must also be uncompressed.
  NOTE: For compressed backups, compression algorithms and levels can differ between backups in the same set.
- For partial backups, schema and table filters (such as --include-schema or --include-table) must be identical. When filtering by schema, only the schema name is validated, not its contained objects.
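One way to keep every backup in a set compatible is to define the shared options once and reuse them for both full and incremental runs. The following is a minimal sketch, not a recommended tool: the BACKUP_OPTS variable, the backup_cmd helper, and the /backups/marketplace path are hypothetical names chosen for illustration. It only prints the command line so that it can be reviewed before running.

```shell
# Options shared by every backup in the set (hypothetical values).
BACKUP_OPTS="--dbname marketplace --leaf-partition-data --backup-dir /backups/marketplace"

# Print the gpbackup command line for a "full" or "incremental" backup.
# Both variants reuse the same options, so only --incremental differs.
backup_cmd() {
  if [ "$1" = "incremental" ]; then
    echo "gpbackup $BACKUP_OPTS --incremental"
  else
    echo "gpbackup $BACKUP_OPTS"
  fi
}

backup_cmd full
backup_cmd incremental
```

Because the two command lines differ only in the --incremental flag, the option check described above should not be tripped by an accidental change in one of the scripts.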
Create an incremental backup
To create an incremental backup of a database, run gpbackup with the --incremental and --leaf-partition-data options:
$ gpbackup --dbname marketplace --leaf-partition-data --incremental
When invoked with these options, gpbackup searches the backup history for a compatible backup to use as a base.
If found, its timestamp is displayed in the utility output, for example:
[INFO]:-Basing incremental backup off of backup with timestamp = 20251017091457
gpbackup then determines the scope of the incremental backup — that is, the AO tables and partitions that have changed since the base backup.
This information defines which data is written to the backup directories on segment hosts.
All heap tables are always included in incremental backups, regardless of whether they were modified.
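In automation scripts it can be handy to record which backup was chosen as the base. The sketch below assumes that the gpbackup output has been saved or piped to standard input and that the messages look like the ones shown in this topic; the helper names new_backup_timestamp and base_backup_timestamp are hypothetical.

```shell
# Print the timestamp of the new backup from gpbackup output on stdin.
new_backup_timestamp() {
  sed -n 's/.*Backup Timestamp = \([0-9]\{14\}\).*/\1/p'
}

# Print the timestamp of the base backup, if the output mentions one.
base_backup_timestamp() {
  sed -n 's/.*Basing incremental backup off of backup with timestamp = \([0-9]\{14\}\).*/\1/p'
}

# Sample lines in the format shown in this topic.
sample_log='[INFO]:-Backup Timestamp = 20251017091528
[INFO]:-Basing incremental backup off of backup with timestamp = 20251017091457'

printf '%s\n' "$sample_log" | base_backup_timestamp
```

If the output contains no matching line (for example, for a full backup), base_backup_timestamp prints nothing.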
If no compatible base backup is found, gpbackup terminates with an error similar to the following:
[CRITICAL]:-There was no matching previous backup found with the flags provided. Please take a full backup.
In this case, create a new full backup using compatible options before retrying the incremental one.
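A scheduled job could handle this case automatically. The sketch below is one possible approach under stated assumptions: it assumes that gpbackup exits with a non-zero status on this error and that the error text contains the phrase shown above. The backup_with_fallback helper is a hypothetical name, and capturing the output means it is printed only after gpbackup finishes.

```shell
# Try an incremental backup; if gpbackup reports that no compatible base
# backup exists, take a full backup with the same options instead.
# Assumption: gpbackup returns a non-zero exit status on this error.
backup_with_fallback() {
  db=$1
  if out=$(gpbackup --dbname "$db" --leaf-partition-data --incremental 2>&1); then
    printf '%s\n' "$out"
    return 0
  fi
  printf '%s\n' "$out" >&2
  case $out in
    *"no matching previous backup"*)
      # No base backup available: start a new set with a full backup.
      gpbackup --dbname "$db" --leaf-partition-data ;;
    *)
      # Any other failure is left for the operator to investigate.
      return 1 ;;
  esac
}
```

A call such as backup_with_fallback marketplace then either extends the current backup set or starts a new one.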
After the backup is created successfully, the backup report lists all the backups in its set, including the ones it is based on:
<...>
incremental: True
incremental backup set:
20251017091457
20251017091528
<...>
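To use this list in a script, the timestamps can be extracted from the report text. The following sketch reads a report from standard input and prints the timestamps that follow the "incremental backup set:" line. It assumes that timestamps are 14-digit numbers and accepts them either on the header line or on the lines after it; the backup_set_from_report name is hypothetical.

```shell
# Print the timestamps listed under "incremental backup set:" in a report
# read from stdin, one per line.
backup_set_from_report() {
  awk '
    /incremental backup set:/ { in_set = 1 }
    in_set {
      found = 0
      for (i = 1; i <= NF; i++)
        if ($i ~ /^[0-9]+$/ && length($i) == 14) { print $i; found = 1 }
      # Stop at the first line after the header that has no timestamp.
      if (!found && $0 !~ /incremental backup set:/) exit
    }
  '
}
```

For example, backup_set_from_report < "$report_file" prints one timestamp per line, oldest first, where report_file is a placeholder for the path to a backup report.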
To explicitly base an incremental backup on a specific previous backup, use the --from-timestamp option and provide the timestamp of that backup:
$ gpbackup --dbname marketplace --leaf-partition-data --incremental --from-timestamp 20251017091457
Incremental backup process example
This section walks through a typical example of using incremental backups in Greengage DB.
The sample database marketplace contains tables of different types shown in the table below.
| Name | Storage type | OID |
|---|---|---|
| customers | heap | 19135 |
| products | append-optimized | 19125 |
| sales | append-optimized, partitioned | 19357, 19366, 19375, 19384 |
Table OIDs are used to identify data backup files that store the table data.
They are shown in the backup’s table of contents (the gpbackup_YYYYMMDDhhmmss_toc.yaml file in the backup directory on the master).
For the partitioned table sales, OIDs of leaf partitions are specified.
In this scenario, incremental backups are created daily to capture changes made to the database during that day.
Step 1: create a full backup
To start using incremental backups, you first need to create a full backup. This full backup will serve as the base for all subsequent incremental ones.
The backup must be created with the --leaf-partition-data option:
$ gpbackup --dbname marketplace --leaf-partition-data
This command performs a regular Greengage DB backup, which can be restored independently if needed. In this example, the backup has the following timestamp (January 1):
[INFO]:-Backup Timestamp = 20250101091457
It includes all tables from the database. You can inspect its content on a segment host by listing the backup directory:
$ ls -l /data1/primary/gpseg0/backups/20250101/20250101091457/
total 76
-rw------- 1 gpadmin gpadmin 39724 Jan 1 09:14 gpbackup_0_20250101091457_19125.gz
-rw------- 1 gpadmin gpadmin 1123 Jan 1 09:14 gpbackup_0_20250101091457_19135.gz
-rw------- 1 gpadmin gpadmin 20 Jan 1 09:14 gpbackup_0_20250101091457_19357.gz
-rw------- 1 gpadmin gpadmin 8530 Jan 1 09:14 gpbackup_0_20250101091457_19366.gz
-rw------- 1 gpadmin gpadmin 6179 Jan 1 09:14 gpbackup_0_20250101091457_19375.gz
-rw------- 1 gpadmin gpadmin 7392 Jan 1 09:14 gpbackup_0_20250101091457_19384.gz
Each file corresponds to a table or partition in the database: customers (19135), products (19125), and four partitions of sales.
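The directory and file names in this listing follow a regular pattern, which the sketch below decodes. It assumes the default layout seen in this example (no custom backup directory, one data file per table); the backup_dir and parse_data_file helpers are hypothetical names.

```shell
# Print the backup directory for a segment data directory ($1) and a
# backup timestamp ($2), following the layout shown in this example.
backup_dir() {
  day=$(printf '%s' "$2" | cut -c1-8)   # YYYYMMDD part of the timestamp
  echo "$1/backups/$day/$2"
}

# Print "<segment> <timestamp> <oid>" for a data file name such as
# gpbackup_0_20250101091457_19125.gz.
parse_data_file() {
  name=${1##*/}           # drop any leading directory
  name=${name%.gz}        # drop the compression suffix, if present
  name=${name#gpbackup_}  # drop the fixed prefix
  echo "$name" | tr '_' ' '
}

backup_dir /data1/primary/gpseg0 20250101091457
parse_data_file gpbackup_0_20250101091457_19125.gz
```

The third field printed by parse_data_file is the table OID, which can be matched against the OID column of the table above.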
Step 2: start a backup set
Once the base backup is in place, you can start creating incremental backups.
To create the first incremental backup, run the same command as before with the --incremental option added:
$ gpbackup --dbname marketplace --leaf-partition-data --incremental
When invoked with these options, gpbackup automatically searches for a compatible full backup to use as its base.
The output displays both the new backup timestamp (January 2) and the base backup timestamp (January 1):
[INFO]:-Backup Timestamp = 20250102091528
<...>
[INFO]:-Basing incremental backup off of backup with timestamp = 20250101091457
After completion, the new incremental backup forms a backup set together with the base full backup. This is shown in the backup report:
$ cat /data1/master/gpseg-1/backups/20250102/20250102091528/gpbackup_20250102091528_report
<...>
incremental: True
incremental backup set:
20250101091457
20250102091528
<...>
To verify what was captured, check the segment directory:
$ ls -l /data1/primary/gpseg0/backups/20250102/20250102091528/
total 56
-rw------- 1 gpadmin gpadmin 39724 Jan 2 09:15 gpbackup_0_20250102091528_19125.gz
-rw------- 1 gpadmin gpadmin 1123 Jan 2 09:15 gpbackup_0_20250102091528_19135.gz
-rw------- 1 gpadmin gpadmin 8530 Jan 2 09:15 gpbackup_0_20250102091528_19366.gz
Here you can see that the incremental backup contains only:
- heap table customers (19135);
- modified AO table products (19125);
- modified partition of sales (19366).
Unchanged partitions of sales are omitted since their data remains valid in the base backup.
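To see which tables an incremental backup skipped, the two segment directories can be compared by OID. The sketch below is an illustration only: it assumes gzip-compressed or uncompressed data files named as in this example, and the oids_in_dir and omitted_oids helpers are hypothetical names.

```shell
# Print the OIDs stored in a backup directory, one per line, sorted.
oids_in_dir() {
  for path in "$1"/gpbackup_*; do
    [ -e "$path" ] || continue
    file=${path##*/}
    file=${file%.gz}
    oid=${file##*_}
    case $oid in
      ''|*[!0-9]*) continue ;;   # skip files without a numeric OID suffix
    esac
    echo "$oid"
  done | sort
}

# Print OIDs present in the first directory but absent from the second.
omitted_oids() {
  a=$(mktemp)
  b=$(mktemp)
  oids_in_dir "$1" > "$a"
  oids_in_dir "$2" > "$b"
  comm -23 "$a" "$b"
  rm -f "$a" "$b"
}
```

Called with the full backup directory first and the incremental backup directory second, omitted_oids lists the OIDs whose data is still taken from the base backup.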
From this point, you can either restore the database using this backup set or continue adding new incremental backups to it.
Step 3: append new incremental backups
To append another incremental backup to the same backup set, repeat the same command:
$ gpbackup --dbname marketplace --leaf-partition-data --incremental
If previous incremental backups exist, the new one (January 3) is automatically based on the most recent in the set (January 2):
[INFO]:-Backup Timestamp = 20250103091621
<...>
[INFO]:-Basing incremental backup off of backup with timestamp = 20250102091528
Now the backup set grows to three backups:
$ cat /data1/master/gpseg-1/backups/20250103/20250103091621/gpbackup_20250103091621_report
<...>
incremental: True
incremental backup set:
20250101091457
20250102091528
20250103091621
<...>
Each new incremental backup is appended sequentially to the chain.
Inspect the content of the latest backup:
$ ls -l /data1/primary/gpseg0/backups/20250103/20250103091621/
total 16
-rw------- 1 gpadmin gpadmin 1123 Jan 03 09:16 gpbackup_0_20250103091621_19135.gz
-rw------- 1 gpadmin gpadmin 8530 Jan 03 09:16 gpbackup_0_20250103091621_19366.gz
This output shows that only one partition of sales (OID 19366) was modified since the previous incremental backup.
The heap table customers (19135) is present because heap tables are always included in incremental backups.
Step 4: create an incremental backup from a specific timestamp
In some cases, you may want to base an incremental backup on a specific earlier point in time — for example, to merge several backups into one or to capture a custom range of changes for restore testing.
To do this, use gpbackup with the --from-timestamp option and specify the desired base backup timestamp:
$ gpbackup --dbname marketplace --leaf-partition-data --incremental --from-timestamp 20250102091528
This forces gpbackup to base the new backup on the specified timestamp, even if a newer compatible backup exists:
[INFO]:-Backup Timestamp = 20250104094401
<...>
[INFO]:-Basing incremental backup off of backup with timestamp = 20250102091528
The new backup captures changes between 20250102091528 and 20250104094401.
This creates a new backup set that includes the new backup, the selected base backup, and all backups it depends on.
The backup report shows this new set:
$ cat /data1/master/gpseg-1/backups/20250104/20250104094401/gpbackup_20250104094401_report
incremental: True
incremental backup set:
20250101091457
20250102091528
20250104094401
The previously created backup with timestamp 20250103091621 remains valid, and the backup set created in the previous step can still be used for restore.
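The way the new set relates to the existing one can be modeled with a few lines of shell. This is only an illustration of the rule described in this step, not how gpbackup computes it internally; the derive_set helper is a hypothetical name. It takes an existing set on standard input (oldest first), keeps everything up to and including the chosen base, and appends the new backup.

```shell
# Given an existing backup set on stdin (one timestamp per line, oldest
# first), print the set that results from basing a new backup ($2) on
# the backup with timestamp $1. Fails if the base is not in the set.
derive_set() {
  awk -v base="$1" -v new="$2" '
    !found { set[++n] = $0 "" }   # keep entries up to and including the base
    $0 == base { found = 1 }
    END {
      if (!found) exit 1
      for (i = 1; i <= n; i++) print set[i]
      print new
    }
  '
}

printf '%s\n' 20250101091457 20250102091528 20250103091621 |
  derive_set 20250102091528 20250104094401
```

With the timestamps from this example, the result contains the January 1, January 2, and January 4 backups, and the January 3 backup is left out of the new set.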
Restore incremental backups
Incremental backups can’t be restored after changes in the cluster segment configuration, such as cluster expansion. After such changes, create a full backup to base future incremental backups on.
There are two main ways to use incremental backups for restore operations:
- Full restore: rebuilds the entire database from a complete backup set.
- Incremental restore: restores only the data captured in a specific incremental backup.
Full restore
To fully restore a database from an incremental backup set, run gprestore with the --timestamp option.
This option specifies the most recent backup in the set — the one that defines the target database state.
For example, use the timestamp 20250103091621 to restore the backup set created earlier:
$ gprestore --timestamp 20250103091621
This command restores the database as follows:
- AO tables and partitions are restored from their most recent snapshots, which may reside in different backups within the set.
- Heap tables are restored from the specified backup, that is, the last one in the set.
As with a regular restore, the tables being restored must not already exist in the target database.
If you use the --verbose flag, the command output shows exactly which backups each table is restored from.
For example:
[DEBUG]:-Restoring data for 1 tables from backup with timestamp: 20250102091528
[DEBUG]:-Executing "COPY public.products(id,name,description,created_at,price) FROM PROGRAM 'cat <SEG_DATA_DIR>/backups/20250102/20250102091528/gpbackup_<SEGID>_20250102091528_19125.gz | gzip -d -c' WITH CSV DELIMITER ',' ON SEGMENT;" on master
<...>
[DEBUG]:-Restoring data for 2 tables from backup with timestamp: 20250103091621
[DEBUG]:-Executing "COPY public.customers(id,name) FROM PROGRAM 'cat <SEG_DATA_DIR>/backups/20250103/20250103091621/gpbackup_<SEGID>_20250103091621_19135.gz | gzip -d -c' WITH CSV DELIMITER ',' ON SEGMENT;" on master
<...>
[DEBUG]:-Restoring data for 3 tables from backup with timestamp: 20250101091457
[DEBUG]:-Executing "COPY public.sales_1_prt_other_dates FROM PROGRAM 'cat <SEG_DATA_DIR>/backups/20250101/20250101091457/gpbackup_<SEGID>_20250101091457_19357.gz | gzip -d -c' WITH CSV DELIMITER ',' ON SEGMENT;" on master
This log shows how gprestore selects the correct data snapshot for each table based on when it last changed.
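For a long restore log, a short summary of how many tables came from each backup can be easier to read. The sketch below filters verbose output read from standard input, assuming the message wording shown in this example; the restore_sources name is hypothetical.

```shell
# Print "<timestamp> <table count>" for each "Restoring data for N tables"
# line in gprestore --verbose output read from stdin.
restore_sources() {
  sed -n 's/.*Restoring data for \([0-9][0-9]*\) tables from backup with timestamp: \([0-9]\{14\}\).*/\2 \1/p'
}

printf '%s\n' \
  '[DEBUG]:-Restoring data for 1 tables from backup with timestamp: 20250102091528' \
  '[DEBUG]:-Restoring data for 2 tables from backup with timestamp: 20250103091621' \
  '[DEBUG]:-Restoring data for 3 tables from backup with timestamp: 20250101091457' |
  restore_sources
```

Lines that do not match the pattern, such as the COPY statements, are ignored.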
Since any leading subset of a backup set is also valid, you can restore the database to any earlier state simply by specifying the timestamp of an earlier backup.
For example, restoring from the 20250102091528 backup would bring the database back to its state as of January 2.
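Choosing the timestamp for such a point-in-time restore can be scripted. The sketch below picks, from a list of backup set timestamps on standard input, the latest one that is not later than a given cutoff in the YYYYMMDDhhmmss format; the latest_at_or_before name is hypothetical, and the result would then be passed to gprestore --timestamp.

```shell
# From backup set timestamps on stdin (one per line), print the latest
# one that is not later than the cutoff given as $1 (YYYYMMDDhhmmss).
latest_at_or_before() {
  awk -v cutoff="$1" '
    length($0) == 14 && $0 !~ /[^0-9]/ {
      ts = $0 ""   # force string comparison; equal-length digit strings sort like numbers
      if ((ts <= (cutoff "")) && (ts > (best ""))) best = ts
    }
    END { if (best != "") print best }
  '
}

printf '%s\n' 20250101091457 20250102091528 20250103091621 |
  latest_at_or_before 20250102235959
```

With the backup set from this example and a cutoff at the end of January 2, the function selects the January 2 backup. It prints nothing if every backup is newer than the cutoff.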
Incremental restore
Sometimes you may only need to restore specific data changes from one incremental backup rather than the entire database. This process is called an incremental restore.
An incremental restore restores the following data captured in a single incremental backup:
- heap tables;
- AO tables that were modified between the creation times of this backup and its base backup;
- table partitions modified within the same time frame.
During an incremental restore, gprestore truncates the target table and reloads it entirely from the specified backup.
For partitioned tables, this operation happens at the partition level: partitions not included in the backup remain untouched, while changed partitions are overwritten.
It’s important to note that incremental restore does not apply individual row changes. Instead, it replaces the affected tables or partitions with their complete snapshots from the backup. For example, if a single row was added to an AO table, restoring incrementally from that backup will bring the entire table to its state after that insert, not simply append the missing row. Similarly, for partitioned tables, all partitions that were changed are fully rewritten.
To perform an incremental restore, specify the backup timestamp along with two additional options: --incremental and --data-only.
The target tables must already exist in the database before running the restore.
$ gprestore --timestamp 20250103091621 --incremental --data-only
This command restores only the data captured in the incremental backup with timestamp 20250103091621.
The tables omitted in this backup remain unchanged.