
Check and recover segments

Pavel Semyonov

Segment mirroring allows Greengage DB (based on Greenplum) clusters to remain fully available in the event of one or more segment failures. When a segment fails, Greengage DB automatically promotes its mirror to serve as the new primary and continues processing user queries. Transactions that were in progress at the time of failure are rolled back and automatically retried once the mirror is promoted. As a result, segment failures can go unnoticed by users unless the corresponding mirrors fail too.

However, these failures can decrease performance because the cluster enters an unbalanced state. Some segment hosts may handle larger portions of data and perform more processing than others. In addition, the cluster loses a part of its total computing capacity.

To restore optimal cluster performance, fault tolerance, and balance, you should recover failed segments and ensure even data distribution.

This topic explains how to check for and recover failed segments in a Greengage DB cluster.

Check for failed segments

This section describes how to detect failed segments in a Greengage DB cluster.

gpstate utility

The gpstate utility has the -e option, which displays information about segments that may have mirroring issues:

  • Switched roles between primary and mirror segments.

  • Primary-mirror pairs that are not synchronized.

  • Segments that are down.

$ gpstate -e

Example output with mirroring issues:

[INFO]:-----------------------------------------------------
[INFO]:-Segment Mirroring Status Report
[INFO]:-----------------------------------------------------
[INFO]:-Segments with Primary and Mirror Roles Switched
[INFO]:-   Current Primary   Port    Mirror   Port
[INFO]:-   sdw1              11000   sdw2     10000
[INFO]:-   sdw1              11001   sdw2     10001
[INFO]:-----------------------------------------------------
[INFO]:-Unsynchronized Segment Pairs
[INFO]:-   Current Primary   Port    WAL sync remaining bytes   Mirror   Port
[INFO]:-   sdw1              10000   Unknown                    sdw2     11000
[INFO]:-   sdw1              10001   Unknown                    sdw2     11001
[INFO]:-   sdw1              11000   Unknown                    sdw2     10000
[INFO]:-   sdw1              11001   Unknown                    sdw2     10001
[INFO]:-----------------------------------------------------
[INFO]:-Downed Segments (may include segments where status could not be retrieved)
[INFO]:-   Segment   Port    Config status   Status
[INFO]:-   sdw2      11000   Down            Down in configuration
[INFO]:-   sdw2      11001   Down            Down in configuration
[INFO]:-   sdw2      10000   Down            Down in configuration
[INFO]:-   sdw2      10001   Down            Down in configuration

If no issues are detected, the output ends with the following message:

[INFO]:-All segments are running normally
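
If you monitor the cluster with scripts, you can key off this message. The following is a minimal sketch that reports whether gpstate -e detected any mirroring issues (the message text is taken from the output above):

$ gpstate -e | grep -q "All segments are running normally" \
      && echo "cluster is healthy" || echo "mirroring issues detected"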

gp_segment_configuration and gp_configuration_history tables

The gp_segment_configuration system catalog table stores metadata about all cluster segments, including their current status. Segments that have failed have the d (down) value in the status column. You can get information about segment failures with a query like this:

SELECT * FROM gp_segment_configuration WHERE status='d';

Example result:

 dbid | content | role | preferred_role | mode | status | port  | hostname | address |        datadir
------+---------+------+----------------+------+--------+-------+----------+---------+-----------------------
    6 |       0 | m    | m              | n    | d      | 11000 | sdw2     | sdw2    | /data1/mirror/gpseg0
    7 |       1 | m    | m              | n    | d      | 11001 | sdw2     | sdw2    | /data1/mirror/gpseg1
    4 |       2 | m    | p              | n    | d      | 10000 | sdw2     | sdw2    | /data1/primary/gpseg2
    5 |       3 | m    | p              | n    | d      | 10001 | sdw2     | sdw2    | /data1/primary/gpseg3
(4 rows)
NOTE

Down segments configured as primary (preferred_role is p) are automatically demoted to mirrors when they fail, while their mirrors are promoted to act as primaries.
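
For example, a query like the following lists segments that are currently acting outside their preferred role:

SELECT content, role, preferred_role, hostname, port
FROM gp_segment_configuration
WHERE role <> preferred_role;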

Another system catalog table, gp_configuration_history, stores the history of changes in segment configurations, including role and status changes, with timestamps. This information can be useful for investigating complex issues. For example, the following query finds events that happened between the specified dates:

SELECT * FROM gp_configuration_history WHERE time BETWEEN '2025-04-29' AND '2025-04-30';

Example gp_configuration_history row:

             time              | dbid |                                     desc
-------------------------------+------+-------------------------------------------------------------------------------
 2025-04-29 04:38:18.248111+00 |    9 | FTS: update role, status, and mode for dbid 9 with contentid 3 to m, u, and s
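
To trace the history of a single segment instance, filter by its dbid instead. For example, for the instance from the row above:

SELECT * FROM gp_configuration_history WHERE dbid = 9 ORDER BY time;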

Log files

The gplogfilter utility searches Greengage DB logs for entries that match specified criteria. For detailed information about Greengage DB logs, see the Logging topic.

When called with the -t (--trouble) option, the utility filters for log entries with types ERROR, FATAL, and PANIC. These often indicate segment failures or other serious issues.

  • To check the master instance logs for issues:

    $ gplogfilter -t

    To analyze a specific log file, pass its path as an argument:

    $ gplogfilter -t $MASTER_DATA_DIRECTORY/pg_log/gpdb-2025-03-31_064402.csv

    To save the filtered output in a file for later analysis, use the -o (--out) option:

    $ gplogfilter -t -o master_issues
  • To check logs of segment instances, use gpssh to execute the same gplogfilter call on all segment hosts. For example, to search logs in the pg_log directories inside segment data directories:

    $ gpssh -f hostfile_segment_hosts -e " \
          source /usr/local/gpdb/greengage_path.sh && \
          gplogfilter /data1/*/*/pg_log/gpdb*.csv \
          --trouble"

    where hostfile_segment_hosts is a file that lists the cluster segment hosts.

    To save segment log output for analysis on the master host, redirect gpssh output to a file:

    $ gpssh -f hostfile_segment_hosts -e " \
          source /usr/local/gpdb/greengage_path.sh && \
          gplogfilter /data1/*/*/pg_log/gpdb*.csv \
          --trouble" > segment_issues

Recover failed segments

Once a segment failure is detected in the cluster, determine its root cause to choose an appropriate recovery strategy.

If the issue is temporary — such as a system fault or brief network outage — failed segments can often be recovered in place (on their original host) after a reboot or restart of the relevant services. This is known as in-place recovery.

However, in more serious cases — such as hardware failure — the original host may be unavailable or unusable for some time. To recover from such failures, you need to relocate the affected segments to other operational hosts, either existing ones in the cluster or newly added machines.

Once you have diagnosed the issue and chosen a recovery scenario, you can use the gprecoverseg utility to restore failed segments as described in this section.

Recovery types

Greengage DB supports three segment recovery types for different use cases:

  • Incremental recovery. This is the default method for in-place recovery. Greengage DB uses the pg_rewind utility to identify the differences between the failed segment and the current primary of its pair by analyzing their WAL files, and then copies only the missing changes to bring the segment back in sync. This method is fast and efficient because it avoids a full data transfer. If incremental recovery fails, differential or full recovery is required.

  • Differential recovery. This method uses the rsync utility to compare the file system of the failed segment with that of the current primary of its pair and copies only the changed files to the target segment directory. While similar in purpose to full recovery, it is typically faster for in-place scenarios because it transfers less data. This approach is useful when incremental recovery is impossible but you want to avoid a full copy.

  • Full recovery. Full recovery creates a new segment data directory from scratch using a complete copy of the mirror’s data. The copy is made using the pg_basebackup utility. When used in-place, full recovery deletes the existing data directory of the failed segment. This method is the only supported option for relocating segments to a different host, either an existing one in the cluster or a new machine.

These recovery types map to the recovery scenarios as follows (see the command summary after this list):

  • In-place recovery: incremental, differential, or full.

  • Recovery to another host in a cluster: full only.

  • Recovery to a new host: full only.
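
For quick reference, these are the gprecoverseg invocations that correspond to the recovery types; each is described in detail later in this topic:

$ gprecoverseg                  # incremental recovery (default)
$ gprecoverseg --differential   # differential recovery
$ gprecoverseg -F               # full recovery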

General workflow

You can use the following basic workflow for segment recovery:

  1. Decide whether you want to recover the segment in its original location. If not, or if in-place recovery is impossible, proceed directly to the last step.

  2. After resolving the issue with the failed host, attempt an incremental recovery.

  3. If incremental recovery fails, try a differential in-place recovery.

  4. If neither incremental nor differential recovery is possible, or both fail, perform a full recovery. You can do it either in place or on a different host (an existing host in the cluster or a newly added one).

In-place recovery

To recover all failed segments to their original locations, perform an incremental recovery:

  1. Run gprecoverseg:

    $ gprecoverseg

    Greengage DB plans the recovery and outputs the following details for each segment to recover:

    [INFO]:----------------------------------------------------------
    [INFO]:-Recovery 1 of 4
    [INFO]:----------------------------------------------------------
    [INFO]:-   Synchronization mode                 = Incremental
    [INFO]:-   Failed instance host                 = sdw2
    [INFO]:-   Failed instance address              = sdw2
    [INFO]:-   Failed instance directory            = /data1/mirror/gpseg0
    [INFO]:-   Failed instance port                 = 11000
    [INFO]:-   Recovery Source instance host        = sdw1
    [INFO]:-   Recovery Source instance address     = sdw1
    [INFO]:-   Recovery Source instance directory   = /data1/primary/gpseg0
    [INFO]:-   Recovery Source instance port        = 10000
    [INFO]:-   Recovery Target                      = in-place
  2. Enter y and press Enter to confirm segment recovery:

    Continue with segment recovery procedure Yy|Nn (default=N):
    NOTE

    To automatically confirm the recovery, add the -a option:

    $ gprecoverseg -a

    After a successful recovery, the following lines are shown:

    [INFO]:-********************************
    [INFO]:-Segments successfully recovered.
    [INFO]:-********************************
    [INFO]:-Recovered mirror segments need to sync WAL with primary segments.
    [INFO]:-Use 'gpstate -e' to check progress of WAL sync remaining bytes

For optimal cluster performance, rebalance the cluster to return segments to their preferred roles, as described in Rebalance a cluster.

Partial in-place recovery

Greengage DB supports partial in-place recovery, allowing you to recover only a subset of failed segments. This is useful, for example, when multiple segment hosts have gone down and only some of them are back online.

To perform partial recovery, you must provide a recovery configuration file. This file lists the locations of failed segments to recover in a single gprecoverseg execution. You can generate this file using the gprecoverseg utility or prepare it manually using the structure described in this section.

To create a template recovery configuration file for partial recovery, run gprecoverseg with the -o option specifying the name of the file to write the template to:

$ gprecoverseg -o recover_conf_file
Recovery configuration file example

The following template file is generated for a cluster with four failed segments on the sdw2 host.

# If any entry is commented, please know that it belongs to failed segment which is unreachable.
# If you need to recover them, please modify the segment entry and add failover details
# (failed_addresss|failed_port|failed_dataDirectory<space>failover_addresss|failover_port|failover_dataDirectory) to recover it to another host.

sdw2|11000|/data1/mirror/gpseg0
sdw2|11001|/data1/mirror/gpseg1
sdw2|10000|/data1/primary/gpseg2
sdw2|10001|/data1/primary/gpseg3

The generated file lists failed segments, each described on a separate line in the following format:

failedAddress|failedPort|failedDataDirectory

or

failedHostname|failedAddress|failedPort|failedDataDirectory

Comment out with # or delete the lines that correspond to segments you do not want to recover.
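
For example, to recover only the two mirror segments from the template above, you might comment out the primary entries (a hypothetical edit):

sdw2|11000|/data1/mirror/gpseg0
sdw2|11001|/data1/mirror/gpseg1
# sdw2|10000|/data1/primary/gpseg2
# sdw2|10001|/data1/primary/gpseg3

Then, perform the recovery for the remaining segments: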

  1. Run gprecoverseg passing the file name as the -i option value:

    $ gprecoverseg -i recover_conf_file

    Greengage DB plans the recovery and outputs the following details for each segment to recover:

    [INFO]:----------------------------------------------------------
    [INFO]:-Recovery 1 of 2
    [INFO]:----------------------------------------------------------
    [INFO]:-   Synchronization mode                 = Incremental
    [INFO]:-   Failed instance host                 = sdw2
    [INFO]:-   Failed instance address              = sdw2
    [INFO]:-   Failed instance directory            = /data1/mirror/gpseg0
    [INFO]:-   Failed instance port                 = 11000
    [INFO]:-   Recovery Source instance host        = sdw1
    [INFO]:-   Recovery Source instance address     = sdw1
    [INFO]:-   Recovery Source instance directory   = /data1/primary/gpseg0
    [INFO]:-   Recovery Source instance port        = 10000
    [INFO]:-   Recovery Target                      = in-place
  2. Enter y and press Enter to confirm segment recovery:

    Continue with segment recovery procedure Yy|Nn (default=N):

    After a successful recovery, the following lines are shown:

    [INFO]:-********************************
    [INFO]:-Segments successfully recovered.
    [INFO]:-********************************
    [INFO]:-Recovered mirror segments need to sync WAL with primary segments.
    [INFO]:-Use 'gpstate -e' to check progress of WAL sync remaining bytes

After partial recovery is completed:

  • Rebalance the cluster to restore preferred roles of recovered segments.

  • Recover remaining failed segments to return the cluster to a fully operational state.

Full and differential in-place recovery

If the data directories or files of failed segments are corrupted, an incremental recovery may not be possible. In such cases, use full recovery to recreate the segment from scratch.

To perform a full recovery of failed segments, use the -F option of gprecoverseg:

$ gprecoverseg -F
CAUTION

When used in-place, full recovery deletes the existing data directory of the failed segment and replaces it with a fresh copy from its mirror. Any custom files or directories stored in the segment’s original location are lost and not restored.

If the data was only partially lost or corrupted, differential recovery may be a faster alternative, as it transfers only modified files:

$ gprecoverseg --differential

Both full and differential in-place recovery can be performed for specific segments with a recovery configuration file:

$ gprecoverseg -F -i recover_conf_file

Recovery to another host

If segments can’t be recovered to their original locations, you can temporarily relocate them to other hosts of the cluster. This scenario requires a full recovery, since the segment data does not exist on the new hosts.

To perform recovery to another host, you need a recovery configuration file that defines the new locations for the failed segments. You can generate a recovery configuration file using the gprecoverseg utility or prepare it manually using the structure described in this section.

To generate a template with updated segment locations, run gprecoverseg with two options:

  • -o — name of the output file.

  • -p — target host where failed segments should be recovered.

$ gprecoverseg -o recover_out -p sdw3
NOTE

It is generally recommended to recover to one host at a time. However, you can specify multiple hosts as a comma-separated list:

$ gprecoverseg -o recover_out -p sdw3,sdw4
Recovery configuration file example

The following template file is generated by the gprecoverseg call shown above.

# If any entry is commented, please know that it belongs to failed segment which is unreachable.
# If you need to recover them, please modify the segment entry and add failover details
# (failed_addresss|failed_port|failed_dataDirectory<space>failover_addresss|failover_port|failover_dataDirectory) to recover it to another host.

sdw2|10500|/data1/mirror/gpseg0 sdw3|10002|/data1/mirror/gpseg0
sdw2|10501|/data1/mirror/gpseg1 sdw3|10003|/data1/mirror/gpseg1
sdw2|10000|/data1/primary/gpseg2 sdw3|10004|/data1/primary/gpseg2
sdw2|10001|/data1/primary/gpseg3 sdw3|10005|/data1/primary/gpseg3

The generated file lists failed segments and their corresponding new locations on the specified hosts. Each entry follows this structure:

failedAddress|failedPort|failedDataDirectory newAddress|newPort|newDataDirectory

or

failedHostname|failedAddress|failedPort|failedDataDirectory newHostname|newAddress|newPort|newDataDirectory
NOTE

Note the whitespace character that separates the original segment information from its new location.

To customize recovery targets, edit their corresponding lines in the file.

To perform a partial recovery, comment out with # or delete unneeded lines.

IMPORTANT

For fault tolerance, ensure that no primary–mirror segment pair is placed on the same host.
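
After the recovery completes, you can verify this with a catalog query on the master, similar to the queries shown earlier (a sketch that assumes every content ID has both a primary and a mirror):

SELECT content
FROM gp_segment_configuration
WHERE content >= 0
GROUP BY content
HAVING count(DISTINCT hostname) < 2;

Any content ID returned by this query has both members of the pair on the same host.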

Perform the recovery:

  1. Run gprecoverseg with the -i option:

    $ gprecoverseg -i recover_out

    Greengage DB plans the recovery and outputs the following details for each segment to recover:

    [INFO]:----------------------------------------------------------
    [INFO]:-Recovery 1 of 4
    [INFO]:----------------------------------------------------------
    [INFO]:-   Synchronization mode                 = Full
    [INFO]:-   Failed instance host                 = sdw2
    [INFO]:-   Failed instance address              = sdw2
    [INFO]:-   Failed instance directory            = /data1/mirror/gpseg0
    [INFO]:-   Failed instance port                 = 11000
    [INFO]:-   Recovery Source instance host        = sdw1
    [INFO]:-   Recovery Source instance address     = sdw1
    [INFO]:-   Recovery Source instance directory   = /data1/primary/gpseg0
    [INFO]:-   Recovery Source instance port        = 10000
    [INFO]:-   Recovery Target instance host        = sdw3
    [INFO]:-   Recovery Target instance address     = sdw3
    [INFO]:-   Recovery Target instance directory   = /data1/mirror/gpseg0
    [INFO]:-   Recovery Target instance port        = 10000
  2. Enter y and press Enter to confirm segment recovery:

    Continue with segment recovery procedure Yy|Nn (default=N):

    After a successful recovery, the following lines are shown:

    [INFO]:-********************************
    [INFO]:-Segments successfully recovered.
    [INFO]:-********************************
    [INFO]:-Recovered mirror segments need to sync WAL with primary segments.
    [INFO]:-Use 'gpstate -e' to check progress of WAL sync remaining bytes

After recovery is completed:

  • Rebalance the cluster to return recovered segments to their preferred roles.

  • Recover any remaining failed segments to fully restore the cluster state.

Recovery to a new host

A good practice is to have spare hosts to use in case of a segment host failure. Recovery to a new host outside the cluster follows the same process as recovery to another existing segment host. You can write the name (or names) of a new host into a recovery configuration file manually or generate such a file using the -p option:

$ gprecoverseg -o recover_spare -p sdw-spare-1

Before recovering segments on external hosts, prepare these hosts to run Greengage DB as described in the Configure new hosts topic.
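
Before running the recovery, you can also check that a spare host is reachable from the master under the administrative user, for example (sdw-spare-1 is the spare host from the command above):

$ gpssh -h sdw-spare-1 -e 'hostname'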

Rebalance a cluster

After a segment failure, the cluster goes into an unbalanced state. Mirrors of the failed segments are promoted to primary and begin handling user queries. This increases the workload on their hosts.

When the failed segments are recovered, Greengage DB restores them as mirrors, while their promoted counterparts continue serving as primaries. As a result, the cluster remains operational but is no longer aligned with its original configuration.

You can check for this unbalanced state using gpstate:

  • gpstate -m: check for segments with the Acting as Primary status.

    [INFO]:--------------------------------------------------------------
    [INFO]:--Current GPDB mirror list and status
    [INFO]:--Type = Group
    [INFO]:--------------------------------------------------------------
    [INFO]:-   Mirror   Datadir                Port    Status              Data Status
    [INFO]:-   sdw2     /data1/mirror/gpseg0   10000   Passive             Synchronized
    [INFO]:-   sdw2     /data1/mirror/gpseg1   10001   Passive             Synchronized
    [INFO]:-   sdw1     /data1/mirror/gpseg2   11000   Acting as Primary   Synchronized
    [INFO]:-   sdw1     /data1/mirror/gpseg3   11001   Acting as Primary   Synchronized
    [INFO]:--------------------------------------------------------------
  • gpstate -e: check the Segments with Primary and Mirror Roles Switched section.

    [INFO]:-----------------------------------------------------
    [INFO]:-Segments with Primary and Mirror Roles Switched
    [INFO]:-   Current Primary    Port    Mirror    Port
    [INFO]:-   sdw1               11000   sdw2      10000
    [INFO]:-   sdw1               11001   sdw2      10001

To return all segments to their configured (preferred) roles:

  1. Run gprecoverseg with the -r option:

    $ gprecoverseg -r

    Greengage DB outputs information about each segment that will be returned to its original role:

    [INFO]:----------------------------------------------------------
    [INFO]:-Unbalanced segment 1 of 4
    [INFO]:----------------------------------------------------------
    [INFO]:-   Unbalanced instance host        = sdw1
    [INFO]:-   Unbalanced instance address     = sdw1
    [INFO]:-   Unbalanced instance directory   = /data1/mirror/gpseg2
    [INFO]:-   Unbalanced instance port        = 11000
    [INFO]:-   Balanced role                   = Mirror
    [INFO]:-   Current role                    = Primary
  2. Enter y and press Enter to confirm rebalancing:

    Continue with segment rebalance procedure Yy|Nn (default=N):

    After a successful completion, the following line is shown:

    [INFO]:-The rebalance operation has completed successfully.

To verify that all segments have returned to their original roles, run gpstate -e again:

$ gpstate -e

The output should end with the following line:

[INFO]:-All segments are running normally