Recover a failed master
This section explains how to recover a Greengage DB (based on Greenplum) cluster from a primary master failure.
The primary (active) master instance in a Greengage DB cluster acts as the single access point for client connections. Therefore, its failure results in service interruption. Master mirroring enables Greengage DB to quickly recover from a master failure by allowing a master mirror, called the standby master, to take over.
In normal operation, the standby master does not accept requests or perform query processing. Instead, it continuously receives and applies changes from the active master by streaming write-ahead log (WAL) records. It maintains a synchronized copy of the system catalog and other metadata. A standby master failure does not interrupt cluster operations. The active master continues working and logs changes that occur while the standby is down. Once the standby is restored, it automatically synchronizes with the current state in the background.
When the primary master fails, the cluster stops serving client queries and appears down, even though the segment instances may continue running on their respective hosts. To restore cluster availability, you must activate the standby master. Upon activation, the cluster resumes operations from the state of the last successfully committed transaction before the failure.
Activate standby master
After a primary master failure, the cluster becomes unavailable. If it’s impossible to bring the failed master back online, you need to activate the standby master to resume operations.
To activate the standby master, use the gpactivatestandby utility:
- Log in to the standby master host as gpadmin.
- Run gpactivatestandby, passing the path to the standby master data directory in the -d option:
  $ gpactivatestandby -d /data1/master/gpseg-1
  NOTE: gpactivatestandby requires the PGPORT environment variable to be set.
  $ export PGPORT=5432
  Greengage DB prepares the activation procedure and outputs the standby master details:
  [INFO]:------------------------------------------------------
  [INFO]:-Standby data directory = /data1/master/gpseg-1
  [INFO]:-Standby port = 5432
  [INFO]:-Standby running = yes
  [INFO]:-Force standby activation = no
  [INFO]:------------------------------------------------------
- Enter y and press Enter to confirm the standby master activation:
  Do you want to continue with standby master activation? Yy|Nn (default=N):
  NOTE: To automatically confirm the standby master activation, use the -a option:
  $ gpactivatestandby -d /data1/master/gpseg-1 -a
  After successful activation, the following lines are shown:
  [INFO]:-The activation of the standby master has completed successfully.
  [INFO]:-smdw is now the new primary master.
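The activation steps above can be wrapped in a small helper for unattended use. This is a minimal sketch, not part of Greengage DB: the function name activate_standby and the DRY_RUN switch are hypothetical, and the default port 5432 is taken from the example in this section. It only combines the PGPORT requirement with the -d and -a options described above.

```shell
#!/usr/bin/env bash
# Sketch: non-interactive standby activation.
# Intended to be sourced and called on the standby master host as gpadmin.
# activate_standby and DRY_RUN are hypothetical names used for illustration.

activate_standby() {
    local data_dir="$1"
    if [ -z "$data_dir" ]; then
        echo "usage: activate_standby <standby_data_directory>" >&2
        return 2
    fi

    # gpactivatestandby requires PGPORT; fall back to the port used in this section.
    export PGPORT="${PGPORT:-5432}"

    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Print the command instead of running it.
        echo "PGPORT=$PGPORT gpactivatestandby -d $data_dir -a"
        return 0
    fi

    # -a confirms the activation prompt automatically.
    gpactivatestandby -d "$data_dir" -a
}

# Example (prints the command only):
#   DRY_RUN=1 activate_standby /data1/master/gpseg-1
```

Running with DRY_RUN=1 first makes it easy to confirm the data directory and port before the actual failover.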
The cluster is now operational, with the former standby acting as the new active master. To resume working with the cluster, reconfigure clients to connect to the new master. Internal Greengage DB communication is automatically updated to use the new master.
The cluster now has an active master and no standby master:
$ gpstate -f
The output shows the absence of the standby master:
[INFO]:-Standby master instance not configured
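For scripted health checks, the same condition can be detected by searching the gpstate -f output for the line shown above. The following is a sketch under the assumption that the message text matches this section's example; the function name standby_missing is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: detect that no standby master is configured.
# Reads `gpstate -f` output on stdin and succeeds (exit 0) when the
# "not configured" message from this section is present.
# standby_missing is a hypothetical helper name.

standby_missing() {
    grep -q "Standby master instance not configured"
}

# Example:
#   gpstate -f | standby_missing && echo "Cluster has no standby master"
```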
To return the cluster to a fault-tolerant state, use one of the following ways:
- Set up a new standby master as described in Enable cluster mirroring.
- Restore the original active-standby master pair as described below.
Do not restart the original master instance after standby master activation. This can lead to data corruption and cluster inconsistency.
Restore original master-standby configuration
If you’ve resolved the issue that caused the original master to fail, you can revert to the original active–standby master configuration after failover.
To restore primary and standby master to their original hosts:
- Initialize a standby master on the original primary master host mdw.
- Activate it, making it the primary master again.
- Reinitialize a standby master on its original host smdw.
Below are the detailed descriptions of these steps.
Create standby master on original master host
To initialize a standby master on the original master host mdw:
- On mdw, rename or move the existing master data directory to save it as a backup:
  $ mv /data1/master/gpseg-1 /data1/master/backup_gpseg-1
- On smdw, initialize a new standby master specifying mdw as the target host:
  $ gpinitstandby -s mdw
  The output should end with the following line:
  [INFO]:-Successfully created standby master on mdw
- Check the standby master by running gpstate -f on smdw:
  $ gpstate -f
  The output shows the standby master details and state:
  [INFO]:-Standby master details
  [INFO]:-----------------------
  [INFO]:- Standby address = mdw
  [INFO]:- Standby data directory = /data1/master/gpseg-1
  [INFO]:- Standby port = 5432
  [INFO]:- Standby PID = 2063
  [INFO]:- Standby status = Standby host passive
  [INFO]:--------------------------------------------------------------
  [INFO]:--pg_stat_replication
  [INFO]:--------------------------------------------------------------
  [INFO]:--WAL Sender State: streaming
  [INFO]:--Sync state: sync
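Before moving on to activation, it is worth confirming that the new standby is actually streaming and synchronized. The sketch below checks the two pg_stat_replication lines from the gpstate -f output shown above. It assumes the output format matches this section's example, and standby_in_sync is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: verify that the standby master is streaming and in sync.
# Reads `gpstate -f` output on stdin; succeeds only if both expected
# lines from the pg_stat_replication block are present.
# standby_in_sync is a hypothetical helper name.

standby_in_sync() {
    local out
    out=$(cat)
    printf '%s\n' "$out" | grep -q "WAL Sender State: streaming" &&
        printf '%s\n' "$out" | grep -q "Sync state: sync"
}

# Example:
#   gpstate -f | standby_in_sync || echo "Standby is not synchronized yet" >&2
```

The same check applies after the final gpinitstandby step that recreates the standby on smdw.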
Activate original master
If the cluster is running, shut it down before activating the standby master.
To return the original master to its primary role:
- Stop the cluster by running gpstop on smdw:
  $ gpstop
- On mdw, activate the newly initialized standby with the -f option:
  $ gpactivatestandby -d $MASTER_DATA_DIRECTORY -f
  IMPORTANT: The -f option forces activation if the standby master is not running. Use this option only when you are sure that its state is consistent with the primary master.
  The output reports that the primary master now runs on the original master host:
  [INFO]:-The activation of the standby master has completed successfully.
  [INFO]:-mdw is now the new primary master.
- Ensure that master mirroring is not enabled by calling gpstate on the new primary master host mdw:
  $ gpstate -f
  The output includes the line:
  [INFO]:-Standby master instance not configured
Initialize standby master in original location
To fully restore the original fault-tolerant topology, recreate the standby master on its original host smdw:
- On smdw, rename or move the existing master data directory to save it as a backup:
  $ mv /data1/master/gpseg-1 /data1/master/backup_gpseg-1
- On mdw, add a standby master specifying smdw as the target host:
  $ gpinitstandby -s smdw
  The output shows the result:
  [INFO]:-Successfully created standby master on smdw
- Check the master mirroring state with a gpstate call on mdw:
  $ gpstate -f
  Primary and standby masters are running on their hosts and synced:
  [INFO]:-Standby master details
  [INFO]:-----------------------
  [INFO]:- Standby address = smdw
  [INFO]:- Standby data directory = /data1/master/gpseg-1
  [INFO]:- Standby port = 5432
  [INFO]:- Standby PID = 1462
  [INFO]:- Standby status = Standby host passive
  [INFO]:--------------------------------------------------------------
  [INFO]:--pg_stat_replication
  [INFO]:--------------------------------------------------------------
  [INFO]:--WAL Sender State: streaming
  [INFO]:--Sync state: sync
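As a recap, the whole restore procedure can be printed as an ordered checklist showing which host each command runs on. This is only a sketch: restore_plan is a hypothetical helper, the host names and data directory default to the values used in this section, and it assumes $MASTER_DATA_DIRECTORY on mdw points to the same path. It prints commands and does not execute anything.

```shell
#!/usr/bin/env bash
# Sketch: print the restore sequence from this section as a checklist.
# restore_plan is a hypothetical helper; it only prints commands.
# Arguments (all optional): original master host, original standby host,
# master data directory.

restore_plan() {
    local master="${1:-mdw}"
    local standby="${2:-smdw}"
    local dir="${3:-/data1/master/gpseg-1}"
    # Backup path: same parent directory, name prefixed with "backup_".
    local backup="${dir%/*}/backup_${dir##*/}"

    cat <<EOF
[$master] mv $dir $backup
[$standby] gpinitstandby -s $master
[$standby] gpstate -f
[$standby] gpstop
[$master] gpactivatestandby -d $dir -f
[$master] gpstate -f
[$standby] mv $dir $backup
[$master] gpinitstandby -s $standby
[$master] gpstate -f
EOF
}

# Example:
#   restore_plan mdw smdw /data1/master/gpseg-1
```

Each line is prefixed with the host where the command is meant to run, so the output can double as a runbook during the maintenance window.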