Expand a cluster
This topic explains how to expand a Greengage DB (based on Greenplum) cluster by adding new segments and hosts.
As data warehouses grow, they may need more storage space. In other cases, increased computational power is needed to process analytical queries efficiently. Predicting the exact storage and processing capacity a cluster will require over time can be challenging, especially when dealing with big data.
Greengage DBMS addresses these challenges by offering linear scalability. You can expand an existing cluster by adding new segment hosts, thereby increasing both storage capacity and processing power. Once expanded, the cluster functions as if it had been initially created with the new configuration. This enables seamless future expansions as data volume continues to grow.
New hosts are added with minimal downtime during new segment initialization. After that, the cluster functions normally. However, performance may temporarily decrease until all tables are redistributed across the updated topology.
Check requirements
Before expanding a Greengage DB cluster, prepare new segment hosts and ensure that they meet the necessary requirements. These hosts can be either physical servers or cloud instances, depending on your deployment environment.
The number of new hosts must be enough to apply the mirroring policy used in the cluster: grouped or spread.
For grouped mirroring, at least two additional hosts are required, as each host stores mirrors of another's primary segments.
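If you are not sure which policy your cluster uses, you can check how mirrors are currently placed. The query below is a minimal sketch that reads the standard gp_segment_configuration catalog; comparing the hosts of primary and mirror segments with the same content ID shows whether mirrors are grouped or spread:

-- Show where each primary (p) and mirror (m) segment resides
SELECT content, preferred_role, hostname
FROM gp_segment_configuration
WHERE content >= 0
ORDER BY content, preferred_role;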
Configure new hosts
Before adding new hosts to the cluster, configure their operating system and install Greengage DBMS on them as described in this section.
Install Greengage DB
Prepare new hosts to run Greengage DB:
- Install all the required software dependencies listed in this topic: Software requirements for Greengage DB installation.
- Install Greengage DB on all the new hosts. For example, to install Greengage DB built from the source code, follow the steps from the topic: Build Greengage DB from the source code. This topic describes how to:
  - configure an operating system;
  - create the Greengage DB administrative user;
  - build Greengage DB from the source code;
  - install Greengage DB.
  NOTE: To initialize the DBMS, you do not need to create a demo cluster.
- If your cluster uses extensions, such as PXF, MADlib, or PostGIS, you also need to install their dependencies on the new hosts.
Configure host name resolution
Ensure that name resolution is consistent across all hosts, existing and new. If you use manual host resolution in the /etc/hosts file, update it on each host by adding new hosts' names.
For example, if you’re adding hosts sdw3 and sdw4 to a cluster of four hosts mdw, smdw, sdw1, and sdw2, the file should look as follows on all hosts:

# ...
192.168.1.10 mdw
192.168.1.20 smdw
192.168.1.30 sdw1
192.168.1.40 sdw2
192.168.1.50 sdw3
192.168.1.60 sdw4
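To verify that the new entries resolve correctly on a given host, you can run a quick lookup (an optional check; getent is assumed to be available on the hosts):

$ getent hosts sdw3 sdw4   # should return the addresses added to /etc/hosts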
Create the host file with new hosts
- Log in to the master host as gpadmin and go to the home directory.
- Create the hostfile_new_hosts file:
  $ vi hostfile_new_hosts
- Add the host names of new segment hosts to this file:
  sdw3
  sdw4
- Save and close the file.
Enable passwordless SSH to new hosts
Run the following commands on the master host:
- Enable 1-n passwordless SSH from the master to the new hosts. Use ssh-copy-id to copy the gpadmin user’s public key to the authorized_keys file on the new hosts:
  $ ssh-copy-id sdw3
  $ ssh-copy-id sdw4
- Enable n-n passwordless SSH between all hosts of the future cluster using the gpssh-exkeys utility:
  $ gpssh-exkeys -e <hostfile_all_hosts> -x hostfile_new_hosts
  where <hostfile_all_hosts> is the list of the original cluster hosts that was used for its initialization.
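As an optional sanity check (not part of the original procedure), you can confirm that passwordless SSH works by running a trivial command on the new hosts:

$ gpssh -f hostfile_new_hosts -e 'hostname'   # should print each host name without asking for a password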
Create new data directories
Create data directories on new hosts using the gpssh utility:
$ gpssh -f hostfile_new_hosts -e 'sudo mkdir -p /data1/primary'
$ gpssh -f hostfile_new_hosts -e 'sudo mkdir -p /data1/mirror'
$ gpssh -f hostfile_new_hosts -e 'sudo chown -R gpadmin /data1/*'
The data directory locations must match those on existing segment hosts. You can find the correct directory paths in the database configuration file used during the original cluster initialization.
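If that file is not at hand, a query like the following (a sketch that reads the gp_segment_configuration catalog) lists the data directories of the existing segment instances, from which the base paths such as /data1/primary and /data1/mirror can be inferred:

-- Data directories of existing segment instances (master excluded)
SELECT hostname, datadir
FROM gp_segment_configuration
WHERE content >= 0
ORDER BY hostname, datadir;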
Run performance tests
After configuring the new hosts, it is recommended to test their disk I/O performance using the gpcheckperf utility:
$ gpcheckperf -f hostfile_new_hosts -d /data1/primary -d /data1/mirror
This test transfers large files between hosts and writes them to disk on the new hosts.
After completion, gpcheckperf outputs the test results.
If it is possible in your environment, shut down the cluster and test the performance of the future expanded cluster:
$ gpstop -a
$ gpcheckperf -f <hostfile_all_with_new> -d /data1/primary -d /data1/mirror
where <hostfile_all_with_new> lists all hosts of the future cluster: existing and new.
Initialize new segments
It is recommended that you create a backup of the original cluster using gpbackup before starting the expansion.
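For example, a full backup of a single database might look like this (a sketch; the database name and backup directory are illustrative and should match your environment):

$ gpbackup --dbname postgres --backup-dir /data1/backup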
New segment configuration file
To define the exact parameters of cluster expansion, a file with the new segment configuration is needed.
This file specifies the number of new segments, their hostnames, storage locations, and roles.
You can generate this file interactively using the gpexpand utility or prepare it manually using the structure described in this section.
To generate a segment configuration file interactively:
- Run gpexpand specifying the file with new hosts as the -f option value:
  $ gpexpand -f hostfile_new_hosts
  NOTE: You can also run gpexpand without arguments. In this case, it invites you to enter new host names interactively.
- Enter y to confirm cluster expansion and press Enter:
  Would you like to initiate a new System Expansion Yy|Nn (default=N):
- (Optional) If no host file was specified in the gpexpand call, enter hostnames of new hosts in a comma-separated list:
  Enter a comma separated list of new hosts you want to add to your array. Do not include interface hostnames.
  **Enter a blank line to only add segments to existing hosts**[]:
- Define the mirroring strategy to use on the expanded cluster: grouped (default) or spread.
  What type of mirroring strategy would you like? spread|grouped (default=grouped):
- Enter the number of primary segments to add per existing host:
  How many new primary segments per host do you want to add? (default=0):
  To leave existing segment hosts unchanged and create the same number of new segments on new hosts, enter 0 or press Enter.
  IMPORTANT: This number defines how many primary segments are created in addition to those currently existing on segment hosts. New hosts receive the same number of primary segments that existing hosts will have after expansion.
  For example, if the cluster has two segment hosts sdw1 and sdw2 that store two primary segments each, entering 1 leads to the following segment configuration:
  - Existing hosts sdw1 and sdw2 store three primary segments each: two old segments and one new.
  - All new hosts get three new primary segments each.
  - With two new hosts (sdw3 and sdw4), the cluster will have a total of 12 primary segments.
- (Optional) If the expansion includes creation of new segments on existing hosts, specify locations to store their data: primary and mirror.
The utility outputs the name of the new segment configuration file in a message like this:
Input configuration file was written to 'gpexpand_inputfile_20250312_083443'.
The following file expands the cluster by creating two new primary segments on each new host (sdw3 and sdw4).
The hosts mirror each other’s primary segments.

sdw3|sdw3|10000|/data1/primary/gpseg4|11|4|p
sdw4|sdw4|10500|/data1/mirror/gpseg4|17|4|m
sdw3|sdw3|10001|/data1/primary/gpseg5|12|5|p
sdw4|sdw4|10501|/data1/mirror/gpseg5|18|5|m
sdw4|sdw4|10000|/data1/primary/gpseg6|13|6|p
sdw3|sdw3|10500|/data1/mirror/gpseg6|15|6|m
sdw4|sdw4|10001|/data1/primary/gpseg7|14|7|p
sdw3|sdw3|10501|/data1/mirror/gpseg7|16|7|m
The segment configuration file consists of lines of the following structure:
hostname|address|port|datadir|dbid|content|preferred_role
Field | Description
---|---
hostname | A hostname of a segment host
address | The hostname used to access a particular segment instance on a segment host. This value may be the same as hostname on systems that do not have per-interface hostnames configured
port | The TCP port the database segment is using
datadir | A segment instance data directory
dbid | A unique identifier of a segment instance
content | A content identifier for a segment instance
preferred_role | The role that a segment is assigned at initialization time. Possible values are p (primary) and m (mirror)
Fields of this file match the columns of the gp_segment_configuration system catalog table that can be defined at initialization time.
Create new segments
Running DDL operations during segment initialization can cause cluster inconsistency. Make sure that no DDL operations are performed until initialization is completed.
To create new segments based on the configuration file, run gpexpand -i:

$ gpexpand -i <gpexpand_inputfile>

where <gpexpand_inputfile> is the segment configuration file created in the previous step.
After successful segment creation, the utility prints the following lines:
[INFO]:-************************************************
[INFO]:-Initialization of the system expansion complete.
[INFO]:-To begin table expansion onto the new segments
[INFO]:-rerun gpexpand
[INFO]:-************************************************
[INFO]:-Exiting...
As a result of the command execution:
- The cluster enters the expansion state. From now on, you can monitor the expansion process using gpstate -x:
  $ gpstate -x
  The output includes information about the expansion state:
  [INFO]:-Cluster Expansion State = Data Distribution - Paused
  [INFO]:-----------------------------------------------------
  [INFO]:-Number of tables to be redistributed
  [INFO]:-     Database   Count of Tables to redistribute
  [INFO]:-     postgres   4
  [INFO]:-----------------------------------------------------
- A gpexpand schema is created in the postgres database on the master segment. This schema tracks expansion progress, allowing users to monitor and manage the process.
  gpexpand schema tables:
  Table | Description
  ---|---
  expansion_progress | General information about data redistribution during the expansion
  status | Completed expansion steps with timestamps
  status_detail | Detailed redistribution status for each table
- New segments are created on both new and existing hosts based on the provided configuration. The updated segment configuration can be viewed in the gp_segment_configuration system catalog table, as shown in the query below. Newly created segments are empty, while all data remains in existing segments. To balance data distribution, redistribute table data across all segments of the expanded cluster.
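The following query is a minimal sketch that filters the catalog to the new hosts from the earlier example (sdw3 and sdw4); it should return the segment instances defined in the input configuration file:

-- Segment instances created on the new hosts
SELECT dbid, content, preferred_role, port, datadir
FROM gp_segment_configuration
WHERE hostname IN ('sdw3', 'sdw4')
ORDER BY content, preferred_role;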
Redistribute tables
After adding new segments, all data remains on the existing segments.
This leads to an unbalanced cluster state where queries are processed mostly by the old segments.
The distribution policy of all distributed tables changes to DISTRIBUTED RANDOMLY, potentially decreasing their performance.
To balance the cluster, redistribute table data in accordance with the new topology.
Once a table’s redistribution is completed, the table returns to its original distribution policy and optimal query performance.
Greengage DB provides a built-in redistribution mechanism that ensures optimal data distribution with minimal service disruption.
Redistribution can only be performed in production mode.
On large clusters, this process may take significant time and consume considerable resources.
To reduce performance impact, Greengage DB allows you to pause and resume redistribution, splitting this long process into sessions of limited time.
Redistribution progress is saved in the gpexpand schema tables.
Schedule redistribution sessions during off-peak hours to minimize performance degradation.
New tables created at this phase are automatically distributed across all segments and do not require redistribution. They are available for use since their creation. However, existing tables are temporarily locked one by one for read and write operations while they are being redistributed.
Rank tables for redistribution
To optimize query performance during redistribution, prioritize large and frequently queried tables. This ensures that these tables become available earlier for normal operations.
The redistribution order is determined by the rank column in the gpexpand.status_detail table.
Lower values mean higher priority in redistribution.
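To review the current ranks before changing them, you can query the table directly (a simple sketch):

-- Current redistribution order and status of each table
SELECT fq_name, rank, status
FROM gpexpand.status_detail
ORDER BY rank;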
To define table redistribution order:
-
Connect to the
postgres
database on the master instance:$ psql postgres
-
Set a big number as a default
rank
value for all tables, for example,100
:UPDATE gpexpand.status_detail SET rank = 100;
-
Decrease the rank for tables that should be redistributed first:
UPDATE gpexpand.status_detail SET rank = 10 WHERE fq_name = 'public.table3'; UPDATE gpexpand.status_detail SET rank = 20 WHERE fq_name = 'public.table2'; UPDATE gpexpand.status_detail SET rank = 30 WHERE fq_name = 'public.table4';
Specify the database name if schema and table names repeat in multiple databases:
UPDATE gpexpand.status_detail SET rank = 50 WHERE fq_name = 'public.table4' AND dbname = 'test_db';
TIPUsing round numbers (
10
,20
,30
) improves flexibility. In this case, you can change the redistribution order by updating a single row later:UPDATE gpexpand.status_detail SET rank = 15 WHERE fq_name = 'public.table4';
To exclude a table from the redistribution, remove it from the gpexpand.status_detail
table:
DELETE FROM gpexpand.status_detail WHERE fq_name = 'public.table5';
Run redistribution sessions
To run a redistribution session, run gpexpand with one of two options:
- -d <HH:MM:SS>: run a session of a specific duration.
- -e <YYYY-MM-DD HH:MM:SS>: run redistribution until a specified point in time.
Examples:
- Run a five-minute redistribution session:
  $ gpexpand -d 00:05:00
- Run redistribution until 9 AM on March 13, 2025:
  $ gpexpand -e '2025-03-13 09:00:00'
Greengage DB can redistribute multiple tables in parallel.
To run a parallel redistribution session, specify the number of tables to process simultaneously as the -n option value:
$ gpexpand -n 16
During redistribution, two database connections are created for each table: one alters the table, and another updates the progress in the gpexpand schema.
When setting the number of parallel table redistributions, ensure that the cluster’s maximum connections limit (the max_connections configuration parameter) is not exceeded.
Increase the limit if necessary.
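For example, you can check the current limit and, if needed, raise it with gpconfig (a sketch; the values are illustrative, and changing max_connections requires a cluster restart):

$ gpconfig -s max_connections                  # show the current master and segment values
$ gpconfig -c max_connections -v 1000 -m 250   # set the segment and master values
$ gpstop -ar                                   # restart the cluster to apply the change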
Monitor expansion state
This section describes how to track the progress of cluster expansion.
gpstate utility
When called with the -x option, gpstate shows the number of tables that still need redistribution:
$ gpstate -x
Its output includes lines like these:
[INFO]:-Cluster Expansion State = Data Distribution - Paused
[INFO]:-----------------------------------------------------
[INFO]:-Number of tables to be redistributed
[INFO]:-     Database   Count of Tables to redistribute
[INFO]:-     postgres   3
[INFO]:-----------------------------------------------------
gpexpand.expansion_progress table
The gpexpand.expansion_progress table provides current expansion statistics, including the number of completed and pending tables, data transferred, and estimated completion time.
SELECT * FROM gpexpand.expansion_progress;
Query result:
             name             |         value
------------------------------+-----------------------
 Tables Expanded              | 1
 Estimated Expansion Rate     | 27.0714856745503 MB/s
 Bytes Left                   | 211241312
 Estimated Time to Completion | 00:00:07.441609
 Tables Left                  | 3
 Bytes Done                   | 264568832
(6 rows)
gpexpand.status_detail table
The gpexpand.status_detail table stores detailed redistribution progress for each table.

SELECT dbname, fq_name, rank, status, expansion_started, expansion_finished
FROM gpexpand.status_detail;
Query result:
  dbname  |    fq_name    | rank |   status    |     expansion_started      |     expansion_finished
----------+---------------+------+-------------+----------------------------+----------------------------
 postgres | public.table1 |  100 | NOT STARTED |                            |
 postgres | public.table4 |   30 | NOT STARTED |                            |
 postgres | public.table2 |   20 | NOT STARTED |                            |
 postgres | public.table3 |   10 | COMPLETED   | 2025-03-13 10:15:41.775582 | 2025-03-13 10:15:50.095813
(4 rows)
Post-expansion actions
When redistribution of all tables is completed, the gpexpand output includes the following line:
[INFO]:-EXPANSION COMPLETED SUCCESSFULLY
Additionally, the gpexpand.status table contains a row with the EXPANSION COMPLETE status, which indicates completion:
SELECT * FROM gpexpand.status WHERE status = 'EXPANSION COMPLETE';
Query result:
       status       |          updated
--------------------+----------------------------
 EXPANSION COMPLETE | 2025-03-13 10:29:18.385153
(1 row)
This section describes steps that you may need to perform after the expansion is completed.
Remove the expansion schema
The gpexpand schema is no longer needed after redistribution is complete.
Remove it with the following call on the master host:
$ gpexpand -c
A new expansion cannot be started while the gpexpand schema exists in the cluster.
Restore a backup of the original cluster
If you need to restore data from a backup created before the expansion, use gprestore with the --resize-cluster option:
$ gprestore --resize-cluster --backup-dir /data1/backup --timestamp 20250302111439
Set up PXF
If your cluster uses PXF, configure it on all hosts of the expanded cluster:
- Install PXF on new hosts.
- Synchronize PXF configurations across all cluster nodes:
  $ pxf cluster register
  $ pxf cluster sync
- Restart PXF:
  $ pxf cluster restart