Expand a cluster

Pavel Semyonov

This topic explains how to expand a Greengage DB (based on Greenplum) cluster by adding new segments and hosts.

As data warehouses grow, they may need more storage space. In other cases, increased computational power is needed to process analytical queries efficiently. Predicting the exact storage and processing capacity a cluster will require over time can be challenging, especially when dealing with big data.

Greengage DBMS addresses these challenges by offering linear scalability. You can expand an existing cluster by adding new segment hosts, thereby increasing both storage capacity and processing power. Once expanded, the cluster functions as if it had been initially created with the new configuration. This enables seamless future expansions as data volume continues to grow.

Adding new hosts requires only minimal downtime while the new segments are initialized. After that, the cluster functions normally; however, performance may temporarily decrease until all tables are redistributed across the updated topology.

Check requirements

Before expanding a Greengage DB cluster, prepare new segment hosts and ensure that they meet the necessary requirements. These hosts can be either physical servers or cloud instances, depending on your deployment environment.

The number of new hosts must be enough to apply the mirroring policy used in the cluster — grouped or spread. For grouped mirroring, at least two additional hosts are required, as each host stores mirrors of another’s primary segments.

Hardware, software, and network configurations of the new hosts must match those of the existing segment hosts. Learn about Greengage DB software and network requirements.
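
For example, one way to review the current segment layout before planning the expansion is to query the gp_segment_configuration catalog from the master host (assuming psql access to the postgres database as gpadmin):

$ psql postgres -c "SELECT content, role, hostname FROM gp_segment_configuration WHERE content >= 0 ORDER BY content, role;"

The hosts that store the mirror (role m) of each content ID show whether the cluster uses grouped or spread mirroring.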

Configure new hosts

Before adding new hosts to the cluster, configure their operating system and install Greengage DBMS on them as described in this section.

Install Greengage DB

Prepare new hosts to run Greengage DB:

  1. Install all the required software dependencies listed in this topic: Software requirements for Greengage DB installation.

  2. Install Greengage DB on all the new hosts. For example, to install Greengage DB built from the source code, follow the steps from the topic: Build Greengage DB from the source code. This topic describes how to:

    • configure an operating system;

    • create the Greengage DB administrative user;

    • build Greengage DB from the source code;

    • install Greengage DB.

      NOTE

      You do not need to create a demo cluster to initialize the DBMS.

If your cluster uses extensions, such as PXF, MADlib, or PostGIS, you also need to install their dependencies on new hosts.
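
One way to verify the installation is to compare the server version reported on a new host with the version running on the existing hosts. The environment script path below is an assumption; adjust it to your installation layout:

$ ssh sdw3 'source /usr/local/greengage-db/greengage_path.sh && postgres --gp-version'
$ ssh sdw4 'source /usr/local/greengage-db/greengage_path.sh && postgres --gp-version'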

Configure host name resolution

Ensure that name resolution is consistent across all hosts, existing and new. If you use manual host resolution in the /etc/hosts file, update it on each host by adding new hosts' names.

For example, if you’re adding hosts sdw3 and sdw4 to a cluster of four hosts mdw, smdw, sdw1, and sdw2, the file should look as follows on all hosts:

# ...
192.168.1.10 mdw
192.168.1.20 smdw
192.168.1.30 sdw1
192.168.1.40 sdw2
192.168.1.50 sdw3
192.168.1.60 sdw4
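
To spot-check that all names resolve consistently, you can run a quick lookup from the master host and repeat it on the other hosts:

$ for h in mdw smdw sdw1 sdw2 sdw3 sdw4; do getent hosts "$h"; done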

Create the host file with new hosts

  1. Log in to the master host as gpadmin and go to the home directory.

  2. Create the hostfile_new_hosts file:

    $ vi hostfile_new_hosts
  3. Add the host names of new segment hosts to this file:

    sdw3
    sdw4
  4. Save and close the file.

Enable passwordless SSH to new hosts

Run the following commands on the master host:

  1. Enable 1-n passwordless SSH from master to new hosts. Use ssh-copy-id to copy the gpadmin user’s public key to the authorized_keys file on new hosts:

    $ ssh-copy-id sdw3
    $ ssh-copy-id sdw4
  2. Enable n-n passwordless SSH between all hosts of a future cluster using the gpssh-exkeys utility:

    $ gpssh-exkeys -e <hostfile_all_hosts> -x hostfile_new_hosts

    where <hostfile_all_hosts> is the host file with the original cluster hosts that was used during cluster initialization. You can verify connectivity as shown below.
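
To confirm that passwordless SSH to the new hosts works, run a trivial command through gpssh; if it returns the host names without password prompts, key exchange succeeded:

$ gpssh -f hostfile_new_hosts -e 'hostname'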

Create new data directories

Create data directories on new hosts using the gpssh utility:

$ gpssh -f hostfile_new_hosts -e 'sudo mkdir -p /data1/primary'
$ gpssh -f hostfile_new_hosts -e 'sudo mkdir -p /data1/mirror'
$ gpssh -f hostfile_new_hosts -e 'sudo chown -R gpadmin /data1/*'
IMPORTANT

The data directory locations must match those on existing segment hosts. You can find the correct directory paths in the database configuration file used during the original cluster initialization.
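
After creating the directories, you can confirm that they exist with the expected ownership on every new host:

$ gpssh -f hostfile_new_hosts -e 'ls -ld /data1/primary /data1/mirror'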

Run performance tests

After configuring the new hosts, it is recommended to test their disk I/O performance using the gpcheckperf utility:

$ gpcheckperf -f hostfile_new_hosts -d /data1/primary -d /data1/mirror

This test writes and reads large files in the specified data directories on the new hosts. After completion, gpcheckperf outputs the test results.

If it is possible in your environment, shut down the cluster and test the performance of the future expanded cluster:

$ gpstop -a
$ gpcheckperf -f <hostfile_all_with_new> -d /data1/primary -d /data1/mirror

where <hostfile_all_with_new> lists all hosts of the future cluster: existing and new.

Initialize new segments

NOTE

It is recommended that you create a backup of the original cluster using gpbackup before starting the expansion.
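
For example, a minimal gpbackup invocation might look as follows; the database name and backup directory are illustrative and should match your environment:

$ gpbackup --dbname postgres --backup-dir /data1/backup

gpbackup reports a backup timestamp that you can later pass to gprestore if you need to restore the pre-expansion state.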

New segment configuration file

To define the exact parameters of cluster expansion, you need a new segment configuration file. This file specifies the number of new segments, their hostnames, storage locations, and roles. You can generate this file interactively using the gpexpand utility or prepare it manually using the structure described in this section.

To generate a segment configuration file interactively:

  1. Run gpexpand specifying the file with new hosts as the -f option value:

    $ gpexpand -f hostfile_new_hosts
    NOTE

    You can also run gpexpand without arguments. In this case, it prompts you to enter new host names interactively.

  2. Enter y to confirm cluster expansion and press Enter:

    Would you like to initiate a new System Expansion Yy|Nn (default=N):
  3. (Optional) If no host file was specified in the gpexpand call, enter hostnames of new hosts in a comma-separated list:

    Enter a comma separated list of new hosts you want
    to add to your array.  Do not include interface hostnames.
    **Enter a blank line to only add segments to existing hosts**[]:
  4. Define the mirroring strategy to use on the expanded cluster: grouped (default) or spread.

    What type of mirroring strategy would you like?
    spread|grouped (default=grouped):
  5. Enter the number of primary segments to add per existing host:

    How many new primary segments per host do you want to add? (default=0):

    To leave existing segment hosts unchanged and give each new host the same number of primary segments as the existing hosts currently have, enter 0 or press Enter.

    IMPORTANT

    This number defines how many primary segments are created in addition to those currently existing on segment hosts. New hosts receive the same number of primary segments that existing hosts will have after expansion.

    For example, if the cluster has two segment hosts sdw1 and sdw2 that store two primary segments each, entering 1 leads to the following segment configuration:

    • Existing hosts sdw1 and sdw2 store three primary segments each: two old segments and one new.

    • All new hosts get three new primary segments each.

    • The cluster will have a total of 12 primary segments (6 on the existing hosts and 6 on the two new hosts).

  6. (Optional) If the expansion includes creating new segments on existing hosts, specify the locations for their primary and mirror data directories.

The utility outputs the name of the new segment configuration file in a message like this:

Input configuration file was written to 'gpexpand_inputfile_20250312_083443'.

The following file expands the cluster by creating two new primary segments on each new host (sdw3 and sdw4). The hosts mirror each other’s primary segments.

sdw3|sdw3|10000|/data1/primary/gpseg4|11|4|p
sdw4|sdw4|10500|/data1/mirror/gpseg4|17|4|m
sdw3|sdw3|10001|/data1/primary/gpseg5|12|5|p
sdw4|sdw4|10501|/data1/mirror/gpseg5|18|5|m
sdw4|sdw4|10000|/data1/primary/gpseg6|13|6|p
sdw3|sdw3|10500|/data1/mirror/gpseg6|15|6|m
sdw4|sdw4|10001|/data1/primary/gpseg7|14|7|p
sdw3|sdw3|10501|/data1/mirror/gpseg7|16|7|m

The segment configuration file consists of lines of the following structure:

hostname|address|port|datadir|dbid|content|preferred_role
The fields have the following meaning:

  • hostname: a hostname of a segment host.

  • address: the hostname used to access a particular segment instance on a segment host. This value may be the same as hostname on systems that do not have per-interface hostnames configured.

  • port: the TCP port the database segment is using.

  • datadir: a segment instance data directory.

  • dbid: a unique identifier of a segment instance.

  • content: a content identifier for a segment instance.

  • preferred_role: the role that a segment is assigned at initialization time. Possible values are p (primary) or m (mirror).

NOTE

Fields of this file match the columns of the gp_segment_configuration system catalog table that can be defined at initialization time.

Create new segments

CAUTION

Running DDL operations during segment initialization can cause cluster inconsistency. Make sure that no DDL operations are performed until initialization is completed.

To create new segments based on the configuration file, run gpexpand -i:

$ gpexpand -i <gpexpand_inputfile>

where <gpexpand_inputfile> is the segment configuration file created in the previous step.

After successful segment creation, the utility prints the following lines:

[INFO]:-************************************************
[INFO]:-Initialization of the system expansion complete.
[INFO]:-To begin table expansion onto the new segments
[INFO]:-rerun gpexpand
[INFO]:-************************************************
[INFO]:-Exiting...

As a result of the command execution:

  • The cluster enters the expansion state. From now on, you can monitor the expansion process using gpstate -x:

    $ gpstate -x

    The output includes information about the expansion state:

    [INFO]:-Cluster Expansion State = Data Distribution - Paused
    [INFO]:-----------------------------------------------------
    [INFO]:-Number of tables to be redistributed
    [INFO]:-     Database   Count of Tables to redistribute
    [INFO]:-     postgres   4
    [INFO]:-----------------------------------------------------
  • A gpexpand schema is created in the postgres database on the master segment. This schema tracks expansion progress, allowing users to monitor and manage the process.

    The gpexpand schema includes the following tables:

    • expansion_progress: general information about data redistribution during the expansion.

    • status: completed expansion steps with timestamps.

    • status_detail: detailed redistribution status for each table.

  • New segments are created on both new and existing hosts based on the provided configuration. The updated segment configuration can be viewed in the gp_segment_configuration system catalog table, as shown in the example below. Newly created segments are empty, while all data remains in the existing segments. To balance the cluster, redistribute table data across all segments of the expanded cluster.
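
    For example, you can list the expanded segment configuration from the master host; the newly added segments appear with the content IDs and data directories defined in the input file:

    $ psql postgres -c "SELECT dbid, content, preferred_role, port, hostname, datadir FROM gp_segment_configuration ORDER BY content, dbid;"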

Rollback

If segment creation fails, restore the original cluster configuration using gpexpand with the -r/--rollback option:

$ gpexpand -r

Redistribute tables

After adding new segments, all data remains on the existing segments. This leads to an unbalanced cluster state where queries are processed mostly by the old segments. In addition, the distribution policy of all distributed tables changes to DISTRIBUTED RANDOMLY, potentially decreasing query performance. To balance the cluster, redistribute table data in accordance with the new topology. Once a table is redistributed, it returns to its original distribution policy, and optimal query performance is restored.

Greengage DB provides a built-in redistribution mechanism that ensures optimal data distribution with minimal service disruption. Redistribution can only be performed in production mode. On large clusters, this process may take significant time and consume considerable resources. To reduce the performance impact, Greengage DB allows you to pause and resume redistribution, splitting this long process into sessions of limited duration. Redistribution progress is saved in the gpexpand schema tables.

TIP

Schedule redistribution sessions during off-peak hours to minimize performance degradation.
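
For example, a crontab entry for the gpadmin user on the master host might start a two-hour redistribution session every night at 01:00. This is a sketch: it assumes the login shell sources the Greengage DB environment, and the log path is illustrative:

# m h dom mon dow   command
0 1 * * * bash -lc 'gpexpand -d 02:00:00' >> /home/gpadmin/gpexpand_cron.log 2>&1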

New tables created during this phase are automatically distributed across all segments and do not require redistribution; they are available for use as soon as they are created. However, each existing table is temporarily locked for read and write operations while it is being redistributed.

Rank tables for redistribution

To optimize query performance during redistribution, prioritize large and frequently queried tables. This ensures that these tables become available earlier for normal operations.

The redistribution order is determined by the rank column in the gpexpand.status_detail table. Lower values mean higher priority in redistribution. To define the table redistribution order:

  1. Connect to the postgres database on the master instance:

    $ psql postgres
  2. Set a large default rank value for all tables, for example, 100:

    UPDATE gpexpand.status_detail SET rank = 100;
  3. Decrease the rank for tables that should be redistributed first:

    UPDATE gpexpand.status_detail SET rank = 10 WHERE fq_name = 'public.table3';
    UPDATE gpexpand.status_detail SET rank = 20 WHERE fq_name = 'public.table2';
    UPDATE gpexpand.status_detail SET rank = 30 WHERE fq_name = 'public.table4';

    Specify the database name if schema and table names repeat in multiple databases:

    UPDATE gpexpand.status_detail SET rank = 50
                                  WHERE fq_name = 'public.table4' AND dbname = 'test_db';
    TIP

    Using spaced-out rank values (10, 20, 30) improves flexibility. In this case, you can change the redistribution order later by updating a single row:

    UPDATE gpexpand.status_detail SET rank = 15 WHERE fq_name = 'public.table4';

To exclude a table from the redistribution, remove it from the gpexpand.status_detail table:

DELETE FROM gpexpand.status_detail WHERE fq_name = 'public.table5';

Run redistribution sessions

To run a redistribution session, run gpexpand with one of two options:

  • -d <HH:MM:SS> — run a session of a specific duration.

  • -e <YYYY-MM-DD HH:MM:SS> — run redistribution until a specified point in time.

Examples:

  • Run a five-minute redistribution session:

    $ gpexpand -d 00:05:00
  • Run redistribution until 9 AM on March 13, 2025:

    $ gpexpand -e '2025-03-13 09:00:00'

Greengage DB can redistribute multiple tables in parallel. To run a parallel redistribution session, specify the number of tables to process simultaneously as the -n option value:

$ gpexpand -n 16
IMPORTANT

During redistribution, two database connections are created for each table: one alters the table, and another updates the progress in the gpexpand schema. When setting the number of parallel table redistributions, ensure that the cluster’s maximum connections limit (the max_connections configuration parameter) is not exceeded. Increase the limit if necessary.
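
You can check the current limit from the master host before starting a parallel session:

$ psql postgres -c "SHOW max_connections;"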

Monitor expansion state

This section describes how to track the progress of cluster expansion.

gpstate utility

When called with the -x option, gpstate shows the number of tables that still need redistribution:

$ gpstate -x

Its output includes lines like these:

[INFO]:-Cluster Expansion State = Data Distribution - Paused
[INFO]:-----------------------------------------------------
[INFO]:-Number of tables to be redistributed
[INFO]:-     Database   Count of Tables to redistribute
[INFO]:-     postgres   3
[INFO]:-----------------------------------------------------

gpexpand.expansion_progress table

The gpexpand.expansion_progress table provides current expansion statistics, including the number of completed and pending tables, data transferred, and estimated completion time.

SELECT * FROM gpexpand.expansion_progress;

Query result:

             name             |         value
------------------------------+-----------------------
 Tables Expanded              | 1
 Estimated Expansion Rate     | 27.0714856745503 MB/s
 Bytes Left                   | 211241312
 Estimated Time to Completion | 00:00:07.441609
 Tables Left                  | 3
 Bytes Done                   | 264568832
(6 rows)

gpexpand.status_detail table

The gpexpand.status_detail table stores detailed redistribution progress for each table.

SELECT dbname, fq_name, rank, status, expansion_started, expansion_finished
FROM gpexpand.status_detail;

Query result:

  dbname  |    fq_name    | rank |   status    |     expansion_started      |     expansion_finished
----------+---------------+------+-------------+----------------------------+----------------------------
 postgres | public.table1 |  100 | NOT STARTED |                            |
 postgres | public.table4 |   30 | NOT STARTED |                            |
 postgres | public.table2 |   20 | NOT STARTED |                            |
 postgres | public.table3 |   10 | COMPLETED   | 2025-03-13 10:15:41.775582 | 2025-03-13 10:15:50.095813
(4 rows)

Post-expansion actions

When redistribution of all tables is completed, the gpexpand output includes the following line:

[INFO]:-EXPANSION COMPLETED SUCCESSFULLY

Additionally, the gpexpand.status table contains a row with the EXPANSION COMPLETE status, which indicates completion:

SELECT * FROM gpexpand.status WHERE status = 'EXPANSION COMPLETE';

Query result:

       status       |          updated
--------------------+----------------------------
 EXPANSION COMPLETE | 2025-03-13 10:29:18.385153
(1 row)

The following subsections describe steps that you may need to perform after the expansion is completed.

Remove the expansion schema

The gpexpand schema is no longer needed after redistribution is complete. Remove it with the following call on the master host:

$ gpexpand -c
NOTE

A new expansion cannot be started while the gpexpand schema exists in the cluster.

Restore a backup of the original cluster

If you need to restore data from a backup created before the expansion, use gprestore with the --resize-cluster option:

$ gprestore --resize-cluster --backup-dir /data1/backup --timestamp 20250302111439

Set up PXF

If your cluster uses PXF, configure it on all hosts of the expanded cluster:

  1. Install PXF on new hosts.

  2. Synchronize PXF configurations across all cluster nodes:

    $ pxf cluster register
    $ pxf cluster sync
  3. Restart PXF:

    $ pxf cluster restart