Github

Initialize DBMS

Andrey Aksenov

This topic describes how to initialize Greengage DBMS (based on Greenplum) after installation. The initialization process may slightly differ depending on the number of hosts and their hardware configurations, for example:

  • The number of CPUs per segment host affects the recommended number of segments running on this host.

  • If a cluster host has several unbonded network interfaces, this affects how the host files look.

  • If a cluster host has several logical disks for storing user data, this affects how many data storage areas should be created.

This topic assumes that a Greengage DB cluster is deployed on four hosts described below.

Prerequisites

Hardware

To demonstrate how to initialize Greengage DBMS, four hosts with the following names are used:

  • mdw — the master host.

  • smdw — the standby master host.

  • sdw1 and sdw2 — the segment hosts.

Each host in this setup includes four CPU cores, one network interface, and one logical disk. Given the number of CPUs, each segment host has two primary and two mirror segments for better performance.

Software

Before initializing Greengage DBMS, you need to prepare all the hosts that constitute the cluster:

  1. Install all the required software dependencies listed in this topic: Software requirements for Greengage DB installation.

  2. Install Greengage DB on all the hosts. For example, to install Greengage DB built from the source code, follow the steps from the topic: Build Greengage DB from the source code. This topic describes how to:

    • configure an operating system;

    • create the Greengage DB administrative user;

    • build Greengage DB from the source code;

    • install Greengage DB.

      NOTE

      To initialize DBMS, you do not need to create a demo cluster.

The /etc/hosts file

Update the /etc/hosts file on every cluster host. This file should include all the host names and the interface address names for every host in the cluster. For the cluster used in this topic, /etc/hosts might look as follows:

# ...
192.168.1.10 mdw
192.168.1.20 smdw
192.168.1.30 sdw1
192.168.1.40 sdw2

Create the host files

Various Greengage DB utilities use a host file to perform cluster-wide operations. This file lists the host names or IP addresses of cluster hosts. The example use cases of host files are:

  • executing commands on multiple hosts in parallel;

  • enabling passwordless SSH from every host to every other host;

  • initializing Greengage DBMS.

You can check the /etc/hosts file for the correct host names to use.

Create the host file with all hosts

  1. Log in to the master host (mdw) as gpadmin and go to the home directory.

  2. Create the hostfile_all_hosts file:

    $ vi hostfile_all_hosts
  3. Add all the host names to this file:

    mdw
    smdw
    sdw1
    sdw2

    Make sure there are no blank lines or extra spaces.

  4. Save and close the file.

Create the host file with segment hosts

  1. Create the hostfile_segment_hosts file:

    $ vi hostfile_segment_hosts
  2. Add the host names of segment hosts to this file:

    sdw1
    sdw2
  3. Save and close the file.

Enable passwordless SSH

The gpadmin user should be able to SSH from any host to any other host without entering a password or passphrase. To achieve this, you need to enable passwordless SSH, as described below.

Enable 1-n passwordless SSH

To enable passwordless SSH from the master host to every other host in the cluster, use the ssh-copy-id utility:

  1. Execute the ssh-copy-id command to add the gpadmin user’s public key to the authorized_keys SSH file on the smdw host:

    $ ssh-copy-id smdw
  2. (Optional) Type yes and press Enter if the following SSH warning is shown:

    This key is not known by any other names
    Are you sure you want to continue connecting (yes/no/[fingerprint])?
  3. Enter the password of the gpadmin user for the smdw host and press Enter when the following prompt is shown:

    INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
    gpadmin@smdw's password:

    The output should look as follows:

    Number of key(s) added: 1
    
    Now try logging into the machine, with:   "ssh 'smdw'"
    and check to make sure that only the key(s) you wanted were added.
  4. Repeat the steps above for the segment hosts:

    $ ssh-copy-id sdw1
    $ ssh-copy-id sdw2

Enable n-n passwordless SSH

To enable passwordless SSH from every host to every other host, use the gpssh-exkeys utility. Pass the name of the host file containing all host names to the gpssh-exkeys command:

$ gpssh-exkeys -f hostfile_all_hosts

The output should look like this:

[STEP 1 of 5] create local ID and authorize on local host
  ... /home/gpadmin/.ssh/id_rsa file exists ... key generation skipped

[STEP 2 of 5] keyscan all hosts and update known_hosts file

[STEP 3 of 5] retrieving credentials from remote hosts
  ... send to smdw
  ... send to sdw1
  ... send to sdw2

[STEP 4 of 5] determine common authentication file content

[STEP 5 of 5] copy authentication files to all remote hosts
  ... finished key exchange with smdw
  ... finished key exchange with sdw1
  ... finished key exchange with sdw2

[INFO] completed successfully

Create the data storage areas

A data storage area is a directory location where Greengage DB stores data for master and segment instances. Note that the data storage areas store different data for master and segment instances:

  • For the master host, the data storage area stores system data, such as the system catalog tables and metadata.

  • For segment hosts, the data storage areas store user data distributed across multiple primary segments. Mirror segments require separate storage areas.

Create the data directory on the master

  1. Create the /data1/master directory on the master host:

    $ sudo mkdir -p /data1/master
  2. Change the owner of the created directory to gpadmin:

    $ sudo chown gpadmin:gpadmin /data1/master

Create the data directory on the standby master

To create the data directory on the standby master, use the gpssh utility:

  1. Execute the following command on the master host:

    $ gpssh -h smdw -e 'sudo -n mkdir -p /data1/master'
  2. Change the owner of the created directory to gpadmin:

    $ gpssh -h smdw -e 'sudo chown gpadmin:gpadmin /data1/master'

Create the data directories on segment hosts

To create the data directories on all segment hosts at once, you need to pass the name of the host file containing segment host names to the gpssh command:

  1. Create the data directories for primary segments:

    $ gpssh -f hostfile_segment_hosts -e 'sudo mkdir -p /data1/primary'
  2. Create the data directories for mirror segments:

    $ gpssh -f hostfile_segment_hosts -e 'sudo mkdir -p /data1/mirror'
  3. Change the owner of the created directories to gpadmin:

    $ gpssh -f hostfile_segment_hosts -e 'sudo chown -R gpadmin /data1/*'

Initialize Greengage DBMS

To initialize Greengage DBMS, you need to:

  1. Specify the database configuration settings in a file.

  2. Run the Greengage DB initialization utility on the master host.

  3. Set Greengage DB-specific environment variables for the gpadmin user.

Create the database configuration file

The database configuration file tells the initialization utility how to configure Greengage DBMS. An example configuration file can be found in $GPHOME/docs/cli_help/gpconfigs/gpinitsystem_config.

  1. Create the init_config file in the home gpadmin directory:

    $ vi init_config
  2. Specify the following settings:

    ARRAY_NAME="Greengage DB cluster"
    SEG_PREFIX=gpseg
    PORT_BASE=10000
    declare -a DATA_DIRECTORY=(/data1/primary /data1/primary )
    MASTER_HOSTNAME=mdw
    MASTER_DIRECTORY=/data1/master
    MASTER_PORT=5432
    TRUSTED_SHELL=ssh
    CHECK_POINT_SEGMENTS=8
    ENCODING=UNICODE
    MIRROR_PORT_BASE=10500
    declare -a MIRROR_DATA_DIRECTORY=(/data1/mirror /data1/mirror )

    Some of the key settings in this file are:

    • SEG_PREFIX — the prefix used to name the data directories on the master and segment instances.

    • PORT_BASE — the base number by which primary segment port numbers are calculated. For mirror segments, MIRROR_PORT_BASE is used.

    • DATA_DIRECTORY — the data storage locations where the initialization utility creates data directories for primary segments. For mirror segments, MIRROR_DATA_DIRECTORY is used.

    • MASTER_DIRECTORY — the data storage locations where the initialization utility creates the master data directory.

    • MASTER_PORT — the port number for the master instance used by users and client connections to access Greengage DBMS.

  3. Save and close the file.

Run the initialization utility

To initialize Greengage DBMS, use the gpinitsystem utility:

  1. Execute the gpinitsystem command with the following options:

    $ gpinitsystem -c init_config -h hostfile_segment_hosts -s smdw
    • The -c option is used to pass the database configuration file name.

    • The -h option specifies the name of the host file with segment hosts.

    • The -s option specifies the host name of the standby master host.

  2. The gpinitsystem utility verifies your setup information and checks if it can connect to each host and access the data directories specified in the configuration. The output should show the cluster configuration to apply. The configuration of segments for the example cluster looks as follows:

    [INFO]:----------------------------------------
    [INFO]:-Greengage Primary Segment Configuration
    [INFO]:----------------------------------------
    [INFO]:-sdw1        10000   sdw1   /data1/primary/gpseg0   2
    [INFO]:-sdw1        10001   sdw1   /data1/primary/gpseg1   3
    [INFO]:-sdw2        10000   sdw2   /data1/primary/gpseg2   4
    [INFO]:-sdw2        10001   sdw2   /data1/primary/gpseg3   5
    [INFO]:---------------------------------------
    [INFO]:-Greengage Mirror Segment Configuration
    [INFO]:---------------------------------------
    [INFO]:-sdw2        10500   sdw2   /data1/mirror/gpseg0    6
    [INFO]:-sdw2        10501   sdw2   /data1/mirror/gpseg1    7
    [INFO]:-sdw1        10500   sdw1   /data1/mirror/gpseg2    8
    [INFO]:-sdw1        10501   sdw1   /data1/mirror/gpseg3    9
  3. If all the pre-checks are successful, the utility prompts you to confirm the configuration:

    Continue with Greengage creation Yy|Nn (default=N):

    Enter y and press Enter to start the initialization process.

  4. At the end of a successful setup, gpinitsystem starts your Greengage DBMS and shows the following output:

    Greengage Database instance successfully created
  5. Set the MASTER_DATA_DIRECTORY environment variable:

    $ export MASTER_DATA_DIRECTORY=/data1/master/gpseg-1
  6. Get the information about the running cluster using the gpstate utility:

    $ gpstate

    The result should include the following lines:

    Master instance                               = Active
    Master standby                                = smdw
    Standby master state                          = Standby host passive
    Total segment instance count from metadata    = 8

Troubleshoot initialization issues

The initialization process fails if gpinitsystem encounters any errors while setting up an instance. This could result in a partially created system. You can see the cause of this failure by looking at the log files stored in the ~/gpAdminLogs directory.

Depending on the error encountered during the initialization, you may need to clean up and run gpinitsystem again. For example, if some segment instances are created and some fail, you may need to stop Postgres processes and remove any utility-created data directories from the data storage areas. Greengage DB creates a backout script to help with this cleanup.

For example, if the /data1 data directory is missing on the second segment host (sdw2), the initialization process fails with the following error:

[FATAL]:-Cannot write to /data1/primary on sdw2  Script Exiting!
[WARN]:-Script has left Greengage Database in an incomplete state
[WARN]:-Run command bash /home/gpadmin/gpAdminLogs/backout_gpinitsystem_gpadmin_20250128_144517 on master to remove these changes

To remove any utility-created data directories, Postgres processes, and log files, run the backout script:

$ bash ~/gpAdminLogs/backout_gpinitsystem_gpadmin_20250128_144517

The output should look as follows:

Stopping segment instance on sdw1
removing directory /data1/primary/gpseg0 on sdw1
Stopping segment instance on sdw1
removing directory /data1/primary/gpseg1 on sdw1

Set Greengage DB environment variables

After initializing Greengage DBMS, you need to set environment variables in the gpadmin user environment on the master and standby master hosts. You can do this by editing the gpadmin profile file, such as .bashrc:

  1. Open the .bashrc file:

    $ vi ~/.bashrc
  2. Add the following line to the file:

    source /usr/local/gpdb/greengage_path.sh

    The greengage_path.sh script sets or updates a set of environment variables, in particular:

    • GPHOME — specifies the directory where Greengage DB is installed.

    • PATH — updates the system path to include directories where Greengage DB utilities are stored.

    • LD_LIBRARY_PATH — adds the path to Greengage DB libraries to a list of directories where the dynamic linker should search for shared libraries.

  3. Specify the MASTER_DATA_DIRECTORY environment variable:

    export MASTER_DATA_DIRECTORY=/data1/master/gpseg-1
  4. (Optional) Set the PGPORT, PGUSER, and PGDATABASE environment variables to configure the connection settings for accessing a database:

    export PGPORT=5432
    export PGUSER=gpadmin
    export PGDATABASE=postgres

    For example, the psql utility considers these values when connecting to a database.

  5. (Optional) If you use RHEL or CentOS, add the following line to the end of the .bashrc file to enable using the ps command in the gpadmin environment:

    export LD_PRELOAD=/lib64/libz.so.1 ps
  6. Save and close the file.

  7. After editing the profile file, source it to apply the changes:

    $ source ~/.bashrc
  8. Copy the .bashrc file to the standby master using the gpscp utility:

    $ gpscp -h smdw .bashrc =: