Initialize DBMS
This topic describes how to initialize Greengage DBMS (based on Greenplum) after installation. The initialization process may slightly differ depending on the number of hosts and their hardware configurations, for example:
-
The number of CPUs per segment host affects the recommended number of segments running on this host.
-
If a cluster host has several unbonded network interfaces, this affects how the host files look.
-
If a cluster host has several logical disks for storing user data, this affects how many data storage areas should be created.
This topic assumes that a Greengage DB cluster is deployed on four hosts described below.
Prerequisites
Hardware
To demonstrate how to initialize Greengage DBMS, four hosts with the following names are used:
-
mdw
— the master host. -
smdw
— the standby master host. -
sdw1
andsdw2
— the segment hosts.
Each host in this setup includes four CPU cores, one network interface, and one logical disk. Given the number of CPUs, each segment host has two primary and two mirror segments for better performance.
Software
Before initializing Greengage DBMS, you need to prepare all the hosts that constitute the cluster:
-
Install all the required software dependencies listed in this topic: Software requirements for Greengage DB installation.
-
Install Greengage DB on all the hosts. For example, to install Greengage DB built from the source code, follow the steps from the topic: Build Greengage DB from the source code. This topic describes how to:
-
configure an operating system;
-
create the Greengage DB administrative user;
-
build Greengage DB from the source code;
-
install Greengage DB.
NOTETo initialize DBMS, you do not need to create a demo cluster.
-
The /etc/hosts file
Update the /etc/hosts file on every cluster host. This file should include all the host names and the interface address names for every host in the cluster. For the cluster used in this topic, /etc/hosts might look as follows:
# ... 192.168.1.10 mdw 192.168.1.20 smdw 192.168.1.30 sdw1 192.168.1.40 sdw2
Create the host files
Various Greengage DB utilities use a host file to perform cluster-wide operations. This file lists the host names or IP addresses of cluster hosts. The example use cases of host files are:
-
executing commands on multiple hosts in parallel;
-
enabling passwordless SSH from every host to every other host;
-
initializing Greengage DBMS.
You can check the /etc/hosts file for the correct host names to use.
Create the host file with all hosts
-
Log in to the master host (
mdw
) asgpadmin
and go to the home directory. -
Create the hostfile_all_hosts file:
$ vi hostfile_all_hosts
-
Add all the host names to this file:
mdw smdw sdw1 sdw2
Make sure there are no blank lines or extra spaces.
-
Save and close the file.
Create the host file with segment hosts
-
Create the hostfile_segment_hosts file:
$ vi hostfile_segment_hosts
-
Add the host names of segment hosts to this file:
sdw1 sdw2
-
Save and close the file.
Enable passwordless SSH
The gpadmin
user should be able to SSH from any host to any other host without entering a password or passphrase.
To achieve this, you need to enable passwordless SSH, as described below.
Enable 1-n passwordless SSH
To enable passwordless SSH from the master host to every other host in the cluster, use the ssh-copy-id
utility:
-
Execute the
ssh-copy-id
command to add thegpadmin
user’s public key to the authorized_keys SSH file on thesmdw
host:$ ssh-copy-id smdw
-
(Optional) Type
yes
and pressEnter
if the following SSH warning is shown:This key is not known by any other names Are you sure you want to continue connecting (yes/no/[fingerprint])?
-
Enter the password of the
gpadmin
user for thesmdw
host and pressEnter
when the following prompt is shown:INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys gpadmin@smdw's password:
The output should look as follows:
Number of key(s) added: 1 Now try logging into the machine, with: "ssh 'smdw'" and check to make sure that only the key(s) you wanted were added.
-
Repeat the steps above for the segment hosts:
$ ssh-copy-id sdw1 $ ssh-copy-id sdw2
Enable n-n passwordless SSH
To enable passwordless SSH from every host to every other host, use the gpssh-exkeys
utility.
Pass the name of the host file containing all host names to the gpssh-exkeys
command:
$ gpssh-exkeys -f hostfile_all_hosts
The output should look like this:
[STEP 1 of 5] create local ID and authorize on local host ... /home/gpadmin/.ssh/id_rsa file exists ... key generation skipped [STEP 2 of 5] keyscan all hosts and update known_hosts file [STEP 3 of 5] retrieving credentials from remote hosts ... send to smdw ... send to sdw1 ... send to sdw2 [STEP 4 of 5] determine common authentication file content [STEP 5 of 5] copy authentication files to all remote hosts ... finished key exchange with smdw ... finished key exchange with sdw1 ... finished key exchange with sdw2 [INFO] completed successfully
Create the data storage areas
A data storage area is a directory location where Greengage DB stores data for master and segment instances. Note that the data storage areas store different data for master and segment instances:
-
For the master host, the data storage area stores system data, such as the system catalog tables and metadata.
-
For segment hosts, the data storage areas store user data distributed across multiple primary segments. Mirror segments require separate storage areas.
Create the data directory on the master
-
Create the /data1/master directory on the master host:
$ sudo mkdir -p /data1/master
-
Change the owner of the created directory to
gpadmin
:$ sudo chown gpadmin:gpadmin /data1/master
Create the data directory on the standby master
To create the data directory on the standby master, use the gpssh
utility:
-
Execute the following command on the master host:
$ gpssh -h smdw -e 'sudo -n mkdir -p /data1/master'
-
Change the owner of the created directory to
gpadmin
:$ gpssh -h smdw -e 'sudo chown gpadmin:gpadmin /data1/master'
Create the data directories on segment hosts
To create the data directories on all segment hosts at once, you need to pass the name of the host file containing segment host names to the gpssh
command:
-
Create the data directories for primary segments:
$ gpssh -f hostfile_segment_hosts -e 'sudo mkdir -p /data1/primary'
-
Create the data directories for mirror segments:
$ gpssh -f hostfile_segment_hosts -e 'sudo mkdir -p /data1/mirror'
-
Change the owner of the created directories to
gpadmin
:$ gpssh -f hostfile_segment_hosts -e 'sudo chown -R gpadmin /data1/*'
Initialize Greengage DBMS
To initialize Greengage DBMS, you need to:
-
Specify the database configuration settings in a file.
-
Run the Greengage DB initialization utility on the master host.
-
Set Greengage DB-specific environment variables for the
gpadmin
user.
Create the database configuration file
The database configuration file tells the initialization utility how to configure Greengage DBMS. An example configuration file can be found in $GPHOME/docs/cli_help/gpconfigs/gpinitsystem_config.
-
Create the init_config file in the home
gpadmin
directory:$ vi init_config
-
Specify the following settings:
ARRAY_NAME="Greengage DB cluster" SEG_PREFIX=gpseg PORT_BASE=10000 declare -a DATA_DIRECTORY=(/data1/primary /data1/primary ) MASTER_HOSTNAME=mdw MASTER_DIRECTORY=/data1/master MASTER_PORT=5432 TRUSTED_SHELL=ssh CHECK_POINT_SEGMENTS=8 ENCODING=UNICODE MIRROR_PORT_BASE=10500 declare -a MIRROR_DATA_DIRECTORY=(/data1/mirror /data1/mirror )
Some of the key settings in this file are:
-
SEG_PREFIX
— the prefix used to name the data directories on the master and segment instances. -
PORT_BASE
— the base number by which primary segment port numbers are calculated. For mirror segments,MIRROR_PORT_BASE
is used. -
DATA_DIRECTORY
— the data storage locations where the initialization utility creates data directories for primary segments. For mirror segments,MIRROR_DATA_DIRECTORY
is used. -
MASTER_DIRECTORY
— the data storage locations where the initialization utility creates the master data directory. -
MASTER_PORT
— the port number for the master instance used by users and client connections to access Greengage DBMS.
-
-
Save and close the file.
Run the initialization utility
To initialize Greengage DBMS, use the gpinitsystem
utility:
-
Execute the
gpinitsystem
command with the following options:$ gpinitsystem -c init_config -h hostfile_segment_hosts -s smdw
-
The
-c
option is used to pass the database configuration file name. -
The
-h
option specifies the name of the host file with segment hosts. -
The
-s
option specifies the host name of the standby master host.
-
-
The
gpinitsystem
utility verifies your setup information and checks if it can connect to each host and access the data directories specified in the configuration. The output should show the cluster configuration to apply. The configuration of segments for the example cluster looks as follows:[INFO]:---------------------------------------- [INFO]:-Greengage Primary Segment Configuration [INFO]:---------------------------------------- [INFO]:-sdw1 10000 sdw1 /data1/primary/gpseg0 2 [INFO]:-sdw1 10001 sdw1 /data1/primary/gpseg1 3 [INFO]:-sdw2 10000 sdw2 /data1/primary/gpseg2 4 [INFO]:-sdw2 10001 sdw2 /data1/primary/gpseg3 5 [INFO]:--------------------------------------- [INFO]:-Greengage Mirror Segment Configuration [INFO]:--------------------------------------- [INFO]:-sdw2 10500 sdw2 /data1/mirror/gpseg0 6 [INFO]:-sdw2 10501 sdw2 /data1/mirror/gpseg1 7 [INFO]:-sdw1 10500 sdw1 /data1/mirror/gpseg2 8 [INFO]:-sdw1 10501 sdw1 /data1/mirror/gpseg3 9
-
If all the pre-checks are successful, the utility prompts you to confirm the configuration:
Continue with Greengage creation Yy|Nn (default=N):
Enter
y
and pressEnter
to start the initialization process. -
At the end of a successful setup,
gpinitsystem
starts your Greengage DBMS and shows the following output:Greengage Database instance successfully created
-
Set the
MASTER_DATA_DIRECTORY
environment variable:$ export MASTER_DATA_DIRECTORY=/data1/master/gpseg-1
-
Get the information about the running cluster using the
gpstate
utility:$ gpstate
The result should include the following lines:
Master instance = Active Master standby = smdw Standby master state = Standby host passive Total segment instance count from metadata = 8
Troubleshoot initialization issues
The initialization process fails if gpinitsystem
encounters any errors while setting up an instance.
This could result in a partially created system.
You can see the cause of this failure by looking at the log files stored in the ~/gpAdminLogs directory.
Depending on the error encountered during the initialization, you may need to clean up and run gpinitsystem
again.
For example, if some segment instances are created and some fail, you may need to stop Postgres processes and remove any utility-created data directories from the data storage areas.
Greengage DB creates a backout script to help with this cleanup.
For example, if the /data1 data directory is missing on the second segment host (sdw2
), the initialization process fails with the following error:
[FATAL]:-Cannot write to /data1/primary on sdw2 Script Exiting! [WARN]:-Script has left Greengage Database in an incomplete state [WARN]:-Run command bash /home/gpadmin/gpAdminLogs/backout_gpinitsystem_gpadmin_20250128_144517 on master to remove these changes
To remove any utility-created data directories, Postgres processes, and log files, run the backout script:
$ bash ~/gpAdminLogs/backout_gpinitsystem_gpadmin_20250128_144517
The output should look as follows:
Stopping segment instance on sdw1 removing directory /data1/primary/gpseg0 on sdw1 Stopping segment instance on sdw1 removing directory /data1/primary/gpseg1 on sdw1
Set Greengage DB environment variables
After initializing Greengage DBMS, you need to set environment variables in the gpadmin
user environment on the master and standby master hosts.
You can do this by editing the gpadmin
profile file, such as .bashrc:
-
Open the .bashrc file:
$ vi ~/.bashrc
-
Add the following line to the file:
source /usr/local/gpdb/greengage_path.sh
The greengage_path.sh script sets or updates a set of environment variables, in particular:
-
GPHOME
— specifies the directory where Greengage DB is installed. -
PATH
— updates the system path to include directories where Greengage DB utilities are stored. -
LD_LIBRARY_PATH
— adds the path to Greengage DB libraries to a list of directories where the dynamic linker should search for shared libraries.
-
-
Specify the
MASTER_DATA_DIRECTORY
environment variable:export MASTER_DATA_DIRECTORY=/data1/master/gpseg-1
-
(Optional) Set the
PGPORT
,PGUSER
, andPGDATABASE
environment variables to configure the connection settings for accessing a database:export PGPORT=5432 export PGUSER=gpadmin export PGDATABASE=postgres
For example, the
psql
utility considers these values when connecting to a database. -
(Optional) If you use RHEL or CentOS, add the following line to the end of the .bashrc file to enable using the
ps
command in thegpadmin
environment:export LD_PRELOAD=/lib64/libz.so.1 ps
-
Save and close the file.
-
After editing the profile file, source it to apply the changes:
$ source ~/.bashrc
-
Copy the .bashrc file to the standby master using the
gpscp
utility:$ gpscp -h smdw .bashrc =: