Installation
This topic describes how to build the Greenplum Platform Extension Framework (PXF) from source on Linux and install it in a Greengage DB cluster. The PXF source code is available in the greengagedb/pxf repository on GitHub.
By default, PXF is deployed in a co-located topology.
Under this scenario, PXF is installed on each Greengage DB host, and the PXF service starts and runs on each segment host.
PXF services are then managed collectively by using the pxf cluster commands.
In an alternative topology, PXF is installed both on hosts that are not part of the Greengage DB cluster and on all Greengage DB hosts.
In this case, you manage the PXF services individually by running the pxf command on each host; you cannot manage them collectively with the pxf cluster commands.
If you choose the alternative deployment topology, you must explicitly configure each Greengage DB host to identify the host and listen address on which the PXF service is running.
To learn more about managing a PXF cluster, see Manage PXF.
For the full description of the pxf and pxf cluster commands, see Overview of the pxf commands and Overview of the pxf cluster commands.
Prerequisites
To follow the installation steps, make sure that you have the following setup:
- One of the following operating systems:
  - Ubuntu 22.04
  - CentOS 7.9
- The following software installed on the chosen operating system:
  - git
  - GCC compiler
  - make
  - cURL 7.29 or later
  - unzip
- A running Greengage DB cluster. For details, see Build Greengage DB from the source code and Initialize DBMS.
- Access to all cluster hosts (master, standby master, and segment hosts). The operating system user that will own the PXF installation directory must also own the Greengage DB installation directory or have write privileges to it.
- Optionally, an arbitrary directory for storing the cloned pxf source code and the required prerequisite software. Throughout this guide, ~/workspace is used as an example of such a directory.
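The software checks above can be scripted. The following is a minimal sketch, assuming bash and GNU coreutils (for sort -V); the helper names version_ge and check_prereqs are hypothetical and not part of Greengage DB or PXF. It only checks that each tool is on PATH and that cURL meets the 7.29 minimum; it does not check the operating system version.

```shell
#!/usr/bin/env bash
# Sketch: check the build prerequisites listed above on the current host.
# Assumes bash and GNU coreutils (sort -V). Helper names are hypothetical.

# version_ge A B: succeed if version A is greater than or equal to version B.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

# check_prereqs: report each required tool; return the number of problems found.
check_prereqs() {
  local problems=0 tool curl_ver
  for tool in git gcc make curl unzip; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "OK:      $tool"
    else
      echo "MISSING: $tool"
      problems=$((problems + 1))
    fi
  done
  if command -v curl >/dev/null 2>&1; then
    # The first line of `curl --version` starts with "curl <version> ...".
    curl_ver=$(curl --version | head -n1 | awk '{print $2}')
    if ! version_ge "$curl_ver" 7.29; then
      echo "TOO OLD: cURL $curl_ver (7.29 or later is required)"
      problems=$((problems + 1))
    fi
  fi
  return "$problems"
}

check_prereqs || echo "Install or upgrade the tools reported above before continuing." >&2
```

Run the script on every cluster host, since the build prerequisites apply to the host where you build and the cluster-wide steps rely on the same tools being present.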
Install dependencies
Before building PXF from the source code, you need to install its software dependencies: Java and Go.
Create the host file
The hostfile_secondary_hosts file lists the host names or IP addresses of all secondary cluster hosts, that is, all hosts except the master host.
This lets you use the gpscp and gpssh Greengage DB utilities to propagate changes from the master host to the remaining cluster hosts.
The file is created in the same way as the host files created during Greengage DBMS initialization. The hostfile_all_hosts file used in later steps lists all cluster hosts, including the master host.
- Log in to the master host as gpadmin and go to the home directory.
- Create the hostfile_secondary_hosts file:
  $ vi hostfile_secondary_hosts
- Add the names of all secondary hosts to the file, one per line. For example, for a cluster of four hosts:
  - mdw: the master host;
  - smdw: the standby master host;
  - sdw1 and sdw2: the segment hosts;
  add the following lines to the file:
  smdw
  sdw1
  sdw2
  Make sure there are no blank lines or extra spaces.
- Save and close the file.
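As an alternative to editing the file in vi, the following sketch writes the host file non-interactively and checks it for the blank lines and extra spaces mentioned above. write_hostfile and validate_hostfile are hypothetical helper names, and smdw, sdw1, and sdw2 are the example cluster's host names.

```shell
#!/usr/bin/env bash
# Sketch: create a host file without an editor and check its format.
# write_hostfile and validate_hostfile are hypothetical helper names.

# write_hostfile FILE HOST...: write one host name per line, with no blank lines.
write_hostfile() {
  local file="$1"
  shift
  printf '%s\n' "$@" > "$file"
}

# validate_hostfile FILE: fail if FILE is missing or empty, has blank lines,
# or contains any whitespace (including trailing spaces or CR characters).
validate_hostfile() {
  if [ ! -s "$1" ]; then
    echo "Host file $1 is missing or empty" >&2
    return 1
  fi
  if grep -qE '^$|[[:space:]]' "$1"; then
    echo "Host file $1 has blank lines or extra spaces" >&2
    return 1
  fi
}
```

For example, on the master host:
$ write_hostfile ~/hostfile_secondary_hosts smdw sdw1 sdw2
$ validate_hostfile ~/hostfile_secondary_hosts
The same check can be run against ~/hostfile_all_hosts.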
Install Java
- On the Greengage DB master host, download the required JDK 17 package from jdk.java.net to ~/workspace:
  $ cd ~/workspace
  $ curl -O https://download.java.net/java/GA/jdk17.0.1/<openjdk-17.tar.gz>
- Use the gpssh and gpscp utilities to copy the downloaded package to the secondary cluster hosts, and then unpack it on all hosts:
  $ gpssh -f ~/hostfile_secondary_hosts "mkdir -p /home/gpadmin/workspace"
  $ gpscp -f ~/hostfile_secondary_hosts /home/gpadmin/workspace/<openjdk-17.tar.gz> =:/home/gpadmin/workspace/
  $ gpssh -f ~/hostfile_all_hosts 'tar -C /home/gpadmin/workspace -zxvf /home/gpadmin/workspace/<openjdk-17.tar.gz>'
- Set the JAVA_HOME environment variable to the location of the unpacked Java package and add $JAVA_HOME/bin to PATH by appending the corresponding commands to the user's shell startup file (for example, .bashrc). Use the gpssh utility to perform this on all cluster hosts:
  $ gpssh -f ~/hostfile_all_hosts "echo 'export JAVA_HOME=/home/gpadmin/workspace/<PATH_TO_JAVA_HOME>' >> ~/.bashrc"
  $ gpssh -f ~/hostfile_all_hosts "echo 'export PATH=\"\$JAVA_HOME/bin:\$PATH\"' >> ~/.bashrc"
- After editing the profile file, use gpssh to source it and apply the changes on all hosts:
  $ gpssh -f ~/hostfile_all_hosts "source ~/.bashrc"
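The <PATH_TO_JAVA_HOME> placeholder is the directory that the JDK archive unpacks into. The following is a minimal sketch for finding it, assuming the archive has a single top-level directory (typical for OpenJDK tarballs, but not guaranteed); tarball_top_dir and java_home_for are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Sketch: derive the JAVA_HOME value from the downloaded JDK archive.
# Assumes a single top-level directory in the archive. Helper names are hypothetical.

# tarball_top_dir ARCHIVE: print the first path component of the archive's first entry.
tarball_top_dir() {
  tar -tzf "$1" | head -n1 | sed 's|^\./||' | cut -d/ -f1
}

# java_home_for ARCHIVE DEST: print the JAVA_HOME path after unpacking ARCHIVE into DEST.
java_home_for() {
  local top
  top=$(tarball_top_dir "$1")
  [ -n "$top" ] || return 1
  # Strip a trailing slash from DEST so the result has no double slash.
  printf '%s/%s\n' "${2%/}" "$top"
}
```

For example, run the following on the master host and use the printed path in the export JAVA_HOME command:
$ java_home_for ~/workspace/<openjdk-17.tar.gz> /home/gpadmin/workspace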
Install Go
- On the Greengage DB master host, download the required Go package from go.dev to ~/workspace and unpack it:
  $ cd ~/workspace
  $ curl -LO https://go.dev/dl/<go.tar.gz>
  $ tar zxvf <go.tar.gz>
- Set the GOPATH and GOPROXY environment variables, and add the Go binaries to PATH. The Go archive unpacks into ~/workspace/go:
  $ export GOPATH=<PATH_TO_GO_HOME>
  $ export GOPROXY=https://proxy.golang.org
  $ export PATH=$PATH:$HOME/workspace/go/bin:$GOPATH/bin
  These export commands affect only the current shell session, so run the remaining build steps from the same session.
- Install the required Ginkgo package:
  $ go install github.com/onsi/ginkgo/v2/ginkgo@latest
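If you want the Go settings to survive new logins, one option is to append them to a startup file such as ~/.bashrc. The following sketch does this idempotently, so running it again does not duplicate lines; append_once and persist_go_env are hypothetical helper names, and you pass in your own GOPATH and the bin directory of the unpacked Go distribution.

```shell
#!/usr/bin/env bash
# Sketch: persist the Go environment settings in a shell startup file.
# append_once and persist_go_env are hypothetical helper names.

# append_once FILE LINE: append LINE to FILE unless FILE already has that exact line.
append_once() {
  grep -qxF -- "$2" "$1" 2>/dev/null || printf '%s\n' "$2" >> "$1"
}

# persist_go_env PROFILE GOPATH GO_BIN_DIR
persist_go_env() {
  local profile="$1" gopath="$2" go_bin="$3"
  append_once "$profile" "export GOPATH=$gopath"
  append_once "$profile" "export GOPROXY=https://proxy.golang.org"
  # \$PATH and \$GOPATH stay literal so they are expanded when the profile is sourced.
  append_once "$profile" "export PATH=\$PATH:$go_bin:\$GOPATH/bin"
}
```

For example, if you unpacked Go in ~/workspace:
$ persist_go_env ~/.bashrc <PATH_TO_GO_HOME> ~/workspace/go/bin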
Build PXF from the source code
- Clone the PXF repository to ~/workspace:
  $ cd ~/workspace
  $ git clone https://github.com/greengagedb/pxf.git
- Optionally, check out a tag to build and install a specific version:
  $ cd ~/workspace/pxf
  $ git checkout <tag_name>
  where <tag_name> matches the version number.
- Create the PXF installation directory (/usr/local/pxf) and change its ownership to gpadmin. Use the gpssh utility to perform this on all cluster hosts:
  $ gpssh -f ~/hostfile_all_hosts "sudo mkdir -p /usr/local/pxf"
  $ gpssh -f ~/hostfile_all_hosts "sudo chown -R gpadmin:gpadmin /usr/local/pxf"
- Set the PXF_HOME (PXF installation directory) and PXF_BASE (PXF runtime configuration directory) environment variables by adding the corresponding commands to .bashrc. Use the gpssh utility to perform this on all cluster hosts:
  $ gpssh -f ~/hostfile_all_hosts "echo 'export PXF_HOME=/usr/local/pxf' >> ~/.bashrc"
  $ gpssh -f ~/hostfile_all_hosts "echo 'export PXF_BASE=\"${HOME}/pxf-base\"' >> ~/.bashrc"
  NOTE
  If PXF_BASE is not set, PXF uses the PXF_HOME directory as the runtime configuration directory. In this case, server configurations, libraries, and other configuration files might be deleted when you reinstall PXF. See Installation directory for more information on the installation and configuration directories used by PXF.
- After editing the profile file, use gpssh to source it and apply the changes on all hosts:
  $ gpssh -f ~/hostfile_all_hosts "source ~/.bashrc"
- Run make install to compile and install PXF. Make sure that PXF_HOME is set in the shell where you run the command (for example, run source ~/.bashrc locally first), because gpssh does not change the environment of the shell it is started from:
  $ cd ~/workspace/pxf
  $ make install
  The output should look similar to the following:
  ...
  BUILD SUCCESSFUL in 4m 26s
  24 actionable tasks: 24 executed
  install -m 744 -d "build/stage/lib"
  install -m 744 -d "build/stage/lib/native"
  install -m 744 -d "build/stage/servers"
  install -m 744 -d "build/stage/servers/default"
  install -m 700 -d "build/stage/logs"
  install -m 700 -d "build/stage/run"
  install -m 700 -d "build/stage/keytabs"
  make[1]: Leaving directory '/home/gpadmin/workspace/pxf/server'
  ===> PXF compilation is complete <===
- Use the gpscp utility to copy the PXF installation to the secondary hosts:
  $ gpscp -f ~/hostfile_secondary_hosts -r /usr/local/pxf =:/usr/local/
- Add the location of the PXF command-line executable to PATH. Use the gpssh utility to perform this on all cluster hosts:
  $ gpssh -f ~/hostfile_all_hosts "echo 'export PATH=/usr/local/pxf/bin:\$PATH' >> ~/.bashrc"
- After editing the profile file, use gpssh to source it and apply the changes on all hosts:
  $ gpssh -f ~/hostfile_all_hosts "source ~/.bashrc"
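After the profile changes take effect, you can sanity-check the variables set in the steps above. The following sketch flags an unset PXF_HOME or PXF_BASE, a PXF_BASE equal to PXF_HOME (the situation the NOTE warns about), and a PATH that lacks the PXF bin directory. check_pxf_env is a hypothetical helper name, and the /usr/local/pxf fallback is the installation directory used in this guide.

```shell
#!/usr/bin/env bash
# Sketch: check the PXF environment variables on the current host.
# check_pxf_env is a hypothetical helper name.

check_pxf_env() {
  local rc=0 bin_dir
  if [ -z "${PXF_HOME:-}" ]; then
    echo "PXF_HOME is not set" >&2
    rc=1
  fi
  if [ -z "${PXF_BASE:-}" ]; then
    echo "PXF_BASE is not set; PXF would use PXF_HOME for runtime configuration" >&2
    rc=1
  elif [ "$PXF_BASE" = "${PXF_HOME:-}" ]; then
    echo "PXF_BASE equals PXF_HOME; configuration may be lost on reinstall" >&2
    rc=1
  fi
  # Fall back to the installation directory used in this guide.
  bin_dir="${PXF_HOME:-/usr/local/pxf}/bin"
  case ":$PATH:" in
    *":$bin_dir:"*) ;;
    *) echo "$bin_dir is not on PATH" >&2; rc=1 ;;
  esac
  return "$rc"
}
```

For example, in a new login shell on each host:
$ check_pxf_env && echo "PXF environment looks consistent"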
Installation directory
The PXF installation directory (specified by PXF_HOME) includes both the PXF executables and the runtime configuration files.
| Item | Description |
|---|---|
| application | The location of the PXF Server application JAR file |
| bin | The location of the PXF command-line executables |
| conf | The location of user-customizable PXF configuration files for PXF runtime and logging configuration settings: pxf-application.properties, pxf-env.sh, pxf-log4j2.xml, and pxf-profiles.xml |
| keytabs | The default location of the Kerberos principal keytab file for secure authentication of the PXF service. The keytabs directory and the files it contains are readable only by the Greengage DB installation user (gpadmin in this guide) |
| lib | The location of user-added runtime dependencies. The native subdirectory is the default PXF runtime directory for native libraries |
| logs | The PXF runtime log file directory. The logs directory and log files are readable only by the Greengage DB installation user (gpadmin in this guide) |
| run | The default PXF run directory. After PXF starts, this directory contains the PXF process ID file, pxf-app.pid. The run directory and its contents are readable only by the Greengage DB installation user (gpadmin in this guide) |
| servers | The configuration directory for PXF servers. Each subdirectory contains a server definition, and the name of the subdirectory is the name of the server. The default server is named default |
| share | The directory for shared PXF files that may be required depending on the external data stores that you access. Initially, the directory contains only the PXF HBase JAR file |
| templates | The directory for PXF server configuration file templates |
| version | The file containing the installed PXF version |
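To compare an installation against the table above, a small sketch like the following reports which listed items are missing. check_pxf_layout is a hypothetical helper name. Depending on the PXF version and whether PXF_BASE is set, some runtime directories (such as conf, logs, or servers) may be located under PXF_BASE instead, so treat its output as a hint rather than a verdict.

```shell
#!/usr/bin/env bash
# Sketch: list items from the installation directory table that are missing.
# check_pxf_layout is a hypothetical helper name; the item list mirrors the table.

# check_pxf_layout [DIR]: DIR defaults to $PXF_HOME, then /usr/local/pxf.
# Prints one "missing: <path>" line per absent item; returns the number missing.
check_pxf_layout() {
  local dir="${1:-${PXF_HOME:-/usr/local/pxf}}" item missing=0
  for item in application bin conf keytabs lib logs run servers share templates version; do
    if [ ! -e "$dir/$item" ]; then
      echo "missing: $dir/$item"
      missing=$((missing + 1))
    fi
  done
  return "$missing"
}
```

For example:
$ check_pxf_layout "$PXF_HOME"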
Start PXF
PXF is not active after installation. Before using the framework, you must explicitly prepare and start the PXF service.
- Prepare the new PXF runtime configuration directory specified by the PXF_BASE environment variable during installation:
  $ pxf cluster prepare
- Start the PXF service on all Greengage DB cluster hosts:
  $ pxf cluster start
  The output should look similar to the following:
  Starting PXF on coordinator host, standby coordinator host, and 2 segment hosts...
  PXF started successfully on 4 out of 4 hosts
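When scripting the start step, you may want to confirm that PXF started on every host. The following sketch parses the "N out of M hosts" summary shown in the sample output above; it assumes that wording, which may differ between PXF versions, and all_hosts_ok is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: succeed only if the last "N out of M hosts" summary has N equal to M.
# The summary wording is assumed from the sample output; all_hosts_ok is hypothetical.

# all_hosts_ok: reads pxf cluster output on stdin.
all_hosts_ok() {
  local line ok total
  line=$(grep -oE '[0-9]+ out of [0-9]+ hosts' | tail -n1)
  [ -n "$line" ] || return 1
  # Split "4 out of 4 hosts" into its fields: count, "out", "of", total, "hosts".
  read -r ok _ _ total _ <<< "$line"
  [ "$ok" -eq "$total" ]
}
```

For example:
$ pxf cluster start 2>&1 | tee pxf-start.log
$ all_hosts_ok < pxf-start.log && echo "PXF is running on all hosts"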
Check PXF
To check the PXF version, run the following command on each Greengage DB host:
$ pxf version
The output should look similar to the following:
PXF version 6.13.0-SNAPSHOT
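Instead of logging in to each host, you can collect the pxf version output from all hosts with gpssh and check that the versions match. The following sketch assumes only that each host's output contains the "PXF version <version>" text shown above, so any host-name prefixes added by gpssh are ignored; same_pxf_version is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: verify that all hosts report the same PXF version.
# same_pxf_version is a hypothetical helper name; it reads combined output on stdin.

same_pxf_version() {
  local versions
  # Extract the version token after "PXF version" and keep the unique values.
  versions=$(grep -oE 'PXF version [^[:space:]]+' | awk '{print $3}' | sort -u)
  if [ -z "$versions" ]; then
    echo "No PXF version found in the input" >&2
    return 1
  fi
  if [ "$(printf '%s\n' "$versions" | wc -l)" -ne 1 ]; then
    echo "Different PXF versions found:" >&2
    printf '%s\n' "$versions" >&2
    return 1
  fi
  echo "All hosts report PXF version $versions"
}
```

For example:
$ gpssh -f ~/hostfile_all_hosts 'source ~/.bashrc && pxf version' | same_pxf_version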
Stop PXF
To stop the PXF service on all Greengage DB cluster hosts, run the following command:
$ pxf cluster stop