Installation

Anton Monakov

This topic describes how to build the Greenplum Platform Extension Framework (PXF) from the source code on Linux operating systems for use in a Greengage DB cluster. The PXF source code is available in the GitHub repository.

By default, PXF is deployed in a co-located topology. In this topology, PXF is installed on each Greengage DB host, and the PXF service starts and runs on each segment host. The PXF services are managed collectively with the pxf cluster commands.

In an alternative topology, PXF is installed on all Greengage DB hosts and also on hosts outside the Greengage DB cluster. In this case, each PXF service is managed individually with the pxf command on its own host, and the pxf cluster commands cannot manage these services collectively. If you choose this topology, you must explicitly configure each Greengage DB host to identify the host and listen address on which the PXF service is running.

To learn more about managing a PXF cluster, see Manage PXF. For the full description of the pxf and pxf cluster commands, see Overview of the pxf commands and Overview of the pxf cluster commands.

Prerequisites

To follow the installation steps, make sure that you have the following setup:

  • One of the following operating systems:

    • Ubuntu 22.04

    • CentOS 7.9

  • The required software is installed on the chosen operating system:

    • git

    • GCC compiler

    • make utility

    • cURL 7.29 or later

    • unzip

  • A running Greengage DB cluster. See Build Greengage DB from the source code and Initialize DBMS for details.

  • Access to all cluster hosts (master, standby master, and segment hosts). The operating system user that will own the PXF installation directory must also own the Greengage DB installation directory or have write privileges to it.

  • Optionally, a working directory for storing the cloned PXF source code and the required prerequisite software. Throughout this guide, ~/workspace is used as an example of such a directory.
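Before you start, you can check the tool requirements listed above on each host. The following is a minimal sketch, assuming bash and GNU sort (for sort -V). The function names version_ge, missing_commands, and check_prerequisites are hypothetical helpers, not part of PXF or Greengage DB. The script checks only that the tools are on PATH and that cURL is 7.29 or later. It does not verify the operating system version.

```shell
#!/usr/bin/env bash
# Hypothetical prerequisite checks; not part of PXF or Greengage DB.

# Succeeds if version $1 is greater than or equal to version $2.
# Relies on GNU sort -V (version sort): the smaller version sorts first.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

# Prints the name of every argument that is not an executable found in PATH.
missing_commands() {
  local cmd
  for cmd in "$@"; do
    command -v "$cmd" >/dev/null 2>&1 || echo "$cmd"
  done
}

check_prerequisites() {
  local missing curl_version
  missing=$(missing_commands git gcc make curl unzip)
  if [ -n "$missing" ]; then
    # $missing is left unquoted on purpose to print the names on one line.
    echo "Missing tools:" $missing
    return 1
  fi
  # The first line of `curl --version` looks like "curl 7.81.0 (...) ...".
  curl_version=$(curl --version | head -n1 | awk '{print $2}')
  if ! version_ge "$curl_version" 7.29; then
    echo "cURL $curl_version is older than 7.29"
    return 1
  fi
  echo "All prerequisites found"
}
```

Source the script and run check_prerequisites on a host. A non-zero exit status means that something from the list above is missing.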

Install dependencies

Before building PXF from the source code, you need to install its software dependencies: Java and Go.

Create the host file

The hostfile_secondary_hosts file lists the host names or IP addresses of all secondary cluster hosts (that is, all hosts except the master host). This lets you use the gpscp and gpssh Greengage DB utilities to propagate changes from the master host to the remaining cluster hosts.

The file is created similarly to the host file created during Greengage DBMS initialization. The commands in this guide also use ~/hostfile_all_hosts, which lists all cluster hosts, including the master host.

  1. Log in to the master host as gpadmin and go to the home directory.

  2. Create the hostfile_secondary_hosts file:

    $ vi hostfile_secondary_hosts
  3. Add all the secondary host names to the file. For example, for a cluster comprising four hosts:

    • mdw — the master host;

    • smdw — the standby master host;

    • sdw1 and sdw2 — the segment hosts;

    add the following lines to the file:

    smdw
    sdw1
    sdw2

    Make sure there are no blank lines or extra spaces.

  4. Save and close the file.
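You can also write and check the host file without an editor. This sketch assumes bash and the host names from the example above. The helpers write_hostfile and validate_hostfile are hypothetical. validate_hostfile reports blank lines and leading or trailing whitespace, including the carriage returns left by Windows-style line endings.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for creating and checking a gpssh/gpscp host file.

# Writes each remaining argument as one line of the file named by $1.
write_hostfile() {
  local file=$1
  shift
  printf '%s\n' "$@" > "$file"
}

# Fails (and prints the offending lines with their numbers) if the file
# contains blank lines or lines with leading or trailing whitespace.
validate_hostfile() {
  local file=$1
  if grep -nE '^[[:space:]]*$|^[[:space:]]|[[:space:]]$' "$file"; then
    echo "$file contains blank lines or extra spaces" >&2
    return 1
  fi
}

# Example for the four-host cluster described above:
# write_hostfile ~/hostfile_secondary_hosts smdw sdw1 sdw2
# validate_hostfile ~/hostfile_secondary_hosts
```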

Install Java

  1. On the Greengage DB master host, download the required JDK 17 package from jdk.java.net to ~/workspace.

    $ cd ~/workspace
    $ curl -O https://download.java.net/java/GA/jdk17.0.1/<openjdk-17.tar.gz>
  2. Use the gpscp and the gpssh utilities to copy the downloaded package to secondary cluster hosts and then unpack it on all hosts:

    $ gpssh -f ~/hostfile_secondary_hosts "mkdir -p /home/gpadmin/workspace"
    $ gpscp -f ~/hostfile_secondary_hosts /home/gpadmin/workspace/<openjdk-17.tar.gz> =:/home/gpadmin/workspace/
    $ gpssh -f ~/hostfile_all_hosts 'tar -C /home/gpadmin/workspace -zxvf /home/gpadmin/workspace/<openjdk-17.tar.gz>'
  3. Set the JAVA_HOME environment variable to point at the location of the unpacked Java package, add $JAVA_HOME/bin to PATH, and save both settings in the user’s shell startup file (for example, .bashrc). Use the gpssh utility to perform this on all cluster hosts:

    $ gpssh -f ~/hostfile_all_hosts "echo 'export JAVA_HOME=/home/gpadmin/workspace/<PATH_TO_JAVA_HOME>' >> ~/.bashrc"
    $ gpssh -f ~/hostfile_all_hosts "echo 'export PATH=\"\$JAVA_HOME/bin:\$PATH\"' >> ~/.bashrc"
  4. After editing the profile file, source it in the current shell session on the master host to apply the changes there:

    $ source ~/.bashrc

    New shell sessions on the other hosts pick up the updated .bashrc.
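In step 3, each gpssh command is wrapped in double quotes, so the local shell processes it before gpssh sends it to the hosts. Here, \" becomes " and \$ becomes a literal $. This keeps $JAVA_HOME and $PATH from being expanded on the master host. The sketch below only illustrates this shell quoting. It prints the command text that reaches the remote shell and does not run gpssh.

```shell
#!/usr/bin/env bash
# Build the same double-quoted argument that step 3 passes to gpssh,
# then print it to see what the remote shell receives.
remote_cmd="echo 'export PATH=\"\$JAVA_HOME/bin:\$PATH\"' >> ~/.bashrc"
printf '%s\n' "$remote_cmd"
# Prints: echo 'export PATH="$JAVA_HOME/bin:$PATH"' >> ~/.bashrc
# The remote shell then appends this line to ~/.bashrc verbatim:
#   export PATH="$JAVA_HOME/bin:$PATH"
# so the variables are expanded only when ~/.bashrc is sourced.
```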

Install Go

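  1. On the Greengage DB master host, download the required Go package from go.dev to ~/workspace and unpack it. The -L option makes curl follow the redirect from go.dev to the actual download location:

    $ cd ~/workspace
    $ curl -LO https://go.dev/dl/<go.tar.gz>
    $ tar zxvf <go.tar.gz>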
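  2. Set the GOPATH (the Go workspace directory) and GOPROXY environment variables. Then add both the bin directory of the unpacked Go package and $GOPATH/bin to PATH. These settings apply only to the current shell session:

    $ export GOPATH=<PATH_TO_GO_HOME>
    $ export GOPROXY=https://proxy.golang.org
    $ export PATH=$PATH:$HOME/workspace/go/bin:$GOPATH/bin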
  3. Install the required Ginkgo package:

    $ go install github.com/onsi/ginkgo/v2/ginkgo@latest
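The Go variables from step 2 exist only in the current shell. Before building, it is worth confirming in that same shell that ginkgo is reachable, because go install places it in $GOPATH/bin. This is a minimal sketch, and path_contains is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if directory $1 is an entry of the
# colon-separated list $2 (defaults to the current PATH).
path_contains() {
  # Surround both sides with colons so only whole entries match.
  case ":${2:-$PATH}:" in
    *":$1:"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example checks after completing the steps above (run in the same shell):
# path_contains "$GOPATH/bin" || echo "GOPATH/bin is not in PATH"
# command -v ginkgo >/dev/null || echo "ginkgo was not found in PATH"
```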

Build PXF from the source code

  1. Clone the PXF repository to ~/workspace:

    $ cd ~/workspace
    $ git clone https://github.com/greengagedb/pxf.git
  2. Optionally, check out a tag to build and install a specific version:

    $ cd ~/workspace/pxf
    $ git checkout <tag_name>

    where <tag_name> is the tag of the PXF version to build.

  3. Create the PXF installation directory (/usr/local/pxf) and change its ownership to gpadmin. Use the gpssh utility to perform this on all cluster hosts:

    $ gpssh -f ~/hostfile_all_hosts "sudo mkdir -p /usr/local/pxf"
    $ gpssh -f ~/hostfile_all_hosts "sudo chown -R gpadmin:gpadmin /usr/local/pxf"
  4. Set the PXF_HOME (denoting the PXF installation directory) and PXF_BASE (denoting the PXF runtime configuration directory) environment variables by adding the corresponding commands to .bashrc. Use the gpssh utility to perform this on all cluster hosts:

    $ gpssh -f ~/hostfile_all_hosts "echo 'export PXF_HOME=/usr/local/pxf' >> ~/.bashrc"
    $ gpssh -f ~/hostfile_all_hosts "echo 'export PXF_BASE=\"${HOME}/pxf-base\"' >> ~/.bashrc"
    NOTE

    If PXF_BASE is not set, it defaults to the PXF_HOME value. In this case, server configurations, libraries, and other configuration files might be deleted when PXF is reinstalled. See Installation directory for more information on the installation and configuration directories used by PXF.

  5. After editing the profile file, source it in the current shell session on the master host to apply the changes there:

    $ source ~/.bashrc

    New shell sessions on the other hosts pick up the updated .bashrc.
  6. Run make install to compile and install PXF:

    $ cd ~/workspace/pxf
    $ make install

    The output should look similar to the following:

    ...
    BUILD SUCCESSFUL in 4m 26s
    24 actionable tasks: 24 executed
    install -m 744 -d "build/stage/lib"
    install -m 744 -d "build/stage/lib/native"
    install -m 744 -d "build/stage/servers"
    install -m 744 -d "build/stage/servers/default"
    install -m 700 -d "build/stage/logs"
    install -m 700 -d "build/stage/run"
    install -m 700 -d "build/stage/keytabs"
    make[1]: Leaving directory '/home/gpadmin/workspace/pxf/server'
    ===> PXF compilation is complete <===
  7. Use the gpscp utility to propagate the PXF installation to secondary hosts:

    $ gpscp -f ~/hostfile_secondary_hosts -r /usr/local/pxf =:/usr/local/
  8. Add the PXF command-line executable location to PATH. Use the gpssh utility to perform this on all cluster hosts:

    $ gpssh -f ~/hostfile_all_hosts "echo 'export PATH=/usr/local/pxf/bin:\$PATH' >> ~/.bashrc"
  9. After editing the profile file, source it in the current shell session on the master host to apply the changes there:

    $ source ~/.bashrc

    New shell sessions on the other hosts pick up the updated .bashrc.
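Steps 4 and 8 append lines to .bashrc with echo ... >>, so running them a second time adds duplicate lines. The hypothetical append_once helper below shows one way to make such an edit idempotent on a single host. It is a sketch, not part of the PXF tooling.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append a line to a file only if the file does not
# already contain exactly that line, so rerunning a step adds no duplicates.
append_once() {
  local line=$1 file=$2
  touch "$file"
  # -x matches whole lines, -F treats the pattern as a fixed string, -q stays quiet.
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Example (single quotes keep $PATH literal until .bashrc is sourced):
# append_once 'export PXF_HOME=/usr/local/pxf' ~/.bashrc
# append_once 'export PATH=/usr/local/pxf/bin:$PATH' ~/.bashrc
```

The same grep -qxF ... || echo ... >> pattern can be placed inside a gpssh command string to apply it on all hosts, keeping the same \$ escaping that the commands above use.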

Installation directory

The PXF installation directory (specified by PXF_HOME) includes both the PXF executables and the runtime configuration files. It contains the following items:

  • application: the location of the PXF Server application JAR file.

  • bin: the location of the PXF command-line executables.

  • conf: the location of user-customizable configuration files for PXF runtime and logging settings (pxf-application.properties, pxf-env.sh, pxf-log4j2.xml, and pxf-profiles.xml).

  • keytabs: the default location of the Kerberos principal keytab file for secure authentication of the PXF service. The keytabs directory and the files it contains are readable only by the Greengage DB installation user, typically gpadmin.

  • lib: the location of user-added runtime dependencies. The native subdirectory is the default PXF runtime directory for native libraries.

  • logs: the PXF runtime log file directory. The logs directory and log files are readable only by the Greengage DB installation user, typically gpadmin.

  • run: the default PXF run directory. After PXF starts, this directory contains the PXF process ID file, pxf-app.pid. The run directory and its contents are readable only by the Greengage DB installation user, typically gpadmin.

  • servers: the configuration directory for PXF servers. Each subdirectory contains a server definition, and the subdirectory name is the server name. The default server is named default.

  • share: the directory for shared PXF files that may be required depending on the external data stores that you access. Initially, it contains only the PXF HBase JAR file.

  • templates: the directory for PXF server configuration file templates.

  • version: the file containing the installed PXF version.
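To confirm that a host received the expected layout, for example after the gpscp step, you can compare PXF_HOME against the items described above. This is a minimal sketch. check_pxf_layout is a hypothetical helper, and its item list is taken from this page, so it may need adjusting for other PXF versions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report which expected items are missing from a PXF
# directory (defaults to $PXF_HOME). Returns non-zero if anything is missing.
check_pxf_layout() {
  local dir=${1:-$PXF_HOME} item missing=0
  for item in application bin conf keytabs lib logs run servers share templates; do
    if [ ! -d "$dir/$item" ]; then
      echo "missing directory: $item"
      missing=1
    fi
  done
  if [ ! -f "$dir/version" ]; then
    echo "missing file: version"
    missing=1
  fi
  return "$missing"
}

# Example: check_pxf_layout /usr/local/pxf
```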

Start PXF

PXF is not active after installation. Before using the framework, you must explicitly initialize and start the PXF service.

  1. Prepare the PXF runtime configuration directory that you specified in the PXF_BASE environment variable during installation:

    $ pxf cluster prepare
  2. Start the PXF service on all Greengage DB cluster hosts:

    $ pxf cluster start

    The output should look similar to the following:

    Starting PXF on coordinator host, standby coordinator host, and 2 segment hosts...
    PXF started successfully on 4 out of 4 hosts
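To check the result of pxf cluster start in a script, you can parse its summary line. This sketch assumes the summary uses the format shown above ("PXF started successfully on N out of M hosts"). all_hosts_started is a hypothetical helper. Any other output, such as a failure message, simply makes the check fail.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `pxf cluster start` output on stdin and succeeds
# only if the summary line reports that PXF started on every host.
all_hosts_started() {
  awk '
    /started successfully on [0-9]+ out of [0-9]+ hosts/ {
      found = 1
      # In "... on N out of M hosts", N is the field before "out"
      # and M is two fields after it.
      for (i = 1; i <= NF; i++) if ($i == "out") { ok = ($(i - 1) == $(i + 2)) }
    }
    END { exit !(found && ok) }
  '
}
```

For example, save the output with pxf cluster start > pxf-start.log 2>&1 and then run all_hosts_started < pxf-start.log.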

Check PXF

To check the PXF version, run the following command on each Greengage DB host:

$ pxf version

The output should look similar to the following:

PXF version 6.13.0-SNAPSHOT
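Instead of comparing the pxf version output host by host, you can collect it and check that all hosts report the same version. In this sketch, same_pxf_version is a hypothetical helper. It accepts lines with an optional prefix, such as a host name. This reflects an assumption about how the output is collected.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads lines such as "PXF version 6.13.0-SNAPSHOT"
# (optionally prefixed, for example with a host name), prints the distinct
# versions found, and succeeds only if exactly one version was found.
same_pxf_version() {
  local versions
  versions=$(sed -n 's/.*PXF version \([^[:space:]]*\).*/\1/p' | sort -u)
  printf '%s\n' "$versions"
  [ -n "$versions" ] && [ "$(printf '%s\n' "$versions" | wc -l)" -eq 1 ]
}
```

If pxf is on PATH in gpssh sessions, the output of gpssh -f ~/hostfile_all_hosts 'pxf version' could be piped into same_pxf_version.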

Stop PXF

To stop the PXF service on all Greengage DB cluster hosts, run the following command:

$ pxf cluster stop