
PL/Container

Andrey Aksenov

The Greengage DB PL/Container language extension allows you to create and run PL/Python and PL/R user-defined functions (UDFs) inside Docker containers. By restricting what executed code can access on the Greengage DB segment hosts, PL/Container reduces security risks. This topic explains how to install, set up, and use PL/Container.

About PL/Container

The PL/Container language extension enables you to create and run user-defined functions written in PL/Python or PL/R securely inside Docker containers. Docker provides a mechanism to package and execute applications in a loosely isolated environment called a container. For more information about Docker, see the Docker documentation at https://www.docker.com.

Running user-defined functions inside Docker containers provides the following benefits:

  • Functions are executed in a separate environment, decoupling user-defined data processing from core query execution. SQL operators, such as Scan and Filter, are executed on the query executor (QE) side, while advanced data analysis is performed inside the container.

  • User code is fully isolated from the host operating system and file system, preventing it from introducing security risks.

  • Functions cannot connect back to the DBMS when the container is configured with limited or no network access, ensuring data safety.

PL/Container function execution flow

Consider a query that selects table data across all Greengage DB segments and transforms the data using a PL/Container function:

  1. On the first call to the function within a segment, the query executor process running on that segment host starts a container on the same host.

  2. The query executor then communicates with the running container to execute the function and retrieve the results.

  3. (Optional) The container may issue a Server Programming Interface (SPI) call — an internal SQL query executed by the container to fetch additional data from the database — and return the resulting data to the query executor.

  4. The container stays idle and ready, listening on a socket while using no CPU resources. PL/Container memory usage depends primarily on the amount of data cached in global dictionaries.

  5. When the Greengage DB session that started the container ends, the container connection is closed, and the container shuts down automatically.
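
As an illustration only, this per-session lifecycle can be modeled in a few lines of Python; the SessionContainer class and its methods are hypothetical stand-ins, not part of the PL/Container API:

```python
# Hypothetical model of the per-session container lifecycle described above.
# SessionContainer is an illustrative stand-in, not part of the PL/Container API.
class SessionContainer:
    def __init__(self, runtime_id):
        self.runtime_id = runtime_id
        self.running = False

    def call(self, func, *args):
        # Steps 1-2: the container starts lazily on the first call,
        # then executes the function and returns the result.
        if not self.running:
            self.running = True
        return func(*args)

    def close(self):
        # Step 5: the session ends and the container shuts down.
        self.running = False

session = SessionContainer("plc_python_shared")
result = session.call(lambda: "Hello from Python!")
session.close()
```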

Prerequisites

To demonstrate how to set up and use PL/Container, this guide uses an initialized and running cluster with four hosts:

  • mdw — the master host.

  • smdw — the standby master host.

  • sdw1 and sdw2 — the segment hosts.

In this demonstration environment, the Greengage DB administrative user (gpadmin) has sudo privileges, although this is not required for the DBMS to function correctly. For more information about initializing the DBMS, see Initialize DBMS.

Set up Docker

PL/Container requires Docker to be installed on all Greengage DB cluster hosts. The following instructions describe how to install and configure Docker on Ubuntu 22.04.

Install and start Docker

Perform these steps on each Greengage DB host:

  1. Log in as the gpadmin user:

    $ sudo su - gpadmin
  2. Update the package index on your system:

    $ sudo apt-get update
  3. Install the packages required for Docker installation and usage:

    $ sudo apt-get install ca-certificates curl
  4. Add the official GPG key for the Docker repository:

    $ sudo install -m 0755 -d /etc/apt/keyrings
    $ sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc
    $ sudo chmod a+r /etc/apt/keyrings/docker.asc
  5. Add the Docker repository:

    $ echo \
    "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu \
    $(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
    sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
  6. Update the package index to include the Docker repository:

    $ sudo apt-get update
  7. Install Docker Engine and related components:

    $ sudo apt-get install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
  8. Start the Docker service:

    $ sudo systemctl start docker
  9. Enable Docker to start automatically at system boot:

    $ sudo systemctl enable docker

Grant Docker access to gpadmin

After installing Docker, you must configure permissions so that the gpadmin user can manage Docker images and containers:

  1. Add the gpadmin user to the docker group:

    $ sudo usermod -aG docker gpadmin
  2. Exit the current session and log in again as gpadmin to apply the updated group membership.

  3. Verify that Docker is accessible by running a Docker command:

    $ docker ps

    If Docker is correctly configured, this command displays an empty list of running containers.

Restart the cluster

After configuring Docker, restart the Greengage DBMS to apply the changes and allow it to access Docker:

$ gpstop -r

Install PL/Container

PL/Container must be installed on all Greengage DB cluster hosts.

Build PL/Container from sources

Perform the following steps on each Greengage DB cluster host while logged in as gpadmin:

  1. Clone the plcontainer repository:

    $ git clone https://github.com/arenadata/plcontainer.git
  2. Go to the plcontainer directory:

    $ cd plcontainer
  3. Check out a specific release tag to build a particular version, for example:

    $ git checkout 2.4.0
  4. Create a directory named build and enter it:

    $ mkdir build && cd build
  5. Configure the build using cmake, specifying the path to the pg_config executable:

    $ cmake .. -DPG_CONFIG=/usr/local/gpdb/bin/pg_config
  6. Compile PL/Container using make:

    $ make -j$(nproc)
  7. Run make install to install PL/Container into the Greengage DB directories:

    $ make -j$(nproc) install

    Example output after installation:

    -- Installing: /usr/local/gpdb/lib/postgresql/plcontainer.so
    -- Installing: /usr/local/gpdb/share/postgresql/extension/plcontainer--1.0.0.sql
    -- Installing: /usr/local/gpdb/share/postgresql/extension/plcontainer.control
    -- Installing: /usr/local/gpdb/share/postgresql/plcontainer/plcontainer_configuration.xml
    -- Installing: /usr/local/gpdb/bin/plcontainer
    -- Installing: /usr/local/gpdb/bin/plcontainer_clients/py3client.sh
    -- Installing: /usr/local/gpdb/bin/plcontainer_clients/pyclient.sh
    -- Installing: /usr/local/gpdb/bin/plcontainer_clients/rclient.sh

Manage PL/Container Docker images

Prepare default Docker images

Perform the following steps on the master host to build the default PL/Container Docker images:

  1. Enter the plcontainer/build directory:

    $ cd ~/plcontainer/build/
  2. Build the default Docker images:

    $ make images_artifact

    This step generates the Docker images required for PL/Python and PL/R environments.

  3. List the Docker images to verify creation:

    $ docker image ls

    The result should look like this:

    IMAGE                    ID             DISK USAGE   CONTENT SIZE   EXTRA
    python27.ubuntu:latest   0112f2a648c0        982MB          291MB
    python39.alpine:latest   5fc35ccbcd5f        397MB          100MB
    r.alpine:latest          2dceb4e0251b        753MB          203MB
  4. Check the generated Docker image files:

    $ ls -1 plcontainer-*.tar.gz

    The output should list the following files:

    plcontainer-python2-image-2.4.0-gp6.tar.gz
    plcontainer-python3-image-2.4.0-gp6.tar.gz
    plcontainer-r-image-2.4.0-gp6.tar.gz

Install images on cluster hosts

After building the Docker images, use the plcontainer utility to install them on all Greengage DB cluster hosts, so that PL/Container can use these images for executing functions:

  1. Install the Python 3 image across the cluster using plcontainer image-add:

    $ plcontainer image-add -f plcontainer-python3-image-2.4.0-gp6.tar.gz

    This command distributes the image file to all hosts and loads it into Docker. The output should look as follows:

    20251112:13:37:10:047201 plcontainer:mdw:gpadmin-[INFO]:-Checking whether docker is installed on all hosts...
    20251112:13:37:11:047201 plcontainer:mdw:gpadmin-[INFO]:-Distributing image file plcontainer-python3-image-2.4.0-gp6.tar.gz to all hosts...
    20251112:13:37:12:047201 plcontainer:mdw:gpadmin-[INFO]:-Loading image on all hosts...
    20251112:13:37:17:047201 plcontainer:mdw:gpadmin-[INFO]:-Removing temporary image files on all hosts...
  2. Similarly, install the R image:

    $ plcontainer image-add -f plcontainer-r-image-2.4.0-gp6.tar.gz

    This command installs the R environment image on all cluster hosts.

  3. (Optional) To delete an image that is not needed, use plcontainer image-delete:

    $ plcontainer image-delete -i python27.ubuntu:latest
  4. List all installed images to verify:

    $ plcontainer image-list

    The result might look as follows:

    IMAGE                    ID             DISK USAGE   CONTENT SIZE   EXTRA
    python39.alpine:latest   5fc35ccbcd5f        397MB          100MB
    r.alpine:latest          2dceb4e0251b        753MB          203MB

Manage PL/Container runtime configurations

PL/Container requires runtime configurations to link Docker images with language environments:

  1. Check existing runtime configurations:

    $ plcontainer runtime-show

    Expected result:

    PL/Container Runtime Configuration:
    No runtime or no specified runtime was found.
  2. Add a runtime configuration for Python 3:

    $ plcontainer runtime-add -r plc_python_shared -i python39.alpine:latest -l python3

    This command adds a new runtime and distributes the updated configuration to all cluster hosts. Expected output:

    20251112:13:49:29:050540 plcontainer:mdw:gpadmin-[INFO]:-Distributing file plcontainer_configuration.xml to all locations...
    20251112:13:49:30:050540 plcontainer:mdw:gpadmin-[INFO]:-Configuration has changed. Run "select * from plcontainer_refresh_config" in open sessions. New sessions will get new configuration automatically.
  3. Add a runtime configuration for R:

    $ plcontainer runtime-add -r plc_r_shared -i r.alpine:latest -l r
  4. Verify the runtime configurations again:

    $ plcontainer runtime-show

    The output should look as follows:

    PL/Container Runtime Configuration:
    ---------------------------------------------------------
      Runtime ID: plc_python_shared
      Linked Docker Image: python39.alpine:latest
      Runtime Setting(s):
      Shared Directory:
      ---- Shared Directory From HOST '/usr/local/gpdb/bin/plcontainer_clients' to Container '/clientdir', access mode is 'ro'
    ---------------------------------------------------------
    
    ---------------------------------------------------------
      Runtime ID: plc_r_shared
      Linked Docker Image: r.alpine:latest
      Runtime Setting(s):
      Shared Directory:
      ---- Shared Directory From HOST '/usr/local/gpdb/bin/plcontainer_clients' to Container '/clientdir', access mode is 'ro'
    ---------------------------------------------------------

The runtime-* commands of the plcontainer utility also let you replace or delete runtime configurations, edit the plcontainer_configuration.xml file in an editor, verify that the runtime configuration on the segments matches the master, and perform other related tasks.

Install PL/Container extension

Before using PL/Container user-defined functions, you must install the plcontainer extension in each database that requires it:

  1. Create a new database:

    $ createdb marketplace
  2. Connect to the database:

    $ psql marketplace
  3. Register the PL/Container extension:

    CREATE EXTENSION plcontainer;

    Expected output:

    CREATE EXTENSION
NOTE

If your database no longer requires PL/Container, you can remove it using the DROP EXTENSION command. Use the CASCADE option to automatically remove all dependent objects, including any functions created with PL/Container.

Develop PL/Container functions

When PL/Container is enabled, the plcontainer language can be used in function definitions. A PL/Container user-defined function must include:

  • The first line: # container: <ID> — specifies the runtime configuration to use. <ID> matches a Runtime ID entry in the runtime configuration and determines the actual programming language (Python or R) used by the function.

  • The LANGUAGE plcontainer attribute — specifies that this is a PL/Container function.
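
For illustration, the expected format of the runtime declaration can be checked with a pattern like the one below; this is a sketch, not PL/Container's actual parser, and parse_runtime_id is a hypothetical helper:

```python
import re

# Sketch only: extract the runtime ID from the first line of a function body.
# PL/Container's real parser may differ; parse_runtime_id is a hypothetical helper.
def parse_runtime_id(body):
    first_line = body.lstrip().splitlines()[0]
    match = re.match(r"#\s*container\s*:\s*(\S+)", first_line)
    return match.group(1) if match else None

body = """
# container: plc_python_shared
return 'Hello from Python!'
"""
parse_runtime_id(body)  # 'plc_python_shared'
```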

Create a Python function

Create a Python function using PL/Container:

CREATE FUNCTION greet_from_python()
    RETURNS TEXT
AS
$$
# container: plc_python_shared
return 'Hello from Python!'
$$ LANGUAGE plcontainer;

Execute the created function:

SELECT greet_from_python();

The result should look like this:

 greet_from_python
--------------------
 Hello from Python!
(1 row)

Create an R function

You can also create an R function using PL/Container:

CREATE FUNCTION greet_from_r()
    RETURNS TEXT
AS
$$
# container: plc_r_shared
return("Hi from R!")
$$ LANGUAGE plcontainer;

Execute the function:

SELECT greet_from_r();

The output should look as follows:

 greet_from_r
--------------
 Hi from R!
(1 row)

Create a function with SPI (plpy)

First, create and populate a sample customers table:

CREATE TABLE customers
(
    customer_id   SERIAL PRIMARY KEY,
    first_name    VARCHAR(25),
    last_name     VARCHAR(25),
    email         VARCHAR(25)
)
    DISTRIBUTED REPLICATED;
INSERT INTO customers (first_name, last_name, email)
VALUES ('Andrew', 'Fuller', 'andrew@example.com'),
       ('Michael', 'Suyama', 'michael@testmail.com'),
       ('Robert', 'King', 'robert@demo.org');

Create a PL/Container function that executes an SPI query from inside the container environment using plpy.execute(). The function retrieves a customer record by ID and returns a formatted text representation. The query is first prepared with plpy.prepare() and then executed with a parameter list:

CREATE FUNCTION get_customer_by_id(p_id INT)
RETURNS TEXT
AS
$$
# container: plc_python_shared

plan = plpy.prepare(
    """
    SELECT customer_id, first_name, last_name, email
    FROM customers
    WHERE customer_id = $1
    """,
    ["int"]
)

rows = plpy.execute(plan, [p_id])

if not rows:
    return f"No customer with id={p_id}"

r = rows[0]
return f"{r['customer_id']}: {r['first_name']} {r['last_name']} <{r['email']}>"
$$ LANGUAGE plcontainer;

Execute the function:

SELECT get_customer_by_id(2);

The result should look like this:

            get_customer_by_id
------------------------------------------
 2: Michael Suyama <michael@testmail.com>
(1 row)

For information about the plpy methods, see the corresponding topic in the PostgreSQL documentation: Database Access.

PL/Container function limitations

Review the following limitations when creating and using PL/Container functions:

  • Greengage DB domains are not supported.

  • Multi-dimensional arrays are not supported.

  • Python and R call stack information is not displayed when debugging a user-defined function.

  • The nrows() and status() methods of the result object returned by plpy.execute() are not supported.

  • The PL/Python function plpy.SPIError() is not supported.

  • Running the SAVEPOINT command with plpy.execute() is not supported.

  • Triggers are not supported.

  • OUT parameters are not supported.

  • The Python dict type cannot be returned from a PL/Python user-defined function. As a workaround, convert the dict to a Greengage DB user-defined composite type before returning it.
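
As a sketch of this workaround in plain Python: instead of returning a dict, return a tuple whose element order matches the fields of the target composite type (the field order below is a hypothetical example):

```python
# Sketch: a PL/Container PL/Python function cannot return a dict, but it can
# return a tuple ordered to match the fields of a user-defined composite type.
# The field order below is a hypothetical example.
def dict_to_composite(d, field_order):
    return tuple(d[name] for name in field_order)

row = {"first_name": "Andrew", "last_name": "Fuller", "email": "andrew@example.com"}
dict_to_composite(row, ["first_name", "last_name", "email"])
# → ('Andrew', 'Fuller', 'andrew@example.com')
```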

NOTE

PL/Python provides two global dictionaries, GD and SD, that retain data between function calls. GD shares data among all functions within the same container, while SD retains data across multiple calls of a single function. The data is accessible only within the same session and only while the container is running on a segment or the master. For idle sessions, Greengage DB may terminate segment processes, which shuts down the containers and clears the data in GD and SD.
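
The difference in scope can be illustrated in plain Python, with ordinary dictionaries standing in for the GD and SD globals provided inside the container:

```python
# Illustration only: GD is shared by all functions in one container,
# while each function has its own SD for per-function state.
GD = {}           # one per container, visible to every function
SD_registry = {}  # stands in for each function's private SD

def sd_for(func_name):
    return SD_registry.setdefault(func_name, {})

def func_a():
    GD["shared"] = "set by func_a"
    SD = sd_for("func_a")
    SD["calls"] = SD.get("calls", 0) + 1

def func_b():
    # func_b sees GD entries written by func_a, but not func_a's SD
    return GD.get("shared"), sd_for("func_b")

func_a()
func_a()
shared, b_sd = func_b()
# shared == 'set by func_a', b_sd == {}, sd_for('func_a')['calls'] == 2
```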

Inspect and refresh runtimes in SQL

The PL/Container configuration file is read once per Greengage DB session, during the first invocation of a PL/Container function. You can force the configuration file to be re-read by performing a SELECT command on the plcontainer_refresh_config view during the session. For example, this SELECT command forces a re-read:

SELECT * FROM plcontainer_refresh_config;

The command runs a PL/Container function that updates the configuration on the master and segment instances and returns the status of the refresh:

 gp_segment_id | plcontainer_refresh_local_config
---------------+----------------------------------
             2 | ok
             3 | ok
             1 | ok
             0 | ok
            -1 | ok
(5 rows)

You can also show all the configurations in the session by performing a SELECT command on the plcontainer_show_config view:

SELECT * FROM plcontainer_show_config;

The plcontainer_containers_summary() function displays information about the currently running Docker containers:

SELECT * FROM plcontainer_containers_summary();

The result might look like this:

 SEGMENT_ID |                           CONTAINER_ID                           |    UP_TIME    |  OWNER  | MEMORY_USAGE(KB)
------------+------------------------------------------------------------------+---------------+---------+------------------
 -1         | ab12497d6f0be12e4c93744b0981a812f89574f9add80fc80783e450b729bb68 | Up 19 seconds | gpadmin | 96596
 -1         | 92a15903672ae38d41e407e2cca96dc094c217b6c0776562b341b33429b97af8 | Up 29 seconds | gpadmin | 9608
(2 rows)

If a non-superuser runs plcontainer_containers_summary(), it shows information only for containers created by the user. If a superuser runs the function, information for all containers created by DBMS users is displayed.

Greengage DBMS starts Docker containers to run PL/Container functions and reuses them across calls. When a query executor process is idle for a period, it terminates and its containers are automatically removed. The idle timeout is controlled by the gp_vmem_idle_resource_timeout server configuration parameter, which you can adjust to improve container reuse and reduce startup overhead.

NOTE

Changing the gp_vmem_idle_resource_timeout value may affect overall cluster performance due to resource management considerations. This parameter also controls the release of other DBMS resources beyond Docker containers.

PL/Container resource management

PL/Container Docker containers share CPU and memory with DBMS processes on the same hosts. By default, Greengage DB does not track the resources used by running PL/Container instances. To prevent container workloads from affecting cluster performance, you can manage PL/Container resource usage at two levels:

  • Container level — set per-container CPU and memory limits using the memory_mb and cpu_share parameters in the PL/Container runtime configuration. These limits apply to each container instance for that runtime.

  • Runtime level — control the total CPU and memory used by all containers of a given runtime using resource groups. Assigning a resource group enforces the group’s limits on all containers in that runtime. If no resource group is configured, containers are limited only by system-level resources and may compete with the Greengage DB server for CPU and memory.

Use resource groups to manage PL/Container resources

You can use resource groups to control the total CPU and memory consumed by PL/Container runtimes. For external components such as PL/Container, resource groups rely on Linux control groups (cgroups) to enforce these limits. A resource group assigned to a PL/Container runtime applies its CPU and memory limits to all container instances created by that runtime.

When creating a resource group for PL/Container, you must set MEMORY_AUDITOR=cgroup and CONCURRENCY=0, along with the required CPU and memory limits. For example, the following command creates a resource group named rg_plcontainer:

CREATE RESOURCE GROUP rg_plcontainer WITH (
    CPUSET = '1;1',
    MEMORY_LIMIT = 20,
    CONCURRENCY = 0,
    MEMORY_AUDITOR = cgroup
    );

After creating the group, query the gp_toolkit.gp_resgroup_config view to retrieve its identifier:

SELECT * FROM gp_toolkit.gp_resgroup_config;

Example output:

 groupid |   groupname    | concurrency | cpu_rate_limit | memory_limit | memory_shared_quota | memory_spill_ratio | memory_auditor | cpuset
---------+----------------+-------------+----------------+--------------+---------------------+--------------------+----------------+--------
    6437 | default_group  | 20          | 30             | 0            | 80                  | 0                  | vmtracker      | -1
    6438 | admin_group    | 10          | 10             | 10           | 80                  | 0                  | vmtracker      | -1
   24993 | rg_plcontainer | 0           | -1             | 20           | 80                  | 0                  | cgroup         | 1;1
(3 rows)

Use the groupid value when assigning the resource group to a PL/Container runtime. You can specify it with the -s resource_group_id=<rg_groupid> option of the plcontainer runtime-add (new runtime) or plcontainer runtime-replace (existing runtime) commands, for example:

$ plcontainer runtime-add -r plc_python_shared -i python39.alpine:latest -l python3 -s resource_group_id=24993

PL/Container logging

To enable PL/Container logging for a specific runtime, set the use_container_logging attribute in the runtime configuration. Logging is turned off by default.

When container logging is enabled, the log level is controlled by the log_min_messages server parameter. This parameter affects both PL/Container log output and general DBMS logging. Consider the following points:

  • PL/Container logs include messages produced by functions running inside the Docker container. By default, these logs are forwarded to the system logging service. On Red Hat or CentOS systems, they are sent to the journald service.

  • For testing or troubleshooting PL/Container functions, you can temporarily adjust the log level in your session using the SET command. This example sets the log level to debug1:

    SET log_min_messages = 'debug1';
NOTE

The log_min_messages parameter controls both Greengage DB server logging and PL/Container logging. Increasing the log level may impact overall cluster performance even when no PL/Container user-defined functions are running.

Customize PL/Container images

This section describes how to create, verify, and deploy a custom Docker image for PL/Container.

Load base images

Before creating a custom image, load the default base images as described in Prepare default Docker images.

Create a new image

Start by creating a Dockerfile for your custom image:

$ vi Dockerfile

Example Dockerfile that adds NumPy:

FROM python39.alpine
RUN pip install numpy==2.0.2

Save the file and build the image:

$ docker build -t python39_with_numpy.alpine:latest .

Verify PL/Container compatibility with the new image

Run the PL/Container Python client inside the new image to confirm compatibility:

$ docker run -it --rm \
  -v $(readlink -f /usr/local/gpdb/bin/plcontainer_clients):/clientdir \
  python39_with_numpy.alpine:latest \
  /clientdir/py3client.sh

Example output:

plcontainer log: pythonclient, unknown, unknown, -1, WARNING: USE_CONTAINER_NETWORK is not set, use default value "no".
plcontainer log: pythonclient, unknown, unknown, -1, ERROR: Cannot bind the addr: No such file or directory

If you see errors that begin with plcontainer log, this indicates that the image is compatible with the PL/Container client. These errors are expected when running the client outside of a database and show that the container is executing the client correctly.
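
If you script this check, a minimal (hypothetical) helper could scan the captured output for that prefix:

```python
# Hypothetical helper: decide from captured client output whether the image
# ran the PL/Container client. Errors are expected outside a database session;
# what matters is the "plcontainer log" prefix on the output lines.
def client_started(output):
    return any(line.startswith("plcontainer log:") for line in output.splitlines())

output = (
    'plcontainer log: pythonclient, unknown, unknown, -1, '
    'ERROR: Cannot bind the addr: No such file or directory\n'
)
client_started(output)  # True
```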

Save the image to an archive

Export the image to a tarball for installation across cluster hosts:

$ docker save python39_with_numpy.alpine:latest | gzip > python39_with_numpy.alpine.tar.gz

The resulting file is python39_with_numpy.alpine.tar.gz.

Install the image on cluster hosts

Use plcontainer image-add to distribute and load the image on all cluster hosts:

$ plcontainer image-add -f python39_with_numpy.alpine.tar.gz

Example output:

20251113:06:59:51:061170 plcontainer:mdw:gpadmin-[INFO]:-Checking whether docker is installed on all hosts...
20251113:06:59:51:061170 plcontainer:mdw:gpadmin-[INFO]:-Distributing image file python39_with_numpy.alpine.tar.gz to all hosts...
20251113:06:59:52:061170 plcontainer:mdw:gpadmin-[INFO]:-Loading image on all hosts...
20251113:06:59:59:061170 plcontainer:mdw:gpadmin-[INFO]:-Removing temporary image files on all hosts...

Create a runtime configuration

Create a PL/Container runtime that uses your custom image:

$ plcontainer runtime-add -r plc_python_numpy -i python39_with_numpy.alpine:latest -l python3

Example output:

20251113:07:00:47:061541 plcontainer:mdw:gpadmin-[INFO]:-Distributing file plcontainer_configuration.xml to all locations...
20251113:07:00:47:061541 plcontainer:mdw:gpadmin-[INFO]:-Configuration has changed. Run "select * from plcontainer_refresh_config" in open sessions. New sessions will get new configuration automatically.

Verify the runtime configuration:

$ plcontainer runtime-show -r plc_python_numpy

The result should look like this:

PL/Container Runtime Configuration:
---------------------------------------------------------
  Runtime ID: plc_python_numpy
  Linked Docker Image: python39_with_numpy.alpine:latest
  Runtime Setting(s):
  Shared Directory:
  ---- Shared Directory From HOST '/usr/local/gpdb/bin/plcontainer_clients' to Container '/clientdir', access mode is 'ro'
---------------------------------------------------------

Create a function

Connect to a database:

$ psql marketplace

Create a PL/Container function that uses NumPy:

CREATE OR REPLACE FUNCTION get_array_shape(arr TEXT)
    RETURNS TEXT
AS
$$
# container: plc_python_numpy
import numpy as np
import ast
input_array = np.array(ast.literal_eval(arr))
return f"Array shape: {input_array.shape}"
$$ LANGUAGE plcontainer;

Run the function:

SELECT get_array_shape('[[1,2,3],[4,5,6]]');

The result should look as follows:

   get_array_shape
---------------------
 Array shape: (2, 3)
(1 row)