PL/Container
- About PL/Container
- Prerequisites
- Set up Docker
- Install PL/Container
- Manage PL/Container Docker images
- Manage PL/Container runtime configurations
- Install PL/Container extension
- Develop PL/Container functions
- Inspect and refresh runtimes in SQL
- PL/Container resource management
- PL/Container logging
- Customize PL/Container images
The Greengage DB PL/Container language extension allows you to create and run PL/Python and PL/R user-defined functions (UDFs) inside Docker containers. By limiting code access to Greengage DB segment hosts, PL/Container reduces security risks. This topic explains how to install, set up, and use PL/Container.
About PL/Container
The PL/Container language extension enables you to create and run user-defined functions written in PL/Python or PL/R securely inside Docker containers. Docker provides a mechanism to package and execute applications in a loosely isolated environment called a container. For more information about Docker, see the Docker documentation at https://www.docker.com.
Running user-defined functions inside Docker containers provides the following benefits:
- Functions are executed in a separate environment, decoupling user-defined data processing from core query execution. SQL operators, such as Scan and Filter, are executed on the query executor (QE) side, while advanced data analysis is performed inside the container.
- User code is fully isolated from the host operating system and file system, preventing it from introducing security risks.
- Functions cannot connect back to the DBMS when the container is configured with limited or no network access, ensuring data safety.
PL/Container function execution flow
Consider a query that selects table data across all Greengage DB segments and transforms the data using a PL/Container function:
- On the first call to the function within a segment, the query executor on the master host initiates a container on that segment host.
- The query executor then communicates with the running container to execute the function and retrieve the results.
- (Optional) The container may issue a Server Programming Interface (SPI) call — an internal SQL query executed by the container to fetch additional data from the database — and return the resulting data to the query executor.
- The container stays idle and ready, listening on a socket while using no CPU resources. PL/Container memory usage depends primarily on the amount of data cached in global dictionaries.
- When the Greengage DB session that started the container ends, the container connection is closed, and the container shuts down automatically.
Prerequisites
To demonstrate how to set up and use PL/Container, this guide uses an initialized and running cluster with four hosts:
- mdw — the master host.
- smdw — the standby master host.
- sdw1 and sdw2 — the segment hosts.
In this demonstration environment, the Greengage DB administrative user (gpadmin) has sudo privileges, although this is not required for the DBMS to function correctly.
For more information about initializing the DBMS, see Initialize DBMS.
Set up Docker
PL/Container requires Docker to be installed on all Greengage DB cluster hosts. The following instructions describe how to install and configure Docker on Ubuntu 22.04.
Install and start Docker
Perform these steps on each Greengage DB host:
- Log in as the gpadmin user:
  $ sudo su - gpadmin
- Update all packages that are currently installed on your system:
  $ sudo apt-get update
- Install the packages required for Docker installation and usage:
  $ sudo apt-get install ca-certificates curl
- Add the official GPG key for the Docker repository:
  $ sudo install -m 0755 -d /etc/apt/keyrings
  $ sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc
  $ sudo chmod a+r /etc/apt/keyrings/docker.asc
- Add the Docker repository:
  $ echo \
    "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu \
    $(. /etc/os-release && echo "jammy") stable" | \
    sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
- Update the package index to include the Docker repository:
  $ sudo apt-get update
- Install Docker Engine and related components:
  $ sudo apt-get install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
- Start the Docker service:
  $ sudo systemctl start docker
- Enable Docker to start automatically at system boot:
  $ sudo systemctl enable docker
Grant Docker access to gpadmin
After installing Docker, you must configure permissions so that the gpadmin user can manage Docker images and containers:
- Add the gpadmin user to the docker group:
  $ sudo usermod -aG docker gpadmin
- Exit the current session and log in again as gpadmin to apply the updated group membership.
- Verify that Docker is accessible by running a Docker command:
  $ docker ps
  If Docker is correctly configured, this command displays an empty list of running containers.
Restart the cluster
After configuring Docker, restart the Greengage DBMS to apply the changes and allow it to access Docker:
$ gpstop -r
Install PL/Container
PL/Container must be installed on all Greengage DB cluster hosts.
Build PL/Container from sources
Perform the following steps on each Greengage DB cluster host while logged in as gpadmin:
- Clone the plcontainer repository:
  $ git clone https://github.com/arenadata/plcontainer.git
- Go to the plcontainer directory:
  $ cd plcontainer
- Check out a specific release tag to build a particular version, for example:
  $ git checkout 2.4.0
- Create a directory named build and enter it:
  $ mkdir build && cd build
- Configure the build using cmake, specifying the path to the pg_config executable:
  $ cmake .. -DPG_CONFIG=/usr/local/gpdb/bin/pg_config
- Compile PL/Container using make:
  $ make -j$(nproc)
- Run make install to install PL/Container into the Greengage DB directories:
  $ make -j$(nproc) install
  Example output after installation:
  -- Installing: /usr/local/gpdb/lib/postgresql/plcontainer.so
  -- Installing: /usr/local/gpdb/share/postgresql/extension/plcontainer--1.0.0.sql
  -- Installing: /usr/local/gpdb/share/postgresql/extension/plcontainer.control
  -- Installing: /usr/local/gpdb/share/postgresql/plcontainer/plcontainer_configuration.xml
  -- Installing: /usr/local/gpdb/bin/plcontainer
  -- Installing: /usr/local/gpdb/bin/plcontainer_clients/py3client.sh
  -- Installing: /usr/local/gpdb/bin/plcontainer_clients/pyclient.sh
  -- Installing: /usr/local/gpdb/bin/plcontainer_clients/rclient.sh
Manage PL/Container Docker images
Prepare default Docker images
Perform the following steps on the master host to build the default PL/Container Docker images:
- Enter the plcontainer/build directory:
  $ cd ~/plcontainer/build/
- Build the default Docker images:
  $ make images_artifact
  This step generates the Docker images required for the PL/Python and PL/R environments.
- List the Docker images to verify creation:
  $ docker image ls
  The result should look like this:
  IMAGE                    ID             DISK USAGE   CONTENT SIZE   EXTRA
  python27.ubuntu:latest   0112f2a648c0   982MB        291MB
  python39.alpine:latest   5fc35ccbcd5f   397MB        100MB
  r.alpine:latest          2dceb4e0251b   753MB        203MB
- Check the generated Docker image files:
  $ ls -1 plcontainer-*.tar.gz
  The output should list the following files:
  plcontainer-python2-image-2.4.0-gp6.tar.gz
  plcontainer-python3-image-2.4.0-gp6.tar.gz
  plcontainer-r-image-2.4.0-gp6.tar.gz
Install images on cluster hosts
After building the Docker images, use the plcontainer utility to install them on all Greengage DB cluster hosts, so that PL/Container can use these images for executing functions:
- Install the Python 3 image across the cluster using plcontainer image-add:
  $ plcontainer image-add -f plcontainer-python3-image-2.4.0-gp6.tar.gz
  This command distributes the image file to all hosts and loads it into Docker. The output should look as follows:
  20251112:13:37:10:047201 plcontainer:mdw:gpadmin-[INFO]:-Checking whether docker is installed on all hosts...
  20251112:13:37:11:047201 plcontainer:mdw:gpadmin-[INFO]:-Distributing image file plcontainer-python3-image-2.4.0-gp6.tar.gz to all hosts...
  20251112:13:37:12:047201 plcontainer:mdw:gpadmin-[INFO]:-Loading image on all hosts...
  20251112:13:37:17:047201 plcontainer:mdw:gpadmin-[INFO]:-Removing temporary image files on all hosts...
- Similarly, install the R image:
  $ plcontainer image-add -f plcontainer-r-image-2.4.0-gp6.tar.gz
  This command installs the R environment image on all cluster hosts.
- (Optional) To delete an image that is not needed, use plcontainer image-delete:
  $ plcontainer image-delete -i python27.ubuntu:latest
- List all installed images to verify:
  $ plcontainer image-list
  The result might look as follows:
  IMAGE                    ID             DISK USAGE   CONTENT SIZE   EXTRA
  python39.alpine:latest   5fc35ccbcd5f   397MB        100MB
  r.alpine:latest          2dceb4e0251b   753MB        203MB
Manage PL/Container runtime configurations
PL/Container requires runtime configurations to link Docker images with language environments:
- Check existing runtime configurations:
  $ plcontainer runtime-show
  Expected result:
  PL/Container Runtime Configuration:
  No runtime or no specified runtime was found.
- Add a runtime configuration for Python 3:
  $ plcontainer runtime-add -r plc_python_shared -i python39.alpine:latest -l python3
  This command adds a new runtime and distributes the updated configuration to all cluster hosts. Expected output:
  20251112:13:49:29:050540 plcontainer:mdw:gpadmin-[INFO]:-Distributing file plcontainer_configuration.xml to all locations...
  20251112:13:49:30:050540 plcontainer:mdw:gpadmin-[INFO]:-Configuration has changed. Run "select * from plcontainer_refresh_config" in open sessions. New sessions will get new configuration automatically.
- Add a runtime configuration for R:
  $ plcontainer runtime-add -r plc_r_shared -i r.alpine:latest -l r
- Verify the runtime configurations again:
  $ plcontainer runtime-show
  The output should look as follows:
  PL/Container Runtime Configuration:
  ---------------------------------------------------------
  Runtime ID: plc_python_shared
  Linked Docker Image: python39.alpine:latest
  Runtime Setting(s):
  Shared Directory:
  ---- Shared Directory From HOST '/usr/local/gpdb/bin/plcontainer_clients' to Container '/clientdir', access mode is 'ro'
  ---------------------------------------------------------
  ---------------------------------------------------------
  Runtime ID: plc_r_shared
  Linked Docker Image: r.alpine:latest
  Runtime Setting(s):
  Shared Directory:
  ---- Shared Directory From HOST '/usr/local/gpdb/bin/plcontainer_clients' to Container '/clientdir', access mode is 'ro'
  ---------------------------------------------------------
The runtime-* commands of the plcontainer utility also let you replace runtime configurations, edit the plcontainer_configuration.xml file using the selected editor, verify that runtime configuration information on the segments matches the master, and perform other related tasks.
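For example, a typical maintenance pass over an existing runtime might edit the configuration file and then check that the segments match the master. This is only a sketch; run plcontainer --help to confirm which runtime-* commands and options are available in your build:
$ plcontainer runtime-edit
$ plcontainer runtime-verify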
See also: PL/Container resource management, PL/Container logging.
Install PL/Container extension
Before using PL/Container user-defined functions, you must install the plcontainer extension in each database that requires it:
- Create a new database:
  $ createdb marketplace
- Connect to the database:
  $ psql marketplace
- Register the PL/Container extension:
  CREATE EXTENSION plcontainer;
  Expected output:
  CREATE EXTENSION
If your database no longer requires PL/Container, you can remove it using the DROP EXTENSION command.
Use the CASCADE option to automatically remove all dependent objects, including any functions created with PL/Container.
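For example, the following statement drops the extension from the current database together with any dependent PL/Container functions:
DROP EXTENSION plcontainer CASCADE;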
Develop PL/Container functions
When PL/Container is enabled, the plcontainer language can be used in function definitions.
A PL/Container user-defined function must include:
- The first line, # container: <ID>, which specifies the runtime configuration to use. <ID> matches a Runtime ID entry in the runtime configuration and determines the actual programming language (Python or R) used by the function.
- The LANGUAGE plcontainer attribute, which specifies that this is a PL/Container function.
Create a Python function
Create a Python function using PL/Container:
CREATE FUNCTION greet_from_python()
RETURNS TEXT
AS
$$
# container: plc_python_shared
return 'Hello from Python!'
$$ LANGUAGE plcontainer;
Execute the created function:
SELECT greet_from_python();
The result should look like this:
 greet_from_python
--------------------
 Hello from Python!
(1 row)
Create an R function
You can also create an R function using PL/Container:
CREATE FUNCTION greet_from_r()
RETURNS TEXT
AS
$$
# container: plc_r_shared
return("Hi from R!")
$$ LANGUAGE plcontainer;
Execute the function:
SELECT greet_from_r();
The output should look as follows:
 greet_from_r
--------------
 Hi from R!
(1 row)
Create a function with SPI (plpy)
First, create a sample customers table and populate it with test data:
CREATE TABLE customers
(
customer_id SERIAL PRIMARY KEY,
first_name VARCHAR(25),
last_name VARCHAR(25),
email VARCHAR(25)
)
DISTRIBUTED REPLICATED;
INSERT INTO customers (first_name, last_name, email)
VALUES ('Andrew', 'Fuller', 'andrew@example.com'),
('Michael', 'Suyama', 'michael@testmail.com'),
('Robert', 'King', 'robert@demo.org');
Create a PL/Container function that executes an SPI query from inside the container environment using plpy.execute().
The function retrieves a customer record by ID and returns a formatted text representation.
The query is first prepared with plpy.prepare() and then executed with a parameter list:
CREATE FUNCTION get_customer_by_id(p_id INT)
RETURNS TEXT
AS
$$
# container: plc_python_shared
plan = plpy.prepare(
"""
SELECT customer_id, first_name, last_name, email
FROM customers
WHERE customer_id = $1
""",
["int"]
)
rows = plpy.execute(plan, [p_id])
if not rows:
return f"No customer with id={p_id}"
r = rows[0]
return f"{r['customer_id']}: {r['first_name']} {r['last_name']} <{r['email']}>"
$$ LANGUAGE plcontainer;
Execute the function:
SELECT get_customer_by_id(2);
The result should look like this:
            get_customer_by_id
------------------------------------------
 2: Michael Suyama <michael@testmail.com>
(1 row)
For information about the plpy methods, see the corresponding topic in the PostgreSQL documentation: Database Access.
PL/Container function limitations
Review the following limitations when creating and using PL/Container functions:
- Greengage DB domains are not supported.
- Multi-dimensional arrays are not supported.
- Python and R call stack information is not displayed when debugging a user-defined function.
- The plpy.execute() methods nrows() and status() are not supported.
- The PL/Python function plpy.SPIError() is not supported.
- Running the SAVEPOINT command with plpy.execute() is not supported.
- Triggers are not supported.
- OUT parameters are not supported.
- The Python dict type cannot be returned from a PL/Python user-defined function. When returning the Python dict type from a function, you can convert the dict type to a Greengage DB user-defined data type (see the example after this list).
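As an illustration of the last limitation, one possible workaround is to declare a composite type and return the values as an ordered sequence that matches the type's fields instead of returning the dict itself. This is only a sketch that assumes the plc_python_shared runtime created earlier:
CREATE TYPE customer_summary AS (full_name TEXT, email TEXT);

CREATE FUNCTION summarize_customer()
RETURNS customer_summary
AS
$$
# container: plc_python_shared
data = {"full_name": "Andrew Fuller", "email": "andrew@example.com"}
# Return the values as a sequence ordered like the composite type fields,
# rather than returning the dict itself
return [data["full_name"], data["email"]]
$$ LANGUAGE plcontainer;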
The Python module provides two global dictionaries, GD and SD, that retain data between function calls.
GD shares data among all functions within the same container, while SD retains data across multiple calls of a single function.
The data is accessible only within the same session while the container runs on a segment or master.
For idle sessions, Greengage DB may terminate segment processes, which shut down the containers and clear the data in GD and SD.
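For example, a function can use SD to cache a prepared plan so that repeated calls in the same session reuse it. This is a sketch that assumes the plc_python_shared runtime and the customers table created earlier:
CREATE FUNCTION count_customers_cached()
RETURNS BIGINT
AS
$$
# container: plc_python_shared
# Prepare the plan only once per container and keep it in SD for later calls
if "count_plan" not in SD:
    SD["count_plan"] = plpy.prepare("SELECT count(*) AS cnt FROM customers")
return plpy.execute(SD["count_plan"])[0]["cnt"]
$$ LANGUAGE plcontainer;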
Inspect and refresh runtimes in SQL
The PL/Container configuration file is read once, during the first invocation of a PL/Container function in each Greengage DB session that runs PL/Container functions.
You can force the configuration file to be re-read by performing a SELECT command on the plcontainer_refresh_config view during the session.
For example, this SELECT command forces the configuration file to be read:
SELECT * FROM plcontainer_refresh_config;
The command runs a PL/Container function that updates the configuration on the master and segment instances and returns the status of the refresh:
gp_segment_id | plcontainer_refresh_local_config
---------------+----------------------------------
2 | ok
3 | ok
1 | ok
0 | ok
-1 | ok
(5 rows)
You can also show all the configurations in the session by performing a SELECT command on the plcontainer_show_config view:
SELECT * FROM plcontainer_show_config;
The plcontainer_containers_summary() function displays information about the currently running Docker containers:
SELECT * FROM plcontainer_containers_summary();
The result might look like this:
 SEGMENT_ID |                           CONTAINER_ID                           |    UP_TIME    |  OWNER  | MEMORY_USAGE(KB)
------------+------------------------------------------------------------------+---------------+---------+------------------
 -1         | ab12497d6f0be12e4c93744b0981a812f89574f9add80fc80783e450b729bb68 | Up 19 seconds | gpadmin | 96596
 -1         | 92a15903672ae38d41e407e2cca96dc094c217b6c0776562b341b33429b97af8 | Up 29 seconds | gpadmin | 9608
(2 rows)
If a non-superuser runs plcontainer_containers_summary(), it shows information only for containers created by the user.
If a superuser runs the function, information for all containers created by DBMS users is displayed.
Greengage DBMS starts Docker containers to run PL/Container functions and reuses them across calls.
When a query executor process is idle for a period, it terminates and its containers are automatically removed.
The idle timeout is controlled by the gp_vmem_idle_resource_timeout server configuration parameter, which you can adjust to improve container reuse and reduce startup overhead.
Changing the gp_vmem_idle_resource_timeout value may affect overall cluster performance due to resource management considerations.
This parameter also controls the release of other DBMS resources beyond Docker containers.
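For example, you can inspect the current timeout and, if needed, change it cluster-wide with the gpconfig utility and then reload the configuration. This is a sketch; <new_value> is a placeholder for the timeout you choose:
$ gpconfig -s gp_vmem_idle_resource_timeout
$ gpconfig -c gp_vmem_idle_resource_timeout -v <new_value>
$ gpstop -u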
PL/Container resource management
PL/Container Docker containers share CPU and memory with DBMS processes on the same hosts. By default, Greengage DB does not track the resources used by running PL/Container instances. To prevent container workloads from affecting cluster performance, you can manage PL/Container resource usage at two levels:
- Container level — set per-container CPU and memory limits using the memory_mb and cpu_share parameters in the PL/Container runtime configuration. These limits apply to each container instance for that runtime (see the example after this list).
- Runtime level — control the total CPU and memory used by all containers of a given runtime using resource groups. Assigning a resource group enforces the group's limits on all containers in that runtime. If no resource group is configured, containers are limited only by system-level resources and may compete with the Greengage DB server for CPU and memory.
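The following sketch shows how per-container limits might be applied to an existing runtime using the same -s setting syntax that is used for resource groups below; the memory_mb and cpu_share names are taken from the list above, and the values are purely illustrative:
$ plcontainer runtime-replace -r plc_python_shared -i python39.alpine:latest -l python3 -s memory_mb=512 -s cpu_share=1024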
Use resource groups to manage PL/Container resources
You can use resource groups to control the total CPU and memory consumed by PL/Container runtimes. For external components such as PL/Container, resource groups rely on Linux control groups (cgroups) to enforce these limits. A resource group assigned to a PL/Container runtime applies its CPU and memory limits to all container instances created by that runtime.
When creating a resource group for PL/Container, you must set MEMORY_AUDITOR=cgroup and CONCURRENCY=0, along with the required CPU and memory limits.
For example, the following command creates a resource group named rg_plcontainer:
CREATE RESOURCE GROUP rg_plcontainer WITH (
CPUSET = '1;1',
MEMORY_LIMIT = 20,
CONCURRENCY = 0,
MEMORY_AUDITOR = cgroup
);
After creating the group, query the gp_toolkit.gp_resgroup_config view to retrieve its identifier:
SELECT * FROM gp_toolkit.gp_resgroup_config;
Example output:
groupid | groupname | concurrency | cpu_rate_limit | memory_limit | memory_shared_quota | memory_spill_ratio | memory_auditor | cpuset
---------+----------------+-------------+----------------+--------------+---------------------+--------------------+----------------+--------
6437 | default_group | 20 | 30 | 0 | 80 | 0 | vmtracker | -1
6438 | admin_group | 10 | 10 | 10 | 80 | 0 | vmtracker | -1
24993 | rg_plcontainer | 0 | -1 | 20 | 80 | 0 | cgroup | 1;1
(3 rows)
Use the groupid value when assigning the resource group to a PL/Container runtime.
You can specify it with the -s resource_group_id=<rg_groupid> option of the plcontainer runtime-add (new runtime) or plcontainer runtime-replace (existing runtime) commands, for example:
$ plcontainer runtime-add -r plc_python_shared -i python39.alpine:latest -l python3 -s resource_group_id=24993
PL/Container logging
To enable PL/Container logging for a specific runtime, set the use_container_logging attribute in the runtime configuration.
Logging is turned off by default.
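For example, to turn on logging for the plc_r_shared runtime created earlier, you can pass the setting when replacing the runtime configuration (a sketch; adjust the image and language to match your runtime):
$ plcontainer runtime-replace -r plc_r_shared -i r.alpine:latest -l r -s use_container_logging=yes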
When container logging is enabled, the log level is controlled by the log_min_messages server parameter.
This parameter affects both PL/Container log output and general DBMS logging.
Consider the following points:
- PL/Container logs include messages produced by functions running inside the Docker container. By default, these logs are forwarded to the system logging service. On Red Hat or CentOS systems, they are sent to the journald service.
- For testing or troubleshooting PL/Container functions, you can temporarily adjust the log level in your session using the SET command. This example sets the log level to debug1:
  SET log_min_messages = 'debug1';
The log_min_messages parameter controls both Greengage DB server logging and PL/Container logging.
Increasing the log level may impact overall cluster performance even when no PL/Container user-defined functions are running.
Customize PL/Container images
This section describes how to create, verify, and deploy a custom Docker image for PL/Container.
Load base images
Before creating a custom image, load the default base images as described in Prepare default Docker images.
Create a new image
Start by creating a Dockerfile for your custom image:
$ vi Dockerfile
Example Dockerfile that adds NumPy:
FROM python39.alpine
RUN pip install numpy==2.0.2
Save the file and build the image:
$ docker build -t python39_with_numpy.alpine:latest .
Verify PL/Container compatibility with the new image
Run the PL/Container Python client inside the new image to confirm compatibility:
$ docker run -it --rm \
-v $(readlink -f /usr/local/gpdb/bin/plcontainer_clients):/clientdir \
python39_with_numpy.alpine:latest \
/clientdir/py3client.sh
Example output:
plcontainer log: pythonclient, unknown, unknown, -1, WARNING: USE_CONTAINER_NETWORK is not set, use default value "no".
plcontainer log: pythonclient, unknown, unknown, -1, ERROR: Cannot bind the addr: No such file or directory
If you see errors that begin with plcontainer log, this indicates that the image is compatible with the PL/Container client.
These errors are expected when running the client outside of a database and show that the container is executing the client correctly.
Save the image to an archive
Export the image to a tarball for installation across cluster hosts:
$ docker save python39_with_numpy.alpine:latest | gzip > python39_with_numpy.alpine.tar.gz
The resulting file is python39_with_numpy.alpine.tar.gz.
Install the image on cluster hosts
Use plcontainer image-add to distribute and load the image on all cluster hosts:
$ plcontainer image-add -f python39_with_numpy.alpine.tar.gz
Example output:
20251113:06:59:51:061170 plcontainer:mdw:gpadmin-[INFO]:-Checking whether docker is installed on all hosts...
20251113:06:59:51:061170 plcontainer:mdw:gpadmin-[INFO]:-Distributing image file python39_with_numpy.alpine.tar.gz to all hosts...
20251113:06:59:52:061170 plcontainer:mdw:gpadmin-[INFO]:-Loading image on all hosts...
20251113:06:59:59:061170 plcontainer:mdw:gpadmin-[INFO]:-Removing temporary image files on all hosts...
Create a runtime configuration
Create a PL/Container runtime that uses your custom image:
$ plcontainer runtime-add -r plc_python_numpy -i python39_with_numpy.alpine:latest -l python3
Example output:
20251113:07:00:47:061541 plcontainer:mdw:gpadmin-[INFO]:-Distributing file plcontainer_configuration.xml to all locations...
20251113:07:00:47:061541 plcontainer:mdw:gpadmin-[INFO]:-Configuration has changed. Run "select * from plcontainer_refresh_config" in open sessions. New sessions will get new configuration automatically.
Verify the runtime configuration:
$ plcontainer runtime-show -r plc_python_numpy
The result should look like this:
PL/Container Runtime Configuration:
---------------------------------------------------------
Runtime ID: plc_python_numpy
Linked Docker Image: python39_with_numpy.alpine:latest
Runtime Setting(s):
Shared Directory:
---- Shared Directory From HOST '/usr/local/gpdb/bin/plcontainer_clients' to Container '/clientdir', access mode is 'ro'
---------------------------------------------------------
Create a function
Connect to a database:
$ psql marketplace
Create a PL/Container function that uses NumPy:
CREATE OR REPLACE FUNCTION get_array_shape(arr TEXT)
RETURNS TEXT
AS
$$
# container: plc_python_numpy
import numpy as np
import ast
input_array = np.array(ast.literal_eval(arr))
return f"Array shape: {input_array.shape}"
$$ LANGUAGE plcontainer;
Run the function:
SELECT get_array_shape('[[1,2,3],[4,5,6]]');
The result should look as follows:
   get_array_shape
---------------------
 Array shape: (2, 3)
(1 row)