PL/Python
PL/Python enables you to execute Python code within SQL functions. This topic explains how to install and work with PL/Python in your database environment.
About PL/Python
PL/Python is a procedural language that allows you to implement user-defined functions in Python. It provides access to the Python runtime and standard modules inside SQL functions.
Greengage DB supports two PL/Python variants:
-
plpythonu(plpython2u) — uses Python 2.7. This extension is included in Greengage DB by default. The nameplpythonualways refers to the Python 2–based implementation in this version of Greengage DB, regardless of what the operating system considers its "default" Python (for example, even if /usr/bin/python points to Python 3). -
plpython3u— uses Python 3.9 or later. To use this extension, set thePYTHONenvironment variable to the path of the desired python3 executable before compiling Greengage DB from source. Learn more in Build Greengage DB with Python 3 support.
Greengage DB PL/Python limitations
-
PL/Python triggers are not supported.
-
Updatable cursors (
UPDATE … WHERE CURRENT OFandDELETE … WHERE CURRENT OF) are not supported. -
Within a single Greengage DB session, all PL/Python functions must use the same PL/Python variant (
plpythonuorplpython3u). To call a function created with a different variant, you must start a new session. -
PL/Python is available only as an untrusted language.
NOTEIf you need to run PL/Python user-defined functions in an isolated and controlled environment, consider using the PL/Container extension.
Enable PL/Python support
Verify the Python environment
Before using PL/Python, verify that all hosts in the cluster use the same Python interpreter.
You can use the gpssh utility to check the interpreter path and version across all hosts:
-
First, confirm which Python executable is used:
$ gpssh -s -f hostfile_all_hosts which python3The hostfile_all_hosts file lists the names of all cluster hosts. Example output:
[sdw1] /usr/bin/python3 [smdw] /usr/bin/python3 [ mdw] /usr/bin/python3 [sdw2] /usr/bin/python3
-
Then confirm that all hosts run the same Python version:
$ gpssh -s -f hostfile_all_hosts python3 --versionExample output:
[sdw1] Python 3.10.12 [sdw2] Python 3.10.12 [ mdw] Python 3.10.12 [smdw] Python 3.10.12
-
If you plan to install additional Python packages, install pip on all hosts.
-
To install a package on all hosts, start a
gpsshinteractive session:$ gpssh -s -f hostfile_all_hostsAt the
=>prompt, type the installation command and pressEnter. For example, the following command installs NumPy on all hosts:pip install numpy==2.2.6Example output:
... [sdw1] Successfully installed numpy-2.2.6 ... [ mdw] Successfully installed numpy-2.2.6 ... [sdw2] Successfully installed numpy-2.2.6 ... [smdw] Successfully installed numpy-2.2.6 ...
To exit the interactive session, type
exitand pressEnter:exit
Build Greengage DB with Python 3 support
By default, Greengage DB is built with support for Python 2 (plpythonu).
To enable the Python 3 variant (plpython3u), you must build Greengage DB from source with a specific Python 3 interpreter.
See Build and install Greengage DB for the full build procedure.
To compile PL/Python against Python 3, install the corresponding development package so that headers and libraries are available. For example, on Ubuntu with Python 3.10, install the package as follows:
$ sudo apt install python3.10-dev
Then, set the PYTHON environment variable to the path of the desired python3 executable before compiling Greengage DB from source:
$ PYTHON=/usr/bin/python3 ./configure \
--with-perl \
--with-python \
--with-pythonsrc-ext \
--with-libxml \
--with-uuid=e2fs \
--with-openssl \
--with-gssapi \
--with-ldap \
--enable-ic-proxy \
--enable-orafce \
--enable-mapreduce \
--prefix=/usr/local/gpdb
Check available Python extensions
Before enabling PL/Python, check which PL/Python language extensions are available in your installation. Connect to a database using psql and run the following query:
SELECT *
FROM pg_available_extensions
WHERE name LIKE 'plpython%';
Example output:
name | default_version | installed_version | comment ------------+-----------------+-------------------+------------------------------------------- plpythonu | 1.0 | | PL/PythonU untrusted procedural language plpython2u | 1.0 | | PL/Python2U untrusted procedural language plpython3u | 1.0 | | PL/Python3U untrusted procedural language (3 rows)
The installed_version column shows whether an extension is already installed in the current database.
In the output above, all entries are empty, which indicates that none of the PL/Python variants have been installed yet.
Install a PL/Python extension
To create PL/Python user-defined functions, you must first install the plpythonu or plpython3u extension in each database where it is required.
For example, to install plpython3u:
CREATE EXTENSION plpython3u;
Expected output:
CREATE EXTENSION
If a database no longer requires PL/Python, you can remove the extension using the DROP EXTENSION command.
Use the CASCADE option to automatically remove all dependent objects, including any functions created with PL/Python:
DROP EXTENSION plpython3u CASCADE;
Create a PL/Python function
Once PL/Python is enabled in your database, you can use the plpython3u language to define functions.
The example below creates a function that returns a greeting message along with the major version of the Python interpreter being used:
CREATE FUNCTION greet_from_python(name TEXT)
RETURNS TEXT
AS
$$
import sys
return f"Hello, {name}, from Python {sys.version_info[0]}!"
$$ LANGUAGE plpython3u;
In PL/Python, the function body is a standard Python script.
Any arguments defined in the SQL function signature are available as ordinary Python variables within the script.
The function result is returned using return for a single value, or yield for a function returning a set.
Call the created function:
SELECT greet_from_python('Alice');
The result should look like this:
greet_from_python ------------------------------ Hello, Alice, from Python 3! (1 row)
Data type conversion
Type mapping
Function arguments
When a PL/Python function is called, its arguments are converted from their Greengage DB data types to corresponding Python types. The table below contains type mappings of function arguments.
| Greengage DB type | Python type |
|---|---|
BOOLEAN |
bool |
SMALLINT, INT, BIGINT, oid |
int |
REAL, DOUBLE |
float |
NUMERIC |
decimal |
BYTEA |
bytes |
NULL |
None (see Nulls) |
ARRAY |
list (see Arrays) |
Other primitive types |
string |
Composite types |
Composite-type arguments are passed to the function as Python mappings (see Pass composite values) |
Return values
A return value of a PL/Python function is converted to the declared Greengage DB return type using the rules described in the table below.
| Greengage DB type | Conversion rules |
|---|---|
BOOLEAN |
The return value will be evaluated according to the Python rules.
That is, |
BYTEA |
The return value will be converted to Python |
NULL |
None (see Nulls) |
ARRAY |
list (see Arrays) |
Other primitive types |
The return value is converted to a string using the Python built-in |
Composite types |
A composite result can be returned as a sequence, a mapping, or an object.
You can also use a function with |
Booleans
PL/Python functions that return a boolean value follow Python’s truth evaluation rules.
For example, 0 and the empty string evaluate to False, while most other values — including non-empty strings such as 'f' — evaluate to True.
Create a function that checks whether a number is positive:
CREATE FUNCTION is_positive(value INT)
RETURNS BOOLEAN
AS
$$
if value > 0:
return True
else:
return False
$$ LANGUAGE plpython3u;
Call the function:
SELECT is_positive(-20);
Expected result:
is_positive ------------- f (1 row)
Nulls
When an SQL NULL value is passed to a function, it appears as None in Python.
You can add the STRICT keyword to the function definition so that Greengage DB automatically returns NULL without calling the function when any argument is NULL.
Alternatively, you can handle NULL inputs explicitly in the function body:
CREATE FUNCTION get_max_value(value1 INT, value2 INT)
RETURNS INT
AS
$$
if value1 is None or value2 is None:
return None
return value1 if value1 > value2 else value2
$$ LANGUAGE plpython3u;
Call the function:
SELECT get_max_value(NULL, 15);
The function returns NULL:
get_max_value --------------- (1 row)
Arrays
SQL array values are passed into PL/Python as a Python list. To return an SQL array value from a PL/Python function, return a Python list.
For example, the command below creates a function that generates an array of integers:
CREATE FUNCTION get_int_array(max_value INT)
RETURNS INT[]
AS
$$
if max_value is None:
return None
return list(range(1, max_value + 1))
$$ LANGUAGE plpython3u;
Call the function:
SELECT get_int_array(5);
The result should look as follows:
get_int_array
---------------
{1,2,3,4,5}
(1 row)
Multidimensional arrays
PL/Python supports multidimensional SQL arrays by mapping them to nested Python lists. Each inner list corresponds to one dimension of the SQL array. When returning a multidimensional array, all inner lists at the same level must have the same length to produce a valid SQL array.
The following function demonstrates how a two-dimensional array is passed into PL/Python and returned without modification:
CREATE FUNCTION get_two_dim_array(arr INT[])
RETURNS INT[]
AS
$$
plpy.info(arr, type(arr))
return arr
$$ LANGUAGE plpython3u;
Call the function:
SELECT get_two_dim_array(ARRAY [[1,2,3],[10,20,30]]);
The result should look like this:
get_two_dim_array
----------------------
{{1,2,3},{10,20,30}}
(1 row)
Note on strings
In Python, strings are sequences, which can lead to unintended behavior when returning them as arrays. Create the following function:
CREATE FUNCTION get_str_array(input_text TEXT)
RETURNS VARCHAR[]
AS
$$
return input_text
$$ LANGUAGE plpython3u;
Call this function:
SELECT get_str_array('Hello!');
The result looks as follows:
get_str_array
---------------
{H,e,l,l,o,!}
(1 row)
To fix this, explicitly wrap the string in a list so that PL/Python returns a one-element array:
CREATE FUNCTION get_str_array_fixed(input_text TEXT)
RETURNS VARCHAR[]
AS
$$
return [input_text]
$$ LANGUAGE plpython3u;
Composite types
Pass composite values
CREATE TABLE sales
(
id INT,
date DATE,
amount DECIMAL(10, 2),
category TEXT
)
WITH (appendoptimized = true)
DISTRIBUTED BY (id);
INSERT INTO sales(id, date, amount, category)
VALUES (1, '2025-08-01', 100.00, 'Books'),
(2, '2025-08-02', 200.00, 'Electronics'),
(3, '2025-08-03', 150.00, 'Books'),
(4, '2025-08-03', 300.00, 'Furniture');
Composite-type arguments are passed to a function as Python mappings.
The attribute names of the composite type are the element names of the mapping.
For example, consider a function that categorizes sales based on the amount field:
CREATE FUNCTION categorize_sale(s sales)
RETURNS text
AS
$$
amount = s["amount"]
if amount < 150:
return 'Low'
elif amount < 250:
return 'Medium'
else:
return 'High'
$$ LANGUAGE plpython3u;
Call the function:
SELECT id, amount, categorize_sale(sales) AS category_level
FROM sales;
The result should look like this:
id | amount | category_level ----+--------+---------------- 3 | 150.00 | Medium 4 | 300.00 | High 1 | 100.00 | Low 2 | 200.00 | Medium (4 rows)
Return composite values
PL/Python functions can return row or composite types in several ways.
The examples below use a composite type http_status, defined as follows:
CREATE TYPE http_status AS
(
code INT,
message TEXT
);
There are several ways to return a composite result from a PL/Python function, as described below.
Sequence
A sequence object returned by a PL/Python function must have the same number of elements as the composite type has fields.
The element at index 0 is assigned to the first field of the composite type, index 1 to the second field, and so on.
For example:
CREATE FUNCTION create_http_status(code INT, message TEXT)
RETURNS http_status
AS
$$
return code, message
$$ LANGUAGE plpython3u;
CREATE FUNCTION create_http_status(code INT, message TEXT)
RETURNS http_status
AS
$$
return [code, message]
$$ LANGUAGE plpython3u;
Call the function:
SELECT * FROM create_http_status(200, 'OK');
The result should look as follows:
code | message ------+--------- 200 | OK (1 row)
Mapping
The value for each result type column is retrieved from the mapping with the column name as a key. Example:
CREATE FUNCTION create_http_status(code INT, message TEXT)
RETURNS http_status
AS
$$
return {"code": code, "message": message}
$$ LANGUAGE plpython3u;
Call the function:
SELECT * FROM create_http_status(301, 'Moved Permanently');
Expected result:
code | message ------+------------------- 301 | Moved Permanently (1 row)
Object
PL/Python also allows you to return objects from a function to represent a composite type. The object’s attributes are mapped to the fields of the SQL composite type, similar to how a Python mapping works. For example:
CREATE FUNCTION create_http_status(code INT, message TEXT)
RETURNS http_status
AS
$$
class HttpStatus:
def __init__(self, c, m):
self.code = c
self.message = m
return HttpStatus(code, message)
$$ LANGUAGE plpython3u;
CREATE FUNCTION create_http_status(code INT, message TEXT)
RETURNS http_status
AS
$$
class HttpStatus: pass
HttpStatus.code = code
HttpStatus.message = message
return HttpStatus
$$ LANGUAGE plpython3u;
Call the function:
SELECT * FROM create_http_status(404, 'Not Found');
Expected result:
code | message ------+----------- 404 | Not Found (1 row)
OUT parameters
You can use functions with output parameters:
CREATE FUNCTION create_http_status(
code_in INT,
message_in TEXT,
OUT code INT,
OUT message TEXT
)
AS
$$
return code_in, message_in
$$ LANGUAGE plpython3u;
Call the function:
SELECT * FROM create_http_status(501, 'Not Implemented');
Expected result:
code | message ------+----------------- 501 | Not Implemented (1 row)
Set-returning functions
PL/Python functions can return sets of values, either scalar or composite types. Internally, the returned object is converted into an iterator, which allows the function to yield multiple rows.
The examples below use the following composite type:
CREATE TYPE user_event AS
(
event_type TEXT,
user_name TEXT
);
Sequence
Create a function:
CREATE FUNCTION generate_user_events_sequence(event_type TEXT)
RETURNS SETOF user_event
AS
$$
return (
[event_type, "alice"],
[event_type, "bob"],
[event_type, "charlie"]
)
$$ LANGUAGE plpython3u;
Call the function:
SELECT * FROM generate_user_events_sequence('login');
Expected result:
event_type | user_name ------------+----------- login | alice login | bob login | charlie (3 rows)
Iterator
Create a function:
CREATE FUNCTION get_user_events_iterator(event_type TEXT)
RETURNS SETOF user_event
AS
$$
class EventProducer:
def __init__(self, event_type, user):
self.event_type = event_type
self.users = user
self.ndx = -1
def __iter__(self):
return self
def __next__(self):
self.ndx += 1
if self.ndx >= len(self.users):
raise StopIteration
return self.event_type, self.users[self.ndx]
return EventProducer(event_type, ["alice", "bob", "charlie"])
$$ LANGUAGE plpython3u;
Call the function:
SELECT * FROM get_user_events_iterator('logout');
Expected result:
event_type | user_name ------------+----------- logout | alice logout | bob logout | charlie (3 rows)
Generator
Create a function:
CREATE FUNCTION get_user_events_generator(event_type TEXT)
RETURNS SETOF user_event
AS
$$
for user_name in ["alice", "bob", "charlie"]:
yield event_type, user_name
$$ LANGUAGE plpython3u;
Call the function:
SELECT * FROM get_user_events_generator('password_change');
Expected result:
event_type | user_name -----------------+----------- password_change | alice password_change | bob password_change | charlie (3 rows)
Anonymous code blocks
PL/Python supports anonymous code blocks called with the DO statement.
An anonymous code block takes no arguments, and any value it might return is discarded.
Otherwise, it works like function code.
Example:
DO $$
for user in ['alice', 'bob', 'charlie']:
plpy.notice(f'User {user} is active')
$$ LANGUAGE plpython3u;
Expected result:
NOTICE: User alice is active CONTEXT: PL/Python anonymous code block NOTICE: User bob is active CONTEXT: PL/Python anonymous code block NOTICE: User charlie is active CONTEXT: PL/Python anonymous code block
Database access
PL/Python automatically imports a built-in Python module named plpy.
This module provides functions for issuing SQL commands, preparing execution plans, and interacting with the database from within a PL/Python function.
Commonly used functions include:
-
plpy.execute()— executes an SQL command immediately and returns the result. -
plpy.prepare()— prepares the execution plan for a query. -
plpy.cursor()— opens a cursor for incremental result fetching.
PL/Python exposes useful functions not described in this article, such as plpy.debug(), plpy.log(), and plpy.quote_literal().
For further reading, see the PostgreSQL documentation.
CREATE TABLE sales
(
id INT,
product_name TEXT,
category TEXT,
price NUMERIC(8, 2),
sale_date DATE
)
WITH (appendoptimized = true)
DISTRIBUTED BY (id);
INSERT INTO sales(id, product_name, category, price, sale_date)
VALUES (1, 'Laptop', 'Electronics', 1200.00, '2025-01-03'),
(2, 'Headphones', 'Electronics', 150.00, '2025-01-05'),
(3, 'Coffee Maker', 'Home', 85.00, '2025-01-07'),
(4, 'T-Shirt', 'Clothing', 25.00, '2025-01-10'),
(5, 'Desk Chair', 'Home', 200.00, '2025-01-15'),
(6, 'Smartphone', 'Electronics', 800.00, '2025-01-22'),
(7, 'Blender', 'Home', 55.00, '2025-01-28'),
(8, 'Jacket', 'Clothing', 60.00, '2025-02-03'),
(9, 'Laptop', 'Electronics', 1200.00, '2025-02-07'),
(10, 'T-Shirt', 'Clothing', 25.00, '2025-02-10');
Execute a query
The plpy.execute() command executes the given query with the optional limit parameter and returns a result object.
The following example returns up to the requested number of rows from the sales table:
CREATE FUNCTION fetch_sales(limit_count INT)
RETURNS SETOF sales
AS
$$
query = "SELECT id, product_name, category, price, sale_date FROM sales"
rows = plpy.execute(query, limit_count)
return rows
$$ LANGUAGE plpython3u;
Call the function:
SELECT * FROM fetch_sales(3);
The result should look like this:
id | product_name | category | price | sale_date ----+--------------+-------------+---------+------------ 1 | Laptop | Electronics | 1200.00 | 2025-01-03 2 | Headphones | Electronics | 150.00 | 2025-01-05 9 | Laptop | Electronics | 1200.00 | 2025-02-07 (3 rows)
The result object returned by plpy.execute() behaves like a list of row objects, where each row acts similarly to a Python dictionary.
You can access rows by index and retrieve column values by using dictionary-style keys.
The following function demonstrates how to iterate over result rows and format them into a text output:
CREATE FUNCTION fetch_sales(limit_count INT)
RETURNS TEXT
AS
$$
rows = plpy.execute(
"SELECT id, product_name, price, sale_date FROM sales ORDER BY id",
limit_count
)
lines = [
f"{row['id']}: {row['product_name']}, ${row['price']} on {row['sale_date']}"
for row in rows
]
return "\n".join(lines)
$$ LANGUAGE plpython3u;
Call the function:
SELECT fetch_sales(4);
The result should look as follows:
fetch_sales --------------------------------------- 1: Laptop, $1200.00 on 2025-01-03 + 2: Headphones, $150.00 on 2025-01-05 + 3: Coffee Maker, $85.00 on 2025-01-07+ 4: T-Shirt, $25.00 on 2025-01-10 (1 row)
Prepare and execute a query
When your function executes the same SQL statement multiple times, it is more efficient to prepare the query once and then execute it with different parameters.
The plpy.prepare() function creates a prepared execution plan.
The returned plan object can be reused across calls within the same function invocation.
The following example prepares a query that filters rows by category and then executes it with the provided arguments:
CREATE FUNCTION fetch_sales_by_category(category TEXT, limit_count INT)
RETURNS TEXT
AS
$$
plan = plpy.prepare(
"SELECT id, product_name, price, sale_date "
"FROM sales "
"WHERE category = $1 "
"ORDER BY id",
["text"]
)
rows = plpy.execute(plan, [category], limit_count)
lines = [
f"{row['id']}: {row['product_name']} - ${row['price']} on {row['sale_date']}"
for row in rows
]
return "\n".join(lines)
$$ LANGUAGE plpython3u;
Call the function:
SELECT fetch_sales_by_category('Electronics', 5);
Expected result:
fetch_sales_by_category --------------------------------------- 1: Laptop - $1200.00 on 2025-01-03 + 2: Headphones - $150.00 on 2025-01-05+ 6: Smartphone - $800.00 on 2025-01-22+ 9: Laptop - $1200.00 on 2025-02-07 (1 row)
PL/Python provides a special dictionary called SD, which persists for the lifetime of the function.
This allows you to cache objects such as prepared plans so they do not need to be recreated each time the function is called.
This is particularly useful for frequently executed functions, where avoiding repeated plpy.prepare() calls can improve performance.
The following example stores the prepared plan in SD the first time the function runs, and reuses it on later calls:
CREATE FUNCTION fetch_sales_by_category_cached(cat TEXT, limit_count INT)
RETURNS TEXT
AS
$$
if "sales_plan" in SD:
plan = SD["sales_plan"]
else:
plan = plpy.prepare(
"SELECT id, product_name, price, sale_date "
"FROM sales "
"WHERE category = $1 "
"ORDER BY id",
["text"]
)
SD["sales_plan"] = plan
rows = plpy.execute(plan, [cat], limit_count)
lines = [
f"{row['id']}: {row['product_name']} - ${row['price']} on {row['sale_date']}"
for row in rows
]
return "\n".join(lines)
$$ LANGUAGE plpython3u;
Call the function:
SELECT fetch_sales_by_category_cached('Electronics', 5);
Expected result:
fetch_sales_by_category_cached --------------------------------------- 1: Laptop - $1200.00 on 2025-01-03 + 2: Headphones - $150.00 on 2025-01-05+ 6: Smartphone - $800.00 on 2025-01-22+ 9: Laptop - $1200.00 on 2025-02-07 (1 row)
Fetch results in chunks
For large result sets, it is often inefficient or impossible to load all rows into memory at once.
The plpy.cursor() function allows you to execute a query and fetch the results incrementally, retrieving only a subset of rows at a time.
A PL/Python cursor represents a server-side cursor managed by the database backend.
Each call to cursor.fetch(n) retrieves up to n additional rows, advancing the cursor position with each request.
The following function demonstrates how to read a query result in batches and log basic information about each batch:
CREATE FUNCTION fetch_sales_in_batches(batch_size INT)
RETURNS VOID
AS
$$
cursor = plpy.cursor(
"SELECT id, product_name, price, sale_date FROM sales ORDER BY id"
)
batch = 1
while True:
rows = cursor.fetch(batch_size)
if not rows:
break
plpy.info(f"Batch #{batch}, rows returned: {len(rows)}")
batch += 1
$$ LANGUAGE plpython3u;
Call the function:
SELECT fetch_sales_in_batches(3);
Expected result:
INFO: Batch #1, rows returned: 3 CONTEXT: PL/Python function "fetch_sales_in_batches" INFO: Batch #2, rows returned: 3 CONTEXT: PL/Python function "fetch_sales_in_batches" INFO: Batch #3, rows returned: 3 CONTEXT: PL/Python function "fetch_sales_in_batches" INFO: Batch #4, rows returned: 1 CONTEXT: PL/Python function "fetch_sales_in_batches"
Explicit subtransactions
PL/Python also supports explicit subtransactions, which allow you to execute a block of code that can be rolled back independently of the main transaction. This is useful for handling errors or exceptions within a function without aborting the entire transaction. For details and examples, see Explicit Subtransactions.
Use Python modules
PL/Python can import modules installed in the system Python environment used by the database. Learn how to install modules in the Verify the Python environment section.
The following example demonstrates how to use NumPy to compute the shape of an array passed to the function as a text literal:
CREATE FUNCTION get_array_shape(arr TEXT)
RETURNS TEXT
AS
$$
import numpy
import ast
input_array = numpy.array(ast.literal_eval(arr))
return f"Array shape: {input_array.shape}"
$$ LANGUAGE plpython3u;
Call the function:
SELECT get_array_shape('[[1,2,3],[4,5,6]]');
Expected result:
get_array_shape --------------------- Array shape: (2, 3) (1 row)