Use PXF JDBC connector to read and write data between Greengage DB and PostgreSQL
This example demonstrates how to configure and use the PXF JDBC connector for reading and writing data between Greengage DB and PostgreSQL by using external tables.
To try it out, make sure you have a PostgreSQL server running and accessible in addition to the Greengage DB server.
In the example, the PostgreSQL 14 server is running on the host named pserver, and the Greengage DB cluster comprises four hosts:
-
mdw— the master host; -
smdw— the standby master host; -
sdw1andsdw2— the segment hosts.
Create source PostgreSQL tables
-
On the PostgreSQL host, connect to the default PostgreSQL database as the
postgresuser:$ psql -U postgres -
Create a database named
customersand connect to it:CREATE DATABASE customers; \c customers -
Create a table named
customersand insert data into it:CREATE TABLE customers ( id INTEGER, name TEXT, city TEXT ); INSERT INTO customers (id, name, city) VALUES (1, 'Alice Smith', 'New York'), (2, 'Bob Johnson', 'Los Angeles'), (3, 'Charlie Brown', 'Chicago'), (4, 'David Lee', 'San Francisco'), (5, 'Emily Davis', 'Houston'), (6, 'Frank Wilson', 'Phoenix'), (7, 'Grace Taylor', 'Philadelphia'), (8, 'Henry Moore', 'Dallas'), (9, 'Isabella Garcia', 'Seattle'), (10, 'Jack Rodriguez', 'Denver'); -
Create a table named
orders:CREATE TABLE orders ( order_id INTEGER, customer_id INTEGER, order_date DATE, amount DECIMAL(10, 2) ); -
Create a PostgreSQL user named
pxfuser:CREATE USER pxfuser WITH PASSWORD 'changeme'; -
Assign
pxfuserall privileges on thecustomersand theorderstables, which will allow the user to read from and write data to the tables. Then exitpsql:GRANT ALL ON customers, orders TO pxfuser; \q -
Add the following content to the PostgreSQL configuration file (pg_hba.conf) to allow
pxfuserto access thecustomersdatabase from each Greengage DB host:host all all mdw trust host all all smdw trust host all all sdw1 trust host all all sdw2 trust -
Reload the PostgreSQL configuration:
$ sudo pg_ctlcluster 14 main reload
Configure the JDBC connector
To have PXF connect to PostgreSQL, you need to create a JDBC server configuration for PostgreSQL as described in Configure a PXF server and then synchronize it to the Greengage DB cluster.
-
Log in to the Greengage DB master host as
gpadmin. -
Go to the $PXF_BASE/servers directory and create a JDBC server configuration directory named postgres-server:
$ cd $PXF_BASE/servers $ mkdir postgres-server -
Copy the jdbc-site.xml server configuration template file from the $PXF_HOME/templates/ directory to the created postgres-server directory:
$ cp $PXF_HOME/templates/jdbc-site.xml $PXF_BASE/servers/postgres-server -
In the jdbc-site.xml file, provide the following configuration specifying
jdbc.driver,jdbc.url,jdbc.user, andjdbc.password:<?xml version="1.0" encoding="UTF-8"?> <configuration> <property> <name>jdbc.driver</name> <value>org.postgresql.Driver</value> <description>Class name of the JDBC driver (e.g. org.postgresql.Driver)</description> </property> <property> <name>jdbc.url</name> <value>jdbc:postgresql://pserver:5432/customers</value> <description>The URL that the JDBC driver can use to connect to the database (e.g. jdbc:postgresql://localhost/postgres)</description> </property> <property> <name>jdbc.user</name> <value>pxfuser</value> <description>User name for connecting to the database (e.g. postgres)</description> </property> <property> <name>jdbc.password</name> <value>changeme</value> <description>Password for connecting to the database (e.g. postgres)</description> </property> </configuration> -
Synchronize the PXF server configuration to the Greengage DB cluster:
$ pxf cluster sync
Create a readable external table
-
On the Greengage DB master host, create a readable PXF external table that references the earlier created
customersPostgreSQL table. In theLOCATIONclause, specify the remote table to connect to (customers), the profile used (jdbc), and the remote PostgreSQL server (postgres-server). In theFORMATclause, specifypxfwritable_import, which is the built-in custom formatter function for read operations:CREATE EXTERNAL TABLE pxf_customers_r ( id INTEGER, name TEXT, city TEXT ) LOCATION ('pxf://customers?PROFILE=jdbc&SERVER=postgres-server') FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import'); -
Query the created external table:
SELECT * FROM pxf_customers_r;The output should look as follows:
id | name | city ----+-----------------+--------------- 1 | Alice Smith | New York 2 | Bob Johnson | Los Angeles 3 | Charlie Brown | Chicago 4 | David Lee | San Francisco 5 | Emily Davis | Houston 6 | Frank Wilson | Phoenix 7 | Grace Taylor | Philadelphia 8 | Henry Moore | Dallas 9 | Isabella Garcia | Seattle 10 | Jack Rodriguez | Denver (10 rows)
Create a writable external table
-
On the Greengage DB master host, create a writable PXF external table that references the earlier created
ordersPostgreSQL table. In theLOCATIONclause, specify the remote table to connect to (orders), the profile used (jdbc), and the remote PostgreSQL server (postgres-server). In theFORMATclause, specifypxfwritable_export, which is the built-in custom formatter function for write operations:CREATE WRITABLE EXTERNAL TABLE pxf_orders_w ( order_id INTEGER, customer_id INTEGER, order_date DATE, amount DECIMAL(10, 2) ) LOCATION ('pxf://orders?PROFILE=jdbc&SERVER=postgres-server') FORMAT 'CUSTOM' (FORMATTER='pxfwritable_export'); -
Insert data into the
pxf_orders_wtable:INSERT INTO pxf_orders_w (order_id, customer_id, order_date, amount) VALUES (101, 1, '2023-01-15', 75.50), (102, 2, '2023-01-20', 120.00), (103, 1, '2023-10-10', 90.25), (104, 3, '2023-02-28', 60.75), (105, 2, '2023-03-05', 150.00), (106, 4, '2023-03-12', 85.00), (107, 5, '2023-05-18', 110.75), (108, 6, '2023-04-01', 95.20), (109, 7, '2023-06-15', 135.50), (110, 8, '2023-04-22', 70.00), (111, 9, '2023-05-01', 105.00), (112, 10, '2023-07-08', 65.75), (113, 1, '2023-05-15', 80.50), (114, 2, '2023-09-22', 140.00), (115, 3, '2023-11-01', 72.25); -
On the PostgreSQL server, query the initially created source
orderstable to view the data:SELECT * FROM orders ORDER BY order_id;The output should look as follows:
order_id | customer_id | order_date | amount ----------+-------------+------------+-------- 101 | 1 | 2023-01-15 | 75.50 102 | 2 | 2023-01-20 | 120.00 103 | 1 | 2023-10-10 | 90.25 104 | 3 | 2023-02-28 | 60.75 105 | 2 | 2023-03-05 | 150.00 106 | 4 | 2023-03-12 | 85.00 107 | 5 | 2023-05-18 | 110.75 108 | 6 | 2023-04-01 | 95.20 109 | 7 | 2023-06-15 | 135.50 110 | 8 | 2023-04-22 | 70.00 111 | 9 | 2023-05-01 | 105.00 112 | 10 | 2023-07-08 | 65.75 113 | 1 | 2023-05-15 | 80.50 114 | 2 | 2023-09-22 | 140.00 115 | 3 | 2023-11-01 | 72.25 (15 rows)
Create a named query
-
Log in to the Greengage DB master host as
gpadmin. -
Go to the $PXF_BASE/servers/postgres-server JDBC server configuration directory and create a text file named order_report.sql having the following content:
SELECT c.name, c.city, SUM(o.amount) AS total, EXTRACT(MONTH FROM o.order_date) AS order_month FROM customers c JOIN orders o ON c.id = o.customer_id GROUP BY c.name, c.city, EXTRACT(MONTH FROM o.order_date) ORDER BY c.city, total;This query retrieves customer names and their cities with the total amount of all orders placed by each customer in a specific month. The data is aggregated per customer and order month, providing a monthly summary of order totals for each customer. The results are then sorted first by city and then by the total order amount.
-
Synchronize the PXF server configuration to the Greengage DB cluster:
$ pxf cluster sync -
Create a readable PXF external table that references the created query file. In the
LOCATIONclause, specify the query name (order_report), the profile used (jdbc), and the remote PostgreSQL server (postgres-server). ThePARTITION_BYclause is used for dividing the data into partitions based on theorder_monthcolumn, with an interval of three months per partition. In theFORMATclause, specifypxfwritable_import, which is the built-in custom formatter function for read operations:CREATE EXTERNAL TABLE pxf_query_order_report ( name text, city text, total int, order_month int ) LOCATION ('pxf://query:order_report?PROFILE=jdbc&SERVER=postgres-server&PARTITION_BY=order_month:int&RANGE=1:12&INTERVAL=3') FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import'); -
Run several example queries on the created external table and view the results:
-
View all data returned by the query:
SELECT * FROM pxf_query_order_report;The output should look as follows:
name | city | total | order_month -----------------+---------------+-------+------------- Henry Moore | Dallas | 70 | 4 Emily Davis | Houston | 110 | 5 Alice Smith | New York | 80 | 5 Grace Taylor | Philadelphia | 135 | 6 Frank Wilson | Phoenix | 95 | 4 Isabella Garcia | Seattle | 105 | 5 Charlie Brown | Chicago | 60 | 2 Bob Johnson | Los Angeles | 120 | 1 Bob Johnson | Los Angeles | 150 | 3 Alice Smith | New York | 75 | 1 David Lee | San Francisco | 85 | 3 Charlie Brown | Chicago | 72 | 11 Alice Smith | New York | 90 | 10 Jack Rodriguez | Denver | 65 | 7 Bob Johnson | Los Angeles | 140 | 9 (15 rows)
-
View the cities with the total ordered amount exceeding 100:
SELECT city, sum(total) FROM pxf_query_order_report WHERE total > 100 GROUP BY city;The output should look as follows:
city | sum --------------+----- Philadelphia | 135 Los Angeles | 410 Seattle | 105 Houston | 110 (4 rows)
-
View top 5 customers by ordered amount:
SELECT name, SUM(total) AS total_ordered FROM pxf_query_order_report GROUP BY name ORDER BY total_ordered DESC LIMIT 5;The output should look as follows:
name | total_ordered ---------------+--------------- Bob Johnson | 410 Alice Smith | 245 Grace Taylor | 135 Charlie Brown | 132 Emily Davis | 110 (5 rows)
-
View total ordered amount by month:
SELECT order_month, SUM(total) AS monthly_total FROM pxf_query_order_report GROUP BY order_month ORDER BY order_month;The output should look as follows:
order_month | monthly_total -------------+--------------- 1 | 195 2 | 60 3 | 235 4 | 165 5 | 295 6 | 135 7 | 65 9 | 140 10 | 90 11 | 72 (10 rows)
-