Hello, I’m DocuDroid!
Submitting feedback
Thank you for rating our AI Search!
We would be grateful if you could share your thoughts so we can improve our AI Search for you and other readers.
GitHub

Use PXF JDBC connector to read and write data between Greengage DB and PostgreSQL

Anton Monakov

This example demonstrates how to configure and use the PXF JDBC connector for reading and writing data between Greengage DB and PostgreSQL by using external tables. To try it out, make sure you have a PostgreSQL server running and accessible in addition to the Greengage DB server. In the example, the PostgreSQL 14 server is running on the host named pserver, and the Greengage DB cluster comprises four hosts:

  • mdw — the master host;

  • smdw — the standby master host;

  • sdw1 and sdw2 — the segment hosts.

Create source PostgreSQL tables

  1. On the PostgreSQL host, connect to the default PostgreSQL database as the postgres user:

    $ psql -U postgres
  2. Create a database named customers and connect to it:

    CREATE DATABASE customers;
    \c customers
  3. Create a table named customers and insert data into it:

    CREATE TABLE customers (
        id INTEGER,
        name TEXT,
        city TEXT
    );
    
    INSERT INTO customers (id, name, city) 
    VALUES (1, 'Alice Smith', 'New York'),
           (2, 'Bob Johnson', 'Los Angeles'),
           (3, 'Charlie Brown', 'Chicago'),
           (4, 'David Lee', 'San Francisco'),
           (5, 'Emily Davis', 'Houston'),
           (6, 'Frank Wilson', 'Phoenix'),
           (7, 'Grace Taylor', 'Philadelphia'),
           (8, 'Henry Moore', 'Dallas'),
           (9, 'Isabella Garcia', 'Seattle'),
           (10, 'Jack Rodriguez', 'Denver');
  4. Create a table named orders:

    CREATE TABLE orders (
        order_id INTEGER,
        customer_id INTEGER,
        order_date DATE,
        amount DECIMAL(10, 2)
    );
  5. Create a PostgreSQL user named pxfuser:

    CREATE USER pxfuser WITH PASSWORD 'changeme';
  6. Assign pxfuser all privileges on the customers and the orders tables, which will allow the user to read from and write data to the tables. Then exit psql:

    GRANT ALL ON customers, orders TO pxfuser;
    \q
  7. Add the following content to the PostgreSQL configuration file (pg_hba.conf) to allow pxfuser to access the customers database from each Greengage DB host:

    host    all             all             mdw            trust
    host    all             all             smdw           trust
    host    all             all             sdw1           trust
    host    all             all             sdw2           trust
  8. Reload the PostgreSQL configuration:

    $ sudo pg_ctlcluster 14 main reload

Configure the JDBC connector

To have PXF connect to PostgreSQL, you need to create a JDBC server configuration for PostgreSQL as described in Configure a PXF server and then synchronize it to the Greengage DB cluster.

  1. Log in to the Greengage DB master host as gpadmin.

  2. Go to the $PXF_BASE/servers directory and create a JDBC server configuration directory named postgres-server:

    $ cd $PXF_BASE/servers
    $ mkdir postgres-server
  3. Copy the jdbc-site.xml server configuration template file from the $PXF_HOME/templates/ directory to the created postgres-server directory:

    $ cp $PXF_HOME/templates/jdbc-site.xml $PXF_BASE/servers/postgres-server
  4. In the jdbc-site.xml file, provide the following configuration specifying jdbc.driver, jdbc.url, jdbc.user, and jdbc.password:

    <?xml version="1.0" encoding="UTF-8"?>
    <configuration>
        <property>
            <name>jdbc.driver</name>
            <value>org.postgresql.Driver</value>
            <description>Class name of the JDBC driver (e.g. org.postgresql.Driver)</description>
        </property>
        <property>
            <name>jdbc.url</name>
            <value>jdbc:postgresql://pserver:5432/customers</value>
            <description>The URL that the JDBC driver can use to connect to the database (e.g. jdbc:postgresql://localhost/postgres)</description>
        </property>
        <property>
            <name>jdbc.user</name>
            <value>pxfuser</value>
            <description>User name for connecting to the database (e.g. postgres)</description>
        </property>
        <property>
            <name>jdbc.password</name>
            <value>changeme</value>
            <description>Password for connecting to the database (e.g. postgres)</description>
        </property>
    </configuration>
  5. Synchronize the PXF server configuration to the Greengage DB cluster:

    $ pxf cluster sync

Create a readable external table

  1. On the Greengage DB master host, create a readable PXF external table that references the earlier created customers PostgreSQL table. In the LOCATION clause, specify the remote table to connect to (customers), the profile used (jdbc), and the remote PostgreSQL server (postgres-server). In the FORMAT clause, specify pxfwritable_import, which is the built-in custom formatter function for read operations:

    CREATE EXTERNAL TABLE pxf_customers_r (
            id INTEGER,
            name TEXT,
            city TEXT
        )
        LOCATION ('pxf://customers?PROFILE=jdbc&SERVER=postgres-server')
        FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import');
  2. Query the created external table:

    SELECT * FROM pxf_customers_r;

    The output should look as follows:

     id |      name       |     city
    ----+-----------------+---------------
      1 | Alice Smith     | New York
      2 | Bob Johnson     | Los Angeles
      3 | Charlie Brown   | Chicago
      4 | David Lee       | San Francisco
      5 | Emily Davis     | Houston
      6 | Frank Wilson    | Phoenix
      7 | Grace Taylor    | Philadelphia
      8 | Henry Moore     | Dallas
      9 | Isabella Garcia | Seattle
     10 | Jack Rodriguez  | Denver
    (10 rows)

Create a writable external table

  1. On the Greengage DB master host, create a writable PXF external table that references the earlier created orders PostgreSQL table. In the LOCATION clause, specify the remote table to connect to (orders), the profile used (jdbc), and the remote PostgreSQL server (postgres-server). In the FORMAT clause, specify pxfwritable_export, which is the built-in custom formatter function for write operations:

    CREATE WRITABLE EXTERNAL TABLE pxf_orders_w ( 
            order_id INTEGER,
            customer_id INTEGER,
            order_date DATE,
            amount DECIMAL(10, 2)
        )
        LOCATION ('pxf://orders?PROFILE=jdbc&SERVER=postgres-server')
        FORMAT 'CUSTOM' (FORMATTER='pxfwritable_export');
  2. Insert data into the pxf_orders_w table:

    INSERT INTO pxf_orders_w (order_id, customer_id, order_date, amount) 
    VALUES (101, 1, '2023-01-15', 75.50),
           (102, 2, '2023-01-20', 120.00),
           (103, 1, '2023-10-10', 90.25),
           (104, 3, '2023-02-28', 60.75),
           (105, 2, '2023-03-05', 150.00),
           (106, 4, '2023-03-12', 85.00),
           (107, 5, '2023-05-18', 110.75),
           (108, 6, '2023-04-01', 95.20),
           (109, 7, '2023-06-15', 135.50),
           (110, 8, '2023-04-22', 70.00),
           (111, 9, '2023-05-01', 105.00),
           (112, 10, '2023-07-08', 65.75),
           (113, 1, '2023-05-15', 80.50),
           (114, 2, '2023-09-22', 140.00),
           (115, 3, '2023-11-01', 72.25);  
  3. On the PostgreSQL server, query the initially created source orders table to view the data:

    SELECT * FROM orders ORDER BY order_id;

    The output should look as follows:

     order_id | customer_id | order_date | amount
    ----------+-------------+------------+--------
          101 |           1 | 2023-01-15 |  75.50
          102 |           2 | 2023-01-20 | 120.00
          103 |           1 | 2023-10-10 |  90.25
          104 |           3 | 2023-02-28 |  60.75
          105 |           2 | 2023-03-05 | 150.00
          106 |           4 | 2023-03-12 |  85.00
          107 |           5 | 2023-05-18 | 110.75
          108 |           6 | 2023-04-01 |  95.20
          109 |           7 | 2023-06-15 | 135.50
          110 |           8 | 2023-04-22 |  70.00
          111 |           9 | 2023-05-01 | 105.00
          112 |          10 | 2023-07-08 |  65.75
          113 |           1 | 2023-05-15 |  80.50
          114 |           2 | 2023-09-22 | 140.00
          115 |           3 | 2023-11-01 |  72.25
    (15 rows)

Create a named query

  1. Log in to the Greengage DB master host as gpadmin.

  2. Go to the $PXF_BASE/servers/postgres-server JDBC server configuration directory and create a text file named order_report.sql having the following content:

    SELECT
        c.name,
        c.city,
        SUM(o.amount) AS total,
        EXTRACT(MONTH FROM o.order_date) AS order_month
    FROM
        customers c
    JOIN
        orders o ON c.id = o.customer_id
    GROUP BY
        c.name, c.city, EXTRACT(MONTH FROM o.order_date)
    ORDER BY
        c.city, total;

    This query retrieves customer names and their cities with the total amount of all orders placed by each customer in a specific month. The data is aggregated per customer and order month, providing a monthly summary of order totals for each customer. The results are then sorted first by city and then by the total order amount.

  3. Synchronize the PXF server configuration to the Greengage DB cluster:

    $ pxf cluster sync
  4. Create a readable PXF external table that references the created query file. In the LOCATION clause, specify the query name (order_report), the profile used (jdbc), and the remote PostgreSQL server (postgres-server). The PARTITION_BY clause is used for dividing the data into partitions based on the order_month column, with an interval of three months per partition. In the FORMAT clause, specify pxfwritable_import, which is the built-in custom formatter function for read operations:

    CREATE EXTERNAL TABLE pxf_query_order_report (
            name text,
            city text,
            total int,
            order_month int
        )
        LOCATION ('pxf://query:order_report?PROFILE=jdbc&SERVER=postgres-server&PARTITION_BY=order_month:int&RANGE=1:12&INTERVAL=3')
        FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import');
  5. Run several example queries on the created external table and view the results:

    • View all data returned by the query:

      SELECT * FROM pxf_query_order_report;

      The output should look as follows:

            name       |     city      | total | order_month
      -----------------+---------------+-------+-------------
       Henry Moore     | Dallas        |    70 |           4
       Emily Davis     | Houston       |   110 |           5
       Alice Smith     | New York      |    80 |           5
       Grace Taylor    | Philadelphia  |   135 |           6
       Frank Wilson    | Phoenix       |    95 |           4
       Isabella Garcia | Seattle       |   105 |           5
       Charlie Brown   | Chicago       |    60 |           2
       Bob Johnson     | Los Angeles   |   120 |           1
       Bob Johnson     | Los Angeles   |   150 |           3
       Alice Smith     | New York      |    75 |           1
       David Lee       | San Francisco |    85 |           3
       Charlie Brown   | Chicago       |    72 |          11
       Alice Smith     | New York      |    90 |          10
       Jack Rodriguez  | Denver        |    65 |           7
       Bob Johnson     | Los Angeles   |   140 |           9
      (15 rows)
    • View the cities with the total ordered amount exceeding 100:

      SELECT city, sum(total) FROM pxf_query_order_report
          WHERE total > 100
          GROUP BY city;

      The output should look as follows:

           city     | sum
      --------------+-----
       Philadelphia | 135
       Los Angeles  | 410
       Seattle      | 105
       Houston      | 110
      (4 rows)
    • View top 5 customers by ordered amount:

      SELECT name, SUM(total) AS total_ordered FROM pxf_query_order_report
          GROUP BY name
          ORDER BY total_ordered DESC
          LIMIT 5;

      The output should look as follows:

           name      | total_ordered
      ---------------+---------------
       Bob Johnson   |           410
       Alice Smith   |           245
       Grace Taylor  |           135
       Charlie Brown |           132
       Emily Davis   |           110
      (5 rows)
    • View total ordered amount by month:

      SELECT order_month, SUM(total) AS monthly_total FROM pxf_query_order_report
          GROUP BY order_month
          ORDER BY order_month;

      The output should look as follows:

       order_month | monthly_total
      -------------+---------------
                 1 |           195
                 2 |            60
                 3 |           235
                 4 |           165
                 5 |           295
                 6 |           135
                 7 |            65
                 9 |           140
                10 |            90
                11 |            72
      (10 rows)