Hello, I’m DocuDroid!
Submitting feedback
Thank you for rating our AI Search!
We would be grateful if you could share your thoughts so we can improve our AI Search for you and other readers.
GitHub

Work with XML data

Andrey Aksenov

Greengage DB provides the XML data type for storing XML data. It verifies that input values are well-formed XML and can store either full XML documents or XML content fragments. You can use built-in functions and operators to query, transform, and output XML data directly in SQL.

To follow the examples in this topic, connect to the Greengage DB master host using psql as described in Connect to Greengage DB via psql. Create a new database and connect to it:

CREATE DATABASE marketplace;
\c marketplace

Overview

Greengage DB provides support for working with XML data through the XML data type. It enables storing, querying, and exchanging hierarchical or semi-structured data within the database alongside relational tables. Key capabilities include:

  • Storing XML data

    The XML type can hold complete XML documents or content fragments directly in table columns. An XML document must have a single root element, whereas XML content may consist of multiple top-level nodes.

  • Querying and filtering XML data

    Functions such as xpath() and xpath_exists() allow extracting data and checking for the presence of specific elements. Greengage DB supports XPath 1.0 expressions only.

  • Producing XML data

    Functions such as xmlelement(), xmlroot(), and xmlcomment() let you build XML structures from SQL data and format query results as XML documents for client applications.

  • Serializing and deserializing XML data

    xmlparse() converts text into XML values, while xmlserialize() performs the reverse operation — converting XML into a textual representation suitable for exchanging data between applications or storing it outside the database.

Character encoding

Ensure correct character encoding when exchanging XML data between client and server:

  • In text mode, Greengage DB automatically converts character data between the client and server encodings. This includes XML values, so any encoding declaration inside the XML data may no longer match the actual encoding after conversion. To handle this, Greengage DB ignores encoding declarations in XML input and assumes the data is in the current server encoding. Therefore, client applications must send XML data in the current client encoding or convert it to that encoding before sending it to the server. When a client retrieves XML values from the server, no encoding declaration is included in the result; the client should interpret the returned data as being in the current client encoding.

  • In binary mode, no character set conversion is performed. XML encoding declarations are honored, and if missing, the data is assumed to be in UTF-8 (as required by the XML standard). On output, an encoding declaration is included unless the client encoding is UTF-8.

For more details on text and binary modes, see the PostgreSQL documentation on libpq command execution functions.

TIP

XML processing is more efficient and less error-prone when the XML data, client, and server all use the same encoding. Since Greengage DB processes XML internally in UTF-8, using UTF-8 for the server encoding is recommended.

XML data access

The XML data type does not support comparison operators (such as <, >, or =) since there is no standard way to compare two XML documents. As a result, you cannot search for rows by directly comparing XML columns with specific values. To identify XML rows, include a separate key field, such as an ID. If needed, XML values can be cast to text for comparison, but this compares only the textual form, not the XML structure.

Because there are no comparison operators, you cannot create indexes directly on XML columns. You can speed up searches by creating an index on a derived expression, such as a text cast of the XML value or the result of an XPath query. For an example of creating and using such indexes, see Index XML data.

Store XML data

Create a table for XML data

In this example, the customer_profiles table defines two XML columns — one for complete XML documents (profile) and another for XML content fragments (details):

CREATE TABLE customer_profiles
(
    id      SERIAL,
    profile XML,
    details XML
)
    DISTRIBUTED BY (id);

To learn more about a table definition, see Tables overview.

Load XML data

Once the table is created, you can insert XML data using the xmlparse() function. The xmlparse() function converts text into an XML value and allows you to specify whether the input is a complete document (document) or a content fragment (content). In the example below, the profile column stores full XML documents representing customer information, while the details column stores XML content fragments with a top-level tags element containing nested elements:

INSERT INTO customer_profiles (profile, details)
VALUES (xmlparse(document
                 '<customer id="c001">
                    <name>Alice Johnson</name>
                    <email>alice@example.com</email>
                    <active>true</active>
                    <registered>2020-01-15</registered>
                  </customer>'
        ),
        xmlparse(content
                 '<tags>
                    <membership>premium</membership>
                    <membership>newsletter</membership>
                    <region>US</region>
                  </tags>'
        )),
       (xmlparse(document
                 '<customer id="c002">
                    <name>Bob Smith</name>
                    <email>bob@example.com</email>
                    <active>false</active>
                    <registered>2019-06-30</registered>
                  </customer>'
        ),
        xmlparse(content
                 '<tags>
                    <membership>trial</membership>
                    <region>EU</region>
                  </tags>'
        )),
       (xmlparse(document
                 '<customer id="c003">
                    <name>Charlie Brown</name>
                    <email>charlie@example.com</email>
                    <active>true</active>
                  </customer>'
        ),
        NULL);

Greengage DB validates XML data for well-formedness on insert and update. It does not perform DTD or XML Schema (XSD) validation. If the XML is not well-formed, the operation is rejected and an error is returned, as illustrated below. For schema conformance, validation should be handled at the application level.

INSERT INTO customer_profiles (profile, details)
VALUES (xmlparse(document
                 '<customer>
                    <name>John Doe</name>
                    <email>john@example.com</email>'
        ),
        NULL);

The operation produces the following error:

ERROR:  invalid XML document  (seg1 slice1 192.168.1.30:10001 pid=2537)
DETAIL:  line 1: Premature end of data in tag customer line 1
<customer><name>John Doe</name><email>john@example.com</email>
                                                              ^
NOTE

Greengage DB also allows you to transform XML data before loading it into tables. See Transform external data for an example using XSLT or Python to convert XML source data and load it via gpfdist or gpload.

Query and filter XML data

The primary function for retrieving data from XML columns in Greengage DB is xpath(). It returns an array of all nodes that match a given XPath expression. To get a single value, you can select the first element with [1] and cast it to text using ::TEXT.

For example, execute the following query to retrieve the /customer/name and /customer/email elements from the profile column:

SELECT (xpath('/customer/name/text()', profile))[1]::TEXT  AS name,
       (xpath('/customer/email/text()', profile))[1]::TEXT AS email
FROM customer_profiles;

The result should look like this:

     name      |        email
---------------+---------------------
 Alice Johnson | alice@example.com
 Bob Smith     | bob@example.com
 Charlie Brown | charlie@example.com
(3 rows)

The same method applies to XML content fragments. This example also shows extracting /tags/region from the details column:

SELECT (xpath('/customer/name/text()', profile))[1]::TEXT  AS name,
       (xpath('/customer/email/text()', profile))[1]::TEXT AS email,
       (xpath('/tags/region/text()', details))[1]::TEXT    AS region
FROM customer_profiles;

The result looks as follows:

     name      |        email        | region
---------------+---------------------+--------
 Charlie Brown | charlie@example.com |
 Alice Johnson | alice@example.com   | US
 Bob Smith     | bob@example.com     | EU
(3 rows)

You can filter rows based on the presence of an element using xpath_exists():

SELECT (xpath('/customer/name/text()', profile))[1]::TEXT       AS name,
       (xpath('/customer/registered/text()', profile))[1]::TEXT AS registered
FROM customer_profiles
WHERE xpath_exists('/customer/registered', profile);

The result is:

     name      | registered
---------------+------------
 Alice Johnson | 2020-01-15
 Bob Smith     | 2019-06-30
(2 rows)

Learn more about retrieving rows from tables and views in the SELECT command overview topic.

Index XML data

When querying large tables, searches on XML values can be slow because XML columns do not support direct comparison operators. As a result, you cannot create indexes directly on XML columns. To speed up searches, you can use an index expression on a derived value, such as the result of xpath() cast to text.

As index expressions are supported by the Postgres planner only, turn off the GPORCA query optimizer:

SET optimizer = off;

Turn the enable_seqscan option off:

SET enable_seqscan = off;

Next, create an index on the /customer/name element in the profile XML column:

CREATE INDEX customer_name_gin_idx
    ON customer_profiles USING gin (cast(xpath('string(/customer/name)', profile) AS TEXT[]));

Update table statistics:

ANALYZE customer_profiles;

You can then use the indexed expression in a query. For example, to find a customer named Charlie Brown:

EXPLAIN (COSTS OFF)
SELECT *
FROM customer_profiles
WHERE cast(xpath('string(/customer/name)', profile) AS TEXT[]) @> ARRAY['Charlie Brown'];

The output indicates that the query uses the index expression:

                                                            QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
 Gather Motion 4:1  (slice1; segments: 4)
   ->  Bitmap Heap Scan on customer_profiles
         Recheck Cond: ((xpath('string(/customer/name)'::text, profile, '{}'::text[]))::text[] @> '{"Charlie Brown"}'::text[])
         ->  Bitmap Index Scan on customer_name_gin_idx
               Index Cond: ((xpath('string(/customer/name)'::text, profile, '{}'::text[]))::text[] @> '{"Charlie Brown"}'::text[])
 Optimizer: Postgres query optimizer
(6 rows)

Produce XML data

CREATE TABLE customers
(
    id    int,
    name  text,
    email text
)
    WITH (appendoptimized = true)
    DISTRIBUTED BY (id);
INSERT INTO customers(id, name, email)
VALUES (1, 'Alice Johnson', 'alice@example.com'),
       (2, 'Bob Smith', 'bob@example.com'),
       (3, 'Charlie Brown', 'charlie@example.com');

Greengage DB provides a set of functions for producing XML content from SQL data. These functions are particularly suitable for formatting query results into XML documents for processing in client applications.

Use xmlelement() to create an XML element with the specified name, attributes, and content:

SELECT xmlelement(name customer, name) AS xml_customer
FROM customers;

The result looks like this:

            xml_customer
------------------------------------
 <customer>Charlie Brown</customer>
 <customer>Alice Johnson</customer>
 <customer>Bob Smith</customer>
(3 rows)

xmlelement() also supports creating nested elements and adding attributes using xmlattributes():

SELECT xmlelement(
               name customer,
               xmlattributes(id AS id),
               xmlelement(name name, name),
               xmlelement(name email, email)
       ) AS xml_customer
FROM customers;

The result should look like this:

                                       xml_customer
------------------------------------------------------------------------------------------
 <customer id="3"><name>Charlie Brown</name><email>charlie@example.com</email></customer>
 <customer id="1"><name>Alice Johnson</name><email>alice@example.com</email></customer>
 <customer id="2"><name>Bob Smith</name><email>bob@example.com</email></customer>
(3 rows)

To convert the XML value into a text representation suitable for exporting or saving, use xmlserialize():

SELECT xmlserialize(
               content xmlelement(
                name customer,
                xmlattributes(id AS id),
                xmlelement(name name, name),
                xmlelement(name email, email)
                       ) as TEXT
       ) AS xml_text
FROM customers;

The result is:

                                         xml_text
------------------------------------------------------------------------------------------
 <customer id="1"><name>Alice Johnson</name><email>alice@example.com</email></customer>
 <customer id="3"><name>Charlie Brown</name><email>charlie@example.com</email></customer>
 <customer id="2"><name>Bob Smith</name><email>bob@example.com</email></customer>
(3 rows)

See also Produce XML content for functions to construct XML and Map tables to XML for generating XML from tables and queries.

Reference: XML functions and predicates

Serialize/deserialize XML

xmlparse()

Function: xmlparse({ document | content } <value>)

Return type: XML

Produces XML from character data. The first argument specifies the type of <value>:

  • document — XML document.

  • content — XML content fragment.

Example:

SELECT xmlparse(document
                '<customer id="c001">
                   <name>Alice Johnson</name>
                   <email>alice@example.com</email>
                 </customer>'
       );

Result:

                      xmlparse
-----------------------------------------------------
 <customer id="c001">                               +
                    <name>Alice Johnson</name>      +
                    <email>alice@example.com</email>+
                  </customer>
(1 row)

You can also use ::XML — a PostgreSQL-specific conversion:

SELECT '<customer id="c001">
    <name>Alice Johnson</name>
    <email>alice@example.com</email>
</customer>'::XML;

The XML type conversion does not validate input values against a document type declaration (DTD) or other XML schema languages.

xmlserialize()

Function: xmlserialize({ document | content } <value> AS <type>)

Return type: <type>

Creates a character string value from XML. The first argument specifies the type of <value>:

  • document — XML document.

  • content — XML content fragment.

<type> can be the CHARACTER, CHARACTER VARYING, or TEXT data type.

Example:

SELECT xmlserialize(content xmlelement(name customer,
                                       xmlelement(name name, 'Alice Johnson'),
                                       xmlelement(name email, 'alice@example.com'))
           as TEXT
       ) AS xml_text;

Result:

                                    xml_text
---------------------------------------------------------------------------------
 <customer><name>Alice Johnson</name><email>alice@example.com</email></customer>
(1 row)

Produce XML content

NOTE

Binary values (type BYTEA) are encoded in XML according to the xmlbinary option, either base64 or hex. This applies when functions like xmlelement() or xmlforest() convert BYTEA to XML. The choice mainly depends on client application requirements.

xmlcomment()

Function: xmlcomment(<text>)

Return type: XML

Creates an XML comment node containing the specified text.

Example:

SELECT xmlcomment('Hello, world!');

Result:

      xmlcomment
----------------------
 <!--Hello, world!-->
(1 row)

xmlelement()

Function: xmlelement(name <name> [, xmlattributes(<value> [AS <attname>] [, …​])] [, <content>, …​])

Return type: XML

Creates an XML element with the specified name, attributes, and content.

Example 1:

SELECT xmlelement(name email, 'alice@example.com');

Result:

            xmlelement
----------------------------------
 <email>alice@example.com</email>
(1 row)

Example 2:

SELECT xmlelement(name customer,
                  xmlattributes('c001' AS id),
                  xmlelement(name name, 'Alice Johnson'),
                  xmlelement(name email, 'alice@example.com'));

Result:

                                        xmlelement
-------------------------------------------------------------------------------------------
 <customer id="c001"><name>Alice Johnson</name><email>alice@example.com</email></customer>
(1 row)

xmlforest()

Function: xmlforest(<content> [AS <name>] [, …​])

Return type: XML

Produces an XML forest (sequence) of elements using the given names and content.

Example:

SELECT xmlforest('Alice Johnson' AS name,
                 'alice@example.com' AS email,
                 'true' AS active);

Result:

                                    xmlforest
---------------------------------------------------------------------------------
 <name>Alice Johnson</name><email>alice@example.com</email><active>true</active>
(1 row)

xmlroot()

Function: xmlroot(<xml>, version <text> | no value [, standalone yes|no|no value])

Return type: XML

Alters the properties of the root node of an XML value. If version is specified, it replaces the value in the root node’s version declaration. If standalone is specified, it replaces the value in the root node’s standalone declaration.

Example:

SELECT xmlroot(xmlparse(document '<?xml version="1.1"?><content>abc</content>'),
               version '1.0', standalone yes);

Result:

                           xmlroot
--------------------------------------------------------------
 <?xml version="1.0" standalone="yes"?><content>abc</content>
(1 row)

xmlagg()

Function: xmlagg(<xml>)

Return type: XML

Aggregates XML values from multiple rows into a single XML value by concatenating them.

Example:

CREATE TABLE customers
(
    id      SERIAL,
    details XML
)
    WITH (appendoptimized = true)
    DISTRIBUTED BY (id);
INSERT INTO customers(details)
VALUES (xmlelement(name customer, xmlelement(name name, 'Alice Johnson'))),
       (xmlelement(name customer, xmlelement(name name, 'Charlie Brown')));
SELECT xmlagg(details)
FROM customers;

Result:

                                             xmlagg
------------------------------------------------------------------------------------------------
 <customer><name>Alice Johnson</name></customer><customer><name>Charlie Brown</name></customer>
(1 row)

XML predicates

NOTE

The xmloption parameter controls whether text is treated as an XML document or as an XML content fragment. It affects functions like xml_is_well_formed(), determining whether the XML is valid as a full document or as a content fragment.

IS DOCUMENT / IS NOT DOCUMENT

Expression: <xml> IS DOCUMENT, <xml> IS NOT DOCUMENT

Return type: BOOLEAN

Determines if an XML value is an XML document or content fragment.

Example:

SELECT xmlparse(document
                '<customer id="c001">
                   <name>Alice Johnson</name>
                   <email>alice@example.com</email>
                 </customer>'
       ) IS DOCUMENT AS is_doc;

Result:

 is_doc
--------
 t
(1 row)

xmlexists()

Function: xmlexists(<text> PASSING [BY REF] <xml> [BY REF])

Return type: BOOLEAN

Determines whether the specified XPath expression returns any nodes from the given XML value. Returns true if matching nodes are found, false if none are found, and null if either argument is null.

Example:

SELECT xmlexists('//customer[name="Bob Smith"]' PASSING BY REF
                 '
                 <customers>
                     <customer>
                         <name>Bob Smith</name>
                     </customer>
                 </customers>');

Result:

 xmlexists
-----------
 t
(1 row)

xml_is_well_formed()

Function: xml_is_well_formed(<text>)

Return type: BOOLEAN

Checks whether a text string is a well-formed XML document or content fragment. The behavior depends on the xmloption configuration parameter: when set to document, the function checks for a well-formed XML document; when set to content, it checks for a well-formed XML content fragment.

Example 1:

SELECT xml_is_well_formed('<customer><name>Alice Johnson</name></customer>');

Result:

 xml_is_well_formed
--------------------
 t
(1 row)

Example 2:

SELECT xml_is_well_formed('<customer><name>Alice Johnson</name>');

Result:

 xml_is_well_formed
--------------------
 f
(1 row)

xml_is_well_formed_document()

Function: xml_is_well_formed_document(<text>)

Return type: BOOLEAN

Checks whether a text string is a well-formed XML document.

Example:

SELECT xml_is_well_formed_document('<customer><name>Bob Smith</name></customer>');

Result:

 xml_is_well_formed_document
-----------------------------
 t
(1 row)

xml_is_well_formed_content()

Function: xml_is_well_formed_content(<text>)

Return type: BOOLEAN

Checks whether a text string is a well-formed XML content fragment.

Example:

SELECT xml_is_well_formed_content('<name>Charlie Brown</name>');

Result:

 xml_is_well_formed_content
----------------------------
 t
(1 row)

Process XML

xpath()

Function: xpath(<xpath>, <xml> [, <nsarray>])

Return type: XML[]

Returns an array of XML values corresponding to the XPath expression. If the XPath expression returns a scalar value, xpath() returns a single-element array.

The second argument must be a well-formed XML document with a single root element.

The optional third argument is an array of namespace mappings. It should be a two-dimensional text array where each subarray has exactly two elements: the first is the namespace alias, the second is the namespace URI. Aliases do not need to match those used in the XML document; they are local to the xpath() function.

Example:

SELECT (xpath('/customer/email/text()',
              '<customer>
                   <name>Alice Johnson</name>
                   <email>alice@example.com</email>
               </customer>'
        ))[1]::TEXT AS email;

Result:

       email
-------------------
 alice@example.com
(1 row)

xpath_exists()

Function: xpath_exists(<xpath>, <xml> [, <nsarray>])

Return type: BOOLEAN

Determines whether the specified XPath expression matches any nodes in the given XML value. This function behaves like the xmlexists() predicate, but also supports an optional namespace mapping argument.

Example 1:

SELECT xpath_exists('/customer/name',
                    '<customer>
                         <name>Alice Johnson</name>
                         <email>alice@example.com</email>
                     </customer>');

Result:

 xpath_exists
--------------
 t
(1 row)

Example 2:

SELECT xpath_exists('/cust:customer/cust:name',
                    '<cust:customer xmlns:cust="https://example.com/other">
                         <cust:name>Alice Johnson</cust:name>
                         <cust:email>alice@example.com</cust:email>
                     </cust:customer>',
                    ARRAY [ARRAY ['cust', 'https://example.com/customer']]);

Result:

 xpath_exists
--------------
 f
(1 row)

Map tables to XML

table_to_xml()

Function: table_to_xml(<table> REGCLASS, <nulls> BOOLEAN, <tableforest> BOOLEAN, <targetns> TEXT)

Return type: XML

Generates an XML representation of a table specified by the <table> parameter. Other arguments include:

  • The <nulls> parameter determines whether null values should be included in the output.

  • If the <tableforest> parameter is false, the resulting XML document has additional <row> nodes.

  • The <targetns> parameter specifies the XML namespace. If you do not need to define a namespace, pass an empty string.

Example:

CREATE TABLE customers
(
    id    int,
    name  text,
    email text
)
    WITH (appendoptimized = true)
    DISTRIBUTED BY (id);
INSERT INTO customers(id, name, email)
VALUES (1, 'Alice Johnson', 'alice@example.com'),
       (2, 'Bob Smith', 'bob@example.com'),
       (3, 'Charlie Brown', 'charlie@example.com');
SELECT table_to_xml('customers', true, false, '');

Result:

                           table_to_xml
-------------------------------------------------------------------
 <customers xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">+
                                                                  +
 <row>                                                            +
   <id>2</id>                                                     +
   <name>Bob Smith</name>                                         +
   <email>bob@example.com</email>                                 +
 </row>                                                           +
                                                                  +
 <row>                                                            +
   <id>3</id>                                                     +
   <name>Charlie Brown</name>                                     +
   <email>charlie@example.com</email>                             +
 </row>                                                           +
                                                                  +
 <row>                                                            +
   <id>1</id>                                                     +
   <name>Alice Johnson</name>                                     +
   <email>alice@example.com</email>                               +
 </row>                                                           +
                                                                  +
 </customers>                                                     +

(1 row)

query_to_xml()

Function: query_to_xml(<query> TEXT, <nulls> BOOLEAN, <tableforest> BOOLEAN, <targetns> TEXT)

Return type: XML

Generates an XML representation of the result set produced by the SQL statement specified in the <query> parameter. Unlike table_to_xml(), this function allows you to convert the output of an arbitrary query rather than a single table.

See also: table_to_xml().

Example:

SELECT query_to_xml('SELECT id, name FROM customers', true, false, '');

Result:

                         query_to_xml
---------------------------------------------------------------
 <table xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">+
                                                              +
 <row>                                                        +
   <id>3</id>                                                 +
   <name>Charlie Brown</name>                                 +
 </row>                                                       +
                                                              +
 <row>                                                        +
   <id>1</id>                                                 +
   <name>Alice Johnson</name>                                 +
 </row>                                                       +
                                                              +
 <row>                                                        +
   <id>2</id>                                                 +
   <name>Bob Smith</name>                                     +
 </row>                                                       +
                                                              +
 </table>                                                     +

(1 row)

cursor_to_xml()

Function: cursor_to_xml(<cursor> REFCURSOR, <count> INTEGER, <nulls> BOOLEAN, <tableforest> BOOLEAN, <targetns> TEXT)

Return type: XML

Generates an XML representation of rows fetched from the specified <cursor>. The <count> parameter determines how many rows to fetch. This function is recommended for large result sets, as it processes the data incrementally rather than building the entire XML document in memory at once.

See also: table_to_xml().

Example:

BEGIN;

DECLARE customer_data_cur CURSOR FOR
    SELECT id, name
    FROM customers
    ORDER BY id;

SELECT cursor_to_xml('customer_data_cur' ::refcursor, 2, true, true, '');

CLOSE customer_data_cur;

COMMIT;

Result:

                        cursor_to_xml
-------------------------------------------------------------
 <row xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">+
   <id>1</id>                                               +
   <name>Alice Johnson</name>                               +
 </row>                                                     +
                                                            +
 <row xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">+
   <id>2</id>                                               +
   <name>Bob Smith</name>                                   +
 </row>                                                     +
                                                            +

(1 row)