Work with XML data
Greengage DB provides the XML data type for storing XML data.
It verifies that input values are well-formed XML and can store either full XML documents or XML content fragments.
You can use built-in functions and operators to query, transform, and output XML data directly in SQL.
To follow the examples in this topic, connect to the Greengage DB master host using psql as described in Connect to Greengage DB via psql.
Create a new database and connect to it:
CREATE DATABASE marketplace;
\c marketplace
Overview
Greengage DB provides support for working with XML data through the XML data type.
It enables storing, querying, and exchanging hierarchical or semi-structured data within the database alongside relational tables.
Key capabilities include:
-
Storing XML data
The
XMLtype can hold complete XML documents or content fragments directly in table columns. An XML document must have a single root element, whereas XML content may consist of multiple top-level nodes. -
Querying and filtering XML data
Functions such as
xpath()andxpath_exists()allow extracting data and checking for the presence of specific elements. Greengage DB supports XPath 1.0 expressions only. -
Producing XML data
Functions such as
xmlelement(),xmlroot(), andxmlcomment()let you build XML structures from SQL data and format query results as XML documents for client applications. -
Serializing and deserializing XML data
xmlparse()converts text into XML values, whilexmlserialize()performs the reverse operation — converting XML into a textual representation suitable for exchanging data between applications or storing it outside the database.
Character encoding
Ensure correct character encoding when exchanging XML data between client and server:
-
In text mode, Greengage DB automatically converts character data between the client and server encodings. This includes XML values, so any encoding declaration inside the XML data may no longer match the actual encoding after conversion. To handle this, Greengage DB ignores encoding declarations in XML input and assumes the data is in the current server encoding. Therefore, client applications must send XML data in the current client encoding or convert it to that encoding before sending it to the server. When a client retrieves XML values from the server, no encoding declaration is included in the result; the client should interpret the returned data as being in the current client encoding.
-
In binary mode, no character set conversion is performed. XML encoding declarations are honored, and if missing, the data is assumed to be in UTF-8 (as required by the XML standard). On output, an encoding declaration is included unless the client encoding is UTF-8.
For more details on text and binary modes, see the PostgreSQL documentation on libpq command execution functions.
XML processing is more efficient and less error-prone when the XML data, client, and server all use the same encoding. Since Greengage DB processes XML internally in UTF-8, using UTF-8 for the server encoding is recommended.
XML data access
The XML data type does not support comparison operators (such as <, >, or =) since there is no standard way to compare two XML documents.
As a result, you cannot search for rows by directly comparing XML columns with specific values.
To identify XML rows, include a separate key field, such as an ID.
If needed, XML values can be cast to text for comparison, but this compares only the textual form, not the XML structure.
Because there are no comparison operators, you cannot create indexes directly on XML columns.
You can speed up searches by creating an index on a derived expression, such as a text cast of the XML value or the result of an XPath query.
For an example of creating and using such indexes, see Index XML data.
Store XML data
Create a table for XML data
In this example, the customer_profiles table defines two XML columns — one for complete XML documents (profile) and another for XML content fragments (details):
CREATE TABLE customer_profiles
(
id SERIAL,
profile XML,
details XML
)
DISTRIBUTED BY (id);
To learn more about a table definition, see Tables overview.
Load XML data
Once the table is created, you can insert XML data using the xmlparse() function.
The xmlparse() function converts text into an XML value and allows you to specify whether the input is a complete document (document) or a content fragment (content).
In the example below, the profile column stores full XML documents representing customer information, while the details column stores XML content fragments with a top-level tags element containing nested elements:
INSERT INTO customer_profiles (profile, details)
VALUES (xmlparse(document
'<customer id="c001">
<name>Alice Johnson</name>
<email>alice@example.com</email>
<active>true</active>
<registered>2020-01-15</registered>
</customer>'
),
xmlparse(content
'<tags>
<membership>premium</membership>
<membership>newsletter</membership>
<region>US</region>
</tags>'
)),
(xmlparse(document
'<customer id="c002">
<name>Bob Smith</name>
<email>bob@example.com</email>
<active>false</active>
<registered>2019-06-30</registered>
</customer>'
),
xmlparse(content
'<tags>
<membership>trial</membership>
<region>EU</region>
</tags>'
)),
(xmlparse(document
'<customer id="c003">
<name>Charlie Brown</name>
<email>charlie@example.com</email>
<active>true</active>
</customer>'
),
NULL);
Greengage DB validates XML data for well-formedness on insert and update. It does not perform DTD or XML Schema (XSD) validation. If the XML is not well-formed, the operation is rejected and an error is returned, as illustrated below. For schema conformance, validation should be handled at the application level.
INSERT INTO customer_profiles (profile, details)
VALUES (xmlparse(document
'<customer>
<name>John Doe</name>
<email>john@example.com</email>'
),
NULL);
The operation produces the following error:
ERROR: invalid XML document (seg1 slice1 192.168.1.30:10001 pid=2537)
DETAIL: line 1: Premature end of data in tag customer line 1
<customer><name>John Doe</name><email>john@example.com</email>
^
Greengage DB also allows you to transform XML data before loading it into tables.
See Transform external data for an example using XSLT or Python to convert XML source data and load it via gpfdist or gpload.
Query and filter XML data
The primary function for retrieving data from XML columns in Greengage DB is xpath().
It returns an array of all nodes that match a given XPath expression.
To get a single value, you can select the first element with [1] and cast it to text using ::TEXT.
For example, execute the following query to retrieve the /customer/name and /customer/email elements from the profile column:
SELECT (xpath('/customer/name/text()', profile))[1]::TEXT AS name,
(xpath('/customer/email/text()', profile))[1]::TEXT AS email
FROM customer_profiles;
The result should look like this:
name | email ---------------+--------------------- Alice Johnson | alice@example.com Bob Smith | bob@example.com Charlie Brown | charlie@example.com (3 rows)
The same method applies to XML content fragments.
This example also shows extracting /tags/region from the details column:
SELECT (xpath('/customer/name/text()', profile))[1]::TEXT AS name,
(xpath('/customer/email/text()', profile))[1]::TEXT AS email,
(xpath('/tags/region/text()', details))[1]::TEXT AS region
FROM customer_profiles;
The result looks as follows:
name | email | region ---------------+---------------------+-------- Charlie Brown | charlie@example.com | Alice Johnson | alice@example.com | US Bob Smith | bob@example.com | EU (3 rows)
You can filter rows based on the presence of an element using xpath_exists():
SELECT (xpath('/customer/name/text()', profile))[1]::TEXT AS name,
(xpath('/customer/registered/text()', profile))[1]::TEXT AS registered
FROM customer_profiles
WHERE xpath_exists('/customer/registered', profile);
The result is:
name | registered ---------------+------------ Alice Johnson | 2020-01-15 Bob Smith | 2019-06-30 (2 rows)
Learn more about retrieving rows from tables and views in the SELECT command overview topic.
Index XML data
When querying large tables, searches on XML values can be slow because XML columns do not support direct comparison operators.
As a result, you cannot create indexes directly on XML columns.
To speed up searches, you can use an index expression on a derived value, such as the result of xpath() cast to text.
As index expressions are supported by the Postgres planner only, turn off the GPORCA query optimizer:
SET optimizer = off;
Turn the enable_seqscan option off:
SET enable_seqscan = off;
Next, create an index on the /customer/name element in the profile XML column:
CREATE INDEX customer_name_gin_idx
ON customer_profiles USING gin (cast(xpath('string(/customer/name)', profile) AS TEXT[]));
Update table statistics:
ANALYZE customer_profiles;
You can then use the indexed expression in a query.
For example, to find a customer named Charlie Brown:
EXPLAIN (COSTS OFF)
SELECT *
FROM customer_profiles
WHERE cast(xpath('string(/customer/name)', profile) AS TEXT[]) @> ARRAY['Charlie Brown'];
The output indicates that the query uses the index expression:
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
Gather Motion 4:1 (slice1; segments: 4)
-> Bitmap Heap Scan on customer_profiles
Recheck Cond: ((xpath('string(/customer/name)'::text, profile, '{}'::text[]))::text[] @> '{"Charlie Brown"}'::text[])
-> Bitmap Index Scan on customer_name_gin_idx
Index Cond: ((xpath('string(/customer/name)'::text, profile, '{}'::text[]))::text[] @> '{"Charlie Brown"}'::text[])
Optimizer: Postgres query optimizer
(6 rows)
Produce XML data
CREATE TABLE customers
(
id int,
name text,
email text
)
WITH (appendoptimized = true)
DISTRIBUTED BY (id);
INSERT INTO customers(id, name, email)
VALUES (1, 'Alice Johnson', 'alice@example.com'),
(2, 'Bob Smith', 'bob@example.com'),
(3, 'Charlie Brown', 'charlie@example.com');
Greengage DB provides a set of functions for producing XML content from SQL data. These functions are particularly suitable for formatting query results into XML documents for processing in client applications.
Use xmlelement() to create an XML element with the specified name, attributes, and content:
SELECT xmlelement(name customer, name) AS xml_customer
FROM customers;
The result looks like this:
xml_customer ------------------------------------ <customer>Charlie Brown</customer> <customer>Alice Johnson</customer> <customer>Bob Smith</customer> (3 rows)
xmlelement() also supports creating nested elements and adding attributes using xmlattributes():
SELECT xmlelement(
name customer,
xmlattributes(id AS id),
xmlelement(name name, name),
xmlelement(name email, email)
) AS xml_customer
FROM customers;
The result should look like this:
xml_customer ------------------------------------------------------------------------------------------ <customer id="3"><name>Charlie Brown</name><email>charlie@example.com</email></customer> <customer id="1"><name>Alice Johnson</name><email>alice@example.com</email></customer> <customer id="2"><name>Bob Smith</name><email>bob@example.com</email></customer> (3 rows)
To convert the XML value into a text representation suitable for exporting or saving, use xmlserialize():
SELECT xmlserialize(
content xmlelement(
name customer,
xmlattributes(id AS id),
xmlelement(name name, name),
xmlelement(name email, email)
) as TEXT
) AS xml_text
FROM customers;
The result is:
xml_text ------------------------------------------------------------------------------------------ <customer id="1"><name>Alice Johnson</name><email>alice@example.com</email></customer> <customer id="3"><name>Charlie Brown</name><email>charlie@example.com</email></customer> <customer id="2"><name>Bob Smith</name><email>bob@example.com</email></customer> (3 rows)
See also Produce XML content for functions to construct XML and Map tables to XML for generating XML from tables and queries.
Reference: XML functions and predicates
Serialize/deserialize XML
xmlparse()
Function: xmlparse({ document | content } <value>)
Return type: XML
Produces XML from character data.
The first argument specifies the type of <value>:
-
document— XML document. -
content— XML content fragment.
Example:
SELECT xmlparse(document
'<customer id="c001">
<name>Alice Johnson</name>
<email>alice@example.com</email>
</customer>'
);
Result:
xmlparse
-----------------------------------------------------
<customer id="c001"> +
<name>Alice Johnson</name> +
<email>alice@example.com</email>+
</customer>
(1 row)
You can also use ::XML — a PostgreSQL-specific conversion:
SELECT '<customer id="c001">
<name>Alice Johnson</name>
<email>alice@example.com</email>
</customer>'::XML;
The XML type conversion does not validate input values against a document type declaration (DTD) or other XML schema languages.
xmlserialize()
Function: xmlserialize({ document | content } <value> AS <type>)
Return type: <type>
Creates a character string value from XML.
The first argument specifies the type of <value>:
-
document— XML document. -
content— XML content fragment.
<type> can be the CHARACTER, CHARACTER VARYING, or TEXT data type.
Example:
SELECT xmlserialize(content xmlelement(name customer,
xmlelement(name name, 'Alice Johnson'),
xmlelement(name email, 'alice@example.com'))
as TEXT
) AS xml_text;
Result:
xml_text --------------------------------------------------------------------------------- <customer><name>Alice Johnson</name><email>alice@example.com</email></customer> (1 row)
Produce XML content
Binary values (type BYTEA) are encoded in XML according to the xmlbinary option, either base64 or hex.
This applies when functions like xmlelement() or xmlforest() convert BYTEA to XML.
The choice mainly depends on client application requirements.
xmlcomment()
Function: xmlcomment(<text>)
Return type: XML
Creates an XML comment node containing the specified text.
Example:
SELECT xmlcomment('Hello, world!');
Result:
xmlcomment ---------------------- <!--Hello, world!--> (1 row)
xmlelement()
Function: xmlelement(name <name> [, xmlattributes(<value> [AS <attname>] [, …])] [, <content>, …])
Return type: XML
Creates an XML element with the specified name, attributes, and content.
Example 1:
SELECT xmlelement(name email, 'alice@example.com');
Result:
xmlelement ---------------------------------- <email>alice@example.com</email> (1 row)
Example 2:
SELECT xmlelement(name customer,
xmlattributes('c001' AS id),
xmlelement(name name, 'Alice Johnson'),
xmlelement(name email, 'alice@example.com'));
Result:
xmlelement ------------------------------------------------------------------------------------------- <customer id="c001"><name>Alice Johnson</name><email>alice@example.com</email></customer> (1 row)
xmlforest()
Function: xmlforest(<content> [AS <name>] [, …])
Return type: XML
Produces an XML forest (sequence) of elements using the given names and content.
Example:
SELECT xmlforest('Alice Johnson' AS name,
'alice@example.com' AS email,
'true' AS active);
Result:
xmlforest --------------------------------------------------------------------------------- <name>Alice Johnson</name><email>alice@example.com</email><active>true</active> (1 row)
xmlroot()
Function: xmlroot(<xml>, version <text> | no value [, standalone yes|no|no value])
Return type: XML
Alters the properties of the root node of an XML value.
If version is specified, it replaces the value in the root node’s version declaration.
If standalone is specified, it replaces the value in the root node’s standalone declaration.
Example:
SELECT xmlroot(xmlparse(document '<?xml version="1.1"?><content>abc</content>'),
version '1.0', standalone yes);
Result:
xmlroot -------------------------------------------------------------- <?xml version="1.0" standalone="yes"?><content>abc</content> (1 row)
xmlagg()
Function: xmlagg(<xml>)
Return type: XML
Aggregates XML values from multiple rows into a single XML value by concatenating them.
Example:
CREATE TABLE customers
(
id SERIAL,
details XML
)
WITH (appendoptimized = true)
DISTRIBUTED BY (id);
INSERT INTO customers(details)
VALUES (xmlelement(name customer, xmlelement(name name, 'Alice Johnson'))),
(xmlelement(name customer, xmlelement(name name, 'Charlie Brown')));
SELECT xmlagg(details)
FROM customers;
Result:
xmlagg ------------------------------------------------------------------------------------------------ <customer><name>Alice Johnson</name></customer><customer><name>Charlie Brown</name></customer> (1 row)
XML predicates
The xmloption parameter controls whether text is treated as an XML document or as an XML content fragment.
It affects functions like xml_is_well_formed(), determining whether the XML is valid as a full document or as a content fragment.
IS DOCUMENT / IS NOT DOCUMENT
Expression: <xml> IS DOCUMENT, <xml> IS NOT DOCUMENT
Return type: BOOLEAN
Determines if an XML value is an XML document or content fragment.
Example:
SELECT xmlparse(document
'<customer id="c001">
<name>Alice Johnson</name>
<email>alice@example.com</email>
</customer>'
) IS DOCUMENT AS is_doc;
Result:
is_doc -------- t (1 row)
xmlexists()
Function: xmlexists(<text> PASSING [BY REF] <xml> [BY REF])
Return type: BOOLEAN
Determines whether the specified XPath expression returns any nodes from the given XML value.
Returns true if matching nodes are found, false if none are found, and null if either argument is null.
Example:
SELECT xmlexists('//customer[name="Bob Smith"]' PASSING BY REF
'
<customers>
<customer>
<name>Bob Smith</name>
</customer>
</customers>');
Result:
xmlexists ----------- t (1 row)
xml_is_well_formed()
Function: xml_is_well_formed(<text>)
Return type: BOOLEAN
Checks whether a text string is a well-formed XML document or content fragment.
The behavior depends on the xmloption configuration parameter: when set to document, the function checks for a well-formed XML document; when set to content, it checks for a well-formed XML content fragment.
Example 1:
SELECT xml_is_well_formed('<customer><name>Alice Johnson</name></customer>');
Result:
xml_is_well_formed -------------------- t (1 row)
Example 2:
SELECT xml_is_well_formed('<customer><name>Alice Johnson</name>');
Result:
xml_is_well_formed -------------------- f (1 row)
xml_is_well_formed_document()
Function: xml_is_well_formed_document(<text>)
Return type: BOOLEAN
Checks whether a text string is a well-formed XML document.
Example:
SELECT xml_is_well_formed_document('<customer><name>Bob Smith</name></customer>');
Result:
xml_is_well_formed_document ----------------------------- t (1 row)
xml_is_well_formed_content()
Function: xml_is_well_formed_content(<text>)
Return type: BOOLEAN
Checks whether a text string is a well-formed XML content fragment.
Example:
SELECT xml_is_well_formed_content('<name>Charlie Brown</name>');
Result:
xml_is_well_formed_content ---------------------------- t (1 row)
Process XML
xpath()
Function: xpath(<xpath>, <xml> [, <nsarray>])
Return type: XML[]
Returns an array of XML values corresponding to the XPath expression.
If the XPath expression returns a scalar value, xpath() returns a single-element array.
The second argument must be a well-formed XML document with a single root element.
The optional third argument is an array of namespace mappings.
It should be a two-dimensional text array where each subarray has exactly two elements: the first is the namespace alias, the second is the namespace URI.
Aliases do not need to match those used in the XML document; they are local to the xpath() function.
Example:
SELECT (xpath('/customer/email/text()',
'<customer>
<name>Alice Johnson</name>
<email>alice@example.com</email>
</customer>'
))[1]::TEXT AS email;
Result:
email ------------------- alice@example.com (1 row)
xpath_exists()
Function: xpath_exists(<xpath>, <xml> [, <nsarray>])
Return type: BOOLEAN
Determines whether the specified XPath expression matches any nodes in the given XML value.
This function behaves like the xmlexists() predicate, but also supports an optional namespace mapping argument.
Example 1:
SELECT xpath_exists('/customer/name',
'<customer>
<name>Alice Johnson</name>
<email>alice@example.com</email>
</customer>');
Result:
xpath_exists -------------- t (1 row)
Example 2:
SELECT xpath_exists('/cust:customer/cust:name',
'<cust:customer xmlns:cust="https://example.com/other">
<cust:name>Alice Johnson</cust:name>
<cust:email>alice@example.com</cust:email>
</cust:customer>',
ARRAY [ARRAY ['cust', 'https://example.com/customer']]);
Result:
xpath_exists -------------- f (1 row)
Map tables to XML
table_to_xml()
Function: table_to_xml(<table> REGCLASS, <nulls> BOOLEAN, <tableforest> BOOLEAN, <targetns> TEXT)
Return type: XML
Generates an XML representation of a table specified by the <table> parameter.
Other arguments include:
-
The
<nulls>parameter determines whethernullvalues should be included in the output. -
If the
<tableforest>parameter isfalse, the resulting XML document has additional<row>nodes. -
The
<targetns>parameter specifies the XML namespace. If you do not need to define a namespace, pass an empty string.
Example:
CREATE TABLE customers
(
id int,
name text,
email text
)
WITH (appendoptimized = true)
DISTRIBUTED BY (id);
INSERT INTO customers(id, name, email)
VALUES (1, 'Alice Johnson', 'alice@example.com'),
(2, 'Bob Smith', 'bob@example.com'),
(3, 'Charlie Brown', 'charlie@example.com');
SELECT table_to_xml('customers', true, false, '');
Result:
table_to_xml
-------------------------------------------------------------------
<customers xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">+
+
<row> +
<id>2</id> +
<name>Bob Smith</name> +
<email>bob@example.com</email> +
</row> +
+
<row> +
<id>3</id> +
<name>Charlie Brown</name> +
<email>charlie@example.com</email> +
</row> +
+
<row> +
<id>1</id> +
<name>Alice Johnson</name> +
<email>alice@example.com</email> +
</row> +
+
</customers> +
(1 row)
query_to_xml()
Function: query_to_xml(<query> TEXT, <nulls> BOOLEAN, <tableforest> BOOLEAN, <targetns> TEXT)
Return type: XML
Generates an XML representation of the result set produced by the SQL statement specified in the <query> parameter.
Unlike table_to_xml(), this function allows you to convert the output of an arbitrary query rather than a single table.
See also: table_to_xml().
Example:
SELECT query_to_xml('SELECT id, name FROM customers', true, false, '');
Result:
query_to_xml
---------------------------------------------------------------
<table xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">+
+
<row> +
<id>3</id> +
<name>Charlie Brown</name> +
</row> +
+
<row> +
<id>1</id> +
<name>Alice Johnson</name> +
</row> +
+
<row> +
<id>2</id> +
<name>Bob Smith</name> +
</row> +
+
</table> +
(1 row)
cursor_to_xml()
Function: cursor_to_xml(<cursor> REFCURSOR, <count> INTEGER, <nulls> BOOLEAN, <tableforest> BOOLEAN, <targetns> TEXT)
Return type: XML
Generates an XML representation of rows fetched from the specified <cursor>.
The <count> parameter determines how many rows to fetch.
This function is recommended for large result sets, as it processes the data incrementally rather than building the entire XML document in memory at once.
See also: table_to_xml().
Example:
BEGIN;
DECLARE customer_data_cur CURSOR FOR
SELECT id, name
FROM customers
ORDER BY id;
SELECT cursor_to_xml('customer_data_cur' ::refcursor, 2, true, true, '');
CLOSE customer_data_cur;
COMMIT;
Result:
cursor_to_xml
-------------------------------------------------------------
<row xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">+
<id>1</id> +
<name>Alice Johnson</name> +
</row> +
+
<row xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">+
<id>2</id> +
<name>Bob Smith</name> +
</row> +
+
(1 row)