CREATE EXTERNAL TABLE
Defines a new external table.
Synopsis
CREATE [READABLE] EXTERNAL [TEMPORARY | TEMP] TABLE <table_name>
( <column_name> <data_type> [, ...] | LIKE <other_table>)
LOCATION ('file://<seghost>[:<port>]/<path>/<file>' [, ...])
| ('gpfdist://<filehost>[:<port>]/<file_pattern>[#transform=<trans_name>]'
[, ...])
| ('gpfdists://<filehost>[:<port>]/<file_pattern>[#transform=<trans_name>]'
[, ...])
| ('pxf://<path-to-data>?PROFILE=<profile_name>[&SERVER=<server_name>][&<custom-option>=<value>[...]]')
| ('s3://<S3_endpoint>[:<port>]/<bucket_name>/[<S3_prefix>] [region=<S3-region>] [config=<config_file> | config_server=<url>]')
[ON MASTER]
FORMAT 'TEXT'
[( [HEADER]
[DELIMITER [AS] '<delimiter>' | 'OFF']
[NULL [AS] '<null string>']
[ESCAPE [AS] '<escape>' | 'OFF']
[NEWLINE [ AS ] 'LF' | 'CR' | 'CRLF']
[FILL MISSING FIELDS] )]
| 'CSV'
[( [HEADER]
[QUOTE [AS] '<quote>']
[DELIMITER [AS] '<delimiter>']
[NULL [AS] '<null string>']
[FORCE NOT NULL <column> [, ...]]
[ESCAPE [AS] '<escape>']
[NEWLINE [ AS ] 'LF' | 'CR' | 'CRLF']
[FILL MISSING FIELDS] )]
| 'CUSTOM' (Formatter=<formatter_specification>)
[ ENCODING '<encoding>' ]
[ [LOG ERRORS [PERSISTENTLY]] SEGMENT REJECT LIMIT <count>
[ROWS | PERCENT] ]
CREATE [READABLE] EXTERNAL WEB [TEMPORARY | TEMP] TABLE <table_name>
( <column_name> <data_type> [, ...] | LIKE <other_table>)
LOCATION ('http://<webhost>[:<port>]/<path>/<file>' [, ...])
| EXECUTE '<command>' [ON ALL
| MASTER
| <number_of_segments>
| HOST ['<segment_hostname>']
| SEGMENT <segment_id> ]
FORMAT 'TEXT'
[( [HEADER]
[DELIMITER [AS] '<delimiter>' | 'OFF']
[NULL [AS] '<null string>']
[ESCAPE [AS] '<escape>' | 'OFF']
[NEWLINE [ AS ] 'LF' | 'CR' | 'CRLF']
[FILL MISSING FIELDS] )]
| 'CSV'
[( [HEADER]
[QUOTE [AS] '<quote>']
[DELIMITER [AS] '<delimiter>']
[NULL [AS] '<null string>']
[FORCE NOT NULL <column> [, ...]]
[ESCAPE [AS] '<escape>']
[NEWLINE [ AS ] 'LF' | 'CR' | 'CRLF']
[FILL MISSING FIELDS] )]
| 'CUSTOM' (Formatter=<formatter_specification>)
[ ENCODING '<encoding>' ]
[ [LOG ERRORS [PERSISTENTLY]] SEGMENT REJECT LIMIT <count>
[ROWS | PERCENT] ]
CREATE WRITABLE EXTERNAL [TEMPORARY | TEMP] TABLE <table_name>
( <column_name> <data_type> [, ...] | LIKE <other_table>)
LOCATION('gpfdist://<outputhost>[:<port>]/<filename>[#transform=<trans_name>]'
[, ...])
| ('gpfdists://<outputhost>[:<port>]/<file_pattern>[#transform=<trans_name>]'
[, ...])
FORMAT 'TEXT'
[( [DELIMITER [AS] '<delimiter>']
[NULL [AS] '<null string>']
[ESCAPE [AS] '<escape>' | 'OFF'] )]
| 'CSV'
[([QUOTE [AS] '<quote>']
[DELIMITER [AS] '<delimiter>']
[NULL [AS] '<null string>']
[FORCE QUOTE <column> [, ...] | * ]
[ESCAPE [AS] '<escape>'] )]
| 'CUSTOM' (Formatter=<formatter_specification>)
[ ENCODING '<write_encoding>' ]
[ DISTRIBUTED BY ({<column> [<opclass>]}, [ ... ] ) | DISTRIBUTED RANDOMLY ]
CREATE WRITABLE EXTERNAL [TEMPORARY | TEMP] TABLE <table_name>
( <column_name> <data_type> [, ...] | LIKE <other_table>)
LOCATION('s3://<S3_endpoint>[:<port>]/<bucket_name>/[<S3_prefix>] [region=<S3-region>] [config=<config_file> | config_server=<url>]')
[ON MASTER]
FORMAT 'TEXT'
[( [DELIMITER [AS] '<delimiter>']
[NULL [AS] '<null string>']
[ESCAPE [AS] '<escape>' | 'OFF'] )]
| 'CSV'
[([QUOTE [AS] '<quote>']
[DELIMITER [AS] '<delimiter>']
[NULL [AS] '<null string>']
[FORCE QUOTE <column> [, ...] | * ]
[ESCAPE [AS] '<escape>'] )]
CREATE WRITABLE EXTERNAL WEB [TEMPORARY | TEMP] TABLE <table_name>
( <column_name> <data_type> [, ...] | LIKE <other_table> )
EXECUTE '<command>'
FORMAT 'TEXT'
[( [DELIMITER [AS] '<delimiter>']
[NULL [AS] '<null string>']
[ESCAPE [AS] '<escape>' | 'OFF'] )]
| 'CSV'
[([QUOTE [AS] '<quote>']
[DELIMITER [AS] '<delimiter>']
[NULL [AS] '<null string>']
[FORCE QUOTE <column> [, ...] | * ]
[ESCAPE [AS] '<escape>'] )]
| 'CUSTOM' (Formatter=<formatter_specification>)
[ ENCODING '<write_encoding>' ]
[ DISTRIBUTED BY ({<column> [<opclass>]}, [ ... ] ) | DISTRIBUTED RANDOMLY ]
Description
CREATE EXTERNAL TABLE or CREATE EXTERNAL WEB TABLE creates a new readable external table definition in Greengage DB.
Readable external tables are typically used for fast, parallel data loading.
Once an external table is defined, you can query its data directly (and in parallel) using SQL commands.
For example, you can select, join, or sort external table data.
You can also create views for external tables.
DML operations (UPDATE, INSERT, DELETE, or TRUNCATE) are not allowed on readable external tables, and you cannot create indexes on readable external tables.
CREATE WRITABLE EXTERNAL TABLE or CREATE WRITABLE EXTERNAL WEB TABLE creates a new writable external table definition in Greengage DB.
Writable external tables are typically used for unloading data from the database into a set of files or named pipes.
Writable external web tables can also be used to output data to an executable program.
Writable external tables can also be used as output targets for Greengage DB parallel MapReduce calculations.
Once a writable external table is defined, data can be selected from database tables and inserted into the writable external table.
Writable external tables only allow INSERT operations.
SELECT, UPDATE, DELETE, or TRUNCATE are not allowed.
The main difference between regular external tables and external web tables is their data sources. Regular readable external tables access static flat files, whereas external web tables access dynamic data sources — either on a web server or by running OS commands or scripts.
See External tables overview for detailed information about working with external tables.
Parameters
| Parameter | Description |
|---|---|
READABLE | WRITABLE |
Specifies the type of external table, readable being the default. Readable external tables are used for loading data into Greengage DB. Writable external tables are used for unloading data |
WEB |
Creates a readable or writable external web table definition in Greengage DB. There are two forms of readable external web tables — those that access files via the HTTP protocol or those that access data by running OS commands. Writable external web tables output data to an executable program that can accept an input stream of data. External web tables are not rescannable during query execution. The |
TEMPORARY | TEMP |
If specified, creates a temporary readable or writable external table definition in Greengage DB. Temporary external tables exist in a special schema; you cannot specify a schema name when you create the table. Temporary external tables are automatically dropped at the end of a session. An existing permanent table with the same name is not visible to the current session while the temporary table exists, unless you reference the permanent table with its schema-qualified name |
table_name |
The name of the new external table |
column_name |
The name of a column to create in the external table definition. Unlike regular tables, external tables do not have column constraints or default values, so do not specify those |
LIKE <other_table> |
The |
data_type |
The data type of the column |
LOCATION ('<protocol>://[<host>[:<port>]]/<path>/<file>' [, …]) |
If you use the If you use the For readable external tables, specifies the URI of the external data source to be used to populate the external table or web table.
Regular readable external tables allow the 'gpfdist://filehost:8081/*' 'gpfdist://masterhost/my_load_file' 'file://seghost1/dbfast1/external/myfile.txt' 'http://intranet.example.com/finance/expenses.csv' For writable external tables, specifies the URI location of the 'gpfdist://outputhost:8081/data1.out', 'gpfdist://outputhost:8081/data2.out' With two With the |
ON MASTER |
Restricts all table-related operations to the Greengage DB master segment.
Permitted only on readable and writable external tables created with the NOTE
Be aware of potential resource impacts when reading from or writing to external tables you create with the |
EXECUTE '<command>' [ON …] |
Allowed for readable external web tables or writable external tables only.
For readable external web tables, specifies the OS command to be run by the segment instances.
The command can be a single OS command or a script.
The
For writable external tables, the command specified in the |
FORMAT 'TEXT | CSV' (<options>) |
When the For more information on formatting external data, refer to Format external data. If you use the |
FORMAT 'CUSTOM' (formatter=<formatter_specification>) |
Specifies a custom data format.
The For more information on custom formats, refer to Use custom formats and protocols for external tables. If you use the |
DELIMITER |
Specifies a single ASCII character that separates columns within each row (line) of data.
The default is a tab character in For the |
NULL |
Specifies the string that represents a As an example for the
|
ESCAPE |
Specifies the single character that is used for C escape sequences (such as |
NEWLINE |
Specifies the newline used in your data files — |
HEADER |
For readable external tables, specifies that the first line in the data file is a header row (contains the names of the table columns) and should not be included as data for the table. If using multiple data source files, all files must have a header row. For the The |
QUOTE |
Specifies the quotation character for |
FORCE NOT NULL |
In |
FORCE QUOTE |
In |
FILL MISSING FIELDS |
In both |
ENCODING '<encoding>' |
Character set encoding to use for the external table.
Specify a string constant (such as |
LOG ERRORS [PERSISTENTLY] |
This is an optional clause that can precede a The data is deleted when the external table is dropped unless you specify the keyword The error log data is accessed with the Greengage DB built-in SQL function If you use the See Notes for information about the error log information and built-in functions for viewing and managing error log information |
SEGMENT REJECT LIMIT <count> [ROWS | PERCENT] |
Runs a The limit for the number of initially rejected rows can be changed with the Greengage DB server configuration parameter |
DISTRIBUTED BY ({<column> [<opclass>]}, [ … ] ) |
Used to declare the Greengage DB distribution policy for a writable external table.
By default, writable external tables are distributed randomly.
If the source table you are exporting data from has a hash distribution policy, defining the same distribution key column(s) and operator class(es), |
Examples
Start the gpfdist file server program in the background on port 8081 serving files from directory /var/data/staging:
$ gpfdist -p 8081 -d /var/data/staging -l /home/<gpadmin>/log &
Create a readable external table named ext_customer using the gpfdist protocol and any text formatted files (*.txt) found in the gpfdist directory.
The files are formatted with a pipe (|) as the column delimiter and an empty space as NULL.
Also access the external table in single row error isolation mode:
CREATE EXTERNAL TABLE ext_customer
(
id int,
name text,
sponsor text
)
LOCATION ( 'gpfdist://filehost:8081/*.txt' )
FORMAT 'TEXT' ( DELIMITER '|' NULL ' ')
LOG ERRORS SEGMENT REJECT LIMIT 5;
Create the same readable external table definition as above, but with CSV formatted files:
CREATE EXTERNAL TABLE ext_customer
(
id int,
name text,
sponsor text
)
LOCATION ( 'gpfdist://filehost:8081/*.csv' )
FORMAT 'CSV' ( DELIMITER ',' );
Create a readable external table named ext_expenses using the file protocol and several CSV formatted files that have a header row:
CREATE EXTERNAL TABLE ext_expenses
(
name text,
date date,
amount float4,
category text,
description text
)
LOCATION (
'file://seghost1/dbfast/external/expenses1.csv',
'file://seghost1/dbfast/external/expenses2.csv',
'file://seghost2/dbfast/external/expenses3.csv',
'file://seghost2/dbfast/external/expenses4.csv',
'file://seghost3/dbfast/external/expenses5.csv',
'file://seghost3/dbfast/external/expenses6.csv'
)
FORMAT 'CSV' ( HEADER );
Create a readable external web table that runs a script once per segment host:
CREATE EXTERNAL WEB TABLE log_output
(
linenum int,
message text
)
EXECUTE '/var/load_scripts/get_log_data.sh' ON HOST
FORMAT 'TEXT' (DELIMITER '|');
Create a writable external table named sales_out that uses gpfdist to write output data to a file named sales.out.
The files are formatted with a pipe (|) as the column delimiter and an empty space as NULL:
CREATE WRITABLE EXTERNAL TABLE sales_out
(
LIKE sales
)
LOCATION ('gpfdist://etl1:8081/sales.out')
FORMAT 'TEXT' ( DELIMITER '|' NULL ' ')
DISTRIBUTED BY (txn_id);
Create a writable external web table that pipes output data received by the segments to an executable script named to_adreport_etl.sh:
CREATE WRITABLE EXTERNAL WEB TABLE campaign_out
(
LIKE campaign
)
EXECUTE '/var/unload_scripts/to_adreport_etl.sh'
FORMAT 'TEXT' (DELIMITER '|');
Use the writable external table defined above to unload selected data:
INSERT INTO campaign_out
SELECT *
FROM campaign
WHERE customer_id = 123;
Notes
When you specify the LOG ERRORS clause, Greengage DB captures errors that occur while reading the external table data.
You can view and manage the captured error log data.
The functions to manage log data depend on whether the data is persistent (the PERSISTENTLY keyword is used with the LOG ERRORS clause).
-
Functions that manage non-persistent error log data from external tables that were defined without the
PERSISTENTLYkeyword.-
The built-in SQL function
gp_read_error_log('table_name')displays error log information for an external table. This example displays the error log data from the external tableext_expenses:SELECT * from gp_read_error_log('ext_expenses');The function returns no data if you created the external table with the
LOG ERRORS PERSISTENTLYclause, or if the external table does not exist. -
The built-in SQL function
gp_truncate_error_log('table_name')deletes the error log data fortable_name. This example deletes the error log data captured from the external tableext_expenses:SELECT gp_truncate_error_log('ext_expenses');Dropping the table also deletes the table’s log data. The function does not truncate log data if the external table is defined with the
LOG ERRORS PERSISTENTLYclause. The function returnsFALSEif the table does not exist.
-
-
Functions that manage persistent error log data from external tables that were defined with the
PERSISTENTLYkeyword.NOTEThe functions that manage persistent error log data from external tables are defined in the file $GPHOME/share/postgresql/contrib/gpexterrorhandle.sql. The functions must be installed in the databases that use persistent error log data from an external table. This
psqlcommand installs the functions into the databasetestdb.$ psql -d test -U gpadmin -f $GPHOME/share/postgresql/contrib/gpexterrorhandle.sql-
The SQL function
gp_read_persistent_error_log('table_name')displays persistent log data for an external table. The function returns no data if you created the external table without thePERSISTENTLYkeyword. The function returns persistent log data for an external table even after the table has been dropped. -
The SQL function
gp_truncate_persistent_error_log('table_name')truncates persistent log data for a table. For persistent log data, you must manually delete the data. Dropping the external table does not delete persistent log data.
-
-
These items apply to both non-persistent and persistent error log data and the related functions:
-
The
gp_read_*functions require theSELECTprivilege on the table. -
The
gp_truncate_*functions require owner privilege on the table. -
You can use the
*wildcard character to delete error log information for existing tables in the current database. Specify the string *.* to delete all database error log information, including error log information that was not deleted due to previous database issues. If*is specified, database owner privilege is required. If *.* is specified, operating system superuser privilege is required. Non-persistent and persistent error log data must be deleted with their respectivegp_truncate_*functions.
-
When multiple Greengage DB external tables are defined with the gpfdist, gpfdists, or file protocol and access the same named pipe on a Linux system, Greengage DB restricts access to the named pipe to a single reader.
An error is returned if a second reader attempts to access the named pipe.
Compatibility
CREATE EXTERNAL TABLE is a Greengage DB extension.
The SQL standard makes no provisions for external tables.