COPY command
The COPY command is used for copying data between Greengage DB tables and standard file system files, program input or output, or stdin / stdout.
COPY can be used with regular tables, writable external tables, or the results of a SELECT query, but not with readable external tables or views.
Unless ON SEGMENT is specified, COPY is not parallel (data is loaded in a single process via the Greengage DB master instance), so it is recommended only for small data files.
You must have the SELECT privilege on the table whose values are read by COPY TO, and the INSERT privilege on the table into which values are inserted by COPY FROM. It is sufficient to have column privileges on the columns listed in the command. To learn more about the Greengage DB access control system, see Roles and privileges.
A non-superuser can only run COPY FROM STDIN and COPY TO STDOUT. Learn more in Load data from stdin / to stdout.
Syntax overview
The COPY command syntax is provided below. See Clauses and parameters reference for the complete description of clauses and parameters.
COPY [BINARY] <table_name> [(<column_name> [, ...])]
FROM {'<filename>' | PROGRAM '<command>' | STDIN}
[ ON SEGMENT ]
[IGNORE EXTERNAL PARTITIONS]
[[LOG ERRORS]
SEGMENT REJECT LIMIT <count> [ ROWS | PERCENT ]]
[ [ WITH ] (
FORMAT <'text' | 'csv' | 'binary'>,
OIDS [ <boolean> ],
FREEZE [ <boolean> ],
DELIMITER '<delimiter_character>',
NULL '<null string>',
HEADER [ <boolean> ],
QUOTE '<quote_character>',
NEWLINE '<newline_character>',
ESCAPE '<escape_character>',
FORCE_QUOTE { ( <column_name> [, ...] ) | * },
FORCE_NOT_NULL ( <column_name> [, ...] ),
FORCE_NULL ( <column_name> [, ...] ),
ENCODING '<encoding_name>',
FILL MISSING FIELDS
) ]
COPY [BINARY] { <table_name> [(<column_name> [, ...])] | (<query>)}
TO {'<filename>' | PROGRAM '<command>' | STDOUT}
[ ON SEGMENT ]
[IGNORE EXTERNAL PARTITIONS]
[[LOG ERRORS]
SEGMENT REJECT LIMIT <count> [ ROWS | PERCENT ]]
[ [ WITH ] (
FORMAT <'text' | 'csv' | 'binary'>,
OIDS [ <boolean> ],
FREEZE [ <boolean> ],
DELIMITER '<delimiter_character>',
NULL '<null string>',
HEADER [ <boolean> ],
QUOTE '<quote_character>',
NEWLINE '<newline_character>',
ESCAPE '<escape_character>',
FORCE_QUOTE { ( <column_name> [, ...] ) | * },
FORCE_NOT_NULL ( <column_name> [, ...] ),
FORCE_NULL ( <column_name> [, ...] ),
ENCODING '<encoding_name>',
FILL MISSING FIELDS
) ]
When enumerating options in the WITH clause, replace spaces in option names with underscores (_), for example, FILL_MISSING_FIELDS or ON_SEGMENT. Alternatively, you can omit WITH and specify the options separately. The following two commands are equivalent:
-   Using WITH:

    COPY customers FROM '/tmp/customers_<SEGID>.csv' WITH ( FORMAT CSV, HEADER, ON_SEGMENT );

-   Without WITH:

    COPY customers FROM '/tmp/customers_<SEGID>.csv' CSV HEADER ON SEGMENT;
Usage examples
Load data from files / to files
To directly read from or write to a file located on the Greengage DB master host, run COPY with the provided file name (<filename>). COPY FROM lets you copy data from a file to a table, appending it to the data that is already there. COPY TO lets you copy the contents of a table to a file (or to multiple files based on the segment ID, as described in Use ON SEGMENT clause).
COPY [BINARY] <table_name> [(<column_name> [, ...])]
FROM '<filename>'
[ ON SEGMENT ]
[ [ WITH ] ( <option> [, ...] ) ]
COPY [BINARY] { <table_name> [(<column_name> [, ...])] | (<query>)}
TO '<filename>'
[ ON SEGMENT ]
[ [ WITH ] ( <option> [, ...] ) ]
If ON SEGMENT is not used, the file name must be specified as an absolute path on the master host or a path relative to the master data directory. Otherwise, if ON SEGMENT is used, the file name must be specified as a path on the Greengage DB segment host.
The files must be accessible to and readable or writable by the Greengage DB system user (the user ID the server runs as), not by the client.
Only database superusers are permitted to use COPY with files directly, because this allows reading or writing any file that the server has privileges to access.
On Windows, use an E'' escape string constant and double any backslashes used in the path name.
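The Windows note can be illustrated with a small helper. The function below is a hypothetical sketch, not part of Greengage DB or psql: it wraps a raw Windows path in an E'' escape string constant, doubling backslashes as the note requires, and also doubles single quotes to keep the SQL literal valid.

```python
def e_literal(path: str) -> str:
    """Build a PostgreSQL-style E'' escape string constant for a file path.

    Backslashes are doubled because E'' strings interpret backslash escapes,
    and single quotes are doubled per SQL string literal rules.
    """
    escaped = path.replace("\\", "\\\\").replace("'", "''")
    return "E'" + escaped + "'"

# A Windows path with backslashes:
print(e_literal(r"C:\data\customers.csv"))  # → E'C:\\data\\customers.csv'
```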
Examples
In these examples, the COPY command is used for copying the customers.csv file’s content into the customers table and the table’s content into a file.
-   In the /tmp directory on the master host, create the customers.csv file:

    id,first_name,last_name,email,address
    1,John,Doe,john.doe@example.com,123 Elm Street
    2,Jane,Smith,jane.smith@example.com,456 Oak Street
    3,Bob,Brown,bob.brown@example.com,789 Pine Street
    4,Rob,Stuart,rob.stuart@example.com,119 Willow Street
-   Empty the customers table:

    TRUNCATE TABLE customers;
-   Run the COPY FROM command. In the WITH clause, set CSV as the data format and specify that the file contains a header row (HEADER):

    COPY customers FROM '/tmp/customers.csv' WITH (FORMAT CSV, HEADER);
-   Query the customers table:

    TABLE customers;

    The output should look as follows:

     id | first_name | last_name |         email          |      address
    ----+------------+-----------+------------------------+-------------------
      1 | John       | Doe       | john.doe@example.com   | 123 Elm Street
      2 | Jane       | Smith     | jane.smith@example.com | 456 Oak Street
      3 | Bob        | Brown     | bob.brown@example.com  | 789 Pine Street
      4 | Rob        | Stuart    | rob.stuart@example.com | 119 Willow Street
-   Empty the customers table and populate it with data as follows:

    TRUNCATE TABLE customers;
    INSERT INTO customers (id, first_name, last_name, email, address) VALUES
    (1,'John','Doe','john.doe@example.com','123 Elm Street'),
    (2,'Jane','Smith','jane.smith@example.com','456 Oak Street'),
    (3,'Bob','Brown','bob.brown@example.com','789 Pine Street'),
    (4,'Rob','Stuart','rob.stuart@example.com','119 Willow Street');
-   Run the COPY TO command. In the WITH clause, set CSV as the data format and specify that a header row should be written (HEADER):

    COPY customers TO '/tmp/customers.csv' WITH (FORMAT CSV, HEADER);
-   Review the contents of the customers.csv file created in /tmp:

    $ cat /tmp/customers.csv

    The output should look as follows:

    id,first_name,last_name,email,address
    1,John,Doe,john.doe@example.com,123 Elm Street
    2,Jane,Smith,jane.smith@example.com,456 Oak Street
    3,Bob,Brown,bob.brown@example.com,789 Pine Street
    4,Rob,Stuart,rob.stuart@example.com,119 Willow Street
Load data by running a program
To load or unload table data via a program, run COPY with a shell command (<command>) specified in the PROGRAM clause:
COPY [BINARY] <table_name> [(<column_name> [, ...])]
FROM PROGRAM '<command>'
[ ON SEGMENT ]
[ [ WITH ] ( <option> [, ...] ) ]
COPY [BINARY] { <table_name> [(<column_name> [, ...])] | (<query>)}
TO PROGRAM '<command>'
[ ON SEGMENT ]
[ [ WITH ] ( <option> [, ...] ) ]
The command must be specified as if entered directly on the Greengage DB master host (if ON SEGMENT is not used) and must be executable by the superuser. In COPY FROM, the input is read from the standard output of the command. In COPY TO, the output is written to the standard input of the command.
When ON SEGMENT is specified, the command must be executable on all Greengage DB primary segment hosts by the Greengage DB superuser, and the <SEGID> value, which is substituted with the actual segment ID at runtime, is required in the command.
The command is invoked by a shell. When passing arguments to the shell, strip or escape any characters that have a special meaning for the shell. For security reasons, it is best to use a fixed command string, or at least to avoid passing any user input in it.
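The quoting advice above can be made concrete. The sketch below is illustrative, not a Greengage DB API: it uses Python's shlex.quote to neutralize shell metacharacters in a file name before embedding the command in a COPY … FROM PROGRAM statement, then doubles single quotes for the surrounding SQL string literal.

```python
import shlex

def copy_from_program_sql(table: str, path: str) -> str:
    """Build a COPY ... FROM PROGRAM statement with a shell-safe command.

    shlex.quote protects against shell metacharacters in the path;
    doubling single quotes protects the surrounding SQL literal.
    """
    command = "cat " + shlex.quote(path)  # shell-level quoting
    literal = command.replace("'", "''")  # SQL-literal quoting
    return f"COPY {table} FROM PROGRAM '{literal}' WITH (FORMAT CSV, HEADER);"

# A path containing a space survives both levels of quoting:
print(copy_from_program_sql("customers", "/tmp/customer data.csv"))
```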
Examples
In these examples, the COPY command reads and writes the customers.csv file’s contents via cat.
-   In the /tmp directory on the master host, create the customers.csv file:

    id,first_name,last_name,email,address
    1,John,Doe,john.doe@example.com,123 Elm Street
    2,Jane,Smith,jane.smith@example.com,456 Oak Street
    3,Bob,Brown,bob.brown@example.com,789 Pine Street
    4,Rob,Stuart,rob.stuart@example.com,119 Willow Street
-   Empty the customers table:

    TRUNCATE TABLE customers;
-   Run the COPY FROM command. In the PROGRAM clause, provide the cat command. In the WITH clause, set CSV as the data format and specify that the file contains a header row (HEADER):

    COPY customers FROM PROGRAM 'cat /tmp/customers.csv' WITH (FORMAT CSV, HEADER);
-   Query the customers table:

    TABLE customers;

    The output should look as follows:

     id | first_name | last_name |         email          |      address
    ----+------------+-----------+------------------------+-------------------
      1 | John       | Doe       | john.doe@example.com   | 123 Elm Street
      2 | Jane       | Smith     | jane.smith@example.com | 456 Oak Street
      3 | Bob        | Brown     | bob.brown@example.com  | 789 Pine Street
      4 | Rob        | Stuart    | rob.stuart@example.com | 119 Willow Street
-   Empty the customers table and populate it with data as follows:

    TRUNCATE TABLE customers;
    INSERT INTO customers (id, first_name, last_name, email, address) VALUES
    (1,'John','Doe','john.doe@example.com','123 Elm Street'),
    (2,'Jane','Smith','jane.smith@example.com','456 Oak Street'),
    (3,'Bob','Brown','bob.brown@example.com','789 Pine Street'),
    (4,'Rob','Stuart','rob.stuart@example.com','119 Willow Street');
-   Run the COPY TO command. In the WITH clause, set CSV as the data format and specify that a header row should be written (HEADER):

    COPY customers TO PROGRAM 'cat > /tmp/customers.csv' WITH (FORMAT CSV, HEADER);
-   Review the contents of the customers.csv file created in /tmp:

    $ cat /tmp/customers.csv

    The output should look as follows:

    id,first_name,last_name,email,address
    1,John,Doe,john.doe@example.com,123 Elm Street
    2,Jane,Smith,jane.smith@example.com,456 Oak Street
    3,Bob,Brown,bob.brown@example.com,789 Pine Street
    4,Rob,Stuart,rob.stuart@example.com,119 Willow Street
Load data from stdin / to stdout
To load or unload table data through standard input (stdin) or standard output (stdout), run COPY with the STDIN or STDOUT clause provided as follows:
COPY [BINARY] <table_name> [(<column_name> [, ...])]
FROM STDIN
[ [ WITH ] ( <option> [, ...] ) ]
COPY [BINARY] { <table_name> [(<column_name> [, ...])] | (<query>)}
TO STDOUT
[ [ WITH ] ( <option> [, ...] ) ]
STDIN specifies that the input comes from the client application. STDOUT specifies that the output goes to the client application. Data is transmitted via the connection between the client and the master. The ON SEGMENT clause is not supported with STDIN or STDOUT.
To pipe data in and out of COPY, you can run it as part of the psql command directly from the command line as described in Execute commands non-interactively.
Use the -d option to specify the database to connect to (<database>), and the -c option to specify the command to execute, for example:

$ psql -d <database> -c 'COPY <table_name> FROM STDIN WITH (FORMAT CSV)'
You can also use the \copy meta-command to perform a frontend (client) copy as described in Load data using \copy.
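For clients that build the stream themselves, the shape of the data sent to COPY … FROM STDIN in CSV format can be sketched as follows. This is an illustrative helper, not a Greengage DB API: CSV rows, one per line, followed by a line containing only a backslash-period as the end-of-data marker.

```python
import csv
import io

def copy_stdin_payload(rows):
    """Render rows as the CSV stream a client sends to COPY ... FROM STDIN,
    terminated by the backslash-period end-of-data line."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue() + "\\.\n"

payload = copy_stdin_payload([
    (1, "John", "Doe", "john.doe@example.com", "123 Elm Street"),
    (2, "Jane", "Smith", "jane.smith@example.com", "456 Oak Street"),
])
print(payload)
```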
Examples
In these examples, the COPY command is used for copying user-provided values into the customers table via stdin, and the contents of the customers table via stdout. Data input and output are performed via the client terminal.
-   Empty the customers table:

    TRUNCATE TABLE customers;
-   Run the COPY FROM command. In the WITH clause, set CSV as the data format:

    COPY customers FROM STDIN WITH (FORMAT CSV);
-   Greengage DB displays an interactive input prompt. Type in the desired values separated by commas. Start each data row on a new line and finalize the input by typing \. on a separate line:

    1,John,Doe,john.doe@example.com,123 Elm Street
    2,Jane,Smith,jane.smith@example.com,456 Oak Street
    3,Bob,Brown,bob.brown@example.com,789 Pine Street
    4,Rob,Stuart,rob.stuart@example.com,119 Willow Street
    \.
-   Query the customers table:

    TABLE customers;

    The output should look as follows:

     id | first_name | last_name |         email          |      address
    ----+------------+-----------+------------------------+-------------------
      1 | John       | Doe       | john.doe@example.com   | 123 Elm Street
      2 | Jane       | Smith     | jane.smith@example.com | 456 Oak Street
      3 | Bob        | Brown     | bob.brown@example.com  | 789 Pine Street
      4 | Rob        | Stuart    | rob.stuart@example.com | 119 Willow Street
-   Empty the customers table and populate it with data as follows:

    TRUNCATE TABLE customers;
    INSERT INTO customers (id, first_name, last_name, email, address) VALUES
    (1,'John','Doe','john.doe@example.com','123 Elm Street'),
    (2,'Jane','Smith','jane.smith@example.com','456 Oak Street'),
    (3,'Bob','Brown','bob.brown@example.com','789 Pine Street'),
    (4,'Rob','Stuart','rob.stuart@example.com','119 Willow Street');
-   Run the COPY TO command. In the WITH clause, set CSV as the data format and specify that a header row should be written (HEADER):

    COPY customers TO STDOUT WITH (FORMAT CSV, HEADER);

    The output should look as follows (the row order may vary, as rows are returned from the segments in no particular order):

    id,first_name,last_name,email,address
    3,Bob,Brown,bob.brown@example.com,789 Pine Street
    1,John,Doe,john.doe@example.com,123 Elm Street
    4,Rob,Stuart,rob.stuart@example.com,119 Willow Street
    2,Jane,Smith,jane.smith@example.com,456 Oak Street
Load data using \copy
You can use the \copy meta-command to perform a frontend (client) copy. \copy invokes a regular COPY FROM STDIN or COPY TO STDOUT command and then loads or stores the data in a file accessible to the psql client. Reading or writing the specified file is performed not by the server but by the psql utility, which routes the data between the server and the local file system. Therefore, the file must reside on the host where the psql client runs and must be accessible to the user running the client.
\copy is only available when you are connected to Greengage DB via psql as described in Connect to Greengage DB via psql. You cannot run \copy from any other database client application.
The \copy command syntax is similar to that of the COPY command:
\copy { table [ ( <column_list> ) ] | ( <query> ) }
{ from | to }
{ 'filename' | program 'command' | stdin | stdout | pstdin | pstdout }
[ [ with ] ( <option> [, ...] ) ]
When using \copy from stdin, the backend accepts lines of data until a line containing only a backslash-period (\.) is entered or the stream reaches EOF. For \copy to stdout, the output is sent to the same place as psql command output. To read or write psql’s standard input or output regardless of the current command source or the -f and -o options, use from pstdin or to pstdout.
The \copy command does not support ON SEGMENT.
Examples
In these examples, the \copy command is used for copying the customers.csv file’s content into the customers table, and the table’s content into a file.
-   In the /tmp directory on the master host, create the customers.csv file:

    id,first_name,last_name,email,address
    1,John,Doe,john.doe@example.com,123 Elm Street
    2,Jane,Smith,jane.smith@example.com,456 Oak Street
    3,Bob,Brown,bob.brown@example.com,789 Pine Street
    4,Rob,Stuart,rob.stuart@example.com,119 Willow Street
-   Empty the customers table:

    TRUNCATE TABLE customers;
-   Run the \copy from command. In the with clause, set csv as the data format and specify that the file contains a header row (header):

    \copy customers from '/tmp/customers.csv' with (format csv,header)
-   Query the customers table:

    TABLE customers;

    The output should look as follows:

     id | first_name | last_name |         email          |      address
    ----+------------+-----------+------------------------+-------------------
      1 | John       | Doe       | john.doe@example.com   | 123 Elm Street
      2 | Jane       | Smith     | jane.smith@example.com | 456 Oak Street
      3 | Bob        | Brown     | bob.brown@example.com  | 789 Pine Street
      4 | Rob        | Stuart    | rob.stuart@example.com | 119 Willow Street
-   Empty the customers table and populate it with data as follows:

    TRUNCATE TABLE customers;
    INSERT INTO customers (id, first_name, last_name, email, address) VALUES
    (1,'John','Doe','john.doe@example.com','123 Elm Street'),
    (2,'Jane','Smith','jane.smith@example.com','456 Oak Street'),
    (3,'Bob','Brown','bob.brown@example.com','789 Pine Street'),
    (4,'Rob','Stuart','rob.stuart@example.com','119 Willow Street');
-   Run the \copy to command. In the with clause, set csv as the data format and specify that a header row should be written (header):

    \copy customers to '/tmp/customers.csv' with (format csv,header)
-   Review the contents of the customers.csv file created in /tmp:

    $ cat /tmp/customers.csv

    The output should look as follows:

    id,first_name,last_name,email,address
    1,John,Doe,john.doe@example.com,123 Elm Street
    2,Jane,Smith,jane.smith@example.com,456 Oak Street
    3,Bob,Brown,bob.brown@example.com,789 Pine Street
    4,Rob,Stuart,rob.stuart@example.com,119 Willow Street
Use ON SEGMENT clause
The ON SEGMENT clause of the COPY command lets you specify individual segment data files on the segment hosts. Each file contains the table data that is managed by the primary segment. For example, when copying data from a table to files with a COPY TO … ON SEGMENT command, a file is created on the segment host for each segment instance on the host. Files are created on the segment hosts, rather than on the master, as they would be in a standard COPY operation.
The COPY command does not copy data from or to mirror segment instances and segment data files.
The ON SEGMENT clause can be used for operations such as migrating data between clusters or performing a backup. Segment data created with the ON SEGMENT clause can be restored by tools such as gpfdist, which is useful for high-speed data loading.
The general COPY … ON SEGMENT syntax is as follows:
COPY [BINARY] <table_name> [(<column_name> [, ...])]
FROM {'<filename>' | PROGRAM '<command>'}
[ ON SEGMENT ]
[ [ WITH ] ( <option> [, ...] ) ]
COPY [BINARY] { <table_name> [(<column_name> [, ...])] | (<query>)}
TO {'<filename>' | PROGRAM '<command>'}
[ ON SEGMENT ]
[ [ WITH ] ( <option> [, ...] ) ]
The STDIN and STDOUT clauses are not supported with ON SEGMENT.
When specifying an absolute path and file name of a source data file, either directly (<filename>) or as part of a command (<command>), use the <SEG_DATA_DIR> and <SEGID> string literals with the following syntax:
COPY <table>
[TO|FROM] '<SEG_DATA_DIR>/<datafilename><SEGID>_<suffix>'
ON SEGMENT;
where:

-   <SEG_DATA_DIR> — the absolute path of the segment instance data directory. Using it is optional: an absolute path can be provided instead.
-   <SEGID> — the content ID number of the segment instance. <SEGID> is a required part of the file name when ON SEGMENT is specified. With COPY FROM, specify the segment instance content ID in the name of the source file and place that file on the segment host. There must be a file for each primary segment instance on each host.
When the COPY operation is run, the string literals are substituted with the actual segment IDs and the paths of the segment data directories. The angle brackets (< and >) are part of the string literal.
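The substitution described above can be mimicked client-side to predict which files a COPY … ON SEGMENT run will touch. This is an illustrative sketch; the server performs the real substitution at runtime:

```python
def segment_paths(template, segment_ids):
    """Expand the <SEGID> string literal for each primary segment content ID."""
    return [template.replace("<SEGID>", str(seg)) for seg in segment_ids]

# For a four-segment cluster (content IDs 0-3):
print(segment_paths("/tmp/customers_<SEGID>.csv", range(4)))
# → ['/tmp/customers_0.csv', '/tmp/customers_1.csv', '/tmp/customers_2.csv', '/tmp/customers_3.csv']
```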
Examples
In these examples, the COPY command is used for copying the content of four files, customers_0.csv — customers_3.csv, into the customers table, and the table’s content into files. The files are located in the /tmp directory on the segment hosts.
To view your segment configuration and the location of data directories, query the gp_segment_configuration system table:

SELECT content,hostname,datadir FROM gp_segment_configuration;

The output should look as follows:

 content | hostname |        datadir
---------+----------+-----------------------
      -1 | mdw      | /data1/master/gpseg-1
       0 | sdw1     | /data1/primary/gpseg0
       0 | sdw2     | /data1/mirror/gpseg0
       1 | sdw1     | /data1/primary/gpseg1
       1 | sdw2     | /data1/mirror/gpseg1
       2 | sdw2     | /data1/primary/gpseg2
       2 | sdw1     | /data1/mirror/gpseg2
       3 | sdw2     | /data1/primary/gpseg3
       3 | sdw1     | /data1/mirror/gpseg3
      -1 | smdw     | /data1/master/gpseg-1
-   In the /tmp directory of each segment host, create the customers_<SEGID>.csv files, with <SEGID> corresponding to the current segment’s content ID:

    -   customers_0.csv:

        id,first_name,last_name,email,address
        1,John,Doe,john.doe@example.com,123 Elm Street

    -   customers_1.csv:

        id,first_name,last_name,email,address
        2,Jane,Smith,jane.smith@example.com,456 Oak Street

    -   customers_2.csv:

        id,first_name,last_name,email,address
        3,Bob,Brown,bob.brown@example.com,789 Pine Street

    -   customers_3.csv:

        id,first_name,last_name,email,address
        4,Rob,Stuart,rob.stuart@example.com,119 Willow Street
-   Empty the customers table:

    TRUNCATE TABLE customers;
-   Run the COPY FROM command. Set CSV as the data format and specify that the files contain a header row (HEADER):

    COPY customers FROM '/tmp/customers_<SEGID>.csv' CSV HEADER ON SEGMENT;
-   Query the customers table:

    TABLE customers;

    The output should look as follows:

     id | first_name | last_name |         email          |      address
    ----+------------+-----------+------------------------+-------------------
      1 | John       | Doe       | john.doe@example.com   | 123 Elm Street
      2 | Jane       | Smith     | jane.smith@example.com | 456 Oak Street
      3 | Bob        | Brown     | bob.brown@example.com  | 789 Pine Street
      4 | Rob        | Stuart    | rob.stuart@example.com | 119 Willow Street
-   Empty the customers table and populate it with data as follows:

    TRUNCATE TABLE customers;
    INSERT INTO customers (id, first_name, last_name, email, address) VALUES
    (1,'John','Doe','john.doe@example.com','123 Elm Street'),
    (2,'Jane','Smith','jane.smith@example.com','456 Oak Street'),
    (3,'Bob','Brown','bob.brown@example.com','789 Pine Street'),
    (4,'Rob','Stuart','rob.stuart@example.com','119 Willow Street');
-   Run the COPY TO command:

    COPY customers TO '/tmp/customers_<SEGID>.csv' CSV HEADER ON SEGMENT;
-   Review the contents of the customers_0.csv — customers_3.csv files created in /tmp on each segment host, for example:

    $ cat /tmp/customers_0.csv && cat /tmp/customers_1.csv
    $ cat /tmp/customers_2.csv && cat /tmp/customers_3.csv

    The output should look as follows:

    id,first_name,last_name,email,address
    1,John,Doe,john.doe@example.com,123 Elm Street
    id,first_name,last_name,email,address
    2,Jane,Smith,jane.smith@example.com,456 Oak Street
    id,first_name,last_name,email,address
    3,Bob,Brown,bob.brown@example.com,789 Pine Street
    id,first_name,last_name,email,address
    4,Rob,Stuart,rob.stuart@example.com,119 Willow Street
Run in single-row error isolation mode
By default, COPY stops an operation at the first error: the operation fails and no data is loaded. If you run COPY FROM in single-row error isolation mode, Greengage DB skips rows that contain format errors and loads properly formatted rows. To run a COPY FROM operation in single-row error isolation mode, use the SEGMENT REJECT LIMIT clause as follows:
COPY <table_name> [(<column_name> [, ...])]
FROM {'<filename>' | PROGRAM '<command>' | STDIN}
[ [ WITH ] ( <option> [, ...] ) ]
[LOG ERRORS] SEGMENT REJECT LIMIT <count> [ ROWS | PERCENT ]
[ ON SEGMENT ]
During the load operation, input rows containing formatting errors are discarded until the reject limit count (<count>) is reached on any Greengage DB segment instance. Note that the error row count applies to each Greengage DB segment, not to the entire load operation. The reject limit count can be specified as a number of rows (the default) or as a percentage of the total rows (from 1 to 100). If a percentage is specified (PERCENT), each segment starts calculating the percentage of invalid rows only after the number of rows specified by the gp_reject_percent_threshold parameter has been processed. If the limit is not reached, all valid rows are loaded and the invalid rows are discarded.
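The per-segment accounting can be sketched as follows. This is an illustrative model of one segment's bookkeeping, not Greengage DB source: parse is any row parser that raises ValueError on a format error, and percent_threshold stands in for the gp_reject_percent_threshold parameter.

```python
def load_with_reject_limit(rows, parse, limit, percent=False, percent_threshold=300):
    """Model of single-row error isolation as seen by one segment.

    Valid rows are collected; invalid rows are discarded until the reject
    limit (a row count, or a percentage of processed rows) is reached, at
    which point the whole operation fails.
    """
    loaded, rejected, processed = [], 0, 0
    for row in rows:
        processed += 1
        try:
            loaded.append(parse(row))
        except ValueError:
            rejected += 1
            if percent:
                # The percentage is only evaluated after the threshold
                # number of rows has been processed.
                if processed >= percent_threshold and \
                        rejected * 100.0 / processed >= limit:
                    raise RuntimeError("segment reject limit reached")
            elif rejected >= limit:
                raise RuntimeError("segment reject limit reached")
    return loaded

# Two bad rows out of five with a limit of 3: the load succeeds.
print(load_with_reject_limit(["1", "x", "2", "y", "3"], int, limit=3))  # → [1, 2, 3]
```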
Single-row error isolation mode applies only to rows in the input file that contain format errors. If the data contains a constraint error, such as a violation of a NOT NULL, check, or unique constraint, the operation fails and no data is loaded.
If the SEGMENT REJECT LIMIT is not triggered first, the COPY operation is stopped and rolled back if the first 1000 rows are rejected. You can modify this value by using the gp_initial_bad_row_limit configuration parameter.
To keep error rows for further examination, specify the LOG ERRORS clause to capture the error log information. The information is stored internally in Greengage DB. To view it, use the gp_read_error_log() built-in SQL function with the table name (<table_name>) provided as an argument:
SELECT gp_read_error_log('<table_name>');
gp_read_error_log
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
("2025-06-11 01:01:52.80594+03",customers,<stdin>,2,,"invalid input syntax for integer: ""John"", column id","John,Doe,john.doe@example.com,123 Elm Street",)
("2025-06-11 01:02:35.075181+03",customers,<stdin>,3,,"invalid input syntax for integer: ""Jane"", column id","Jane,Smith,jane.smith@example.com,456 Oak Street",)
(2 rows)
Example
In this example, the COPY command is used for copying the customers.csv file’s content into the customers table. The content contains formatting errors, so the SEGMENT REJECT LIMIT clause is used for running in single-row error isolation mode. The LOG ERRORS clause is used for logging the loading errors.
-   In the /tmp directory on the master host, create the customers.csv file. The first two data rows are misformatted: their id values are missing.

    id,first_name,last_name,email,address
    John,Doe,john.doe@example.com,123 Elm Street
    Jane,Smith,jane.smith@example.com,456 Oak Street
    3,Bob,Brown,bob.brown@example.com,789 Pine Street
    4,Rob,Stuart,rob.stuart@example.com,119 Willow Street
-   Empty the customers table:

    TRUNCATE TABLE customers;
-   Run the COPY FROM command. In the WITH clause, set CSV as the data format and specify that the file contains a header row (HEADER). Provide the LOG ERRORS clause and set SEGMENT REJECT LIMIT to 3:

    COPY customers FROM '/tmp/customers.csv' WITH (FORMAT CSV,HEADER) LOG ERRORS SEGMENT REJECT LIMIT 3;
-   Query the customers table:

    TABLE customers;

    The output should contain two rows, with the misformatted rows skipped:

     id | first_name | last_name |         email          |      address
    ----+------------+-----------+------------------------+-------------------
      3 | Bob        | Brown     | bob.brown@example.com  | 789 Pine Street
      4 | Rob        | Stuart    | rob.stuart@example.com | 119 Willow Street
-   Call the gp_read_error_log() function on the customers table to view the error log:

    SELECT t.linenum, t.errmsg FROM gp_read_error_log('customers') t;

    The output should look as follows:

     linenum |                       errmsg
    ---------+-----------------------------------------------------
           2 | invalid input syntax for integer: "John", column id
           3 | invalid input syntax for integer: "Jane", column id
    (2 rows)
Use binary format
When running the COPY command, you can specify BINARY as the format option to read or write data in binary format rather than text. The binary format is faster than the text and CSV formats but is less portable across machine architectures and Greengage DB versions.
The binary format is very data type specific. For example, binary data output from a SMALLINT column cannot be read into an INTEGER column, whereas in text format such a scenario would work.
The data formatting clauses (DELIMITER, NULL, HEADER, QUOTE, ESCAPE, and ENCODING) are not supported for the binary format. You cannot run COPY FROM in single-row error isolation mode if the data is in binary format.
The binary file format consists of a file header, zero or more tuples containing the row data, and a file trailer. Headers and data are in network byte order (most significant byte first).
File header
The file header consists of 15 bytes of fixed fields, followed by a variable-length header extension area. The fixed fields are as follows:

-   Signature — the 11-byte sequence PGCOPY\n\377\r\n\0. Note that the zero byte is a required part of the signature. The signature is designed to allow easy identification of files that have been munged by a non-8-bit-clean transfer. It will be changed by end-of-line-translation filters, dropped zero bytes, dropped high bits, or parity changes.
-   Flags field — a 32-bit integer bit mask denoting important aspects of the file format. Bits are numbered from 0 (least significant bit, LSB) to 31 (most significant bit, MSB). Note that this field is stored in network byte order, as are all the integer fields used in the file format. Bits 16-31 are reserved to denote critical file format issues; a reader should cancel if it finds an unexpected bit set in this range. Bits 0-15 are reserved to signal backwards-compatible format issues; a reader should simply ignore any unexpected bits set in this range. Currently only one flag is defined, and the rest must be zero: bit 16 is 1 if the data has OIDs and 0 if not. OID system columns are no longer supported, but the format still contains the indicator.
-   Header extension area length — a 32-bit integer, the length in bytes of the remainder of the header, not including itself. Currently this is zero, and the first tuple follows immediately. Future changes to the format might allow additional data to be present in the header. A reader should silently skip over any unknown header extension data. The header extension area is envisioned to contain a sequence of self-identifying chunks. The flags field is not intended to tell readers what is in the extension area. Specific design of the header extension contents is left for a later release.
Tuples
Each tuple begins with a 16-bit integer count of the number of fields in the tuple. (Presently, all tuples in a table have the same count, but that might not always be true.) Then, repeated for each field in the tuple, there is a 32-bit length word followed by that many bytes of field data. The length word does not include itself and can be zero. As a special case, -1 indicates a NULL field value; no value bytes follow in the NULL case. There is no alignment padding or any other extra data between fields.
Presently, all data values in a binary-format file are assumed to be in the binary format (format code one). A custom extension may add a header field that allows per-column format codes to be specified.
If OIDs are included in the file, the OID field immediately follows the field-count word. It is a normal field except that it is not included in the field count. In particular, it has a length word, which allows handling of 4-byte vs. 8-byte OIDs and allows OIDs to be shown as NULL if that ever proves desirable.
File trailer
The file trailer consists of a 16-bit integer word containing -1. This is easily distinguished from a tuple’s field-count word. A reader should report an error if a field-count word is neither -1 nor the expected number of columns; this provides an extra check against somehow getting out of sync with the data.
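The layout described above (header, tuples, trailer) can be exercised with a short round-trip sketch. This is an illustrative encoder/decoder for rows of int4 values only, assuming no OIDs and an empty header extension; it is not a substitute for COPY BINARY itself.

```python
import struct

SIGNATURE = b"PGCOPY\n\xff\r\n\x00"  # 11-byte signature, zero byte included

def encode(rows):
    """Encode rows of (int4 | None) values as a COPY BINARY payload."""
    out = bytearray(SIGNATURE)
    out += struct.pack("!I", 0)               # flags: bit 16 clear, no OIDs
    out += struct.pack("!I", 0)               # header extension length: 0
    for row in rows:
        out += struct.pack("!h", len(row))    # 16-bit field count
        for value in row:
            if value is None:
                out += struct.pack("!i", -1)  # length -1 marks a NULL field
            else:
                out += struct.pack("!i", 4)   # field length in bytes
                out += struct.pack("!i", value)  # int4, network byte order
    out += struct.pack("!h", -1)              # file trailer
    return bytes(out)

def decode(buf):
    """Decode the payload produced by encode()."""
    assert buf[:11] == SIGNATURE
    pos = 11 + 4 + 4                          # skip flags and extension length
    rows = []
    while True:
        (nfields,) = struct.unpack_from("!h", buf, pos)
        pos += 2
        if nfields == -1:                     # trailer reached
            return rows
        row = []
        for _ in range(nfields):
            (length,) = struct.unpack_from("!i", buf, pos)
            pos += 4
            if length == -1:
                row.append(None)              # NULL: no value bytes follow
            else:
                row.append(struct.unpack_from("!i", buf, pos)[0])
                pos += length
        rows.append(tuple(row))

print(decode(encode([(1,), (None,), (42,)])))  # → [(1,), (None,), (42,)]
```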
Clauses and parameters reference
-   BINARY — causes the data to be read or written in binary format rather than text. You can also specify binary format via the FORMAT option. Learn more in Use binary format.
-   <table_name> — the name of an existing table.
-   <column_name> — an optional list of columns to be copied. If a column list is specified, only the data in the listed columns is copied. If no column list is specified, all columns of the table are copied.
-   <query> — a SELECT command whose results are to be copied.
-   <filename> — the path to the input or output file. An input file name can be an absolute or relative path; an output file name must be an absolute path. The file must be accessible to the master host, and the specified path must refer to a location on the master host’s file system. Learn more in Load data from files / to files.
-   PROGRAM '<command>' — a command to run. In COPY FROM, the input is read from the standard output of the command; in COPY TO, the output is written to the standard input of the command. Learn more in Load data by running a program.
-   STDIN | STDOUT — the STDIN clause specifies that the input comes from the client application; the STDOUT clause specifies that the output goes to the client application. Learn more in Load data from stdin / to stdout.
-   WITH <option> — specifies additional options, which are described below.
-   FORMAT <'text' | 'csv' | 'binary'> — defines the data format, which can be text (the default), csv, or binary. You can also specify binary format by using the dedicated BINARY clause. To learn more about formatting source data, see Format external data.
-   OIDS [ <boolean> ] — specifies copying the OID for each row. An error is raised if OIDS is specified for a table that does not have OIDs, or when copying the results of a query.
-   FREEZE [ <boolean> ] — requests copying the data with rows already frozen, just as they would be after running the VACUUM FREEZE command. Note that all other sessions can immediately see the data once it is successfully loaded. This violates the normal rules of MVCC visibility, so you should be aware of the potential problems this might cause.
-   DELIMITER '<delimiter_character>' — designates a single ASCII character to act as a column delimiter. The delimiter must reside between any two data value fields but not at the beginning or the end of a row. You can also specify a non-printable ASCII character or a non-printable Unicode character. For text files, the default column delimiter is the horizontal TAB character (\t). For CSV files, the default column delimiter is the comma character (,). See Format columns for details.
-   NULL '<null string>' — designates a string representing a null value, which indicates an unknown piece of data in a column or field. For text files, the default string is \N. For CSV files, the default string is an empty value with no quotation marks. See Represent NULL values for details.
-   HEADER [ <boolean> ] — designates whether the data file contains a header row. If using multiple data source files, all of them must have a header row. The default is to assume that the input files do not have a header row.
-   QUOTE '<quote_character>' — specifies the quotation character for CSV format. The default is the double quote (").
-   NEWLINE '<newline_character>' — designates the character sequence used as a newline: LF, CR, or CRLF. If not specified, the newline character detected at the end of the first line of data is used. See Format rows for details.
-   ESCAPE '<escape_character>' — designates a single character used as an escape character. For text files, the default escape character is a backslash (\). For CSV files, the default escape character is a double quote ("). See Escape characters for details.
-   FORCE_QUOTE { ( <column_name> [, …] ) | * } — enforces quoting for all non-NULL values in each specified column. If an asterisk (*) is specified, quoting is enforced for non-NULL values in all columns.
-   FORCE_NOT_NULL ( <column_name> [, …] ) — treats values in each specified column as if they were quoted. Since the default null string is an empty unquoted string, this causes missing values to be evaluated as zero-length strings.
-   FORCE_NULL ( <column_name> [, …] ) — matches the specified columns’ values against the null string. If a match is found, sets the value to NULL.
-   ENCODING '<encoding_name>' — defines the character set encoding of the source data. If not specified, the default client encoding is used. See Character encoding for details.
-   FILL MISSING FIELDS — sets missing trailing field values at the end of a line or row to NULL.
-   LOG ERRORS — enables capturing error log information about rows with formatting errors. Error log information is stored internally and can be accessed by using the gp_read_error_log() built-in function. Learn more in Run in single-row error isolation mode.
-   SEGMENT REJECT LIMIT <count> [ ROWS | PERCENT ] — runs a COPY FROM operation in single-row error isolation mode. Learn more in Run in single-row error isolation mode.
-   IGNORE EXTERNAL PARTITIONS — if specified, when copying data from partitioned tables, data is not copied from leaf child partitions that are external tables, and a corresponding message is added to the log file. Otherwise, if Greengage DB attempts to copy data from a leaf child partition that is an external table, an error is returned. To copy data from a partitioned table with a leaf child partition that is an external table, use an SQL query to select the data to copy.
-   ON SEGMENT — specifies that data loading or unloading is performed via individual data files located on the segment hosts. Learn more in Use ON SEGMENT clause.