
COPY command

Anton Monakov

The COPY command copies data between Greengage DB tables and standard file system files, program input or output, or stdin / stdout. COPY can be used with regular tables, writable external tables, and the results of a SELECT query, but not with readable external tables or views.

Unless ON SEGMENT is specified, COPY is not parallel (data is loaded in a single process using the Greengage DB master instance), so using it is only recommended for small data files.

You must have the SELECT privilege on the table whose values are read by COPY TO, and the INSERT privilege on the table into which values are inserted by COPY FROM. It is sufficient to have column privileges on the columns listed in the command. To learn more about the Greengage DB access control system, see Roles and privileges.

A non-superuser can only run COPY FROM STDIN and COPY TO STDOUT. Learn more in Load data from stdin / to stdout.

Syntax overview

The COPY command syntax is provided below. See Clauses and parameters reference for the complete description of clauses and parameters.

COPY [BINARY] <table_name> [(<column_name> [, ...])]
FROM {'<filename>' | PROGRAM '<command>' | STDIN}
[ ON SEGMENT ]
[IGNORE EXTERNAL PARTITIONS]
[[LOG ERRORS]
SEGMENT REJECT LIMIT <count> [ ROWS | PERCENT ]]
[ [ WITH ] (
    FORMAT <'text' | 'csv' | 'binary'>,
    OIDS [ <boolean> ],
    FREEZE [ <boolean> ],
    DELIMITER '<delimiter_character>',
    NULL '<null string>',
    HEADER [ <boolean> ],
    QUOTE '<quote_character>',
    NEWLINE '<newline_character>',
    ESCAPE '<escape_character>',
    FORCE_QUOTE { ( <column_name> [, ...] ) | * },
    FORCE_NOT_NULL ( <column_name> [, ...] ),
    FORCE_NULL ( <column_name> [, ...] ),
    ENCODING '<encoding_name>',
    FILL MISSING FIELDS
    ) ]
COPY [BINARY] { <table_name> [(<column_name> [, ...])] | (<query>)}
TO {'<filename>' | PROGRAM '<command>' | STDOUT}
[ ON SEGMENT ]
[IGNORE EXTERNAL PARTITIONS]
[[LOG ERRORS]
SEGMENT REJECT LIMIT <count> [ ROWS | PERCENT ]]
[ [ WITH ] (
    FORMAT <'text' | 'csv' | 'binary'>,
    OIDS [ <boolean> ],
    FREEZE [ <boolean> ],
    DELIMITER '<delimiter_character>',
    NULL '<null string>',
    HEADER [ <boolean> ],
    QUOTE '<quote_character>',
    NEWLINE '<newline_character>',
    ESCAPE '<escape_character>',
    FORCE_QUOTE { ( <column_name> [, ...] ) | * },
    FORCE_NOT_NULL ( <column_name> [, ...] ),
    FORCE_NULL ( <column_name> [, ...] ),
    ENCODING '<encoding_name>',
    FILL MISSING FIELDS
    ) ]
NOTE

When enumerating options in the WITH clause, replace spaces in option names with underscores (_), for example, FILL_MISSING_FIELDS, ON_SEGMENT. Alternatively, you can omit WITH and specify options separately.

The following two commands are equivalent:

  • Using WITH:

    COPY customers FROM '/tmp/customers_<SEGID>.csv'
    WITH (
        FORMAT CSV,
        HEADER,
        ON_SEGMENT
    );
  • Not using WITH:

    COPY customers FROM '/tmp/customers_<SEGID>.csv'
    CSV
    HEADER
    ON SEGMENT;

Usage examples

Prerequisites

To try out the practical examples listed in this section, connect to the Greengage DB master host as gpadmin using psql as described in Connect to Greengage DB via psql. Then create the customers test database, and connect to it:

DROP DATABASE IF EXISTS customers;
CREATE DATABASE customers;
\c customers

Then create the customers table:

CREATE TABLE customers
(
    id INTEGER,
    first_name VARCHAR(50),
    last_name VARCHAR(50),
    email VARCHAR(100),
    address VARCHAR(255)
)
    DISTRIBUTED RANDOMLY;

Load data from files / to files

To directly read from or write to a file located on the Greengage DB master host, run COPY with the provided file name (<filename>). COPY FROM lets you copy data from a file to a table and append it to the already existing data. COPY TO lets you copy the contents of a table to a file (or multiple files based on the segment ID as described in Use ON SEGMENT clause).

COPY [BINARY] <table_name> [(<column_name> [, ...])]
FROM '<filename>'
[ ON SEGMENT ]
[ [ WITH ] ( <option> [, ...] ) ]
COPY [BINARY] { <table_name> [(<column_name> [, ...])] | (<query>)}
TO '<filename>'
[ ON SEGMENT ]
[ [ WITH ] ( <option> [, ...] ) ]

If ON SEGMENT is not used, the file name must be specified as an absolute path on the master host or a relative path to the master data directory. Otherwise, if ON SEGMENT is used, the file name must be specified as a path on the Greengage DB segment host. The files must be accessible to and readable or writable by the Greengage DB system user (the user ID the server runs as), and not the client. Only database superusers are permitted to directly use COPY with files, because this allows reading or writing any file that the server has privileges to access.

NOTE

On Windows, use an E'' escape string constant and double any backslashes used in the path name.

Examples

In these examples, the COPY command is used for copying the customers.csv file’s content into the customers table and the table’s content into a file.

  1. In the /tmp directory on the master host, create the customers.csv file:

    id,first_name,last_name,email,address
    1,John,Doe,john.doe@example.com,123 Elm Street
    2,Jane,Smith,jane.smith@example.com,456 Oak Street
    3,Bob,Brown,bob.brown@example.com,789 Pine Street
    4,Rob,Stuart,rob.stuart@example.com,119 Willow Street
  2. Empty the customers table:

    TRUNCATE TABLE customers;
  3. Run the COPY FROM command. In the WITH clause, set CSV as the data format and specify that the file contains a header row (HEADER):

    COPY customers FROM '/tmp/customers.csv'
    WITH (FORMAT CSV, HEADER);
  4. Query the customers table:

    TABLE customers;

    The output should look as follows:

     id | first_name | last_name |         email          |      address
    ----+------------+-----------+------------------------+-------------------
      1 | John       | Doe       | john.doe@example.com   | 123 Elm Street
      2 | Jane       | Smith     | jane.smith@example.com | 456 Oak Street
      3 | Bob        | Brown     | bob.brown@example.com  | 789 Pine Street
      4 | Rob        | Stuart    | rob.stuart@example.com | 119 Willow Street
  1. Empty the customers table and populate it with data as follows:

    TRUNCATE TABLE customers;
    
    INSERT INTO customers (id, first_name, last_name, email, address)
    VALUES (1,'John','Doe','john.doe@example.com','123 Elm Street'),
           (2,'Jane','Smith','jane.smith@example.com','456 Oak Street'),
           (3,'Bob','Brown','bob.brown@example.com','789 Pine Street'),
           (4,'Rob','Stuart','rob.stuart@example.com','119 Willow Street');
  2. Run the COPY TO command. In the WITH clause, set CSV as the data format and specify that the file contains a header row (HEADER):

    COPY customers TO '/tmp/customers.csv'
    WITH (FORMAT CSV, HEADER);
  3. Review the contents of the customers.csv file created in /tmp:

    $ cat /tmp/customers.csv

    The output should look as follows:

    id,first_name,last_name,email,address
    1,John,Doe,john.doe@example.com,123 Elm Street
    2,Jane,Smith,jane.smith@example.com,456 Oak Street
    3,Bob,Brown,bob.brown@example.com,789 Pine Street
    4,Rob,Stuart,rob.stuart@example.com,119 Willow Street

Load data by running a program

To load or unload table data via a program, run COPY with a shell command (<command>) specified in the PROGRAM clause:

COPY [BINARY] <table_name> [(<column_name> [, ...])]
FROM PROGRAM '<command>'
[ ON SEGMENT ]
[ [ WITH ] ( <option> [, ...] ) ]
COPY [BINARY] { <table_name> [(<column_name> [, ...])] | (<query>)}
TO PROGRAM '<command>'
[ ON SEGMENT ]
[ [ WITH ] ( <option> [, ...] ) ]

The command must be specified as if entered directly on the Greengage DB master host (if ON SEGMENT is not used) and must be executable by the superuser. In COPY FROM, the input is read from the standard output of the command. In COPY TO, the output is written to the standard input of the command.

When ON SEGMENT is specified, the command must be executable on all Greengage DB primary segment hosts by the Greengage DB superuser, and the <SEGID> value, which is substituted with the actual segment ID at runtime, is required in the command.

NOTE

The command is invoked by a shell. When passing arguments to the shell, strip or escape any special characters that have a special meaning for the shell. For security reasons, it is best to use a fixed command string or at least avoid passing any user input in the string.
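If the PROGRAM command string is assembled programmatically, quoting each argument guards against shell metacharacters. A minimal Python sketch (the file paths are hypothetical):

```python
import shlex

def build_copy_program_command(path: str) -> str:
    """Build a shell command string for COPY ... FROM PROGRAM, quoting the
    user-supplied path so shell metacharacters are treated literally."""
    return "cat " + shlex.quote(path)

# A plain path passes through unchanged; a path with spaces and a
# metacharacter is wrapped in single quotes.
print(build_copy_program_command("/tmp/customers.csv"))
print(build_copy_program_command("/tmp/my data;x.csv"))
```

The resulting string can then be embedded in the PROGRAM clause. This is a sketch of the escaping idea, not an official helper.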

Examples

In these examples, the COPY command reads and writes the customers.csv file’s contents via cat.

  1. In the /tmp directory on the master host, create the customers.csv file:

    id,first_name,last_name,email,address
    1,John,Doe,john.doe@example.com,123 Elm Street
    2,Jane,Smith,jane.smith@example.com,456 Oak Street
    3,Bob,Brown,bob.brown@example.com,789 Pine Street
    4,Rob,Stuart,rob.stuart@example.com,119 Willow Street
  2. Empty the customers table:

    TRUNCATE TABLE customers;
  3. Run the COPY FROM command. In the PROGRAM clause, provide the cat command. In the WITH clause, set CSV as the data format and specify that the file contains a header row (HEADER).

    COPY customers FROM PROGRAM 'cat /tmp/customers.csv'
    WITH (FORMAT CSV, HEADER);
  4. Query the customers table:

    TABLE customers;

    The output should look as follows:

     id | first_name | last_name |         email          |      address
    ----+------------+-----------+------------------------+-------------------
      1 | John       | Doe       | john.doe@example.com   | 123 Elm Street
      2 | Jane       | Smith     | jane.smith@example.com | 456 Oak Street
      3 | Bob        | Brown     | bob.brown@example.com  | 789 Pine Street
      4 | Rob        | Stuart    | rob.stuart@example.com | 119 Willow Street
  1. Empty the customers table and populate it with data as follows:

    TRUNCATE TABLE customers;
    
    INSERT INTO customers (id, first_name, last_name, email, address)
    VALUES (1,'John','Doe','john.doe@example.com','123 Elm Street'),
           (2,'Jane','Smith','jane.smith@example.com','456 Oak Street'),
           (3,'Bob','Brown','bob.brown@example.com','789 Pine Street'),
           (4,'Rob','Stuart','rob.stuart@example.com','119 Willow Street');
  2. Run the COPY TO command. In the WITH clause, set CSV as the data format and specify that the file contains a header row (HEADER):

    COPY customers TO PROGRAM 'cat > /tmp/customers.csv'
    WITH (FORMAT CSV, HEADER);
  3. Review the contents of the customers.csv file created in /tmp:

    $ cat /tmp/customers.csv

    The output should look as follows:

    id,first_name,last_name,email,address
    1,John,Doe,john.doe@example.com,123 Elm Street
    2,Jane,Smith,jane.smith@example.com,456 Oak Street
    3,Bob,Brown,bob.brown@example.com,789 Pine Street
    4,Rob,Stuart,rob.stuart@example.com,119 Willow Street

Load data from stdin / to stdout

To load or unload table data through standard input (stdin) or standard output (stdout), run COPY with the STDIN / STDOUT clause provided as follows:

COPY [BINARY] <table_name> [(<column_name> [, ...])]
FROM STDIN
[ [ WITH ] ( <option> [, ...] ) ]
COPY [BINARY] { <table_name> [(<column_name> [, ...])] | (<query>)}
TO STDOUT
[ [ WITH ] ( <option> [, ...] ) ]

STDIN specifies that the input comes from the client application. STDOUT specifies that output goes to the client application. Data is transmitted via the connection between the client and the master. The ON SEGMENT clause is not supported with STDIN or STDOUT.

TIP

To be able to pipe data in and out of COPY, you can run it as part of the psql command directly from the command line as described in Execute commands non-interactively. Use the -d option to specify the database to connect to (<database>), and the -c option to specify the command to execute, for example:

$ psql -d <database> -c 'COPY <table_name> FROM STDIN WITH (FORMAT CSV)'

You can use the \copy meta-command to perform a frontend (client) copy as described in Load data using \copy.
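The piping described in the tip above can also be driven from a script. A minimal Python sketch using subprocess; it assumes the customers database and table from this page's examples, and only attempts the copy when psql is on the PATH:

```python
import shutil
import subprocess

# Equivalent to: psql -d customers -c 'COPY customers FROM STDIN WITH (FORMAT CSV)'
# 'customers' is the example database and table used throughout this page.
cmd = ["psql", "-d", "customers", "-c",
       "COPY customers FROM STDIN WITH (FORMAT CSV)"]

# One CSV row to feed to COPY FROM STDIN (illustrative data).
csv_rows = "5,Ann,Lee,ann.lee@example.com,12 Birch Street\n"

if shutil.which("psql"):  # attempt the copy only when psql is available
    try:
        subprocess.run(cmd, input=csv_rows, text=True, check=True)
    except subprocess.CalledProcessError as exc:
        print("psql returned an error:", exc.returncode)
else:
    print("psql not found; skipping")
```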

Examples

In these examples, the COPY command is used for copying user-provided values into the customers table via stdin, and the contents of the customers table via stdout. Data input and output are performed via the client terminal.

  1. Empty the customers table:

    TRUNCATE TABLE customers;
  2. Run the COPY FROM command. In the WITH clause, set CSV as the data format.

    COPY customers FROM STDIN
    WITH (FORMAT CSV);
  3. Greengage DB displays an interactive input prompt. Type in the desired values separated by commas. Start each data row on a new line and finalize the input by typing \. on a separate line:

    1,John,Doe,john.doe@example.com,123 Elm Street
    2,Jane,Smith,jane.smith@example.com,456 Oak Street
    3,Bob,Brown,bob.brown@example.com,789 Pine Street
    4,Rob,Stuart,rob.stuart@example.com,119 Willow Street
    \.
  4. Query the customers table:

    TABLE customers;

    The output should look as follows:

     id | first_name | last_name |         email          |      address
    ----+------------+-----------+------------------------+-------------------
      1 | John       | Doe       | john.doe@example.com   | 123 Elm Street
      2 | Jane       | Smith     | jane.smith@example.com | 456 Oak Street
      3 | Bob        | Brown     | bob.brown@example.com  | 789 Pine Street
      4 | Rob        | Stuart    | rob.stuart@example.com | 119 Willow Street
  1. Empty the customers table and populate it with data as follows:

    TRUNCATE TABLE customers;
    
    INSERT INTO customers (id, first_name, last_name, email, address)
    VALUES (1,'John','Doe','john.doe@example.com','123 Elm Street'),
           (2,'Jane','Smith','jane.smith@example.com','456 Oak Street'),
           (3,'Bob','Brown','bob.brown@example.com','789 Pine Street'),
           (4,'Rob','Stuart','rob.stuart@example.com','119 Willow Street');
  2. Run the COPY TO command. In the WITH clause, set CSV as the data format and specify that the output should include a header row (HEADER):

    COPY customers TO STDOUT
    WITH (FORMAT CSV, HEADER);

    The output should look as follows (the row order may vary because the table is distributed randomly):

    id,first_name,last_name,email,address
    3,Bob,Brown,bob.brown@example.com,789 Pine Street
    1,John,Doe,john.doe@example.com,123 Elm Street
    4,Rob,Stuart,rob.stuart@example.com,119 Willow Street
    2,Jane,Smith,jane.smith@example.com,456 Oak Street

Load data using \copy

You can use the \copy meta-command to perform a frontend (client) copy.

\copy invokes a regular COPY FROM STDIN / COPY TO STDOUT command and then loads or stores the data in a file accessible to the psql client.

Reading or writing the specified file is performed not by the server but by the psql utility, which routes the data between the server and the local file system. Therefore, the file must reside on the host where the psql client runs and must be accessible to the user running the client.

CAUTION

\copy is only available when you are connected to Greengage DB via psql as described in Connect to Greengage DB via psql. You cannot run \copy when using any other database client application.

The \copy command syntax is similar to that of the COPY command:

\copy { table [ ( <column_list> ) ] | ( <query> ) }
{ from | to }
{ 'filename' | program 'command' | stdin | stdout | pstdin | pstdout }
[ [ with ] ( <option> [, ...] ) ]

When using \copy from stdin, the backend accepts lines of data until a single line only containing a backslash-period (\.) is entered or the stream reaches EOF. For \copy to stdout, output is sent to the same place as the psql command output. To read or write psql's standard input or output regardless of the current command source or the -f or -o options, use from pstdin or to pstdout.

The \copy command does not support ON SEGMENT.

Examples

In these examples, the \copy command is used for copying the customers.csv file’s content into the customers table, and the table’s content into a file.

  1. In the /tmp directory on the master host, create the customers.csv file:

    id,first_name,last_name,email,address
    1,John,Doe,john.doe@example.com,123 Elm Street
    2,Jane,Smith,jane.smith@example.com,456 Oak Street
    3,Bob,Brown,bob.brown@example.com,789 Pine Street
    4,Rob,Stuart,rob.stuart@example.com,119 Willow Street
  2. Empty the customers table:

    TRUNCATE TABLE customers;
  3. Run the \copy from command. In the with clause, set csv as the data format and specify that the file contains a header row (header):

    \copy customers from '/tmp/customers.csv' with (format csv,header)
  4. Query the customers table:

    TABLE customers;

    The output should look as follows:

     id | first_name | last_name |         email          |      address
    ----+------------+-----------+------------------------+-------------------
      1 | John       | Doe       | john.doe@example.com   | 123 Elm Street
      2 | Jane       | Smith     | jane.smith@example.com | 456 Oak Street
      3 | Bob        | Brown     | bob.brown@example.com  | 789 Pine Street
      4 | Rob        | Stuart    | rob.stuart@example.com | 119 Willow Street
  1. Empty the customers table and populate it with data as follows:

    TRUNCATE TABLE customers;
    
    INSERT INTO customers (id, first_name, last_name, email, address)
    VALUES (1,'John','Doe','john.doe@example.com','123 Elm Street'),
           (2,'Jane','Smith','jane.smith@example.com','456 Oak Street'),
           (3,'Bob','Brown','bob.brown@example.com','789 Pine Street'),
           (4,'Rob','Stuart','rob.stuart@example.com','119 Willow Street');
  2. Run the \copy to command. In the with clause, set csv as the data format and specify that the file contains a header row (header):

    \copy customers to '/tmp/customers.csv' with (format csv,header)
  3. Review the contents of the customers.csv file created in /tmp:

    $ cat /tmp/customers.csv

    The output should look as follows:

    id,first_name,last_name,email,address
    1,John,Doe,john.doe@example.com,123 Elm Street
    2,Jane,Smith,jane.smith@example.com,456 Oak Street
    3,Bob,Brown,bob.brown@example.com,789 Pine Street
    4,Rob,Stuart,rob.stuart@example.com,119 Willow Street

Use ON SEGMENT clause

The ON SEGMENT clause of the COPY command lets you specify individual segment data files on the segment hosts. Each file contains the table data that is managed by the primary segment. For example, when copying data from a table to files with a COPY TO … ON SEGMENT command, a file is created on the segment host for each segment instance on that host. Files are created on the segment hosts rather than on the master, where they would be created in a standard COPY operation. The COPY command does not copy data from or to mirror segment instances or their segment data files.

The ON SEGMENT clause can be used for operations such as migrating data between clusters or performing a backup. Segment data created by the ON SEGMENT clause can be restored by tools such as gpfdist, which is useful for high-speed data loading.

The general COPY … ON SEGMENT syntax is as follows:

COPY [BINARY] <table_name> [(<column_name> [, ...])]
FROM {'<filename>' | PROGRAM '<command>'}
[ ON SEGMENT ]
[ [ WITH ] ( <option> [, ...] ) ]
COPY [BINARY] { <table_name> [(<column_name> [, ...])] | (<query>)}
TO {'<filename>' | PROGRAM '<command>'}
[ ON SEGMENT ]
[ [ WITH ] ( <option> [, ...] ) ]
NOTE

The STDIN and STDOUT clauses are not supported with ON SEGMENT.

When specifying an absolute path and file name of a source data file either directly (<filename>) or as part of a command (<command>), use the <SEG_DATA_DIR> and <SEGID> string literals with the following syntax:

COPY <table>
[TO|FROM] '<SEG_DATA_DIR>/<datafilename><SEGID>_<suffix>'
ON SEGMENT;

where:

  • <SEG_DATA_DIR> — the absolute path of the segment instance data directory. Using it is optional: an absolute path can be provided instead.

  • <SEGID> — the content ID number of the segment instance. <SEGID> is a required part of the file name when ON SEGMENT is specified. With COPY FROM, specify the segment instance content ID in the name of the source file and place that file on the segment host. There must be a file for each primary segment instance on each host.

When the COPY operation is run, the string literals are substituted with the actual segment IDs and the paths of the segment data directories.

NOTE

The angle brackets (< and >) are part of the string literal.

For a COPY FROM … ON SEGMENT command, the table distribution policy is checked when data is copied into the table. By default, an error is returned if a data row violates the table distribution policy. You can deactivate the distribution policy check by using the gp_enable_segment_copy_checking server configuration parameter.

Data from a table that is generated by a COPY TO … ON SEGMENT command can be used to restore table data with COPY FROM … ON SEGMENT. However, the data restored to the segments is distributed according to the table distribution policy at the time the files were generated by the COPY TO command. The COPY command might return table distribution policy errors if the distribution policy was changed between generating the files with COPY TO … ON SEGMENT and restoring them with COPY FROM … ON SEGMENT. If you run COPY FROM … ON SEGMENT while the gp_enable_segment_copy_checking server configuration parameter is false, manual redistribution of table data might be required.

When copying data from a replicated table (DISTRIBUTED REPLICATED) as source, the created target files on segment hosts contain all table rows.

Learn more about table distribution in Distribution.

Examples

In these examples, the COPY command is used for copying the content of four files, customers_0.csv — customers_3.csv, into the customers table, and the table’s content into files. The files are located in the /tmp directory on segment hosts.

To view your segment configuration and the location of data directories, query the gp_segment_configuration system table:

SELECT content,hostname,datadir FROM gp_segment_configuration;

The output should look as follows:

 content | hostname |        datadir
---------+----------+-----------------------
      -1 | mdw      | /data1/master/gpseg-1
       0 | sdw1     | /data1/primary/gpseg0
       0 | sdw2     | /data1/mirror/gpseg0
       1 | sdw1     | /data1/primary/gpseg1
       1 | sdw2     | /data1/mirror/gpseg1
       2 | sdw2     | /data1/primary/gpseg2
       2 | sdw1     | /data1/mirror/gpseg2
       3 | sdw2     | /data1/primary/gpseg3
       3 | sdw1     | /data1/mirror/gpseg3
      -1 | smdw     | /data1/master/gpseg-1
  1. In the /tmp directory of each segment host, create the customers_<SEGID>.csv files, with <SEGID> corresponding to the current segment’s content ID:

    • customers_0.csv:

      id,first_name,last_name,email,address
      1,John,Doe,john.doe@example.com,123 Elm Street
    • customers_1.csv:

      id,first_name,last_name,email,address
      2,Jane,Smith,jane.smith@example.com,456 Oak Street
    • customers_2.csv:

      id,first_name,last_name,email,address
      3,Bob,Brown,bob.brown@example.com,789 Pine Street
    • customers_3.csv:

      id,first_name,last_name,email,address
      4,Rob,Stuart,rob.stuart@example.com,119 Willow Street
  2. Empty the customers table:

    TRUNCATE TABLE customers;
  3. Run the COPY FROM command. Set CSV as the data format and specify that each file contains a header row (HEADER):

    COPY customers FROM '/tmp/customers_<SEGID>.csv'
    CSV HEADER
    ON SEGMENT;
  4. Query the customers table:

    TABLE customers;

    The output should look as follows:

     id | first_name | last_name |         email          |      address
    ----+------------+-----------+------------------------+-------------------
      1 | John       | Doe       | john.doe@example.com   | 123 Elm Street
      2 | Jane       | Smith     | jane.smith@example.com | 456 Oak Street
      3 | Bob        | Brown     | bob.brown@example.com  | 789 Pine Street
      4 | Rob        | Stuart    | rob.stuart@example.com | 119 Willow Street
  1. Empty the customers table and populate it with data as follows:

    TRUNCATE TABLE customers;
    
    INSERT INTO customers (id, first_name, last_name, email, address)
    VALUES (1,'John','Doe','john.doe@example.com','123 Elm Street'),
           (2,'Jane','Smith','jane.smith@example.com','456 Oak Street'),
           (3,'Bob','Brown','bob.brown@example.com','789 Pine Street'),
           (4,'Rob','Stuart','rob.stuart@example.com','119 Willow Street');
  2. Run the COPY TO command.

    COPY customers TO '/tmp/customers_<SEGID>.csv'
    CSV HEADER
    ON SEGMENT;
  3. Review the contents of the customers_0.csv — customers_3.csv files created in /tmp on each segment host, for example:

    $ cat /tmp/customers_0.csv && cat /tmp/customers_1.csv
    $ cat /tmp/customers_2.csv && cat /tmp/customers_3.csv

    The output should look as follows:

    id,first_name,last_name,email,address
    1,John,Doe,john.doe@example.com,123 Elm Street
    
    id,first_name,last_name,email,address
    2,Jane,Smith,jane.smith@example.com,456 Oak Street
    
    id,first_name,last_name,email,address
    3,Bob,Brown,bob.brown@example.com,789 Pine Street
    
    id,first_name,last_name,email,address
    4,Rob,Stuart,rob.stuart@example.com,119 Willow Street
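Preparing one source file per primary segment for a COPY FROM … ON SEGMENT load can be scripted. The Python sketch below splits rows into customers_<SEGID>.csv files round-robin. The round-robin placement is purely illustrative: with the distribution policy check enabled, each row must be placed in the file of the segment that owns it under the table's distribution policy.

```python
import csv
import os
import tempfile

def split_csv_per_segment(rows, header, out_dir, num_segments):
    """Write customers_<SEGID>.csv files, assigning rows round-robin.
    Round-robin is illustrative only: real placement must match the
    table's distribution policy (or the policy check must be disabled)."""
    paths = []
    for segid in range(num_segments):
        path = os.path.join(out_dir, f"customers_{segid}.csv")
        with open(path, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(header)          # each file gets a header row
            for i, row in enumerate(rows):
                if i % num_segments == segid:
                    writer.writerow(row)
        paths.append(path)
    return paths

header = ["id", "first_name", "last_name", "email", "address"]
rows = [
    ["1", "John", "Doe", "john.doe@example.com", "123 Elm Street"],
    ["2", "Jane", "Smith", "jane.smith@example.com", "456 Oak Street"],
    ["3", "Bob", "Brown", "bob.brown@example.com", "789 Pine Street"],
    ["4", "Rob", "Stuart", "rob.stuart@example.com", "119 Willow Street"],
]
out_dir = tempfile.mkdtemp()  # stands in for /tmp on each segment host
paths = split_csv_per_segment(rows, header, out_dir, num_segments=4)
print(paths)
```

The generated files would then be distributed to the segment hosts before running COPY FROM … ON SEGMENT.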

Run in single-row error isolation mode

By default, COPY stops an operation at the first error: the operation fails and no data loads. If you run COPY FROM in single-row error isolation mode, Greengage DB skips rows that contain format errors and loads properly formatted rows.

To run a COPY FROM operation in single-row error isolation mode, use the SEGMENT REJECT LIMIT clause as follows:

COPY <table_name> [(<column_name> [, ...])]
FROM {'<filename>' | PROGRAM '<command>' | STDIN}
[ [ WITH ] ( <option> [, ...] ) ]
[LOG ERRORS] SEGMENT REJECT LIMIT <count> [ ROWS | PERCENT ]
[ ON SEGMENT ]

During the load operation, the input rows containing formatting errors are discarded until the reject limit count (<count>) is reached on any Greengage DB segment instance. Notice that the error row count is for each Greengage DB segment, not for the entire load operation. The reject limit count can be specified as a number of rows (the default) or percentage of total rows (from 1 to 100). If percentage is specified (PERCENT), each segment starts calculating the percentage of invalid rows only after the number of rows specified by the gp_reject_percent_threshold parameter has been processed.

If the limit is not reached, all valid rows are loaded and invalid rows are discarded.
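Because the limit is checked per segment rather than for the whole load, errors that cluster on a single segment can abort a COPY even when the total error count is modest. A minimal Python sketch of the ROWS-mode semantics (the counts are illustrative):

```python
def copy_aborts(rejected_per_segment, reject_limit):
    """Return True if any single segment reaches the reject limit.
    The limit applies per segment, not to the load as a whole."""
    return any(count >= reject_limit for count in rejected_per_segment)

# 6 bad rows spread across 4 segments: no segment reaches the limit of 5.
print(copy_aborts([2, 1, 3, 0], reject_limit=5))   # False: load completes
# 5 bad rows, all on one segment: that segment hits the limit.
print(copy_aborts([5, 0, 0, 0], reject_limit=5))   # True: COPY aborts
```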

NOTE

Single-row error isolation mode applies only to rows in the input file that contain format errors. If the data contains a constraint error, such as a violation of a NOT NULL, check, or unique constraint, the operation fails and no data loads.

Even if the SEGMENT REJECT LIMIT has not been reached, the COPY operation is stopped and rolled back if all of the first 1000 rows are rejected. You can modify this threshold by using the gp_initial_bad_row_limit configuration parameter.

To keep error rows for further examination, specify the LOG ERRORS clause to capture the error log information. The information is stored internally in Greengage DB. To view it, use the gp_read_error_log() built-in SQL function with the table name (<table_name>) provided as an argument:

SELECT gp_read_error_log('<table_name>');
                                                                         gp_read_error_log
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
 ("2025-06-11 01:01:52.80594+03",customers,<stdin>,2,,"invalid input syntax for integer: ""John"", column id","John,Doe,john.doe@example.com,123 Elm Street",)
 ("2025-06-11 01:02:35.075181+03",customers,<stdin>,3,,"invalid input syntax for integer: ""Jane"", column id","Jane,Smith,jane.smith@example.com,456 Oak Street",)
(2 rows)

Example

In this example, the COPY command is used for copying the customers.csv file’s content into the customers table. The content contains formatting errors, so the SEGMENT REJECT LIMIT clause is used for running in single-row error isolation mode. The LOG ERRORS clause is used for logging the loading errors.

  1. In the /tmp directory on the master host, create the customers.csv file. The first two data rows are misformatted: the id values are missing.

    id,first_name,last_name,email,address
    John,Doe,john.doe@example.com,123 Elm Street
    Jane,Smith,jane.smith@example.com,456 Oak Street
    3,Bob,Brown,bob.brown@example.com,789 Pine Street
    4,Rob,Stuart,rob.stuart@example.com,119 Willow Street
  2. Empty the customers table:

    TRUNCATE TABLE customers;
  3. Run the COPY FROM command. In the WITH clause, set CSV as the data format and specify that the file contains a header row (HEADER). Provide the LOG ERRORS clause and set SEGMENT REJECT LIMIT to 3:

    COPY customers FROM '/tmp/customers.csv'
    WITH (FORMAT CSV,HEADER)
    LOG ERRORS SEGMENT REJECT LIMIT 3;
  4. Query the customers table:

    TABLE customers;

    The output should contain two rows, with misformatted rows skipped:

     id | first_name | last_name |         email          |      address
    ----+------------+-----------+------------------------+-------------------
      3 | Bob        | Brown     | bob.brown@example.com  | 789 Pine Street
      4 | Rob        | Stuart    | rob.stuart@example.com | 119 Willow Street
  5. Call the gp_read_error_log() function on the customers table to view the error log:

    SELECT t.linenum, t.errmsg FROM gp_read_error_log('customers') t;

    The output should look as follows:

     linenum |                       errmsg
    ---------+-----------------------------------------------------
           2 | invalid input syntax for integer: "John", column id
           3 | invalid input syntax for integer: "Jane", column id
    (2 rows)

Use binary format

When running the COPY command, you can specify BINARY as the format option to read or write data in a binary format rather than text. The format is faster than the text and CSV formats but is less portable across machine architectures and Greengage DB versions.

The binary format is very data type specific. For example, binary data output from a SMALLINT column cannot be read into an INTEGER column; in text format, such a scenario would work.

Using data formatting clauses (DELIMITER, NULL, HEADER, QUOTE, ESCAPE, and ENCODING) is not supported for the binary format.

NOTE

You cannot run COPY FROM in single-row error isolation mode if the data is in binary format.

The binary file format consists of a file header, zero or more tuples containing the row data, and a file trailer. Headers and data are in network byte order (most significant byte first).

File header

The file header consists of 15 bytes of fixed fields, followed by a variable-length header extension area. The fixed fields are as follows:

  • Signature — 11-byte sequence PGCOPY\n\377\r\n\0. Note that the zero byte is a required part of the signature. The signature is designed to allow easy identification of files that have been corrupted by a non-8-bit-clean transfer: it is altered by end-of-line-translation filters, dropped zero bytes, dropped high bits, or parity changes.

  • Flags field — 32-bit integer bit mask to denote important aspects of the file format. Bits are numbered from 0 (least significant bit, LSB) to 31 (most significant bit, MSB). Note that this field is stored in network byte order, as are all the integer fields used in the file format. Bits 16-31 are reserved to denote critical file format issues; a reader should cancel if it finds an unexpected bit set in this range. Bits 0-15 are reserved to signal backwards-compatible format issues; a reader should simply ignore any unexpected bits set in this range. Currently only one flag is defined, and the rest must be zero.

    Bit 16: 1 if data has OIDs, 0 if not. OID system columns are no longer supported, but the format still contains the indicator.

  • Header extension area length — 32-bit integer, length in bytes of remainder of header, not including self. Currently, this is zero, and the first tuple follows immediately. Future changes to the format might allow additional data to be present in the header. A reader should silently skip over any unknown header extension data. The header extension area is envisioned to contain a sequence of self-identifying chunks. The flags field is not intended to tell readers what is in the extension area. Specific design of header extension contents is left for a later release.
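As an illustration, the fixed header fields described above can be assembled and checked with a few lines of Python. This is a sketch derived from the layout above, not an official Greengage DB utility; the variable names are arbitrary.

```python
import struct

# 11-byte signature: PGCOPY\n\377\r\n\0 -- the zero byte is part of it
SIGNATURE = b"PGCOPY\n\xff\r\n\x00"

# Flags field: all bits zero (no OIDs); extension length: zero (no extension data)
flags = 0
extension_length = 0

# All integer fields are in network byte order (">" in struct notation)
header = SIGNATURE + struct.pack(">ii", flags, extension_length)

assert len(header) == 19              # 11-byte signature + 4 + 4
assert header.startswith(SIGNATURE)

# A reader should cancel on unexpected bits in the critical range 16-31;
# bit 16 (OIDs) is the only defined flag
critical_mask = 0xFFFF0000 & ~(1 << 16)
assert flags & critical_mask == 0
```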

Tuples

Each tuple begins with a 16-bit integer count of the number of fields in the tuple. Presently, all tuples in a table have the same count, but that might not always be true. Then, repeated for each field in the tuple, there is a 32-bit length word followed by that many bytes of field data. The length word does not include itself, and can be zero. As a special case, -1 indicates a NULL field value. No value bytes follow in the NULL case.

There is no alignment padding or any other extra data between fields.

Presently, all data values in a binary-format file are assumed to be in the binary format (format code one). A custom extension may add a header field that allows per-column format codes to be specified.

If OIDs are included in the file, the OID field immediately follows the field-count word. It is a normal field except that it is not included in the field-count. In particular, it has a length word — this will allow handling of 4-byte vs. 8-byte OIDs and will allow OIDs to be shown as NULL if that ever proves desirable.
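The tuple layout can likewise be sketched in Python. The following encodes one two-field tuple: a 4-byte integer and a NULL. The helper name and the choice of an int4 payload are illustrative, not part of the format specification.

```python
import struct

def encode_tuple(fields):
    """Encode one tuple: a 16-bit field count, then for each field a
    32-bit length word followed by that many payload bytes;
    None becomes length -1 (NULL) with no payload."""
    out = struct.pack(">h", len(fields))      # field count, network byte order
    for data in fields:
        if data is None:
            out += struct.pack(">i", -1)      # NULL: length word -1, no bytes
        else:
            out += struct.pack(">i", len(data)) + data
    return out

# One row with an int4 value (42) and a NULL in the second column
row = encode_tuple([struct.pack(">i", 42), None])

# 2 (count) + 4 (length) + 4 (payload) + 4 (NULL length word) = 14 bytes
assert len(row) == 14
```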

File trailer

The file trailer consists of a 16-bit integer word containing -1. This is easily distinguished from a tuple’s field-count word. A reader should report an error if a field-count word is neither -1 nor the expected number of columns. This provides an extra check against somehow getting out of sync with the data.
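Putting the pieces together, a reader loop over the tuple area can be sketched as follows: it walks tuples until the -1 trailer word and raises on any other unexpected field count, as recommended above. This is a minimal illustration assuming the header has already been consumed, not a production parser.

```python
import struct

def read_tuples(payload, expected_columns):
    """Walk the tuple area of a binary COPY stream (header already consumed).
    Stops at the -1 trailer; any other field count is an error."""
    rows, pos = [], 0
    while True:
        (count,) = struct.unpack_from(">h", payload, pos)
        pos += 2
        if count == -1:                       # file trailer
            return rows
        if count != expected_columns:         # out-of-sync safety check
            raise ValueError(f"unexpected field count {count}")
        row = []
        for _ in range(count):
            (length,) = struct.unpack_from(">i", payload, pos)
            pos += 4
            if length == -1:                  # NULL field, no payload bytes
                row.append(None)
            else:
                row.append(payload[pos:pos + length])
                pos += length
        rows.append(row)

# Two rows of a two-column table, followed by the -1 trailer
stream = (
    struct.pack(">h", 2) + struct.pack(">i", 4) + struct.pack(">i", 7)
    + struct.pack(">i", -1)                           # row 1: int4 7, NULL
    + struct.pack(">h", 2) + struct.pack(">i", 4) + struct.pack(">i", 8)
    + struct.pack(">i", 4) + struct.pack(">i", 9)     # row 2: int4 8, int4 9
    + struct.pack(">h", -1)                           # trailer
)
rows = read_tuples(stream, expected_columns=2)
assert len(rows) == 2 and rows[0][1] is None
```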

Clauses and parameters reference

The clauses and parameters of the COPY command are described below.

BINARY

Causes the data to be read or written in binary format rather than text.

You can also specify binary format as part of the FORMAT clause as FORMAT 'binary'.

Learn more in Use binary format

<table_name>

The name of an existing table

<column_name>

An optional list of columns to be copied.

If a list of columns is specified, COPY only copies the data in the specified columns to or from the file. COPY FROM inserts the default values for columns in the table that are not in the column list.

If no column list is specified, all columns of the table are copied

<query>

A SELECT or VALUES command whose results are to be copied. Note that parentheses are required around the query

<filename>

The path to the input or output file.

An input file name can be an absolute or relative path; an output file name must be an absolute path. In either case, the path must refer to a location on the master host's file system, and the file must be accessible to the master host.

PROGRAM '<command>'

A command to run.

In COPY FROM, the input is read from the standard output of the command, and in COPY TO, the output is written to the standard input of the command. The command must be specified as if entered directly on the Greengage DB master host system and must be executable by the superuser.

STDIN | STDOUT

STDIN specifies that input comes from the client application. STDOUT specifies that output goes to the client application.

The ON SEGMENT clause is not supported with STDIN or STDOUT.

WITH <option>

Specifies additional options, which are described below.

FORMAT <'text' | 'csv' | 'binary'>

Defines the data format, which can be text, csv, or binary.

You can specify binary format by using the dedicated BINARY keyword as follows:

COPY BINARY <table_name> [(<column_name> [, ...])]
...

To learn more about formatting source data, see Format external data

OIDS [ <boolean> ]

Specifies copying the OID for each row. An error is raised if OIDS is specified for a table that does not have OIDs, or in the case of copying a query.

FREEZE [ <boolean> ]

Requests copying the data with rows already frozen, just as they would be after running the VACUUM FREEZE command. This is intended as a performance option for initial data loading. Rows are frozen only if the table being loaded is created or truncated in the current subtransaction, there are no cursors open, and there are no older snapshots held by this transaction.

Note that all other sessions can immediately see the data once it is successfully loaded. This violates the normal rules of MVCC visibility, so you should be aware of the potential problems this might cause.

To learn more about the VACUUM operation, see Remove expired table rows via VACUUM

DELIMITER '<delimiter_character>'

Designates a single ASCII character to act as a column delimiter. The delimiter must appear between any two data value fields, but not at the beginning or the end of a row.

You can also specify a non-printable ASCII character or a non-printable Unicode character, such as \x1B or \u001B. The escape string syntax, E'<character-code>', is also supported for non-printable characters. The ASCII or Unicode character must be enclosed in single quotes, for example, E'\x1B' or E'\u001B'.

For text files, the default column delimiter is the horizontal TAB character (0x09).

For CSV files, the default column delimiter is the comma character (,).

See Format columns for details

NULL '<null string>'

Designates a string representing a null value, which indicates an unknown piece of data in a column or field.

For text files, the default string is \N.

For CSV files, the default string is an unquoted empty value.

See Represent NULL values for details

HEADER [ <boolean> ]

Designates whether the data file contains a header row.

If using multiple data source files, all of them must have a header row. The default is to assume that the input files do not have a header row

QUOTE '<quote_character>'

Specifies the quotation character, double quote (") by default

NEWLINE '<newline_character>'

Designates a character used as a newline character, which can be LF (Line feed, 0x0A), CR (Carriage return, 0x0D), or CR followed by LF (CRLF, 0x0D 0x0A).

If not specified, the newline character detected at the end of the first line of data is used.

See Format rows for details

ESCAPE '<escape_character>'

Designates a single character used as an escape character.

For text files, the default escape character is a backslash (\). You can disable escaping by specifying the value 'OFF'.

For CSV files, the default escape character is a double quote (").

See Escape characters for details

FORCE_QUOTE { ( <column_name> [, ...] ) | * }

Enforces quoting for all non-NULL values in each specified column.

If an asterisk (*) is specified, non-NULL values are quoted in all columns. NULL output is never quoted

FORCE_NOT_NULL ( <column_name> [, ...] )

Treats values in each specified column as if they were quoted. Since the default null string is an empty unquoted string, this causes missing values to be evaluated as zero-length strings

FORCE_NULL ( <column_name> [, ...] )

Matches the specified columns' values against the null string. If a match is found, sets the value to NULL even if the value is quoted. In the default case where the null string is empty, this converts a quoted empty string to NULL

ENCODING '<encoding_name>'

Defines the character set encoding of the source data.

If not specified, the default client encoding is used.

See Character encoding for details

FILL MISSING FIELDS

Sets missing trailing field values at the end of a line or row to NULL. If not set, an error is reported in such cases. Blank rows, fields with a NOT NULL constraint, and trailing delimiters on a line will still report an error

LOG ERRORS

Enables capturing error log information about rows with formatting errors.

Error log information is stored internally and can be accessed by using Greengage DB's gp_read_error_log() built-in SQL function.

SEGMENT REJECT LIMIT <count> [ ROWS | PERCENT ]

Runs COPY FROM in single-row error isolation mode. The reject limit can be specified as a number of rows (the default) or a percentage of total rows (from 1 to 100).

IGNORE EXTERNAL PARTITIONS

If specified, when copying data from partitioned tables, data is not copied from leaf child partitions that are external tables. The corresponding message is added to the log file.

Otherwise, if not specified and Greengage DB attempts to copy data from a leaf child partition that is an external table, an error is returned.

To copy data from a partitioned table with a leaf child partition that is an external table, use an SQL query to select the data to copy

ON SEGMENT

Specifies that data loading or unloading is performed via individual data files located on the segment hosts.

Learn more in Use ON SEGMENT clause