Data compression
Data compression in Greengage DB clusters helps optimize I/O performance and reduce storage requirements. Compression is available for append-optimized tables only and can be configured at two distinct levels:
-
Table-level compression applies one compression algorithm to all table columns.
-
Column-level compression allows different compression algorithms to be applied to individual columns. This compression type is available for column-oriented append-optimized tables only.
Overview
Supported compression algorithms
The table below compares compression algorithms supported in Greengage DB, including their usage scope and corresponding storage options.
Zstandard | Deflate | Run-length encoding | |
---|---|---|---|
Description |
Combines high speed with efficient compression |
Typically used when maximum compatibility with legacy formats or systems is required. For most workloads, Zstandard generally achieves better compression ratios and performance |
Run-length encoding (RLE) compresses data better than other algorithms when the same data values occur in many consecutive rows. However, RLE is not useful for columns where repeated values are less common |
Table types |
AO: row and column-oriented |
AO: row and column-oriented |
AO: column-oriented |
Applies to |
Tables and columns |
Tables and columns |
Columns |
compresstype |
zstd |
zlib |
rle_type |
compresslevel |
Possible values are
|
Possible values are
|
Possible values are
|
Note the following:
-
Higher compression levels provide better data reduction but increase CPU usage and processing time. Choose a compression level that balances storage savings with available CPU and performance requirements.
-
The performance of compressed tables varies based on hardware and query characteristics. Benchmark different compression settings in your environment to determine their impact.
Use rle_type
compression for columns with large sets of repeated values.
For general use, zstd
at level 1
provides an optimal balance of speed and compression, comparable to zlib
at level 5
.
Compression settings syntax
You can specify compression settings at the table or column level using the CREATE TABLE
or ALTER TABLE
commands.
The basic CREATE TABLE
command syntax is described in the CREATE TABLE syntax section.
To specify compression settings for a table or column, use one of the following clauses:
-
WITH
to apply compression at the table level:CREATE TABLE <table_name> ( [ ... ] ) WITH ( <storage_directive> [, ... ] )
-
ENCODING
to apply compression to the specified column:<column_name> <data_type> ENCODING ( <storage_directive> [, ... ] ), ... -- or -- COLUMN <column_name> ENCODING ( <storage_directive> [, ... ] ), ... -- or -- DEFAULT COLUMN ENCODING ( <storage_directive> [, ... ] )
Regardless of the level at which you define compression settings, you can use the following storage directives:
-
compresstype
— specifies the compression algorithm to use. See the compresstype row in the Supported compression algorithms table for possible values. Values are case-insensitive. The default isnone
(no compression). -
compresslevel
— specifies the compression level. See the compresslevel row in the Supported compression algorithms table for possible values. -
blocksize
— defines the size (in bytes) of each storage block in a table. Smaller block sizes favor faster random access, while larger block sizes favor faster sequential access.
You can also use the storage directives mentioned above to specify compression settings when defining a new data type using the CREATE TYPE
command.
The gp_default_storage_options
server configuration parameter allows you to set the default values for table storage options, including compresstype
and compresslevel
.
Precedence of compression settings
Column compression settings are inherited in the following order of precedence — from lowest to highest:
type -> table -> partition -> subpartition
This means that:
-
Table-level settings override type-level settings.
-
Partition-level settings override table and type-level settings.
-
Subpartition-level settings override all higher-level settings, including partition, table, and type.
When an ENCODING
clause conflicts with a WITH
clause, ENCODING
has a higher precedence.
The gp_add_column_inherits_table_setting
server configuration parameter enables automatic inheritance of compression settings when adding a new column to a column-oriented append-optimized table.
Configure compression settings
Table level
This command creates a row-oriented append-optimized table with Zstandard (zstd
) compression at level 5
specified using the WITH
clause:
CREATE TABLE customers_table_compression
(
customer_id INTEGER,
name VARCHAR(25)
)
WITH (
appendoptimized = true,
compresstype = zstd,
compresslevel = 5
)
DISTRIBUTED BY (customer_id);
INSERT INTO customers_table_compression (customer_id, name)
SELECT customer_number AS customer_id,
'customer' || customer_number AS name
FROM generate_series(1, 1000) AS customer_number;
You can use a similar command to create a compressed column-oriented table:
CREATE TABLE orders_table_compression
(
order_id INTEGER,
customer_id INTEGER,
amount DECIMAL(6, 2)
)
WITH (
appendoptimized = true,
orientation = column,
compresstype = zstd,
compresslevel = 5
)
DISTRIBUTED BY (customer_id);
INSERT INTO orders_table_compression (order_id, customer_id, amount)
SELECT order_number AS order_id,
FLOOR(RANDOM() * 1000 + 1)::INTEGER AS customer_id,
ROUND((100 + RANDOM() * 1000)::NUMERIC, 2) AS amount
FROM generate_series(1, 10000) AS order_number;
Column level
The command below applies compression at the column level using the ENCODING
clause.
The order_status
column uses RLE compression (rle_type
), while the amount
column is compressed using Zstandard (zstd
):
CREATE TABLE orders_column_compression
(
order_id INTEGER,
customer_id INTEGER,
order_status TEXT ENCODING (compresstype = rle_type, compresslevel = 1),
amount DECIMAL(6, 2) ENCODING (compresstype = zstd, compresslevel = 5)
)
WITH (
appendoptimized = true,
orientation = column
)
DISTRIBUTED BY (customer_id);
INSERT INTO orders_column_compression (order_id, customer_id, order_status, amount)
SELECT order_number AS order_id,
FLOOR(RANDOM() * 1000 + 1)::INTEGER AS customer_id,
(ARRAY ['new', 'shipped', 'cancelled', 'returned'])[FLOOR(RANDOM() * 4 + 1)::INTEGER] AS order_status,
ROUND((100 + RANDOM() * 1000)::NUMERIC, 2) AS amount
FROM generate_series(1, 10000) AS order_number;
View compression information
Table compression information
To view compression options applied to a specific table, use the \d+
meta-command:
\d+ orders_table_compression
In the output, compression options are shown in the Options
section:
Append-Only Columnar Table "public.orders_table_compression" Column | Type | Modifiers | Storage | Stats target | Compression Type | Compression Level | Block Size | Description -------------+--------------+-----------+---------+--------------+------------------+-------------------+------------+------------- order_id | integer | | plain | | zstd | 5 | 32768 | customer_id | integer | | plain | | zstd | 5 | 32768 | amount | numeric(6,2) | | main | | zstd | 5 | 32768 | Checksum: t Distributed by: (customer_id) Options: appendonly=true, orientation=column, compresstype=zstd, compresslevel=5
You can also query the pg_appendonly
system table to get information about the storage options and other characteristics of an append-optimized table:
SELECT compresstype, compresslevel, blocksize, columnstore
FROM pg_appendonly
WHERE relid = 'orders_table_compression'::regclass;
The output might look as follows:
compresstype | compresslevel | blocksize | columnstore --------------+---------------+-----------+------------- zstd | 5 | 32768 | t (1 row)
Column compression information
To view compression options applied to columns, use the \d+
meta-command:
\d+ orders_column_compression
The Compression Type
and Compression Level
columns show the compression algorithm and level used for each table column:
Append-Only Columnar Table "public.orders_column_compression" Column | Type | Modifiers | Storage | Stats target | Compression Type | Compression Level | Block Size | Description --------------+--------------+-----------+----------+--------------+------------------+-------------------+------------+------------- order_id | integer | | plain | | none | 0 | 32768 | customer_id | integer | | plain | | none | 0 | 32768 | order_status | text | | extended | | rle_type | 1 | 32768 | amount | numeric(6,2) | | main | | zstd | 5 | 32768 | Checksum: t Distributed by: (customer_id) Options: appendonly=true, orientation=column
You can also query the pg_attribute_encoding
system table to get column storage information:
SELECT *
FROM pg_attribute_encoding
WHERE attrelid = 'orders_table_compression'::regclass;
The result looks like this:
attrelid | attnum | attoptions ----------+--------+----------------------------------------------------- 393543 | 1 | {compresstype=zstd,compresslevel=5,blocksize=32768} 393543 | 2 | {compresstype=zstd,compresslevel=5,blocksize=32768} 393543 | 3 | {compresstype=zstd,compresslevel=5,blocksize=32768} (3 rows)
Compression ratio
Use the get_ao_compression_ratio()
function to check the compression ratio of a compressed table:
SELECT get_ao_compression_ratio('orders_table_compression');
A returned value of 2.14 indicates that the data, if uncompressed, would occupy more than twice the space compared to the compressed table:
get_ao_compression_ratio -------------------------- 2.14