Hello, I’m DocuDroid!
Submitting feedback
Thank you for rating our AI Search!
We would be grateful if you could share your thoughts so we can improve our AI Search for you and other readers.
GitHub

Data compression

Andrey Aksenov

Data compression in Greengage DB clusters helps optimize I/O performance and reduce storage requirements. Compression is available for append-optimized tables only and can be configured at two distinct levels:

  • Table-level compression applies one compression algorithm to all table columns.

  • Column-level compression allows different compression algorithms to be applied to individual columns. This compression type is available for column-oriented append-optimized tables only.

Overview

Supported compression algorithms

The table below compares compression algorithms supported in Greengage DB, including their usage scope and corresponding storage options.

Zstandard Deflate Run-length encoding

Description

Combines high speed with efficient compression

Typically used when maximum compatibility with legacy formats or systems is required. For most workloads, Zstandard generally achieves better compression ratios and performance

Run-length encoding (RLE) compresses data better than other algorithms when the same data values occur in many consecutive rows. However, RLE is not useful for columns where repeated values are less common

Table types

AO: row and column-oriented

AO: row and column-oriented

AO: column-oriented

Applies to

Tables and columns

Tables and columns

Columns

compresstype

zstd

zlib

rle_type

compresslevel

Possible values are 1 — 19:

  • 1 (default) — fastest, least compression.

  • 19 — slowest, highest compression.

Possible values are 1 — 9:

  • 1 (default) — fastest, least compression.

  • 9 — slowest, highest compression.

Possible values are 1 — 4:

  • 1 (default) — RLE only. Fastest, least compression.

  • 2 — RLE + zlib with compression level 1.

  • 3 — RLE + zlib with compression level 5.

  • 4 — RLE + zlib with compression level 9. Slowest, highest compression.

Note the following:

  • Higher compression levels provide better data reduction but increase CPU usage and processing time. Choose a compression level that balances storage savings with available CPU and performance requirements.

  • The performance of compressed tables varies based on hardware and query characteristics. Benchmark different compression settings in your environment to determine their impact.

TIP

Use rle_type compression for columns with large sets of repeated values. For general use, zstd at level 1 provides an optimal balance of speed and compression, comparable to zlib at level 5.

Compression settings syntax

You can specify compression settings at the table or column level using the CREATE TABLE or ALTER TABLE commands. The basic CREATE TABLE command syntax is described in the CREATE TABLE syntax section.

To specify compression settings for a table or column, use one of the following clauses:

  • WITH to apply compression at the table level:

    CREATE TABLE <table_name>
    (
        [ ... ]
    )
    WITH ( <storage_directive> [, ... ] )
  • ENCODING to apply compression to the specified column:

    <column_name> <data_type> ENCODING ( <storage_directive> [, ... ] ), ...
    -- or --
    COLUMN <column_name> ENCODING ( <storage_directive> [, ... ] ), ...
    -- or --
    DEFAULT COLUMN ENCODING ( <storage_directive> [, ... ] )

Regardless of the level at which you define compression settings, you can use the following storage directives:

  • compresstype — specifies the compression algorithm to use. See the compresstype row in the Supported compression algorithms table for possible values. Values are case-insensitive. The default is none (no compression).

  • compresslevel — specifies the compression level. See the compresslevel row in the Supported compression algorithms table for possible values.

  • blocksize — defines the size (in bytes) of each storage block in a table. Smaller block sizes favor faster random access, while larger block sizes favor faster sequential access.

You can also use the storage directives mentioned above to specify compression settings when defining a new data type using the CREATE TYPE command.

NOTE

The gp_default_storage_options server configuration parameter allows you to set the default values for table storage options, including compresstype and compresslevel.

Precedence of compression settings

Column compression settings are inherited in the following order of precedence — from lowest to highest:

type -> table -> partition -> subpartition

This means that:

  • Table-level settings override type-level settings.

  • Partition-level settings override table and type-level settings.

  • Subpartition-level settings override all higher-level settings, including partition, table, and type.

When an ENCODING clause conflicts with a WITH clause, ENCODING has a higher precedence.

NOTE

The gp_add_column_inherits_table_setting server configuration parameter enables automatic inheritance of compression settings when adding a new column to a column-oriented append-optimized table.

Configure compression settings

Prerequisites

To execute commands described in the following sections, connect to the Greengage DB master host using psql as described in Connect to Greengage DB via psql. Then, create a new database and connect to it:

CREATE DATABASE marketplace;
\c marketplace

Table level

This command creates a row-oriented append-optimized table with Zstandard (zstd) compression at level 5 specified using the WITH clause:

CREATE TABLE customers_table_compression
(
    customer_id INTEGER,
    name        VARCHAR(25)
)
    WITH (
        appendoptimized = true,
        compresstype = zstd,
        compresslevel = 5
        )
    DISTRIBUTED BY (customer_id);
INSERT INTO customers_table_compression (customer_id, name)
SELECT customer_number               AS customer_id,
       'customer' || customer_number AS name
FROM generate_series(1, 1000) AS customer_number;

You can use a similar command to create a compressed column-oriented table:

CREATE TABLE orders_table_compression
(
    order_id    INTEGER,
    customer_id INTEGER,
    amount      DECIMAL(6, 2)
)
    WITH (
        appendoptimized = true,
        orientation = column,
        compresstype = zstd,
        compresslevel = 5
        )
    DISTRIBUTED BY (customer_id);
INSERT INTO orders_table_compression (order_id, customer_id, amount)
SELECT order_number                               AS order_id,
       FLOOR(RANDOM() * 1000 + 1)::INTEGER        AS customer_id,
       ROUND((100 + RANDOM() * 1000)::NUMERIC, 2) AS amount
FROM generate_series(1, 10000) AS order_number;

Column level

The command below applies compression at the column level using the ENCODING clause. The order_status column uses RLE compression (rle_type), while the amount column is compressed using Zstandard (zstd):

CREATE TABLE orders_column_compression
(
    order_id     INTEGER,
    customer_id  INTEGER,
    order_status TEXT ENCODING (compresstype = rle_type, compresslevel = 1),
    amount       DECIMAL(6, 2) ENCODING (compresstype = zstd, compresslevel = 5)
)
    WITH (
        appendoptimized = true,
        orientation = column
        )
    DISTRIBUTED BY (customer_id);
INSERT INTO orders_column_compression (order_id, customer_id, order_status, amount)
SELECT order_number                                                                          AS order_id,
       FLOOR(RANDOM() * 1000 + 1)::INTEGER                                                   AS customer_id,
       (ARRAY ['new', 'shipped', 'cancelled', 'returned'])[FLOOR(RANDOM() * 4 + 1)::INTEGER] AS order_status,
       ROUND((100 + RANDOM() * 1000)::NUMERIC, 2)                                            AS amount
FROM generate_series(1, 10000) AS order_number;

View compression information

Table compression information

To view compression options applied to a specific table, use the \d+ meta-command:

\d+ orders_table_compression

In the output, compression options are shown in the Options section:

                                   Append-Only Columnar Table "public.orders_table_compression"
   Column    |     Type     | Modifiers | Storage | Stats target | Compression Type | Compression Level | Block Size | Description
-------------+--------------+-----------+---------+--------------+------------------+-------------------+------------+-------------
 order_id    | integer      |           | plain   |              | zstd             | 5                 | 32768      |
 customer_id | integer      |           | plain   |              | zstd             | 5                 | 32768      |
 amount      | numeric(6,2) |           | main    |              | zstd             | 5                 | 32768      |
Checksum: t
Distributed by: (customer_id)
Options: appendonly=true, orientation=column, compresstype=zstd, compresslevel=5

You can also query the pg_appendonly system table to get information about the storage options and other characteristics of an append-optimized table:

SELECT compresstype, compresslevel, blocksize, columnstore
FROM pg_appendonly
WHERE relid = 'orders_table_compression'::regclass;

The output might look as follows:

 compresstype | compresslevel | blocksize | columnstore
--------------+---------------+-----------+-------------
 zstd         |             5 |     32768 | t
(1 row)

Column compression information

To view compression options applied to columns, use the \d+ meta-command:

\d+ orders_column_compression

The Compression Type and Compression Level columns show the compression algorithm and level used for each table column:

                                    Append-Only Columnar Table "public.orders_column_compression"
    Column    |     Type     | Modifiers | Storage  | Stats target | Compression Type | Compression Level | Block Size | Description
--------------+--------------+-----------+----------+--------------+------------------+-------------------+------------+-------------
 order_id     | integer      |           | plain    |              | none             | 0                 | 32768      |
 customer_id  | integer      |           | plain    |              | none             | 0                 | 32768      |
 order_status | text         |           | extended |              | rle_type         | 1                 | 32768      |
 amount       | numeric(6,2) |           | main     |              | zstd             | 5                 | 32768      |
Checksum: t
Distributed by: (customer_id)
Options: appendonly=true, orientation=column

You can also query the pg_attribute_encoding system table to get column storage information:

SELECT *
FROM pg_attribute_encoding
WHERE attrelid = 'orders_table_compression'::regclass;

The result looks like this:

 attrelid | attnum |                     attoptions
----------+--------+-----------------------------------------------------
   393543 |      1 | {compresstype=zstd,compresslevel=5,blocksize=32768}
   393543 |      2 | {compresstype=zstd,compresslevel=5,blocksize=32768}
   393543 |      3 | {compresstype=zstd,compresslevel=5,blocksize=32768}
(3 rows)

Compression ratio

Use the get_ao_compression_ratio() function to check the compression ratio of a compressed table:

SELECT get_ao_compression_ratio('orders_table_compression');

A returned value of 2.14 indicates that the data, if uncompressed, would occupy more than twice the space compared to the compressed table:

 get_ao_compression_ratio
--------------------------
                     2.14