Manage spill files
Spill files (or workfiles) are temporary files that Greengage DB writes to disk when a query needs more memory than it is allowed to use. These files store intermediate results produced during memory-intensive operations such as joins, sorts, groupings, and aggregations that exceed the available memory.
Spill files help prevent out-of-memory (OOM) errors during query execution. Instead of terminating a query that exceeds its memory limit, Greengage DB redirects the excess data to disk and continues processing. This improves system stability, particularly when many queries run concurrently or when a single query processes a large volume of data.
However, spill files can negatively impact cluster performance because reading from and writing to disk is slower than using memory. They also temporarily consume disk space, especially in queries that involve large joins, sorts, or aggregations on wide datasets.
Spill files are temporary. Greengage DB deletes them automatically after the query or transaction is completed. Administrators typically do not need to manage these files manually. Still, understanding when and why spill files are created is important for tuning queries, optimizing configuration, and managing disk usage.
When spill files appear
Spill files are created when a query uses more memory than it is allocated. This usually happens during resource-intensive operations that process large amounts of data or use memory-intensive algorithms.
Common causes of spill files include:
- Insufficient memory allocation in the cluster configuration. The amount of memory available to a query is controlled by configuration parameters such as statement_mem, max_statement_mem, and work_mem. Make sure that their values match the cluster’s expected workload and system capacity.
- Data skew in the cluster. When data is unevenly distributed, some segments may receive significantly more data than others. This can exhaust memory on certain segments even when the overall workload is balanced.
- Non-optimal queries, such as:
  - Large joins, especially hash joins, which use in-memory hash tables. When the data exceeds memory limits, parts of the hash table or input are written to disk.
  - Aggregations without filters (a WHERE clause), where the entire dataset is processed without being narrowed down first.
  - A DISTINCT clause on a large number of fields. Such queries generate all unique combinations of values in the specified columns, requiring additional storage space.
  - An ORDER BY clause on large datasets. When it is used in a query, Greengage DB needs a place to temporarily store sorting results. Initially, this is memory, but large datasets often exceed the available memory and spill to disk.
- Outdated statistics. When table statistics are outdated or missing, the optimizer may incorrectly estimate memory requirements and choose plans that lead to spilling.
Understanding these causes helps administrators and developers optimize queries and configure the system to prevent unnecessary spills.
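One practical way to detect the data skew mentioned above is to count rows per segment using the gp_segment_id system column. The sketch below assumes a user table named items; substitute your own table name:

```sql
-- Hypothetical skew check: compare row counts across segments.
-- Large differences between segments indicate uneven distribution.
SELECT gp_segment_id, count(*) AS rows_on_segment
FROM items
GROUP BY gp_segment_id
ORDER BY gp_segment_id;
```

If one segment holds far more rows than the others, consider changing the table's distribution key.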
Configure spill files
Greengage DB provides options to configure the following aspects of spill files:
- location
- limits
- compression
Location
The location where Greengage DB creates spill files is determined by the assigned tablespaces. Spill files are stored in the pgsql_tmp/ directory inside the assigned tablespace. For example, with the default tablespace, the path to spill files is $MASTER_DATA_DIRECTORY/base/pgsql_tmp/.
To control the location of spill files, use the following configuration parameters:
- temp_tablespaces — specifies common tablespaces for all temporary objects in a database, including spill files.
- temp_spill_files_tablespaces — defines a separate tablespace specifically for spill files. If set, it overrides temp_tablespaces for spill file storage.
To set up a dedicated tablespace for spill files, create this tablespace and run the following commands:
$ gpconfig -c temp_spill_files_tablespaces -v '<spill_tablespace>' --masteronly
$ gpstop -u
where <spill_tablespace> is the name of the dedicated tablespace.
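As a sketch, the dedicated tablespace itself might be created like this; the name spill_space and the path /data/spill are assumptions, and the directory must exist on every host in the cluster:

```sql
-- Hypothetical example: a tablespace for spill files.
-- The directory /data/spill must already exist on all hosts.
CREATE TABLESPACE spill_space LOCATION '/data/spill';
```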
You can also specify several dedicated tablespaces in a comma-separated list:
$ gpconfig -c temp_spill_files_tablespaces -v '<spill_tablespace1>,...,<spill_tablespaceN>' --masteronly
$ gpstop -u
For more details, see the Configure default tablespaces section.
Limits
The maximum number of spill files existing at a time is controlled by the gp_workfile_limit_files_per_query parameter. It sets how many spill files a single query can create on each segment. The default value of 100000 is enough for most cases. If queries hit this limit, consider optimizing them to reduce spill file usage. Setting the parameter to 0 disables the limit.
To control the disk space used by spill files, use the following parameters:
- gp_workfile_limit_per_query — maximum space (in kilobytes) that a single query can use on each segment.
- gp_workfile_limit_per_segment — total space allowed for spill files of all running queries on each segment.
By default, both parameters are set to 0, which means that no limits are applied.
To limit the disk space for spill files, run a set of commands like the following:
$ gpconfig -c gp_workfile_limit_per_query -v '2097152'
$ gpconfig -c gp_workfile_limit_per_segment -v '16777216'
$ gpstop -r
Note that gp_workfile_limit_per_segment is set on all segments and requires a cluster restart to apply changes. gp_workfile_limit_per_query can be changed with a configuration reload.
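For reference, both limits are specified in kilobytes, so the example values above correspond to whole gibibytes; a quick sketch using plain SQL arithmetic:

```sql
-- 2097152 KB per query and 16777216 KB per segment, expressed in GiB
SELECT 2097152 / (1024 * 1024) AS per_query_gib,     -- 2 GiB
       16777216 / (1024 * 1024) AS per_segment_gib;  -- 16 GiB
```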
Spilling threshold
When resource groups are used to manage cluster resources, you can configure a memory consumption threshold as a percentage. When a transaction consumes this fraction of the group’s available memory, it starts writing spill files.
You can define the memory usage threshold at two levels:
- Resource group — the MEMORY_SPILL_RATIO resource group attribute sets the default threshold for all transactions in the group.
- Session — the memory_spill_ratio configuration parameter can override the group default for a specific session or transaction.
To learn more about the spilling threshold, see Spill files generation: MEMORY_SPILL_RATIO.
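As a sketch, the two levels might be configured as follows; the group name demo_group and the numeric values are illustrative assumptions, not recommendations:

```sql
-- Group level: default spilling threshold for all transactions in the group
ALTER RESOURCE GROUP demo_group SET MEMORY_SPILL_RATIO 30;

-- Session level: override the group default for the current session
SET memory_spill_ratio = 20;
```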
Compression
To decrease the disk usage by spill files, you can enable their compression using the gp_workfile_compression boolean parameter:
$ gpconfig -c gp_workfile_compression -v ON --masteronly
$ gpstop -u
Note that compression applies only to spill files generated by hash aggregation and hash join operations. Other spill files are not compressed.
When enabling spill file compression, keep in mind that compression reduces disk space consumption and I/O operations at the expense of increased CPU usage.
Compression requires additional memory that is not counted against the query’s statement_mem limit. This overhead is limited by the dedicated gp_workfile_compression_overhead_limit parameter, which is 2GB by default. If many queries use compressed spill files at the same time and this limit is reached, Greengage DB stops compressing subsequent spill files. To adjust the memory overhead of spill file compression, change the gp_workfile_compression_overhead_limit value or set it to 0 to remove the limit.
Check spill files
Spill files are temporary — they only exist while a query is running. To determine if a query is creating spill files, you need to check the relevant system catalog views during the query execution. Once the query finishes, the files and related statistics are automatically removed.
There are two ways to check spill file usage:
- Monitor the gp_workfile_* system catalog views during query execution. These views provide real-time information about active spill files.
- Use the EXPLAIN ANALYZE command. The output of this command includes details about spill files created during the query’s execution. This can be used to analyze how much data spilled to disk and to identify potential issues in query execution plans.
To reproduce the examples shown in this section, follow these steps:
- Connect to the Greengage DB master host using psql as described in Connect to Greengage DB via psql.
- Create a new database and connect to it:

  CREATE DATABASE marketplace;
  \c marketplace

- Create a sample table:

  CREATE TABLE items (
      item_id SERIAL PRIMARY KEY,
      category TEXT,
      price NUMERIC,
      count NUMERIC,
      value NUMERIC,
      description TEXT
  ) DISTRIBUTED BY (item_id);

- Fill the table with a significant amount of data, for example, one million rows:

  INSERT INTO items (category, price, count, value, description)
  SELECT
      CASE
          WHEN id % 5 = 0 THEN 'Category A'
          WHEN id % 5 = 1 THEN 'Category B'
          WHEN id % 5 = 2 THEN 'Category C'
          WHEN id % 5 = 3 THEN 'Category D'
          ELSE 'Category E'
      END,
      RANDOM() * 1000,
      RANDOM() * 1000,
      RANDOM() * 1000,
      md5(RANDOM()::text)
  FROM generate_series(1, 1000000) id;

- Set a small statement memory limit that is likely to be exhausted during a heavy query, for example, 5MB. This forces the DBMS to generate spill files:

  SET statement_mem = '5MB';
In such a configuration, the following query consumes all available memory and generates spill files:
SELECT DISTINCT (item_id, price, count), value
FROM items
GROUP BY item_id
ORDER BY value;
To check the spill files that it creates, open another psql connection to the same database and query the system catalog views listed below while the query is running. When the query finishes, spill files are cleared, and the views no longer contain data.
Don’t reproduce the provided examples on a production cluster. They alter the cluster configuration and can cause a heavy load, potentially leading to disruptions. Always use a development or staging environment for testing.
gp_toolkit.gp_workfile_* views
The following views of the gp_toolkit administrative schema provide information about active spill files. They are useful for monitoring and optimizing resource usage:
- gp_toolkit.gp_workfile_usage_per_query — shows the usage of spill files by query. It provides aggregate information about the number and size of spill files created by each query. The view includes metadata such as spill file size, count, associated query, and more. It helps to identify resource-intensive queries.

  Example:

  SELECT datname, usename, query, segid, size, numfiles
  FROM gp_toolkit.gp_workfile_usage_per_query;

  Example result:

     datname   | usename |                                           query                                            | segid |   size   | numfiles
  -------------+---------+--------------------------------------------------------------------------------------------+-------+----------+----------
   marketplace | gpadmin | SELECT DISTINCT (item_id, price, count), value FROM items GROUP BY item_id ORDER BY value; |     0 | 18317312 |        1
   marketplace | gpadmin | SELECT DISTINCT (item_id, price, count), value FROM items GROUP BY item_id ORDER BY value; |     3 | 18317312 |        1
   marketplace | gpadmin | SELECT DISTINCT (item_id, price, count), value FROM items GROUP BY item_id ORDER BY value; |     1 | 18219008 |        1
   marketplace | gpadmin | SELECT DISTINCT (item_id, price, count), value FROM items GROUP BY item_id ORDER BY value; |     2 | 18350080 |        1
  (4 rows)
- gp_toolkit.gp_workfile_entries — provides detailed information about active spill files in the cluster. Unlike gp_workfile_usage_per_query, which aggregates per query, this view shows one row per query operator (the optype column) that created a spill file. This granularity helps with detailed query analysis and fine-tuning.

  Example:

  SELECT datname, usename, query, segid, optype, size, numfiles
  FROM gp_toolkit.gp_workfile_entries;

  Example result:

     datname   | usename |                                           query                                            | segid | optype |   size   | numfiles
  -------------+---------+--------------------------------------------------------------------------------------------+-------+--------+----------+----------
   marketplace | gpadmin | SELECT DISTINCT (item_id, price, count), value FROM items GROUP BY item_id ORDER BY value; |     2 | Sort   | 18382848 |        1
   marketplace | gpadmin | SELECT DISTINCT (item_id, price, count), value FROM items GROUP BY item_id ORDER BY value; |     3 | Sort   | 18284544 |        1
   marketplace | gpadmin | SELECT DISTINCT (item_id, price, count), value FROM items GROUP BY item_id ORDER BY value; |     0 | Sort   | 18350080 |        1
   marketplace | gpadmin | SELECT DISTINCT (item_id, price, count), value FROM items GROUP BY item_id ORDER BY value; |     1 | Sort   | 18284544 |        1
  (4 rows)
- gp_toolkit.gp_workfile_usage_per_segment — displays spill file usage at the segment level. It helps understand the distribution of spill files across the cluster. By analyzing this view, you can detect load imbalances and investigate performance bottlenecks.

  Example:

  SELECT * FROM gp_toolkit.gp_workfile_usage_per_segment;

  Example result:

   segid |   size   | numfiles
  -------+----------+----------
       1 | 18251776 |        1
       0 | 18350080 |        1
       2 | 18350080 |        1
       3 | 18284544 |        1
      -1 |        0 |
  (5 rows)
The gp_toolkit.gp_workfile_entries, gp_toolkit.gp_workfile_usage_per_query, and gp_toolkit.gp_workfile_usage_per_segment views are available to all users. However, non-superusers see only information about databases they are authorized to use.
EXPLAIN ANALYZE command
In testing environments, you can check whether a query generates spill files using the EXPLAIN ANALYZE SQL command. This command builds the query plan, executes it, and prints detailed execution statistics, including memory usage and spill information.
For example, this command outputs the query plan and spill files information:
EXPLAIN ANALYZE
SELECT DISTINCT (item_id, price, count), value
FROM items
GROUP BY item_id
ORDER BY value;
Result:
                                                                          QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Unique  (cost=2665.04..2875.04 rows=16800 width=68) (actual time=1630.148..4123.738 rows=1000000 loops=1)
   Group Key: value, (ROW(item_id, price, count))
   ->  Gather Motion 4:1  (slice2; segments: 4)  (cost=2665.04..2791.04 rows=16800 width=68) (actual time=1630.147..3624.408 rows=1000000 loops=1)
         Merge Key: value, (ROW(item_id, price, count))
         ->  Unique  (cost=2665.04..2791.04 rows=4200 width=68) (actual time=1613.354..1982.876 rows=250461 loops=1)
               Group Key: value, (ROW(item_id, price, count))
               ->  Sort  (cost=2665.04..2707.04 rows=4200 width=68) (actual time=1613.352..1884.099 rows=250461 loops=1)
                     Sort Key (Distinct): value, (ROW(item_id, price, count))
                     Sort Method:  external merge  Disk: 71488kB
                     ->  Redistribute Motion 4:4  (slice1; segments: 4)  (cost=478.00..1486.00 rows=4200 width=68) (actual time=287.676..560.936 rows=250461 loops=1)
                           Hash Key: value
                           ->  HashAggregate  (cost=478.00..814.00 rows=4200 width=68) (actual time=286.894..509.883 rows=250659 loops=1)
                                 Group Key: item_id
                                 Extra Text: (seg1)   249548 groups total in 32 batches; 1 overflows; 249548 spill groups.
 (seg1)   Hash chain length 3.9 avg, 14 max, using 72165 of 73728 buckets; total 1 expansions.
                                 ->  Seq Scan on items  (cost=0.00..436.00 rows=4200 width=100) (actual time=0.012..130.936 rows=250659 loops=1)
 Planning time: 0.139 ms
   (slice0)    Executor memory: 884863K bytes.
 * (slice1)    Executor memory: 3671K bytes avg x 4 workers, 3671K bytes max (seg0).  Work_mem: 2511K bytes max, 19201K bytes wanted.
 * (slice2)    Executor memory: 4753K bytes avg x 4 workers, 4753K bytes max (seg0).  Work_mem: 4698K bytes max, 73909K bytes wanted.
 Memory used:  5120kB
 Memory wanted:  148316kB
 Optimizer: Postgres query optimizer
 Execution time: 4246.067 ms
(25 rows)
If spill files were created during the query execution, they are shown below the query plan in the lines that start with asterisks (* (slice1) and * (slice2)). The Memory used and Memory wanted values show how much memory the query used and how much it needed. Typically, when spill files are created, Memory wanted exceeds Memory used, which in turn depends on statement_mem (memory limit per query) and memory_spill_ratio (memory usage threshold).
Reduce spill file usage
While spill files prevent queries from terminating when memory is exhausted, they can significantly degrade database performance and consume disk space. Therefore, minimizing or avoiding their creation is considered a best practice whenever possible.
Always try to reduce the number of spill files by changing queries, distribution policy, or memory configuration. Follow these recommendations to reduce or eliminate spill file creation:
- Keep statistics up to date. Ensure that table and index statistics are regularly updated. Accurate statistics enable the query planner to generate more efficient execution plans. To learn more, see Collect statistics via ANALYZE.
- Optimize queries. Review and refine query logic to enhance performance:
  - Use filters. Apply filters as early as possible to reduce the amount of data processed in subsequent operations.
  - Use the correct order of tables in joins. When performing LEFT JOIN, Greengage DB builds hash tables based on the right table in the join clause. If the right table is larger, this can lead to memory exhaustion. Whenever possible, organize your joins so that the larger dataset appears on the left in LEFT JOIN operations. Additionally, limit the number of fields to avoid processing excessive data.
  - Avoid using ORDER BY. It is generally recommended to avoid the ORDER BY clause in Greengage DB. If sorting is necessary, try to reduce the number of rows involved or sort the results on the client side.
  - Minimize fields in DISTINCT. Reduce the number of fields used in the DISTINCT clause to decrease overhead. Each additional field in DISTINCT rapidly increases the number of unique combinations, leading to memory exhaustion.
- Ensure even data distribution. Verify that data is evenly distributed across cluster segments. Balanced data ensures that all segments handle similar workloads, preventing uneven resource strain. You can learn how to check data distribution in Distribution.
- Allocate enough resources. If your regular workloads include queries that produce spill files and can’t be optimized further, allocate more resources to handle them efficiently:
  - Increase memory limits defined by the statement_mem and max_statement_mem configuration parameters.
  - Adjust available resources with resource groups or resource queues to better manage memory and CPU usage.
- Finally, to prevent disruptions caused by excessive spill file creation, configure server parameters to restrict their usage: gp_workfile_limit_files_per_query, gp_workfile_limit_per_query, and gp_workfile_limit_per_segment.
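To illustrate the join-order and filtering advice above, here is a hypothetical sketch; the orders and customers tables and column names are assumptions:

```sql
-- Larger table (orders) on the left of the LEFT JOIN, so the hash
-- table is built on the smaller right-hand table (customers).
-- The WHERE filter narrows the data before the join processes it,
-- and only the needed columns are selected.
SELECT o.order_id, c.name
FROM orders o
LEFT JOIN customers c ON c.customer_id = o.customer_id
WHERE o.order_date >= date '2024-01-01';
```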