GitHub

Tablespaces

Andrey Aksenov

Tablespaces allow you to specify file system locations where database object files, like tables and indexes, are stored. This lets you optimize performance by placing frequently accessed data on high-speed storage while storing historical or less critical data on lower-tier storage.

A tablespace is a global object, so you can use it from any database if you have the appropriate privileges.

Predefined tablespaces

A newly initialized Greengage DB cluster includes two predefined tablespaces:

  • pg_global

    A tablespace used for cluster-wide system catalogs shared across all databases.

  • pg_default

    The default tablespace. For example, the template0 and template1 databases use this tablespace. The pg_default tablespace also serves as the default storage location for new database objects unless another tablespace is explicitly specified.

Create tablespaces

Create storage directories

Before creating a tablespace, you need to set up the necessary storage directories. These directories should meet the following requirements:

  • Exist on all cluster hosts, including the primary, standby master, and segment hosts.

  • Be empty.

  • Be owned by the Greengage DB administrative user (typically gpadmin).

  • Not reside on removable or transient storage. If a tablespace is lost due to directory deletion or disk failure, the entire cluster may become unusable.

To execute commands described in the following sections, connect to the Greengage DB master host as the gpadmin user as described in Connect to Greengage DB via psql. You must create the /fast_storage directory on all cluster hosts. The example script below shows how to do this using the gpssh utility:

#!/bin/bash

sudo -n mkdir -p /fast_storage && sudo chown gpadmin:gpadmin /fast_storage
gpssh -h smdw -e "sudo -n mkdir -p /fast_storage && sudo chown gpadmin:gpadmin /fast_storage"
gpssh -f hostfile_segment_hosts -e "sudo mkdir -p /fast_storage && sudo chown -R gpadmin:gpadmin /fast_storage"

Create a tablespace

To create a tablespace, use the CREATE TABLESPACE command. The LOCATION clause specifies the path to the tablespace directory:

CREATE TABLESPACE fast_storage LOCATION '/fast_storage';

You can optionally add a comment to the created tablespace:

COMMENT ON TABLESPACE fast_storage IS 'High-speed local SSD storage';

Then, you can allow end users to utilize the created tablespace by granting them the CREATE privilege. For example, users can assign new objects to this tablespace or move existing ones to it.

View information about tablespaces

To show the list of tablespaces, use the \db meta-command:

\db+

The result might look like this:

                                         List of tablespaces
     Name     |  Owner  |   Location    | Access privileges | Options |         Description
--------------+---------+---------------+-------------------+---------+------------------------------
 fast_storage | gpadmin | /fast_storage |                   |         | High-speed local SSD storage
 pg_default   | gpadmin |               |                   |         |
 pg_global    | gpadmin |               |                   |         |

Note the following for the Location column:

  • For user-defined tablespaces, this column shows the tablespace’s physical location.

  • For predefined tablespaces, this column is empty. The location of the predefined tablespaces is the same as the data directory configured when the cluster is initialized.

You can also query the pg_tablespace system catalog table to view the list of tablespaces:

SELECT oid, *
FROM pg_tablespace;

The result looks as follows:

  oid  |   spcname    | spcowner | spcacl | spcoptions
-------+--------------+----------+--------+------------
  1663 | pg_default   |       10 |        |
  1664 | pg_global    |       10 |        |
 41003 | fast_storage |       10 |        |

You can call the gp_tablespace_location() function to get the tablespace’s location by the oid value:

SELECT * FROM gp_tablespace_location(41003);

The result looks as follows:

 gp_segment_id |  tblspc_loc
---------------+---------------
             3 | /fast_storage
             0 | /fast_storage
             1 | /fast_storage
             2 | /fast_storage
            -1 | /fast_storage

The query below uses the tablespace name, the gp_tablespace_location() function, and the gp_segment_configuration catalog table to display segment instance information along with file system locations for the fast_storage tablespace:

SELECT segment_config.role,
       tablespace_location.gp_segment_id AS segment_id,
       segment_config.hostname,
       segment_config.datadir,
       tablespace_location.tblspc_loc
FROM gp_tablespace_location(
             (SELECT oid
              FROM pg_tablespace
              WHERE spcname = 'fast_storage')) AS tablespace_location
         JOIN gp_segment_configuration AS segment_config ON tablespace_location.gp_segment_id = segment_config.content
ORDER BY segment_id;

The result looks like this:

 role | segment_id | hostname |        datadir        |  tblspc_loc
------+------------+----------+-----------------------+---------------
 p    |         -1 | mdw      | /data1/master/gpseg-1 | /fast_storage
 m    |         -1 | smdw     | /data1/master/gpseg-1 | /fast_storage
 p    |          0 | sdw1     | /data1/primary/gpseg0 | /fast_storage
 m    |          0 | sdw2     | /data1/mirror/gpseg0  | /fast_storage
 p    |          1 | sdw1     | /data1/primary/gpseg1 | /fast_storage
 m    |          1 | sdw2     | /data1/mirror/gpseg1  | /fast_storage
 p    |          2 | sdw2     | /data1/primary/gpseg2 | /fast_storage
 m    |          2 | sdw1     | /data1/mirror/gpseg2  | /fast_storage
 m    |          3 | sdw1     | /data1/mirror/gpseg3  | /fast_storage
 p    |          3 | sdw2     | /data1/primary/gpseg3 | /fast_storage
NOTE

Internally, user-defined tablespaces are stored as symbolic links in the pg_tblspc directory within each segment’s data directory. These links point to the actual tablespace directory on the filesystem.

Assign objects to a tablespace

To assign objects to a specific tablespace, use the CREATE or ALTER commands with the TABLESPACE clause. By default, the following tablespaces are used when creating objects using CREATE:

  • CREATE DATABASE creates a database with a tablespace used by the template database. By default, it is the template1 database unless a different database is specified in the TEMPLATE clause.

  • CREATE TABLE, CREATE INDEX, and CREATE MATERIALIZED VIEW use a tablespace specified in the default_tablespace server configuration parameter. Learn how to change it in the Configure default tablespaces section.

The commands in this section demonstrate how to assign objects to a tablespace during their creation.

Databases

To specify a tablespace, use the TABLESPACE clause when creating the database:

CREATE DATABASE books_store TABLESPACE fast_storage;

The \l+ meta-command allows you to see tablespaces used by different databases in the Tablespace column:

\l+

The result looks like this:

                                                                  List of databases
    Name     |  Owner  | Encoding |  Collate   |   Ctype    |  Access privileges  | Size  |  Tablespace  |                Description
-------------+---------+----------+------------+------------+---------------------+-------+--------------+--------------------------------------------
 books_store | gpadmin | UTF8     | en_US.utf8 | en_US.utf8 |                     | 81 MB | fast_storage |
 postgres    | gpadmin | UTF8     | en_US.utf8 | en_US.utf8 |                     | 83 MB | pg_default   | default administrative connection database
 template0   | gpadmin | UTF8     | en_US.utf8 | en_US.utf8 | =c/gpadmin         +| 81 MB | pg_default   | unmodifiable empty database
             |         |          |            |            | gpadmin=CTc/gpadmin |       |              |
 template1   | gpadmin | UTF8     | en_US.utf8 | en_US.utf8 | =c/gpadmin         +| 81 MB | pg_default   | default template for new databases
             |         |          |            |            | gpadmin=CTc/gpadmin |       |              |

Note that all objects created in this database use its tablespace. If required, you can explicitly set another tablespace for a database object as described in the following sections.

Tables

The command below shows how to specify a tablespace when creating a new table:

CREATE TABLE movies
(
    movie_id SERIAL PRIMARY KEY,
    genre    TEXT NOT NULL
)
    TABLESPACE fast_storage
    DISTRIBUTED BY (movie_id);

To determine a tablespace used by the table, query the pg_tables view:

SELECT *
FROM pg_tables
WHERE tablename = 'movies';

The tablespace column shows the used tablespace:

 schemaname | tablename | tableowner |  tablespace  | hasindexes | hasrules | hastriggers
------------+-----------+------------+--------------+------------+----------+-------------
 public     | movies    | gpadmin    | fast_storage | t          | f        | f

Indexes

Specify a tablespace when creating an index as follows:

CREATE INDEX movies_genre_idx
    ON movies (genre)
    TABLESPACE fast_storage;

To determine a tablespace used by the index, query the pg_indexes view:

SELECT *
FROM pg_indexes
WHERE tablename = 'movies';

The result looks like this:

 schemaname | tablename |    indexname     |  tablespace  |                                indexdef
------------+-----------+------------------+--------------+-------------------------------------------------------------------------
 public     | movies    | movies_pkey      |              | CREATE UNIQUE INDEX movies_pkey ON public.movies USING btree (movie_id)
 public     | movies    | movies_genre_idx | fast_storage | CREATE INDEX movies_genre_idx ON public.movies USING btree (genre)

Materialized views

The command below shows how to specify a tablespace for a materialized view:

CREATE MATERIALIZED VIEW comedies TABLESPACE fast_storage AS
SELECT *
FROM movies
WHERE genre = 'comedy';

To determine a tablespace used by the materialized view, query the pg_matviews view:

SELECT *
FROM pg_matviews
WHERE matviewname = 'comedies';

The result looks as follows:

 schemaname | matviewname | matviewowner |  tablespace  | hasindexes | ispopulated |                definition
------------+-------------+--------------+--------------+------------+-------------+------------------------------------------
 public     | comedies    | gpadmin      | fast_storage | f          | t           |  SELECT movies.movie_id,                +
            |             |              |              |            |             |     movies.genre                        +
            |             |              |              |            |             |    FROM movies                          +
            |             |              |              |            |             |   WHERE (movies.genre = 'comedy'::text);

Move objects to another tablespace

To change a tablespace for the specified object, use the ALTER command with the SET TABLESPACE clause. The commands below assign objects created in the Assign objects to a tablespace section to the default pg_default tablespace:

ALTER DATABASE books_store
    SET TABLESPACE pg_default;
ALTER TABLE movies
    SET TABLESPACE pg_default;
ALTER INDEX movies_genre_idx
    SET TABLESPACE pg_default;
ALTER MATERIALIZED VIEW comedies
    SET TABLESPACE pg_default;

Configure default tablespaces

New objects

The default_tablespace server configuration parameter allows you to specify the default tablespace for database objects created using these commands:

  • CREATE TABLE

  • CREATE INDEX

  • CREATE MATERIALIZED VIEW

Note that default_tablespace does not apply to the CREATE DATABASE command. In this case, a newly created database inherits a tablespace used by the template database. By default, it is the template1 database unless a different database is specified in the TEMPLATE clause.

The command below shows how to set the default tablespace to fast_storage using gpconfig:

$ gpconfig -c default_tablespace -v 'fast_storage' --masteronly

To apply the changes, reload the configuration using gpstop:

$ gpstop -u

Temporary objects

Tablespaces are also used to store temporary objects that include:

  • Temporary tables and indexes that are created manually using the CREATE TEMP TABLE command.

  • Temporary and spill files that are created automatically to support query execution, including hash aggregates, hash joins, and sorting large datasets.

  • Temporary tables that are created automatically when refreshing the materialized view data.

Two server configuration parameters control where temporary objects are stored:

  • temp_tablespaces controls where all types of temporary objects are stored.

  • temp_spill_files_tablespaces allows temporary and spill files created automatically to be stored in separate tablespaces. When specified, this parameter takes precedence over temp_tablespaces for such files.

Both parameters accept a comma-separated list of tablespaces. A tablespace is randomly selected for each temporary object. This helps distribute the load across multiple tablespaces.

The default values for temp_tablespaces and temp_spill_files_tablespaces are empty strings corresponding to the current database’s default tablespace. The command below shows how to set the default tablespace for temporary objects at the cluster level using gpconfig:

$ gpconfig -c temp_tablespaces -v 'fast_storage' --masteronly

To apply the changes, reload the configuration using gpstop:

$ gpstop -u

Drop a tablespace

To remove a tablespace, use the DROP TABLESPACE command:

DROP TABLESPACE fast_storage;

You can drop a tablespace only if it is empty. Ensure that no database objects are using the tablespace before attempting to remove it. Additionally, the tablespace must not contain any temporary or transaction files.