Indexes
In traditional databases, indexes can significantly speed up data access. In distributed systems like Greengage DB (based on Greenplum), indexes should be used more sparingly. Greengage DB performs parallel sequential scans, with each segment scanning only its local data subset. Partitioning can further reduce scan size.
Since analytical queries often retrieve large datasets, using indexes is typically not efficient. Before creating indexes, test your queries without them to establish a performance baseline. Indexes might be helpful for OLTP workloads or highly selective queries, especially on compressed append-optimized tables where indexes limit decompression to relevant rows.
Overview
Index types
PostgreSQL index types
Greengage DB supports the following PostgreSQL index types:
-
B-tree
B-tree indexes can handle equality and range queries on data that can be sorted in some order. The query planner considers using a B-tree index whenever an indexed column is involved in a comparison using one of these operators:
<
,<=
,=
,>=
, and>
. -
GiST
GiST (Generalized Search Tree) indexes are designed to work with complex data types such as geometric objects, text, and arrays. GiST indexes enable efficient searching for spatial, textual, and hierarchical data.
-
SP-GiST
SP-GiST indexes (space-partitioned GiST), like GiST indexes, offer an infrastructure that supports various kinds of searches. SP-GiST allows you to implement a range of different unbalanced disk-based data structures, such as quadtrees, k-d trees, and radix trees (radix tries).
-
GIN
GIN (Generalized Inverted Index) indexes are inverted indexes suitable for data values with multiple component values. They are used for full-text search and search in arrays, JSON, and trigrams.
See Index Types in the PostgreSQL documentation for descriptions of these types.
In addition to PostgreSQL index types, Greengage DB also supports Bitmap indexes, which are optimized for analytic workloads.
Bitmap indexes
Bitmap indexes are best suited for data warehousing applications with large amounts of data, many ad hoc queries, and few data modification (DML) transactions. They can be much smaller than other index types while providing the same functionality.
Bitmap indexes are most effective for queries that contain multiple conditions in the WHERE
clause.
They allow filtering out rows that do not match all conditions before accessing the table.
Bitmap indexes perform best on columns with between 100 and 100000 distinct values, especially when the indexed column is frequently queried in conjunction with other indexed columns. However, they are not suitable for unique or highly selective columns (such as customer names or phone numbers), nor for OLTP systems with heavy concurrent data modifications.
Syntax overview
To create an index on the specified column of the specified table or materialized view, use the CREATE INDEX
command.
To modify the definition of an existing index, use ALTER INDEX
.
The following is a simplified syntax of the CREATE INDEX
command:
CREATE [UNIQUE] INDEX [<name>] ON <table_name> [USING <method>]
( {<column_name> | (<expression>)} [, ...] )
[ WHERE <predicate> ]
The main clauses and arguments for defining an index are the following:
-
UNIQUE
Whether to create a unique index. Checks for duplicate values in the table when the index is created (if data already exists) and whenever new data is inserted.
NOTENote that defining a unique constraint implicitly creates a unique index.
-
<name>
The name of the index to be created. The index is always created in the same schema as its parent table. If the name is omitted, Greengage DB chooses a suitable name based on the parent table’s name and the indexed column name.
-
ON <table_name>
The name (optionally schema-qualified) of the table or materialized view to be indexed.
-
USING <method>
The name of the index method to be used. Possible values:
btree
(default),bitmap
,gist
,spgist
, andgin
. -
<column_name>
The name of the column in the table on which the index is created. Multiple columns can be specified if the index method supports multicolumn indexes. See the Create a multicolumn index section below.
-
<expression>
An expression based on one or more columns of the table. See the Use index expressions section below.
-
WHERE <predicate>
Whether to create a partial index. A partial index includes entries for only a portion of a table — typically the ones that benefit most from indexing. See the Create a partial index section below.
Limitations
The table summarizes the limitations of the index types supported in Greengage DB.
Index method | Multicolumn indexes | Unique indexes | Supported by GPORCA |
---|---|---|---|
B-tree |
yes |
yes (heap tables only) |
yes |
Bitmap |
no |
no |
yes |
GiST |
yes |
no |
yes |
SP-GiST |
no |
no |
no |
GIN |
yes |
no |
yes |
GPORCA ignores indexes created with unsupported methods. The following features are unsupported when GPORCA is enabled:
Best practices for creating indexes
Consider the points below when you create indexes. See also: Examine index usage.
General indexing strategy
-
Your query workload.
Indexes improve performance for queries that return a single record or a small dataset.
-
Compressed tables.
Indexes can improve performance on compressed append-optimized tables for queries that return a targeted set of rows. For compressed data, an index access method means only the necessary rows are uncompressed.
-
Avoid indexes on frequently updated columns.
Creating an index on a frequently updated column increases the number of writes required when the column is updated.
Index types
-
Create selective B-tree indexes.
Index selectivity is the number of unique values in a column divided by the total number of rows in the table. For example, if a table has 1000 rows and a column has 800 distinct values, the selectivity of the index is 0.8, which is considered good. Unique indexes always have a selectivity ratio of 1.0, which is the best possible.
-
Use Bitmap indexes for low-selectivity columns.
See Bitmap indexes.
-
Avoid overlapping indexes.
Indexes with the same leading column are redundant. For example, if there is an index on
(category, product)
, you do not need another one on(category)
.
Indexing for query patterns
-
Index columns used in joins.
An index on a column used for frequent joins (such as a foreign key column) can improve join performance by enabling more join methods for the query optimizer to use.
-
Index columns frequently used in predicates.
Columns that are frequently referenced in
WHERE
clauses are good candidates for indexes. This applies not only toSELECT
queries but also toUPDATE
andDELETE
statements that use such predicates.
Index maintenance
-
Drop indexes for bulk loads.
For mass loads of data into a table, consider dropping the indexes and re-creating them after the load completes. This is often faster than updating the indexes. See the Rebuild indexes section below.
-
Consider a clustered index.
A clustered index physically orders records on disk based on the index. When data is stored sequentially (for example, by date), queries that scan a range can fetch data more efficiently using fast sequential reads. See the Cluster tables based on indexes section below.
Create indexes
Create a B-tree index
The command below creates a B-tree index on the title
column in the movies
table:
CREATE INDEX movies_title_idx ON movies (title);
This SELECT
query retrieves information about the specified movie by its title using the created index:
SELECT *
FROM movies
WHERE title = 'Movie 12345';
QUERY PLAN --------------------------------------------------------- Gather Motion 4:1 (slice1; segments: 4) -> Bitmap Heap Scan on movies Recheck Cond: (title = 'Movie 12345'::text) -> Bitmap Index Scan on movies_title_idx Index Cond: (title = 'Movie 12345'::text) Optimizer: Pivotal Optimizer (GPORCA) (6 rows)
See also: Examine index usage.
Create a bitmap index
The query below creates a bitmap index on the genre
column:
CREATE INDEX movies_genre_idx ON movies USING bitmap (genre);
The following SELECT
query is expected to use the created index:
SELECT AVG(rating) AS avg_rating
FROM movies
WHERE genre = 'Sci-Fi';
QUERY PLAN ---------------------------------------------------------------- Aggregate -> Gather Motion 4:1 (slice1; segments: 4) -> Aggregate -> Bitmap Heap Scan on movies Recheck Cond: (genre = 'Sci-Fi'::text) -> Bitmap Index Scan on movies_genre_idx Index Cond: (genre = 'Sci-Fi'::text) Optimizer: Pivotal Optimizer (GPORCA) (8 rows)
Create a multicolumn index
The command below creates an index named movies_genre_year_idx
on the genre
and year
columns:
CREATE INDEX movies_genre_year_idx ON movies (genre, year);
The created index should speed up queries that filter by genre
and year
, for example:
SELECT AVG(rating) AS avg_rating
FROM movies
WHERE genre = 'Sci-Fi'
AND year = 2010;
QUERY PLAN ------------------------------------------------------------------------------------ Aggregate -> Gather Motion 4:1 (slice1; segments: 4) -> Aggregate -> Bitmap Heap Scan on movies Recheck Cond: ((genre = 'Sci-Fi'::text) AND (year = 2010)) -> Bitmap Index Scan on movies_genre_year_idx Index Cond: ((genre = 'Sci-Fi'::text) AND (year = 2010)) Optimizer: Pivotal Optimizer (GPORCA) (8 rows)
Create a unique index
The command below creates a unique B-tree index on the id
column in the actors
heap table:
CREATE UNIQUE INDEX actors_id_idx ON actors (id);
The example request that uses the index looks as follows:
SELECT *
FROM actors
WHERE id = 123;
QUERY PLAN ------------------------------------------------ Gather Motion 1:1 (slice1; segments: 1) -> Index Scan using actors_id_idx on actors Index Cond: (id = 123) Optimizer: Pivotal Optimizer (GPORCA) (4 rows)
Create a partial index
Partial indexes are supported by the Postgres planner only.
To turn off the GPORCA query optimizer at the session level, set the optimizer
parameter to off
in the following way:
SET optimizer = off;
This request creates a partial index on the year
column to optimize queries that target movies released after 2000:
CREATE INDEX movies_year_post_2000_idx ON movies (year)
WHERE year > 2000;
The example SELECT
request might look like this:
SELECT AVG(rating) AS avg_rating
FROM movies
WHERE year = 2020;
QUERY PLAN ------------------------------------------------------------------------ Aggregate -> Gather Motion 4:1 (slice1; segments: 4) -> Aggregate -> Bitmap Heap Scan on movies Recheck Cond: (year = 2020) -> Bitmap Index Scan on movies_year_post_2000_idx Index Cond: (year = 2020) Optimizer: Postgres query optimizer (8 rows)
Use index expressions
Index expressions are supported by the Postgres planner only.
The query below creates an index that can be used to perform case-insensitive comparisons with the lower()
function:
CREATE INDEX movies_title_lower_idx ON movies (lower(title));
A sample query that uses this index might look like this:
SELECT *
FROM movies
WHERE lower(title) = lower('Movie 12345');
QUERY PLAN ---------------------------------------------------------------- Gather Motion 4:1 (slice1; segments: 4) -> Bitmap Heap Scan on movies Recheck Cond: (lower(title) = 'movie 12345'::text) -> Bitmap Index Scan on movies_title_lower_idx Index Cond: (lower(title) = 'movie 12345'::text) Optimizer: Postgres query optimizer (6 rows)
View the list of indexes
To view the list of indexes along with their sizes, use the \di+
meta-command:
\di+
The result should look like this:
List of relations Schema | Name | Type | Owner | Table | Size | Description --------+---------------------------+-------+---------+--------+---------+------------- public | actors_id_idx | index | gpadmin | actors | 288 kB | public | movies_genre_idx | index | gpadmin | movies | 960 kB | public | movies_genre_year_idx | index | gpadmin | movies | 3360 kB | public | movies_title_idx | index | gpadmin | movies | 3360 kB | public | movies_title_lower_idx | index | gpadmin | movies | 3360 kB | public | movies_year_post_2000_idx | index | gpadmin | movies | 928 kB | (6 rows)
You can also query the pg_indexes
system view to get information about indexes:
SELECT indexname, indexdef
FROM pg_indexes
WHERE tablename = 'movies';
The indexdef
column shows the definition of each index:
indexname | indexdef ---------------------------+------------------------------------------------------------------------------------------------ movies_title_idx | CREATE INDEX movies_title_idx ON public.movies USING btree (title) movies_genre_idx | CREATE INDEX movies_genre_idx ON public.movies USING bitmap (genre) movies_genre_year_idx | CREATE INDEX movies_genre_year_idx ON public.movies USING btree (genre, year) movies_year_post_2000_idx | CREATE INDEX movies_year_post_2000_idx ON public.movies USING btree (year) WHERE (year > 2000) movies_title_lower_idx | CREATE INDEX movies_title_lower_idx ON public.movies USING btree (lower(title)) (5 rows)
Maintain indexes
Collect statistics
After creating an index, you need to collect statistics via ANALYZE
:
ANALYZE;
This helps the query optimizer estimate row counts and choose efficient query plans. You can learn more about collecting statistics from Collect statistics via ANALYZE.
View index statistics
In Greengage DB, the pg_stat_*_indexes
views display access statistics for indexes only from the master instance.
To display index statistics that combine statistics from the master and the segment instances, create the following views:
-
Create a view providing access statistics for all indexes:
CREATE VIEW pg_stat_all_indexes_ggdb AS SELECT s.relid, s.indexrelid, s.schemaname, s.relname, s.indexrelname, m.idx_scan, m.idx_tup_read, m.idx_tup_fetch FROM (SELECT relid, indexrelid, schemaname, relname, indexrelname, sum(idx_scan) as idx_scan, sum(idx_tup_read) as idx_tup_read, sum(idx_tup_fetch) as idx_tup_fetch FROM gp_dist_random('pg_stat_all_indexes') WHERE relid >= 16384 GROUP BY relid, indexrelid, schemaname, relname, indexrelname UNION ALL SELECT * FROM pg_stat_all_indexes WHERE relid < 16384) m, pg_stat_all_indexes s WHERE m.relid = s.relid AND m.indexrelid = s.indexrelid;
-
Create a view providing statistics for system indexes:
CREATE VIEW pg_stat_sys_indexes_ggdb AS SELECT * FROM pg_stat_all_indexes_ggdb WHERE schemaname IN ('pg_catalog', 'information_schema') OR schemaname ~ '^pg_toast';
-
Create a view providing access statistics for user indexes:
CREATE VIEW pg_stat_user_indexes_ggdb AS SELECT * FROM pg_stat_all_indexes_ggdb WHERE schemaname NOT IN ('pg_catalog', 'information_schema') AND schemaname !~ '^pg_toast';
You can then query the pg_stat_user_indexes_ggdb
view to get access statistics for indexes created in the Create indexes section:
SELECT *
FROM pg_stat_user_indexes_ggdb
WHERE relname IN ('movies', 'actors');
The result should look like this:
relid | indexrelid | schemaname | relname | indexrelname | idx_scan | idx_tup_read | idx_tup_fetch --------+------------+------------+---------+---------------------------+----------+--------------+--------------- 450662 | 450686 | public | movies | movies_genre_year_idx | 4 | 131 | 0 450672 | 450687 | public | actors | actors_id_idx | 1 | 1 | 1 450662 | 450688 | public | movies | movies_year_post_2000_idx | 4 | 997 | 0 450662 | 450689 | public | movies | movies_title_lower_idx | 4 | 1 | 0 450662 | 450682 | public | movies | movies_genre_idx | 0 | 4 | 0 450662 | 450678 | public | movies | movies_title_idx | 4 | 1 | 0 (6 rows)
Rebuild indexes
Use the REINDEX
command to rebuild a poorly performing index.
REINDEX
rebuilds an index using the data stored in the index’s table, replacing the old copy of the index.
The query below recreates all indexes of the movies
table:
REINDEX TABLE movies;
The next request recreates the specified index:
REINDEX INDEX movies_genre_idx;
After running REINDEX
, the index statistics are reset to zero.
To get actual statistics, run ANALYZE
on the table after reindexing:
ANALYZE movies;
For bulk data loads, it’s often faster to drop indexes before loading and recreate them afterward rather than updating the indexes during the load:
DROP INDEX movies_genre_idx;
CREATE INDEX movies_genre_idx ON movies USING bitmap (genre);
Cluster tables based on indexes
Storing table data in the same order as an index improves query performance for range scans. When records are physically ordered on disk, range queries can take advantage of fast sequential reads. This can be achieved by creating an intermediate table and loading the data in the desired order:
-
Create a new table:
CREATE TABLE actors_new ( LIKE actors ) DISTRIBUTED BY (id);
-
Load the data from the original table into the new table in the desired order:
INSERT INTO actors_new SELECT * FROM actors ORDER BY id;
-
Drop the original table:
DROP TABLE actors;
-
Rename the new table with the original table’s name:
ALTER TABLE actors_new RENAME TO actors;
-
Recreate the index:
CREATE UNIQUE INDEX actors_id_idx ON actors (id);
-
Run
VACUUM ANALYZE
for the new table:VACUUM ANALYZE actors;
Examine index usage
Use the EXPLAIN
command to examine index usage for a query.
The query below shows a detailed breakdown of the SELECT
query from Create a bitmap index:
EXPLAIN (COSTS OFF)
SELECT AVG(rating) AS avg_rating
FROM movies
WHERE genre = 'Sci-Fi';
The result should include Bitmap Heap Scan
on the movies
table and Bitmap Index Scan
on the movies_genre_idx
index:
QUERY PLAN ---------------------------------------------------------------- Aggregate -> Gather Motion 4:1 (slice1; segments: 4) -> Aggregate -> Bitmap Heap Scan on movies Recheck Cond: (genre = 'Sci-Fi'::text) -> Bitmap Index Scan on movies_genre_idx Index Cond: (genre = 'Sci-Fi'::text) Optimizer: Pivotal Optimizer (GPORCA) (8 rows)
Consider the following points when experimenting with indexes:
-
Test with real data whenever possible. Indexes that work well for test data may not be effective in production.
-
Avoid very small test sets — they can produce misleading results.
-
Design test data carefully. Uniform, random, or sorted values can distort statistics and do not reflect real-world usage.
-
Force index usage for testing by turning off certain plan types (for example, using the
enable_seqscan
andenable_nestloop
configuration parameters). -
Compare performance with and without indexes using
EXPLAIN ANALYZE
.
Drop indexes
To drop indexes, use the DROP INDEX
command:
DROP INDEX movies_genre_idx;
Note that DROP TABLE
always removes any existing indexes for the target table.