Greengage DB architecture
Greengage DB (based on Greenplum) is an open-source distributed database management system. It is intended for building data warehouses and analytical data platforms that operate on large data volumes, typically tens to hundreds of terabytes. Its massively parallel processing (MPP) architecture allows executing analytical workloads (OLAP) across multiple nodes in parallel. This architecture is well-suited for complex queries involving large tables, extensive joins, aggregations, and other compute-intensive operations.
Greengage DB uses PostgreSQL as the underlying storage and execution engine on each segment. Because of this foundation, it inherits PostgreSQL’s data formats, SQL semantics, and ecosystem compatibility, allowing standard BI, reporting, and ETL tools to interact with the system through familiar PostgreSQL interfaces.
Massively parallel processing
Greengage DB uses the massively parallel processing (MPP) architecture, also known as shared-nothing. Unlike symmetric multiprocessing (SMP), where multiple processors share memory and storage, each node in an MPP system uses its own CPU, memory, and storage resources. Computing nodes never access each other's disks or memory directly; they coordinate only through a high-speed interconnect network.
| Characteristic | MPP (Massively Parallel Processing) | SMP (Symmetric Multiprocessing) |
|---|---|---|
| Resource model | Shared-nothing: independent CPU, memory, and storage | Multiple CPUs operating on shared memory and storage |
| Scalability | Horizontal: add new nodes | Vertical: add more resources (CPU/RAM) to a single server |
| Query parallelism | Parallel execution on all nodes, each processing local data | Limited by the resources of one machine |
| Typical use cases | OLAP: BI, analytics, data warehouses, complex queries over big datasets | OLTP, mixed workloads, small-to-medium datasets |
| Fault tolerance | Failures affect individual nodes | A single failure can affect the entire system |
| Hardware cost | Commodity hardware; cost grows linearly with cluster size | High-end hardware is required for scaling up |
In the MPP architecture, each segment stores a portion of the dataset and primarily performs computations on its local data. Queries are executed in parallel across segments, and segments exchange intermediate results when needed. When a workload is designed with data distribution and parallelism in mind, the MPP approach provides significant advantages for analytical processing.
The key benefits of MPP architecture include:
- Near-linear horizontal scalability

  System capacity and throughput increase by adding new segment hosts with the same configuration. For many workloads, this provides close to linear performance growth as the cluster size increases.

- Fault isolation and resilience

  If a segment or an entire host is lost, this does not stop the cluster. Data availability is maintained through mirroring: the mirrors of failed segments step in to handle their workload, redistributing it across working nodes.

- Cost efficiency

  MPP systems achieve high storage capacity and throughput using relatively inexpensive commodity hardware. Storage and computational resources can be scaled by adding standard servers, which is often more cost-effective than buying new high-end hardware to scale SMP systems.
Components
The diagram below shows the high-level architecture of Greengage DB. It includes the following key components:
Master
The master instance is the entry point to all Greengage DB databases. It is the single point of client connection: all queries, database object management, and administrative operations are issued through the master.
The master is a PostgreSQL instance that maintains metadata for the Greengage DB cluster, such as system catalog tables and built-in administrative schemas. This metadata describes the cluster configuration and the structure and properties of user-defined objects. When a user creates a table, the master records its structure, ownership, access privileges, distribution policy, partitioning layout, and other attributes. The table data itself is stored on the segments.
The master performs the following core functions:
- Client authentication

  Access to data is controlled through PostgreSQL's authentication mechanism based on roles and privileges. Authentication rules are stored in the master's configuration files (such as pg_hba.conf), and authorization metadata is maintained in system catalog tables.

- Query parsing and planning

  The master receives all SQL statements from clients. It parses queries and generates their execution plans based on metadata about cluster configuration and data distribution. Query optimization is performed by one of two built-in planners: the standard PostgreSQL planner or GPORCA, the cost-based optimizer tailored to the MPP architecture of Greengage DB (see the example after this list).

- Query execution coordination and result aggregation

  After planning, the master dispatches plan fragments to all segments involved in execution. Each segment processes its portion of the data using its own query executor and returns intermediate or final results to the master. The master orchestrates execution, monitors progress, and assembles the final result set before returning it to the client.

- Administrative operations

  Management tasks, such as creating databases, altering the data model, performing maintenance, or changing system configuration, are issued on the master. The master automatically propagates these operations to the segments, ensuring system consistency across the cluster.
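For example, a session can choose the planner and inspect the resulting plan with EXPLAIN. This is a minimal sketch that assumes the optimizer GUC behaves as in upstream Greenplum (on enables GPORCA, off falls back to the standard planner); the sales table is hypothetical.

```sql
-- Assumption: the optimizer GUC switches between GPORCA (on) and the
-- standard PostgreSQL planner (off), as in upstream Greenplum.
SET optimizer = on;

-- The plan is generated on the master from catalog metadata; plan fragments
-- are later dispatched to the segments. The sales table is hypothetical.
EXPLAIN
SELECT customer_id, sum(amount) AS total
FROM sales
GROUP BY customer_id;
```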
Standby master
The standby master serves as a warm standby instance that can replace the primary master in case of a failure. It is an optional component but strongly recommended for production clusters to avoid downtime caused by a master outage. For fault tolerance, the standby master should run on a separate host so that a single hardware failure cannot affect both master instances.
When a standby master is present in a cluster, Greengage DB continuously synchronizes it with the primary master using write-ahead log (WAL) replication. Only the master’s own data directory is replicated; segments are not involved in this process. Thus, the standby master maintains an up-to-date copy of the master’s system catalog and metadata state.
Failover is manual: if the primary master becomes unavailable, an administrator performs a manual switch to the standby master according to their recovery procedures. After activation, the standby master replays any remaining WAL records to restore the system catalog state and takes the role of the primary master. Learn more about master failover in Recover a failed master.
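To verify that WAL replication to the standby master is active, an administrator can query the standard PostgreSQL replication view on the primary master. This is a minimal sketch assuming the view is exposed as in PostgreSQL 9.4; column availability may differ between builds.

```sql
-- Run on the primary master: shows WAL sender processes, including the one
-- streaming to the standby master (assumes the standard PostgreSQL view).
SELECT application_name, state, sync_state
FROM pg_stat_replication;
```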
Segments
Segments store user data in a Greengage DB cluster. Each segment is an independent PostgreSQL instance that maintains a subset of table data and participates in query execution. Segments communicate with the master and with one another when a query requires data redistribution or cross-segment operations.
Segments run on segment hosts of the cluster. One segment host can run several segments, typically from 2 to 8, depending on its hardware resources and operational requirements.
Data distribution across segments is one of the key factors influencing query performance and cluster scalability.
There are two types of segments:
- Primary segments store and serve user data during normal operation.
- Mirror segments maintain a synchronized copy of a primary segment's data.
If a primary segment fails, its mirror automatically takes over, ensuring continued access to all data. To learn more, see Check and recover segments.
Mirrors run on the cluster's segment hosts, but on different hosts than their corresponding primary segments.
This way, a single hardware failure cannot affect both a primary segment and its mirror.
The exact placement of mirrors on segment hosts is defined by the mirroring strategy used in the cluster: group or spread.
With mirroring, each data segment resides on two different hosts.
Thus, the cluster must include an even number of segment hosts for balanced load.
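The current segment layout, including the pairing of primaries and mirrors, can be inspected through the cluster topology catalog. The query below is a minimal sketch assuming the gp_segment_configuration catalog available in Greenplum-based systems; content = -1 denotes the master and standby master.

```sql
-- List all instances with their role (p = primary, m = mirror) and status;
-- assumes the gp_segment_configuration catalog of Greenplum-based systems.
SELECT content, role, preferred_role, status, hostname, port
FROM gp_segment_configuration
ORDER BY content, role;
```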
When the master receives a query, it generates a distributed execution plan using metadata about data distribution. It then dispatches plan fragments to all segments. Each segment runs its assigned steps using its own Query Executor (QE). Some operations may require data exchange between segments. Such exchange is implemented through motion operations over the interconnect. After execution, segments return intermediate or final results to the master.
Parallel execution across segments determines the performance of the entire cluster. The system performs best when:
- data is evenly distributed across segments;
- all segment hosts have equivalent hardware and configuration.
Uneven hardware or skewed data distribution can lead to situations where some segments finish early while others become bottlenecks, delaying the master from assembling final results.
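A quick way to spot data skew is to count rows per segment using the gp_segment_id system column. This is a sketch assuming that column is available as in Greenplum; the sales table is hypothetical.

```sql
-- Row counts per segment; large differences indicate data skew.
-- Assumes the gp_segment_id system column; the sales table is hypothetical.
SELECT gp_segment_id, count(*) AS row_count
FROM sales
GROUP BY gp_segment_id
ORDER BY gp_segment_id;
```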
Interconnect
The interconnect is the network layer that enables data exchange between segments during query execution. It is used for operations such as repartitioning, broadcasting, and merging intermediate results across nodes.
A high-throughput, low-latency network is required to support distributed execution. A minimum of 10-Gigabit Ethernet is recommended; faster networks such as 25- or 100-Gigabit Ethernet improve performance for workloads with heavy data motion.
By default, the interconnect uses UDP with reliability and flow-control mechanisms implemented at the DBMS level. The system can also operate in TCP mode or use interconnect proxies to reduce the number of point-to-point connections.
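The active transport can be checked from a session. This sketch assumes the gp_interconnect_type parameter used by Greenplum-based systems; typical values are udpifc (the UDP-based default), tcp, and proxy.

```sql
-- Show the interconnect transport currently in use.
-- Assumes the gp_interconnect_type parameter of Greenplum-based systems.
SHOW gp_interconnect_type;
```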
PostgreSQL-based storage
Greengage DB 6 is based on PostgreSQL 9.4, and both master and segments run as PostgreSQL instances. Each instance manages its own storage, buffer cache, WAL, and system catalog. However, the internals of these PostgreSQL instances are modified and extended, turning a collection of independent instances into a distributed MPP system. Such modifications include the coordination layer implemented in the master (query planner and dispatcher, distributed transaction manager) and the data distribution model across segments.
Although Greengage DB extends PostgreSQL to support distributed query execution and parallel storage, it retains the core PostgreSQL storage format, MVCC model, and relational semantics. This section describes the main PostgreSQL-based components that form the storage and execution foundation of Greengage DB.
On-disk storage
Each master and segment instance stores its data locally using PostgreSQL’s on-disk layout. This includes:
- the base directory containing per-database and per-table files;
- write-ahead logs (pg_wal);
- system catalog tables;
- tablespaces;
- temporary files (used for query execution and spill operations);
- configuration files such as postgresql.conf and pg_hba.conf.
The shared-nothing model implies that segments never share disks or storage volumes. All I/O operations — table scans, index lookups, WAL writes, and others — occur locally on each segment host. This isolation enables horizontal scaling but also requires careful data distribution to avoid storage skew.
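The location of an instance's data directory and configuration files can be checked with standard PostgreSQL commands. Run on the master, they describe the master's own local layout; each segment instance has its own equivalents.

```sql
-- Standard PostgreSQL settings that point to the instance's local layout.
SHOW data_directory;  -- base directory of this instance
SHOW config_file;     -- postgresql.conf location
SHOW hba_file;        -- pg_hba.conf location
```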
Relational data model
Greengage DB uses the same relational model as PostgreSQL, including the object hierarchy: databases, schemas, tables, and columns. In this hierarchy, databases, schemas, and columns have no visible differences from the PostgreSQL data model. At the table level, Greengage DB introduces extensions that support distributed storage and parallel execution:
- Storage types

  In addition to PostgreSQL's heap storage, Greengage DB supports append-optimized (AO) storage. AO tables are optimized for read-only queries and analytical workloads: they provide faster sequential reads and efficient bulk data loading.

- Data orientation

  Two orientations are available for AO tables: row (similar to traditional relational DBMS tables) and column. Column-oriented AO tables offer better performance for workloads that access only a subset of columns.

- Table distribution

  Every table has a distribution policy that determines how its rows are placed across segments. Choosing an appropriate distribution policy is critical for minimizing data movement during joins and aggregations. The following policies are available:

  - hash distribution on one or more columns
  - random distribution
  - replicated distribution

- Partitioning

  Tables can be horizontally partitioned, which effectively splits them into smaller logical tables. This allows the system to prune unnecessary partitions at query time and makes maintenance operations more efficient. Partitioning is fully compatible with distribution: each partition is independently distributed across segments.
Together, distribution and partitioning define how relational tables map onto the cluster’s physical topology.
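The following sketch combines these options in a single table definition: column-oriented append-optimized storage, hash distribution, and range partitioning. The table and column names are hypothetical, and the syntax follows the Greenplum 6 conventions that Greengage DB inherits.

```sql
-- Hypothetical fact table: column-oriented AO storage, hash-distributed by
-- customer_id, and range-partitioned by month on sale_date.
CREATE TABLE sales (
    sale_id     bigint,
    customer_id bigint,
    sale_date   date,
    amount      numeric(12,2)
)
WITH (appendonly = true, orientation = column)
DISTRIBUTED BY (customer_id)
PARTITION BY RANGE (sale_date)
(
    START (date '2024-01-01') INCLUSIVE
    END   (date '2025-01-01') EXCLUSIVE
    EVERY (INTERVAL '1 month')
);
```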
SQL support
Greengage DB maintains broad SQL compatibility with PostgreSQL. It supports standard DDL, DML, and query constructs, including:
- CREATE, ALTER, and DROP statements for databases, schemas, tables, and indexes;
- joins, subqueries, common table expressions (CTE), window functions, and aggregates;
- user-defined functions and operators (with some limitations in distributed execution).
Certain features behave differently in a distributed system. For example, sequential scans run on all segments in parallel, and some operations require data redistribution over the interconnect. However, in most cases, users can write queries the same way as they would in PostgreSQL.
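For instance, a typical analytical query combining a CTE, an aggregate, and a window function is written exactly as it would be in PostgreSQL; the master plans it, and the segments execute it in parallel. The sales table is the hypothetical one used in the earlier examples.

```sql
-- Monthly totals per customer, ranked within each month.
WITH monthly AS (
    SELECT customer_id,
           date_trunc('month', sale_date) AS month,
           sum(amount) AS total
    FROM sales
    GROUP BY customer_id, date_trunc('month', sale_date)
)
SELECT customer_id,
       month,
       total,
       rank() OVER (PARTITION BY month ORDER BY total DESC) AS month_rank
FROM monthly
ORDER BY month, month_rank;
```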
Administration and maintenance
Most administrative concepts from PostgreSQL are available in Greengage DB in the same form. However, they are internally modified to execute on all the cluster instances in a consistent manner. Examples include:
- GUC configuration

  Greengage DB uses the PostgreSQL GUC mechanism for cluster configuration. It automatically keeps GUC values consistent across the cluster instances, except for certain parameters that can differ on the master by design.

- Role-based access control

  The master maintains all roles and privileges using standard PostgreSQL catalogs such as pg_authid. Segments rely on the master's catalog state propagated during query dispatch.

- Statistics collection

  Table and column statistics are collected using ANALYZE, similar to PostgreSQL but extended for distributed storage.

- Vacuuming

  Greengage DB reclaims disk space occupied by expired (logically deleted) rows using VACUUM or vacuumdb and follows PostgreSQL semantics. Append-optimized tables use a different vacuum model that tracks row visibility metadata separately.

- Temporary and spill file cleanup

  Segments generate temporary and spill files during large query executions. Their cleanup uses PostgreSQL's cleanup mechanisms with centralized management from the master.
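The statements below sketch a few of these routine tasks; they use standard SQL issued on the master, and the analyst role and sales table are hypothetical.

```sql
-- Role-based access control: roles and privileges are defined on the master
-- and recorded in its system catalogs.
CREATE ROLE analyst LOGIN PASSWORD 'change_me';
GRANT SELECT ON sales TO analyst;

-- Statistics collection and vacuuming follow PostgreSQL semantics.
ANALYZE sales;
VACUUM sales;
```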
ETL
Greengage DB provides several mechanisms for integration with external systems, which makes it suitable for Extract–Transform–Load (ETL) pipelines. ETL pipelines are a common part of data warehouse workflows. They are used, for example, to load large volumes of data from various sources into the warehouse. Greengage DB includes components designed for high-throughput, scalable data loading.
A key component is gpfdist — a lightweight file distribution server used together with external tables.
External tables allow Greengage DB to read and write data without storing it locally on segments.
They support common data formats such as CSV or text, and custom formats via user-defined transforms.
When using gpfdist, data is streamed directly from the ETL host to segment hosts, bypassing the master.
Data is loaded in parallel: each segment reads its own portion of the file stream.
For best performance, it is recommended to run gpfdist and other ETL tools on dedicated hosts rather than on segment hosts, to avoid resource contention.
For large datasets, you can use multiple gpfdist instances to increase the level of parallelism.
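A common loading pattern is a readable external table backed by one or more gpfdist instances, combined with an INSERT ... SELECT into the target table. This is a minimal sketch: the host name etl-host, port 8081, file path, and table definitions are assumptions, and the syntax follows the Greenplum conventions that Greengage DB inherits.

```sql
-- Readable external table served by gpfdist running on an ETL host
-- (etl-host:8081 and the file path are hypothetical).
CREATE EXTERNAL TABLE ext_sales (
    sale_id     bigint,
    customer_id bigint,
    sale_date   date,
    amount      numeric(12,2)
)
LOCATION ('gpfdist://etl-host:8081/sales/*.csv')
FORMAT 'CSV' (HEADER);

-- Parallel load: each segment pulls its own portion of the stream.
INSERT INTO sales
SELECT * FROM ext_sales;
```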
Additionally, Greengage DB includes the gpload utility, which manages and orchestrates the entire gpfdist-based data loading workflow. For example, it can:

- create tables for the data being loaded;
- clean up after data load.
ETL pipelines that use gpfdist and gpload can also be integrated with tools such as Apache Airflow and Apache NiFi, or driven by custom scripts.