JOIN clause
A SELECT statement allows you to retrieve data from one or more tables.
To query related data across multiple tables, you can use JOIN
operations, which merge rows based on matching values in specified columns.
Syntax overview
The CROSS JOIN
clause returns all possible combinations of rows from <table1>
and <table2>
, also known as the Cartesian product of the two tables.
Its syntax is shown below:
-- CROSS JOIN
<table1> CROSS JOIN <table2>
-- or implicit CROSS JOIN
<table1>, <table2>
The comma-separated form is equivalent to an explicit CROSS JOIN
.
This is the only join type that does not require a join condition.
Other join types allow you to specify which rows appear in the result set and how to handle unmatched rows.
The syntax of JOIN
clauses is shown below:
-- ON
<table1> { [INNER] | { LEFT | RIGHT | FULL } [OUTER] } JOIN <table2> ON <join_condition>
-- USING
<table1> { [INNER] | { LEFT | RIGHT | FULL } [OUTER] } JOIN <table2> USING (join_column [, ...])
-- NATURAL
<table1> NATURAL { [INNER] | { LEFT | RIGHT | FULL } [OUTER] } JOIN <table2>
The supported join types are:
-
[INNER] JOIN
— returns only rows with matching values in both tables. -
LEFT [OUTER] JOIN
— returns all rows from the left table with matching rows from the right table; unmatched right-side columns are filled withNULL
values. -
RIGHT [OUTER] JOIN
— returns all rows from the right table with matching rows from the left table; unmatched left-side columns are filled withNULL
values. -
FULL [OUTER] JOIN
— returns all matching and non-matching rows from both tables; unmatched columns on either side are filled withNULL
values.
The INNER
and OUTER
keywords are optional.
JOIN
alone defaults to INNER JOIN
, while LEFT JOIN
, RIGHT JOIN
, and FULL JOIN
behave the same as their OUTER
variants.
These join types require specifying how rows from the tables match by using one of the following join conditions:
-
ON
— uses a boolean expression to define which rows match, similar to aWHERE
clause. -
USING
— matches rows based on one or more columns with the same name in both tables. -
NATURAL
— automatically joins tables using all columns with the same names without explicitly listing them.
Prerequisites
To follow the examples in this topic, connect to the Greengage DB master host using psql
as described in Connect to Greengage DB via psql.
Create a new database and connect to it:
CREATE DATABASE marketplace;
\c marketplace
Then, create three related tables and populate them with data:
Create a table to store the different product categories:
CREATE TABLE categories
(
category_code TEXT PRIMARY KEY,
category_name TEXT
)
DISTRIBUTED REPLICATED;
Insert data:
INSERT INTO categories
VALUES ('elec', 'Electronics'),
('furn', 'Furniture');
Create a table to hold information about each product, including the category it belongs to:
CREATE TABLE products
(
id INT,
product_name TEXT,
category_code TEXT REFERENCES categories (category_code)
)
WITH (appendoptimized = true)
DISTRIBUTED BY (id);
Insert data:
INSERT INTO products
VALUES (1, 'Laptop', 'elec'),
(2, 'Headphones', 'elec'),
(3, 'Smartphone', 'elec'),
(4, 'Desk Chair', 'furn'),
(5, 'Dining Table', 'furn');
-- product with id = 5 does not exist in the 'sales' table
Create a table holding individual sales transactions:
CREATE TABLE sales
(
id INT,
product_id INT,
date DATE
)
WITH (appendoptimized = true)
DISTRIBUTED BY (product_id);
Insert data:
INSERT INTO sales
VALUES (101, 1, '2025-01-05'),
(102, 2, '2025-01-10'),
(103, 4, '2025-01-15'),
(104, 3, '2025-01-20'),
(105, 6, '2025-02-01');
-- product with id = 6 does not exist in the 'products' table
Join types
CROSS JOIN
This query demonstrates how to use CROSS JOIN
:
SELECT categories.category_name,
products.product_name
FROM categories
CROSS JOIN products;
SELECT categories.category_name,
products.product_name
FROM categories,
products;
In the output, each row from categories
is combined with every row from products
:
category_name | product_name ---------------+-------------- Electronics | Laptop Furniture | Laptop Electronics | Smartphone Electronics | Desk Chair Furniture | Smartphone Furniture | Desk Chair Electronics | Dining Table Furniture | Dining Table Electronics | Headphones Furniture | Headphones (10 rows)
INNER JOIN
This query uses INNER JOIN
to return only the products that have matching sales records:
SELECT products.id,
products.product_name,
sales.id AS sale_id,
sales.date AS sale_date
FROM products
JOIN sales ON products.id = sales.product_id;
SELECT products.id,
products.product_name,
sales.id AS sale_id,
sales.date AS sale_date
FROM products
INNER JOIN sales ON products.id = sales.product_id;
It combines rows from the products
and sales
tables based on the shared product_id
column, excluding products with no corresponding sales.
The result looks like this:
id | product_name | sale_id | sale_date ----+--------------+---------+------------ 3 | Smartphone | 104 | 2025-01-20 4 | Desk Chair | 103 | 2025-01-15 1 | Laptop | 101 | 2025-01-05 2 | Headphones | 102 | 2025-01-10 (4 rows)
LEFT OUTER JOIN
This query uses LEFT JOIN
to return all products, including those without any matching sales records:
SELECT products.id,
products.product_name,
sales.id AS sale_id,
sales.date AS sale_date
FROM products
LEFT JOIN sales ON products.id = sales.product_id;
SELECT products.id,
products.product_name,
sales.id AS sale_id,
sales.date AS sale_date
FROM products
LEFT OUTER JOIN sales ON products.id = sales.product_id;
The result shows all products with NULL
values for those without sales:
id | product_name | sale_id | sale_date ----+--------------+---------+------------ 3 | Smartphone | 104 | 2025-01-20 4 | Desk Chair | 103 | 2025-01-15 1 | Laptop | 101 | 2025-01-05 5 | Dining Table | | 2 | Headphones | 102 | 2025-01-10 (5 rows)
To find products that have no sales, filter for rows where sales.id
is NULL
:
SELECT products.id,
products.product_name
FROM products
LEFT JOIN sales ON products.id = sales.product_id
WHERE sales.id IS NULL;
The result looks like this:
id | product_name ----+-------------- 5 | Dining Table (1 row)
RIGHT OUTER JOIN
In the example below, RIGHT JOIN
is used to return all sales
records, including those without matching products:
SELECT products.id,
products.product_name,
sales.id AS sale_id,
sales.date AS sale_date
FROM products
RIGHT JOIN sales ON products.id = sales.product_id;
SELECT products.id,
products.product_name,
sales.id AS sale_id,
sales.date AS sale_date
FROM products
RIGHT OUTER JOIN sales ON products.id = sales.product_id;
The result shows that there is no product with sale_id
equal to 105
:
id | product_name | sale_id | sale_date ----+--------------+---------+------------ 1 | Laptop | 101 | 2025-01-05 4 | Desk Chair | 103 | 2025-01-15 3 | Smartphone | 104 | 2025-01-20 | | 105 | 2025-02-01 2 | Headphones | 102 | 2025-01-10 (5 rows)
FULL OUTER JOIN
In the example below, FULL JOIN
returns all products and all sales, including unmatched rows from either table:
SELECT products.id,
products.product_name,
sales.id AS sale_id,
sales.date AS sale_date
FROM products
FULL JOIN sales ON products.id = sales.product_id;
SELECT products.id,
products.product_name,
sales.id AS sale_id,
sales.date AS sale_date
FROM products
FULL OUTER JOIN sales ON products.id = sales.product_id;
The result should look like this:
id | product_name | sale_id | sale_date ----+--------------+---------+------------ 3 | Smartphone | 104 | 2025-01-20 4 | Desk Chair | 103 | 2025-01-15 1 | Laptop | 101 | 2025-01-05 5 | Dining Table | | | | 105 | 2025-02-01 2 | Headphones | 102 | 2025-01-10 (6 rows)
Join conditions
Join conditions define which rows from two source tables correspond to each other.
You can specify this condition using the ON
, USING
, or NATURAL
clause.
ON
The ON
clause allows you to specify a condition that determines how rows from two tables are matched during a join.
In the example below, values from the products.id
and sales.product_id
columns are matched:
SELECT products.product_name,
sales.date AS sale_date
FROM products
JOIN sales ON products.id = sales.product_id;
The result should look like this:
product_name | sale_date --------------+------------ Headphones | 2025-01-10 Smartphone | 2025-01-20 Desk Chair | 2025-01-15 Laptop | 2025-01-05 (4 rows)
You can also include multiple conditions in the ON
clause.
In this example, an additional filter on the sales.date
column is applied using AND
:
SELECT products.product_name,
sales.date AS sale_date
FROM products
JOIN sales ON products.id = sales.product_id AND sales.date >= '2025-01-15';
The result is as follows:
product_name | sale_date --------------+------------ Smartphone | 2025-01-20 Desk Chair | 2025-01-15 (2 rows)
USING
The USING
clause lets you join tables based on one or more columns with the same name:
SELECT categories.category_name AS category_name,
products.product_name AS product_name
FROM categories
JOIN products USING (category_code);
The result looks like this:
category_name | product_name ---------------+-------------- Electronics | Smartphone Furniture | Desk Chair Electronics | Laptop Electronics | Headphones Furniture | Dining Table (5 rows)
NATURAL
NATURAL JOIN
automatically joins tables using all columns with matching names, eliminating the need to specify the join condition explicitly:
SELECT categories.category_name AS category_name,
products.product_name AS product_name
FROM categories
NATURAL JOIN products;
The result should look as follows:
category_name | product_name ---------------+-------------- Electronics | Smartphone Furniture | Desk Chair Electronics | Laptop Electronics | Headphones Furniture | Dining Table (5 rows)
NATURAL JOIN
automatically joins all columns with matching names at query execution, so adding new matching columns to the tables can change the query result unexpectedly.
Behavior of SELECT * with joins
The columns returned by SELECT *
in a join depend on the type of join condition used:
-
ON
— the result set includes all columns from both tables, including the columns used in the join condition. -
USING
— the result set includes all columns from both tables, but the column used in the join condition appears only once. -
NATURAL JOIN
— the result set includes all columns from both tables, but any columns with the same name in both tables appear only once.
Self-join
A self-join is a join where a table is joined with itself.
In the example below, the products
table is joined to itself to find pairs of products that belong to the same category:
SELECT p1.product_name AS product_1,
p2.product_name AS product_2,
p1.category_code
FROM products p1
JOIN products p2 ON p1.category_code = p2.category_code AND p1.id < p2.id;
Note that table aliases (p1
and p2
) are used to differentiate between the two instances of the same products
table.
The result is as follows:
product_1 | product_2 | category_code ------------+--------------+--------------- Headphones | Smartphone | elec Desk Chair | Dining Table | furn Laptop | Headphones | elec Laptop | Smartphone | elec (4 rows)
Join multiple tables
You can join multiple tables in a single query to combine related data. The following query joins three tables to return product, category, and sales information:
SELECT products.id,
products.product_name,
categories.category_name AS category_name,
sales.id AS sale_id,
sales.date AS sale_date
FROM products
JOIN categories USING (category_code)
LEFT JOIN sales ON products.id = sales.product_id;
The result looks like this:
id | product_name | category_name | sale_id | sale_date ----+--------------+---------------+---------+------------ 3 | Smartphone | Electronics | 104 | 2025-01-20 4 | Desk Chair | Furniture | 103 | 2025-01-15 5 | Dining Table | Furniture | | 2 | Headphones | Electronics | 102 | 2025-01-10 1 | Laptop | Electronics | 101 | 2025-01-05 (5 rows)