pg_mooncake is a PostgreSQL extension that adds native columnstore tables with DuckDB execution. Columnstore tables are stored as Iceberg or Delta Lake tables in object storage.
The extension is maintained by Mooncake Labs and is available on Neon Postgres.
-- Create a columnstore table in PostgreSQL
CREATE TABLE user_activity (....) USING columnstore;
-- Insert data into a columnstore table
INSERT INTO user_activity VALUES ....;
-- Query a columnstore table in PostgreSQL
SELECT * FROM user_activity LIMIT 5;
- Table Semantics: Columnstore tables support transactional and batch inserts, updates, and deletes, as well as joins with regular PostgreSQL tables.
- DuckDB Execution: Run analytic queries up to 1000x faster than on regular PostgreSQL tables, with performance similar to DuckDB on Parquet.
- Iceberg and Delta Lake: Columnstore tables are stored as Parquet files with Iceberg or Delta metadata, allowing external engines (e.g., Snowflake, DuckDB, Pandas, Polars) to query them as native tables.
You can install pg_mooncake using our Docker image, from source, or on Neon Postgres.
To quickly get a PostgreSQL instance with pg_mooncake extension up and running, pull and run the latest Docker image:
docker pull mooncakelabs/pg_mooncake
docker run --name mooncake-demo -e POSTGRES_HOST_AUTH_METHOD=trust -d mooncakelabs/pg_mooncake
This will start a PostgreSQL 17 instance with pg_mooncake extension. You can then connect to it using psql:
docker run -it --rm --link mooncake-demo:postgres mooncakelabs/pg_mooncake psql -h postgres -U postgres
You can compile and install pg_mooncake extension to add it to your PostgreSQL instance. PostgreSQL versions 15, 16, and 17 are currently supported.
git submodule update --init --recursive
make release
make install
To quickly install the pg_mooncake extension on Neon, create a Neon project and run the following command from the Neon SQL Editor or a connected SQL client such as psql before you enable the extension.
SET neon.allow_unstable_extensions='true';
CREATE EXTENSION pg_mooncake;
-
Run Analytics on Live Application Data in PostgreSQL Run transactions on columnstore tables, ensuring up-to-date analytics without managing individual Parquet files.
-
Write PostgreSQL Tables to Your Lake or Lakehouse Make application data accessible as native tables for data engineering and science outside of PostgreSQL, without complex ETL, CDC or stitching of files.
-
Query and Update existing Lakehouse Tables Natively in PostgreSQL (coming soon) Connect existing Lakehouse catalogs and expose them directly as columnstore tables in PostgreSQL.
This will be where your columnstore tables are stored. If no S3 configurations are specified, these tables will be created in your local file system.
Note: If you are using pg_mooncake on Neon, you will need to bring your own S3 bucket for now. We’re working to improve this DX.
-- for S3 buckets
SELECT mooncake.create_secret('<name>', 'S3', '<key_id>', '<secret>', '{"REGION": "<s3-region>"}');
-- for R2 buckets
SELECT mooncake.create_secret('<name>', 'S3', '<key_id>', '<secret>', '{"ENDPOINT":"<ACCOUNT_ID>.r2.cloudflarestorage.com/"}');
SET mooncake.default_bucket = 's3://<bucket>';
SET mooncake.enable_local_cache = false; -- (if you are using Neon)
Create a columnstore table:
CREATE TABLE user_activity(
user_id BIGINT,
activity_type TEXT,
activity_timestamp TIMESTAMP,
duration INT
) USING columnstore;
Insert data:
INSERT INTO user_activity VALUES
(1, 'login', '2024-01-01 08:00:00', 120),
(2, 'page_view', '2024-01-01 08:05:00', 30),
(3, 'logout', '2024-01-01 08:30:00', 60),
(4, 'error', '2024-01-01 08:13:00', 60);
SELECT * FROM user_activity;
You can also insert data directly from a parquet file (can also be from S3):
COPY user_activity FROM '<parquet_file>'
Update and delete rows:
UPDATE user_activity
SET activity_timestamp = '2024-01-01 09:50:00'
WHERE user_id = 3 AND activity_type = 'logout';
DELETE FROM user_activity
WHERE user_id = 4 AND activity_type = 'error';
SELECT * from user_activity;
Run transactions:
BEGIN;
INSERT INTO user_activity VALUES
(5, 'login', '2024-01-02 10:00:00', 200),
(6, 'login', '2024-01-02 10:30:00', 90);
ROLLBACK;
SELECT * FROM user_activity;
Run aggregates and groupbys:
SELECT
user_id,
activity_type,
SUM(duration) AS total_duration,
COUNT(*) AS activity_count
FROM
user_activity
GROUP BY
user_id, activity_type
ORDER BY
user_id, activity_type;
Joins with regular PostgreSQL tables:
CREATE TABLE users (
user_id BIGINT,
username TEXT,
email TEXT
);
INSERT INTO users VALUES
(1,'alice', '[email protected]'),
(2, 'bob', '[email protected]'),
(3, 'charlie', '[email protected]');
SELECT * FROM users u
JOIN user_activity a ON u.user_id = a.user_id;
Find path where the columnstore table was created in:
SELECT * FROM mooncake.columnstore_tables;
Create a Polars dataframe directly from this table:
import polars as pl
from deltalake import DeltaTable
delta_table_path = '<path>'
delta_table = DeltaTable(delta_table_path)
df = pl.DataFrame(delta_table.to_pyarrow_table())
- Support for local file system and Object Store (S3)
- Transactional select, insert, copy, updates, and deletes
- Real-time and mini-batch inserts
- Join with regular Postgres tables
- Write Delta Lake format
- Directly insert Parquet files into columnstore tables
- Write Iceberg format
- Secondary indexes and constraints (primary, unique, and foreign keys)
- Read existing Iceberg or Delta Lake tables
- Partitioned tables
Caution
pg_mooncake is currently in preview and actively under development. For inquiries about production use cases, please reach out to us on our Slack community! Some functionality may not continue to be supported and some key features are not yet implemented.
pg_mooncake is the first project from our open-source software studio, Mooncake Labs. Mooncake is building a managed lakehouse with a clean Postgres and Python experience. With Mooncake, developers ship analytics and AI to their applications without complex ETL, CDC and pipelines.
This project is licensed under the MIT License. See the LICENSE file for details.