Materialize is a streaming database powered by Timely and Differential Dataflow, purpose-built for low-latency applications. It lets you use SQL to ask complex questions about your data, and then updates the results of your SQL queries incrementally as the underlying data changes.
We are rolling out Early Access to the new, cloud-native version of Materialize. Sign up to get on the list! 🚀
Materialize is designed to help you interactively explore your streaming data, perform analytics against live relational data, or increase data freshness while reducing the load of your dashboard and monitoring tasks. The moment you need a refreshed answer, you can get it in milliseconds.
It focuses on providing correct and consistent answers with minimal latency, and does not ask you to accept either approximate answers or eventual consistency. Whenever Materialize answers a query, that answer is the correct result on some specific (and recent) version of your data. Materialize does all of this by recasting your SQL queries as dataflows, which can react efficiently to changes in your data as they happen.
We support a large fraction of PostgreSQL, and are actively working on supporting more built-in PostgreSQL functions. Please file an issue if something doesn't work as expected!
Materialize can read data from Kafka and Redpanda, as well as directly from a PostgreSQL replication stream. It also supports regular database tables to which you can insert, update, and delete rows.
Once you've got the data in, define views and perform reads via the PostgreSQL protocol. Use your favorite PostgreSQL CLI, including the psql
you probably already have on your system.
Materialize supports a comprehensive variety of SQL features, all using the PostgreSQL dialect and protocol:
- Joins, Joins, Joins! Materialize supports multi-column join conditions, multi-way joins, self-joins, cross-joins, inner joins, outer joins, etc.
- Delta-joins avoid intermediate state blowup compared to systems that can only plan nested binary joins - tested on joins of up to 64 relations.
- Support for subqueries. Materialize's SQL optimizer performs subquery decorrelation out-of-the-box, avoiding the need to manually rewrite subqueries into joins.
- Materialize supports streams that contain CDC data (currently supporting the Debezium format). Materialize can incrementally maintain views in the presence of arbitrary inserts, updates, and deletes. No asterisks.
- All the aggregations.
GROUP BY
,MIN
,MAX
,COUNT
,SUM
,STDDEV
,HAVING
, etc. ORDER BY
LIMIT
DISTINCT
- JSON support in the PostgreSQL dialect including operators and functions like
->
,->>
,@>
,?
,jsonb_array_element
,jsonb_each
. Materialize automatically plans lateral joins for efficientjsonb_each
support. - Nest views on views on views!
- Multiple views that have overlapping subplans can share underlying indices for space and compute efficiency, so just declaratively define what you want, and we'll worry about how to efficiently maintain them.
Here's an example join query that works fine in Materialize, TPC-H
query 15:
-- Views define commonly reused subqueries.
CREATE VIEW revenue (supplier_no, total_revenue) AS
SELECT
l_suppkey,
SUM(l_extendedprice * (1 - l_discount))
FROM
lineitem
WHERE
l_shipdate >= DATE '1996-01-01'
AND l_shipdate < DATE '1996-01-01' + INTERVAL '3' month
GROUP BY
l_suppkey;
-- The MATERIALIZED keyword is the trigger to begin
-- eagerly, consistently, and incrementally maintaining
-- results that are stored directly in durable storage.
CREATE MATERIALIZED VIEW tpch_q15 AS
SELECT
s_suppkey,
s_name,
s_address,
s_phone,
total_revenue
FROM
supplier,
revenue
WHERE
s_suppkey = supplier_no
AND total_revenue = (
SELECT
max(total_revenue)
FROM
revenue
)
ORDER BY
s_suppkey;
-- Creating an index keeps results always up to date and in memory.
-- In this example, the index will allow for fast point lookups of
-- individual supply keys.
CREATE INDEX tpch_q15_idx ON tpch_q15 (s_suppkey);
Stream inserts, updates, and deletes on the underlying tables (lineitem
and supplier
), and Materialize keeps the materialized view incrementally updated. You can type SELECT * FROM tpch_q15
and expect to see the current results immediately!
Pull based: Use any PostgreSQL-compatible driver in any language/environment to make SELECT
queries against your views. Tell them they're talking to a PostgreSQL database, they don't ever need to know otherwise.
Push based: Listen to changes directly using SUBSCRIBE
or configure Materialize to stream results to a Kafka topic as soon as the views change.
If you want to use an ORM, chat with us. They're surprisingly tricky.
Check out our documentation.
Materialize is source-available and licensed under the BSL 1.1, converting to the open-source Apache 2.0 license after 4 years. As stated in the BSL, Materialize is free forever on a single node.
Materialize is also available as a paid cloud service with additional features such as high availability via multi-active replication.
Materialize depends upon many open source Rust crates. We maintain a list of these crates and their licenses, including links to their source repositories.
Materialize is primarily written in Rust.
Developers can find docs at doc/developer, and Rust API documentation is hosted at https://dev.materialize.com/api/rust/. The Materialize development roadmap is divided up into roughly month-long milestones, and managed in GitHub.
Contributions are welcome. Prospective code contributors might find the good first issue tag useful. We value all contributions equally, but bug reports are more equal.
Materialize is lovingly crafted by a team of developers and one bot. Join us.