-
Notifications
You must be signed in to change notification settings - Fork 596
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(iceberg): support iceberg engine table (in local env) #19577
Conversation
let catalog_writer = session.catalog_writer()?; | ||
// TODO(iceberg): make iceberg engine table creation ddl atomic |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this is a critical issue... especially if we create tables first, before source and sink. This is because table is self-contained, while create source
employs a validation stage to check whether the upstream system really work, so it has a high chance to fail. Shall we create the source/sink before table?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are some dependencies here. To create an iceberg source, we first need to have an iceberg table. To create an iceberg table we need to create an iceberg sink (with create_table_if_not_exists
). To create an iceberg sink we need to create a hummock table first. Finally, we have this order hummock table -> iceberg sink -> iceberg source
let catalog_writer = session.catalog_writer()?; | ||
// TODO(iceberg): make iceberg engine table creation ddl atomic | ||
catalog_writer | ||
.create_table(source, table, graph, job_type) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here the source
is passed as well, can it work? 🤔
What I am thinking is that, for a common table with connector, the corresponding source
is supposed to generate changes that will be applied to the table
. But here the Iceberg source is just for batch read, which I think is actually irrelevant/unconnected to the iceberg table internally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The source here is something like kafka
and postgres cdc
connector instead of the iceberg source. For example
create table t (a int) with (connector = 'kafka' ...) engine = iceberg.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for Cargo.lock
meta_store_database.clone() | ||
) | ||
} | ||
MetaBackend::Sqlite | MetaBackend::Sql | MetaBackend::Mem => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
MetaBackend::Sql
will be widely adopted since #19560. Shall we support it as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I previously thought it was deprecated. For iceberg jdbc right now, we need to know the underlying database implementation to choose the right driver.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can simply add a jdbc:
prefix to the database URL? 🤣
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When getting the endpoint from meta, it has already been converted into a form with postgres:
, mysql:
as prefixes, including configure as MetaBackend::Sql
. So MetaBackend::Sql
should be unreachable.
Unrelated to this PR, 🤔 I think we'd better use sql
config in risedev
only for testing purpose. To support scenarios where user and password contain special characters, it's best to specify them separately through env in both production environment and cloud. That's the reason why a subdivided backend was introduced in #17530 .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the underlying database is Oracle or SQL Server, I think that's acceptable. However, I still want to verify the underlying database type. For instance, SQLite is not a suitable catalog for Iceberg. Concurrent updates to SQLite by both the metadata service and Iceberg can easily cause SQLite to become unresponsive.
I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.
What's changed and what's your intention?
SupportUse./risedev d iceberg-engine
to set up a local environment to run the iceberg engine table../risedev d full
directly.This PR will retrieve meta backend connection info and S3 warehouse info from the environment variable for simplicity, but we will improve it by fetching this info from meta in a later PR.enable_config_load
field to load credentials from the default credential provided chain.Example
Checklist
./risedev check
(or alias,./risedev c
)Documentation
Release note
If this PR includes changes that directly affect users or other significant modifications relevant to the community, kindly draft a release note to provide a concise summary of these changes. Please prioritize highlighting the impact these changes will have on users.