discussion: Support creating lakehouse catalog in risingwave. #8603

liurenjie1024 · 2023-03-16T11:36:44Z

Is your feature request related to a problem? Please describe.

Currently, before RisingWave can sink to Iceberg, we need to rely on an external system (Flink, Spark, etc.) to create the table for us. This has two drawbacks:

It is quite inconvenient for users to try our solution. For example, if a user just wants to store everything in S3, they need to download Flink and Hadoop, and get familiar with Flink SQL, to create catalogs/tables.
Maintaining DDLs in two different systems is error-prone and misleading.

Describe the solution you'd like

Support creating an Iceberg catalog in RisingWave as follows:

CREATE CATALOG fs_catalog WITH (
  'type'='iceberg',
  'catalog-type'='hadoop',
  'warehouse' = 'file:///Users/renjieliu/Downloads/a'
);

USE fs_catalog;

CREATE DATABASE demo_db;

USE demo_db;

CREATE TABLE demo_table (
  v1 int,
  v2 bigint,
  v3 varchar
);

With this enabled, users can ingest data into Iceberg with our solution in one shot, without any other dependencies.

Describe alternatives you've considered

No response

Additional context

No response

StrikeW · 2023-03-16T12:01:15Z

After DDL is supported, do you plan to support reading and writing to the demo_table (lakehouse tables) directly in RisingWave?

neverchanje · 2023-03-16T12:14:09Z

The reason why RisingWave supports Iceberg is to fit into users' existing tech stack, where they already use Spark and Iceberg. Iceberg is not an out-of-the-box system that you can use without any other dependencies. It still requires a separate process to run compaction, serve ad-hoc queries, and manage catalogs. It will be non-trivial to build a management system for Iceberg. Therefore, from the product's perspective, I would view the lakehouse as a completely distinct product line, for which RisingWave should provide only the minimal functions needed to integrate, instead of combining the two.

Furthermore, you also need to consider the case where the user's Iceberg is hosted by Tabular, Dremio, or any full-fledged lakehouse. It would then be unnecessary to create the lake in RisingWave.

liurenjie1024 · 2023-03-17T02:00:33Z

> After DDL is supported, do you plan to support reading and writing to the demo_table (lakehouse tables) directly in RisingWave?

I think reading without optimization will not require much effort and is worth doing. DML statements may be complicated to implement, so I don't think we should do it.

liurenjie1024 · 2023-03-17T02:10:36Z

> The reason why RisingWave supports Iceberg is to fit into users' existing tech stack, where they already use Spark and Iceberg. Iceberg is not an out-of-the-box system that you can use without any other dependencies. It still requires a separate process to run compaction, serve ad-hoc queries, and manage catalogs. It will be non-trivial to build a management system for Iceberg. Therefore, from the product's perspective, I would view the lakehouse as a completely distinct product line, for which RisingWave should provide only the minimal functions needed to integrate, instead of combining the two.

Agreed. However, our goal is not to build a fully managed solution for Iceberg. Rather, we want to make RisingWave easier to use and experiment with. Adding support for DDL statements will simplify integration tests, as it eliminates the need to download other systems like Spark, Flink, or Hadoop to create a catalog for us. This will also make it easier for beginners who want to try out RisingWave with Iceberg. This is similar to other DML statements in our system, which are not intended for production use, but rather make tests and experimentation easier. cc @neverchanje

fuyufjh · 2023-03-20T15:57:35Z

> > After DDL is supported, do you plan to support reading and writing to the demo_table (lakehouse tables) directly in RisingWave?
>
> I think reading without optimization will not require much effort and is worth doing. DML statements may be complicated to implement, so I don't think we should do it.

@StrikeW's comment is also my major concern. If we support these CREATE CATALOG ... commands as you mentioned, it will look like RisingWave has full support for Iceberg, including DDL & CRUD, but that's not true.

github-actions · 2023-05-21T02:00:12Z

This issue has been open for 60 days with no activity. Could you please update the status? Feel free to continue discussion or close as not planned.

liurenjie1024 added type/feature needs-discussion labels Mar 16, 2023

github-actions bot added this to the release-0.18 milestone Mar 16, 2023

liurenjie1024 removed this from the release-0.18 milestone Mar 21, 2023

BugenZhao mentioned this issue May 16, 2023

frontend: refactor source schema resolution #9828

Open

github-actions bot added the no-issue-activity label May 21, 2023

liurenjie1024 closed this as not planned May 22, 2023
