discussion: Support creating lakehouse catalog in risingwave. #8603

Closed
liurenjie1024 opened this issue Mar 16, 2023 · 6 comments


@liurenjie1024
Contributor

Is your feature request related to a problem? Please describe.

Currently, before RisingWave can sink to Iceberg, we need to rely on an external system (Flink, Spark, etc.) to create the table for us. This has two drawbacks:

  1. It is quite inconvenient for users who want to try our solution. For example, if users just want to store everything in S3, they need to download Flink and Hadoop and get familiar with Flink SQL to create catalogs/tables.
  2. Running DDLs in two different systems is error-prone and confusing.

Describe the solution you'd like

Support creating an Iceberg catalog in RisingWave as follows:

CREATE CATALOG fs_catalog WITH (
  'type'='iceberg',
  'catalog-type'='hadoop',
  'warehouse' = 'file:///Users/renjieliu/Downloads/a'
);

USE fs_catalog;

CREATE DATABASE demo_db;

USE demo_db;

CREATE TABLE demo_table (
  v1 int,
  v2 bigint,
  v3 varchar
);

With this enabled, users can try ingesting data into Iceberg with RisingWave in one shot, without any other dependencies.
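To make the `warehouse` option concrete, here is a minimal sketch of where a Hadoop-type Iceberg catalog would place `demo_db.demo_table` under the warehouse URI from the example above. It assumes the proposed `'catalog-type'='hadoop'` follows Iceberg's Hadoop catalog convention (`<warehouse>/<namespace>/<table>`, with `v<N>.metadata.json` and `version-hint.text` under `metadata/` and data files under `data/`). `hadoop_catalog_layout` is a hypothetical helper, not a RisingWave or Iceberg API, and it only computes paths without touching any files.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def hadoop_catalog_layout(warehouse: str, namespace: str, table: str,
                          version: int = 1) -> dict:
    """Hypothetical helper: compute (not create) the paths a Hadoop-type
    Iceberg catalog is assumed to use for one table."""
    # Split the warehouse URI into scheme, authority (e.g. an S3 bucket) and path.
    parsed = urlparse(warehouse)
    # Treat an empty path (e.g. "s3://bucket") as the root so joins stay absolute.
    table_dir = PurePosixPath(parsed.path or "/") / namespace / table
    prefix = f"{parsed.scheme}://{parsed.netloc}"

    def uri(p: PurePosixPath) -> str:
        # Re-attach scheme and authority to a path inside the warehouse.
        return prefix + str(p)

    return {
        "table_location": uri(table_dir),
        "data_dir": uri(table_dir / "data"),
        "metadata_file": uri(table_dir / "metadata" / f"v{version}.metadata.json"),
        "version_hint": uri(table_dir / "metadata" / "version-hint.text"),
    }


if __name__ == "__main__":
    layout = hadoop_catalog_layout(
        "file:///Users/renjieliu/Downloads/a", "demo_db", "demo_table")
    for name, path in layout.items():
        print(f"{name}: {path}")
```

The same function accepts an object-store URI such as a hypothetical `s3://bucket/wh`, in which case the bucket is kept as the URI authority. Actual Iceberg writers may name metadata files differently (for example with a compression suffix), so treat the output as illustrative.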

Describe alternatives you've considered

No response

Additional context

No response

@StrikeW
Contributor

StrikeW commented Mar 16, 2023

After DDL is supported, do you plan to support reading and writing to the demo_table (lakehouse tables) directly in RisingWave?

@neverchanje
Contributor

The reason RisingWave supports Iceberg is to fit into users' existing tech stack, where they already use Spark and Iceberg. Iceberg is not an out-of-the-box system that you can use without any other dependencies. It still requires separate processes to run compaction, serve ad-hoc queries, and manage catalogs. Building a management system for Iceberg would be non-trivial. Therefore, from the product's perspective, I would view the lakehouse as a completely distinct product line: RisingWave should provide only the minimal functions needed to integrate with it, instead of combining the two.

Furthermore, consider the case where the user's Iceberg is hosted by Tabular, Dremio, or another full-fledged lakehouse platform. Then it is unnecessary to create the lake in RisingWave.

@liurenjie1024
Contributor Author

> After DDL is supported, do you plan to support reading and writing to the demo_table (lakehouse tables) directly in RisingWave?

I think reading without optimization will not require much effort and is worth doing. DML statements may be complicated to implement, so I don't think we should support them.

@liurenjie1024
Contributor Author

liurenjie1024 commented Mar 17, 2023

> The reason RisingWave supports Iceberg is to fit into users' existing tech stack, where they already use Spark and Iceberg. Iceberg is not an out-of-the-box system that you can use without any other dependencies. It still requires separate processes to run compaction, serve ad-hoc queries, and manage catalogs. Building a management system for Iceberg would be non-trivial. Therefore, from the product's perspective, I would view the lakehouse as a completely distinct product line: RisingWave should provide only the minimal functions needed to integrate with it, instead of combining the two.

Agreed. However, our goal is not to build a fully managed solution for Iceberg. Rather, we want to make RisingWave easier to use and experiment with. Supporting these DDL statements will simplify integration tests, since we will no longer need to download other systems like Spark, Flink, or Hadoop to create a catalog for us. It will also make it easier for beginners to try out RisingWave with Iceberg. This is similar to the DML statements already in our system, which are not intended for production use but make testing and experimentation easier. cc @neverchanje

@fuyufjh
Member

fuyufjh commented Mar 20, 2023

> > After DDL is supported, do you plan to support reading and writing to the demo_table (lakehouse tables) directly in RisingWave?

> I think reading without optimization will not require much effort and is worth doing. DML statements may be complicated to implement, so I don't think we should support them.

@StrikeW's comment is also my major concern. If we support the CREATE CATALOG ... commands you mentioned, it would look as if RisingWave fully supports Iceberg, including DDL and CRUD, but that's not the case.

@github-actions
Contributor

This issue has been open for 60 days with no activity. Could you please update the status? Feel free to continue discussion or close as not planned.

@liurenjie1024 closed this as not planned on May 22, 2023