-
Notifications
You must be signed in to change notification settings - Fork 603
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(risingwave): init impl for Risingwave #7954
feat(risingwave): init impl for Risingwave #7954
Conversation
Just a couple views on your questions, didn't answer everything:
I assume you could just include it in the backend docs, perhaps similar to how you have some notes for Oracle here: https://ibis-project.org/backends/oracle#connecting-to-older-oracle-databases
Yes, it would require relevant Docker Compose configuration be added to |
@deepyaman Hi Deepyaman, can you help invite more reviewers to push this PR forward. It seems that I have no access to invite. |
@KeXiangWang Thanks for the PR and welcome to ibis 👋🏻, exciting work! I'll respond to your comments and questions inline.
We'll get into this a bit later, but ideally we don't need to crowd source testing because we have the backend test suite to help us out!
Sounds good! FYI we'll be moving away from sqlalchemy in two releases (version 9.0.0) and we'll be using sqlglot instead. In the case of risingwave I suspect that it will maintain a similar relationship to the postgres implementation, which is that a some of or a good chunk of the code can be shared.
Since RisingWave would be the first backend to support this, I suggest we avoid a new DDL object (these are in need of refactoring) for the time being and simply implement a
Totally fine.
I think this is probably best communicated in our documentation.
Strap in 😂 Generally speaking the tests in There are also usually backend-specific tests, those go in To run tests you'd write:
This will setup the data and find all the right tests (every test under The other piece of the puzzle is running RisingWave itself. I understand y'all publish a container, so that needs to be set up in our When the PR is closer to ready (local tests all passing for at least one person 😅) we can roll it into CI, which in theory won't be that different from any of our other service-based backends like Trino or ClickHouse. Excited to have you on board! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally, from a maintainability perspective, I'm wondering if RisingWave intends to continue to maintain/increase compatibility with Postgres? @KeXiangWang If so, would it make sense to try to reuse functions and classes from the postgres backend as much as possible so that the same code doesn't have to be maintained in two places?
Hi @cpcloud . Thank you for the detailed explanation. I now understand how to enhance this PR and will make the necessary improvements ASAP. |
faa655a
to
bf5cb57
Compare
1dbaac2
to
30cd17b
Compare
😮 Extremely impressive, you've already got the backend working in CI (there's an XPASS, but that's generally good news and easily addressed by removing |
5c40347
to
534374e
Compare
All the tests have passed now. Can we first merge this PR? I prefer to continue on the critical features like |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good overall, @KeXiangWang !
I'd like to not merge in a bunch of commented-out tests. They should either be removed (especially if they've duplicative of the tests in the general backend test suite) or xfailed using the appropriate markers.
can we include this in the v8 release assuming it is merged today or tomorrow? that would be better from a "we're announcing streaming support in Ibis" perspective, but also not critical (if it doesn't get in we'll just announce it's in progress for RisingWave) |
I'm going to add the risingwave sqlalchemy dialect to conda-forge so we have the option of releasing a conda-forge version of this backend sooner rather than later |
Thanks! Anything we can help? |
Hey @KeXiangWang -- I'm adding you as a feedstock maintainer, so if you can comment in conda-forge/staged-recipes#25146 saying that you agree to be a maintainer that will move this along. |
Got it. It seems the name in conda-forge/staged-recipes#25146 is incorrect. My github handle is KeXiangWang instead of KeiXiangWang |
Yeah, typo on my part, should be fixed now! |
I'm able to run the backend test suite locally 🎉 I'm going to push up a few changes:
I'll rebase on |
@KeXiangWang Can you enable allowing maintainers to edit the pull request? |
After removing the |
Sorry I cannot find a button for that. Alternatively, can you push the updates to a new branch? I can first rebase my branch on it, and then push the updates to this PR. |
You should see it on the right sidebar of the PR; I checked on one of mine. |
I came across the same page. It's weird but there's no such a button for this PR.🙁 |
Actually, I think I'll just follow up with the changes in a separate PR. @KeXiangWang Can you rebase on |
cfa40e8
to
37825e0
Compare
Done |
Thanks @KeXiangWang great work! |
This is the initial PR for ibis to support risingwave. [RisingWave](https://github.com/risingwavelabs/risingwave) is a distributed SQL streaming database engineered to provide the simplest and most cost-efficient approach for processing and managing streaming data with utmost reliability. After a few weeks of investigation, here's my phasic results. 1. As Risingwave is largely compatible with Postgres, Ibis can be easily extended to support Risingwave. The main work is to add a new dialect for Risingwave, which is similar to the `postgres` dialect. I've almost finished this part in this PR. With this PR, Ibis can be used to connect to Risingwave and run some basic queries. I've manually tested some queries which works well and add some tests imitating Postgres Backend. I would appreciate if you can help test more queries. 2. As ibis relies on SQLalchemy to support Postgres, we follow its implementation to support Risingwave. However, there are also some differences in semantics between Risingwave and Postgres, which require some modification in either ibis or SQLalchemy. `sqlalchemy-risingwave` is designed to reduce this mismatch. So in this PR, I introduce the new dependencies `sqlalchemy-risingwave` to ibis. 3. Ibis has no support for Materialized View natively. However Materialized View is a core concept in RW, people use RW because of its convenient auto-updated Materialized View. Now, if a user wants to create a new MV, he needs to use a raw SQL. Adding DDLs like `CreateMaterializedView`, `CreateSource` and `CreateSink` in the [base ddl file](https://github.com/ibis-project/ibis/blob/main/ibis/backends/base/sql/ddl.py) may help. We would appreciate it if you can help offer some suggestions. Besides this, I also met some obstacle that may need your help. 1. Risingwave hasn't supported `TEMPORARY VIEW` yet, so I changed some implementations relying on `TEMPORARY VIEW` to a normal view. For example, for the `_metadata()` functions, RW backend's implementation is `con.exec_driver_sql(f"CREATE VIEW IF NOT EXISTS {name} AS {query}")`. While in PG it's `con.exec_driver_sql(f"CREATE TEMPORARY VIEW {name} AS {query}")`. Do you have any suggestions for other ways to work around this? Besides, I didn't quite get what `_metadata()` is doing here. I would appreciate it if you could explain it a bit. 2. There's some mismatch between the `postgres` dialect and `risingwave` dialect, which are still not fully tested in this PR. We'll continue to work on it. 3. ~~This PR requires some new features of Risingwave v1.6.0 and sqlalchemy-risingwave v1.0.0 which are not released yet. They'll be released soon.~~ Done. BTW, how should I indicate this backend is only for risingwave > 1.6? 4. I don't quite understand the test pipeline of Ibis. I copied the test cases from the `postgres` dialect and modified them to fit the `risingwave` dialect, and some of them are commented temporarily due to the lack of support. I also added an SQL script to help set up the test environment, which creates tables and loads data. But I don't know how to run it in the test pipeline. Any suggestions or guidance are welcomed. I suppose the test pipeline would require a docker image of Risingwave. We can provide one if needed. 5. I'm a newbie in the ibis community, this PR may not be perfect considering others. Any comments are welcomed and I sincerely appreciate your time and patience. closes ibis-project#8038
This is the initial PR for ibis to support risingwave. [RisingWave](https://github.com/risingwavelabs/risingwave) is a distributed SQL streaming database engineered to provide the simplest and most cost-efficient approach for processing and managing streaming data with utmost reliability. After a few weeks of investigation, here's my phasic results. 1. As Risingwave is largely compatible with Postgres, Ibis can be easily extended to support Risingwave. The main work is to add a new dialect for Risingwave, which is similar to the `postgres` dialect. I've almost finished this part in this PR. With this PR, Ibis can be used to connect to Risingwave and run some basic queries. I've manually tested some queries which works well and add some tests imitating Postgres Backend. I would appreciate if you can help test more queries. 2. As ibis relies on SQLalchemy to support Postgres, we follow its implementation to support Risingwave. However, there are also some differences in semantics between Risingwave and Postgres, which require some modification in either ibis or SQLalchemy. `sqlalchemy-risingwave` is designed to reduce this mismatch. So in this PR, I introduce the new dependencies `sqlalchemy-risingwave` to ibis. 3. Ibis has no support for Materialized View natively. However Materialized View is a core concept in RW, people use RW because of its convenient auto-updated Materialized View. Now, if a user wants to create a new MV, he needs to use a raw SQL. Adding DDLs like `CreateMaterializedView`, `CreateSource` and `CreateSink` in the [base ddl file](https://github.com/ibis-project/ibis/blob/main/ibis/backends/base/sql/ddl.py) may help. We would appreciate it if you can help offer some suggestions. Besides this, I also met some obstacle that may need your help. 1. Risingwave hasn't supported `TEMPORARY VIEW` yet, so I changed some implementations relying on `TEMPORARY VIEW` to a normal view. For example, for the `_metadata()` functions, RW backend's implementation is `con.exec_driver_sql(f"CREATE VIEW IF NOT EXISTS {name} AS {query}")`. While in PG it's `con.exec_driver_sql(f"CREATE TEMPORARY VIEW {name} AS {query}")`. Do you have any suggestions for other ways to work around this? Besides, I didn't quite get what `_metadata()` is doing here. I would appreciate it if you could explain it a bit. 2. There's some mismatch between the `postgres` dialect and `risingwave` dialect, which are still not fully tested in this PR. We'll continue to work on it. 3. ~~This PR requires some new features of Risingwave v1.6.0 and sqlalchemy-risingwave v1.0.0 which are not released yet. They'll be released soon.~~ Done. BTW, how should I indicate this backend is only for risingwave > 1.6? 4. I don't quite understand the test pipeline of Ibis. I copied the test cases from the `postgres` dialect and modified them to fit the `risingwave` dialect, and some of them are commented temporarily due to the lack of support. I also added an SQL script to help set up the test environment, which creates tables and loads data. But I don't know how to run it in the test pipeline. Any suggestions or guidance are welcomed. I suppose the test pipeline would require a docker image of Risingwave. We can provide one if needed. 5. I'm a newbie in the ibis community, this PR may not be perfect considering others. Any comments are welcomed and I sincerely appreciate your time and patience. closes ibis-project#8038
Risingwave plays a role as a computing engine and storage system in Streaming ecosystems. Usually, a streaming workload will include two/three systems like this: `upstream data source --> Risingwave` Or, `upstream data source --> Risingwave --> downstream data sink` This PR adds streaming support, mainly new relations related to streaming, for the Risingwave backend. To be specific, Four types of new relations are introduced: - `Source`, data sources that encompass a connector connected to an upstream data system like Kafka. A source works like an extraction or a placeholder for an upstream system, and although it has columns, it does not store any data itself. - `Materialized View`, which is Risingwave's core concept for streaming. One can create an MV with a query on existing tables or sources. The data in MV is automatically updated in a real-time way. Users can access the data with a `select` statement just like accessing a table. - `Table` with connector, works like a combination of a source and a normal table. The difference between `Table` with connector and `Source` is that, once the table is created, it will automatically start to consume data from upstream systems and update it into Risingwave. - `Sink`, unlink a `Materialized View` which stores the result of a query in RW, users can choose to sink the result to a downstream system, e.g. Redis, and then read the data in the downstream system. Unlike an MV, User cannot access the data directly from a sink. Some minor changes: 1. Bump the image version to a new nightly build, as the new image solves some issues found in the [previous PR](#7954). 2. Mark one test, `test_semi_join`, as skipped for risingwave backend. The test sometimes stuck and it seems to be Risingwave's fault. I'll continue to investigate.
Description of changes
This is the initial PR for ibis to support risingwave.
RisingWave is a distributed SQL streaming database engineered to provide the simplest and most cost-efficient approach for processing and managing streaming data with utmost reliability.
After a few weeks of investigation, here's my phasic results.
As Risingwave is largely compatible with Postgres, Ibis can be easily extended to support Risingwave. The main work is to add a new dialect for Risingwave, which is similar to the
postgres
dialect. I've almost finished this part in this PR. With this PR, Ibis can be used to connect to Risingwave and run some basic queries. I've manually tested some queries which works well and add some tests imitating Postgres Backend. I would appreciate if you can help test more queries.As ibis relies on SQLalchemy to support Postgres, we follow its implementation to support Risingwave. However, there are also some differences in semantics between Risingwave and Postgres, which require some modification in either ibis or SQLalchemy.
sqlalchemy-risingwave
is designed to reduce this mismatch. So in this PR, I introduce the new dependenciessqlalchemy-risingwave
to ibis.Ibis has no support for Materialized View natively. However Materialized View is a core concept in RW, people use RW because of its convenient auto-updated Materialized View. Now, if a user wants to create a new MV, he needs to use a raw SQL. Adding DDLs like
CreateMaterializedView
,CreateSource
andCreateSink
in the base ddl file may help. We would appreciate it if you can help offer some suggestions.Besides this, I also met some obstacle that may need your help.
Risingwave hasn't supported
TEMPORARY VIEW
yet, so I changed some implementations relying onTEMPORARY VIEW
to a normal view. For example, for the_metadata()
functions, RW backend's implementation iscon.exec_driver_sql(f"CREATE VIEW IF NOT EXISTS {name} AS {query}")
. While in PG it'scon.exec_driver_sql(f"CREATE TEMPORARY VIEW {name} AS {query}")
. Do you have any suggestions for other ways to work around this?Besides, I didn't quite get what
_metadata()
is doing here. I would appreciate it if you could explain it a bit.There's some mismatch between the
postgres
dialect andrisingwave
dialect, which are still not fully tested in this PR. We'll continue to work on it.This PR requires some new features of Risingwave v1.6.0 and sqlalchemy-risingwave v1.0.0 which are not released yet. They'll be released soon.Done. BTW, how should I indicate this backend is only for risingwave > 1.6?I don't quite understand the test pipeline of Ibis. I copied the test cases from the
postgres
dialect and modified them to fit therisingwave
dialect, and some of them are commented temporarily due to the lack of support. I also added an SQL script to help set up the test environment, which creates tables and loads data. But I don't know how to run it in the test pipeline. Any suggestions or guidance are welcomed. I suppose the test pipeline would require a docker image of Risingwave. We can provide one if needed.I'm a newbie in the ibis community, this PR may not be perfect considering others. Any comments are welcomed and I sincerely appreciate your time and patience.
closes #8038