Add example for writing an AnalyzerRule
#10855
Comments
Perhaps @goldmedal is interested in this / has some example code to share
I think we have an existing example showing how to create an AnalyzerRule in
However, it only works with the analyzer and the optimizer. Maybe we can enhance it after #10849 to show how to apply the custom rules for the end-to-end DataFusion query flow.
Currently, in my personal work, I just use it like this:

    let ctx = SessionContext::new();
    let new_state = ctx
        .state()
        .add_analyzer_rule(Arc::new(ModelAnalyzeRule::new()))
        .add_analyzer_rule(Arc::new(ModelGenerationRule::new()));
    let new_ctx = SessionContext::new_with_state(new_state);
    // create a plan to run a SQL query
    let df = new_ctx.sql("SELECT * FROM datafusion.default.orders").await?;

After the API is updated, I guess it would be:

    let ctx = SessionContext::new();
    ctx.add_analyzer_rule(Arc::new(ModelAnalyzeRule::new()))
        .add_analyzer_rule(Arc::new(ModelGenerationRule::new()));
    // create a plan to run a SQL query
    let df = ctx.sql("SELECT * FROM datafusion.default.orders").await?;

After #10849 is done, if no one else is working on it, I think I can help with it.
I have some time on a plane today that I may use to try and write up this example. I was inspired by some discussion I had this week
Hi @alamb, I just want to share how I use a user-defined AnalyzerRule. As you know, I'm working on reimplementing the semantic layer engine, wren-engine, for LLMs using DataFusion. The project is still a work in progress. However, I think I have finished the DataFusion integration part. I think it could be a nice use case for

The basic concept of wren-engine is that users can define a virtual modeling layer to apply on their physical data.

    select * from wrenai.default.customers_model

Then wren-engine translates it to a physical query:

    SELECT
      "customers_model"."city",
      "customers_model"."id",
      "customers_model"."state"
    FROM
      (
        SELECT
          "customers_model"."city",
          "customers_model"."id",
          "customers_model"."state"
        FROM
          (
            SELECT
              "datafusion"."public"."customers"."city" AS "city",
              "datafusion"."public"."customers"."id" AS "id",
              "datafusion"."public"."customers"."state" AS "state"
            FROM
              "public"."customers"
          ) AS "customers_model"
      ) AS "customers_model"

It's a simple use case to show how it works. We have other features, such as

The related PR, Canner/wren-engine#613, is still under review. However, you can find the usage of

Thanks to DataFusion for providing such amazing features, allowing me to implement a more stable and structured converter for the semantic layer.
Thank you @goldmedal - this is a great hint. I have some basic analyzer rule coded up but maybe your example is a better idea. I have a PR in progress that I need to polish up and then will ping you for review
Here is my proposed analyzer rule example: #11089. It isn't quite as fancy as the wren example, but I think it is an understandable enough example
Is your feature request related to a problem or challenge?
We have an example for writing a user-defined optimizer rule in datafusion/datafusion/core/tests/user_defined/user_defined_plan.rs (line 251 in 3773fb7).
However, we don't have a corresponding example for writing a user-defined AnalyzerRule, which also means the APIs for using them are complicated (see #10849 for example).
Describe the solution you'd like
The ideal example, I think, would be to add a file to https://github.com/apache/datafusion/tree/3773fb7fb54419f889e7d18b73e9eb48069eb08e/datafusion-examples: user_defined_analyzer.rs
Perhaps the example could show how to create an AnalyzerRule that replaces an expression like a / b with a function call safe_div(a, b), where safe_div
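To make the idea concrete, here is a minimal, self-contained sketch of that rewrite on a toy expression tree. It deliberately does not use DataFusion's actual `Expr`, `LogicalPlan`, or `AnalyzerRule` types: `Expr`, `rewrite_div`, and the display format below are hypothetical names for illustration only, and what `safe_div` actually does is left unspecified. It only shows the shape of the transformation, which is a bottom-up traversal that replaces every division node with a call node.

```rust
use std::fmt;

/// Toy expression tree (hypothetical; NOT DataFusion's `Expr`).
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Literal(i64),
    Add(Box<Expr>, Box<Expr>),
    Divide(Box<Expr>, Box<Expr>),
    Call { name: String, args: Vec<Expr> },
}

impl fmt::Display for Expr {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Expr::Column(c) => write!(f, "{c}"),
            Expr::Literal(v) => write!(f, "{v}"),
            Expr::Add(l, r) => write!(f, "({l} + {r})"),
            Expr::Divide(l, r) => write!(f, "({l} / {r})"),
            Expr::Call { name, args } => {
                let args: Vec<String> = args.iter().map(|a| a.to_string()).collect();
                write!(f, "{name}({})", args.join(", "))
            }
        }
    }
}

/// Recursively replace every `a / b` with `safe_div(a, b)`.
/// Children are rewritten first, so nested divisions are handled too.
fn rewrite_div(expr: Expr) -> Expr {
    match expr {
        Expr::Divide(l, r) => Expr::Call {
            name: "safe_div".to_string(),
            args: vec![rewrite_div(*l), rewrite_div(*r)],
        },
        Expr::Add(l, r) => Expr::Add(Box::new(rewrite_div(*l)), Box::new(rewrite_div(*r))),
        Expr::Call { name, args } => Expr::Call {
            name,
            args: args.into_iter().map(rewrite_div).collect(),
        },
        // Columns and literals have no children; return them unchanged.
        leaf => leaf,
    }
}
```

Applied to the toy expression `(a / b) + 1`, `rewrite_div` returns a tree that prints as `(safe_div(a, b) + 1)`. A real analyzer rule would presumably apply the same kind of traversal to the expressions inside a logical plan rather than to a standalone tree.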
Describe alternatives you've considered
No response
Additional context
No response