Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove Expr::GetIndexedField and GetFieldAccess and always use function get_field for indexing #10374

Closed
jayzhan211 opened this issue May 4, 2024 · 6 comments · Fixed by #10568
Assignees
Labels
enhancement New feature or request

Comments

@jayzhan211
Copy link
Contributor

Is your feature request related to a problem or challenge?

Discussion starts from #10181 (comment)

Describe the solution you'd like

As title mentioned, use function instead of Expr

Describe alternatives you've considered

No response

Additional context

No response

@alamb
Copy link
Contributor

alamb commented May 8, 2024

One potential downside as @westonpace mentioned somewhere I can't find now, is that systems that want to look for field accesses for their own analysis (e.g. to find all nested field references) would find it harder to do so

@westonpace
Copy link
Member

I think it would actually be easier. The easiest was the old way where we have only Expr::GetIndexedField and get_field doesn't exist (no functions). However, that ship has already sailed (the functions are valuable because they allow extensibility).

Today, I have to account for both paths. We use the SQL parser as our input. I don't know (off the top of my head) if it uses Expr or get_field and it doesn't really matter because, if I don't account for both, it will inevitably change which one it uses (murphy's law).

So having one path, even if it is a slightly less obvious path, is better than having two paths.

@jayzhan211
Copy link
Contributor Author

@alamb Do you think we should also rewrite the array operator to function in parser step? It is currently in optimizer step.
I think the downside of moving array rewrite in parser step is that if any user expects different array function with the same syntax, then they can't do it since we don't have "user-defined" parser mechanism now. But the benefit is that we can eliminate intermediate binary expression.

Have you thought about "user-defined" parser idea before, the way that user can define their own expression to get from the syntax? Is it useful in production? 🤔

@alamb
Copy link
Contributor

alamb commented May 14, 2024

@alamb Do you think we should also rewrite the array operator to function in parser step? It is currently in optimizer step. I think the downside of moving array rewrite in parser step is that if any user expects different array function with the same syntax, then they can't do it since we don't have "user-defined" parser mechanism now. But the benefit is that we can eliminate intermediate binary expression.

I agree that changing the parser to insert a call to get field access directly is a good idea (and would be consistent and allow us to remove Expr::GetFieldAccess

Have you thought about "user-defined" parser idea before, the way that user can define their own expression to get from the syntax? Is it useful in production? 🤔

One thing I have thought about is changing the hard coded lookup of function names from a pattern like this

if let Some(udf) = self.context_provider.get_function_meta("make_array") {
Ok(Expr::ScalarFunction(ScalarFunction::new_udf(udf, values)))
} else {
not_impl_err!(
"array_expression featrue is disable, So should implement make_array UDF by yourself"
)
}

To be something more structured

pub trait PlannerFunctions {
  /// return the UDF to use for creating arrays ("make_array") by default:
  fn make_array(&self) -> Result<ScalarUDF>;
...
  // similar functions for other built in functions
}

And then instead of

 if let Some(udf) = self.context_provider.get_function_meta("make_array") { 
     Ok(Expr::ScalarFunction(ScalarFunction::new_udf(udf, values))) 
 } else { 
     not_impl_err!( 
         "array_expression featrue is disable, So should implement make_array UDF by yourself" 
     ) 
 } 

The planner might look like

  let udf = self.planner_functions.make_array()?;
  Ok(Expr::ScalarFunction(ScalarFunction::new_udf(udf, values))) 

But I haven't had a usecase to do that myself yet

@jayzhan211
Copy link
Contributor Author

jayzhan211 commented May 14, 2024

@alamb Do you think we should also rewrite the array operator to function in parser step? It is currently in optimizer step. I think the downside of moving array rewrite in parser step is that if any user expects different array function with the same syntax, then they can't do it since we don't have "user-defined" parser mechanism now. But the benefit is that we can eliminate intermediate binary expression.

The array operator to function is syntax like array1 || array2 -> array_concat, which is in ArrayFunctionRewriter now, so I'm thinking about whether we should move this to the parser or not.

@alamb
Copy link
Contributor

alamb commented May 15, 2024

@alamb Do you think we should also rewrite the array operator to function in parser step? It is currently in optimizer step. I think the downside of moving array rewrite in parser step is that if any user expects different array function with the same syntax, then they can't do it since we don't have "user-defined" parser mechanism now. But the benefit is that we can eliminate intermediate binary expression.

The array operator to function is syntax like array1 || array2 -> array_concat, which is in ArrayFunctionRewriter now, so I'm thinking about whether we should move this to the parser or not.

Let's move the discussion to #10534

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
3 participants