Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize FilterExec::statistics / don't ignore errors #7553

Closed
alamb opened this issue Sep 13, 2023 · 2 comments · Fixed by #7793
Closed

Optimize FilterExec::statistics / don't ignore errors #7553

alamb opened this issue Sep 13, 2023 · 2 comments · Fixed by #7793
Labels
enhancement New feature or request

Comments

@alamb
Copy link
Contributor

alamb commented Sep 13, 2023

Is your feature request related to a problem or challenge?

This came up from looking at the backgrace in #7522 where we saw a significant amount of planning time going to creating DataFusionError which were then thrown away.

I tracked this down to creating the error (which is ignored) here:

https://github.com/apache/arrow-datafusion/blob/c47ae0d13dbc9474897609f3dea4f80a93cf4de0/datafusion/physical-plan/src/filter.rs#L206-L209

Describe the solution you'd like

I would like less time being spent creating erorrs that are thrown away. I can see two potential ways to do this:

  1. Stop ignoring the error and instead fix whatever the underlying issue is
  2. Cache the calculation of the statistics so it doesn't get called multiple times

Both might be appropriate

Describe alternatives you've considered

No response

Additional context

cc @berkaysynnada as I believe this came in #6982

I don't think this is a critical fix, but I wanted to file it given I had tracked it down

@liukun4515
Copy link
Contributor

liukun4515 commented Sep 14, 2023

I also find this thing when I do this pr #7428
The statistics can't be passed or pass an error result when meet error in the process.

I think the implementation of the filter to get statistics has some issues:
From the
https://github.com/apache/arrow-datafusion/blob/c47ae0d13dbc9474897609f3dea4f80a93cf4de0/datafusion/physical-plan/src/filter.rs#L208
https://github.com/apache/arrow-datafusion/blob/c47ae0d13dbc9474897609f3dea4f80a93cf4de0/datafusion/physical-plan/src/filter.rs#L194
when meet the error or abnormal case, just return the default statistics.

I think we should change the implementation and return the child statistics If we can't get or calculate the right statistics.

@liukun4515
Copy link
Contributor

  1. Stop ignoring the error and instead fix whatever the underlying issue is

I think this is the root solution.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants