Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-12236][SQL] JDBC filter tests all pass if filters are not really pushed down #10221

Closed
wants to merge 1 commit into from

Conversation

HyukjinKwon
Copy link
Member

https://issues.apache.org/jira/browse/SPARK-12236
Currently JDBC filters are not tested properly. All the tests pass even if the filters are not pushed down due to Spark-side filtering.

In this PR,
Firstly, I corrected the tests to properly check the pushed down filters by removing Spark-side filtering.
Also, != was being tested which is actually not pushed down. So I removed them.
Lastly, I moved the stripSparkFilter() function to SQLTestUtils as this functions would be shared for all tests for pushed down filters. This function would be also shared with ORC datasource as the filters for that are also not being tested properly.

@SparkQA
Copy link

SparkQA commented Dec 9, 2015

Test build #47418 has finished for PR 10221 at commit 3294b31.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Copy link
Member Author

@liancheng Would you like to look through this? it is related with the filter tests.

@holdenk
Copy link
Contributor

holdenk commented Dec 14, 2015

This looks reasonable to me, although I'm a bit confused by the addition of some similar code in #9687 - it seems just having a single shared strip filter utility function makes sense (is that the eventual plan)?

@HyukjinKwon
Copy link
Member Author

Oh yes. That is the eventual plan. I will share that function. I opended some PRs before other PRs are closed. So, I ended up with adding the same function to another PR.

@HyukjinKwon
Copy link
Member Author

@holdenk Actually, would you merge this PR if it looks good?
Four other PRs are having a bit of troubles like you just said due to this PR and I would like to correct them as soon as this PR is meged.

@holdenk
Copy link
Contributor

holdenk commented Dec 15, 2015

Can't merge it, but we could ask @marmbrus or @liancheng to take a look if they have the bandwidth.

@marmbrus
Copy link
Contributor

Why not just implement unhandledFilters?

@HyukjinKwon
Copy link
Member Author

@marmbrus I saw that Jira ticket (for unhandledFilters) before. I thought It looks for internal datsources guys agree with using Spark side filter as it is for now.

I found this problem while testing about that Jira ticket (for unhandledFilters) and decided to test in this way for the internal datasources for now and maybe remove this way when adding unhandledFilters.

It looks adding unhandledFilters is reasonable but could we probably let the existing tests working properly first? Actually testing in this way for Parquet is already merged #9659.

I would like to add that later (maybe right after correcting all filter tests) with a task with subtasks for Parquet, ORC and JDBC datasources if it is acceptable.

Otherwise, if it sounds unreasonable, then I will try to add unhandledFilters here.

@HyukjinKwon
Copy link
Member Author

Actually, we might still need such function even after adding unhandledFilters although the logics might be a bit modified (maybe, at least, to test if the produced DataFrame is wrapped with Spark filter or not by checking relations or actual plans) because it would still pass all tests even if "unhandled" filters are pushed down which causes the DataFrame wrapped with Spark filter just like != was being tested with Spark filter.

@marmbrus
Copy link
Contributor

Fair enough, I guess we can probably commit this as is and do the improvement in another PR.

@marmbrus
Copy link
Contributor

Merging to master.

@asfgit asfgit closed this in 2811265 Dec 16, 2015
@HyukjinKwon
Copy link
Member Author

Thanks!
I filed the issue about the implementation of unhandledFilter here https://issues.apache.org/jira/browse/SPARK-12354.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants