Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-11677][SQL] ORC filter tests all pass if filters are actually not pushed down. #9687

Closed
wants to merge 6 commits into from

Conversation

HyukjinKwon
Copy link
Member

Currently ORC filters are not tested properly. All the tests pass even if the filters are not pushed down or disabled. In this PR, I add some logics for this.
Since ORC does not filter record by record fully, this checks the count of the result and if it contains the expected values.

@HyukjinKwon HyukjinKwon changed the title [SPARK-11677][SQL ]ORC filter tests all pass if filters are actually not pushed down. [SPARK-11677][SQL] ORC filter tests all pass if filters are actually not pushed down. Nov 13, 2015
@SparkQA
Copy link

SparkQA commented Nov 13, 2015

Test build #45845 has finished for PR 9687 at commit 82d0aa7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Nov 16, 2015

Test build #45965 has finished for PR 9687 at commit cd7bd12.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Copy link
Member Author

Several things to mention.

Firstly, I wonder if it is okay to put extractSourceRDDToDataFrame at QueryTest. I did not put as I thought the purpose of QueryTest is not for that and also I did not want to create another class for single shared function. However, I think the extractSourceRDDToDataFrame can be shared with ParquetFilterSuite #9659.

Secondly, I originally wanted to add OrcFilterSuite separately in order to test actual filter evaluation; however, I decided not to do it here (I will do in a separate issue or followup PR) but just let the original test way work properly first.

@HyukjinKwon
Copy link
Member Author

cc @liancheng

@HyukjinKwon
Copy link
Member Author

In this commit, I renamed the function extractSourceRDDToDataFrame to stripSparkFilter.

@HyukjinKwon
Copy link
Member Author

Can I add the function stripSparkFilter to SQLTestUtils for ParquetFilterSuite and OrcQuerySuite (and possibly OrcFilterSuite I will make)?

+JDBCDatasource also needs to be properly tested with the function (filed in Jira and created PR).

@SparkQA
Copy link

SparkQA commented Dec 9, 2015

Test build #47395 has finished for PR 9687 at commit 18c0326.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@HyukjinKwon
Copy link
Member Author

The function I mentioned is moved to SQLTestUtils in another PR. So I will add a commit for this soon.

@HyukjinKwon
Copy link
Member Author

In the commits above, I removed the stripSparkFilter function to share this.

@HyukjinKwon
Copy link
Member Author

retest this please

@SparkQA
Copy link

SparkQA commented Dec 16, 2015

Test build #47785 has finished for PR 9687 at commit ce9706c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):\n * case class WrapOption(child: Expression, optType: DataType)\n

@marmbrus
Copy link
Contributor

Thanks, merging to master.

@asfgit asfgit closed this in 9657ee8 Dec 16, 2015
@HyukjinKwon HyukjinKwon deleted the SPARK-11677 branch September 23, 2016 18:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants