-
Notifications
You must be signed in to change notification settings - Fork 28.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-22280][SQL][TEST] Improve StatisticsSuite to test convertMetastore
properly
#19500
Conversation
Test build #82767 has finished for PR 19500 at commit
|
Hi, @gatorsmile . |
sql(s"ANALYZE TABLE $orcTable COMPUTE STATISTICS") | ||
checkTableStats(orcTable, hasSizeInBytes = true, expectedRowCounts = Some(500)) | ||
Seq("orc", "parquet").foreach { format => | ||
Seq("true", "false").foreach { isConverted => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We prefer to using Seq(true, false)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for review, @gatorsmile . Sure.
checkTableStats(orcTable, hasSizeInBytes = true, expectedRowCounts = None) | ||
sql(s"ANALYZE TABLE $orcTable COMPUTE STATISTICS") | ||
checkTableStats(orcTable, hasSizeInBytes = true, expectedRowCounts = Some(500)) | ||
Seq("orc", "parquet").foreach { format => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe "orcTbl" and "parquetTbl" are better?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is used in STORED AS $format
, too. This is a minimal. :)
} | ||
withSQLConf(HiveUtils.CONVERT_METASTORE_ORC.key -> "true") { | ||
// We still can get tableSize from Hive before Analyze | ||
checkTableStats(orcTable, hasSizeInBytes = true, expectedRowCounts = None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you explain why orc table has size before analyze command while parquet table does not?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is old test case by SPARK-17410.
The original test case before SPARK-17410 wasn't and my new test case didn't.
It's due to convertMetastoreXXX
. Hive INSERT will generate stats.
sql(s"CREATE TABLE $format (key STRING, value STRING) STORED AS $format") | ||
sql(s"INSERT INTO TABLE $format SELECT * FROM src") | ||
|
||
val hasHiveStats = !isConverted |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can just inline this val
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep. Thank you for review, @wzhfy .
Test build #82779 has finished for PR 19500 at commit
|
Test build #82809 has finished for PR 19500 at commit
|
Hi, @gatorsmile . |
LGTM |
Thanks! Merged to master. |
Thank you so much, @gatorsmile and @wzhfy . :D |
What changes were proposed in this pull request?
This PR aims to improve StatisticsSuite to test
convertMetastore
configuration properly. Currently, some test logic intest statistics of LogicalRelation converted from Hive serde tables
depends on the default configuration. New test case is shorter and covers both(true/false) cases explicitly.This test case was previously modified by SPARK-17410 and SPARK-17284 in Spark 2.3.0.
How was this patch tested?
Pass the Jenkins with the improved test case.