[SPARK-23799][SQL] FilterEstimation.evaluateInSet produces division by zero in the case of an empty table with analyzed statistics #21052
Changes from 5 commits
297395e
d634dda
74b6ebd
0faa789
8d21488
8369cbc
@@ -382,4 +382,32 @@ class StatisticsCollectionSuite extends StatisticsCollectionTestBase with Shared
      }
    }
  }

  test("Simple queries must be working, if CBO is turned on") {
    withSQLConf(SQLConf.CBO_ENABLED.key -> "true") {
      withTable("TBL1", "TBL") {
        import org.apache.spark.sql.functions._
        val df = spark.range(1000L).select('id,
          'id * 2 as "FLD1",
          'id * 12 as "FLD2",
          lit("aaa") + 'id as "fld3")
        df.write
          .mode(SaveMode.Overwrite)
          .bucketBy(10, "id", "FLD1", "FLD2")
          .sortBy("id", "FLD1", "FLD2")
          .saveAsTable("TBL")
        sql("ANALYZE TABLE TBL COMPUTE STATISTICS ")
        sql("ANALYZE TABLE TBL COMPUTE STATISTICS FOR COLUMNS ID, FLD1, FLD2, FLD3")
        val df2 = spark.sql(
          """
            SELECT t1.id, t1.fld1, t1.fld2, t1.fld3
            FROM tbl t1
            JOIN tbl t2 on t1.id=t2.id
            WHERE t1.fld3 IN (-123.23,321.23)
          """.stripMargin)
Review comment: Nit: stripMargin only strips the leading whitespace that is followed by a | margin character, so prefix each line of the query with |:
"""
|SELECT t1.id, t1.fld1, t1.fld2, t1.fld3
|FROM tbl t1
|JOIN tbl t2 on t1.id=t2.id
|WHERE t1.fld3 IN (-123.23,321.23)
""".stripMargin)
Reply: done

        df2.createTempView("TBL2")
        sql("SELECT * FROM tbl2 WHERE fld3 IN ('qqq', 'qwe') ").explain()
Review comment: Please do not use
Reply: done

      }
    }
  }
}
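The stripMargin nit in the review depends on how Scala's StringOps.stripMargin behaves: on each line it removes leading whitespace only when that whitespace is followed by the margin character |, which is removed as well; lines without a | are left exactly as written. The small sketch below (the object name StripMarginDemo is hypothetical and not part of the PR) contrasts the two forms. In this test the extra indentation is harmless because the SQL parser ignores whitespace, so the nit concerns convention rather than correctness.

```scala
object StripMarginDemo {
  // No '|' margin characters: stripMargin has nothing to strip,
  // so every line keeps its source-code indentation.
  val withoutMargin: String =
    """
      SELECT t1.id
      FROM tbl t1
    """.stripMargin

  // With '|' margin characters: stripMargin removes the leading whitespace
  // together with the '|' on every line that has one.
  val withMargin: String =
    """
      |SELECT t1.id
      |FROM tbl t1
    """.stripMargin
}
```

Only withMargin contains the bare lines "SELECT t1.id" and "FROM tbl t1"; in withoutMargin the same lines still start with the indentation used in the source file.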
Review comment: Why does colStat.min.isEmpty || colStat.max.isEmpty mean the output is empty? String-type columns never have min/max statistics.
Reply: Yeah, we need to correct it in the next PR.
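The division by zero named in the PR title and the min/max question in this thread can be illustrated without Spark. The sketch below is a hypothetical, simplified model, not Spark's FilterEstimation code: SimpleColStat and InSetSelectivitySketch are invented names, and the IN-list estimate is assumed to be the ratio of a capped distinct count to the column's distinct count under a uniform-distribution assumption. Computing that ratio with BigDecimal throws an ArithmeticException when the distinct count is 0 (as for an empty table with analyzed statistics), so the guard tests the distinct count itself. It deliberately does not treat missing min/max as "no rows", since, as the reviewer points out, string columns never carry min/max.

```scala
import scala.util.Try

// Hypothetical, simplified stand-in for a column statistic; not Spark's ColumnStat.
final case class SimpleColStat(
    distinctCount: BigInt,
    min: Option[BigDecimal],
    max: Option[BigDecimal])

object InSetSelectivitySketch {
  // Unguarded ratio: BigDecimal division throws ArithmeticException
  // when the divisor (the column's distinct count) is zero.
  def unguardedRatio(newNdv: BigInt, ndv: BigInt): Double =
    (BigDecimal(newNdv) / BigDecimal(ndv)).toDouble

  // Guarded estimate for "col IN (v1, ..., vn)" assuming a uniform distribution.
  // Zero distinct values means no row can match, so the selectivity is 0.0.
  // Missing min/max is NOT used as an emptiness signal: string columns never have them.
  def selectivity(stat: SimpleColStat, inListSize: Int): Double = {
    val ndv = stat.distinctCount
    if (ndv == BigInt(0)) 0.0
    else {
      // The IN list cannot select more distinct values than the column has.
      val newNdv = ndv.min(BigInt(inListSize))
      (BigDecimal(newNdv) / BigDecimal(ndv)).toDouble
    }
  }
}

// Usage: an empty column no longer divides by zero, and a string-like column
// without min/max still gets a non-zero estimate.
val emptyCol = SimpleColStat(distinctCount = 0, min = None, max = None)
val stringCol = SimpleColStat(distinctCount = 100, min = None, max = None)
println(Try(InSetSelectivitySketch.unguardedRatio(0, 0)).isFailure) // true
println(InSetSelectivitySketch.selectivity(emptyCol, 2))             // 0.0
println(InSetSelectivitySketch.selectivity(stringCol, 2))            // 0.02
```

Keying the guard on the distinct count rather than on min/max keeps string and binary columns, which have a distinct count but no min/max, on the normal estimation path.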