-
Notifications
You must be signed in to change notification settings - Fork 28.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-20194] Add support for partition pruning to in-memory catalog #17510
Conversation
ok to test |
_.references.map(_.name).toSet.subsetOf(partitionColumnNames) | ||
} | ||
if (nonPartitionPruningPredicates.nonEmpty) { | ||
sys.error("Expected only partition pruning predicates: " + nonPartitionPruningPredicates) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Throwing an AnalysisException
is preferred.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add the negative test cases in your newly added test cases?
@@ -436,6 +438,37 @@ abstract class ExternalCatalogSuite extends SparkFunSuite with BeforeAndAfterEac | |||
assert(catalog.listPartitions("db2", "tbl2", Some(Map("a" -> "unknown"))).isEmpty) | |||
} | |||
|
|||
test("list partitions by filter") { | |||
val tz = TimeZone.getDefault().getID() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: val tz = TimeZone.getDefault.getID
Not related to this PR. It sounds like we have a bug in |
Test build #75461 has finished for PR 17510 at commit
|
Test build #75464 has finished for PR 17510 at commit
|
LGTM cc @cloud-fan @hvanhovell @ericl |
if (predicates.nonEmpty) { | ||
val clientPrunedPartitions = client.getPartitionsByFilter(rawTable, predicates).map { part => | ||
val clientPrunedPartitions = | ||
client.getPartitionsByFilter(rawTable, predicates).map { part => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if predicates.isEmpty
, the previous code will run client.getPartitions
. Can you double check there is no performance regression?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A similar optimization is done in the function itself: client.getPartitionsByFilter(), while Hive Shim_v0_12 delegates to getAllPartitions
anyway.
I've now made the only piece of non-trivial code along that path lazy, so I think we're good.
val catalog = newBasicCatalog() | ||
|
||
def checkAnswer(table: CatalogTable, filters: Seq[Expression], | ||
expected: Set[CatalogTablePartition]): Unit = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: code style
def checkAnswer(
param1: XX, param2: XX, ...
LGTM except some minor comments |
Test build #75479 has finished for PR 17510 at commit
|
Thanks! Merging to master. |
Thanks for moving so fast! |
What changes were proposed in this pull request?
This patch implements
listPartitionsByFilter()
forInMemoryCatalog
and thus resolves an outstanding TODO causing thePruneFileSourcePartitions
optimizer rule not to apply when "spark.sql.catalogImplementation" is set to "in-memory" (which is the default).The change is straightforward: it extracts the code for further filtering of the list of partitions returned by the metastore's
getPartitionsByFilter()
out fromHiveExternalCatalog
intoExternalCatalogUtils
and calls this new function fromInMemoryCatalog
on the whole list of partitions.Now that this method is implemented we can always pass the
CatalogTable
to theDataSource
inFindDataSourceTable
, so that the latter is resolved to a relation with aCatalogFileIndex
, which is what thePruneFileSourcePartitions
rule matches for.How was this patch tested?
Ran existing tests and added new test for
listPartitionsByFilter
inExternalCatalogSuite
, which is subclassed by bothInMemoryCatalogSuite
andHiveExternalCatalogSuite
.