[SPARK-14014][SQL] Integrate session catalog (attempt #2) #11938
Conversation
## What changes were proposed in this pull request?

`SessionCatalog`, introduced in apache#11750, is a catalog that keeps track of temporary functions and tables, and delegates metastore operations to `ExternalCatalog`. This functionality overlaps a lot with the existing `analysis.Catalog`. As of this commit, `SessionCatalog` and `ExternalCatalog` will no longer be dead code. There are still things that need to be done after this patch, namely:

- SPARK-14013: Properly implement temporary functions in `SessionCatalog`
- SPARK-13879: Decide which DDL/DML commands to support natively in Spark
- SPARK-?????: Implement the ones we do want to support through `SessionCatalog`
- SPARK-?????: Merge SQL/HiveContext

## How was this patch tested?

This is largely a refactoring task, so there are no new tests introduced. The particularly relevant tests are `SessionCatalogSuite` and `ExternalCatalogSuite`.

Author: Andrew Or <[email protected]>
Author: Yin Huai <[email protected]>

Closes apache#11836 from andrewor14/use-session-catalog.
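To make the division of labor concrete, here is a minimal, self-contained sketch of the pattern described above. The class and method names are simplified stand-ins, not the actual `SessionCatalog`/`ExternalCatalog` API from this patch: temporary tables live in session-local state, while metastore-backed operations are forwarded to the external catalog.

```scala
import scala.collection.mutable

// Simplified stand-in for ExternalCatalog: metastore-backed operations only.
trait SimpleExternalCatalog {
  def createTable(db: String, table: String): Unit
  def dropTable(db: String, table: String, ignoreIfNotExists: Boolean): Unit
}

// Simplified stand-in for SessionCatalog: owns temporary tables, delegates the rest.
class SimpleSessionCatalog(external: SimpleExternalCatalog) {
  private val tempTables = mutable.HashMap.empty[String, String] // name -> plan (stubbed)
  private var currentDb = "default"

  // Temporary tables never touch the metastore.
  def createTempTable(name: String, plan: String): Unit = tempTables(name) = plan

  // Metastore tables are delegated to the external catalog.
  def createTable(name: String): Unit = external.createTable(currentDb, name)

  // Drop removes the session-local table if present; otherwise delegate.
  def dropTable(name: String, ignoreIfNotExists: Boolean): Unit =
    if (tempTables.remove(name).isEmpty) {
      external.dropTable(currentDb, name, ignoreIfNotExists)
    }
}
```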
Test build #2679 has started for PR 11938 at commit
Test build #54060 has finished for PR 11938 at commit
Test build #2678 has finished for PR 11938 at commit
Test build #2680 has finished for PR 11938 at commit
Test build #2677 has finished for PR 11938 at commit
Test build #2681 has finished for PR 11938 at commit
Test build #2686 has started for PR 11938 at commit
Test build #2687 has finished for PR 11938 at commit
Test build #2684 has finished for PR 11938 at commit
Test build #2685 has finished for PR 11938 at commit
Test build #2683 has finished for PR 11938 at commit
Test build #54095 has finished for PR 11938 at commit
Test build #2690 has started for PR 11938 at commit
Test build #2689 has finished for PR 11938 at commit
Test build #2691 has finished for PR 11938 at commit
Test build #54126 has finished for PR 11938 at commit
Test build #2692 has finished for PR 11938 at commit
Test build #2693 has finished for PR 11938 at commit
The test failure was not related. The latest commit passed tests 4 times in a row, so I'm going to merge this.
@andrewor14, in the future we should make the PR description the same as the previous one, with your latest addendum. Otherwise it is very difficult to track once it is committed.
-  private[this] var currentDb = "default"
+  protected[this] var currentDb = {
+    val defaultName = "default"
+    val defaultDbDefinition = CatalogDatabase(defaultName, "default database", "", Map())
Sorry @andrewor14, I have a very late comment. For a default database, the `locationUri` is an empty string? Is that allowed?
I got this error when trying to create such a database:
org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:java.lang.IllegalArgumentException: Can not create a Path from an empty string);
In the Hive case the default database should already exist in the metastore, so this won't do anything, right? This is mainly for the in-memory case.
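For context, here is a sketch of how such an initializer can be made safe in both backends. This is an illustrative reconstruction of the bootstrap being discussed, not necessarily the exact code in the patch: the create call tolerates an existing database, so against a Hive metastore, where `default` already exists, it is a no-op, and only the in-memory catalog actually creates the entry.

```scala
// Illustrative reconstruction; the real initializer in SessionCatalog may differ.
protected[this] var currentDb = {
  val defaultName = "default"
  val defaultDbDefinition = CatalogDatabase(defaultName, "default database", "", Map())
  // No-op against a Hive metastore, where "default" already exists;
  // otherwise the in-memory catalog registers the definition here.
  createDatabase(defaultDbDefinition, ignoreIfExists = true)
  defaultName
}
```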
The `currentDb` is also shared by `HiveSessionCatalog`. I think we could just override this variable in `HiveSessionCatalog` and directly return `defaultName = "default"` without attempting to create a database:
https://github.com/apache/spark/pull/11938/files#diff-b3f9800839b9b9a1df9da9cbfc01adf8R53
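A rough sketch of that suggestion, using simplified stand-in classes rather than the real `SessionCatalog`/`HiveSessionCatalog` (a concrete `var` cannot be overridden directly in Scala, so the bootstrap is factored into an overridable hook here): the Hive-backed catalog skips the create attempt because `default` is guaranteed to exist in the metastore.

```scala
// Sketch only; not code from this PR.
class BaseSessionCatalog {
  // Overridable hook so subclasses can change how the default db is bootstrapped.
  protected def bootstrapDefaultDb(): String = {
    // in-memory case: create the "default" database if missing (details elided)
    "default"
  }
  protected[this] var currentDb: String = bootstrapDefaultDb()
}

class HiveBackedSessionCatalog extends BaseSessionCatalog {
  // The Hive metastore always has a "default" database, so just return its
  // name without attempting to create it.
  override protected def bootstrapDefaultDb(): String = "default"
}
```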
In the `HiveSessionCatalog`, we can get the default path by:

    override def getDefaultPath: String =
      client.getConf(HiveConf.ConfVars.METASTOREWAREHOUSE.varname,
        HiveConf.ConfVars.METASTOREWAREHOUSE.defaultStrVal)

It might not be useful in this case. I am just trying to implement the `CreateDatabase` DDL.
@@ -71,7 +71,8 @@ case class DropTable(
       }
       hiveContext.invalidateTable(tableName)
       hiveContext.runSqlHive(s"DROP TABLE $ifExistsClause$tableName")
-      hiveContext.sessionState.catalog.unregisterTable(TableIdentifier(tableName))
+      hiveContext.sessionState.catalog.dropTable(
+        TableIdentifier(tableName), ignoreIfNotExists = true)
@andrewor14 @yhuai Sorry, I have a question about this. When doing `Create/Alter/Drop Database`, the changes in my PR affect both `SQLContext` and `HiveContext`. Basically, we only call the related APIs of `sessionState.catalog`. After reading this, I am wondering if we also need to call `runSqlHive` for `Create/Alter/Drop Database`? Thank you!
No, `SessionCatalog` calls `HiveCatalog` internally, so the state will be stored in the metastore.
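To illustrate the delegation being described (simplified names and signatures, not the actual Spark classes): a database DDL call on the session catalog is forwarded to the external catalog, and the Hive-backed implementation writes through the metastore client, so no separate `runSqlHive` call is needed.

```scala
// Simplified sketch of the SessionCatalog -> ExternalCatalog -> metastore path.
trait MiniExternalCatalog {
  def createDatabase(name: String, ignoreIfExists: Boolean): Unit
}

// Stand-in for HiveCatalog: persists through the metastore (stubbed with println).
class MiniHiveCatalog extends MiniExternalCatalog {
  override def createDatabase(name: String, ignoreIfExists: Boolean): Unit =
    println(s"metastore client: CREATE DATABASE $name (ignoreIfExists=$ignoreIfExists)")
}

// Stand-in for SessionCatalog: database DDL is pure delegation, no runSqlHive.
class MiniSessionCatalog(external: MiniExternalCatalog) {
  def createDatabase(name: String, ignoreIfExists: Boolean): Unit =
    external.createDatabase(name, ignoreIfExists)
}

// Usage:
//   new MiniSessionCatalog(new MiniHiveCatalog).createDatabase("db1", ignoreIfExists = false)
```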
## What changes were proposed in this pull request?

This reopens #11836, which was merged but promptly reverted because it introduced flaky Hive tests.

## How was this patch tested?

See `CatalogTestCases`, `SessionCatalogSuite` and `HiveContextSuite`.