[SPARK-19115] [SQL] Supporting Create External Table Like Location #16638
Conversation
Could you follow the title requirement in http://spark.apache.org/contributing.html?
@@ -58,6 +58,7 @@ import org.apache.spark.util.Utils
case class CreateTableLikeCommand(
Please update the comment of this class.
OK, I will update it later. Thanks!
We have a few test cases you can follow. Please create test cases. Thanks!
In the PR, you might need to consider more scenarios. For example, let me ask a question: how does Hive behave when the specified location is not empty?
Here are the differences between Hive and Spark 2.x: 2. Spark 2.x: So, this PR follows the Spark 2.x design rules. Thanks!
First, please change the PR title to
Let me rephrase it. If the directory specified in the
Please keep updating your PR description. For example, this PR is not relying on
I am sorry that I didn't grasp the key point of your question. In Hive, if there are data files under the specified path when creating an external table, then Hive will identify those files as table data files.
 * }}}
 */
override def visitCreateTableLike(ctx: CreateTableLikeContext): LogicalPlan = withOrigin(ctx) {
  val targetTable = visitTableIdentifier(ctx.target)
  val sourceTable = visitTableIdentifier(ctx.source)
  CreateTableLikeCommand(targetTable, sourceTable, ctx.EXISTS != null)
  val location = Option(ctx.locationSpec).map(visitLocationSpec)
  if (ctx.EXTERNAL != null && location.isEmpty) {
Add a comment above this line:
// If we are creating an EXTERNAL table, then the LOCATION field is required
OK, I'll do it later, Thanks!
val location = Option(ctx.locationSpec).map(visitLocationSpec)
if (ctx.EXTERNAL != null && location.isEmpty) {
  operationNotAllowed("CREATE EXTERNAL TABLE LIKE must be accompanied by LOCATION", ctx)
}
To the other reviewers, we are following what we did in visitCreateHiveTable
@ouyangxiaochen Please do not duplicate the test cases. Try to combine them. @cloud-fan @yhuai Could you please check whether such a DDL support is desirable?
ok to test.
Test build #71887 has finished for PR 16638 at commit
…ala file. 2. Repair the error in the test cases in the HiveDDLSuite.scala file: the SQL statements lost a pair of single quotes.
I have fixed the error in the test cases and they run successfully, so please run the test cases again. Thanks a lot! @SparkQA
@@ -81,8 +81,8 @@ statement
    rowFormat? createFileFormat? locationSpec?
    (TBLPROPERTIES tablePropertyList)?
    (AS? query)? #createHiveTable
| CREATE TABLE (IF NOT EXISTS)? target=tableIdentifier
    LIKE source=tableIdentifier #createTableLike
| CREATE EXTERNAL? TABLE (IF NOT EXISTS)? target=tableIdentifier
Since Spark 2.2, we want to hide the managed/external concept from users. It looks reasonable to add a LOCATION clause to CREATE TABLE LIKE, but do we really need the EXTERNAL keyword? We don't need to be exactly the same as Hive.
I am fine
ok then let's simplify the logic: if location is specified, we create an external table internally. Else, create a managed table.
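The simplified rule described above can be sketched as follows. This is an illustration against Spark's internal catalog API, not the exact patch code; the helper name is hypothetical.

```scala
// Sketch only: the table type of the new table is derived solely from
// whether a LOCATION clause was supplied in CREATE TABLE LIKE.
import org.apache.spark.sql.catalyst.catalog.CatalogTableType

def tableTypeFor(location: Option[String]): CatalogTableType =
  if (location.isDefined) {
    CatalogTableType.EXTERNAL  // a LOCATION was given: external table
  } else {
    CatalogTableType.MANAGED   // no LOCATION: managed table
  }
```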
Test build #71904 has finished for PR 16638 at commit
ping @ouyangxiaochen : )
Happy Chinese New Year! @gatorsmile
ping @gatorsmile
please address #16638 (comment)
2. Simplify the logic: if location is specified, we create an external table internally; else, create a managed table. 3. Update test cases.
@@ -51,13 +51,14 @@ import org.apache.spark.util.Utils
 *
 * The syntax of using this command in SQL is:
 * {{{
 *   CREATE TABLE [IF NOT EXISTS] [db_name.]table_name
 *   LIKE [other_db_name.]existing_table_name
 *   CREATE [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
no EXTERNAL
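With the EXTERNAL keyword dropped, usage of the agreed-upon syntax might look like the sketch below. Table names and the path are illustrative, and enableHiveSupport assumes a Hive-enabled Spark build.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("create-table-like-demo")
  .enableHiveSupport()
  .getOrCreate()

// No LOCATION: the copy is created as a managed table.
spark.sql("CREATE TABLE IF NOT EXISTS t_copy LIKE src_table")

// With LOCATION: the copy is created as an external table internally.
spark.sql("CREATE TABLE t_ext LIKE src_table LOCATION '/tmp/t_ext'")
```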
@@ -518,8 +518,8 @@ class HiveDDLCommandSuite extends PlanTest with SQLTestUtils with TestHiveSingle

test("create table like") {
  val v1 = "CREATE TABLE table1 LIKE table2"
  val (target, source, exists) = parser.parsePlan(v1).collect {
    case CreateTableLikeCommand(t, s, allowExisting) => (t, s, allowExisting)
  val (target, source, location, exists) = parser.parsePlan(v1).collect {
add an assert to check location is empty
@@ -528,8 +528,8 @@ class HiveDDLCommandSuite extends PlanTest with SQLTestUtils with TestHiveSingle
  assert(source.table == "table2")

  val v2 = "CREATE TABLE IF NOT EXISTS table1 LIKE table2"
add one more test case to check CREATE TABLE LIKE with location
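Such a test case might look like the sketch below, modeled on the existing "create table like" parser test; the exact constructor shape of CreateTableLikeCommand here is an assumption.

```scala
// Sketch: parse a CREATE TABLE LIKE with a LOCATION clause and check that
// the location is propagated into the command (path is illustrative).
val v3 = "CREATE TABLE table1 LIKE table2 LOCATION '/tmp/table1'"
val (target, source, location, exists) = parser.parsePlan(v3).collect {
  case CreateTableLikeCommand(t, s, loc, allowExisting) => (t, s, loc, allowExisting)
}.head
assert(target.table == "table1")
assert(source.table == "table2")
assert(location == Some("/tmp/table1"))
assert(!exists)
```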
  TableIdentifier(targetTabName, Some("default")))

checkCreateTableLike(sourceTable, targetTable)
test("CREATE TABLE LIKE a temporary view [LOCATION]...") {
actually we don't need to change the test name
checkCreateTableLike(sourceTable, targetTable)
test("CREATE TABLE LIKE a temporary view [LOCATION]...") {
  var createdTableType = "MANAGED"
  for (i <- 0 to 1) {
you can create a method with parameter location: Option[String], instead of writing a for loop with 2 iterations...
I wrote this for the purpose of reusing this piece of common code, because the basic logic of these two scenarios is almost the same.
Creating a method and wrapping this piece of code can also reuse the code.
private def checkCreateTableLike(
    sourceTable: CatalogTable,
    targetTable: CatalogTable,
    tableType: String): Unit = {
why not pass in a CatalogTableType instead of a string?
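The two review suggestions combined — a helper parameterized by an Option[String] location and a typed tableType — could look like this sketch; the assertion bodies are illustrative, not the suite's actual checks.

```scala
import org.apache.spark.sql.catalyst.catalog.{CatalogTable, CatalogTableType}

// Sketch: one helper covers both the managed and the external scenario,
// selected by the caller instead of a 2-iteration loop.
private def checkCreateTableLike(
    sourceTable: CatalogTable,
    targetTable: CatalogTable,
    tableType: CatalogTableType): Unit = {
  // The copy keeps the source schema but carries the expected table type.
  assert(targetTable.tableType == tableType)
  assert(targetTable.schema == sourceTable.schema)
}
```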
please resolve the conflict too, thanks!
Test build #72550 has finished for PR 16638 at commit
I met some trouble when resolving the conflict, so can you give me some guidance? Thanks a lot! @cloud-fan
you can start with a new branch and apply the changes manually, e.g. copy code from this PR to the new branch.
Test build #72576 has finished for PR 16638 at commit
Should I delete my remote repository first, and fork a new one again? @cloud-fan
Your master is clean (i.e., exactly identical to the upstream/master), right?
My master branch is not synchronized with Apache's master. I did a pull, but my master branch was still not synchronized, so finally I removed my remote repository.
You might not be familiar with GitHub/Git. How about submitting a new PR? : )
Here's how I create a PR:
You do not need to do step 1 every time. You might have missed the following two steps when you want to resolve your conflicts.
Oh, I see, I missed a step: 'git remote add upstream ...'.
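The sync workflow being discussed can be sketched end to end using a local bare repository standing in for apache/spark; all paths, names, and emails here are illustrative (and origin/upstream point at the same bare repo purely for the demo).

```shell
set -eu
work=$(mktemp -d)
# A local bare repo stands in for the real upstream (apache/spark).
git -c init.defaultBranch=master init -q --bare "$work/apache-spark.git"
# "Fork" it by cloning; this sets up the 'origin' remote automatically.
git -c init.defaultBranch=master clone -q "$work/apache-spark.git" "$work/fork" 2>/dev/null
cd "$work/fork"
git config user.email "[email protected]"
git config user.name "Demo User"
echo "v1" > README
git add README
git commit -qm "initial commit"
git push -q origin master
# The step that was missed: register the upstream remote (done once)...
git remote add upstream "$work/apache-spark.git"
# ...then, before resolving conflicts, sync your branch with upstream:
git fetch -q upstream
git rebase upstream/master >/dev/null
git remote
```

With both remotes configured, `git fetch upstream` plus `git rebase upstream/master` keeps the fork's master aligned, after which a feature branch can be pushed to origin for the PR.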
No worries, open/submit a new PR. : )
You might be able to make it by forcefully pushing the new changes by
OK. I'll try it immediately. Thank you very much!
I have created a PR at https://github.com/apache/spark/pull/16868, please review it. Thanks! @gatorsmile @cloud-fan
This PR proposed to close stale PRs. Currently, we have 400+ open PRs, and there are some stale PRs whose JIRA tickets have already been closed and some whose JIRA tickets do not exist (also, they do not seem to be minor issues).

// Open PRs whose JIRA tickets have already been closed
Closes apache#11785 Closes apache#13027 Closes apache#13614 Closes apache#13761 Closes apache#15197 Closes apache#14006 Closes apache#12576 Closes apache#15447 Closes apache#13259 Closes apache#15616 Closes apache#14473 Closes apache#16638 Closes apache#16146 Closes apache#17269 Closes apache#17313 Closes apache#17418 Closes apache#17485 Closes apache#17551 Closes apache#17463 Closes apache#17625

// Open PRs whose JIRA tickets do not exist and that are not minor issues
Closes apache#10739 Closes apache#15193 Closes apache#15344 Closes apache#14804 Closes apache#16993 Closes apache#17040 Closes apache#15180 Closes apache#17238

N/A

Author: Takeshi Yamamuro <[email protected]>

Closes apache#17734 from maropu/resolved_pr.

Change-Id: Id2e590aa7283fe5ac01424d30a40df06da6098b5
What changes were proposed in this pull request?
Support CREATE [EXTERNAL] TABLE LIKE LOCATION... syntax for Hive tables.
In this PR, we follow the Spark SQL design rules:
How was this patch tested?
Add new test cases and update existing test cases.