
[SPARK-8379][SQL] Avoid speculative tasks writing to the same file #6833

Closed · wants to merge 3 commits

Conversation

@jeanlyn (Contributor) commented Jun 16, 2015

The issue link: [SPARK-8379](https://issues.apache.org/jira/browse/SPARK-8379)

Currently, when we insert data into a dynamic partition with speculative tasks enabled, we get the following exception, because two speculative attempts of the same task open the same output file and conflict over its HDFS lease:

```
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException):
Lease mismatch on /tmp/hive-jeanlyn/hive_2015-06-15_15-20-44_734_8801220787219172413-1/-ext-10000/ds=2015-06-15/type=2/part-00301.lzo
owned by DFSClient_attempt_201506031520_0011_m_000189_0_-1513487243_53
but is accessed by DFSClient_attempt_201506031520_0011_m_000042_0_-1275047721_57
```

This PR writes the data to a temporary directory when using dynamic partitioning, so that speculative tasks do not write to the same file.
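For intuition, here is a minimal sketch of the staging-directory idea, with hypothetical names (`SpeculationSafeOutput`, `stagingPath`, `attemptId`) rather than the actual patch: each task attempt writes its dynamic-partition output under an attempt-specific directory, and the committer moves only the winning attempt's files into the final partition locations.

```scala
// A minimal sketch with hypothetical names -- not the code in this patch.
import org.apache.hadoop.fs.Path

object SpeculationSafeOutput {
  // Before the fix, both attempts of a task could open the same final path,
  // e.g. <output>/ds=2015-06-15/type=2/part-00301.lzo, and race for the
  // HDFS lease. Writing under an attempt-specific staging directory makes
  // the paths distinct per attempt:
  def stagingPath(outputRoot: Path, attemptId: String,
                  partition: String, fileName: String): Path =
    new Path(outputRoot, s"_temporary/$attemptId/$partition/$fileName")
}
```

With such a layout, attempts `..._m_000042_0` and `..._m_000042_1` write to different directories, and on commit only one attempt's files are renamed into `ds=.../type=.../`, so the lease conflict above cannot occur.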

```
@@ -197,7 +197,6 @@ case class InsertIntoHiveTable(
     table.hiveQlTable.getPartCols().foreach { entry =>
       orderedPartitionSpec.put(entry.getName, partitionSpec.get(entry.getName).getOrElse(""))
     }
-    val partVals = MetaStoreUtils.getPvals(table.hiveQlTable.getPartCols, partitionSpec)
```

@jeanlyn (Contributor, Author): This code seems never used, so remove it.


Contributor: Yes, I think you are right.

Contributor: @jeanlyn Yeah, this should be removed.

@chenghao-intel (Contributor): It seems the bug only exists for dynamic partitioning in HiveContext; @jeanlyn, can you confirm that?

@scwf (Contributor) commented Jun 17, 2015: Also met this issue with dynamic partitioning in HiveContext.

@jeanlyn (Contributor, Author) commented Jun 17, 2015: @chenghao-intel, I think it only affects dynamic partitioning, because SparkHadoopWriter gets the record writer via OutputFormat.getRecordWriter, and most output formats use FileOutputFormat.getTaskOutputPath to get the output path.
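To illustrate why the non-dynamic path was already safe, here is a hedged sketch using the old `org.apache.hadoop.mapred` API (assuming, as during a normal task attempt, that the committer has already set up the per-attempt work output directory):

```scala
// A hedged sketch: FileOutputFormat.getTaskOutputPath resolves the output
// file inside the task attempt's work directory, e.g.
//   <output>/_temporary/_attempt_..._m_000042_0/<name>
// so two speculative attempts of the same task get distinct paths.
import org.apache.hadoop.fs.Path
import org.apache.hadoop.mapred.{FileOutputFormat, JobConf}

object TaskOutputPathDemo {
  def taskOutputFor(conf: JobConf, name: String): Path =
    FileOutputFormat.getTaskOutputPath(conf, name)
}
```

Hive dynamic-partition inserts, by contrast, computed the final partition path directly (as in the exception above), which appears to be why only they hit the lease conflict.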

@andrewor14 (Contributor): ok to test

@SparkQA commented Jun 17, 2015

Test build #35053 has finished for PR 6833 at commit 64bbfab.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@andrewor14 (Contributor): ok to test. Is this issue the same as the one reported in #6864? @liancheng

@SparkQA commented Jun 18, 2015

Test build #35171 has finished for PR 6833 at commit 64bbfab.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@liancheng (Contributor): @andrewor14 They are not the same. #6864 affects the dynamic partitioning feature of external data sources, while this one is about Hive's dynamic partitions.

@liancheng (Contributor): LGTM, thanks for fixing this! Merging to master and branch-1.4.

asfgit pushed a commit that referenced this pull request Jun 21, 2015
The issue link: [SPARK-8379](https://issues.apache.org/jira/browse/SPARK-8379)
Currently, when we insert data into a dynamic partition with speculative tasks enabled, we get the following exception:
```
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException):
Lease mismatch on /tmp/hive-jeanlyn/hive_2015-06-15_15-20-44_734_8801220787219172413-1/-ext-10000/ds=2015-06-15/type=2/part-00301.lzo
owned by DFSClient_attempt_201506031520_0011_m_000189_0_-1513487243_53
but is accessed by DFSClient_attempt_201506031520_0011_m_000042_0_-1275047721_57
```
This PR writes the data to a temporary directory when using dynamic partitioning, so that speculative tasks do not write to the same file.

Author: jeanlyn <[email protected]>

Closes #6833 from jeanlyn/speculation and squashes the following commits:

64bbfab [jeanlyn] use FileOutputFormat.getTaskOutputPath to get the path
8860af0 [jeanlyn] remove the never using code
e19a3bd [jeanlyn] avoid speculative tasks write same file

(cherry picked from commit a1e3649)
Signed-off-by: Cheng Lian <[email protected]>
@asfgit closed this in a1e3649 Jun 21, 2015
nemccarthy pushed a commit to nemccarthy/spark that referenced this pull request Jun 22, 2015 (same commit message as above, cherry-picked from a1e3649).