[SPARK-2585] remove unnecessary broadcast for conf #2935

davies · 2014-10-24T21:31:53Z

We already broadcast the task (RDD and closure) itself, so small data used in the RDD or closure no longer needs to be broadcast explicitly.

SparkQA · 2014-10-24T21:34:50Z

Test build #22167 has started for PR 2935 at commit b4cd73e.

This patch merges cleanly.

SparkQA · 2014-10-24T21:38:27Z

Test build #22167 has finished for PR 2935 at commit b4cd73e.

This patch fails to build.
This patch merges cleanly.
This patch adds no public classes.

AmplabJenkins · 2014-10-24T21:38:28Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22167/

JoshRosen · 2014-10-24T23:00:03Z

Due to a Hadoop thread-safety issue in Configuration's constructor, we need to hold a lock in any code that might call new Configuration() on the executor. We don't want to hold the lock for the entire task deserialization because that would prevent tasks from being launched in parallel. Instead, we could make our own wrapper that holds a Configuration and grabs CONFIGURATION_INSTANTIATION_LOCK in its readObject or readExternal method.
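
A minimal sketch of the wrapper idea described above, not the code from this PR. To stay runnable without Hadoop on the classpath, it uses a hypothetical FakeConfiguration as a stand-in for org.apache.hadoop.conf.Configuration; the names SerializableConf, ConfLock, and SerializableConfDemo are also assumptions made for illustration. The only point is that the lock is held while the configuration is constructed and populated, not for the whole task deserialization.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, DataInput, DataOutput}
import java.io.{ObjectInputStream, ObjectOutputStream}
import java.util.{HashMap => JHashMap}

// Hypothetical stand-in for org.apache.hadoop.conf.Configuration.
// It only mimics the Writable-style write/readFields pair.
class FakeConfiguration {
  private val props = new JHashMap[String, String]()

  def set(key: String, value: String): Unit = { props.put(key, value); () }
  def get(key: String): String = props.get(key)

  def write(out: DataOutput): Unit = {
    out.writeInt(props.size())
    val it = props.entrySet().iterator()
    while (it.hasNext) {
      val e = it.next()
      out.writeUTF(e.getKey)
      out.writeUTF(e.getValue)
    }
  }

  def readFields(in: DataInput): Unit = {
    props.clear()
    val n = in.readInt()
    for (_ <- 0 until n) {
      val key = in.readUTF()
      props.put(key, in.readUTF())
    }
  }
}

// Assumed location for the lock; the thread suggests SparkHadoopUtil.
object ConfLock {
  val CONFIGURATION_INSTANTIATION_LOCK = new Object
}

// The wrapper: the configuration is transient and written by hand, so that
// deserialization can construct it while holding the lock.
class SerializableConf(@transient var value: FakeConfiguration) extends Serializable {
  private def writeObject(out: ObjectOutputStream): Unit = {
    out.defaultWriteObject()
    value.write(out)
  }

  private def readObject(in: ObjectInputStream): Unit = {
    in.defaultReadObject()
    // Only construction and field loading are serialized across threads.
    ConfLock.CONFIGURATION_INSTANTIATION_LOCK.synchronized {
      value = new FakeConfiguration
      value.readFields(in)
    }
  }
}

object SerializableConfDemo {
  // Serializes and deserializes the wrapper with plain Java serialization.
  def roundTrip(conf: SerializableConf): SerializableConf = {
    val bytes = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(bytes)
    out.writeObject(conf)
    out.close()
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
    try in.readObject().asInstanceOf[SerializableConf] finally in.close()
  }
}
```

Round-tripping a wrapper through SerializableConfDemo.roundTrip yields a distinct FakeConfiguration instance carrying the same entries.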

For some context on this issue, see #2683 and https://issues.apache.org/jira/browse/SPARK-2585

JoshRosen · 2014-10-24T23:00:50Z

BTW, maybe you meant to link to https://issues.apache.org/jira/browse/SPARK-4083? I think this is a duplicate of https://issues.apache.org/jira/browse/SPARK-2585

JoshRosen · 2014-10-24T23:03:34Z

Also, adding our own synchronizing wrapper will let us roll back some of the complexity introduced by #2684 for ensuring thread-safety, since each task will get its own deserialized copy of the configuration.

I suppose that this could have a small performance penalty because we'll always construct a new Configuration (which might be expensive), but I think it should be pretty minimal (we can try measuring it) and is probably offset by other performance improvements in 1.2.

By the way, CONFIGURATION_INSTANTIATION_LOCK should probably be moved to SparkHadoopUtil so that it's accessible from more places that might create Configurations.

JoshRosen · 2014-10-25T01:33:20Z

Oh, one more comment: could you add a newConfiguration() method to SparkHadoopUtil that synchronizes on the lock and returns new Configuration()? This would help to simplify some of the code in the streaming WAL patch which needs to create new configurations.
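
A rough sketch of what such a helper could look like, under stated assumptions. HadoopUtilSketch stands in for SparkHadoopUtil, and TrackedConfiguration is a hypothetical replacement for Hadoop's Configuration whose constructor records how many constructions overlap, so the effect of the lock can be observed without Hadoop on the classpath. ConstructionTracker and NewConfigurationDemo exist only for the demonstration.

```scala
// Demo-only bookkeeping: counts how many constructors run at the same time.
object ConstructionTracker {
  private var inFlight = 0
  private var maxSeen = 0
  def enter(): Unit = synchronized { inFlight += 1; maxSeen = math.max(maxSeen, inFlight) }
  def exit(): Unit = synchronized { inFlight -= 1 }
  def maxConcurrent: Int = synchronized { maxSeen }
}

// Hypothetical stand-in for org.apache.hadoop.conf.Configuration.
// The short sleep widens the window in which unsynchronized calls would overlap.
class TrackedConfiguration {
  ConstructionTracker.enter()
  try Thread.sleep(1) finally ConstructionTracker.exit()
}

// Stand-in for SparkHadoopUtil (assumed home of the lock and the helper).
object HadoopUtilSketch {
  val CONFIGURATION_INSTANTIATION_LOCK = new Object

  // Every caller goes through the same lock, so constructors never overlap.
  def newConfiguration(): TrackedConfiguration =
    CONFIGURATION_INSTANTIATION_LOCK.synchronized { new TrackedConfiguration }
}

object NewConfigurationDemo {
  // Calls newConfiguration() from several threads and reports the highest
  // number of constructors observed running concurrently.
  def maxConcurrentAfter(threads: Int, callsPerThread: Int): Int = {
    val workers = (1 to threads).map { _ =>
      new Thread(new Runnable {
        override def run(): Unit =
          (1 to callsPerThread).foreach(_ => HadoopUtilSketch.newConfiguration())
      })
    }
    workers.foreach(_.start())
    workers.foreach(_.join())
    ConstructionTracker.maxConcurrent
  }
}
```

As long as every construction goes through newConfiguration(), NewConfigurationDemo.maxConcurrentAfter should never report more than one constructor running at a time. Code that calls the constructor directly bypasses the lock, which is why the helper is only useful if callers adopt it consistently.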

davies · 2014-10-25T01:59:28Z

But readFields() may itself call new Configuration(), so we still need to synchronize here.

JoshRosen · 2014-10-25T02:17:26Z

Yeah, we'll still need the synchronization there, but I was hoping to add newConfiguration() to address the use cases in Streaming, where new configurations are constructed.

SparkQA · 2014-10-25T05:09:50Z

QA tests have started for PR 2935 at commit bc46dda.

This patch does not merge cleanly.

davies · 2014-10-25T05:18:06Z

core/src/main/scala/org/apache/spark/SerializableWritable.scala

-    ow.readFields(in)
+    SparkHadoopUtil.CONFIGURATION_INSTANTIATION_LOCK.synchronized {
+      ow.setConf(SparkHadoopUtil.newConfiguration())
+      ow.readFields(in)  // not thread safe


SparkQA · 2014-10-25T05:27:23Z

Test build #22194 has started for PR 2935 at commit 8694cb3.

This patch merges cleanly.

davies · 2014-10-25T05:27:59Z

@JoshRosen this PR is ready to review, thanks!

SparkQA · 2014-10-25T05:30:08Z

QA tests have started for PR 2935 at commit bc46dda.

This patch does not merge cleanly.

SparkQA · 2014-10-25T05:34:54Z

Test build #22195 has started for PR 2935 at commit 32bd815.

This patch merges cleanly.

AmplabJenkins · 2014-10-25T05:37:19Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22193/

SparkQA · 2014-10-25T06:22:05Z

QA tests have finished for PR 2935 at commit bc46dda.

This patch fails Spark unit tests.
This patch does not merge cleanly.
This patch adds no public classes.

AmplabJenkins · 2014-10-25T06:22:09Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22192/

SparkQA · 2014-10-25T06:37:03Z

QA tests have finished for PR 2935 at commit bc46dda.

This patch fails Spark unit tests.
This patch does not merge cleanly.
This patch adds no public classes.

SparkQA · 2014-10-25T06:37:38Z

Test build #22201 has started for PR 2935 at commit 1fd70df.

This patch merges cleanly.

SparkQA · 2014-10-25T08:37:39Z

Test build #22201 timed out for PR 2935 at commit 1fd70df after a configured wait of 120m.

AmplabJenkins · 2014-10-25T08:37:41Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22201/

SparkQA · 2014-10-25T08:39:30Z

Test build #432 timed out for PR 2935 at commit 32bd815 after a configured wait of 120m.

SparkQA · 2014-10-25T18:05:42Z

Test build #448 has started for PR 2935 at commit 1fd70df.

This patch merges cleanly.

SparkQA · 2014-10-25T18:07:52Z

Test build #449 has started for PR 2935 at commit 1fd70df.

This patch merges cleanly.

SparkQA · 2014-10-25T20:05:43Z

Test build #448 timed out for PR 2935 at commit 1fd70df after a configured wait of 120m.

SparkQA · 2014-10-25T20:07:52Z

Test build #449 timed out for PR 2935 at commit 1fd70df after a configured wait of 120m.

davies · 2014-10-25T20:52:57Z

Jenkins, test this please.

SparkQA · 2014-10-25T20:59:44Z

Test build #22221 has started for PR 2935 at commit 1fd70df.

This patch merges cleanly.

SparkQA · 2014-10-25T21:24:44Z

Test build #22222 has started for PR 2935 at commit ae32e92.

This patch merges cleanly.

SparkQA · 2014-10-25T22:59:44Z

Test build #22221 timed out for PR 2935 at commit 1fd70df after a configured wait of 120m.

AmplabJenkins · 2014-10-25T22:59:48Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22221/

SparkQA · 2014-10-25T23:20:05Z

Test build #22222 has finished for PR 2935 at commit ae32e92.

This patch fails Spark unit tests.
This patch merges cleanly.
This patch adds no public classes.

AmplabJenkins · 2014-10-25T23:20:08Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22222/

SparkQA · 2014-10-26T08:12:12Z

Test build #452 has started for PR 2935 at commit ae32e92.

This patch merges cleanly.

SparkQA · 2014-10-26T08:20:03Z

Test build #22240 has started for PR 2935 at commit 63f2972.

This patch merges cleanly.

SparkQA · 2014-10-26T10:12:12Z

Test build #452 timed out for PR 2935 at commit ae32e92 after a configured wait of 120m.

SparkQA · 2014-10-26T10:20:03Z

Test build #22240 timed out for PR 2935 at commit 63f2972 after a configured wait of 120m.

AmplabJenkins · 2014-10-26T10:20:07Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22240/

SparkQA · 2014-10-28T07:19:28Z

Test build #482 has started for PR 2935 at commit 63f2972.

This patch merges cleanly.

SparkQA · 2014-10-28T07:19:54Z

Test build #22347 has started for PR 2935 at commit 5329091.

This patch merges cleanly.

SparkQA · 2014-10-28T09:19:28Z

Test build #482 timed out for PR 2935 at commit 63f2972 after a configured wait of 120m.

SparkQA · 2014-10-28T09:19:55Z

Test build #22347 timed out for PR 2935 at commit 5329091 after a configured wait of 120m.

AmplabJenkins · 2014-10-28T09:19:58Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22347/

davies · 2014-10-28T21:47:51Z

The Jenkins test timeouts are caused by a performance regression introduced in this PR. Deserializing a Hadoop Configuration is pretty slow: after removing the broadcast for the configuration, each task may incur 10-50 ms of overhead for it. For Hive queries, each task may have more than one configuration per query, so the regression is even larger.

So we cannot remove the broadcast for the conf unless we can speed up deserialization of the Hadoop Configuration.

JoshRosen · 2014-10-31T22:05:51Z

If we can't find a workaround to speed up the deserialization, do you mind closing this PR for now?

remove unnecessary broadcast for conf

b4cd73e

davies changed the title ~~[SPARK-4082] remove unnecessary broadcast for conf~~ [SPARK-2585] remove unnecessary broadcast for conf Oct 24, 2014

JoshRosen mentioned this pull request Oct 25, 2014

[SPARK-4027][Streaming] WriteAheadLogBackedBlockRDD to read received either from BlockManager or WAL in HDFS #2931

Closed

thread safety

bc46dda

bugfix

74f4102

davies reviewed Oct 25, 2014

Davies Liu added 4 commits October 24, 2014 22:21

refactor

8b0fcd8

refactor

0de73d4

remove docs

8694cb3

Merge branch 'master' of github.com:apache/spark into fix_conf

32bd815

bugfix

1fd70df

Merge branch 'master' of github.com:apache/spark into fix_conf

ae32e92

Merge branch 'master' of github.com:apache/spark into fix_conf

63f2972

Merge branch 'master' of github.com:apache/spark into fix_conf

5329091

Conflicts: core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala

davies closed this Oct 31, 2014
