update spark.default.parallelism #389
Conversation
Actually, the value 8 is only valid in Mesos fine-grained mode:

override def defaultParallelism() = sc.conf.getInt("spark.default.parallelism", 8)

while in coarse-grained mode, including Mesos coarse-grained, the value of the property depends on the number of cores:

override def defaultParallelism(): Int = {
  conf.getInt("spark.default.parallelism", math.max(totalCoreCount.get(), 2))
}
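For illustration, a minimal sketch of pinning the property explicitly so the default is the same under every scheduler backend (the app name and the value 16 are hypothetical):

import org.apache.spark.{SparkConf, SparkContext}

// Setting spark.default.parallelism explicitly overrides the
// scheduler-specific defaults described above.
val conf = new SparkConf()
  .setAppName("parallelism-demo")
  .set("spark.default.parallelism", "16")
val sc = new SparkContext(conf)

// Prints 16 regardless of fine-grained or coarse-grained mode.
println(sc.defaultParallelism)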
Can one of the admins verify this patch?
Jenkins, test this please
Good catch
Merged build triggered.
Merged build started.
Merged build finished.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14084/
Jenkins, retest this please.
Merged build triggered.
Merged build started.
Merged build finished.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14086/
<ul>
  <li>Mesos fine grained mode: 8
  <li>Local mode: core number of the local machine
  <li>Others: total core number of all executor nodes or 2, whichever is larger
Actually, to have valid HTML, add </li> at the end of these.
Oh yeah, missed </li>. Fixed it.
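For reference, the fixed list (per commit 84a7fe4 below, which adds the missing closing tags) reads:

<ul>
  <li>Mesos fine grained mode: 8</li>
  <li>Local mode: core number of the local machine</li>
  <li>Others: total core number of all executor nodes or 2, whichever is larger</li>
</ul>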
"By default, this uses only 8 parallel tasks to do the grouping." is a big misleading. Please refer to apache#389 detail is as following code : <code> def defaultPartitioner(rdd: RDD[_], others: RDD[_]*): Partitioner = { val bySize = (Seq(rdd) ++ others).sortBy(_.partitions.size).reverse for (r <- bySize if r.partitioner.isDefined) { return r.partitioner.get } if (rdd.context.conf.contains("spark.default.parallelism")) { new HashPartitioner(rdd.context.defaultParallelism) } else { new HashPartitioner(bySize.head.partitions.size) } } </code>
Jenkins, retest this please.
Merged build triggered.
Merged build started.
Merged build finished. All automated tests passed.
All automated tests passed.
Jenkins test result is OK, but Travis fails. So... what's going on?
Built this locally and it looked good, so I'm merging it. Don't worry about Travis - it's currently experimental.
Actually, the value 8 is only valid in Mesos fine-grained mode:

override def defaultParallelism() = sc.conf.getInt("spark.default.parallelism", 8)

while in coarse-grained mode, including Mesos coarse-grained, the value of the property depends on the number of cores:

override def defaultParallelism(): Int = {
  conf.getInt("spark.default.parallelism", math.max(totalCoreCount.get(), 2))
}

Author: Chen Chao <[email protected]>

Closes #389 from CrazyJvm/patch-2 and squashes the following commits:
84a7fe4 [Chen Chao] miss </li> at the end of every single line
04a9796 [Chen Chao] change format
ee0fae0 [Chen Chao] update spark.default.parallelism

(cherry picked from commit 9edd887)
Signed-off-by: Patrick Wendell <[email protected]>
"By default, this uses only 8 parallel tasks to do the grouping." is a big misleading. Please refer to #389 detail is as following code : def defaultPartitioner(rdd: RDD[_], others: RDD[_]*): Partitioner = { val bySize = (Seq(rdd) ++ others).sortBy(_.partitions.size).reverse for (r <- bySize if r.partitioner.isDefined) { return r.partitioner.get } if (rdd.context.conf.contains("spark.default.parallelism")) { new HashPartitioner(rdd.context.defaultParallelism) } else { new HashPartitioner(bySize.head.partitions.size) } } Author: Chen Chao <[email protected]> Closes #403 from CrazyJvm/patch-4 and squashes the following commits: 42f6c9e [Chen Chao] fix format 829a995 [Chen Chao] fix format 1568336 [Chen Chao] misleading task number of groupByKey
"By default, this uses only 8 parallel tasks to do the grouping." is a big misleading. Please refer to #389 detail is as following code : def defaultPartitioner(rdd: RDD[_], others: RDD[_]*): Partitioner = { val bySize = (Seq(rdd) ++ others).sortBy(_.partitions.size).reverse for (r <- bySize if r.partitioner.isDefined) { return r.partitioner.get } if (rdd.context.conf.contains("spark.default.parallelism")) { new HashPartitioner(rdd.context.defaultParallelism) } else { new HashPartitioner(bySize.head.partitions.size) } } Author: Chen Chao <[email protected]> Closes #403 from CrazyJvm/patch-4 and squashes the following commits: 42f6c9e [Chen Chao] fix format 829a995 [Chen Chao] fix format 1568336 [Chen Chao] misleading task number of groupByKey (cherry picked from commit 9c40b9e) Signed-off-by: Reynold Xin <[email protected]>
private[streaming] def defaultPartitioner(numPartitions: Int = self.ssc.sc.defaultParallelism) = {
  new HashPartitioner(numPartitions)
}

This shows that the default task number in Spark Streaming relies on the variable defaultParallelism in SparkContext, which is decided by the config property spark.default.parallelism; the property "spark.default.parallelism" refers to apache#389.
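As a sketch of the alternative, passing numPartitions explicitly bypasses defaultPartitioner, and therefore spark.default.parallelism, for that operation (the app name, host, port, and the value 16 are hypothetical; assumes the standard pair-DStream implicits in scope):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Hypothetical configuration; any SparkConf works here.
val conf = new SparkConf().setAppName("streaming-partitions-demo").setMaster("local[2]")
val ssc = new StreamingContext(conf, Seconds(1))

// reduceByKey(func, numPartitions) skips defaultPartitioner entirely,
// so this shuffle always uses 16 partitions.
val counts = ssc.socketTextStream("localhost", 9999)
  .flatMap(_.split(" "))
  .map((_, 1))
  .reduceByKey(_ + _, 16)
counts.print()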
private[streaming] def defaultPartitioner(numPartitions: Int = self.ssc.sc.defaultParallelism) = {
  new HashPartitioner(numPartitions)
}

This shows that the default task number in Spark Streaming relies on the variable defaultParallelism in SparkContext, which is decided by the config property spark.default.parallelism; the property "spark.default.parallelism" refers to #389.

Author: Chen Chao <[email protected]>

Closes #766 from CrazyJvm/patch-7 and squashes the following commits:
0b7efba [Chen Chao] Update streaming-programming-guide.md
cc5b66c [Chen Chao] default task number misleading in several places

(cherry picked from commit 2f63995)
Signed-off-by: Reynold Xin <[email protected]>