Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

update spark.default.parallelism #389

Closed
wants to merge 3 commits into from
Closed

Conversation

CrazyJvm
Copy link
Contributor

actually, the value 8 is only valid in mesos fine-grained mode :

override def defaultParallelism() = sc.conf.getInt("spark.default.parallelism", 8)

while in coarse-grained model including mesos coares-grained, the value of the property depending on core numbers!

override def defaultParallelism(): Int = {
conf.getInt("spark.default.parallelism", math.max(totalCoreCount.get(), 2))
}

actually, the value 8 is only valid in mesos fine-grained mode :
<code>
  override def defaultParallelism() = sc.conf.getInt("spark.default.parallelism", 8)
</code>

while in coarse-grained model including mesos coares-grained, the value of the property depending on core numbers!
<code> 
override def defaultParallelism(): Int = {
    conf.getInt("spark.default.parallelism", math.max(totalCoreCount.get(), 2))
  }
</code>
@AmplabJenkins
Copy link

Can one of the admins verify this patch?

@mateiz
Copy link
Contributor

mateiz commented Apr 13, 2014

Jenkins, test this please

@mateiz
Copy link
Contributor

mateiz commented Apr 13, 2014

Good catch

@AmplabJenkins
Copy link

Merged build triggered.

@AmplabJenkins
Copy link

Merged build started.

@AmplabJenkins
Copy link

Merged build finished.

@AmplabJenkins
Copy link

Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14084/

@pwendell
Copy link
Contributor

Jenkins, retest this please.

@AmplabJenkins
Copy link

Merged build triggered.

@AmplabJenkins
Copy link

Merged build started.

@AmplabJenkins
Copy link

Merged build finished.

@AmplabJenkins
Copy link

Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14086/

<ul>
<li>Mesos fine grained mode: 8
<li>Local mode: core number of the local machine
<li>Others: total core number of all executor nodes or 2, whichever is larger
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually to have valid HTML, add </li> at the end of these.

oh yeah, missed </li>. fixed it.
CrazyJvm added a commit to CrazyJvm/spark that referenced this pull request Apr 14, 2014
"By default, this uses only 8 parallel tasks to do the grouping." is a big misleading. Please refer to apache#389 

detail is as following code :
<code>
  def defaultPartitioner(rdd: RDD[_], others: RDD[_]*): Partitioner = {
    val bySize = (Seq(rdd) ++ others).sortBy(_.partitions.size).reverse
    for (r <- bySize if r.partitioner.isDefined) {
      return r.partitioner.get
    }
    if (rdd.context.conf.contains("spark.default.parallelism")) {
      new HashPartitioner(rdd.context.defaultParallelism)
    } else {
      new HashPartitioner(bySize.head.partitions.size)
    }
  }
</code>
@rxin
Copy link
Contributor

rxin commented Apr 15, 2014

Jenkins, retest this please.

@AmplabJenkins
Copy link

Merged build triggered.

@AmplabJenkins
Copy link

Merged build started.

@AmplabJenkins
Copy link

Merged build finished. All automated tests passed.

@AmplabJenkins
Copy link

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14136/

@CrazyJvm
Copy link
Contributor Author

Jenkins test result is OK, but Travis fails. So...what's going on?

@pwendell
Copy link
Contributor

Built this locally and it looked gogd, so I'm merging it. Don't worry about Travis - it's currently experimental.

@asfgit asfgit closed this in 9edd887 Apr 16, 2014
asfgit pushed a commit that referenced this pull request Apr 16, 2014
actually, the value 8 is only valid in mesos fine-grained mode :
<code>
  override def defaultParallelism() = sc.conf.getInt("spark.default.parallelism", 8)
</code>

while in coarse-grained model including mesos coares-grained, the value of the property depending on core numbers!
<code>
override def defaultParallelism(): Int = {
   conf.getInt("spark.default.parallelism", math.max(totalCoreCount.get(), 2))
  }
</code>

Author: Chen Chao <[email protected]>

Closes #389 from CrazyJvm/patch-2 and squashes the following commits:

84a7fe4 [Chen Chao] miss </li> at the end of every single line
04a9796 [Chen Chao] change format
ee0fae0 [Chen Chao] update spark.default.parallelism
(cherry picked from commit 9edd887)

Signed-off-by: Patrick Wendell <[email protected]>
asfgit pushed a commit that referenced this pull request Apr 17, 2014
"By default, this uses only 8 parallel tasks to do the grouping." is a big misleading. Please refer to #389

detail is as following code :

  def defaultPartitioner(rdd: RDD[_], others: RDD[_]*): Partitioner = {
    val bySize = (Seq(rdd) ++ others).sortBy(_.partitions.size).reverse
    for (r <- bySize if r.partitioner.isDefined) {
      return r.partitioner.get
    }
    if (rdd.context.conf.contains("spark.default.parallelism")) {
      new HashPartitioner(rdd.context.defaultParallelism)
    } else {
      new HashPartitioner(bySize.head.partitions.size)
    }
  }

Author: Chen Chao <[email protected]>

Closes #403 from CrazyJvm/patch-4 and squashes the following commits:

42f6c9e [Chen Chao] fix format
829a995 [Chen Chao] fix format
1568336 [Chen Chao] misleading task number of groupByKey
asfgit pushed a commit that referenced this pull request Apr 17, 2014
"By default, this uses only 8 parallel tasks to do the grouping." is a big misleading. Please refer to #389

detail is as following code :

  def defaultPartitioner(rdd: RDD[_], others: RDD[_]*): Partitioner = {
    val bySize = (Seq(rdd) ++ others).sortBy(_.partitions.size).reverse
    for (r <- bySize if r.partitioner.isDefined) {
      return r.partitioner.get
    }
    if (rdd.context.conf.contains("spark.default.parallelism")) {
      new HashPartitioner(rdd.context.defaultParallelism)
    } else {
      new HashPartitioner(bySize.head.partitions.size)
    }
  }

Author: Chen Chao <[email protected]>

Closes #403 from CrazyJvm/patch-4 and squashes the following commits:

42f6c9e [Chen Chao] fix format
829a995 [Chen Chao] fix format
1568336 [Chen Chao] misleading task number of groupByKey

(cherry picked from commit 9c40b9e)
Signed-off-by: Reynold Xin <[email protected]>
CrazyJvm added a commit to CrazyJvm/spark that referenced this pull request May 14, 2014
<code>
  private[streaming] def defaultPartitioner(numPartitions: Int = self.ssc.sc.defaultParallelism){
    new HashPartitioner(numPartitions)
  }
</code>

it represents that the default task number in Spark Streaming relies on the variable defaultParallelism in SparkContext, which is decided by the config property spark.default.parallelism

the property "spark.default.parallelism" refers to apache#389
asfgit pushed a commit that referenced this pull request May 15, 2014
  private[streaming] def defaultPartitioner(numPartitions: Int = self.ssc.sc.defaultParallelism){
    new HashPartitioner(numPartitions)
  }

it represents that the default task number in Spark Streaming relies on the variable defaultParallelism in SparkContext, which is decided by the config property spark.default.parallelism

the property "spark.default.parallelism" refers to #389

Author: Chen Chao <[email protected]>

Closes #766 from CrazyJvm/patch-7 and squashes the following commits:

0b7efba [Chen Chao] Update streaming-programming-guide.md
cc5b66c [Chen Chao] default task number misleading in several places

(cherry picked from commit 2f63995)
Signed-off-by: Reynold Xin <[email protected]>
asfgit pushed a commit that referenced this pull request May 15, 2014
  private[streaming] def defaultPartitioner(numPartitions: Int = self.ssc.sc.defaultParallelism){
    new HashPartitioner(numPartitions)
  }

it represents that the default task number in Spark Streaming relies on the variable defaultParallelism in SparkContext, which is decided by the config property spark.default.parallelism

the property "spark.default.parallelism" refers to #389

Author: Chen Chao <[email protected]>

Closes #766 from CrazyJvm/patch-7 and squashes the following commits:

0b7efba [Chen Chao] Update streaming-programming-guide.md
cc5b66c [Chen Chao] default task number misleading in several places
pdeyhim pushed a commit to pdeyhim/spark-1 that referenced this pull request Jun 25, 2014
actually, the value 8 is only valid in mesos fine-grained mode :
<code>
  override def defaultParallelism() = sc.conf.getInt("spark.default.parallelism", 8)
</code>

while in coarse-grained model including mesos coares-grained, the value of the property depending on core numbers!
<code>
override def defaultParallelism(): Int = {
   conf.getInt("spark.default.parallelism", math.max(totalCoreCount.get(), 2))
  }
</code>

Author: Chen Chao <[email protected]>

Closes apache#389 from CrazyJvm/patch-2 and squashes the following commits:

84a7fe4 [Chen Chao] miss </li> at the end of every single line
04a9796 [Chen Chao] change format
ee0fae0 [Chen Chao] update spark.default.parallelism
pdeyhim pushed a commit to pdeyhim/spark-1 that referenced this pull request Jun 25, 2014
"By default, this uses only 8 parallel tasks to do the grouping." is a big misleading. Please refer to apache#389

detail is as following code :

  def defaultPartitioner(rdd: RDD[_], others: RDD[_]*): Partitioner = {
    val bySize = (Seq(rdd) ++ others).sortBy(_.partitions.size).reverse
    for (r <- bySize if r.partitioner.isDefined) {
      return r.partitioner.get
    }
    if (rdd.context.conf.contains("spark.default.parallelism")) {
      new HashPartitioner(rdd.context.defaultParallelism)
    } else {
      new HashPartitioner(bySize.head.partitions.size)
    }
  }

Author: Chen Chao <[email protected]>

Closes apache#403 from CrazyJvm/patch-4 and squashes the following commits:

42f6c9e [Chen Chao] fix format
829a995 [Chen Chao] fix format
1568336 [Chen Chao] misleading task number of groupByKey
pdeyhim pushed a commit to pdeyhim/spark-1 that referenced this pull request Jun 25, 2014
  private[streaming] def defaultPartitioner(numPartitions: Int = self.ssc.sc.defaultParallelism){
    new HashPartitioner(numPartitions)
  }

it represents that the default task number in Spark Streaming relies on the variable defaultParallelism in SparkContext, which is decided by the config property spark.default.parallelism

the property "spark.default.parallelism" refers to apache#389

Author: Chen Chao <[email protected]>

Closes apache#766 from CrazyJvm/patch-7 and squashes the following commits:

0b7efba [Chen Chao] Update streaming-programming-guide.md
cc5b66c [Chen Chao] default task number misleading in several places
tangzhankun pushed a commit to tangzhankun/spark that referenced this pull request Aug 9, 2017
mccheah pushed a commit to mccheah/spark that referenced this pull request Nov 28, 2018
…dle (apache#389)

[SPARK-24767] Propagate MDC to spark-submit thread in InProcessAppHandle (apache#389)
[SPARK-24813][BUILD][FOLLOW-UP][HOTFIX] HiveExternalCatalogVersionsSuite still flaky; fall back to Apache archive
bzhaoopenstack pushed a commit to bzhaoopenstack/spark that referenced this pull request Sep 11, 2019
Apply AS enabled flavor in FusionCloud job
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants