Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-20483] Mesos Coarse mode may starve other Mesos frameworks #17786

Closed
wants to merge 1 commit into from

Conversation

dgshep
Copy link

@dgshep dgshep commented Apr 27, 2017

What changes were proposed in this pull request?

Set maxCores to be a multiple of the smallest executor that can be launched. This ensures that we correctly detect the condition where no more executors will be launched when spark.cores.max is not a multiple of spark.executor.cores

How was this patch tested?

This was manually tested with other sample frameworks measuring their incoming offers to determine if starvation would occur.

@dbtsai @mgummelt

Set max cores to a multiple of the smallestExecutorSize
@dbtsai
Copy link
Member

dbtsai commented Apr 27, 2017

Jenkins, add to whitelist.

@dbtsai
Copy link
Member

dbtsai commented Apr 27, 2017

Jenkins, ok to test.

@SparkQA
Copy link

SparkQA commented Apr 27, 2017

Test build #76234 has finished for PR 17786 at commit a808632.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dbtsai
Copy link
Member

dbtsai commented Apr 27, 2017

LGTM. Merged into master and branch 2.2. Thanks.

asfgit pushed a commit that referenced this pull request Apr 27, 2017
## What changes were proposed in this pull request?

Set maxCores to be a multiple of the smallest executor that can be launched. This ensures that we correctly detect the condition where no more executors will be launched when spark.cores.max is not a multiple of spark.executor.cores

## How was this patch tested?

This was manually tested with other sample frameworks measuring their incoming offers to determine if starvation would occur.

dbtsai mgummelt

Author: Davis Shepherd <[email protected]>

Closes #17786 from dgshep/fix_mesos_max_cores.

(cherry picked from commit 7633933)
Signed-off-by: DB Tsai <[email protected]>
@asfgit asfgit closed this in 7633933 Apr 27, 2017
@dgshep dgshep deleted the fix_mesos_max_cores branch April 27, 2017 18:26
asfgit pushed a commit that referenced this pull request Apr 27, 2017
…s frameworks

## What changes were proposed in this pull request?

Add test case for scenarios where executor.cores is set as a
(non)divisor of spark.cores.max
This tests the change in
#17786

## How was this patch tested?

Ran the existing test suite with the new tests

dbtsai

Author: Davis Shepherd <[email protected]>

Closes #17788 from dgshep/add_mesos_test.

(cherry picked from commit 039e32c)
Signed-off-by: DB Tsai <[email protected]>
ghost pushed a commit to dbtsai/spark that referenced this pull request Apr 27, 2017
…s frameworks

## What changes were proposed in this pull request?

Add test case for scenarios where executor.cores is set as a
(non)divisor of spark.cores.max
This tests the change in
apache#17786

## How was this patch tested?

Ran the existing test suite with the new tests

dbtsai

Author: Davis Shepherd <[email protected]>

Closes apache#17788 from dgshep/add_mesos_test.
@dbtsai
Copy link
Member

dbtsai commented Apr 27, 2017

Tests are added in a followup PR. #17788

private val maxCores = maxCoresOption.getOrElse(Int.MaxValue)
private val maxCores = {
val cores = maxCoresOption.getOrElse(Int.MaxValue)
// Set maxCores to a multiple of smallest executor we can launch
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we have a log here to let users know?

@mgummelt
Copy link
Contributor

@dbtsai Can we please wait to get a LGTM from one of the active Mesos contributers (@skonto, @tnachen, myself, etc.) before merging Mesos code?

I would have rather this be solved in the offer rejection code rather than changing what maxCores means.

@mridulm
Copy link
Contributor

mridulm commented Apr 27, 2017

I agree with @mgummelt, maxCores should have reflected user config option.

@dgshep
Copy link
Author

dgshep commented Apr 28, 2017

Fair point. This felt like a succinct way to handle this corner case, but if it makes sense to harden the offer refusal code instead, I can update.

@dbtsai
Copy link
Member

dbtsai commented Apr 28, 2017

@mgummelt We tested this in our production env, and it solves our issue. Since it seems to be a trivial change, I made my judgement. Gonna wait for more feedback. Thanks.

jzhuge pushed a commit to jzhuge/spark that referenced this pull request Aug 20, 2018
Set maxCores to be a multiple of the smallest executor that can be launched. This ensures that we correctly detect the condition where no more executors will be launched when spark.cores.max is not a multiple of spark.executor.cores

This was manually tested with other sample frameworks measuring their incoming offers to determine if starvation would occur.

dbtsai mgummelt

Author: Davis Shepherd <[email protected]>

Closes apache#17786 from dgshep/fix_mesos_max_cores.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants