Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-8059] [yarn] Wake up allocation thread when new requests arrive. #6600

Closed
wants to merge 1 commit into from

Conversation

vanzin
Copy link
Contributor

@vanzin vanzin commented Jun 3, 2015

This should help reduce latency for new executor allocations.

This should help reduce latency for new executor allocations.
@SparkQA
Copy link

SparkQA commented Jun 3, 2015

Test build #34042 has finished for PR 6600 at commit 8387a3a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen
Copy link
Member

srowen commented Jun 3, 2015

LGTM. I was trying to think if there was anything in java.util.concurrent that makes this simpler but couldn't find anything, and this is pretty straightforward as is.

@@ -359,7 +360,9 @@ private[spark] class ApplicationMaster(
}
logDebug(s"Number of pending allocations is $numPendingAllocate. " +
s"Sleeping for $sleepInterval.")
Thread.sleep(sleepInterval)
allocatorLock.synchronized {
allocatorLock.wait(sleepInterval)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case it might be ok, but a wait call should not be called from outside of a while: http://stackoverflow.com/questions/1038007/why-should-wait-always-be-called-inside-a-loop
(in this case too, we'd still want to protect against spurious wake ups - so adding a loop is good)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think it's not strictly necessary but good practice. Though here we're technically in a loop, just a bigger one. Since we intend to allocate stuff periodically the worst thing that can possibly happen is some additional latency, and this only happens if there is a 3rd thread somewhere (does not exist).

In other words the spurious wake ups are benign and not possible given the changes here. I'm fine with this being merged as is.

@srowen
Copy link
Member

srowen commented Jun 3, 2015

Yeah spurious wakeup doesn't matter here. It would just check again early.

*/
def requestTotalExecutors(requestedTotal: Int): Unit = synchronized {
def requestTotalExecutors(requestedTotal: Int): Boolean = synchronized {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only slightly scary thing is that we have another method with the exact same signature, but the return value means something different. Since we document this clearly it should be fine.

@andrewor14
Copy link
Contributor

LGTM. I'll let this sit for another day or so in case others have anything else to add.

@harishreedharan
Copy link
Contributor

Yep, in this case it is likely fine, but is good practice to have the loop
inside the synchronized (the variable that is checked by the loop should be
updated inside the synchronized). Many static analysis tools will flag this
though (not sure about Google ErrorProne though).

On Wednesday, June 3, 2015, andrewor14 [email protected] wrote:

LGTM. I'll let this sit for another day or so in case others have anything
else to add.


Reply to this email directly or view it on GitHub
#6600 (comment).

Thanks,
Hari

@vanzin
Copy link
Contributor Author

vanzin commented Jun 3, 2015

In case you guys have not noticed, there's no variable to check, so you can't loop on anything. This is just a plain "wake up".

Yes, you could add one, but that would just complicate the code (e.g. figuring out how much to sleep after a spurious wake up).

@srowen
Copy link
Member

srowen commented Jun 3, 2015

It's good practice when you are wait()-ing on some condition to be true, but that is not the case here. It may be flagged but it is not an instance of the actual problem these tools would be trying to detect.

@harishreedharan
Copy link
Contributor

Not suggesting that we change it here - but it looks like ErrorProne does have a check which we might have to disable for this file - http://errorprone.info/bugpattern/WaitNotInLoop

@andrewor14
Copy link
Contributor

Merging into master. @vanzin Is this worth back porting into 1.4? I didn't because the original patch that reduces heartbeat latency is not in there.

@asfgit asfgit closed this in aa40c44 Jun 3, 2015
@vanzin vanzin deleted the SPARK-8059 branch June 4, 2015 17:59
jeanlyn pushed a commit to jeanlyn/spark that referenced this pull request Jun 12, 2015
This should help reduce latency for new executor allocations.

Author: Marcelo Vanzin <[email protected]>

Closes apache#6600 from vanzin/SPARK-8059 and squashes the following commits:

8387a3a [Marcelo Vanzin] [SPARK-8059] [yarn] Wake up allocation thread when new requests arrive.
nemccarthy pushed a commit to nemccarthy/spark that referenced this pull request Jun 19, 2015
This should help reduce latency for new executor allocations.

Author: Marcelo Vanzin <[email protected]>

Closes apache#6600 from vanzin/SPARK-8059 and squashes the following commits:

8387a3a [Marcelo Vanzin] [SPARK-8059] [yarn] Wake up allocation thread when new requests arrive.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants