[SPARK-8059] [yarn] Wake up allocation thread when new requests arrive. #6600

vanzin · 2015-06-03T00:56:27Z

This should help reduce latency for new executor allocations.

SparkQA · 2015-06-03T02:48:44Z

Test build #34042 has finished for PR 6600 at commit 8387a3a.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

srowen · 2015-06-03T12:59:03Z

LGTM. I was trying to think if there was anything in java.util.concurrent that makes this simpler but couldn't find anything, and this is pretty straightforward as is.

harishreedharan · 2015-06-03T14:52:26Z

yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala

@@ -359,7 +360,9 @@ private[spark] class ApplicationMaster(
              }
            logDebug(s"Number of pending allocations is $numPendingAllocate. " +
                     s"Sleeping for $sleepInterval.")
-            Thread.sleep(sleepInterval)
+            allocatorLock.synchronized {
+              allocatorLock.wait(sleepInterval)


In this case it might be OK, but wait() should not be called outside of a while loop: http://stackoverflow.com/questions/1038007/why-should-wait-always-be-called-inside-a-loop
(Here too, we'd still want to protect against spurious wake-ups, so adding a loop is good.)

Yeah, I think it's not strictly necessary but good practice. Though here we're technically in a loop, just a bigger one. Since we intend to allocate periodically anyway, the worst thing that can happen is some additional latency, and that only happens if there is a third thread somewhere (there isn't one).

In other words, spurious wake-ups are benign and not possible given the changes here. I'm fine with this being merged as is.

srowen · 2015-06-03T15:01:05Z

Yeah spurious wakeup doesn't matter here. It would just check again early.
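
For readers following the thread, here is a minimal sketch of the timed-wait pattern under discussion. It is not the patch itself: `WakeUpSketch`, `runRounds`, `wakeUp`, and `completedRounds` are hypothetical names, and the counter stands in for the real allocation work. It only illustrates why an early return from `wait` is harmless when the loop has no condition to re-check.

```scala
object WakeUpSketch {
  // Hypothetical lock object; it plays the role of allocatorLock in the diff above.
  private val lock = new Object

  // Stand-in for the periodic allocation work (assumption, not Spark code).
  @volatile private var rounds = 0

  /** Runs `n` passes: do the work, then sleep until the timeout or a wake-up. */
  def runRounds(n: Int, sleepIntervalMs: Long): Unit = {
    for (_ <- 1 to n) {
      rounds += 1
      lock.synchronized {
        // Returns on notify, on timeout, or on a spurious wake-up.
        // In all three cases the loop simply runs its next pass.
        lock.wait(sleepIntervalMs)
      }
    }
  }

  /** Called by another thread to cut the current sleep short. */
  def wakeUp(): Unit = lock.synchronized { lock.notifyAll() }

  def completedRounds: Int = rounds
}
```

One property of this sketch worth noting: a `wakeUp()` that arrives while the loop is not inside `wait` is simply lost, so the next sleep runs for its full interval.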

andrewor14 · 2015-06-03T17:59:18Z

yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala

   */
-  def requestTotalExecutors(requestedTotal: Int): Unit = synchronized {
+  def requestTotalExecutors(requestedTotal: Int): Boolean = synchronized {


The only slightly scary thing is that we have another method with the exact same signature, but the return value means something different. Since we document this clearly it should be fine.
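
As an illustration of how a `Boolean` return value could drive the wake-up, here is a small sketch. The thread only shows the signature change, so the meaning used here (true when the requested total differs from the previous one) is an assumption, and `TargetTracker`, `Waker`, and `handleRequest` are hypothetical names rather than Spark APIs.

```scala
object RequestSketch {
  /** Tracks a requested total; hypothetical stand-in for the allocator. */
  final class TargetTracker {
    private var target = 0

    // Assumed semantics: returns true only when the target actually changed.
    def requestTotal(requested: Int): Boolean = synchronized {
      if (requested != target) {
        target = requested
        true
      } else {
        false
      }
    }

    def current: Int = synchronized { target }
  }

  /** Wraps the lock a sleeping thread would wait on and counts wake-ups. */
  final class Waker {
    private val lock = new Object
    private var wakeUps = 0

    def wake(): Unit = lock.synchronized {
      wakeUps += 1
      lock.notifyAll()
    }

    def count: Int = lock.synchronized { wakeUps }
  }

  /** Wakes the sleeping thread only when the request changed something. */
  def handleRequest(tracker: TargetTracker, waker: Waker, requested: Int): Unit =
    if (tracker.requestTotal(requested)) waker.wake()
}
```

With this shape, repeating the same request does not trigger a wake-up, which keeps the sleeping thread from being disturbed when there is nothing new to allocate.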

andrewor14 · 2015-06-03T18:00:11Z

LGTM. I'll let this sit for another day or so in case others have anything else to add.

harishreedharan · 2015-06-03T18:45:07Z

Yep, in this case it is likely fine, but it is good practice to have the loop
inside the synchronized block (the variable checked by the loop should be
updated inside the synchronized block). Many static analysis tools will flag
this, though (not sure about Google ErrorProne).


Thanks,
Hari

vanzin · 2015-06-03T18:47:05Z

In case you guys have not noticed, there's no variable to check, so you can't loop on anything. This is just a plain "wake up".

Yes, you could add one, but that would just complicate the code (e.g. figuring out how much to sleep after a spurious wake up).
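
To make the trade-off concrete, here is a sketch of what the looped variant might look like. It is not proposed for the patch; `GuardedWaitSketch`, `pending`, `signal`, and `awaitSignal` are hypothetical names. The extra bookkeeping is the deadline arithmetic needed to work out how long to sleep after a spurious wake-up.

```scala
object GuardedWaitSketch {
  private val lock = new Object

  // Hypothetical condition flag; the patch under review has no such variable.
  private var pending = false

  /** Sets the flag and wakes any waiter, both under the same lock. */
  def signal(): Unit = lock.synchronized {
    pending = true
    lock.notifyAll()
  }

  /** Waits until signalled or until the timeout elapses; true if signalled. */
  def awaitSignal(timeoutMs: Long): Boolean = lock.synchronized {
    val deadline = System.nanoTime() + timeoutMs * 1000000L
    var remainingMs = timeoutMs
    // The guard on remainingMs matters: wait(0) would block indefinitely.
    while (!pending && remainingMs > 0) {
      lock.wait(remainingMs)
      // After any wake-up, recompute how much of the interval is left.
      remainingMs = (deadline - System.nanoTime()) / 1000000L
    }
    val signalled = pending
    pending = false // consume the signal
    signalled
  }
}
```

Unlike a bare timed wait, this version does not lose a signal that arrives before the waiter starts waiting, at the cost of the flag and the deadline arithmetic.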

srowen · 2015-06-03T18:47:09Z

It's good practice when you are wait()-ing on some condition to be true, but that is not the case here. It may be flagged but it is not an instance of the actual problem these tools would be trying to detect.

harishreedharan · 2015-06-03T20:12:53Z

Not suggesting that we change it here - but it looks like ErrorProne does have a check which we might have to disable for this file - http://errorprone.info/bugpattern/WaitNotInLoop

andrewor14 · 2015-06-03T21:59:10Z

Merging into master. @vanzin Is this worth backporting into 1.4? I didn't because the original patch that reduces heartbeat latency is not in there.

This should help reduce latency for new executor allocations.

Author: Marcelo Vanzin <[email protected]>

Closes apache#6600 from vanzin/SPARK-8059 and squashes the following commits:

8387a3a [Marcelo Vanzin] [SPARK-8059] [yarn] Wake up allocation thread when new requests arrive.


harishreedharan reviewed Jun 3, 2015

andrewor14 reviewed Jun 3, 2015

asfgit closed this in aa40c44 Jun 3, 2015

andrewor14 mentioned this pull request Jun 3, 2015

[SPARK-7938][BUILD]Use Google ErrorProne during Maven build of Spark #6515

Closed

vanzin deleted the SPARK-8059 branch June 4, 2015 17:59
