[SPARK-13122] Fix race condition in MemoryStore.unrollSafely() #11012
Conversation
Pinging @andrewor14, the original implementor of unrollSafely(), for any potential feedback.
Test build #50520 has finished for PR 11012 at commit
https://issues.apache.org/jira/browse/SPARK-13122 A race condition can occur in MemoryStore's unrollSafely() method if two threads that return the same value for currentTaskAttemptId() execute this method concurrently. This change makes the operation of reading the initial amount of unroll memory used, performing the unroll, and updating the associated memory maps atomic in order to avoid this race condition.
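To make the window for the race concrete, here is a simplified sketch (illustrative names loosely following the discussion below, not the actual MemoryStore code): the snapshot taken before the unroll goes stale if another thread sharing the same task attempt ID reserves or releases unroll memory in the meantime, so the delta computed from it no longer reflects only this call's reservations.

```scala
// Simplified sketch of the racy pattern, not the actual MemoryStore code.
import scala.collection.mutable

object UnrollRaceSketch {
  // Per-task-attempt unroll memory bookkeeping (names are illustrative).
  private val unrollMemoryMap = mutable.HashMap[Long, Long]().withDefaultValue(0L)

  def currentUnrollMemoryForTask(taskAttemptId: Long): Long =
    unrollMemoryMap.synchronized { unrollMemoryMap(taskAttemptId) }

  def unrollSafely(taskAttemptId: Long, bytesReservedDuringUnroll: Long): Long = {
    // Snapshot taken before the unroll; it goes stale if another thread with the
    // same taskAttemptId reserves or releases unroll memory concurrently.
    val previousMemoryReserved = currentUnrollMemoryForTask(taskAttemptId)

    // ... unroll the block, reserving memory along the way ...
    unrollMemoryMap.synchronized {
      unrollMemoryMap(taskAttemptId) += bytesReservedDuringUnroll
    }

    // The delta is computed from the stale snapshot: if another thread sharing the
    // taskAttemptId changed the map in between, the wrong amount is transferred.
    currentUnrollMemoryForTask(taskAttemptId) - previousMemoryReserved
  }
}
```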
Updated PR with new implementation that uses a counter variable instead of requiring the whole method to be atomic.
/cc @JoshRosen
test this please
I think you need to account for this on a per-task rather than per-block basis in order to enforce fair sharing of memory between concurrently-running tasks.
Quoting from the JIRA ticket:
I think this is definitely a bug. I believe that the intent here was that we'd return a dummy […]. Even if we did fix the […]. Intuitively, the idea of adding extra synchronization here seems right to me, although I'd like to take a closer look at the changes here to see whether this will introduce performance problems: my guess is that the under-synchronization might have been caused by a desire to avoid holding monitors/locks during expensive operations.
@zsxwing, do you know why the streaming longevity / memory leak tests didn't catch this leak?
@@ -304,10 +309,9 @@ private[spark] class MemoryStore(blockManager: BlockManager, memoryManager: Memo
// release the unroll memory yet. Instead, we transfer it to pending unroll memory
// so `tryToPut` can further transfer it to normal storage memory later.
// TODO: we can probably express this without pending unroll memory (SPARK-10907)
val amountToTransferToPending = currentUnrollMemoryForThisTask - previousMemoryReserved
This is the race you are addressing, right? The issue is that previousMemoryReserved is out of date.
Yeah, it seems like the current PR description doesn't quite describe the changes here?
Per my earlier comment, I updated the PR to use a var named previousMemoryReserved to manually track the number of unroll bytes allocated during a given invocation of unrollSafely rather than relying on unrollMemoryMap(taskAttemptId) not being modified outside of the given thread between the assignment to previousMemoryReserved and the memory maps being updated in the finally { } block. This should remove the need to make the whole method synchronized.
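For illustration, a rough sketch of the counter-based approach described above (hypothetical helper names, simplified from the real MemoryStore internals): the bytes reserved by this invocation are accumulated in a local variable, so the amount transferred in the finally block no longer depends on the shared map being untouched by other threads with the same task attempt ID.

```scala
// Rough sketch of the counter-based fix (hypothetical names, not the exact patch).
import scala.collection.mutable

class UnrollCounterSketch(memoryManager: AnyRef) {
  private val unrollMemoryMap = mutable.HashMap[Long, Long]().withDefaultValue(0L)
  private val pendingUnrollMemoryMap = mutable.HashMap[Long, Long]().withDefaultValue(0L)

  def unrollSafely(taskAttemptId: Long, reservations: Seq[Long]): Unit = {
    // Bytes reserved by *this* invocation only. Other threads sharing the same
    // taskAttemptId can update the maps concurrently without skewing this count.
    var memoryReservedByThisCall = 0L
    try {
      reservations.foreach { bytes =>
        memoryManager.synchronized {
          unrollMemoryMap(taskAttemptId) += bytes
        }
        memoryReservedByThisCall += bytes
      }
      // ... build the unrolled block here ...
    } finally {
      memoryManager.synchronized {
        // Transfer exactly what this call reserved to pending unroll memory,
        // instead of a delta computed from a possibly stale snapshot of the map.
        unrollMemoryMap(taskAttemptId) -= memoryReservedByThisCall
        pendingUnrollMemoryMap(taskAttemptId) += memoryReservedByThisCall
      }
    }
  }
}
```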
@nongli – the problem is that the original implementation assumes that previousMemoryReserved is an invariant representing the number of unroll bytes allocated for the process besides the pending bytes allocated during the unroll, but no synchronization exists to enforce this invariant.
If I understand correctly, this issue only happens when a receiver starts multiple threads. The memory leak tests I did only use one thread per receiver.
This looks right to me.
From Jenkins output:
retest this please
@budde thanks for providing so much detail in the JIRA and fixing this issue. I believe the latest changes are correct and seem more performant than what you had earlier. It seems that this issue only happens if we have multiple threads running the same task. In your case, this is the receiver task, though in any case we shouldn't be getting -1 for […]. Have you had a chance to run your test against the latest changes to prove that it fixes the leak? If so, I will go ahead and merge this.
Latest change is looking good on my end. No unroll memory is being leaked.
Test build #50612 has finished for PR 11012 at commit
Looks like a bunch of Spark SQL/Hive tests are failing due to this error:
I'm guessing this commit didn't break this?
No, it's unrelated. I manually triggered a few extra builds. retest this please
Test build #2498 has finished for PR 11012 at commit
Test build #2499 has finished for PR 11012 at commit
Test build #50627 has finished for PR 11012 at commit
Test build #50629 has finished for PR 11012 at commit
Merged into master and 1.6. Thanks again for catching this issue.
https://issues.apache.org/jira/browse/SPARK-13122

A race condition can occur in MemoryStore's unrollSafely() method if two threads that return the same value for currentTaskAttemptId() execute this method concurrently. This change makes the operation of reading the initial amount of unroll memory used, performing the unroll, and updating the associated memory maps atomic in order to avoid this race condition.

Initial proposed fix wraps all of unrollSafely() in a memoryManager.synchronized { } block. A cleaner approach might be to introduce a mechanism that synchronizes based on task attempt ID. An alternative option might be to track unroll/pending unroll memory based on block ID rather than task attempt ID.

Author: Adam Budde <[email protected]>

Closes #11012 from budde/master.

(cherry picked from commit ff71261)
Signed-off-by: Andrew Or <[email protected]>

Conflicts:
    core/src/main/scala/org/apache/spark/storage/MemoryStore.scala
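For comparison, the coarse-grained alternative mentioned in the description, holding the memory manager's monitor across the whole operation, might look roughly like the sketch below (simplified and hypothetical, not code from the patch). The trade-off raised earlier in the conversation is that a potentially expensive unroll would then run while the lock is held.

```scala
// Sketch of the initially proposed coarse-grained fix: make the
// read/unroll/update sequence atomic by holding the memory manager's monitor
// for the entire call. Simplified and hypothetical.
object CoarseLockSketch {
  def unrollSafelyCoarse[T](memoryManager: AnyRef)(readUnrollAndUpdate: => T): T =
    memoryManager.synchronized {
      // The whole sequence runs under the lock, so no other thread can change the
      // unroll memory bookkeeping in between, at the cost of reduced concurrency.
      readUnrollAndUpdate
    }
}
```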