Conversation
- list += nd.slice(0, realSize).copy()
+ val ndSliced = nd.slice(0, realSize)
+ list += ndSliced.copy()
+ ndSliced.dispose()
A small concern here: we cannot dispose the whole nd, only a slice of it. But since this has been tested to be stable, it's better than letting customers run with a memory leak. LGTM for now.
Because predExec.outputs is reused and has the same lifecycle as predExec.
But how about not calling copy(), and simply doing list += nd.slice(0, realSize)?
We tried that and it just breaks for unknown reasons. The first inference goes well and the second just breaks.
@lanking520 Can you show the exception information?
@lanking520 @yzhliu Is it because the JNI doesn't free the memory?
@liuzx32 The way Scala works here is more like playing with shared pointers. When we create an NDArray, we request a pointer to a memory space on the C++ side. If we want to release that memory, we need to call dispose somewhere in the Scala code so it actually gets executed. If we forget to do that, Scala/the JVM will simply discard the pointer rather than releasing the memory for us, and that causes the memory leak.
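To make the point concrete, here is a minimal sketch of the pattern (assuming the org.apache.mxnet Scala package and its NDArray API; this is illustration, not code from the PR):

```scala
import org.apache.mxnet.{NDArray, Shape}

object DisposeExample {
  def main(args: Array[String]): Unit = {
    // Creating an NDArray allocates memory on the C++ side; the JVM only holds a handle.
    val nd = NDArray.ones(Shape(4, 10))

    // slice() returns another handle backed by native memory.
    val sliced = nd.slice(0, 2)
    val copied = sliced.copy() // an independent NDArray we actually want to keep

    // If `sliced` simply goes out of scope, the JVM drops the handle but the
    // native memory is never freed, so it has to be released explicitly.
    sliced.dispose()

    // ... use `copied` ...
    copied.dispose()
    nd.dispose()
  }
}
```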
Some comments.
On a wider note: how can we verify this with automated testing? (unit tests or integration tests)
- list += nd.slice(0, realSize).copy()
+ val ndSliced = nd.slice(0, realSize)
+ list += ndSliced.copy()
+ ndSliced.dispose()
Shouldn't we do a try/finally to make sure dispose always happens, even in case of an exception?
If we're getting an exception here, something has gone very wrong. Most likely the problem would be memory access/allocation, and I'm not confident dispose would work correctly under those conditions.
I'll make the change anyway: although I think it's unlikely to ever help, it definitely won't hurt and is good practice to follow.
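For reference, a sketch of what that try/finally could look like around the new lines (reusing the names from the snippet above; not necessarily the exact patch):

```scala
val ndSliced = nd.slice(0, realSize)
try {
  list += ndSliced.copy()
} finally {
  // Release the native memory behind the slice even if copy() throws.
  ndSliced.dispose()
}
```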
It feels a bit like overkill to add a try/finally every time we dispose something. Eventually that should be handled by the GC.
}
batch.dispose()
Same comment as above about using try/finally to make sure dispose always happens, even in case of an exception.
@@ -230,8 +230,11 @@ class FeedForward private(
   val padded = batch.pad
   val realSize = batchSize - padded
   for ((list, nd) <- outputs zip predExec.outputs) {
-    list += nd.slice(0, realSize).copy()
+    val ndSliced = nd.slice(0, realSize)
I think we should add a comment explaining why it has to be done this way, and not the more compact way used before. Otherwise, someone may easily revert this change unknowingly in the future.
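Something along these lines would capture that suggestion (a sketch based on the diff context above, not the committed code):

```scala
for ((list, nd) <- outputs zip predExec.outputs) {
  // Do NOT collapse this back into `list += nd.slice(0, realSize).copy()`:
  // the intermediate slice is backed by native memory and must be disposed
  // explicitly, otherwise repeated predict() calls leak memory (see #10436).
  val ndSliced = nd.slice(0, realSize)
  list += ndSliced.copy()
  ndSliced.dispose()
}
```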
Hi @lupesko, this is covered by the unit test that runs a single inference with the MNIST example. The initial bug came from inference serving (a crash after an hour). We have tested it offline, monitoring memory usage, and saw that it stays stable for a long time. It will be hard to track in that case. I think we can place this test as part of Scala Benchmark AI if they can run it in the nightly tests.
}
}
} finally {
batch.dispose()
Shouldn't it be the dataArrays that need to be disposed? See loadData and loadDatageneral, where a copy is made from batch to dataArrays.
I think we shouldn't dispose the original input from the Iterator; for example, the user might want to use the same input on another model.
We aren't disposing of the original input. When data.next() is called, a slice of the data is made here. This slice is what should be getting disposed of when we dispose of batch.
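In other words, the per-batch lifecycle looks roughly like the helper below (a hypothetical sketch, assuming the org.apache.mxnet DataIter/DataBatch API; foreachBatch is not an existing method):

```scala
import org.apache.mxnet.{DataBatch, DataIter}

object BatchDisposal {
  // Hypothetical helper: run `process` on every batch and always release the
  // per-batch native memory, without touching the iterator's original data.
  def foreachBatch(data: DataIter)(process: DataBatch => Unit): Unit = {
    while (data.hasNext) {
      val batch = data.next()
      try {
        process(batch)
      } finally {
        batch.dispose() // frees only this batch's slice, not the underlying input
      }
    }
  }
}
```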
Thanks Andrew, what about dataArrays?
Summarizing the conversation we had offline: it appears that the copyTo in loadData copies into memory owned by the Executor predExec. The memory in the executor seems to get reused across predict calls. I think the proper place to dispose of this part would be to add a dispose method to FeedForward and have it dispose of its predExec.
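A hedged sketch of what such a method might look like inside FeedForward (predExec stands for the private predictor executor discussed above; this is not code from the PR):

```scala
// Inside FeedForward: release the native memory held by the cached predictor
// executor, which is otherwise reused across predict() calls.
def dispose(): Unit = {
  if (predExec != null) {
    predExec.dispose()
  }
}
```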
* Fixes Scala memory leak (apache#10436)
* Replaced the copy and disposed of sliced ndArray to resolve memory leak
* Wrapped disposes in a finally to ensure they are called
Description
This fixes a memory leak that results from the FeedForward.predict method not properly disposing of the results from NDArray.slice().
Testing
Verifying leak exists
To verify that there was a leak in the existing code, I created and trained a basic model on MNIST data, called the predict method inside a while(true) loop, and monitored the process's memory usage.
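A minimal sketch of that loop (assuming a trained FeedForward named model and an MNIST DataIter named valIter; not the exact script used):

```scala
import org.apache.mxnet.{DataIter, FeedForward, NDArray}

object LeakCheck {
  // Call predict() forever and watch the process memory from outside (e.g. with `top`).
  // Before the fix, resident memory grows without bound; after it, it stays flat.
  def run(model: FeedForward, valIter: DataIter): Unit = {
    while (true) {
      valIter.reset()
      val outputs: Array[NDArray] = model.predict(valIter)
      outputs.foreach(_.dispose()) // free the returned predictions each iteration
    }
  }
}
```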
Verifying leak fix
After making the code changes, I repeated the process I had used to verify there was a leak. Memory consumption was much improved and looks to be stable.
More testing
In addition to the tests I ran to monitor memory usage, I also ran the existing tests with 'make scalatest'. All tests pass.