I followed the steps below, but onnxruntime printed the following message and then exited with a segmentation fault:
[F:onnxruntime:, bfc_arena.h:330 RegionFor] Could not find Region for 0x7ffb2b04d000
Thanks @Z-XQ for the explanation.
The GPU memory is backed by a memory pool (arena), and we have a config knob to shrink the arena (de-allocate unused memory chunks).
Not sure if we have enough tools to accomplish this in Python just yet. The best way to use this feature in C++ is to:
1. Not allocate weights memory through the arena: See here
2. Configure the arena to have a high enough initial chunk to support most Run() calls. See "initial_chunk_size_bytes" here
3. Finally, configure the arena to shrink on every Run(). See here. This will keep the initial chunk allocated but de-allocate any unused chunk remaining after the Run() call ends.
For example, if the initial chunk size is set as 500MB, the first Run() will allocate 500MB + any additional chunks required to service the Run() call. The additional chunks will get de-allocated after Run() and only keep 500MB of memory allocated. It is important to not allocate weights (initializers) memory through the arena as that complicates the shrinkage. Hence, step (1).
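The 500MB arithmetic above can be sketched with a toy model. This is only a bookkeeping illustration, not ONNX Runtime code: `ToyArena`, `Run`, `ShrinkAfterRun`, and the fixed 100MB extension size are all hypothetical names and values chosen for the example, and the real arena's extension strategy may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model only: this is NOT ONNX Runtime's BFC arena, just the bookkeeping
// described in the comment above. All names and sizes are hypothetical.
class ToyArena {
 public:
  ToyArena(std::size_t initial_chunk_mb, std::size_t extend_chunk_mb)
      : initial_chunk_mb_(initial_chunk_mb), extend_chunk_mb_(extend_chunk_mb) {
    assert(extend_chunk_mb_ > 0);  // a zero-sized extension could never grow
  }

  // Simulate one Run() that needs `needed_mb` of arena memory at its peak.
  // Returns the total memory the arena holds while the call is in flight.
  std::size_t Run(std::size_t needed_mb) {
    initial_allocated_ = true;  // the initial chunk is allocated on first use
    while (AllocatedMb() < needed_mb) {
      extra_chunks_mb_.push_back(extend_chunk_mb_);  // grow by one extra chunk
    }
    return AllocatedMb();
  }

  // Simulate shrinkage after Run(): the extra chunks are unused once the call
  // has ended, so they are released; the initial chunk stays allocated.
  void ShrinkAfterRun() { extra_chunks_mb_.clear(); }

  // Total memory currently held by the arena, in MB.
  std::size_t AllocatedMb() const {
    std::size_t total = initial_allocated_ ? initial_chunk_mb_ : 0;
    for (std::size_t chunk : extra_chunks_mb_) total += chunk;
    return total;
  }

 private:
  std::size_t initial_chunk_mb_;
  std::size_t extend_chunk_mb_;
  bool initial_allocated_ = false;
  std::vector<std::size_t> extra_chunks_mb_;
};
```

With `ToyArena arena(500, 100)`, a `Run(750)` call grows the arena past the initial chunk, and `ShrinkAfterRun()` brings it back to 500MB. Weights are left out of the model on purpose, matching step (1): they are assumed to live outside the arena.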
Originally posted by @hariharans29 in #9509 (comment)