[Relay][VM] Relay VM memory liveness/lifetime analysis #10026
You might also want to read and delete this pass: https://github.com/apache/tvm/blob/main/python/tvm/relay/transform/memory_plan.py#L19. The main difference, afaict, is that it tries to detect dynamic/static regions and then combine them before inserting kills.
I'm not going to touch this since it's complementary to the liveness analysis, but it's probably worth reviving this pass in C++ later to reduce the allocation overhead. That pass doesn't overlap allocations over time, though, so it would sort of nullify the effect of liveness analysis until static planning is added.
I wrote this as a comment in the code, but reposting here for future reference:
cc @jroesch @mbs-octoml @mbrookhart this PR is ready for full review
just nits & comments I think so feel free to proceed if green and follow up later.
cfg_.let_map[expr] = curr_node;
cfg_.reverse_post_order.push_back(curr_node);

if (const IfNode* ite = AsIgnoringOnDevice<IfNode>(inner_let_node->value)) {
nit: Reminder comment that we are dealing with the 'exits' from the bb, hence only looking at the two branch exprs
let me know if the comment I added helps at all
// expr of an If branch; when bound vars (i.e. function inputs, pattern matched vars, dead
// bindings) are never used.
//
// 3. When the result expr of an If branch is a variable, and this expr is the last use of the
Shoot, I missed that one. Could push the 'return kills' onto an internal stack and pop them after each join point, but agree a TODO is fine.
Thanks @altanh @mbs-octoml @jroesch
* WIP VM memory planning
* tuple projection
* support if
* lint
* remove old comment
* WIP check in attempt at CFG analysis
* rewrite CFG analysis in stages, support ADTs
* lint
* fix small bug in alias elimination, try fix VM profiler error
* update DCE tests since allocations can be DCE'd
* optimize worklist to reduce runtime
* add docs, rename pass to ManifestLifetimes
* add tests, more comments, proper VM profiler fix
* lint
* ci please
* address nits
* retry ci again
* retry ci once again :)
* fix sneaky memory leak due to cyclic refs
* fix didn't work but retry ci anyway
* slightly reduce size of large pretty printer test
This PR adds basic memory management to the Relay VM by inserting kill annotations on variables at the end of their lifetimes. Kill annotations are translated to a new VM instruction, KillRegister, which nulls out the specified register. By nulling out a register, we destroy the ObjectRef inside, which eventually leads to tensors being freed via refcounting. This approach automatically handles aliasing of tensors (e.g. via tuples, ADTs) as the aliases are reflected in the run-time refcount.
Lifetime analysis is done using standard data-flow analysis on the CFG of the post-memory-lowering IR. The main tricky bit involves respecting the VM compiler's register aliasing scheme (e.g. var-to-var bindings); an alternative approach would be to move the register aliasing logic into the VM compiler itself, so that kill annotations are only translated to a KillRegister when all aliases of a register have been killed.
This PR does not do "static" memory planning in the sense of optimizing an allocation plan. Follow-up work should revive the storage coalescing pass and do static planning within each storage (although this would need static analysis of aliasing).
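As a rough illustration of the data-flow analysis described above, here is a minimal backward liveness sketch in C++ over a toy CFG. It is not the code from this PR: `Node`, `Liveness`, `Kills`, `ComputeLiveness`, and `PlaceKills` are hypothetical names, each node holds at most one binding, and register aliasing (var-to-var bindings) is ignored entirely. It places a kill after a node when a variable is last used or bound but never read there, and at the entry of a branch target when a variable is live out of the branch point but dead on that path.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified CFG node: at most one binding per node.
struct Node {
  std::string def;                 // variable bound here ("" if none)
  std::set<std::string> uses;      // variables read by this node
  std::vector<std::size_t> succs;  // indices of successor nodes
};

struct Liveness {
  std::vector<std::set<std::string>> live_in, live_out;
};

// Backward data-flow to a fixed point:
//   live_out[n] = union of live_in[s] over successors s
//   live_in[n]  = uses[n] + (live_out[n] - def[n])
// `result` is treated as live at every exit node so it is never killed.
Liveness ComputeLiveness(const std::vector<Node>& cfg, const std::string& result) {
  Liveness l;
  l.live_in.resize(cfg.size());
  l.live_out.resize(cfg.size());
  bool changed = true;
  while (changed) {
    changed = false;
    // Visit nodes in reverse index order; this converges quickly when
    // indices follow program order.
    for (std::size_t i = cfg.size(); i-- > 0;) {
      std::set<std::string> out;
      if (cfg[i].succs.empty() && !result.empty()) out.insert(result);
      for (std::size_t s : cfg[i].succs) {
        out.insert(l.live_in[s].begin(), l.live_in[s].end());
      }
      std::set<std::string> in = out;
      if (!cfg[i].def.empty()) in.erase(cfg[i].def);
      in.insert(cfg[i].uses.begin(), cfg[i].uses.end());
      if (in != l.live_in[i] || out != l.live_out[i]) {
        l.live_in[i] = std::move(in);
        l.live_out[i] = std::move(out);
        changed = true;
      }
    }
  }
  return l;
}

struct Kills {
  std::vector<std::set<std::string>> before;  // kills on entry to a node
  std::vector<std::set<std::string>> after;   // kills right after a node
};

Kills PlaceKills(const std::vector<Node>& cfg, const Liveness& l) {
  Kills k;
  k.before.resize(cfg.size());
  k.after.resize(cfg.size());
  for (std::size_t i = 0; i < cfg.size(); ++i) {
    // Live on entry (or bound here) but not live afterwards: last use or
    // dead binding, so kill right after this node.
    std::set<std::string> candidates = l.live_in[i];
    if (!cfg[i].def.empty()) candidates.insert(cfg[i].def);
    for (const std::string& v : candidates) {
      if (l.live_out[i].count(v) == 0) k.after[i].insert(v);
    }
    // Live out of this node but dead on the path into successor s:
    // kill on entry to s (e.g. a variable used in only one If branch).
    for (std::size_t s : cfg[i].succs) {
      for (const std::string& v : l.live_out[i]) {
        if (l.live_in[s].count(v) == 0) k.before[s].insert(v);
      }
    }
  }
  return k;
}
```

For a straight-line chain that binds `a`, then `b` from `a`, then `c` from `b`, with `c` as the result, this sketch kills `a` after the second binding and `b` after the third, and never kills `c`.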
With this PR, BERT-SQuAD sees around 10x memory reduction (~20GB -> ~2GB on our tested input size).
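The reduction depends on registers dropping their references as soon as a value is dead. The following C++ sketch shows why nulling a register is sufficient and why aliases need no special handling at run time. It is only an analogy: `std::shared_ptr` stands in for TVM's refcounted ObjectRef, and `Tensor`, `RegisterFile`, `Alloc`, and `Move` are hypothetical names, not the VM's actual data structures.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for a tensor; counts how many instances are alive.
struct Tensor {
  static int live_count;
  Tensor() { ++live_count; }
  ~Tensor() { --live_count; }
  Tensor(const Tensor&) = delete;
  Tensor& operator=(const Tensor&) = delete;
};
int Tensor::live_count = 0;

// Toy register file: each register holds a refcounted handle
// (assumption: shared_ptr models the ObjectRef refcount).
struct RegisterFile {
  std::vector<std::shared_ptr<Tensor>> regs;

  explicit RegisterFile(std::size_t n) : regs(n) {}

  // Allocate a fresh tensor into register dst.
  void Alloc(std::size_t dst) { regs[dst] = std::make_shared<Tensor>(); }

  // Copy the handle, creating an alias and bumping the refcount.
  void Move(std::size_t dst, std::size_t src) { regs[dst] = regs[src]; }

  // Null out the register; the tensor is freed only when this was
  // the last remaining reference.
  void KillRegister(std::size_t r) { regs[r].reset(); }
};
```

If register 0 is allocated and then aliased into register 1, killing register 0 leaves the tensor alive; it is destroyed only once register 1 is killed as well.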
Other changes in this PR:
memory.alloc_storage and memory.alloc_tensor are now marked as non-stateful, since they can be safely eliminated by DCE
TODOs:
conservatively support relay refs (or skip memory planning gracefully at least)
refs are not supported in the VM currently (see [Relay][VM] Add support for references. #6798)