[AOT] Calculate used memory at the callsite of primitive functions #11208
also cc @mbs-octoml @areusch
No functional changes here, simply moving manifest_lifetimes.cc to ../ (outside the scope of the vm) and splitting it into .cc/.h
Thanks @lhutton1!
I took a first look. It looks broadly good.
A few suggestions for more test cases and a question about using the stack.
hi @lhutton1 @manupa-arm, just had a couple of questions on this one
Thanks for the reviews @manupa-arm, @areusch, @altanh - hoping to have a revised version ready soon!
thanks @lhutton1 for the replies! ping us when this is ready for review again.
Apologies for the delay, this is ready for another look!
mostly LGTM with some small questions/nits!
Thanks for the changes! LGTM
LGTM!
@lhutton1 since it has been 18 days, should we re-run a round of CI -- just to be sure :)
Introduces a new pass in the AOT executor called "AnnotateUsedMemory" which applies liveness analysis to the callsite of each primitive function in order to calculate the total size of the live tensors at this point of execution. The result is provided as a function annotation called "used_memory", which can be consumed by later stages of the compiler (e.g. external codegens) to provide more information about the current memory consumption. This can be useful for some optimizations. Change-Id: I8d6b7447498f19260358bbefe34029ddd86b9c89
Change-Id: I0e460f6cf43f9b12ffa5fc66fcb68e55304daeb2
In addition, a new "io_used_memory" annotation is added to the main function which refers to the total size of the IO tensors in the provided module, enabling these to be discounted from memory pressure calculations where necessary. Change-Id: Iafe9c85d7fc69c77a2115ed4efe7645160387c86
Change-Id: I00f5ba80d5e004076e4c27d39bec143178b3b1dd
Change-Id: If6409e2953addfc880bcc6d95083b78bdf5a23d0
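As a rough illustration of the "io_used_memory" idea described in the commit messages above, the sketch below sums the byte sizes of a module's input and output tensors and subtracts that total from a memory figure. It is plain Python and does not call TVM; the helper names (`tensor_bytes`, `io_used_memory`, `non_io_pressure`), the dtype table, and the clamp at zero are all hypothetical choices for illustration, not the pass's actual implementation. The shapes are borrowed from the ResNet example later in this thread.

```python
from math import prod

# Illustrative subset of dtype widths in bytes (an assumption, not TVM's table).
DTYPE_BYTES = {"float32": 4, "int8": 1}


def tensor_bytes(shape, dtype):
    """Size of a statically shaped tensor in bytes."""
    return prod(shape) * DTYPE_BYTES[dtype]


def io_used_memory(inputs, outputs):
    """Total size of all module-level input and output tensors."""
    return sum(tensor_bytes(shape, dtype) for shape, dtype in [*inputs, *outputs])


def non_io_pressure(used_memory, io_bytes):
    """Discount the IO tensors from a used-memory figure (clamped at zero)."""
    return max(used_memory - io_bytes, 0)


# One image input and a 1000-class output, both float32.
inputs = [((1, 3, 224, 224), "float32")]
outputs = [((1, 1000), "float32")]
io_bytes = io_used_memory(inputs, outputs)
print(io_bytes)  # 606112
```

A consumer that only cares about intermediate buffers could apply `non_io_pressure` to each callsite's "used_memory" value; whether that subtraction is appropriate depends on how the downstream codegen accounts for IO buffers.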
…pache#11208)
* [AOT] Calculate used memory at the callsite of primitive functions
  Introduces a new pass in the AOT executor called "AnnotateUsedMemory" which applies liveness analysis to the callsite of each primitive function in order to calculate the total size of the live tensors at this point of execution. The result is provided as a function annotation called "used_memory", which can be consumed by later stages of the compiler (e.g. external codegens) to provide more information about the current memory consumption. This can be useful for some optimizations.
  Change-Id: I8d6b7447498f19260358bbefe34029ddd86b9c89
* small fix to file description
  Change-Id: I0e460f6cf43f9b12ffa5fc66fcb68e55304daeb2
* Various improvements addressing comments
  In addition, a new "io_used_memory" annotation is added to the main function which refers to the total size of the IO tensors in the provided module, enabling these to be discounted from memory pressure calculations where necessary.
  Change-Id: Iafe9c85d7fc69c77a2115ed4efe7645160387c86
* addressing comments
  Change-Id: I00f5ba80d5e004076e4c27d39bec143178b3b1dd
* add note for dynamic shapes
  Change-Id: If6409e2953addfc880bcc6d95083b78bdf5a23d0
Hi @lhutton1, thanks for your contribution.

    import pytest
    from collections import OrderedDict
    import numpy as np

    import tvm
    from tvm import relay
    from tvm.relay import testing


    def AnnotateUsedMemory():
        return relay.transform._ffi_api.AnnotateUsedMemory()


    def _get_data(in_data_shapes, dtype="float32"):
        in_data = OrderedDict()
        for name, shape in in_data_shapes.items():
            in_data[name] = np.random.uniform(size=shape).astype(dtype)
        return in_data


    def _run_relay(mod, params, in_data, pass_enabled):
        target = "llvm"
        dev = tvm.device("llvm", 0)
        in_data = [tvm.nd.array(value) for value in in_data.values()]

        if pass_enabled:
            mod = relay.transform.InferType()(mod)
            mod = relay.transform.ToANormalForm()(mod)
            mod = relay.transform.InferType()(mod)
            mod = AnnotateUsedMemory()(mod)
        # create primitive functions
        mod = relay.transform.FuseOps()(mod)

        print(f'\nmod when AnnotateUsedMemory is {pass_enabled}:\n {mod}')

        out_data = relay.create_executor(
            "graph", mod, device=dev, target=target).evaluate()(*in_data, **params)
        return out_data.numpy()


    def _verify_results(mod, params, in_data, rtol=1e-5, atol=1e-5):
        before = _run_relay(mod, params, in_data, False)
        after = _run_relay(mod, params, in_data, True)
        np.testing.assert_allclose(before, after, rtol, atol)


    def test_resnet():
        num_class = 1000
        in_data_shapes = OrderedDict({"data": (1, 3, 224, 224)})
        in_data = _get_data(in_data_shapes, dtype="float32")
        for n in [18]:  # 18, 34, 50, 101
            mod, params = tvm.relay.testing.resnet.get_workload(
                batch_size=1, num_classes=num_class, num_layers=n)
            _verify_results(mod, params, in_data)


    if __name__ == "__main__":
        pytest.main([__file__])

I am not familiar with
Hi @zhaoyang-star, thanks for taking a look, it's great to see this pass being used elsewhere. The pass currently expects the input to be a module of primitive functions so I would suggest running
I did try running your example locally with the above change and this produced the relevant
@lhutton1, I want to confirm: Did you reproduce the issue (no If I placed the FuseOps before AnnotateUsedMemory just as you showed, there is an error
Hi @zhaoyang-star, yes I was able to reproduce the issue with your script. The script I have would be the same as yours, just with a different pass order as mentioned above. Placing
Introduces a new pass in the AOT executor called "AnnotateUsedMemory" which applies liveness analysis to the callsite of each primitive function in order to calculate the total size of the live tensors at this point of execution. The result is provided as a function annotation called "used_memory", which can be consumed by later stages of the compiler (e.g. external codegens) to provide more information about the current memory consumption. This can be useful for some optimizations.
Note: this PR is dependent on #11091 so also shows the contents of that PR.
cc @Mousius @NicolaLancellotti @ekalda @manupa-arm
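To make the description above more concrete, here is a minimal sketch of backward liveness analysis over a straight-line sequence of calls, summing the sizes of the tensors that are live at each callsite. It is plain Python rather than the C++ pass in this PR; the `Call` record, the `used_memory_per_call` helper, the dtype table, and the choice to count a call's own inputs and output as live at its callsite are assumptions made for illustration only.

```python
from dataclasses import dataclass
from math import prod

# Illustrative subset of dtype widths in bytes (an assumption, not TVM's table).
DTYPE_BYTES = {"float32": 4, "int8": 1, "int32": 4}


@dataclass(frozen=True)
class Call:
    """One let-bound call `out = func(*args)` in a straight-line program."""
    func: str
    args: tuple
    out: str


def tensor_bytes(shape, dtype):
    """Size of a statically shaped tensor in bytes."""
    return prod(shape) * DTYPE_BYTES[dtype]


def used_memory_per_call(calls, tensors, outputs):
    """Return, in program order, the summed size of tensors live at each call."""
    live = set(outputs)  # tensors still needed after the final call
    result = [0] * len(calls)
    for i in range(len(calls) - 1, -1, -1):
        call = calls[i]
        # At the callsite: everything needed later, plus this call's inputs and output.
        at_call = live | set(call.args) | {call.out}
        result[i] = sum(tensor_bytes(*tensors[name]) for name in at_call)
        # Backward dataflow step: live-in = (live-out - defs) | uses.
        live = (live - {call.out}) | set(call.args)
    return result


# Hypothetical program: a = f0(x); b = f1(a); c = f2(a, b)
tensors = {
    "x": ((1, 16), "float32"),
    "a": ((1, 16), "float32"),
    "b": ((1, 8), "float32"),
    "c": ((1, 8), "float32"),
}
calls = [
    Call("f0", ("x",), "a"),
    Call("f1", ("a",), "b"),
    Call("f2", ("a", "b"), "c"),
]
print(used_memory_per_call(calls, tensors, {"c"}))  # [128, 96, 128]
```

In this toy program `x` is dead after the first call, so it no longer contributes at `f1`, while `a` stays live until `f2` consumes it. The sketch assumes static shapes and ignores control flow.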