[AOT] Calculate used memory at the callsite of primitive functions #11208

Merged (5 commits) on Jun 25, 2022

@lhutton1 lhutton1 commented May 4, 2022

Introduces a new pass in the AOT executor called "AnnotateUsedMemory" which applies liveness analysis to the callsite of each primitive function in order to calculate the total size of the live tensors at this point of execution. The result is provided as a function annotation called "used_memory", which can be consumed by later stages of the compiler (e.g. external codegens) to provide more information about the current memory consumption. This can be useful for some optimizations.
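
To make the idea concrete, here is a minimal pure-Python sketch of the calculation, not the pass itself. It assumes a straight-line sequence of calls with static shapes; `Tensor`, `Call`, `DTYPE_BYTES` and `used_memory_per_call` are hypothetical names used only for illustration and are not TVM APIs. A tensor is counted at a callsite if it is already defined and is still read by this or a later call, or if it is produced by this call.

```python
from dataclasses import dataclass
from math import prod

# Bytes per element for a few dtypes (illustrative subset, not a TVM table).
DTYPE_BYTES = {"int8": 1, "float16": 2, "float32": 4, "int32": 4}


@dataclass(frozen=True)
class Tensor:
    name: str
    shape: tuple
    dtype: str = "float32"

    @property
    def nbytes(self) -> int:
        return prod(self.shape) * DTYPE_BYTES[self.dtype]


@dataclass(frozen=True)
class Call:
    """One call to a (hypothetical) primitive function."""
    inputs: tuple
    output: Tensor


def used_memory_per_call(calls, graph_outputs):
    """Return, for each call, the total bytes of tensors live at that callsite."""
    # Index of the last call that reads each tensor; graph outputs stay live to the end.
    last_use = {}
    for i, call in enumerate(calls):
        for t in call.inputs:
            last_use[t.name] = i
    for t in graph_outputs:
        last_use[t.name] = len(calls)

    # Index of the call that defines each tensor; graph inputs exist before call 0.
    defined_at, tensors = {}, {}
    for i, call in enumerate(calls):
        for t in call.inputs:
            defined_at.setdefault(t.name, -1)
            tensors[t.name] = t
        defined_at[call.output.name] = i
        tensors[call.output.name] = call.output

    result = []
    for i in range(len(calls)):
        total = 0
        for name, t in tensors.items():
            is_defined = defined_at[name] <= i
            still_needed = last_use.get(name, -1) >= i
            # The output of the current call occupies memory even if nothing reads it later.
            if is_defined and (still_needed or defined_at[name] == i):
                total += t.nbytes
        result.append(total)
    return result


# Toy graph: a = f0(x); b = f1(a); c = f2(a, b); every tensor is 16 bytes.
x, a, b, c = (Tensor(n, (1, 4)) for n in "xabc")
calls = [Call((x,), a), Call((a,), b), Call((a, b), c)]
print(used_memory_per_call(calls, graph_outputs=[c]))  # [32, 32, 48]
```

In this toy graph `x` is dead after the first call, so it stops contributing from the second callsite onwards, while `a` stays live until the last call reads it.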

Note: this PR is dependent on #11091 so also shows the contents of that PR.

cc @Mousius @NicolaLancellotti @ekalda @manupa-arm

@github-actions github-actions bot requested a review from manupak May 4, 2022 10:30
@lhutton1 lhutton1 force-pushed the annotate-mem-usage branch from 5395581 to 71a7fa6 Compare May 9, 2022 10:11
@lhutton1 lhutton1 marked this pull request as ready for review May 9, 2022 10:13

lhutton1 commented May 9, 2022

also cc @mbs-octoml @areusch

@@ -0,0 +1,367 @@
/*

No functional changes here; this simply moves manifest_lifetimes.cc up one directory (outside the scope of the VM) and splits it into .cc/.h files.

@manupak manupak left a comment

Thanks @lhutton1 !

I took a first look. It looks broadly good.
A few suggestions for more test cases and a question about using the stack.

@areusch areusch left a comment

hi @lhutton1 @manupa-arm , just had a couple questions on this one

@lhutton1 lhutton1 left a comment

Thanks for the reviews @manupa-arm, @areusch, @altanh - hoping to have a revised version ready soon!

areusch commented May 23, 2022

thanks @lhutton1 for the replies! ping us when this is ready for review again.

@lhutton1 lhutton1 force-pushed the annotate-mem-usage branch from 71a7fa6 to cd58ca9 Compare May 26, 2022 13:40
@github-actions github-actions bot requested a review from Mousius May 26, 2022 13:41
@lhutton1 lhutton1 force-pushed the annotate-mem-usage branch from cd58ca9 to 9241b66 Compare May 26, 2022 16:51
@lhutton1

Apologies for the delay, this is ready for another look!

areusch commented May 26, 2022

ok thanks @lhutton1 ! i'll defer to @altanh on this one

@altanh altanh left a comment

mostly LGTM with some small questions/nits!

@lhutton1 lhutton1 force-pushed the annotate-mem-usage branch 2 times, most recently from 297c62e to 4d95daa Compare May 31, 2022 08:31

@altanh altanh left a comment

Thanks for the changes! LGTM

@lhutton1 lhutton1 force-pushed the annotate-mem-usage branch from 4d95daa to 1c274d7 Compare June 6, 2022 09:28

@manupak manupak left a comment

LGTM!

manupak commented Jun 24, 2022

@lhutton1 since it has been 18 days, should we re-run a round of CI -- just to be sure :)

@lhutton1 lhutton1 force-pushed the annotate-mem-usage branch from 1c274d7 to 9e6db41 Compare June 24, 2022 16:21
lhutton1 added 5 commits June 24, 2022 16:21
* [AOT] Calculate used memory at the callsite of primitive functions

Introduces a new pass in the AOT executor called "AnnotateUsedMemory"
which applies liveness analysis to the callsite of each primitive
function in order to calculate the total size of the live tensors at
this point of execution. The result is provided as a function annotation
called "used_memory", which can be consumed by later stages of the
compiler (e.g. external codegens) to provide more information about the
current memory consumption. This can be useful for some optimizations.

Change-Id: I8d6b7447498f19260358bbefe34029ddd86b9c89

* small fix to file description

Change-Id: I0e460f6cf43f9b12ffa5fc66fcb68e55304daeb2

* Various improvements addressing comments

In addition, a new "io_used_memory" annotation is added to the main
function which refers to the total size of the IO tensors in the
provided module, enabling these to be discounted from memory pressure
calculations where necessary.

Change-Id: Iafe9c85d7fc69c77a2115ed4efe7645160387c86

* addressing comments

Change-Id: I00f5ba80d5e004076e4c27d39bec143178b3b1dd

* add note for dynamic shapes

Change-Id: If6409e2953addfc880bcc6d95083b78bdf5a23d0
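
As a rough illustration of the "io_used_memory" idea from the commit message above, the sketch below sums the byte sizes of a module's input and output tensors and shows one way a consumer might discount them. It is plain Python under stated assumptions: `tensor_bytes`, `io_used_memory` and `discount_io` are hypothetical helpers, not TVM functions, shapes are assumed static, and subtracting the IO size from a callsite figure is only one possible reading of "discounted".

```python
from math import prod

# Bytes per element for a few dtypes (illustrative subset, not a TVM table).
DTYPE_BYTES = {"int8": 1, "float16": 2, "float32": 4, "int32": 4}


def tensor_bytes(shape, dtype):
    """Size in bytes of a statically shaped tensor."""
    return prod(shape) * DTYPE_BYTES[dtype]


def io_used_memory(inputs, outputs):
    """Total bytes of all module inputs and outputs, each given as (shape, dtype)."""
    return sum(tensor_bytes(shape, dtype) for shape, dtype in [*inputs, *outputs])


def discount_io(used_memory, io_bytes):
    """Hypothetical consumer-side adjustment: ignore IO tensors, never going below zero."""
    return max(used_memory - io_bytes, 0)


# An image-classifier-like signature: one (1, 3, 224, 224) float32 input
# and one (1, 1000) float32 output.
io_bytes = io_used_memory(
    inputs=[((1, 3, 224, 224), "float32")],
    outputs=[((1, 1000), "float32")],
)
print(io_bytes)                          # 606112
print(discount_io(1_000_000, io_bytes))  # 393888
```

Whether a plain subtraction is appropriate depends on whether the IO tensors are actually live at the callsite in question, so a real consumer would need to check that before applying it.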
@lhutton1 lhutton1 force-pushed the annotate-mem-usage branch from 9e6db41 to 89f7523 Compare June 24, 2022 16:21
@manupak manupak merged commit 6d6e070 into apache:main Jun 25, 2022

manupak commented Jun 25, 2022

Thanks @lhutton1 @altanh @areusch ! This is merged now

@lhutton1 lhutton1 deleted the annotate-mem-usage branch June 25, 2022 18:30
blackkker pushed a commit to blackkker/tvm that referenced this pull request Jul 7, 2022
…pache#11208)

* [AOT] Calculate used memory at the callsite of primitive functions

Introduces a new pass in the AOT executor called "AnnotateUsedMemory"
which applies liveness analysis to the callsite of each primitive
function in order to calculate the total size of the live tensors at
this point of execution. The result is provided as a function annotation
called "used_memory", which can be consumed by later stages of the
compiler (e.g. external codegens) to provide more information about the
current memory consumption. This can be useful for some optimizations.

Change-Id: I8d6b7447498f19260358bbefe34029ddd86b9c89

* small fix to file description

Change-Id: I0e460f6cf43f9b12ffa5fc66fcb68e55304daeb2

* Various improvements addressing comments

In addition, a new "io_used_memory" annotation is added to the main
function which refers to the total size of the IO tensors in the
provided module, enabling these to be discounted from memory pressure
calculations where necessary.

Change-Id: Iafe9c85d7fc69c77a2115ed4efe7645160387c86

* addressing comments

Change-Id: I00f5ba80d5e004076e4c27d39bec143178b3b1dd

* add note for dynamic shapes

Change-Id: If6409e2953addfc880bcc6d95083b78bdf5a23d0
@zhaoyang-star

Hi @lhutton1, thanks for your contribution.
After running the FuseOps pass, I want to get the memory usage per op or per primitive function from the AnnotateUsedMemory pass for future optimization. I took a resnet18 IR model and used it as the input IRModule of the AnnotateUsedMemory pass. The output IRModule has no used_memory attr. Test code as follows:

import pytest
from collections import OrderedDict
import numpy as np
import tvm
from tvm import relay
from tvm.relay import testing


def AnnotateUsedMemory():
    return relay.transform._ffi_api.AnnotateUsedMemory()


def _get_data(in_data_shapes, dtype="float32"):
    in_data = OrderedDict()
    for name, shape in in_data_shapes.items():
        in_data[name] = np.random.uniform(size=shape).astype(dtype)
    return in_data


def _run_relay(mod, params, in_data, pass_enabled):
    target = "llvm"
    dev = tvm.device("llvm", 0)
    in_data = [tvm.nd.array(value) for value in in_data.values()]

    if pass_enabled:
        mod = relay.transform.InferType()(mod)
        mod = relay.transform.ToANormalForm()(mod)
        mod = relay.transform.InferType()(mod)
        mod = AnnotateUsedMemory()(mod)
        # create primitive functions
        mod = relay.transform.FuseOps()(mod)

    print(f'\nmod when AnnotateUsedMemory is {pass_enabled}:\n {mod}')

    out_data = relay.create_executor(
        "graph", mod, device=dev, target=target).evaluate()(*in_data, **params)
    return out_data.numpy()


def _verify_results(mod, params, in_data, rtol=1e-5, atol=1e-5):
    before = _run_relay(mod, params, in_data, False)
    after = _run_relay(mod, params, in_data, True)
    np.testing.assert_allclose(before, after, rtol, atol)


def test_resnet():
    num_class = 1000
    in_data_shapes = OrderedDict({"data": (1, 3, 224, 224)})
    in_data = _get_data(in_data_shapes, dtype="float32")
    for n in [18]:  # 18, 34, 50, 101
        mod, params = tvm.relay.testing.resnet.get_workload(
            batch_size=1, num_classes=num_class, num_layers=n)
        _verify_results(mod, params, in_data)


if __name__ == "__main__":
    pytest.main([__file__])

I am not familiar with the AnnotateUsedMemory pass. Can the memory usage per op or per primitive function be obtained with your pass? If not, how can I get it based on your pass? Thanks in advance ^_^

@lhutton1

Hi @zhaoyang-star, thanks for taking a look, it's great to see this pass being used elsewhere. The pass currently expects the input to be a module of primitive functions, so I would suggest running AnnotateUsedMemory after FuseOps, similar to:

mod = relay.transform.InferType()(mod)
mod = relay.transform.FuseOps()(mod)
mod = relay.transform.InferType()(mod)
mod = relay.transform.ToANormalForm()(mod)
mod = relay.transform.InferType()(mod)
mod = AnnotateUsedMemory()(mod)

I did try running your example locally with the above change and this produced the relevant used_memory annotations. However, it looks like there is an issue while building the module after having run the AnnotateUsedMemory pass. Without digging too much into it I would suspect it's because this pass wasn't considered for the graph executor; only for the AOT executor. I believe changes similar to #11091 would be needed in the graph executor to support A-normal form. Hope this helps :)

zhaoyang-star commented Oct 17, 2022

@lhutton1, I want to confirm: did you reproduce the issue (no used_memory attr in the output log) using my script above? If it ran correctly for you, could you please share your script? After running my script there is only one io_used_memory attr and no used_memory attr.

If I place FuseOps before AnnotateUsedMemory as you showed, there is an error: Check failed: (tensor_type) is false:. You mentioned that we may need to support ANF in the graph executor to solve this error.

@lhutton1

Hi @zhaoyang-star, yes, I was able to reproduce the issue with your script. The script I have is the same as yours, just with a different pass order as mentioned above. Placing FuseOps before AnnotateUsedMemory seems like the correct thing to do here; if you print out the module (mod) after the AnnotateUsedMemory pass you should be able to see the used_memory annotations. The Check failed: (tensor_type) is false: error comes later in the compilation, so it seems as though some later optimization passes cannot deal with ANF yet.
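
For anyone wanting to check the printed module programmatically rather than by eye, a small text scan is one option. The sketch below is plain Python and does not call TVM; `find_used_memory` and `SAMPLE_MODULE_TEXT` are hypothetical names, and the sample is a hand-written approximation of how the annotations are assumed to appear in the text printed by `print(mod)` (a list such as `used_memory=[32]` on primitive functions and a single integer `io_used_memory=32` on main). The real output may differ between TVM versions.

```python
import re

# Matches "used_memory=[...]" or "io_used_memory=<int>" in the printed module text.
# The exact printed format is an assumption; adjust the pattern if your output differs.
ANNOTATION_RE = re.compile(r"\b(io_used_memory|used_memory)=(\[[^\]]*\]|\d+)")


def find_used_memory(module_text):
    """Collect the integer values of every memory annotation found in the text."""
    found = {"used_memory": [], "io_used_memory": []}
    for key, value in ANNOTATION_RE.findall(module_text):
        found[key].append([int(n) for n in re.findall(r"\d+", value)])
    return found


# Hand-written sample, standing in for str(mod) after the pass has run.
SAMPLE_MODULE_TEXT = """
def @main(%input: Tensor[(1, 2, 2, 4), int8], io_used_memory=32) -> Tensor[(1, 2, 2, 4), int8] {
  let %x_0 = fn (%x: Tensor[(1, 2, 2, 4), int8], Primitive=1, used_memory=[32]) -> Tensor[(1, 2, 2, 4), int8] {
    nn.max_pool2d(%x, pool_size=[1, 1], padding=[0, 0, 0, 0])
  };
  %x_0(%input)
}
"""

annotations = find_used_memory(SAMPLE_MODULE_TEXT)
print(annotations["used_memory"])     # [[32]]
print(annotations["io_used_memory"])  # [[32]]
```

An empty `used_memory` list with a populated `io_used_memory` entry would match the symptom described earlier in this thread, where the pass ran before FuseOps had created any primitive functions.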

mikeseven pushed a commit to mikeseven/tvm that referenced this pull request Sep 27, 2023