
[RFC] Symbolic shape runtime #2451

Closed
antinucleon opened this issue Jan 17, 2019 · 17 comments

@antinucleon
Contributor

antinucleon commented Jan 17, 2019

Problem: In real-world workloads, not everything is static. We may want to run inference on images of different sizes, or with different batch sizes. In some workloads, we may need to concatenate a different number of embeddings for each instance.

Design: Introduce an upper_bound idea for memory allocation, and use a different view for different inputs. For example, if the inference input shape is [n, 3, 224, 224] and we give a hint that n's upper bound is 5, then the initial memory allocation is based on [5, 3, 224, 224], but we may create views of [1, 3, 224, 224] or [3, 3, 224, 224] while running. In this way we trade some memory for dynamism.
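A minimal numpy sketch of the view-over-upper-bound idea (illustrative only; the actual runtime works on its own storage pool, not numpy arrays):

import numpy as np

upper_bound = 5
# One-time allocation sized for the upper bound.
storage = np.empty((upper_bound, 3, 224, 224), dtype="float32")

def get_view(n):
    # Return a view into the pre-allocated buffer; no new allocation happens.
    assert 1 <= n <= upper_bound
    return storage[:n]

batch1 = get_view(1)   # shape (1, 3, 224, 224)
batch3 = get_view(3)   # shape (3, 3, 224, 224)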

Implementation:

  1. During memory planning, given a map of {var : upper_bound}, the memory planning process creates two columns, [storage_id, max_bytes], for runtime setup.

  2. The shape in the graph JSON is no longer an int, but an infix string expression of variables and constants.

  3. When setting up the runtime, the storage pool is allocated from the [storage_id, max_bytes] columns. The infix shape expressions are converted to postfix expressions for fast evaluation. When a variable in a shape changes, the runtime evaluates all expressions containing that variable, then updates the view if it is not already in the cache (see the small sketch below).
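A hypothetical sketch of the postfix evaluation step (the function name and token handling are illustrative, not the actual runtime code):

def eval_postfix(tokens, var_values):
    # tokens is a postfix expression, e.g. ["n", "3", "*"] for the infix "n*3"
    stack = []
    for tok in tokens:
        if tok in ("+", "*"):
            b, a = stack.pop(), stack.pop()
            stack.append(a + b if tok == "+" else a * b)
        elif tok.isdigit():
            stack.append(int(tok))         # constant
        else:
            stack.append(var_values[tok])  # shape variable, e.g. n
    return stack[0]

print(eval_postfix(["n", "3", "*"], {"n": 4}))  # 12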

PR: #2447

@tqchen
Member

tqchen commented Jan 17, 2019

cc @jroesch @dmlc/tvm-comitter

I think we also need to discuss the scope of this proposal. The current proposal seems good for doing dynamic shapes in cases like object detection; it is a simple but effective generalization of the current version.

We also need to discuss how to handle the kernels when shape changes are present.

@nhynes
Member

nhynes commented Jan 17, 2019

+1 to the general idea of this proposal. We should also consider making the max_bytes optional and (for now) adding stubs for delegating to the device allocator just in case the user needs more flexibility.

One example of when allocation might be necessary is during online learning. If the user doesn't fully anticipate the dataset, a large input could come in and crash the system (which is a Bad Thing™). This becomes even more relevant when we consider the complex programs (like loops and graphs) that can be executed by Relay.

@eqy
Contributor

eqy commented Jan 17, 2019

Does upper_bound mean that anything between 1 and upper_bound is likely to be used, or is it more often the case that we only care about a few values? If we want kernels to actually specialize for the difference in values (e.g., actually batching inference instead of running batch-1 n times), then this choice will affect how much time we need to spend optimizing different kernel shapes.

@antinucleon
Contributor Author

@eqy upper_bound is a hint for storage planning. With this hint, we can safely allocate a memory chunk large enough to avoid reallocating when the shape changes within [1, upper_bound].

So far, when I pass a `tvm.var` to generate a kernel I have no issues, because this axis is just a loop. For specialization within [1, upper_bound], I think it is not a hard problem if we have schedules and hints.

@antinucleon
Contributor Author

@nhynes In the current PR, max_bytes is optional if all shapes are static. This solution can solve a huge range of problems in an efficient way. For example, in production the bound is determined by the developer, not the users. For on-the-fly dynamic execution, we can't use the current graph runtime.

@antinucleon
Contributor Author

It has been a while and I hope to hear from you about the plan for this PR. I also suggest updating the roadmap regularly so everyone knows what is going on with all future features.

@FrozenGene
Member

As discussed with @antinucleon in PR #2488, I suggest we extend the functionality to support automatically growing the memory allocation when the input size is larger than the hint. The growth factor could be 1.5, like Folly's vector.
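A rough sketch of the suggested growth policy (illustrative only, not part of the PR): when a request exceeds the current pool size, grow it by a factor of 1.5 until it fits.

def grown_size(current_bytes, requested_bytes, factor=1.5):
    # Grow the pool geometrically so that repeated growth stays amortized.
    new_bytes = max(current_bytes, 1)
    while new_bytes < requested_bytes:
        new_bytes = int(new_bytes * factor)
    return new_bytes

print(grown_size(1024, 3000))  # 3456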

@kevinthesun
Contributor

kevinthesun commented Jan 24, 2019

Memory planning is a great starting point! To achieve dynamic batch inference, we might want to select kernels for different batch sizes to get decent performance compared to sequential inference. This depends on the way we support multi-batch kernel schedules. Setting some buckets between [1, upper_bound] might be a possible way. It's also possible to optimize for some batch sizes, such as powers of 2, and then combine them to generate other batches (see the sketch below). @yuruofeifei and I can start to look at this problem.
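An illustrative sketch of that bucketing idea (hypothetical helper names, not an actual TVM API): compile kernels only for power-of-2 batch sizes and decompose an arbitrary batch into those buckets.

def pow2_buckets(upper_bound):
    # Power-of-2 batch sizes up to the upper bound, e.g. [1, 2, 4, 8, 16, 32].
    buckets, b = [], 1
    while b <= upper_bound:
        buckets.append(b)
        b *= 2
    return buckets

def decompose_batch(n, buckets):
    # Greedily cover a batch of size n with the available bucket sizes.
    parts = []
    for b in sorted(buckets, reverse=True):
        while n >= b:
            parts.append(b)
            n -= b
    return parts

print(decompose_batch(13, pow2_buckets(32)))  # [8, 4, 1]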

@antinucleon
Contributor Author

@FrozenGene Because we don't want to introduce shape inference into the runtime, simply expanding the storage pool linearly like a vector could introduce bugs. If we want dynamic growth, we have to encode the storage pool shape expressions in the JSON file.

@antinucleon
Contributor Author

It has been two weeks. Please let us know what else we should change to land this PR. Thanks.

@tqchen
Member

tqchen commented Jan 31, 2019

Let us wait until after the 0.5 release and then revisit the proposal. The current proposal seems to be a good intermediate-term solution for some of the problems.

@antinucleon can you post some end-to-end examples using the new set of changes to give everyone a sense of what it can do? I am still not very sure whether we propose to use bucketing or generic shape sizes. Perhaps @ajtulloch @jroesch can also comment.

@ZihengJiang
Contributor

ZihengJiang commented Jan 31, 2019

The RFC addresses how to support symbolic shapes in terms of memory planning. We should also come up with a way to generate efficient dynamic kernels.

@antinucleon
Contributor Author

antinucleon commented Feb 1, 2019

Let's decompose this problem.

In order to support dynamic shape, we have to solve two problems:

  1. Memory management.

  2. Kernel support.

Memory management determines whether a program can run at all, and may also affect the efficiency of the runtime. Kernel support, however, only determines efficiency.

If you agree with this, we can further decompose what blocks us: memory management.

There are two ways to handle dynamic memory: allocate statically and create many views on the static memory, or allocate dynamically; we can call these AOT and JIT respectively. Ideally, if we have a very powerful allocator, the two ways are equivalent.

In this RFC we are going to use the AOT approach.
To know how much memory we need to allocate for each storage pool node, we also have two approaches: 1. Give a hint, then get a number. 2. Pre-compute an expression for each pool node's size. To get it working quickly, this PR uses a max_bytes hint to get a fixed size, then creates views. Note that if we can get the expression of a storage node, a JIT allocator will be much easier too.

At this point, we already have solutions to deal with memory management. Let's come back to efficiency problem: kernels.

  1. Assume we are doing tensorization + vectorization + loop partition; we will have a good default kernel (most human-written code today works this way).

  2. Assume that for each workload we have a different kernel. If during compilation we can have a map to the compiled functions, we can handle it when creating views (see view_cache_ and the small sketch below).
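A hypothetical sketch of a view cache keyed by the concrete shape, mirroring the role of view_cache_ mentioned above (class and field names are illustrative, not the runtime's actual data structures):

import numpy as np

class ViewCache:
    def __init__(self, storage):
        self.storage = storage  # pre-allocated max-size, flat buffer
        self.cache = {}         # concrete shape tuple -> view

    def get(self, shape):
        # Create the view once per concrete shape, then reuse it.
        if shape not in self.cache:
            n = int(np.prod(shape))
            self.cache[shape] = self.storage[:n].reshape(shape)
        return self.cache[shape]

pool = np.empty(5 * 3 * 224 * 224, dtype="float32")
views = ViewCache(pool)
v = views.get((2, 3, 224, 224))  # created once, reused on later calls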

----------------------------This is a separation line------------------------

End to end changes:

Python API:

b1 = tvm.var("n1")
b2 = tvm.var("n2")
b3 = tvm.var("n3")

shape_var_bounds = {
    b1 : 128,
    b2 : 32,
    b3 : 12
}



x = relay.var("x",
              shape=[b1,
                     4,
                     2, 3], dtype="float32")


y = relay.var("y",
              shape=[b2,
                     4,
                     2, 3], dtype="float32")

z = relay.op.tensor.concatenate([x, y], axis=0)


a = relay.var("a",
              shape=[b3,
                     4,
                     2, 3], dtype="float32")
b = relay.var("b",
              shape=[27,
                     4,
                     2, 3], dtype="float32")
c = relay.op.tensor.concatenate([a, b], axis=0)

out = relay.op.tensor.concatenate([z, c], axis=0)

func = relay.Function([x, y, a, b], out)


graph, mod, param = relay.build_module.build(func, target="llvm", shape_var_bounds=shape_var_bounds)

print(graph)

rt = tvm.contrib.graph_runtime.create(graph, mod, tvm.cpu())
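A hedged usage sketch of feeding differently sized batches at runtime (assuming, per the description above, that the runtime re-evaluates the shape expressions and switches views when inputs of a new shape are set; this is not taken from the PR itself):

import numpy as np

rt.set_input("x", np.random.rand(100, 4, 2, 3).astype("float32"))  # n1 = 100 <= 128
rt.set_input("y", np.random.rand(16, 4, 2, 3).astype("float32"))   # n2 = 16  <= 32
rt.set_input("a", np.random.rand(7, 4, 2, 3).astype("float32"))    # n3 = 7   <= 12
rt.set_input("b", np.random.rand(27, 4, 2, 3).astype("float32"))
rt.run()
out = rt.get_output(0)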

In graph json, three attributes are added:

"symbolic_shape": 1,
"max_bytes": [
      "list_int",
      [
        12288,
        3072,
        1152,
        2592,
        19104
      ]
    ],
    "var_upper_bound": {
      "n1": 128,
      "n2": 32,
      "n3": 12
    },

At runtime, if symbolic_shape is true, memory allocation is done according to the max_bytes field.
When a shape variable changes, a new view is created.

This change is good for dynamic batching, as long as the kernel's loop over the batch axis is not significantly slower.

@kevinthesun
Contributor

kevinthesun commented Feb 1, 2019

For dynamic kernels, one straightforward way is to include all possible kernel functions in the shared lib. However, this can result in too many functions in the lib file. We might want an approach that generates fewer kernel functions, for example by including only the more common batch sizes, while providing fair performance for the other cases.

@zhiics
Member

zhiics commented Feb 1, 2019

I am just curious whether anybody has insight into the most frequently used batch sizes for a given model. Is it just a few, or many different possible ones? And how much memory would it cost?

My concern is that it would be much harder to determine max_bytes, and the number of frequently used sizes would become significant when the shape varies as well. In addition, I am also curious how much it would affect the size of the current graph runtime.

@antinucleon
Contributor Author

@zhiics Ideally, JIT plus a very good dynamic memory allocator would be the most elegant solution. However, that plus dynamic control flow is not something we can use quickly. For server-side inference, where memory is not very sensitive, we can use the current solution.

The graph runtime size does not change much, as we only introduced a tiny expression evaluator for symbolic shapes. The new size of libtvm_runtime.dylib is 673 KB, with dynamic shape and view cache; the old one is 597 KB.

@tqchen
Member

tqchen commented Oct 8, 2019

Superseded by the Relay VM solution; closing for now.

@tqchen tqchen closed this as completed Oct 8, 2019