[RFC] Symbolic shape runtime #2451
Comments
cc @jroesch @dmlc/tvm-comitter I think we also need to discuss the scope of this proposal. The current proposal seems good for doing dynamic shape in the case of an object detector: a simple but effective generalization of the current version. We also need to discuss how to handle the kernels when shape changes are present.
+1 to the general idea of this proposal. We should also consider making the … One example of when allocation might be necessary is during online learning. If the user doesn't fully anticipate the dataset, a large input could come in and crash the system (which is a Bad Thing™). This becomes even more relevant when we consider the complex programs (like loops and graphs) that can be executed by Relay.
Does …
@eqy So far when I pass …
@nhynes In the current PR, …
It has been a while, and I hope I can hear from you about the plan for this PR. I also suggest updating the roadmap on time so everyone knows what is going on with all future features.
As discussed with @antinucleon in PR #2488, I advise extending the functionality to support automatically growing the memory allocation when the input size is larger than the hint. The growth factor could be 1.5, like Folly's vector (a sketch follows).
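A minimal sketch of the growth policy suggested here, assuming one resizable byte buffer per storage-pool entry; the class and method names are illustrative, not from the PR:

```python
class GrowablePoolEntry:
    """One storage-pool entry that reallocates geometrically on overflow."""

    GROWTH_FACTOR = 1.5  # the factor suggested above, like Folly's fbvector

    def __init__(self, initial_bytes):
        self.capacity = initial_bytes
        self.buffer = bytearray(initial_bytes)

    def ensure(self, needed_bytes):
        # Fast path: the upper_bound hint was large enough, no reallocation.
        if needed_bytes <= self.capacity:
            return self.buffer
        # Grow to max(needed, 1.5 * current) so a sequence of slightly
        # larger inputs triggers O(log n) reallocations, not one per input.
        new_capacity = max(needed_bytes, int(self.capacity * self.GROWTH_FACTOR))
        new_buffer = bytearray(new_capacity)
        new_buffer[: self.capacity] = self.buffer  # preserve existing contents
        self.buffer, self.capacity = new_buffer, new_capacity
        return self.buffer
```

Note this is exactly what @antinucleon pushes back on below: growing at runtime implies the runtime can recompute how many bytes each entry needs, which requires encoding the storage-pool shape expressions in the json.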
Memory planning is a great starting point! To achieve dynamic batch inference, we might want to select kernels for different batch sizes, to get decent performance compared to sequential inference. This depends on the way we support multi-batch kernel schedules. Setting some buckets between [1, upper_bound] might be a possible way (see the sketch below). It's also possible to optimize for particular batch sizes, such as powers of 2, and then combine them to cover other batch sizes. @yuruofeifei and I can start to look at this problem.
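For illustration, a hedged sketch of the two bucketing strategies mentioned here; the helper names are hypothetical, and real kernel selection would happen inside the runtime:

```python
def make_buckets(upper_bound):
    # Compile kernels only for power-of-2 batch sizes up to upper_bound.
    buckets, b = [], 1
    while b <= upper_bound:
        buckets.append(b)
        b *= 2
    return buckets

def pad_to_bucket(batch, buckets):
    # Strategy 1: run one kernel for the smallest bucket >= batch;
    # simple, but wastes compute on the padded rows.
    return min(b for b in buckets if b >= batch)

def decompose(batch, buckets):
    # Strategy 2: combine optimized kernels whose bucket sizes sum to
    # the requested batch, e.g. 7 -> [4, 2, 1]; no wasted compute, but
    # several kernel launches.
    parts = []
    for b in reversed(buckets):
        while batch >= b:
            parts.append(b)
            batch -= b
    return parts

assert pad_to_bucket(7, make_buckets(8)) == 8
assert decompose(7, make_buckets(8)) == [4, 2, 1]
```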
@FrozenGene Because we don't want to introduce shape inference at runtime, simply expanding the storage pool linearly like a vector would introduce potential bugs. If we want dynamic growth, we have to encode the storage-pool shape expressions in the json file.
It has been two weeks. Please let us know what else we should change to land this PR. Thanks.
Let us wait until after the 0.5 release and then revisit the proposal. The current proposal seems to be a good intermediate-term solution for some of the problems. @antinucleon, can you post some end-to-end examples using the new set of changes to give everyone a sense of what it can do? I am still not very sure whether we propose to use bucketing or a generic shape size. Perhaps @ajtulloch @jroesch can also comment.
The RFC addresses how to support symbolic shape in terms of memory planning. We'd better also come up with a way to generate efficient dynamic kernels.
Let's decompose this problem. In order to support dynamic shape, we have to fix two problems: memory management and kernel support.
Memory management determines whether a program is able to run at all, and only then affects the efficiency of the runtime. Kernel support, however, only determines efficiency. If you agree with this, we can further decompose what blocks us: memory management. There are two ways to handle dynamic memory: allocate statically and create many views on the static memory, or allocate dynamically; we can call these AOT and JIT. Ideally, given a very powerful allocator, the two approaches are equivalent. In this RFC we take the AOT approach (see the sketch below). At this point we already have solutions for memory management, so let's come back to the efficiency problem: kernels.
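A minimal illustration of the AOT approach described above, using NumPy rather than the TVM runtime: allocate one buffer for the largest shape once, then serve smaller inputs as views into it, so nothing is allocated while the model runs.

```python
import numpy as np

UPPER_BOUND = 5
# One-time allocation at the upper bound; no allocator is consulted again.
pool = np.empty((UPPER_BOUND, 3, 224, 224), dtype="float32")

def view_for_batch(n):
    assert n <= UPPER_BOUND, "input exceeds the upper_bound hint"
    return pool[:n]  # a view: shares storage with `pool`, no new allocation

small = view_for_batch(1)   # shape (1, 3, 224, 224)
large = view_for_batch(5)   # shape (5, 3, 224, 224)
assert small.base is pool and large.base is pool
```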
----------------------------This is a separation line------------------------
End to end changes:
Python API: …
In the graph json, three attributes are added: …
At runtime, when a shape variable changes, the affected views are updated. This change is good for dynamic batch, as long as a kernel looping over the batch axis is not significantly slower.
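To make the end-to-end flow concrete, a hypothetical usage sketch: `graph_runtime.create`, `set_input`, `run`, and `get_output` are the standard graph-runtime API of that era, but the symbolic-shape behavior in the comments is this proposal's, not mainline TVM's (the PR was later superseded by the Relay VM). The `graph_json`, `loaded_lib`, and `params` variables are assumed to come from a prior build.

```python
import numpy as np
import tvm
from tvm.contrib import graph_runtime

# Assumes a build where the input shape was declared as (n, 3, 224, 224)
# with n's upper bound hinted as 5.
module = graph_runtime.create(graph_json, loaded_lib, tvm.cpu(0))
module.load_params(params)

for n in (1, 3, 5):
    # The storage pool was sized once for n = 5; feeding a smaller batch
    # only re-evaluates the shape expressions and switches views.
    data = np.random.uniform(size=(n, 3, 224, 224)).astype("float32")
    module.set_input("data", data)
    module.run()
    out = module.get_output(0)
```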
For dynamic kernels, one straightforward way is to include all possible kernel functions in the shared lib. However, this can result in too many functions in the lib file. We might want a way to generate fewer kernel functions, for instance including only the more common batch sizes, while providing fair performance for the other cases.
I am just curious whether anybody has insight into the most frequently used batch sizes for a given model. Is it just a few, or many different possibilities? And how much memory would it cost? My concern is that it would be much harder to determine the …
@zhiics Ideally, JIT plus a very good dynamic memory allocator would be the most elegant solution. However, that plus dynamic control flow is not something we can put to use quickly. For server-side inference, where memory is not very sensitive, we can use the current solution. The graph runtime size is not changed much, as we only introduced a tiny expression evaluator for symbolic shapes. The new size of …
Superseded by the Relay VM solution; closing for now.
Problem: In real-world workloads, not everything is static. We may want to run inference on images of different sizes, or with different batch sizes. In some workloads, we may need to concatenate a different number of embeddings for each instance.
Design: Introduce the `upper_bound` idea for memory allocation and use different views for different inputs. For example, if the inference input shape is [n, 3, 224, 224] and we give a hint that n's upper bound is 5, then the initial memory allocation is based on [5, 3, 224, 224], but we may create views of [1, 3, 224, 224] or [3, 3, 224, 224] while running. In this way we trade some memory for dynamism.
Implementation:
- During memory planning, given a map of `{var : upper_bound}`, the memory-planning process creates two columns, `[storage_id, max_bytes]`, for runtime setup.
- The shape in the graph json is no longer an int, but an infix string expression of variables and constants.
- When the runtime is set up, the storage pool is allocated from the `[storage_id, max_bytes]` columns. The infix shape expression is converted to a postfix expression for fast evaluation. When a variable in a shape changes, the runtime evaluates all expressions containing that variable, then updates each affected view if it is not already in the cache (see the sketch below).
PR: #2447
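To make the expression-evaluation step concrete, a minimal sketch of converting an infix shape expression to postfix once, then re-evaluating it cheaply whenever a variable changes; the helper names are illustrative, not the PR's actual code.

```python
def to_postfix(tokens):
    # Shunting-yard over +, -, *, / with parentheses; variables and
    # integer constants pass straight through to the output.
    prec = {"+": 1, "-": 1, "*": 2, "/": 2}
    out, ops = [], []
    for tok in tokens:
        if tok in prec:
            while ops and ops[-1] in prec and prec[ops[-1]] >= prec[tok]:
                out.append(ops.pop())
            ops.append(tok)
        elif tok == "(":
            ops.append(tok)
        elif tok == ")":
            while ops[-1] != "(":
                out.append(ops.pop())
            ops.pop()
        else:
            out.append(tok)
    while ops:
        out.append(ops.pop())
    return out

def eval_postfix(postfix, env):
    # A single stack pass; cheap enough to rerun every time a shape
    # variable changes, with no shape inference in the runtime.
    stack = []
    for tok in postfix:
        if tok in ("+", "-", "*", "/"):
            b, a = stack.pop(), stack.pop()
            stack.append({"+": a + b, "-": a - b, "*": a * b, "/": a // b}[tok])
        else:
            stack.append(int(tok) if tok.isdigit() else env[tok])
    return stack[0]

# e.g. a dimension stored in the json as "n * 3", with n currently 4:
postfix = to_postfix(["n", "*", "3"])     # -> ["n", "3", "*"]
assert eval_postfix(postfix, {"n": 4}) == 12
```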