[TVM] Fix GatherBound to avoid allocating too much #2104
Conversation
Force-pushed from a4d3592 to 65967a1
@junrushao1994 @were can you help review this PR?
Will do this Wednesday.
The problem here is non-trivial.
Hi, I'd like to share some results of an investigation into the BoundsChecker problem #2079. I think it is closely related to the current issue. Both examples below show cases where the problem clearly exists, but BoundsChecker cannot be used effectively. Both programs perform operations on an intermediate tensor.

Uninitialized data passed to computations

In this case we use
In this case TVM generates the code below. Note the correct allocation and

Uninitialized data + fallacious allocations

In this version we make the
As before,

Results

These examples demonstrate how a typo from the user may remain undetected and lead to bugs which are hard to debug. We see that uninitialized data is passed to computations in both examples. It may spread NaNs or even become a security concern. In the last example TVM decides to alter the size of the intermediate tensor. In general, it is clear that the compiler should be able to reduce tensors during optimizations, i.e. in
So far we have seen only negative results of this feature. @tqchen @junrushao1994, could you please name some cases where tensor expansion is a desired behavior?
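The hazard described above (a producer writing only part of a buffer while a consumer with a mistyped index expression reads all of it) can be sketched in plain Python. This is my own illustration, not TVM-generated code; the sentinel NaN stands in for whatever garbage an uninitialized allocation would hold:

```python
# Hypothetical sketch (plain Python, not TVM) of the failure mode:
# the consumer's inferred index range is wider than the region the
# producer actually writes, so never-initialized values leak into
# the result.
SENTINEL = float("nan")  # stand-in for uninitialized memory contents

n = 10
buf = [SENTINEL] * n       # "allocated" intermediate tensor
for i in range(5):         # producer fills only part of it
    buf[i] = float(i * i)

# Consumer (with a typo'd index expression) reads the whole buffer;
# the NaN silently propagates into the final result.
total = sum(buf)
print(total != total)      # True: the result is NaN
```

In a real uninitialized allocation the garbage values need not be NaN, which is exactly why such bugs can go undetected instead of failing loudly.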
Superseded by #3526
This is a fix for the problem mentioned here on the forum.
During compilation TVM sometimes reshapes tensors using information from their uses. This is a very important feature because a schedule may introduce a new tensor (e.g. via cache_write) but use only a handful of its elements, in which case this transformation can reduce the tensor's size and remove unnecessary computations of unused elements. However, the same transformation sometimes expands tensor sizes. This happens because computing precise ranges for expressions is impossible in general, and the evaluation often overapproximates. In the example from the forum the expression is i + (0 - i), for which the range is evaluated to be [-9; 9] because the range evaluation algorithm assumes that the subexpressions i and -i are independent. Of course, this particular example could be simplified away, but it is always possible to construct an example that the simplifier won't be able to handle.

The solution I propose is to compare the range computed from the uses with the original range from the tensor declaration, and use the smaller one. There are several concerns though:

- test_bound_nest_thread from test_schedule_bound_inference.py fails (it uses a global variable m as a tensor size). test_bound_nest_thread may also be made to fail by fixing m to any integer less than 32. The failure is due to additional checks at message_passing.cc:36 and schedule_ops.cc:184, and I'm not sure they are really necessary.
- test_schedule_bound_condition fails because with this fix it cannot reproduce the original problem anymore ([SCHEDULE] Generate Lower Bound Conditions #1014). I removed it for now.

(I also considered another solution, namely intersecting the two ranges. In theory this should be perfect, but in practice it produces overly complex expressions, which eventually lead to messages like this:
loop_partition.cc:356: Cannot prove: ((((((1 + min((c.outer.h.fused/112), 7)) - max((c.outer.h.fused/112), 0)) - 1) - (8 - max((c.outer.h.fused/112), 0))) + 1) >= 0), when generating the post doubt loop
and also to errors like this:
TVMError: [17:49:02] split_host_device.cc:116: Check failed: !use_count_.count(v) variable c.outer.h.fused has been used before definition!
The reason is that some local variables leak into tensor bounds, which are then used to generate certain conditions.)
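The overapproximation and the proposed fix can be sketched in a few lines of Python. This is my own illustration of the general technique, not TVM's implementation: naive interval arithmetic treats the operands of a sum as independent, so the correlation between i and -i is lost, and the fix is simply to prefer whichever range (inferred-from-uses vs. declared) has the smaller extent:

```python
# Naive interval arithmetic: ranges are inclusive (lo, hi) pairs.
def add_range(a, b):
    """Range of x + y with x in a and y in b, assuming independence."""
    return (a[0] + b[0], a[1] + b[1])

def neg_range(a):
    """Range of -x with x in a."""
    return (-a[1], -a[0])

def extent(r):
    """Number of integer points covered by an inclusive range."""
    return r[1] - r[0] + 1

i = (0, 9)                          # i ranges over [0, 9]
inferred = add_range(i, neg_range(i))  # range of i + (0 - i)
print(inferred)                     # (-9, 9), though i + (0 - i) == 0

# Proposed fix: compare against the declared range and keep the smaller.
declared = (0, 9)
chosen = inferred if extent(inferred) < extent(declared) else declared
print(chosen)                       # (0, 9): the declared range wins
```

Taking the smaller range (rather than intersecting) sidesteps the expression blow-up described in the parenthetical above, at the cost of being less precise than a true intersection.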
@tqchen Could you please provide some comments regarding this issue?