Improve separation of PyTorch dtypes and TVM dtypes in relay PyTorch frontend #5779
Conversation
So what I did here, in addition to what is seen, was to locally assert that no The next step will be to apply the data type of input tensors to scalar inputs, but I'll keep that for a separate PR.
Force-pushed from f01a143 to f66d9f0.
The quantized isn't there yet, sorry about that. I'll have a fix in a bit.
Improve separation of PyTorch dtypes and TVM dtypes in relay PyTorch frontend: Previously, we sometimes used type strings taken from PyTorch (e.g. float) in TVM. Now we aim to convert PyTorch dtypes early and only work with TVM dtypes afterwards. Also fix arange/linspace type semantics. A follow-up will then further improve type support (at which point we will be able to convert e.g. double models).

Force-pushed from f66d9f0 to 0c5601b.
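To make the direction of that commit message concrete, here is a minimal sketch of converting PyTorch dtypes to TVM dtype strings once, at the boundary. The helper name and the mapping table are illustrative only, not the frontend's actual code.

```python
import torch

# Illustrative mapping from PyTorch dtypes to TVM dtype strings
# (not the frontend's real table; shown only to sketch the idea).
_TORCH_TO_TVM_DTYPE = {
    torch.float16: "float16",
    torch.float32: "float32",
    torch.float64: "float64",
    torch.int32: "int32",
    torch.int64: "int64",
    torch.bool: "bool",
}

def to_tvm_dtype(torch_dtype):
    """Convert a torch.dtype to a TVM dtype string, early and exactly once."""
    try:
        return _TORCH_TO_TVM_DTYPE[torch_dtype]
    except KeyError:
        raise NotImplementedError(f"Unsupported PyTorch dtype: {torch_dtype}")

# From this point on, only TVM dtype strings such as "float32" should
# circulate, never PyTorch type strings such as "float".
print(to_tvm_dtype(torch.float64))  # "float64"
```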
Now it's all happy. :)
It seems you are allowing If we use Torch IR input names, users need to manually inspect the IR and somehow remember these names. The tricky part is that Torch sometimes changes these input names when copying or saving/loading the same modules. So in the end what TVM expects as input names can be different from what users see as inputs to the Torch IR. To work around this, we decided not to use names chosen by Torch and instead let users choose and supply input names (something obvious like input0, input1 that don't require remembering) as part of
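For context on the renaming issue: the Torch-chosen names can be inspected directly on the TorchScript graph. The snippet below is a small illustration with a toy module (not code from this PR) showing where the debug names live and why a save/load round trip may not preserve what the user originally saw.

```python
import io
import torch

class Toy(torch.nn.Module):
    def forward(self, x):
        return x + 1

traced = torch.jit.trace(Toy(), torch.rand(2, 2))
# The names Torch chose (the first entry is the module's own "self" input).
print([inp.debugName() for inp in traced.graph.inputs()])

# After a save/load round trip the debug names are not guaranteed to be the
# ones the user saw before, which is why relying on them can surprise users.
buffer = io.BytesIO()
torch.jit.save(traced, buffer)
buffer.seek(0)
reloaded = torch.jit.load(buffer)
print([inp.debugName() for inp in reloaded.graph.inputs()])
```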
Note that I don't remove the possibility to pass in names. As the thread suggests, people will find that useful. I'm not sure why you would have to insist on passing them if the user is fine with the TorchScript-provided ones. I'm not taking away passing input names, I just soften the mandate. Passing the shapes should rarely be needed, and I am surprised that you would need the user to do that. Ignoring the dtypes of the inputs is actively terrible. How about doing the following:
If you insist, I could also live with just the last three.
The other part is that splitting at a potential
If you want to use names chosen by Torch, how are you going to figure out the correct names to give to TVM at deploy time? The names are the ones attached to the graph after this line https://github.com/apache/incubator-tvm/blob/master/python/tvm/relay/frontend/pytorch.py#L2504, rather than the graph you supply to the frontend. You also need to remember whatever names Torch chooses until deploy time, since TVM doesn't export input names but they are needed to correctly set inputs. I think most of the time the names don't change. Previously we were using names chosen by Torch, but for the corner-case reasons discussed in the thread above, we decided it is better to let users choose whatever names they like (ones that don't require remembering).
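To illustrate why the names have to survive until deploy time, here is a generic TVM sketch (not taken from this PR; it uses the current graph_executor module name, which has varied across TVM versions). Whatever input name was recorded at compile time is the name needed again when setting inputs.

```python
import numpy as np
import tvm
from tvm import relay
from tvm.contrib import graph_executor

# A toy Relay function with a named input: whichever name the frontend
# recorded (user-chosen or Torch-chosen) is the one needed again below.
x = relay.var("input0", shape=(1, 4), dtype="float32")
mod = tvm.IRModule.from_expr(relay.Function([x], relay.nn.relu(x)))
lib = relay.build(mod, target="llvm")

dev = tvm.cpu(0)
m = graph_executor.GraphModule(lib["default"](dev))
# Inputs are set by name at deploy time; TVM does not export the names,
# so the user has to remember them from compile time.
m.set_input("input0", np.random.rand(1, 4).astype("float32"))
m.run()
print(m.get_output(0).numpy())
```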
Passing the pairs of (name, shape) is common across other frontends. We initially started with the same API as others, and we haven't found a good reason to deviate from it (we did change dict to list, since the argument order matters in Torch). Yes, Torch knows the shape when traced, but for scripted cases it doesn't. Relay itself is not ready for dynamic-shape inputs. For these reasons, we require input shapes to be passed explicitly. Since TVM users are supposed to know the input shape, I don't think it is a problem. Passing dtypes is not something we (not only PyTorch, but other frontends too) thought about, since we always assume float32 inputs. We can discuss how to integrate them. But most of the time inputs are fp32, so I don't want to introduce breaking API changes to allow a dtype option.
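For readers unfamiliar with the convention, the (name, shape) list being discussed looks roughly like this. This is a sketch with a toy module; note the keyword name of the second argument has changed across TVM versions.

```python
import torch
from tvm import relay

class Toy(torch.nn.Module):
    def forward(self, x):
        return torch.nn.functional.relu(x)

scripted = torch.jit.trace(Toy().eval(), torch.rand(1, 3, 224, 224))

# A list of (name, shape) pairs rather than a dict, because argument order
# matters in Torch; the name "input0" is chosen by the user, not by Torch.
input_shapes = [("input0", (1, 3, 224, 224))]
mod, params = relay.frontend.from_pytorch(scripted, input_shapes)
```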
Thank you for insisting on using stable names. The user-supplied(!) names are the part before the (last, ha, here only) The function ultimately doing this in PyTorch is DebugNameBase:
I have to strongly disagree that most inputs are fp32, starting with anything NLP.
Do you mean Torch allows users to set the argument name? If you also know when and how exactly Torch changes input names, then sure, I can see passing other names for TVM would be annoying. But I'd argue that most users are not familiar with such details of TorchScript, so we shouldn't expect them to correctly deal with names chosen by Torch. Requiring input names is common across other frontends. I think making it optional makes the API a bit confusing, and we need to explain what input names are expected if omitted, while benefiting only users who are intimately familiar with TorchScript internals. Making the API as close as possible to other frontends also applies to input shapes, so I don't want to make it optional, either. Shapes are required because Relay assumes static input shapes. So my opinion is to not make
Well, I see that this is not going anywhere...
It would be great to have a constructive discussion about the technical choices and agree on the pros and cons before we reach a conclusion. Everyone is contributing to a common project (and we value everyone's opinion), and I think it would be great if we can have a clear discussion. We also need to acknowledge that engineering decisions have tradeoffs and there is no single true answer to the problem. One common approach I find useful is to dissect the discussion, label each discussion point, and try to agree on sub-points and rationales. In the conversation so far, I see a few choices:
We can then discuss their pros and cons. For example, T0 is certainly more convenient, but it also depends on the stability of TorchScript's ability to keep names. T1 is more explicit when a user intends to name the input. D0 solves most of the common problems, but as machine learning models move to mixed precision, we will inevitably want to support more data types, which likely makes D1 more appealing.

Because the pros and cons are mainly technical, I hope that most of us can agree on the technical points. The main thing we might not agree on would be something like the prioritization of technical tradeoffs. For example, I might favor a clear naming scheme over an implicit one and thus prefer T2. A different person might think simplicity is key and fp32 is fine, so D0 is OK. This should be the only part we disagree on. When we find more common ground, it is much easier to reach agreements.

In cases like this, one thing we can do is have a constructive discussion, and perhaps bring in more people to see what everyone's thoughts are. Having a clear summary and a dissected discussion also helps others quickly understand the situation and share their opinions. In many cases we find that we do not disagree that much after all. It could be a good discussion forum thread.

Regardless of the outcome, I want to say that we value good technical debates, and usually they lead to better code overall. Many parts of this PR are certainly valuable, like the better data type handling. So let us have a good conversation and bring better PyTorch support for everyone.
I don't have any objection regarding dtype handling in this PR. At the moment we assume fp32 everywhere by default, but to support doubles we need to somehow pass dtype information from the user. I think we can pass an optional list of dtypes, corresponding to the entries in I think allowing But it doesn't work for scripted modules. If names are omitted, users need to be aware of (or we need to explain) how to correctly figure out the names Torch maintains. Since this is not trivial and Torch may change the way it handles naming on a whim in the future, we shouldn't rely on naming chosen by Torch. On top of the above, I think it is better to keep the API as close as possible to other frontends. They all require input names and shapes to be passed explicitly.
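One possible shape for such an optional-dtype extension, sketched here purely hypothetically to make the proposal concrete (this is not the actual frontend API), is to let each entry optionally carry a (shape, dtype) pair while keeping float32 as the default:

```python
# Hypothetical sketch of an input description that optionally carries a dtype,
# defaulting to "float32" when only a shape is given. Not the actual API.
def normalize_input_infos(input_infos, default_dtype="float32"):
    normalized = []
    for name, info in input_infos:
        if isinstance(info, tuple) and len(info) == 2 and isinstance(info[1], str):
            shape, dtype = info
        else:
            shape, dtype = info, default_dtype
        normalized.append((name, (tuple(shape), dtype)))
    return normalized

# Existing shape-only calls keep working; NLP-style integer inputs become expressible.
print(normalize_input_infos([("input0", (1, 3, 224, 224))]))
print(normalize_input_infos([("input_ids", ((1, 128), "int64"))]))
```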
Seems the most contentious part is the name handling? It would be great if we can also list all the alternatives (in labeled form), discuss each side, and then talk through the reasoning :) It will make the reasoning clear. @t-vi perhaps we can first go forward with dtype handling and discuss the name and shape handling in the forum?
Sorry, but the dtypes discussion isn't for me. Working on NLP models like BERT, if I have to argue that non-fp32 inputs are important, TVM is not a good choice for that work. The "ways to handle names" discussion is equally sad, not for the outcome but for the type of arguments; I would prefer to leave the status quo unchanged over having a discussion. I have pushed an update that keeps the requirement and exact layout of input_shapes from the original interface. However, it still has dtype handling (and now this dtype handling is non-optional), and incidentally, a unit test for rsub relied on the incorrect dtype handling, so I had to implement proper type promotion as well (but only applied in rsub for now). I'll not reopen the PR, though, because it still does the dtype handling; I don't want to preempt your discussion.
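For reference on the rsub point: rsub mixes a tensor with a Python scalar, and the result dtype follows PyTorch's type-promotion rules rather than simply keeping the tensor's dtype. The snippet below illustrates standard PyTorch behaviour, not code from this PR.

```python
import torch

int_tensor = torch.arange(4, dtype=torch.int32)

# "scalar - tensor" is what rsub lowers from: an int32 tensor combined with a
# float Python scalar promotes to the default floating dtype, not to int32.
print((1.5 - int_tensor).dtype)            # torch.float32
print(torch.result_type(int_tensor, 1.5))  # torch.float32

# With an integer scalar the integer dtype is kept.
print((2 - int_tensor).dtype)              # torch.int32
```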
I don't think we disagree in many cases. Let me try to rephrase the dtype part of @t-vi's argument:
I think we all agree on F0 and F1. Now one of @masahi's arguments is:
As we can see, K0 does not necessarily conflict with F0 and F1. As an outsider to the discussion, it is a bit harder for me to express my thoughts (without looking at the code). It would be easier if we could list interface candidates during the discussion and discuss their pros and cons. There is certainly a tradeoff we need to make, in terms of ease of use, level of compatibility, etc.

I need a bit more time to dig in order to understand the name argument. But at a high level (this is my opinion) I think it makes sense to inherit names from the source (if they are available) while allowing users to override them. The main reason certain names/shapes are required in the frontends is that in many cases this information is incomplete. Again, having interface candidates will be helpful here.

The main reason for a discussion is not necessarily to argue about which side is right, but to clarify the intent and reach consensus (sometimes on things both sides actually already agree on), so that in the future others can look into it. Also, in cases like this, discussions are important for finding the best way to do the integration (e.g. not whether we should do dtype handling, but how best to do it). Discussions also serve the good purpose of learning. It would be great for everyone to share an understanding of the "status quo" and the contentious points. Sometimes the problem of the disagreement is not what we need to do to support these things, but how best to resolve the situation. We can certainly create follow-up discussions in the forum.
I think we can just do a PR :)
As I cannot reopen this, I opened #5834. Thank you for the discussion.