PyTorch ONNX export #1333

Open
12 of 15 tasks
albertz opened this issue May 19, 2023 · 98 comments

albertz commented May 19, 2023

This issue is to track any aspects and issues on PyTorch (#1120) ONNX export.

Other things which are less important for now:

  • Avoiding TracerWarnings for some sanity checking code: I think it's not really possible in an easy way.

albertz commented May 19, 2023

The initial version is done already (export_to_onnx.py) and works for pure PT code, so I think the main work is done. We can already close the issue, but let's still collect further aspects on this here anyway.

albertz closed this as completed May 19, 2023

albertz commented May 19, 2023

One aspect is that convolution with padding="same" is not supported by the ONNX export. (See also here: #1120 (comment)) However, we can simply work around that by doing the padding manually. The code is already there for the RF and just needs to be used. We can use torch.onnx.is_in_export_mode() to test whether we should use it.
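
To illustrate the workaround, a minimal sketch for a 1D convolution (not RETURNN code; conv and x are placeholders, stride 1 and dilation 1 are assumed, and the export check is written as torch.onnx.is_in_onnx_export(), which is how current PyTorch spells it):

import torch
import torch.nn.functional as F

def conv1d_same(x: torch.Tensor, conv: torch.nn.Conv1d) -> torch.Tensor:
    # "same"-style convolution that also works in ONNX export.
    if torch.onnx.is_in_onnx_export():
        # Pad manually (assuming stride 1 and dilation 1), then convolve without padding.
        total = conv.kernel_size[0] - 1
        x = F.pad(x, (total // 2, total - total // 2))
        return F.conv1d(x, conv.weight, conv.bias, stride=conv.stride)
    return F.conv1d(x, conv.weight, conv.bias, stride=conv.stride, padding="same")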


Icemole commented May 19, 2023

As a form of summary: the tool is mainly done, but there are some operators that aren't supported yet, one of which is the convolutional layer padding when specified as a string ("same" or "valid"). This prevents the demos from being run with the tool, which is one of my top priorities.

I found that it's not actually padding="same" which is not implemented; what actually fails is type(padding) is str: a string can't be provided as padding. I tried padding="valid" (which we had already discussed) and it failed with the same message, even though padding=0 worked fine.

I've been investigating this issue and I'm currently trying to fix it. There are people online having the same issues, and they also provide implementations of custom padding, so it shouldn't be very hard to fix. I'll try to push a version with my changes before the weekend.


albertz commented May 19, 2023

padding="valid" is just the same as padding=0, or not?


Icemole commented May 19, 2023

Yes it is, but it seems that internally any padding given as a string triggers an operator called aten::_convolution_mode, which isn't implemented in the ONNX conversion.

Anyway, padding="valid" is the easy case to fix, since then we only have to set padding = 0, as you said. padding="same" is a little more tricky in my view: it involves not only calculating the padding for each dimension (which is the easy part) but also ordering it properly in the result.

Also, padding is given as a tuple of up to 3 dimensions ([[depth,] height,] width), depending on the dimensionality of the convolutional layer (3D, 2D or 1D), so I have to investigate the internals a bit to check whether redundant padding gets inserted that we don't want: for instance, with a kernel size of 2x2 and a stride of 1, I think we'd only need to insert one additional row and column, but maybe this tuple assumes symmetric padding and inserts one row at the start and another at the end (and the same for the columns), making the output size not match the input size.


albertz commented May 19, 2023

Anyway, padding="valid" is the easy case to fix, since then we only have to set padding = 0, as you said.

Yes, my question was just to confirm this. So we can always do this then, no matter whether we export to ONNX or not. Just:

if padding == "valid":
    padding = 0

padding="same" is a little more tricky in my view: it involves not only calculating the padding for each dimension (which is the easy part) but also ordering it properly in the result.

We already have the code for adding the padding manually. See the code. It basically does:

if padding == "same" and <manual_padding_required>:
    ...  # add padding to input
    padding = "valid"

Nothing else needs to be done, or not?


albertz commented May 19, 2023

Btw, next step after the current demo works: Test our RF Conformer implementation.


albertz commented May 19, 2023

Actually, I just wonder now: Instead of manually adding the padding to the input, could we maybe just set padding = (filter_size - 1) // 2? But probably not in all cases; e.g. it only works with odd filter_size, no striding, etc. But it might still be worth having this case, as it might be faster than the manual padding.
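
A tiny sketch of that special case (the helper name is made up); it only applies for odd filter_size, stride 1 and dilation 1, otherwise we still need the manual F.pad path:

def same_padding_as_int(filter_size: int, stride: int = 1, dilation: int = 1) -> int:
    # Symmetric integer padding that is equivalent to "same" in this restricted case.
    assert filter_size % 2 == 1 and stride == 1 and dilation == 1
    return (filter_size - 1) // 2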


Icemole commented May 19, 2023

I was able to insert torch.onnx.is_in_export_mode() into the already developed code which does the manual padding. However, this only works for the RF demo: the PT demo as it is can't be exported, since it also uses padding="same" but we have no handler for pure PT code. Should we just leave these details to the user, or also map "same" to a valid padding size?


Icemole commented May 19, 2023

could we maybe just set padding = (filter_size - 1) // 2?

Internally, the forward() implementation of convolutional layers is as follows:

        if self.padding_mode != 'zeros':
            return F.conv1d(F.pad(input, self._reversed_padding_repeated_twice, mode=self.padding_mode),
                            weight, bias, self.stride,
                            _single(0), self.dilation, self.groups)

where self._reversed_padding_repeated_twice for a padding of [i, j] is [i, i, j, j]. This attribute is actually computed at initialization time. I think if we do the same and introduce such an attribute in our convolutional layers, we could save quite a few operations, since we wouldn't have to calculate the same paddings for every forward, right?

Or maybe we could cache it somehow, or add an additional layer attribute self.padding_sizes at runtime?


albertz commented May 19, 2023

I was able to insert torch.onnx.is_in_export_mode() into the already developed code which does the manual padding. However, this only works for the RF demo: the PT demo as it is can't be exported, since it also uses padding="same" but we have no handler for pure PT code. Should we just leave these details to the user, or also map "same" to a valid padding size?

Change padding="same" to padding=2 in the Torch demo. That should fix this issue.


albertz commented May 19, 2023

Internally, the forward() implementation of convolutional layers is as follows ...

This is only for the case padding_mode != 'zeros', which we don't support at all currently, right?


Icemole commented May 19, 2023

Ah you're right, I confused padding with padding_mode. Still, aren't we calculating the same padding numbers for every forward()?


albertz commented May 19, 2023

I'm not exactly sure what you mean. The padding numbers don't need to be calculated at all; F.pad will just pad the input. The input is different for every forward() call of course, so there is nothing we can cache.
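
For reference, a minimal F.pad example (pad amounts are given per dimension, last dimension first, so nothing needs to be precomputed per input):

import torch
import torch.nn.functional as F

x = torch.randn(8, 1, 10)  # (batch, channels, time)
y = F.pad(x, (1, 1))       # pad the last (time) dim by 1 on each side
assert y.shape == (8, 1, 12)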


Icemole commented May 19, 2023

Oh I see, yes, I neglected the effect of the dynamic dimensions on the convolution; I was thinking more of image processing, where the input usually has the same size. Never mind.


Icemole commented May 19, 2023

I've pushed some commits that should fix some of the issues with the exporting tool. Hopefully I explained everything nicely.

With the current state of the tool (4785332), I can properly run both configs through the ONNX exporting tool. However, the graphs differ greatly. The number of operations in the RF graph is much bigger than the number of operations in the PT graph. At first sight, the PT graph looks much cleaner than the RF graph. I haven't studied the differences yet; I'll look at it on Monday. In the meantime, you can find both graphs below.

onnx_pt_graph.txt
onnx_rf_graph.txt


albertz commented May 19, 2023

The number of operations in the RF graph is much bigger than the number of operations in the PT graph.

That's very interesting. Thanks for this. We should study this in detail. I already pushed a few things which should reduce this a bit. Some comments from a quick look:

  • A lot of it seems to be related to calculating the output sizes. This is something which PyTorch doesn't do, or at least not in this simple demo. In practice, you might have something similar for pure PyTorch models; it's just that we don't do this in the simple demo. Let's not add this to the PT demo now, and when comparing complexity, let's ignore this part for now. But we should still look into whether it is reasonable.
  • There might be redundant computations. Maybe some output sizes are computed multiple times. Not sure.
  • Some constants like -9223372036854775807 ($-2^{63}+1$, min int64 + 1) look quite suspicious. I'm not sure if this is correct.


Icemole commented May 22, 2023

I tried exporting again the RF demo model with the current changes, and the graph is much smaller now! Basically almost equal to the PyTorch graph: onnx_rf_graph_new.txt 🥳


albertz commented May 22, 2023

I tried exporting again the RF demo model with the current changes, and the graph is much smaller now! Basically almost equal to the PyTorch graph: onnx_rf_graph_new.txt 🥳

The actual computation is not just almost identical, but exactly identical, as far as I can see.

In addition, it defines the output sizes, which are just the same as the input sizes because of padding "same", so it's just an identity function here.

I think when there are new dim calculations, this will still be unnecessarily complex. I think we can improve this more, but I'm not sure how important this is.


albertz commented May 22, 2023

I'm reopening this to keep track of the remaining issues.

One remaining issue is that we do not really check model_outputs against what the user actually returns in the forward-step function. And in the case of the PT demo, where the user does not provide the dims in the mark_as_output call, it does not exactly match the dims from model_outputs.

albertz reopened this May 22, 2023

Icemole commented May 22, 2023

Edit: you beat me to it 🙂

There seems to be one last thing to figure out before the tool is fully operational. I get an error while running the PyTorch demo:

RuntimeError: number of output names provided (3) exceeded number of outputs (1)

This is the full stack_trace.

I think this is happening because somehow the RF model is outputting the output sizes as well, while the PT model isn't.

PT and RF don't exactly behave the same when calling rf.get_run_ctx().outputs.as_raw_tensor_dict(). PT instantiates the dimensions of rf.get_run_ctx().outputs.data to actual dimensions, while RF keeps the batch dim and the time dim as dynamic dimensions. Not sure if this difference in behavior is intended.

Since in the PT demo we don't have access to batch/time dims, the output dimensions are instantiated to actual values. However, this isn't the case in the RF demo, where we have access to the shape of the original tensor as Dim objects.
Since these dimensions are also marked as output, I think this is the essential difference.

I introduced a workaround in 4785332, which is why I was able to post the PT graph above, but it was reverted in bb40b4c. Is there any other way to fix this?


albertz commented May 22, 2023

I would maybe extend init_forward_step_run_ctx to pass a model_outputs template. Then mark_as_output would check against that and follow the logic I described above. Also, in the end, some function check_outputs_match or so, which checks whether all outputs are given.
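
A rough sketch of what such a check could do (check_outputs_match is hypothetical, not existing RETURNN API; it just compares the declared template against whatever mark_as_output collected):

def check_outputs_match(marked_outputs: dict, model_outputs_template: dict):
    # Verify that the forward-step function marked exactly the declared outputs.
    declared = set(model_outputs_template.keys())
    produced = set(marked_outputs.keys())
    missing = declared - produced
    extra = produced - declared
    assert not missing, f"outputs declared in model_outputs but never marked: {missing}"
    assert not extra, f"outputs marked but not declared in model_outputs: {extra}"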

Icemole commented May 22, 2023

some function check_outputs_match or so, which checks whether all outputs are given

And if they don't match, which is what we're seeing here, what do we do? We've detected the error, but should we fix it, or leave it to the user? If we fix it, do we give priority to the expected outputs or to the actual outputs? I would argue that the actual outputs are more important, but it's an "unexpected" situation in which an automatic fix maybe isn't the right thing to do...

albertz commented May 22, 2023

some function check_outputs_match or so, which checks whether all outputs are given

And if they don't match, which is what we're seeing here, what do we do?

No, we are not seeing this here. If we do it as I described, there would also not be an error, as they would actually match then. But if they do not match, we would just throw an error. What we are seeing now is an error from ONNX, which is not what I mean.

If we fix it, do we give priority to the expected outputs, or to the actual outputs?

It's just an error if the outputs do not match. We don't need to handle this further.


albertz commented Jun 26, 2023

Regarding getting rid of the TracerWarnings for that sanity-checking code, we could use such code (e.g. via):

import warnings

with warnings.catch_warnings():
    # PyTorch JIT tracing will produce this warning:
    # TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect.
    #   We can't record the data flow of Python values,
    #   so this value will be treated as a constant in the future.
    #   This means that the trace might not generalize to other inputs!
    # This is fine for us: we do this check only once, and not at later runtime anymore.
    warnings.filterwarnings("ignore", message="Converting a tensor to a Python boolean")
    # <now the original code which produced the warning>

However I'm not sure if this is the best way. Also, this should be encapsulated somehow in a nicer way, as the triggering code is actually not PyTorch specific.

Or we just ignore it for now.
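
One way to encapsulate this in a backend-independent way could be a small context manager, roughly like the following sketch (the name and the default message pattern are just suggestions):

import contextlib
import warnings


@contextlib.contextmanager
def suppress_trace_warnings(message_pattern: str = "Converting a tensor to a Python boolean"):
    # Suppress warnings whose message matches the given pattern, e.g. the PyTorch
    # TracerWarning triggered by one-time sanity checks during tracing/ONNX export.
    with warnings.catch_warnings():
        warnings.filterwarnings("ignore", message=message_pattern)
        yield

Usage would then be: with suppress_trace_warnings(): <sanity-check code>.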


Icemole commented Jun 26, 2023

So what minimal, standalone code (not using RETURNN at all), presumably using torch.full, would reproduce this?

I understand it now. This setup fails with the warning you describe.


Icemole commented Jun 26, 2023

As a completely different note, a constant value in the constructor of torch.tensor() yields two warnings that are different from the one described above:

/home/nbeneitez/Documentos/work/repos/returnn_pytorch/returnn/torch/frontend/_backend.py:676: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
  shape = [torch.tensor(dim, dtype=torch.int64) for dim in shape]
/home/nbeneitez/Documentos/work/repos/returnn_pytorch/returnn/torch/frontend/_backend.py:676: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  shape = [torch.tensor(dim, dtype=torch.int64) for dim in shape]

This is the warning I'm worried about with the current code that you fixed. I understand that my previous implementation was buggy for the reasons you mentioned, but with the creation of a tensor we're adding another two warnings. What worries me is that, since the call to torch.tensor() returns some tensor(X), that tensor would be registered in the graph as a constant. The reference to the fixed code in the graph is the following:

  %onnx::Unsqueeze_23 : Long(requires_grad=0, device=cpu) = onnx::Cast[to=7](%classes:size0) # /home/nbeneitez/Documentos/work/repos/returnn_pytorch/returnn/torch/frontend/_backend.py:676:0

It doesn't look like a constant to me. Maybe we could simply get rid of the warning by casting with long() as I did before, but this time checking whether the object to cast is a tensor. I'm not sure whether the sourceTensor.clone().detach() call (as the second warning suggests) can cast the tensor to another type.


albertz commented Jun 26, 2023

So what minimal, standalone code (not using RETURNN at all), presumably using torch.full, would reproduce this?

I understand it now. This setup fails with the warning you describe.

So this looks like a bug in the ONNX export of PyTorch, right? Because running the PyTorch code as-is works completely fine.

Can you make a bug report and link it here?


albertz commented Jun 26, 2023

As a completely different note, a constant value in the constructor of torch.tensor() yields two warnings that are different from the one described above ...

Yes I changed the code again. I thought that torch.tensor would work but I was wrong. In the current code, there should be no such warning, right?


Icemole commented Jun 27, 2023

So this looks like a bug in the ONNX export of PyTorch, right? Because running the PyTorch code as-is works completely fine.

You mean that it's a bug because the operation is converted to the same onnx::Unsqueeze operation regardless of whether the tensor is created again with torch.tensor() or explicitly cast with tensor.long(), but the first approach yields a warning while the second one doesn't, right?

Edit: although I believe our original usage of torch.full isn't buggy, because the onnx::ConstantOfShape operator explicitly requires the input to be int64. While it's possible that the exporting code runs, I can see how not respecting the restrictions given in the code, no matter how hidden, would prevent you from running the final ONNX graph.

In the current code, there should be no such warning, right?

Indeed, there's no such warning 👍 The only warnings are the following ones, which we already knew about:

/home/nbeneitez/Documentos/work/repos/returnn_pytorch/returnn/tensor/_tensor_extra.py:327: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if raw_shape[i] != self.batch_shape[i]:
/home/nbeneitez/Documentos/work/repos/returnn_pytorch/returnn/frontend/_backend.py:136: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert all(dim is None or dim == existing_shape[i] for i, dim in enumerate(shape))
WARNING:root:Output 'output' Tensor{'raw_tensor', [B?,'time:var-unk:output'[B?],F'feature:output'(2)]}: Cannot infer dynamic size for dim Dim{'time:var-unk:output'[B?]}. Using Tensor{'full', [B?], dtype='int32'} as fallback.


Icemole commented Jun 29, 2023

I was able to create a better minimal example detailing each of the three cases:

  • Not converting from int32. This results in the error with which we're already familiar: UserWarning: The exported ONNX model failed ONNX shape inference.. Final graph here.
  • Converting from int32 to int64 through torch.tensor(). This results in a warning that I described in my last message: TracerWarning: torch.tensor results are registered as constants in the trace. Final graph here.
  • Converting from int32 to int64 through tensor.long(). This doesn't result in a warning at all, and the graph is the same as the option above. Final graph here.

Each of the three gives different warnings but, as I briefly stated above, converting either way yields the same graph. Not converting yields a different graph from the other two, but I think the only differences are the missing onnx::Cast operators and the final type of the shape (Int when not converting, Long when converting). There's no onnx::ConstantOfShape operator anymore though...
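
A condensed standalone sketch of the core difference between the last two cases (made-up toy functions, traced via torch.jit.trace just to trigger the same TracerWarning):

import torch


def make_dim_bad(d: torch.Tensor) -> torch.Tensor:
    # Re-wrapping an existing tensor: recorded as a constant in the trace and
    # triggers "torch.tensor results are registered as constants in the trace".
    return torch.tensor(d, dtype=torch.int64) * 2


def make_dim_good(d: torch.Tensor) -> torch.Tensor:
    # Plain dtype cast: stays a regular graph operation, no TracerWarning.
    return d.long() * 2


d = torch.tensor(5, dtype=torch.int32)
torch.jit.trace(make_dim_bad, d)   # emits the TracerWarning
torch.jit.trace(make_dim_good, d)  # clean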


albertz commented Jun 29, 2023

So this looks like a bug in the ONNX export of PyTorch, right? Because running the PyTorch code as-is works completely fine.

You mean that it's a bug because the operation is converted to the same onnx::Unsqueeze operation regardless of whether the tensor is created again with torch.tensor() or explicitly cast with tensor.long(), but the first approach yields a warning while the second one doesn't, right?

No. It's a bug simply because the PyTorch code works without ONNX but then fails in the ONNX export. No matter the details, that can already be considered a bug (or maybe missing functionality). You said that your example code (this) shows this.

You should not mix different aspects in the bug report, like converting int32 to int64 or so. Keep separate issues as separate issues.

In the issue, the example code is one of the most important aspects. You should show the code directly, in addition to linking the Gist.

In the issue, you formatted the error in a way that cannot easily be read. I think you somehow removed the newlines from it? It would be helpful to add the line breaks back so it can be read more easily.

In the issue, you now made the example code more complex than it needs to be. Such example code should always be as minimal as possible. Why do you need those d1, d2, d3? Why not just one dimension, as in the code you had before? Also, you are now mixing three different things in there, which could be a bit confusing. Option 2, which you mention there, is anyway totally clear. It's not a bug at all, so it doesn't really belong there. It was our own mistake that we misunderstood how torch.tensor(...) works; nothing is wrong with that. You definitely should remove it from the issue. It does not add anything and might just distract from the actual issue. An issue is to report something which PyTorch should fix.

What is also missing in the issue is the error you get when you try to actually run it with ONNX.


Icemole commented Jun 30, 2023

I will reformat the issue as discussed.

Edit: done, I think the issue is clearer now, thanks for the feedback.


albertz commented Jun 30, 2023

Thanks. But as you wrote it, it is still a bit unclear to me. Are there two types of bugs, for option 1 and option 2? Then that should be two separate bug reports. Or are they the same bug? Or is there actually no bug with option 1? Then why is option 1 relevant for the bug report?

When converting from int32 to int64 through tensor_object.long() (final graph here), we don't get any warning.

So, this is option 1? So with option 1, all works as intended?

Then you should not even put option 1 into the code. The code should demonstrate the bug; this is confusing. You can mention in a remark that it works correctly when you call .long() first.

Btw, you forgot to enable Python syntax highlighting for the code, so it's hard to read. I cannot easily see which part is the commented-out code. But anyway, you'd better not post code that contains commented-out code at all.

Btw, shape = [dim.long() for dim in shape] is also too complicated. Just write shape = [d1.long()], to see the difference. But as said, I think you should not put this into the code at all, but just leave it as a comment.


Icemole commented Jun 30, 2023

I've formatted the issue as specified.

Moving on to another topic, the inference seems to be working, or at least it's producing some reasonable output from the random input I gave it. However, I get two warnings:

2023-06-30 13:33:33.516802915 [W:onnxruntime:, execution_frame.cc:857 VerifyOutputSizes] Expected shape from model of {} does not match actual shape of {3} for output output:size0
2023-06-30 13:33:33.517066362 [W:onnxruntime:, execution_frame.cc:857 VerifyOutputSizes] Expected shape from model of {-1} does not match actual shape of {3,50,2} for output output:size1

As I understand it, the exported model's declared output shapes don't match the actual ones. I'm running the ONNX exporting tool with the torch demo. The code I'm running for inference is the following:

import onnxruntime as ort
import torch


onnx_model = "/tmp/nbeneitez/returnn/demos/demo-torch/model.005.onnx"
torch.manual_seed(42)
dummy_data = torch.randn([3, 50, 9])
dummy_data_len = torch.IntTensor([3, 50, 1])

session = ort.InferenceSession(onnx_model)
outputs_onnx = session.run(None, {"data": dummy_data.numpy(), "classes:size0": dummy_data_len.numpy()})

print(torch.FloatTensor(outputs_onnx[0]))
print(torch.IntTensor(outputs_onnx[1]))
print(torch.IntTensor(outputs_onnx[2]))

I adapted the inference code from this code.


albertz commented Jun 30, 2023

The example code from the issue is still a bit too complicated. You don't need the __init__ function there.

Regarding your code: you are using it incorrectly. classes:size0 is the batch size, which should be a scalar; classes:size1 is the seq lens, a vector.


Icemole commented Jul 7, 2023

Thanks for the heads-up.

I tried inserting the batch size and the sequence lengths in classes:size0 and classes:size1 as you requested, but my inference code failed with the following error:

  File "/home/nbeneitez/.venv/returnn_pytorch/lib/python3.10/site-packages/onnxruntime
/capi/onnxruntime_inference_collection.py", line 217, in run
    return self._sess.run(output_names, input_feed, run_options)
onnxruntime.capi.onnxruntime_pybind11_state.InvalidArgument: [ONNXRuntimeError] : 2 : INVALID_ARGUMENT :
Invalid Feed Input Name:classes:size1

Indeed, in the graph we obtained when the tool initially worked (original comment here), the only two inputs I can see are data and classes:size0. Judging from the computation graph, I assume the rest of the inputs aren't needed, but they are properly specified in the input names provided to torch.onnx.export, including data:size0, data:size1, and classes:size1. Even if they might not be used, do you think they should be disregarded as inputs to the final graph, when we explicitly stated that these would be inputs to the graph?

A better view of the graph through Netron: [screenshot of model.005.onnx]

Edit: but besides that, the code is working properly :) I just had to remove dummy_data_len and add dummy_batch_size, so the core functionality in the snippet above would become the following:

dummy_data = torch.randn([3, 50, 9])
dummy_batch_size = torch.tensor(3, dtype=torch.int32)  # scalar batch size

session = ort.InferenceSession(onnx_model)
outputs_onnx = session.run(None, {"data": dummy_data.numpy(), "classes:size0": dummy_batch_size.numpy()})

And no warnings are printed when running inference.


albertz commented Jul 7, 2023

Even if they might not be used, do you think they should be disregarded as inputs to the final graph, when we explicitly stated that these would be inputs to the graph?

Whom are you asking here? You're saying this is maybe unexpected behavior of the PyTorch ONNX export tool? Yes, maybe. Did you check whether this is discussed anywhere? If not, can you open an issue on this and ask whether it is a bug or expected behavior?

Maybe the answer is that this is expected behavior. Or maybe it will be fixed in a future version, but we probably also want to make it work with the current version.

So then the answer goes back to us. What do we actually want to do in such a case?

I think we always want a clear interface. When the interface is that we give it (input, in_seq_lens), we want the resulting ONNX graph to always accept this type of input, no matter whether it is actually used or not.

Btw, via this argument, we might also say: The interface is that the output should be (output, out_seq_lens). And out_seq_lens should be properly defined. So our current heuristic of just filling it with dummy values is not good, and we want the user to make it explicit?

We should maybe discuss this with @curufinwe. It matters for RASR what type of interface to expect there. Also, when RASR wants to do batched recognition at some point and to support different sequence lengths properly, with potential downsampling in the model, the in_seq_lens and out_seq_lens are both important.

Btw, the batch_dim in the input is actually something which we might want to remove, as it is redundant. In general, any dims which have a scalar dyn_size_ext are not needed. We should probably change this.

Icemole added a commit that referenced this issue Jul 7, 2023

Icemole commented Jul 7, 2023

I pushed tests for inference to a new branch, please let me know if something needs to be changed.

I don't have more time this week to work on the ONNX export, but I quickly went through your comment and overall I agree: we should bring this up with more people. And yes, I would rather always have a constant interface, even if some inputs end up contributing nothing, since that can be handled automatically more easily. I will try to examine some issues or details further next week.


albertz commented Jul 12, 2023

I pushed tests for inference to a new branch, please let me know if something needs to be changed.

I don't remember anymore, what was this about? Or was it just about extending our tests by actually forwarding something through the ONNX graph via onnxruntime?

Why did you rename _test_torch_export_to_onnx to _test_export_to_onnx? The old name _test_torch_export_to_onnx sounds more explicit and better to me.

Why is there no PR for this branch?


Icemole commented Jul 12, 2023

[...] what was this about? [...]

As you said, this was just about extending our tests by actually forwarding something through the ONNX graph via onnxruntime. We wanted to have some tests that checked not only the graph creation but also the inference, so I made these.

Why did you rename [...]

I understand that torch there meant the technology itself, but it was a little confusing for me because I initially thought torch was the demo name, and this code worked for both the torch and the rf demos. I can bring it back if you think it was better.

Why is there no PR [...]

I've been focused on other tasks and didn't have time to invest in the ONNX tool. I'll address the change above and create the PR, but I can't do much more.


albertz commented Jul 17, 2023

Last week, we decided:

ONNX input/output consistency, three use cases for RASR:

  1. Only inputs -> only outputs
  2. Inputs + seq lens -> outputs: assume output seq lens == input seq lens
    a. This is what's currently specified (seq lens are optional, so actually both 1 and 2 are supported)
  3. Inputs + seq lens -> outputs + seq lens: assume output seq lens are correct (don't randomly fill), we should be able to use them later on

The user should explicitly specify which of these cases applies.

So, what this means: in the case of RASR, it was actually not clear yet. Probably RASR can automatically detect whether the model has a seq-lens input, and probably also whether it has a seq-lens output, so it can determine automatically which case it has.

However, in the case of RETURNN, this explicitly needs to be specified. The further question was how. We basically use the existing mechanism, i.e. model_outputs, to specify via the dim tag of the output time dimension whether it has seq lens or not. Similarly, the user can specify via extern_data whether the input time dimension has seq lens or not.
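
As a rough illustration of that mechanism, a config sketch in the dims-based style (the exact Dim constructor arguments here are an assumption and may differ from the real RETURNN API):

from returnn.tensor import Dim, batch_dim

in_time_dim = Dim(None, name="time")        # dynamic dim, i.e. it carries seq lens
in_feature_dim = Dim(9, name="feature")
out_feature_dim = Dim(2, name="out-feature")

extern_data = {
    "data": {"dims": [batch_dim, in_time_dim, in_feature_dim], "dtype": "float32"},
}
model_outputs = {
    # Reusing in_time_dim declares that output seq lens == input seq lens (case 2);
    # a separate dynamic Dim here would declare explicit output seq lens (case 3).
    "output": {"dims": [batch_dim, in_time_dim, out_feature_dim], "dtype": "float32"},
}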


albertz commented Jul 19, 2023

@vieting found a bug in how we call get_model in this script. (See here: rwth-i6/i6_core#429 (comment))

Specifically, we must pass epoch and step. This is defined by our API, and most real-world configs would expect it; actually only the demos do not. I will maybe also update the demos. I will fix this.

An open question is what epoch and step should be passed to it. I think these should just be arguments.

Edit: Fixed.

Edit: Fixed again. Now we just take epoch and step from the checkpoint.
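
For reference, a minimal sketch of a config get_model that accepts epoch and step as keyword arguments, as required by the API mentioned above; the model itself is a dummy placeholder:

import torch


class MyModel(torch.nn.Module):  # placeholder model
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(9, 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.layer(x)


def get_model(*, epoch: int, step: int, **_unused_kwargs) -> torch.nn.Module:
    # epoch and step are passed by RETURNN; the export script now takes them
    # from the checkpoint.
    return MyModel()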


albertz commented Jul 19, 2023

Btw, I wonder: the resulting ONNX graph, is this a frozen graph, i.e. does it contain all the parameters? (@Icemole)

Edit: Yes.

albertz pushed a commit that referenced this issue Jul 24, 2023
albertz added a commit that referenced this issue Jul 24, 2023
This implements proper dynamic dims handling as discussed in #1333. It also extends some test cases and documentation.

- Test cases: Also perform ONNX inference on the exported ONNX models.
- Handle `available_for_inference` in `extern_data`: They are excluded.
- Handle duplicated dims in `extern_data`: Only the first is used. (First according to the order defined by the dict in the config.)
- Handle scalar dynamic dims in `extern_data` and `model_outputs`: They are excluded.
- It is an error if some dim `dyn_size_ext` in `model_outputs` is specified as not being a scalar (e.g. common seq lens of shape [B]) but its value is not defined.
- Demos are adapted.
- Torch-to-ONNX export documentation extended.

Co-authored-by: Albert Zeyer <[email protected]>
albertz added the PyTorch label Jan 3, 2024