
🐛 [Bug] RuntimeError when attempting to compile Encoder model in Sockeye #833

Closed · blchu opened this issue Jan 29, 2022 · 9 comments · Fixed by #839 or #1313
Labels: bug (Something isn't working)

@blchu (Contributor) commented Jan 29, 2022

Bug Description

When compiling the encoder transformer model for Sockeye inference, Torch-TensorRT throws a runtime error.
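For context, the failure surfaces at the torch_tensorrt.compile call on Sockeye's traced encoder (sockeye/model_pt.py, line 200 in the stack trace below). The following is a minimal sketch of that call pattern using a toy stand-in module; the module, shapes, and input specs are assumptions for illustration, not Sockeye's actual code:

import torch
import torch_tensorrt

# Toy stand-in for Sockeye's traced TransformerEncoder (an assumption,
# used only to show the shape of the failing compile call).
class ToyEncoder(torch.nn.Module):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.relu(x)

traced_encoder = torch.jit.trace(ToyEncoder().eval().cuda(),
                                 torch.randn(64, 96, 512, device="cuda"))

# Mirrors the call in model_pt.py:200, with FP16 enabled to match the
# --dtype float16 flag passed to sockeye-translate.
trt_encoder = torch_tensorrt.compile(
    traced_encoder,
    inputs=[torch_tensorrt.Input((64, 96, 512), dtype=torch.float)],
    enabled_precisions={torch.half},
)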

To Reproduce

Steps to reproduce the behavior:

  1. Start a docker container: docker run --gpus all --rm -it nvcr.io/nvidia/pytorch:21.11-py3
  2. Run the following to download+preprocess data and train a basic model:
git clone https://github.com/blchu/sockeye.git -b tensorrt_blchu
tail -n 4 sockeye/requirements/requirements.txt > requirements.txt.tmp \
    && mv requirements.txt.tmp sockeye/requirements/requirements.txt
pip install -e ./sockeye
git clone https://github.com/rsennrich/subword-nmt.git
export PYTHONPATH=$(pwd)/subword-nmt:$PYTHONPATH

wget http://data.statmt.org/wmt17/translation-task/preprocessed/de-en/corpus.tc.de.gz
wget http://data.statmt.org/wmt17/translation-task/preprocessed/de-en/corpus.tc.en.gz
gunzip corpus.tc.de.gz
gunzip corpus.tc.en.gz
curl https://data.statmt.org/wmt17/translation-task/preprocessed/de-en/dev.tgz | tar xvzf -

head -n 32768 corpus.tc.de > corpus.tc.de.tmp && mv corpus.tc.de.tmp corpus.tc.de
head -n 32768 corpus.tc.en > corpus.tc.en.tmp && mv corpus.tc.en.tmp corpus.tc.en

python -m learn_joint_bpe_and_vocab --input corpus.tc.de corpus.tc.en \
                                    -s 3000 \
                                    -o bpe.codes \
                                    --write-vocabulary bpe.vocab.de bpe.vocab.en

python -m apply_bpe -c bpe.codes --vocabulary bpe.vocab.de --vocabulary-threshold 50 < corpus.tc.de > corpus.tc.BPE.de
python -m apply_bpe -c bpe.codes --vocabulary bpe.vocab.en --vocabulary-threshold 50 < corpus.tc.en > corpus.tc.BPE.en

python -m apply_bpe -c bpe.codes --vocabulary bpe.vocab.de --vocabulary-threshold 50 < newstest2016.tc.de > newstest2016.tc.BPE.de
python -m apply_bpe -c bpe.codes --vocabulary bpe.vocab.en --vocabulary-threshold 50 < newstest2016.tc.en > newstest2016.tc.BPE.en

python -m sockeye.prepare_data_pt \
                        -s corpus.tc.BPE.de \
                        -t corpus.tc.BPE.en \
                        -o train_data \
                        --shared-vocab

torchrun --no_python --nproc_per_node 1 sockeye-train \
         --prepared-data train_data \
         --validation-source newstest2016.tc.BPE.de \
         --validation-target newstest2016.tc.BPE.en \
         --output model \
         --batch-size 2048 \
         --update-interval 1 \
         --checkpoint-interval 1 \
         --max-updates 1 \
         --decoder ssru_transformer \
         --shared-vocab \
         --seed 1 \
         --quiet-secondary-workers
  3. Run the translate command to attempt compilation with Torch-TensorRT; this is where the error occurs:
sockeye-translate \
    --input newstest2016.tc.BPE.de \
    --output out \
    --model model \
    --dtype float16 \
    --beam-size 5 \
    --batch-size 64 \
    --output-type benchmark

Stack trace and logs:

WARNING: [Torch-TensorRT] - Input type for doing shape analysis could not be determined, defaulting to F32
WARNING: [Torch-TensorRT] - There may be undefined behavior using dynamic shape and aten::size
WARNING: [Torch-TensorRT] - Truncating weight (constant in the graph) from Float64 to Float32
WARNING: [Torch-TensorRT TorchScript Conversion Context] - Detected invalid timing cache, setup a local cache instead
WARNING: [Torch-TensorRT TorchScript Conversion Context] - The logger passed into createInferBuilder differs from one already provided for an existing builder, runtime, or refitter. TensorRT maintains only a single logger pointer at any given time, so the existing value, which can be retrieved with getLogger(), will be used instead. In order to use a new logger, first destroy all existing builder, runner or refitter objects.

WARNING: [Torch-TensorRT] - There may be undefined behavior using dynamic shape and aten::size
[ERROR:root] Uncaught exception
Traceback (most recent call last):
  File "/opt/conda/bin/sockeye-translate", line 33, in <module>
    sys.exit(load_entry_point('sockeye', 'console_scripts', 'sockeye-translate')())
  File "/workspace/temp/sockeye/sockeye/translate_pt.py", line 43, in main
    run_translate(args)
  File "/workspace/temp/sockeye/sockeye/translate_pt.py", line 147, in run_translate
    read_and_translate(translator=translator,
  File "/workspace/temp/sockeye/sockeye/translate_pt.py", line 234, in read_and_translate
    chunk_time = translate(output_handler, chunk, translator)
  File "/workspace/temp/sockeye/sockeye/translate_pt.py", line 257, in translate
    trans_outputs = translator.translate(trans_inputs)
  File "/workspace/temp/sockeye/sockeye/inference_pt.py", line 807, in translate
    batch_translations = self._translate_np(*self._get_inference_input(translator_inputs))  # type: ignore
  File "/workspace/temp/sockeye/sockeye/inference_pt.py", line 995, in _translate_np
    return self._get_best_translations(self._search(source,
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/workspace/temp/sockeye/sockeye/beam_search_pt.py", line 778, in forward
    model_states, estimated_reference_lengths = self._inference.encode_and_initialize(source, source_length)
  File "/workspace/temp/sockeye/sockeye/beam_search_pt.py", line 70, in encode_and_initialize
    states, predicted_output_length = self._model.encode_and_initialize(inputs, valid_length, self._const_lr)
  File "/workspace/temp/sockeye/sockeye/model_pt.py", line 234, in encode_and_initialize
    source_encoded, source_encoded_lengths = self.encode(inputs, valid_length=valid_length)
  File "/workspace/temp/sockeye/sockeye/model_pt.py", line 200, in encode
    self.traced_encoder = torch_tensorrt.compile(self.traced_encoder,
  File "/opt/conda/lib/python3.8/site-packages/torch_tensorrt/_compile.py", line 97, in compile
    return torch_tensorrt.ts.compile(ts_mod, inputs=inputs, enabled_precisions=enabled_precisions, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/torch_tensorrt/ts/_compiler.py", line 119, in compile
    compiled_cpp_mod = _C.compile_graph(module._c, _parse_compile_spec(spec))
RuntimeError: [Error thrown at ./core/conversion/var/Var_inl.h:38] Expected ivalue->isInt() to be true but got false
Requested unwrapping of arg IValue assuming it was l however type is NoneType
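
A note on the message: the "l" is the compiler's mangled type name for long (int64_t), so a converter expected a concrete int argument but was handed a None IValue. That pattern matches an optional int schema argument such as int? dim=None being left unset, e.g. in aten::repeat_interleave (one of the ops flagged later in this thread). A minimal sketch of the pattern, as an assumed reduction rather than a confirmed slice of the Sockeye graph:

import torch

class Repro(torch.nn.Module):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # dim is left as None, so the graph carries a NoneType where the
        # schema allows int? dim=None; a converter that unwraps it as
        # int64_t would fail exactly as above.
        return torch.repeat_interleave(x, 2)

scripted = torch.jit.script(Repro().eval())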

Expected behavior

The model should compile successfully without error and translate sentences.

Environment

Build information about Torch-TensorRT can be found by turning on debug messages

  • Torch-TensorRT Version (e.g. 1.0.0): 1.0.0a0
  • PyTorch Version (e.g. 1.0): 1.11.0a0+b6df043
  • CPU Architecture: x86_64 (Intel Xeon Platinum 8259CL)
  • OS (e.g., Linux): Ubuntu 20.04
  • How you installed PyTorch (conda, pip, libtorch, source): NGC Container
  • Python version: 3.8.12
  • CUDA version: 11.5
  • GPU models and configuration: Tesla T4
blchu added the bug label on Jan 29, 2022
narendasan mentioned this issue on Jan 31, 2022
narendasan reopened this issue on Feb 2, 2022
@mjdenkowski

@narendasan @ncomly-nvidia @blchu Thanks for working on this!

Do we know what the remaining challenges are for compiling the full encoder module?

@narendasan (Collaborator)

There is a limitation in TensorRT that is causing the engine building to fail even when we add workarounds. We are trying to find the root cause to determine whether we can work around it at the Torch-TRT level or whether we need an improvement to TensorRT.

@mjdenkowski

Thanks for digging into this issue!

It looks like a Fairseq transformer that uses a similar encoder runs successfully on TensorRT, so all of the operations should be supported (blog post).

Has the compile error been traced to a specific operator? We can also look at temporary workarounds at the Sockeye level.

narendasan mentioned this issue on Mar 8, 2022
@narendasan (Collaborator) commented Mar 8, 2022

@mjdenkowski We recently updated our TensorRT version, which addresses our previous issues during engine building.

Currently the status is:

  • With aten::pow support #918, we can partially compile the module, which comprises 4 blocks. Two blocks remain in PyTorch because we don't have converters for the following ops:
    - aten::bitwise_not(Tensor self) -> (Tensor)
    - aten::repeat_interleave.self_int(Tensor self, int repeats, int? dim=None, *, int? output_size=None) -> (Tensor)
  • When Sockeye goes to execute the built Torch-TRT module, I occasionally see errors related to unexpected input shapes (i.e. the input tensor is smaller than the set input shape [64, 96, 512]).
    • Here we have a couple of ways forward (see the sketch after this list):
      1. If padding the input to a uniform size is acceptable, then you should be good to go.
      2. If you need support for dynamic shape, we need to address the two unsupported ops, since we don't currently support using partial compilation and dynamic shape at the same time.
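
In code, option 1 might look like the following sketch, under stated assumptions: the stub module and padding helper are placeholders (not Sockeye code), the fixed shape [64, 96, 512] is taken from the error described above, and torch_executed_ops is the TorchScript-frontend option for forcing named ops to stay in PyTorch:

import torch
import torch_tensorrt

class Stub(torch.nn.Module):  # placeholder for the traced Sockeye encoder
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.relu(x)

traced = torch.jit.trace(Stub().eval().cuda(),
                         torch.randn(64, 96, 512, device="cuda"))

# Option 1: compile for one static shape and keep the two unsupported
# ops in PyTorch via partial compilation.
trt_mod = torch_tensorrt.compile(
    traced,
    inputs=[torch_tensorrt.Input((64, 96, 512), dtype=torch.float)],
    enabled_precisions={torch.half},
    torch_executed_ops=["aten::bitwise_not", "aten::repeat_interleave"],
)

def pad_to_fixed(x: torch.Tensor) -> torch.Tensor:
    # Zero-pad the batch and sequence dims so every call matches the
    # compiled static shape.
    out = x.new_zeros((64, 96, 512))
    out[: x.shape[0], : x.shape[1]] = x
    return out

# Option 2 (blocked until the two converters exist): full compilation
# with a dynamic shape range instead of a fixed shape, e.g.
#   torch_tensorrt.Input(min_shape=(1, 1, 512),
#                        opt_shape=(64, 96, 512),
#                        max_shape=(64, 96, 512))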

@mjdenkowski

That sounds like great progress!

We use dynamic shapes throughout our model, so padding inputs isn't a good fit for our use case.

Is there a timeline for supporting the other two operators? Sockeye implements a standard transformer encoder, so supporting these ops should be broadly helpful for compiling various transformer/BERT/attention implementations.

@mjdenkowski

Thanks again @narendasan and @blchu!

Is there an active issue for supporting the remaining two operators or do we need to open a new issue?

@narendasan (Collaborator)

We have it tracked internally and are using this issue as a reference. I don't think we need an additional issue for the specific operators.

@github-actions (bot)

This issue has not seen activity for 90 days. Remove the stale label or comment, or this will be closed in 10 days.

@ncomly-nvidia (Contributor)

@blchu any updates on converter progress?
