
🐛 [Bug] Encountered CUDA 710 error when applying Torch-TensorRT to BERT #1418

Closed · Opened Oct 25, 2022 by Mansterteddy · 2 comments · Fixed by #1424
Labels: bug: triaged [verified] (We can replicate the bug), bug (Something isn't working)
Mansterteddy commented Oct 25, 2022

Bug Description

I wanted to use Torch-TensorRT to speed up BERT model inference, but encountered the following errors:

../aten/src/ATen/native/cuda/Indexing.cu:975: indexSelectLargeIndex: block: [239,0,0], thread: [32,0,0] Assertion srcIndex < srcSelectDimSize failed.
(the same assertion repeats for threads [33,0,0] through [41,0,0])

CUDA initialization failure with error: 710. Please check your CUDA installation: http://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html
Segmentation fault (core dumped)
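For context, the `indexSelectLargeIndex` assertion fires when an index passed to an embedding/index-select lookup falls outside the source tensor's selected dimension (e.g. a token id greater than or equal to the vocabulary size); the subsequent error 710 is CUDA's device-side-assert error, not an installation problem. A minimal CPU-side sketch of the same bounds check (all names here are illustrative, not the actual ATen code):

```python
def index_select(table, indices):
    # Mirrors the check behind the CUDA assertion above:
    # every index must satisfy srcIndex < srcSelectDimSize.
    src_select_dim_size = len(table)
    for src_index in indices:
        if not 0 <= src_index < src_select_dim_size:
            raise IndexError(
                f"srcIndex {src_index} >= srcSelectDimSize {src_select_dim_size}"
            )
    return [table[i] for i in indices]

vocab = [[0.0] * 4 for _ in range(100)]  # toy embedding table with 100 "tokens"
rows = index_select(vocab, [0, 5, 99])   # all ids in range: succeeds
assert len(rows) == 3

try:
    index_select(vocab, [100])           # out-of-range "token id"
except IndexError:
    pass
else:
    raise AssertionError("expected IndexError")
```

This suggests the compiled engine is feeding out-of-range indices into an embedding lookup somewhere, which is consistent with the fix that followed.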

To Reproduce

from transformers import BertModel, BertTokenizer, BertConfig
import numpy as np
import torch
import torch_tensorrt
import time

print("VERSION:", torch_tensorrt.__version__)

# Creating a dummy input
test_batchsz = 128
tokens_tensor = torch.ones((test_batchsz, 20)).to(torch.int32).cuda()
segments_tensors = torch.zeros((test_batchsz, 20)).to(torch.int32).cuda()
mask_tensors = torch.ones((test_batchsz, 20)).to(torch.int32).cuda()

model = BertModel.from_pretrained("bert-base-chinese", torchscript=True)
torch_script_module = torch.jit.trace(model.eval().cuda(), (tokens_tensor, mask_tensors, segments_tensors))

trt_ts_module = torch_tensorrt.compile(
    torch_script_module.float(),
    inputs=[
        torch_tensorrt.Input(shape=[test_batchsz, 20], dtype=torch.int32),
        torch_tensorrt.Input(shape=[test_batchsz, 20], dtype=torch.int32),
        torch_tensorrt.Input(shape=[test_batchsz, 20], dtype=torch.int32),
    ],
    enabled_precisions={torch.float},
    workspace_size=2000000000,
    truncate_long_and_double=True,
)

Environment

Build information about Torch-TensorRT can be found by turning on debug messages

  • Torch-TensorRT Version (e.g. 1.0.0): 1.2.0
  • PyTorch Version (e.g. 1.0): 1.12.1
  • CPU Architecture:
  • OS (e.g., Linux): Linux
  • How you installed PyTorch (conda, pip, libtorch, source): pip
  • Build command you used (if compiling from source):
  • Are you using local sources or building from archives:
  • Python version: 3.8
  • CUDA version: 11.6
  • GPU models and configuration:
  • Any other relevant information:

Additional context

@Mansterteddy added the "bug (Something isn't working)" label and changed the issue title to mention "cuda 710" on Oct 25, 2022.
gs-olive (Collaborator) commented Oct 26, 2022

I can confirm the issue is still occurring on the latest commit (ce29cc), built with PyTorch 1.13, and I am investigating the cause. When only using the first two arguments (tokens_tensor and segments_tensors), the model is currently succeeding (compilation + inference) with my configuration, so it seems passing 3+ tensor arguments to the model is causing the error.

@Mansterteddy (Author) replied, quoting the above:

Yes, same for me.

@narendasan narendasan added the bug: triaged [verified] We can replicate the bug label Oct 28, 2022
gs-olive added a commit to gs-olive/TensorRT that referenced this issue on Nov 3, 2022:
- Issue arises when compiling BERT models with 3+ inputs
- Added a temporary fix by narrowing the range of values the random number generator produces for input tensors to [0, 2), instead of [0, 5)
- Used random float inputs in the range [0, 2) instead of ints, then cast them to the desired type; for bug pytorch#1418, random floats in [0, 2) cast to Int effectively restrict the allowed ints to {0, 1}, as the model requires
- A more robust fix is to follow
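The workaround described in that commit can be sketched as follows; `make_random_int_input` is a hypothetical stand-in for the partitioning code's input generator, not the actual function name:

```python
import random

def make_random_int_input(numel, high=2):
    # Draw random floats in [0, high), then cast to int.
    # With high=2, the resulting values are restricted to {0, 1},
    # keeping the generated token/segment ids within the range BERT accepts.
    return [int(random.random() * high) for _ in range(numel)]

vals = make_random_int_input(10000)
assert set(vals) <= {0, 1}
```

The design point is that random calibration inputs for an embedding-consuming model must respect the embedding's index range, otherwise the device-side assert above fires during shape analysis or inference.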