❓ [Question] Is it possible to use a model optimized through Torch-TensorRT in LibTorch under Windows? #856

Closed
andreabonvini opened this issue Feb 8, 2022 · 24 comments
Labels: channel: windows (bugs, questions, & RFEs around Windows); No Activity; question (Further information is requested)

andreabonvini commented Feb 8, 2022

❓ Question

I need to optimize an already-trained segmentation model with Torch-TensorRT. The idea is to optimize the model by running the newest PyTorch NGC Docker image under WSL2, export the optimized model, and then load it in a C++ application that uses LibTorch, e.g.

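#include <torch/script.h>
#include <iostream>
// ...
torch::jit::script::Module module;
try {
  // Deserialize the ScriptModule from a file using torch::jit::load().
  module = torch::jit::load(argv[1]);
} catch (const c10::Error& e) {
  // torch::jit::load() throws c10::Error if the file cannot be read or parsed.
  std::cerr << "error loading the model: " << e.what() << "\n";
  return -1;
}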

Would this be the right approach?

What you have already tried

So far I have only tried to optimize the model with Torch-TensorRT, and the behavior differs between machines. Below are the results of running the Python script that follows on two different devices:

  • an Ubuntu desktop with a GTX 1080 Ti (which I use for development)
  • a Windows PC with an RTX 3080 (which is my target device)

As the logs below show, the optimization process under WSL emits many "GPU error during getBestTactic ... invalid argument" warnings, while on Ubuntu the same script completes without them. Why does this happen?
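Since both runs use the same script, one way to narrow down the difference is to compare the software and GPU details reported inside each container. The following is only a minimal sketch under that assumption: the helper name collect_env is hypothetical, and it relies solely on standard PyTorch attributes. It does not diagnose the warnings, it just collects the values worth comparing.

```python
import platform


def collect_env() -> dict:
    """Collect version and GPU details to compare between the two machines."""
    info = {
        "python": platform.python_version(),
        "platform": platform.platform(),
    }
    try:
        import torch
    except ImportError:
        info["torch"] = None  # PyTorch is not installed in this environment
        return info

    info["torch"] = torch.__version__
    info["cuda_runtime"] = torch.version.cuda  # CUDA version PyTorch was built with
    info["cudnn"] = torch.backends.cudnn.version()
    if torch.cuda.is_available():
        info["gpu"] = torch.cuda.get_device_name(0)
        info["compute_capability"] = torch.cuda.get_device_capability(0)

    try:
        import torch_tensorrt
        info["torch_tensorrt"] = torch_tensorrt.__version__
    except (ImportError, OSError):
        info["torch_tensorrt"] = None  # not installed, or its shared libraries failed to load
    return info


if __name__ == "__main__":
    for key, value in collect_env().items():
        print(f"{key}: {value}")
```

Running this once on the Ubuntu desktop and once under WSL2 gives two short listings that can be compared line by line and attached to the issue.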

My script:

import torch_tensorrt
import yaml
import torch
import os
import time
import numpy as np
import torch.backends.cudnn as cudnn
import argparse
import segmentation_models_pytorch as smp
import pytorch_lightning as pl
cudnn.benchmark = True

def benchmark(model, input_shape=(1, 3, 512, 512), dtype=torch.float, nwarmup=50, nruns=1000):
    input_data = torch.randn(input_shape)
    input_data = input_data.to("cuda")
    if dtype==torch.half:
        input_data = input_data.half()
        
    print("Warm up ...")
    with torch.no_grad():
        for _ in range(nwarmup):
            features = model(input_data)
    torch.cuda.synchronize()
    print("Start timing ...")
    timings = []
    with torch.no_grad():
        for i in range(1, nruns+1):
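            # perf_counter() is a monotonic, high-resolution clock suited to measuring intervals
            start_time = time.perf_counter()
            features = model(input_data)
            torch.cuda.synchronize()  # wait for the GPU to finish before stopping the clock
            end_time = time.perf_counter()
            timings.append(end_time - start_time)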
            if i%100==0:
                print('Iteration %d/%d, ave batch time %.2f ms'%(i, nruns, np.mean(timings)*1000))

    print("Input shape:", input_data.size())
    print("Output features size:", features.size())
    
    print('Average batch time: %.2f ms'%(np.mean(timings)*1000))
    
def load_config(config_path: str):
    with open(config_path) as f:
        config = yaml.load(f, Loader=yaml.FullLoader)
    return config
    
    
    
def main():
    # Load target model
    parser = argparse.ArgumentParser()
    parser.add_argument("weights_path")
    parser.add_argument("config_path")
    args = parser.parse_args()
    config = load_config(args.config_path)
    model_dict = config["model"]
    model_dict["activation"] = "softmax2d"
    model = smp.create_model(**model_dict)
    state_dict = torch.load(args.weights_path)["state_dict"]
    model.load_state_dict(state_dict)
    model.to("cuda")
    model.eval()
    # Create dummy data for tracing and benchmarking purposes.
    dtype = torch.float32
    shape = (1, 3, 512, 512)
    input_data = torch.randn(shape).to("cuda")
    
    # Convert model to script module
    print("Tracing PyTorch model...")
    traced_script_module = torch.jit.trace(model, input_data)
    # torch_script_module = torch.jit.load(model_path).cuda()
    print("Script Module generated.")
    print("\nBenchmarking Script Module...")
    # First benchmark <===================================
    benchmark(traced_script_module, shape, dtype)
    
    
    # Convert to TRT Module...
    output_path = args.config_path.split(os.path.sep)[-1] + "_trt_.pt"
    print("Creating TRT module...")
    trt_ts_module = torch_tensorrt.compile(
        traced_script_module,
        inputs = [
            torch_tensorrt.Input( # Specify input object with shape and dtype
                shape=shape,
                dtype=dtype) # Datatype of input tensor. Allowed options torch.(float|half|int8|int32|bool)
        ],
        enabled_precisions = {dtype},
      )
    print("TRT Module created")
    print("\nBenchmarking TRT Module...")
    benchmark(trt_ts_module, shape, dtype)
    os.makedirs("models", exist_ok=True)  # make sure the output directory exists
    torch.jit.save(trt_ts_module, os.path.join("models", output_path))  # save the TRT-embedded TorchScript
    
if __name__ == "__main__":
    main()
    

Ubuntu desktop

root@ca10ddc496a3:/DockerStuff# python script.py path/to/checkout.tar path/to/config.yaml
No pretrained weights exist for this model. Using random initialization.
Tracing PyTorch model...
/opt/conda/lib/python3.8/site-packages/segmentation_models_pytorch/base/model.py:16: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if h % output_stride != 0 or w % output_stride != 0:
Script Module generated.

Benchmarking Script Module...
Warm up ...
Start timing ...
Iteration 100/1000, ave batch time 7.00 ms
Iteration 200/1000, ave batch time 6.88 ms
Iteration 300/1000, ave batch time 6.76 ms
Iteration 400/1000, ave batch time 6.91 ms
Iteration 500/1000, ave batch time 6.93 ms
Iteration 600/1000, ave batch time 6.98 ms
Iteration 700/1000, ave batch time 6.99 ms
Iteration 800/1000, ave batch time 6.91 ms
Iteration 900/1000, ave batch time 6.89 ms
Iteration 1000/1000, ave batch time 6.87 ms
Input shape: torch.Size([1, 3, 512, 512])
Output features size: torch.Size([1, 3, 512, 512])
Average batch time: 6.87 ms
Creating TRT module...
WARNING: [Torch-TensorRT] - Mean converter disregards dtype
WARNING: [Torch-TensorRT] - Mean converter disregards dtype
WARNING: [Torch-TensorRT] - Mean converter disregards dtype
WARNING: [Torch-TensorRT] - Mean converter disregards dtype
WARNING: [Torch-TensorRT] - Mean converter disregards dtype
WARNING: [Torch-TensorRT] - Mean converter disregards dtype
WARNING: [Torch-TensorRT] - Mean converter disregards dtype
WARNING: [Torch-TensorRT] - Mean converter disregards dtype
WARNING: [Torch-TensorRT] - Mean converter disregards dtype
WARNING: [Torch-TensorRT] - Mean converter disregards dtype
WARNING: [Torch-TensorRT] - There may be undefined behavior using dynamic shape and aten::size
WARNING: [Torch-TensorRT] - There may be undefined behavior using dynamic shape and aten::size
WARNING: [Torch-TensorRT] - Interpolation layer will be run through ATen, not TensorRT. Performance may be lower than expected
[1, 256, 128, 128]
[1, 256, 128, 128]
WARNING: [Torch-TensorRT] - Interpolation layer will be run through ATen, not TensorRT. Performance may be lower than expected
[1, 3, 512, 512]
[1, 3, 512, 512]
[1, 256, 128, 128]
[1, 3, 512, 512]
[1, 256, 128, 128]
[1, 3, 512, 512]
[1, 256, 128, 128]
[1, 3, 512, 512]
[1, 256, 128, 128]
[1, 3, 512, 512]
TRT Module created

Benchmarking TRT Module...
Warm up ...
Start timing ...
Iteration 100/1000, ave batch time 3.29 ms
Iteration 200/1000, ave batch time 3.30 ms
Iteration 300/1000, ave batch time 3.30 ms
Iteration 400/1000, ave batch time 3.30 ms
Iteration 500/1000, ave batch time 3.31 ms
Iteration 600/1000, ave batch time 3.30 ms
Iteration 700/1000, ave batch time 3.30 ms
Iteration 800/1000, ave batch time 3.30 ms
Iteration 900/1000, ave batch time 3.30 ms
Iteration 1000/1000, ave batch time 3.30 ms
Input shape: torch.Size([1, 3, 512, 512])
Output features size: torch.Size([1, 3, 512, 512])
Average batch time: 3.30 ms
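
For reference, the speedup on this machine can be read directly from the two "Average batch time" lines above. The snippet below is a small sketch of that arithmetic; the helper names average_batch_times and speedup are hypothetical and only parse the text format printed by the benchmark function in the script.

```python
import re

# Matches the summary line printed at the end of each benchmark run.
_AVG_RE = re.compile(r"Average batch time: ([\d.]+) ms")


def average_batch_times(log_text: str) -> list:
    """Return every reported average batch time (in ms), in order of appearance."""
    return [float(value) for value in _AVG_RE.findall(log_text)]


def speedup(baseline_ms: float, optimized_ms: float) -> float:
    """Ratio of baseline latency to optimized latency (higher is better)."""
    return baseline_ms / optimized_ms


# The two summary lines from the Ubuntu log above.
ubuntu_log = """Average batch time: 6.87 ms
Creating TRT module...
Average batch time: 3.30 ms"""

baseline, optimized = average_batch_times(ubuntu_log)
print(f"TorchScript: {baseline:.2f} ms, TRT: {optimized:.2f} ms, "
      f"speedup: {speedup(baseline, optimized):.2f}x")
# prints: TorchScript: 6.87 ms, TRT: 3.30 ms, speedup: 2.08x
```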

Windows PC

root@3130ab7d9ff8:/DockerStuff# python script.py path/to/checkout.tar path/to/config.yaml
No pretrained weights exist for this model. Using random initialization.
Tracing PyTorch model...
/opt/conda/lib/python3.8/site-packages/segmentation_models_pytorch/base/model.py:16: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if h % output_stride != 0 or w % output_stride != 0:
Script Module generated.

Benchmarking Script Module...
Warm up ...
Start timing ...
Iteration 100/1000, ave batch time 3.21 ms
Iteration 200/1000, ave batch time 3.18 ms
Iteration 300/1000, ave batch time 3.17 ms
Iteration 400/1000, ave batch time 3.17 ms
Iteration 500/1000, ave batch time 3.16 ms
Iteration 600/1000, ave batch time 3.16 ms
Iteration 700/1000, ave batch time 3.16 ms
Iteration 800/1000, ave batch time 3.16 ms
Iteration 900/1000, ave batch time 3.16 ms
Iteration 1000/1000, ave batch time 3.15 ms
Input shape: torch.Size([1, 3, 512, 512])
Output features size: torch.Size([1, 3, 512, 512])
Average batch time: 3.15 ms
Creating TRT module...
WARNING: [Torch-TensorRT] - Mean converter disregards dtype
WARNING: [Torch-TensorRT] - Mean converter disregards dtype
WARNING: [Torch-TensorRT] - Mean converter disregards dtype
WARNING: [Torch-TensorRT] - Mean converter disregards dtype
WARNING: [Torch-TensorRT] - Mean converter disregards dtype
WARNING: [Torch-TensorRT] - Mean converter disregards dtype
WARNING: [Torch-TensorRT] - Mean converter disregards dtype
WARNING: [Torch-TensorRT] - Mean converter disregards dtype
WARNING: [Torch-TensorRT] - Mean converter disregards dtype
WARNING: [Torch-TensorRT] - Mean converter disregards dtype
WARNING: [Torch-TensorRT] - There may be undefined behavior using dynamic shape and aten::size
WARNING: [Torch-TensorRT] - There may be undefined behavior using dynamic shape and aten::size
WARNING: [Torch-TensorRT] - Interpolation layer will be run through ATen, not TensorRT. Performance may be lower than expected
[1, 256, 128, 128]
[1, 256, 128, 128]
WARNING: [Torch-TensorRT] - Interpolation layer will be run through ATen, not TensorRT. Performance may be lower than expected
[1, 3, 512, 512]
[1, 3, 512, 512]
[1, 256, 128, 128]
[1, 3, 512, 512]
[1, 256, 128, 128]
[1, 3, 512, 512]
WARNING: [Torch-TensorRT TorchScript Conversion Context] - GPU error during getBestTactic: %input.17 : Tensor = aten::_convolution(%1217, %self.encoder.model.blocks.1.0.conv_pw.weight, %23, %869, %871, %869, %870, %871, %25, %873, %870, %873, %873), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.1/__module.encoder.model.blocks.1.0/__module.encoder.model.blocks.1.0.conv_pw # /opt/conda/lib/python3.8/site-packages/torch/nn/modules/conv.py:442:0 + %input.19 : Tensor = aten::batch_norm(%input.17, %self.encoder.model.blocks.1.0.bn1.weight, %self.encoder.model.blocks.1.0.bn1.bias, %self.encoder.model.blocks.1.0.bn1.running_mean, %self.encoder.model.blocks.1.0.bn1.running_var, %870, %878, %879, %873), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.1/__module.encoder.model.blocks.1.0/__module.encoder.model.blocks.1.0.bn1 # /opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:2381:0 + %1220 : Tensor = aten::relu(%input.19), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.1/__module.encoder.model.blocks.1.0/__module.encoder.model.blocks.1.0.act1 # /opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:1393:0 : invalid argument
WARNING: [Torch-TensorRT TorchScript Conversion Context] - GPU error during getBestTactic: %input.29 : Tensor = aten::_convolution(%1223, %self.encoder.model.blocks.1.0.conv_pwl.weight, %23, %869, %871, %869, %870, %871, %25, %873, %870, %873, %873), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.1/__module.encoder.model.blocks.1.0/__module.encoder.model.blocks.1.0.conv_pwl # /opt/conda/lib/python3.8/site-packages/torch/nn/modules/conv.py:442:0 + %input.31 : Tensor = aten::batch_norm(%input.29, %self.encoder.model.blocks.1.0.bn3.weight, %self.encoder.model.blocks.1.0.bn3.bias, %self.encoder.model.blocks.1.0.bn3.running_mean, %self.encoder.model.blocks.1.0.bn3.running_var, %870, %878, %879, %873), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.1/__module.encoder.model.blocks.1.0/__module.encoder.model.blocks.1.0.bn3 # /opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:2381:0 : invalid argument
WARNING: [Torch-TensorRT TorchScript Conversion Context] - GPU error during getBestTactic: %input.33 : Tensor = aten::_convolution(%input.31, %self.encoder.model.blocks.2.0.conv_pw.weight, %23, %869, %871, %869, %870, %871, %25, %873, %870, %873, %873), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.2/__module.encoder.model.blocks.2.0/__module.encoder.model.blocks.2.0.conv_pw # /opt/conda/lib/python3.8/site-packages/torch/nn/modules/conv.py:442:0 + %input.35 : Tensor = aten::batch_norm(%input.33, %self.encoder.model.blocks.2.0.bn1.weight, %self.encoder.model.blocks.2.0.bn1.bias, %self.encoder.model.blocks.2.0.bn1.running_mean, %self.encoder.model.blocks.2.0.bn1.running_var, %870, %878, %879, %873), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.2/__module.encoder.model.blocks.2.0/__module.encoder.model.blocks.2.0.bn1 # /opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:2381:0 + %1228 : Tensor = aten::relu(%input.35), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.2/__module.encoder.model.blocks.2.0/__module.encoder.model.blocks.2.0.act1 # /opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:1393:0 || %input.369 : Tensor = aten::_convolution(%input.31, %self.decoder.block1.0.weight, %23, %869, %871, %869, %870, %871, %25, %873, %870, %873, %873), scope: __module.decoder/__module.decoder.block1/__module.decoder.block1.0 # /opt/conda/lib/python3.8/site-packages/torch/nn/modules/conv.py:442:0 + %input.371 : Tensor = aten::batch_norm(%input.369, %self.decoder.block1.1.weight, %self.decoder.block1.1.bias, %self.decoder.block1.1.running_mean, %self.decoder.block1.1.running_var, %870, %878, %879, %873), scope: __module.decoder/__module.decoder.block1/__module.decoder.block1.1 # /opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:2381:0 + %high_res_features : Tensor = aten::relu(%input.371), scope: __module.decoder/__module.decoder.block1/__module.decoder.block1.2 # /opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:1395:0 : invalid argument
WARNING: [Torch-TensorRT TorchScript Conversion Context] - GPU error during getBestTactic: %input.45 : Tensor = aten::_convolution(%1231, %self.encoder.model.blocks.2.0.conv_pwl.weight, %23, %869, %871, %869, %870, %871, %25, %873, %870, %873, %873), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.2/__module.encoder.model.blocks.2.0/__module.encoder.model.blocks.2.0.conv_pwl # /opt/conda/lib/python3.8/site-packages/torch/nn/modules/conv.py:442:0 + %input.47 : Tensor = aten::batch_norm(%input.45, %self.encoder.model.blocks.2.0.bn3.weight, %self.encoder.model.blocks.2.0.bn3.bias, %self.encoder.model.blocks.2.0.bn3.running_mean, %self.encoder.model.blocks.2.0.bn3.running_var, %870, %878, %879, %873), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.2/__module.encoder.model.blocks.2.0/__module.encoder.model.blocks.2.0.bn3 # /opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:2381:0 : invalid argument
WARNING: [Torch-TensorRT TorchScript Conversion Context] - GPU error during getBestTactic: %input.49 : Tensor = aten::_convolution(%input.47, %self.encoder.model.blocks.2.1.conv_pw.weight, %23, %869, %871, %869, %870, %871, %25, %873, %870, %873, %873), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.2/__module.encoder.model.blocks.2.1/__module.encoder.model.blocks.2.1.conv_pw # /opt/conda/lib/python3.8/site-packages/torch/nn/modules/conv.py:442:0 + %input.51 : Tensor = aten::batch_norm(%input.49, %self.encoder.model.blocks.2.1.bn1.weight, %self.encoder.model.blocks.2.1.bn1.bias, %self.encoder.model.blocks.2.1.bn1.running_mean, %self.encoder.model.blocks.2.1.bn1.running_var, %870, %878, %879, %873), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.2/__module.encoder.model.blocks.2.1/__module.encoder.model.blocks.2.1.bn1 # /opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:2381:0 + %1236 : Tensor = aten::relu(%input.51), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.2/__module.encoder.model.blocks.2.1/__module.encoder.model.blocks.2.1.act1 # /opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:1393:0 : invalid argument
WARNING: [Torch-TensorRT TorchScript Conversion Context] - GPU error during getBestTactic: %input.65 : Tensor = aten::_convolution(%1242, %self.encoder.model.blocks.3.0.conv_pw.weight, %23, %869, %871, %869, %870, %871, %25, %873, %870, %873, %873), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.3/__module.encoder.model.blocks.3.0/__module.encoder.model.blocks.3.0.conv_pw # /opt/conda/lib/python3.8/site-packages/torch/nn/modules/conv.py:442:0 + %input.67 : Tensor = aten::batch_norm(%input.65, %self.encoder.model.blocks.3.0.bn1.weight, %self.encoder.model.blocks.3.0.bn1.bias, %self.encoder.model.blocks.3.0.bn1.running_mean, %self.encoder.model.blocks.3.0.bn1.running_var, %870, %878, %879, %873), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.3/__module.encoder.model.blocks.3.0/__module.encoder.model.blocks.3.0.bn1 # /opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:2381:0 + %1245 : Tensor = aten::relu(%input.67), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.3/__module.encoder.model.blocks.3.0/__module.encoder.model.blocks.3.0.act1 # /opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:1393:0 : invalid argument
WARNING: [Torch-TensorRT TorchScript Conversion Context] - GPU error during getBestTactic: %input.85 : Tensor = aten::_convolution(%input.83, %self.encoder.model.blocks.3.0.conv_pwl.weight, %23, %869, %871, %869, %870, %871, %25, %873, %870, %873, %873), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.3/__module.encoder.model.blocks.3.0/__module.encoder.model.blocks.3.0.conv_pwl # /opt/conda/lib/python3.8/site-packages/torch/nn/modules/conv.py:442:0 + %input.87 : Tensor = aten::batch_norm(%input.85, %self.encoder.model.blocks.3.0.bn3.weight, %self.encoder.model.blocks.3.0.bn3.bias, %self.encoder.model.blocks.3.0.bn3.running_mean, %self.encoder.model.blocks.3.0.bn3.running_var, %870, %878, %879, %873), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.3/__module.encoder.model.blocks.3.0/__module.encoder.model.blocks.3.0.bn3 # /opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:2381:0 : invalid argument
WARNING: [Torch-TensorRT TorchScript Conversion Context] - GPU error during getBestTactic: %input.89 : Tensor = aten::_convolution(%input.87, %self.encoder.model.blocks.3.1.conv_pw.weight, %23, %869, %871, %869, %870, %871, %25, %873, %870, %873, %873), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.3/__module.encoder.model.blocks.3.1/__module.encoder.model.blocks.3.1.conv_pw # /opt/conda/lib/python3.8/site-packages/torch/nn/modules/conv.py:442:0 + %input.91 : Tensor = aten::batch_norm(%input.89, %self.encoder.model.blocks.3.1.bn1.weight, %self.encoder.model.blocks.3.1.bn1.bias, %self.encoder.model.blocks.3.1.bn1.running_mean, %self.encoder.model.blocks.3.1.bn1.running_var, %870, %878, %879, %873), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.3/__module.encoder.model.blocks.3.1/__module.encoder.model.blocks.3.1.bn1 # /opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:2381:0 + %1259 : Tensor = aten::relu(%input.91), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.3/__module.encoder.model.blocks.3.1/__module.encoder.model.blocks.3.1.act1 # /opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:1393:0 : invalid argument
WARNING: [Torch-TensorRT TorchScript Conversion Context] - GPU error during getBestTactic: %input.113 : Tensor = aten::_convolution(%1271, %self.encoder.model.blocks.3.2.conv_pw.weight, %23, %869, %871, %869, %870, %871, %25, %873, %870, %873, %873), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.3/__module.encoder.model.blocks.3.2/__module.encoder.model.blocks.3.2.conv_pw # /opt/conda/lib/python3.8/site-packages/torch/nn/modules/conv.py:442:0 + %input.115 : Tensor = aten::batch_norm(%input.113, %self.encoder.model.blocks.3.2.bn1.weight, %self.encoder.model.blocks.3.2.bn1.bias, %self.encoder.model.blocks.3.2.bn1.running_mean, %self.encoder.model.blocks.3.2.bn1.running_var, %870, %878, %879, %873), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.3/__module.encoder.model.blocks.3.2/__module.encoder.model.blocks.3.2.bn1 # /opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:2381:0 + %1274 : Tensor = aten::relu(%input.115), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.3/__module.encoder.model.blocks.3.2/__module.encoder.model.blocks.3.2.act1 # /opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:1393:0 : invalid argument
WARNING: [Torch-TensorRT TorchScript Conversion Context] - GPU error during getBestTactic: %input.137 : Tensor = aten::_convolution(%1286, %self.encoder.model.blocks.3.3.conv_pw.weight, %23, %869, %871, %869, %870, %871, %25, %873, %870, %873, %873), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.3/__module.encoder.model.blocks.3.3/__module.encoder.model.blocks.3.3.conv_pw # /opt/conda/lib/python3.8/site-packages/torch/nn/modules/conv.py:442:0 + %input.139 : Tensor = aten::batch_norm(%input.137, %self.encoder.model.blocks.3.3.bn1.weight, %self.encoder.model.blocks.3.3.bn1.bias, %self.encoder.model.blocks.3.3.bn1.running_mean, %self.encoder.model.blocks.3.3.bn1.running_var, %870, %878, %879, %873), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.3/__module.encoder.model.blocks.3.3/__module.encoder.model.blocks.3.3.bn1 # /opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:2381:0 + %1289 : Tensor = aten::relu(%input.139), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.3/__module.encoder.model.blocks.3.3/__module.encoder.model.blocks.3.3.act1 # /opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:1393:0 : invalid argument
WARNING: [Torch-TensorRT TorchScript Conversion Context] - GPU error during getBestTactic: %input.161 : Tensor = aten::_convolution(%1301, %self.encoder.model.blocks.4.0.conv_pw.weight, %23, %869, %871, %869, %870, %871, %25, %873, %870, %873, %873), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.4/__module.encoder.model.blocks.4.0/__module.encoder.model.blocks.4.0.conv_pw # /opt/conda/lib/python3.8/site-packages/torch/nn/modules/conv.py:442:0 + %input.163 : Tensor = aten::batch_norm(%input.161, %self.encoder.model.blocks.4.0.bn1.weight, %self.encoder.model.blocks.4.0.bn1.bias, %self.encoder.model.blocks.4.0.bn1.running_mean, %self.encoder.model.blocks.4.0.bn1.running_var, %870, %878, %879, %873), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.4/__module.encoder.model.blocks.4.0/__module.encoder.model.blocks.4.0.bn1 # /opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:2381:0 + %1304 : Tensor = aten::relu(%input.163), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.4/__module.encoder.model.blocks.4.0/__module.encoder.model.blocks.4.0.act1 # /opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:1393:0 : invalid argument
WARNING: [Torch-TensorRT TorchScript Conversion Context] - GPU error during getBestTactic: %input.185 : Tensor = aten::_convolution(%1316, %self.encoder.model.blocks.4.1.conv_pw.weight, %23, %869, %871, %869, %870, %871, %25, %873, %870, %873, %873), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.4/__module.encoder.model.blocks.4.1/__module.encoder.model.blocks.4.1.conv_pw # /opt/conda/lib/python3.8/site-packages/torch/nn/modules/conv.py:442:0 + %input.187 : Tensor = aten::batch_norm(%input.185, %self.encoder.model.blocks.4.1.bn1.weight, %self.encoder.model.blocks.4.1.bn1.bias, %self.encoder.model.blocks.4.1.bn1.running_mean, %self.encoder.model.blocks.4.1.bn1.running_var, %870, %878, %879, %873), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.4/__module.encoder.model.blocks.4.1/__module.encoder.model.blocks.4.1.bn1 # /opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:2381:0 + %1319 : Tensor = aten::relu(%input.187), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.4/__module.encoder.model.blocks.4.1/__module.encoder.model.blocks.4.1.act1 # /opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:1393:0 : invalid argument
WARNING: [Torch-TensorRT TorchScript Conversion Context] - GPU error during getBestTactic: %input.209 : Tensor = aten::_convolution(%1331, %self.encoder.model.blocks.4.2.conv_pw.weight, %23, %869, %871, %869, %870, %871, %25, %873, %870, %873, %873), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.4/__module.encoder.model.blocks.4.2/__module.encoder.model.blocks.4.2.conv_pw # /opt/conda/lib/python3.8/site-packages/torch/nn/modules/conv.py:442:0 + %input.211 : Tensor = aten::batch_norm(%input.209, %self.encoder.model.blocks.4.2.bn1.weight, %self.encoder.model.blocks.4.2.bn1.bias, %self.encoder.model.blocks.4.2.bn1.running_mean, %self.encoder.model.blocks.4.2.bn1.running_var, %870, %878, %879, %873), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.4/__module.encoder.model.blocks.4.2/__module.encoder.model.blocks.4.2.bn1 # /opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:2381:0 + %1334 : Tensor = aten::relu(%input.211), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.4/__module.encoder.model.blocks.4.2/__module.encoder.model.blocks.4.2.act1 # /opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:1393:0 : invalid argument
WARNING: [Torch-TensorRT TorchScript Conversion Context] - GPU error during getBestTactic: %input.233 : Tensor = aten::_convolution(%1346, %self.encoder.model.blocks.5.0.conv_pw.weight, %23, %869, %871, %869, %870, %871, %25, %873, %870, %873, %873), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.5/__module.encoder.model.blocks.5.0/__module.encoder.model.blocks.5.0.conv_pw # /opt/conda/lib/python3.8/site-packages/torch/nn/modules/conv.py:442:0 + %input.235 : Tensor = aten::batch_norm(%input.233, %self.encoder.model.blocks.5.0.bn1.weight, %self.encoder.model.blocks.5.0.bn1.bias, %self.encoder.model.blocks.5.0.bn1.running_mean, %self.encoder.model.blocks.5.0.bn1.running_var, %870, %878, %879, %873), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.5/__module.encoder.model.blocks.5.0/__module.encoder.model.blocks.5.0.bn1 # /opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:2381:0 + %1349 : Tensor = aten::relu(%input.235), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.5/__module.encoder.model.blocks.5.0/__module.encoder.model.blocks.5.0.act1 # /opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:1393:0 : invalid argument
WARNING: [Torch-TensorRT TorchScript Conversion Context] - GPU error during getBestTactic: %input.253 : Tensor = aten::_convolution(%input.251, %self.encoder.model.blocks.5.0.conv_pwl.weight, %23, %869, %871, %869, %870, %871, %25, %873, %870, %873, %873), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.5/__module.encoder.model.blocks.5.0/__module.encoder.model.blocks.5.0.conv_pwl # /opt/conda/lib/python3.8/site-packages/torch/nn/modules/conv.py:442:0 + %input.255 : Tensor = aten::batch_norm(%input.253, %self.encoder.model.blocks.5.0.bn3.weight, %self.encoder.model.blocks.5.0.bn3.bias, %self.encoder.model.blocks.5.0.bn3.running_mean, %self.encoder.model.blocks.5.0.bn3.running_var, %870, %878, %879, %873), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.5/__module.encoder.model.blocks.5.0/__module.encoder.model.blocks.5.0.bn3 # /opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:2381:0 : invalid argument
WARNING: [Torch-TensorRT TorchScript Conversion Context] - GPU error during getBestTactic: %input.257 : Tensor = aten::_convolution(%input.255, %self.encoder.model.blocks.5.1.conv_pw.weight, %23, %869, %871, %869, %870, %871, %25, %873, %870, %873, %873), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.5/__module.encoder.model.blocks.5.1/__module.encoder.model.blocks.5.1.conv_pw # /opt/conda/lib/python3.8/site-packages/torch/nn/modules/conv.py:442:0 + %input.259 : Tensor = aten::batch_norm(%input.257, %self.encoder.model.blocks.5.1.bn1.weight, %self.encoder.model.blocks.5.1.bn1.bias, %self.encoder.model.blocks.5.1.bn1.running_mean, %self.encoder.model.blocks.5.1.bn1.running_var, %870, %878, %879, %873), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.5/__module.encoder.model.blocks.5.1/__module.encoder.model.blocks.5.1.bn1 # /opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:2381:0 + %1363 : Tensor = aten::relu(%input.259), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.5/__module.encoder.model.blocks.5.1/__module.encoder.model.blocks.5.1.act1 # /opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:1393:0 : invalid argument
WARNING: [Torch-TensorRT TorchScript Conversion Context] - GPU error during getBestTactic: %input.281 : Tensor = aten::_convolution(%1375, %self.encoder.model.blocks.5.2.conv_pw.weight, %23, %869, %871, %869, %870, %871, %25, %873, %870, %873, %873), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.5/__module.encoder.model.blocks.5.2/__module.encoder.model.blocks.5.2.conv_pw # /opt/conda/lib/python3.8/site-packages/torch/nn/modules/conv.py:442:0 + %input.283 : Tensor = aten::batch_norm(%input.281, %self.encoder.model.blocks.5.2.bn1.weight, %self.encoder.model.blocks.5.2.bn1.bias, %self.encoder.model.blocks.5.2.bn1.running_mean, %self.encoder.model.blocks.5.2.bn1.running_var, %870, %878, %879, %873), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.5/__module.encoder.model.blocks.5.2/__module.encoder.model.blocks.5.2.bn1 # /opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:2381:0 + %1378 : Tensor = aten::relu(%input.283), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.5/__module.encoder.model.blocks.5.2/__module.encoder.model.blocks.5.2.act1 # /opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:1393:0 : invalid argument
WARNING: [Torch-TensorRT TorchScript Conversion Context] - GPU error during getBestTactic: %input.305 : Tensor = aten::_convolution(%1390, %self.encoder.model.blocks.6.0.conv_pw.weight, %23, %869, %871, %869, %870, %871, %25, %873, %870, %873, %873), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.6/__module.encoder.model.blocks.6.0/__module.encoder.model.blocks.6.0.conv_pw # /opt/conda/lib/python3.8/site-packages/torch/nn/modules/conv.py:442:0 + %input.307 : Tensor = aten::batch_norm(%input.305, %self.encoder.model.blocks.6.0.bn1.weight, %self.encoder.model.blocks.6.0.bn1.bias, %self.encoder.model.blocks.6.0.bn1.running_mean, %self.encoder.model.blocks.6.0.bn1.running_var, %870, %878, %879, %873), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.6/__module.encoder.model.blocks.6.0/__module.encoder.model.blocks.6.0.bn1 # /opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:2381:0 + %1393 : Tensor = aten::relu(%input.307), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.6/__module.encoder.model.blocks.6.0/__module.encoder.model.blocks.6.0.act1 # /opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:1393:0 : invalid argument
WARNING: [Torch-TensorRT TorchScript Conversion Context] - GPU error during getBestTactic: %input.317 : Tensor = aten::_convolution(%1396, %self.encoder.model.blocks.6.0.conv_pwl.weight, %23, %869, %871, %869, %870, %871, %25, %873, %870, %873, %873), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.6/__module.encoder.model.blocks.6.0/__module.encoder.model.blocks.6.0.conv_pwl # /opt/conda/lib/python3.8/site-packages/torch/nn/modules/conv.py:442:0 + %input.319 : Tensor = aten::batch_norm(%input.317, %self.encoder.model.blocks.6.0.bn3.weight, %self.encoder.model.blocks.6.0.bn3.bias, %self.encoder.model.blocks.6.0.bn3.running_mean, %self.encoder.model.blocks.6.0.bn3.running_var, %870, %878, %879, %873), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.6/__module.encoder.model.blocks.6.0/__module.encoder.model.blocks.6.0.bn3 # /opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:2381:0 : invalid argument
WARNING: [Torch-TensorRT TorchScript Conversion Context] - GPU error during getBestTactic: %input.321 : Tensor = aten::_convolution(%input.319, %self.decoder.aspp.0.convs.0.0.weight, %23, %869, %871, %869, %870, %871, %25, %873, %870, %873, %873), scope: __module.decoder/__module.decoder.aspp/__module.decoder.aspp.0/__module.decoder.aspp.0.convs.0/__module.decoder.aspp.0.convs.0.0 # /opt/conda/lib/python3.8/site-packages/torch/nn/modules/conv.py:442:0 + %input.323 : Tensor = aten::batch_norm(%input.321, %self.decoder.aspp.0.convs.0.1.weight, %self.decoder.aspp.0.convs.0.1.bias, %self.decoder.aspp.0.convs.0.1.running_mean, %self.decoder.aspp.0.convs.0.1.running_var, %870, %878, %879, %873), scope: __module.decoder/__module.decoder.aspp/__module.decoder.aspp.0/__module.decoder.aspp.0.convs.0/__module.decoder.aspp.0.convs.0.1 # /opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:2381:0 + %1401 : Tensor = aten::relu(%input.323), scope: __module.decoder/__module.decoder.aspp/__module.decoder.aspp.0/__module.decoder.aspp.0.convs.0/__module.decoder.aspp.0.convs.0.2 # /opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:1395:0 : invalid argument
WARNING: [Torch-TensorRT TorchScript Conversion Context] - GPU error during getBestTactic: %input.327 : Tensor = aten::_convolution(%input.325, %self.decoder.aspp.0.convs.1.0.1.weight, %23, %869, %871, %869, %870, %871, %25, %873, %870, %873, %873), scope: __module.decoder/__module.decoder.aspp/__module.decoder.aspp.0/__module.decoder.aspp.0.convs.1/__module.decoder.aspp.0.convs.1.0/__module.decoder.aspp.0.convs.1.0.1 # /opt/conda/lib/python3.8/site-packages/torch/nn/modules/conv.py:442:0 + %input.329 : Tensor = aten::batch_norm(%input.327, %self.decoder.aspp.0.convs.1.1.weight, %self.decoder.aspp.0.convs.1.1.bias, %self.decoder.aspp.0.convs.1.1.running_mean, %self.decoder.aspp.0.convs.1.1.running_var, %870, %878, %879, %873), scope: __module.decoder/__module.decoder.aspp/__module.decoder.aspp.0/__module.decoder.aspp.0.convs.1/__module.decoder.aspp.0.convs.1.1 # /opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:2381:0 + %1405 : Tensor = aten::relu(%input.329), scope: __module.decoder/__module.decoder.aspp/__module.decoder.aspp.0/__module.decoder.aspp.0.convs.1/__module.decoder.aspp.0.convs.1.2 # /opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:1395:0 : invalid argument
WARNING: [Torch-TensorRT TorchScript Conversion Context] - GPU error during getBestTactic: %input.333 : Tensor = aten::_convolution(%input.331, %self.decoder.aspp.0.convs.2.0.1.weight, %23, %869, %871, %869, %870, %871, %25, %873, %870, %873, %873), scope: __module.decoder/__module.decoder.aspp/__module.decoder.aspp.0/__module.decoder.aspp.0.convs.2/__module.decoder.aspp.0.convs.2.0/__module.decoder.aspp.0.convs.2.0.1 # /opt/conda/lib/python3.8/site-packages/torch/nn/modules/conv.py:442:0 + %input.335 : Tensor = aten::batch_norm(%input.333, %self.decoder.aspp.0.convs.2.1.weight, %self.decoder.aspp.0.convs.2.1.bias, %self.decoder.aspp.0.convs.2.1.running_mean, %self.decoder.aspp.0.convs.2.1.running_var, %870, %878, %879, %873), scope: __module.decoder/__module.decoder.aspp/__module.decoder.aspp.0/__module.decoder.aspp.0.convs.2/__module.decoder.aspp.0.convs.2.1 # /opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:2381:0 + %1409 : Tensor = aten::relu(%input.335), scope: __module.decoder/__module.decoder.aspp/__module.decoder.aspp.0/__module.decoder.aspp.0.convs.2/__module.decoder.aspp.0.convs.2.2 # /opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:1395:0 : invalid argument
WARNING: [Torch-TensorRT TorchScript Conversion Context] - GPU error during getBestTactic: %input.339 : Tensor = aten::_convolution(%input.337, %self.decoder.aspp.0.convs.3.0.1.weight, %23, %869, %871, %869, %870, %871, %25, %873, %870, %873, %873), scope: __module.decoder/__module.decoder.aspp/__module.decoder.aspp.0/__module.decoder.aspp.0.convs.3/__module.decoder.aspp.0.convs.3.0/__module.decoder.aspp.0.convs.3.0.1 # /opt/conda/lib/python3.8/site-packages/torch/nn/modules/conv.py:442:0 + %input.341 : Tensor = aten::batch_norm(%input.339, %self.decoder.aspp.0.convs.3.1.weight, %self.decoder.aspp.0.convs.3.1.bias, %self.decoder.aspp.0.convs.3.1.running_mean, %self.decoder.aspp.0.convs.3.1.running_var, %870, %878, %879, %873), scope: __module.decoder/__module.decoder.aspp/__module.decoder.aspp.0/__module.decoder.aspp.0.convs.3/__module.decoder.aspp.0.convs.3.1 # /opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:2381:0 + %1413 : Tensor = aten::relu(%input.341), scope: __module.decoder/__module.decoder.aspp/__module.decoder.aspp.0/__module.decoder.aspp.0.convs.3/__module.decoder.aspp.0.convs.3.2 # /opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:1395:0 : invalid argument
WARNING: [Torch-TensorRT TorchScript Conversion Context] - GPU error during getBestTactic: %input.353 : Tensor = aten::_convolution(%input.351, %self.decoder.aspp.0.project.0.weight, %23, %869, %871, %869, %870, %871, %25, %873, %870, %873, %873), scope: __module.decoder/__module.decoder.aspp/__module.decoder.aspp.0/__module.decoder.aspp.0.project/__module.decoder.aspp.0.project.0 # /opt/conda/lib/python3.8/site-packages/torch/nn/modules/conv.py:442:0 + %input.355 : Tensor = aten::batch_norm(%input.353, %self.decoder.aspp.0.project.1.weight, %self.decoder.aspp.0.project.1.bias, %self.decoder.aspp.0.project.1.running_mean, %self.decoder.aspp.0.project.1.running_var, %870, %878, %879, %873), scope: __module.decoder/__module.decoder.aspp/__module.decoder.aspp.0/__module.decoder.aspp.0.project/__module.decoder.aspp.0.project.1 # /opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:2381:0 + %input.357 : Tensor = aten::relu(%input.355), scope: __module.decoder/__module.decoder.aspp/__module.decoder.aspp.0/__module.decoder.aspp.0.project/__module.decoder.aspp.0.project.2 # /opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:1395:0 : invalid argument
WARNING: [Torch-TensorRT TorchScript Conversion Context] - GPU error during getBestTactic: %input.363 : Tensor = aten::_convolution(%input.361, %self.decoder.aspp.1.1.weight, %23, %869, %871, %869, %870, %871, %25, %873, %870, %873, %873), scope: __module.decoder/__module.decoder.aspp/__module.decoder.aspp.1/__module.decoder.aspp.1.1 # /opt/conda/lib/python3.8/site-packages/torch/nn/modules/conv.py:442:0 + %input.365 : Tensor = aten::batch_norm(%input.363, %self.decoder.aspp.2.weight, %self.decoder.aspp.2.bias, %self.decoder.aspp.2.running_mean, %self.decoder.aspp.2.running_var, %870, %878, %879, %873), scope: __module.decoder/__module.decoder.aspp/__module.decoder.aspp.2 # /opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:2381:0 + %input.367 : Tensor = aten::relu(%input.365), scope: __module.decoder/__module.decoder.aspp/__module.decoder.aspp.3 # /opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:1395:0 : invalid argument
WARNING: [Torch-TensorRT TorchScript Conversion Context] - GPU error during getBestTactic: %input.377 : Tensor = aten::_convolution(%input.375, %self.decoder.block2.0.1.weight, %23, %869, %871, %869, %870, %871, %25, %873, %870, %873, %873), scope: __module.decoder/__module.decoder.block2/__module.decoder.block2.0/__module.decoder.block2.0.1 # /opt/conda/lib/python3.8/site-packages/torch/nn/modules/conv.py:442:0 + %input.379 : Tensor = aten::batch_norm(%input.377, %self.decoder.block2.1.weight, %self.decoder.block2.1.bias, %self.decoder.block2.1.running_mean, %self.decoder.block2.1.running_var, %870, %878, %879, %873), scope: __module.decoder/__module.decoder.block2/__module.decoder.block2.1 # /opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:2381:0 + %input.381 : Tensor = aten::relu(%input.379), scope: __module.decoder/__module.decoder.block2/__module.decoder.block2.2 # /opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:1395:0 : invalid argument
WARNING: [Torch-TensorRT TorchScript Conversion Context] - GPU error during getBestTactic: %input.383 : Tensor = aten::_convolution(%input.381, %self.segmentation_head.0.weight, %self.segmentation_head.0.bias, %869, %871, %869, %870, %871, %25, %873, %870, %873, %873), scope: __module.segmentation_head/__module.segmentation_head.0 # /opt/conda/lib/python3.8/site-packages/torch/nn/modules/conv.py:442:0 : invalid argument
[1, 256, 128, 128]
[1, 3, 512, 512]
[1, 256, 128, 128]
[1, 3, 512, 512]
TRT Module created

Benchmarking TRT Module...
Warm up ...
Start timing ...
Iteration 100/1000, ave batch time 2.74 ms
Iteration 200/1000, ave batch time 2.75 ms
Iteration 300/1000, ave batch time 2.74 ms
Iteration 400/1000, ave batch time 2.75 ms
Iteration 500/1000, ave batch time 2.74 ms
Iteration 600/1000, ave batch time 2.74 ms
Iteration 700/1000, ave batch time 2.75 ms
Iteration 800/1000, ave batch time 2.75 ms
Iteration 900/1000, ave batch time 2.75 ms
Iteration 1000/1000, ave batch time 2.75 ms
Input shape: torch.Size([1, 3, 512, 512])
Output features size: torch.Size([1, 3, 512, 512])

Environment

newest PyTorch NGC docker image

My Windows PC has an RTX3080.
My Ubuntu desktop has a GTX1080Ti.

Additional context

@narendasan
Collaborator

Did you verify that your GPU is accessible in WSL as well as in a container inside WSL?

@narendasan
Collaborator

narendasan commented Feb 9, 2022

I tried out one of our example notebooks in WSL2 in the 22.01 container. Seems like things work properly. I would make sure that your GPU is accessible from WSL

@narendasan
Collaborator

narendasan commented Feb 9, 2022

Also, are you planning to run this model in deployment inside WSL or in Windows? IIRC, there isn't necessarily compatibility across operating systems (WSL would fall under Linux). @ncomly-nvidia do you know? I think, however, that running in WSL should be fine as long as it fits your use case.

@narendasan
Collaborator

I tried out one of our example notebooks in WSL2 in the 22.01 container. Seems like things work properly. I would make sure that your GPU is accessible from WSL

This is on Windows 10: 21H2, with CUDA 11.6 installed on the system and following these instructions https://docs.nvidia.com/cuda/wsl-user-guide/index.html
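
One quick way to check whether the GPU is visible from inside WSL (or from a container running in WSL) is to ask `nvidia-smi`. This is only a minimal sketch: the helper name `query_gpu_names` is made up for illustration, and it assumes `nvidia-smi` is on the PATH whenever the driver is correctly exposed.

```python
import shutil
import subprocess
from typing import List, Optional


def query_gpu_names(timeout_s: float = 10.0) -> Optional[List[str]]:
    """Return the GPU names reported by nvidia-smi, or None if it cannot be run."""
    exe = shutil.which("nvidia-smi")
    if exe is None:
        # The driver tools are not visible in this environment.
        return None
    try:
        result = subprocess.run(
            [exe, "--query-gpu=name", "--format=csv,noheader"],
            capture_output=True,
            text=True,
            timeout=timeout_s,
            check=True,
        )
    except (subprocess.SubprocessError, OSError):
        # nvidia-smi exists but failed, e.g. because no device is reachable.
        return None
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


if __name__ == "__main__":
    names = query_gpu_names()
    print("No GPU visible" if not names else f"Visible GPUs: {names}")
```

If this reports no GPU inside the container but works in plain WSL, the container was probably started without GPU access.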

@andreabonvini
Author

Hi @narendasan, thanks for your answer. I solved the first problem (now I have the same behaviour in both WSL and Ubuntu, which is great!) by downloading and installing the latest driver from here. But now I have another problem: I really NEED to use the optimized model in a Windows environment (and not WSL) with LibTorch. This is the C++ program I'm using to test whether the model works correctly:

#include <algorithm>
#include <chrono>
#include <iostream>
#include <numeric>
#include <string>
#include <vector>
#include <ATen/Context.h>
#include <torch/torch.h>
#include <torch/script.h>


// =============================== SET PARAMETERS ==================================================
std::string MODEL_PATH = "path/to/trt/model.pt";
int nWarmUp = 50;
int nForwardPass = 1000;


int main() {

	const torch::Device device = torch::Device(torch::kCUDA, 0);
	torch::jit::script::Module model;

	std::cout << "Trying to load the model" << std::endl;
	try {
		model = torch::jit::load(MODEL_PATH, device);
		model.eval();
		std::cout << "AI model loaded successfully." << std::endl;
	}
	catch (const c10::Error& e) {
		std::cerr << e.what() << std::endl;
		return 1;  // nothing to benchmark if the model failed to load
	}

	std::cout << "Warming up model..." << std::endl;
	auto dummy = torch::zeros({ 1, 3, 512, 512 }).to(device);
	torch::Tensor output;
	std::vector<torch::jit::IValue> inputs;
	inputs.clear();
	inputs.emplace_back(dummy);
	std::cout << "Warming up...";

	for (int i = 0; i < nWarmUp; i++) {
		output = model.forward(inputs).toTensor();
		torch::cuda::synchronize();
	}


	// Keep fractional milliseconds: an integral duration_cast would record e.g. 2.74 ms as 2.
	using milli = std::chrono::duration<double, std::milli>;
	std::vector<double> times;

	for (int i = 0; i < nForwardPass; i++) {
		auto start = std::chrono::high_resolution_clock::now();
		output = model.forward(inputs).toTensor();
		torch::cuda::synchronize();
		auto finish = std::chrono::high_resolution_clock::now();
		times.push_back(milli(finish - start).count());
	}

	std::cout << "\nProfiling concluded. Printing report...\n" << std::endl;
	std::cout << "==>  MIN inference time: " << *std::min_element(times.begin(), times.end()) << std::endl;
	std::cout << "==> MEAN inference time: " << std::accumulate(std::begin(times), std::end(times), 0.0) / static_cast<double>(times.size()) << std::endl;

}

If I try to run this C++ script with the optimized model, the program fails on loading.

image

Is there any way to make this work?

Thanks

@narendasan
Collaborator

You can try turning on debug logging to see if it is torch-trt's runtime that is failing. It's also worth trying with a non-compiled TorchScript module beforehand.

@andreabonvini
Author

Hi @narendasan, what do you mean by "turning on debug logging"? The error, as shown in the stack trace, happens in an external .dll (torch_cpu); the source code should be somewhere around this line of code. Moreover, I already tried to run the code with the same traced script module (not optimized with TorchTensorRT) and it works well.

@narendasan
Collaborator

You can enable torchtrt debug logging with torch_tensorrt.logging.set_reportable_log_level(torch_tensorrt.logging.Level.Debug) before you run anything related to torch_tensorrt
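
For reference, this call could sit at the top of the Python script that performs the compilation. A minimal sketch: the helper name `enable_torchtrt_debug_logging` is made up for illustration, and the import is guarded only so that the snippet also runs on a machine where torch_tensorrt is not installed.

```python
def enable_torchtrt_debug_logging() -> bool:
    """Switch Torch-TensorRT to debug-level logging; return False if it is unavailable."""
    try:
        import torch_tensorrt
    except ImportError:
        # torch_tensorrt is not installed in this environment.
        return False
    torch_tensorrt.logging.set_reportable_log_level(torch_tensorrt.logging.Level.Debug)
    return True


if __name__ == "__main__":
    if enable_torchtrt_debug_logging():
        print("Torch-TensorRT debug logging enabled")
    else:
        print("torch_tensorrt not available, logging level unchanged")
```

Call it before tracing or compiling the model so that the conversion messages are included in the output.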

@narendasan
Collaborator

Also how did you build Torch-TensorRT for windows?

@andreabonvini
Author

Ok thanks, I will include here just the output of the tracing and optimization process through TorchTensorRT.
This is the output I get when I run the script without debug logging enabled:

Tracing PyTorch model...
Script Module generated.
Creating TRT module...
[1, 256, 128, 128]
[1, 256, 128, 128]
[1, 3, 512, 512]
[1, 3, 512, 512]
[1, 256, 128, 128]
[1, 3, 512, 512]
[1, 256, 128, 128]
[1, 3, 512, 512]
[1, 256, 128, 128]
[1, 3, 512, 512]
[1, 256, 128, 128]
[1, 3, 512, 512]
WARNING: [Torch-TensorRT] - Mean converter disregards dtype
WARNING: [Torch-TensorRT] - Mean converter disregards dtype
WARNING: [Torch-TensorRT] - Mean converter disregards dtype
WARNING: [Torch-TensorRT] - Mean converter disregards dtype
WARNING: [Torch-TensorRT] - Mean converter disregards dtype
WARNING: [Torch-TensorRT] - Mean converter disregards dtype
WARNING: [Torch-TensorRT] - Mean converter disregards dtype
WARNING: [Torch-TensorRT] - Mean converter disregards dtype
WARNING: [Torch-TensorRT] - Mean converter disregards dtype
WARNING: [Torch-TensorRT] - Mean converter disregards dtype
WARNING: [Torch-TensorRT] - There may be undefined behavior using dynamic shape and aten::size
WARNING: [Torch-TensorRT] - There may be undefined behavior using dynamic shape and aten::size
WARNING: [Torch-TensorRT] - Interpolation layer will be run through ATen, not TensorRT. Performance may be lower than expected
WARNING: [Torch-TensorRT] - Interpolation layer will be run through ATen, not TensorRT. Performance may be lower than expected

This is the output I have when I run the script with debug logging enabled.

I didn't build TorchTensorRT for Windows; I'm using the latest PyTorch Docker container (22.01) on WSL2, following the official instructions here. However, I need to run the model through LibTorch on Windows.

@narendasan
Collaborator

To run the model on Windows with libtorch, you need at minimum to compile the libtorchtrt_runtime library, which is the runtime extension needed to run compiled torchtrt programs. We used to have Windows support for a little while, but it quickly degraded. Perhaps working on just the runtime library is easier to get going (just building //core/runtime).

@andreabonvini
Author

Ok thanks @narendasan. Following your advice, I'm trying to compile the whole project on Windows 10; the idea is to build the Python package and optimize the model locally. First, I was able to successfully run the command
bazel build //:libtorchtrt --compilation_mode opt by modifying a series of files, as I will show below.
This is what my WORKSPACE file looks like:

workspace(name = "Torch-TensorRT")

load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")
load("@bazel_tools//tools/build_defs/repo:git.bzl", "git_repository")

http_archive(
    name = "rules_python",
    sha256 = "778197e26c5fbeb07ac2a2c5ae405b30f6cb7ad1f5510ea6fdac03bded96cc6f",
    url = "https://github.com/bazelbuild/rules_python/releases/download/0.2.0/rules_python-0.2.0.tar.gz",
)

load("@rules_python//python:pip.bzl", "pip_install")

http_archive(
    name = "rules_pkg",
    sha256 = "038f1caa773a7e35b3663865ffb003169c6a71dc995e39bf4815792f385d837d",
    urls = [
        "https://mirror.bazel.build/github.com/bazelbuild/rules_pkg/releases/download/0.4.0/rules_pkg-0.4.0.tar.gz",
        "https://github.com/bazelbuild/rules_pkg/releases/download/0.4.0/rules_pkg-0.4.0.tar.gz",
    ],
)

load("@rules_pkg//:deps.bzl", "rules_pkg_dependencies")

rules_pkg_dependencies()

git_repository(
    name = "googletest",
    commit = "703bd9caab50b139428cea1aaff9974ebee5742e",
    remote = "https://github.com/google/googletest",
    shallow_since = "1570114335 -0400",
)

# External dependency for trtorch if you already have precompiled binaries.
# This is currently used in pytorch NGC container CI testing.
#local_repository(
#    name = "trtorch",
#    path = "C:/Python39/Lib/site-packages/trtorch"
#)

# CUDA should be installed on the system locally
new_local_repository(
    name = "cuda",
    build_file = "@//third_party/cuda:BUILD",
    path = "C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.3",
)

new_local_repository(
    name = "cublas",
    build_file = "@//third_party/cublas:BUILD",
    path = "C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.3",
)

####################################################################################
# Locally installed dependencies (use in cases of custom dependencies or aarch64)
####################################################################################

# NOTE: In the case you are using just the pre-cxx11-abi path or just the cxx11 abi path
# with your local libtorch, just point deps at the same path to satisfy bazel.

# NOTE: NVIDIA's aarch64 PyTorch (python) wheel file uses the CXX11 ABI unlike PyTorch's standard
# x86_64 python distribution. If using NVIDIA's version just point to the root of the package
# for both versions here and do not use --config=pre-cxx11-abi

new_local_repository(
    name = "libtorch",
    path = "C:/src/libtorch1.10.0-cuda11.3-release/libtorch",
    # path = "C:/Users/myUser/appdata/local/packages/pythonsoftwarefoundation.python.3.9_qbz5n2kfra8p0/localcache/local-packages/python39/site-packages/torch",
    build_file = "third_party/libtorch/BUILD"
)

new_local_repository(
    name = "libtorch_pre_cxx11_abi",
    path = "C:/src/libtorch1.10.0-cuda11.3-release/libtorch",
    # path = "C:/Users/myUser/appdata/local/packages/pythonsoftwarefoundation.python.3.9_qbz5n2kfra8p0/localcache/local-packages/python39/site-packages/torch",
    build_file = "third_party/libtorch/BUILD"
)

new_local_repository(
    name = "cudnn",
    path = "C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.3",
    build_file = "@//third_party/cudnn/local:BUILD"
)

new_local_repository(
   name = "tensorrt",
   path = "C:/tensorrt",
   build_file = "@//third_party/tensorrt/local:BUILD"
)
  • The first problem I had when trying to compile the project with bazel was the following
C:\Torch-TensorRT>bazel build //:libtorchtrt --compilation_mode opt

...

core/partitioning/shape_analysis.cpp(130): error C2665: 'torch_tensorrt::core::util::toDims': none of the 2 overloads could convert all the argument types
.\core/util/trt_util.h(141): note: could be 'nvinfer1::Dims torch_tensorrt::core::util::toDims(c10::List<int64_t>)'
.\core/util/trt_util.h(140): note: or       'nvinfer1::Dims torch_tensorrt::core::util::toDims(c10::IntArrayRef)'
core/partitioning/shape_analysis.cpp(130): note: while trying to match the argument list '(c10::List<long>)'
Target //:libtorchtrt failed to build
Use --verbose_failures to see the command lines of failed build steps.
INFO: Elapsed time: 2.588s, Critical Path: 2.03s
INFO: 2 processes: 2 internal.
FAILED: Build did NOT complete successfully

In order to solve it, it was enough to change core/partitioning/shape_analysis.cpp at line 130:
from (:-1:)
input_shapes.push_back(util::toVec(util::toDims(c10::List<long int>({1}))));
to (:+1:)
input_shapes.push_back(util::toVec(util::toDims(c10::List<long long>({1}))));
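
The overload error is consistent with how integer widths differ between platforms: on 64-bit Windows `long` is 32 bits wide, so `int64_t` corresponds to `long long`, while on 64-bit Linux `long` is already 64 bits. A small sketch to check the widths of the platform's C types through `ctypes` (the helper name `c_integer_widths` is made up for illustration):

```python
import ctypes
from typing import Dict


def c_integer_widths() -> Dict[str, int]:
    """Return the width in bits of a few C integer types on the current platform."""
    return {
        "long": ctypes.sizeof(ctypes.c_long) * 8,
        "long long": ctypes.sizeof(ctypes.c_longlong) * 8,
        "int64_t": ctypes.sizeof(ctypes.c_int64) * 8,
    }


if __name__ == "__main__":
    for name, bits in c_integer_widths().items():
        print(f"{name}: {bits} bits")
```

Where `long` comes out narrower than `int64_t`, a `c10::List<long>` cannot match the `c10::List<int64_t>` overload, which matches the compiler message above.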

  • After that, I retried and obtained the following output
C:\Torch-TensorRT>bazel build //:libtorchtrt --compilation_mode opt
ERROR: C:/torch-tensorrt/cpp/lib/BUILD:34:10: Linking cpp/lib/torch_tensorrt.dll failed: missing input file 'external/cudnn/bin/cudnn64_7.dll', owner: '@cudnn//:bin/cudnn64_7.dll'
ERROR: C:/torch-tensorrt/cpp/lib/BUILD:34:10: Linking cpp/lib/torch_tensorrt.dll failed: 1 input file(s) do not exist
Target //:libtorchtrt failed to build
Use --verbose_failures to see the command lines of failed build steps.
ERROR: C:/torch-tensorrt/cpp/lib/BUILD:34:10 Linking cpp/lib/torch_tensorrt.dll failed: 1 input file(s) do not exist
INFO: Elapsed time: 39.788s, Critical Path: 14.73s
INFO: 81 processes: 2 internal, 79 local.
FAILED: Build did NOT complete successfully

The solution here is to change the file C:\Torch-TensorRT\third_party\cudnn\local\BUILD
from (:-1:)

cc_import(
    name = "cudnn_lib",
    shared_library = select({
        ":aarch64_linux": "lib/aarch64-linux-gnu/libcudnn.so",
        ":windows": "bin/cudnn64_7.dll",  #Need to configure specific version for windows
        "//conditions:default": "lib/x86_64-linux-gnu/libcudnn.so",
    }),
    visibility = ["//visibility:private"],
)

to (:+1:)

cc_import(
    name = "cudnn_lib",
    shared_library = select({
        ":aarch64_linux": "lib/aarch64-linux-gnu/libcudnn.so",
        ":windows": "bin/cudnn64_8.dll",  #Need to configure specific version for windows
        "//conditions:default": "lib/x86_64-linux-gnu/libcudnn.so",
    }),
    visibility = ["//visibility:private"],
)
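
To find out which suffix to put in the BUILD file, it may help to list the cuDNN DLLs that are actually installed. A minimal sketch, assuming the DLLs follow the `cudnn64_<major>.dll` naming seen in the error above; the helper name `find_cudnn_dlls` is made up for illustration and the directory in the example is the CUDA path from my WORKSPACE file with `bin` appended.

```python
import re
from pathlib import Path
from typing import List, Tuple


def find_cudnn_dlls(bin_dir: str) -> List[Tuple[int, str]]:
    """Return (major_version, file_name) for every cudnn64_<N>.dll found in bin_dir."""
    pattern = re.compile(r"^cudnn64_(\d+)\.dll$", re.IGNORECASE)
    found = []
    for entry in Path(bin_dir).glob("cudnn64_*.dll"):
        match = pattern.match(entry.name)
        if match:
            found.append((int(match.group(1)), entry.name))
    return sorted(found)


if __name__ == "__main__":
    cuda_bin = "C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.3/bin"
    print(find_cudnn_dlls(cuda_bin))
```

The file name printed here is the one that the `:windows` entry of `cudnn_lib` should point to.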
  • After that...
C:\Torch-TensorRT>bazel build //:libtorchtrt --compilation_mode opt
INFO: Analyzed target //:libtorchtrt (1 packages loaded, 42 targets configured).
INFO: Found 1 target...
ERROR: C:/torch-tensorrt/cpp/lib/BUILD:34:10: Linking cpp/lib/torch_tensorrt.dll failed: (Exit 1120): link.exe failed: error executing command C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.29.30133\bin\HostX64\x64\link.exe @bazel-out/x64_windows-opt/bin/cpp/lib/torch_tensorrt.dll-2.params
LINK : warning LNK4044: unrecognized option '/lpthread'; ignored
LINK : warning LNK4044: unrecognized option '/lpthread'; ignored
LINK : warning LNK4044: unrecognized option '/Wl,-rpath,lib/'; ignored
   Creating library bazel-out/x64_windows-opt/bin/cpp/lib/torch_tensorrt.dll.if.lib and object bazel-out/x64_windows-opt/bin/cpp/lib/torch_tensorrt.dll.if.exp
runtime.lo.lib(TRTEngine.obj) : error LNK2019: unresolved external symbol createInferRuntime_INTERNAL referenced in function "public: __cdecl torch_tensorrt::core::runtime::TRTEngine::TRTEngine(class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> >,class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> >,struct torch_tensorrt::core::runtime::CudaDevice)" (??0TRTEngine@runtime@core@torch_tensorrt@@QEAA@V?$basic_string@DU?$char_traits@D@std@@V?$allocator@D@2@@std@@0UCudaDevice@123@@Z)
torch_tensorrt_plugins.lo.lib(register_plugins.obj) : error LNK2001: unresolved external symbol getPluginRegistry
torch_tensorrt_plugins.lo.lib(normalize_plugin.obj) : error LNK2001: unresolved external symbol getPluginRegistry
torch_tensorrt_plugins.lo.lib(interpolate_plugin.obj) : error LNK2001: unresolved external symbol getPluginRegistry
converters.lo.lib(pooling.obj) : error LNK2001: unresolved external symbol getPluginRegistry
converters.lo.lib(normalize.obj) : error LNK2001: unresolved external symbol getPluginRegistry
converters.lo.lib(interpolate.obj) : error LNK2001: unresolved external symbol getPluginRegistry
converters.lo.lib(batch_norm.obj) : error LNK2001: unresolved external symbol getPluginRegistry
torch_tensorrt_plugins.lo.lib(register_plugins.obj) : error LNK2019: unresolved external symbol initLibNvInferPlugins referenced in function "public: __cdecl torch_tensorrt::core::plugins::impl::TorchTRTPluginRegistry::TorchTRTPluginRegistry(void)" (??0TorchTRTPluginRegistry@impl@plugins@core@torch_tensorrt@@QEAA@XZ)
conversionctx.lib(ConversionCtx.obj) : error LNK2019: unresolved external symbol createInferBuilder_INTERNAL referenced in function "public: __cdecl torch_tensorrt::core::conversion::ConversionCtx::ConversionCtx(struct torch_tensorrt::core::conversion::BuilderSettings)" (??0ConversionCtx@conversion@core@torch_tensorrt@@QEAA@UBuilderSettings@123@@Z)
bazel-out\x64_windows-opt\bin\cpp\lib\torch_tensorrt.dll : fatal error LNK1120: 4 unresolved externals
Target //:libtorchtrt failed to build
Use --verbose_failures to see the command lines of failed build steps.
INFO: Elapsed time: 0.874s, Critical Path: 0.30s
INFO: 2 processes: 2 internal.
FAILED: Build did NOT complete successfully

I solved this, as suggested by @yuriishutkin in issue #690 (referenced in issue #226), by replacing in third_party/tensorrt/local/BUILD
this (:-1:):

cc_library(
    name = "nvinferplugin",
    hdrs = select({
        ":aarch64_linux": glob(["include/aarch64-linux-gnu/NvInferPlugin*.h"]),
        ":windows": glob(["include/NvInferPlugin*.h"]),
        "//conditions:default": glob(["include/x86_64-linux-gnu/NvInferPlugin*.h"]),
    }),
    srcs = select({
        ":aarch64_linux": ["lib/aarch64-linux-gnu/libnvinfer_plugin.so"],
        ":windows": ["lib/nvinfer_plugin.dll"],
        "//conditions:default": ["lib/x86_64-linux-gnu/libnvinfer_plugin.so"],
    }),
    includes = select({
        ":aarch64_linux": ["include/aarch64-linux-gnu/"],
        ":windows": ["include/"],
        "//conditions:default": ["include/x86_64-linux-gnu/"],
    }),
    deps = [
        "nvinfer",
        "@cuda//:cudart",
        "@cudnn",
    ] + select({
        ":windows": ["@cuda//:cublas"],
        "//conditions:default": ["@cuda//:cublas"],
    }),
    alwayslink = True,
    copts = [
        "-pthread"
    ],
    linkopts = [
        "-lpthread",
    ] + select({
        ":aarch64_linux": ["-Wl,--no-as-needed -ldl -lrt -Wl,--as-needed"],
        "//conditions:default": []
    })
)

with this (:+1:):

cc_library(
    name = "nvinferplugin",
    hdrs = select({
        ":aarch64_linux": glob(["include/aarch64-linux-gnu/NvInferPlugin*.h"]),
        ":windows": glob(["include/NvInferPlugin*.h"]),
        "//conditions:default": glob(["include/x86_64-linux-gnu/NvInferPlugin*.h"]),
    }),
    srcs = select({
        ":aarch64_linux": ["lib/aarch64-linux-gnu/libnvinfer_plugin.so"],
        ":windows": ["lib/nvinfer_plugin.lib","lib/nvinfer_plugin.dll"],
        "//conditions:default": ["lib/x86_64-linux-gnu/libnvinfer_plugin.so"],
    }),
    includes = select({
        ":aarch64_linux": ["include/aarch64-linux-gnu/"],
        ":windows": ["include/"],
        "//conditions:default": ["include/x86_64-linux-gnu/"],
    }),
    deps = [
        "nvinfer",
        "@cuda//:cudart",
        "@cudnn",
    ] + select({
        ":windows": ["@cuda//:cublas", "nvinfer_static_lib"],
        "//conditions:default": ["@cuda//:cublas"],
    }),
    alwayslink = True,
    copts = [
        "-pthread"
    ],
    linkopts = [
        "-lpthread",
    ] + select({
        ":aarch64_linux": ["-Wl,--no-as-needed -ldl -lrt -Wl,--as-needed"],
        "//conditions:default": []
    })
)
  • Now, apparently I'm able to compile the project correctly, through the command bazel build //:libtorchtrt --compilation_mode opt. This is my output.
C:\Torch-TensorRT>bazel build //:libtorchtrt --compilation_mode opt
INFO: Analyzed target //:libtorchtrt (1 packages loaded, 48 targets configured).
INFO: Found 1 target...
INFO: From Linking cpp/lib/torch_tensorrt.dll:
LINK : warning LNK4044: unrecognized option '/lpthread'; ignored
LINK : warning LNK4044: unrecognized option '/lpthread'; ignored
LINK : warning LNK4044: unrecognized option '/Wl,-rpath,lib/'; ignored
   Creating library bazel-out/x64_windows-opt/bin/cpp/lib/torch_tensorrt.dll.if.lib and object bazel-out/x64_windows-opt/bin/cpp/lib/torch_tensorrt.dll.if.exp
Target //:libtorchtrt up-to-date:
  bazel-bin/libtorchtrt.tar.gz
INFO: Elapsed time: 2.278s, Critical Path: 1.85s
INFO: 5 processes: 2 internal, 3 local.
INFO: Build completed successfully, 5 total actions
  • So now I think I can easily build my Python package and try to optimize my model. However, when I try to run the following command:
    C:\Torch-TensorRT\py>python3 setup.py install
    I get the following output:
    Could not find bazel in PATH
    It seems that setup.py isn't able to detect where my bazel.exe file is, so I hard-code it by first running
C:\Torch-TensorRT\py>where bazel
C:\ProgramData\chocolatey\bin\bazel.exe

And then changing setup.py:
from (:-1:):

BAZEL_EXE = which("bazelisk")

if BAZEL_EXE is None:
    BAZEL_EXE = which("bazel")
    if BAZEL_EXE is None:
        sys.exit("Could not find bazel in PATH")

to (:+1:):

BAZEL_EXE = "C:/ProgramData/chocolatey/bin/bazel.exe" 

if BAZEL_EXE is None:
    BAZEL_EXE = which("bazel")
    if BAZEL_EXE is None:
        sys.exit("Could not find bazel in PATH")
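
A less machine-specific alternative to hard-coding the path might be to resolve the executable with `shutil.which`, which on Windows also tries the extensions listed in PATHEXT (such as `.exe`). This is only a sketch and not the lookup that setup.py actually performs: the helper name `find_build_tool` and the `BAZEL_EXE` environment variable override are assumptions made for illustration.

```python
import os
import shutil
from typing import Iterable, Optional


def find_build_tool(
    candidates: Iterable[str] = ("bazelisk", "bazel"),
    env_var: str = "BAZEL_EXE",  # hypothetical override, not read by setup.py itself
) -> Optional[str]:
    """Return the path of the first available build tool, or None if none is found."""
    override = os.environ.get(env_var)
    if override and os.path.isfile(override):
        return override
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None


if __name__ == "__main__":
    exe = find_build_tool()
    print(exe if exe else "Could not find bazel in PATH")
```

With something like this, the same setup.py would keep working on machines where bazel is installed in a different location.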
  • So I try again and...
C:\Torch-TensorRT\py>python3 setup.py install
...
INFO: From Linking cpp/lib/torch_tensorrt.dll:
LINK : warning LNK4044: unrecognized option '/lpthread'; ignored
LINK : warning LNK4044: unrecognized option '/lpthread'; ignored
LINK : warning LNK4044: unrecognized option '/Wl,-rpath,lib/'; ignored
LINK : warning LNK4044: unrecognized option '/D_GLIBCXX_USE_CXX11_ABI=0'; ignored
   Creating library bazel-out/x64_windows-opt/bin/cpp/lib/torch_tensorrt.dll.if.lib and object bazel-out/x64_windows-opt/bin/cpp/lib/torch_tensorrt.dll.if.exp
Target //:libtorchtrt up-to-date:
  bazel-bin/libtorchtrt.tar.gz
INFO: Elapsed time: 56.634s, Critical Path: 18.50s
INFO: 112 processes: 2 internal, 110 local.
INFO: Build completed successfully, 112 total actions
Traceback (most recent call last):
  File "C:\Torch-TensorRT\py\setup.py", line 260, in <module>
    setup(
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.2800.0_x64__qbz5n2kfra8p0\lib\site-packages\setuptools\__init__.py", line 153, in setup
    return distutils.core.setup(**attrs)
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.2800.0_x64__qbz5n2kfra8p0\lib\distutils\core.py", line 148, in setup
    dist.run_commands()
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.2800.0_x64__qbz5n2kfra8p0\lib\distutils\dist.py", line 966, in run_commands
    self.run_command(cmd)
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.2800.0_x64__qbz5n2kfra8p0\lib\distutils\dist.py", line 985, in run_command
    cmd_obj.run()
  File "C:\Torch-TensorRT\py\setup.py", line 160, in run
    gen_version_file()
  File "C:\Torch-TensorRT\py\setup.py", line 113, in gen_version_file
    os.mknod(dir_path + '/torch_tensorrt/_version.py')
AttributeError: module 'os' has no attribute 'mknod'

That makes sense, since os.mknod is only available on Unix systems. It's enough to replace it with open(), so I change setup.py (again)
from (👎):

def gen_version_file():
    if not os.path.exists(dir_path + '/torch_tensorrt/_version.py'):
        os.mknod(dir_path + '/torch_tensorrt/_version.py')

    with open(dir_path + '/torch_tensorrt/_version.py', 'w') as f:
        print("creating version file")
        f.write("__version__ = \"" + __version__ + '\"')

to (👍):

def gen_version_file():
    if not os.path.exists(dir_path + '/torch_tensorrt/_version.py'):
        open(dir_path + '/torch_tensorrt/_version.py',"a").close()

    with open(dir_path + '/torch_tensorrt/_version.py', 'w') as f:
        print("creating version file")
        f.write("__version__ = \"" + __version__ + '\"')
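Since opening a file in `'w'` mode already creates it when it is missing, the existence check is not strictly needed on any platform. A minimal sketch of a portable variant follows. Its signature (taking `package_dir` and `version` as arguments) is an assumption for illustration and differs from the real `gen_version_file()`, which uses module-level variables.

```python
from pathlib import Path


def gen_version_file(package_dir, version):
    """Write <package_dir>/_version.py, creating or overwriting it.

    Path.write_text opens the file for writing, which creates it if absent,
    so no os.mknod (Unix-only) call is required.
    """
    target = Path(package_dir) / "_version.py"
    target.write_text(f'__version__ = "{version}"\n')
    return target
```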
  • Now, before retrying, I clean my environment...
C:\Torch-TensorRT\py>python3 setup.py clean
running clean
Removing build
error: [WinError 267] The directory name is invalid: 'C:\\Torch-TensorRT\\py\\build'

Well, another problem, but apparently it was enough to rename BUILD to BUILD.bazel, and then I was able to clean my environment:

C:\Torch-TensorRT\py>python3 setup.py clean
running clean
Removing torch_tensorrt\lib
Removing torch_tensorrt\include
Removing torch_tensorrt\_version.py
Removing torch_tensorrt\BUILD
Removing torch_tensorrt\WORKSPACE
Removing torch_tensorrt\LICENSE
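A likely explanation (an assumption, not confirmed in this thread) is that the clean step tries to remove a `build` directory, and on a case-insensitive Windows filesystem that name resolves to the Bazel `BUILD` file, which is not a directory. The sketch below, with the hypothetical helper name `case_insensitive_collisions`, shows how such clashes could be detected in a list of names.

```python
from collections import defaultdict


def case_insensitive_collisions(names):
    """Group names that differ only by case, as they would clash on Windows."""
    groups = defaultdict(list)
    for name in names:
        groups[name.casefold()].append(name)
    return [sorted(group) for group in groups.values() if len(group) > 1]
```

For example, `case_insensitive_collisions(os.listdir("py"))` would report a `BUILD` file sitting next to a `build` entry.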

So I try again to build the Python package...

C:\Torch-TensorRT\py>python3 setup.py install
...
C:\Users\myUser\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\torch\utils\cpp_extension.py:316: UserWarning: Error checking compiler version for cl: [WinError 2] The system cannot find the file specified
  warnings.warn(f'Error checking compiler version for {compiler}: {error}')
building 'torch_tensorrt._C' extension
creating build\temp.win-amd64-3.9
creating build\temp.win-amd64-3.9\Release
creating build\temp.win-amd64-3.9\Release\torch_tensorrt
creating build\temp.win-amd64-3.9\Release\torch_tensorrt\csrc
C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.29.30133\bin\HostX86\x64\cl.exe /c /nologo /Ox /W3 /GL /DNDEBUG /MD -UNDEBUG -IC:\Torch-TensorRT\pytorch_tensorrt/csrc -IC:\Torch-TensorRT\pytorch_tensorrt/include -IC:\Torch-TensorRT\py/../bazel-TRTorch/external/tensorrt/include -IC:\Torch-TensorRT\py/../bazel-Torch-TensorRT-Preview/external/tensorrt/include -IC:\Torch-TensorRT\py/../bazel-Torch-TensorRT/external/tensorrt/include -IC:\Torch-TensorRT\py/../ -IC:\Users\myUser\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\torch\include -IC:\Users\myUser\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\torch\include\torch\csrc\api\include -IC:\Users\myUser\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\torch\include\TH -IC:\Users\myUser\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\torch\include\THC -IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.3\include -IC:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.2800.0_x64__qbz5n2kfra8p0\include -IC:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.2800.0_x64__qbz5n2kfra8p0\include -IC:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.29.30133\ATLMFC\include -IC:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.29.30133\include -IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\ucrt -IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\shared -IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\um -IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\winrt -IC:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\cppwinrt /EHsc 
/Tptorch_tensorrt/csrc/register_tensorrt_classes.cpp /Fobuild\temp.win-amd64-3.9\Release\torch_tensorrt/csrc/register_tensorrt_classes.obj /MD /wd4819 /wd4251 /wd4244 /wd4267 /wd4275 /wd4018 /wd4190 /EHsc -Wno-deprecated -Wno-deprecated-declarations -D_GLIBCXX_USE_CXX11_ABI=0 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=_C -D_GLIBCXX_USE_CXX11_ABI=0
cl : Command line warning D9025 : overriding '/DNDEBUG' with '/UNDEBUG'
cl : Command line error D8021 : invalid numeric argument '/Wno-deprecated'
error: command 'C:\\Program Files (x86)\\Microsoft Visual Studio\\2019\\Community\\VC\\Tools\\MSVC\\14.29.30133\\bin\\HostX86\\x64\\cl.exe' failed with exit code 2

So, again in setup.py, I remove all the -Wno-deprecated options and replace
this (👎):

ext_modules = [
    cpp_extension.CUDAExtension(
        'torch_tensorrt._C', [
            'torch_tensorrt/csrc/torch_tensorrt_py.cpp',
            'torch_tensorrt/csrc/tensorrt_backend.cpp',
            'torch_tensorrt/csrc/tensorrt_classes.cpp',
            'torch_tensorrt/csrc/register_tensorrt_classes.cpp',
        ],
        library_dirs=[(dir_path + '/torch_tensorrt/lib/'), "/opt/conda/lib/python3.6/config-3.6m-x86_64-linux-gnu"],
        libraries=["torchtrt"],
        include_dirs=[
            dir_path + "torch_tensorrt/csrc", dir_path + "torch_tensorrt/include",
            dir_path + "/../bazel-TRTorch/external/tensorrt/include",
            dir_path + "/../bazel-Torch-TensorRT-Preview/external/tensorrt/include",
            dir_path + "/../bazel-Torch-TensorRT/external/tensorrt/include", dir_path + "/../"
        ],
        extra_compile_args=[
            "-Wno-deprecated",
            "-Wno-deprecated-declarations",
        ] + (["-D_GLIBCXX_USE_CXX11_ABI=1"] if CXX11_ABI else ["-D_GLIBCXX_USE_CXX11_ABI=0"]),
        extra_link_args=[
            "-Wno-deprecated", "-Wno-deprecated-declarations", "-Wl,--no-as-needed", "-ltorchtrt",
            "-Wl,-rpath,$ORIGIN/lib", "-lpthread", "-ldl", "-lutil", "-lrt", "-lm", "-Xlinker", "-export-dynamic"
        ] + (["-D_GLIBCXX_USE_CXX11_ABI=1"] if CXX11_ABI else ["-D_GLIBCXX_USE_CXX11_ABI=0"]),
        undef_macros=["NDEBUG"])
]

with this (👍):

ext_modules = [
    cpp_extension.CUDAExtension(
        'torch_tensorrt._C', [
            'torch_tensorrt/csrc/torch_tensorrt_py.cpp',
            'torch_tensorrt/csrc/tensorrt_backend.cpp',
            'torch_tensorrt/csrc/tensorrt_classes.cpp',
            'torch_tensorrt/csrc/register_tensorrt_classes.cpp',
        ],
        library_dirs=[(dir_path + '/torch_tensorrt/lib/'), "/opt/conda/lib/python3.6/config-3.6m-x86_64-linux-gnu"],
        libraries=["torchtrt"],
        include_dirs=[
            dir_path + "torch_tensorrt/csrc", dir_path + "torch_tensorrt/include",
            dir_path + "/../bazel-TRTorch/external/tensorrt/include",
            dir_path + "/../bazel-Torch-TensorRT-Preview/external/tensorrt/include",
            dir_path + "/../bazel-Torch-TensorRT/external/tensorrt/include", dir_path + "/../"
        ],
        extra_compile_args=[] + (["-D_GLIBCXX_USE_CXX11_ABI=1"] if CXX11_ABI else ["-D_GLIBCXX_USE_CXX11_ABI=0"]),
        extra_link_args=[
             "-Wl,--no-as-needed", "-ltorchtrt",
            "-Wl,-rpath,$ORIGIN/lib", "-lpthread", "-ldl", "-lutil", "-lrt", "-lm", "-Xlinker", "-export-dynamic"
        ] + (["-D_GLIBCXX_USE_CXX11_ABI=1"] if CXX11_ABI else ["-D_GLIBCXX_USE_CXX11_ABI=0"]),
        undef_macros=["NDEBUG"])
]
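Instead of deleting the GCC-style flags outright, the flag lists could be chosen per platform, so the Linux build stays unchanged. The sketch below is only an illustration: `build_flags` is a hypothetical helper, and the Windows branch simply passes no extra flags, because the MSVC equivalents are not established in this thread.

```python
import sys

# GCC/Clang-style flags copied from the original setup.py snippet.
GCC_COMPILE_FLAGS = ["-Wno-deprecated", "-Wno-deprecated-declarations"]
GCC_LINK_FLAGS = [
    "-Wno-deprecated", "-Wno-deprecated-declarations", "-Wl,--no-as-needed",
    "-ltorchtrt", "-Wl,-rpath,$ORIGIN/lib", "-lpthread", "-ldl", "-lutil",
    "-lrt", "-lm", "-Xlinker", "-export-dynamic",
]


def build_flags(cxx11_abi, platform=sys.platform):
    """Return extra compile/link args for the given platform.

    On Windows (sys.platform == "win32") nothing is passed: cl.exe rejects
    -Wno-deprecated and link.exe ignores the -l/-Wl options, as the logs show.
    """
    if platform == "win32":
        return {"extra_compile_args": [], "extra_link_args": []}
    abi = ["-D_GLIBCXX_USE_CXX11_ABI=" + ("1" if cxx11_abi else "0")]
    return {
        "extra_compile_args": GCC_COMPILE_FLAGS + abi,
        "extra_link_args": GCC_LINK_FLAGS + abi,
    }
```

The result could be unpacked into the extension definition, e.g. `cpp_extension.CUDAExtension(..., **build_flags(CXX11_ABI))`.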
  • Finally, this is the last error I got when I tried to build the Python package one more time. Unfortunately, I don't know how to proceed from here:
C:\Torch-TensorRT\py>python3 setup.py clean
running clean
Removing torch_tensorrt\lib
Removing torch_tensorrt\include
Removing torch_tensorrt\_version.py
Removing torch_tensorrt\BUILD
Removing torch_tensorrt\WORKSPACE
Removing torch_tensorrt\LICENSE
C:\Torch-TensorRT\py>python3 setup.py install
...
C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.29.30133\bin\HostX86\x64\link.exe /nologo /INCREMENTAL:NO /LTCG /DLL /MANIFEST:EMBED,ID=2 /MANIFESTUAC:NO /LIBPATH:C:\Torch-TensorRT\py/torch_tensorrt/lib/ /LIBPATH:/opt/conda/lib/python3.6/config-3.6m-x86_64-linux-gnu /LIBPATH:C:\Users\myUser\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\torch\lib /LIBPATH:C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.3\lib/x64 /LIBPATH:C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.2800.0_x64__qbz5n2kfra8p0\libs /LIBPATH:C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.2800.0_x64__qbz5n2kfra8p0\PCbuild\amd64 /LIBPATH:C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.29.30133\ATLMFC\lib\x64 /LIBPATH:C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.29.30133\lib\x64 /LIBPATH:C:\Program Files (x86)\Windows Kits\10\lib\10.0.19041.0\ucrt\x64 /LIBPATH:C:\Program Files (x86)\Windows Kits\10\lib\10.0.19041.0\um\x64 torchtrt.lib c10.lib torch.lib torch_cpu.lib torch_python.lib cudart.lib c10_cuda.lib torch_cuda_cu.lib torch_cuda_cpp.lib /EXPORT:PyInit__C build\temp.win-amd64-3.9\Release\torch_tensorrt/csrc/register_tensorrt_classes.obj build\temp.win-amd64-3.9\Release\torch_tensorrt/csrc/tensorrt_backend.obj build\temp.win-amd64-3.9\Release\torch_tensorrt/csrc/tensorrt_classes.obj build\temp.win-amd64-3.9\Release\torch_tensorrt/csrc/torch_tensorrt_py.obj /OUT:build\lib.win-amd64-3.9\torch_tensorrt\_C.cp39-win_amd64.pyd /IMPLIB:build\temp.win-amd64-3.9\Release\torch_tensorrt/csrc\_C.cp39-win_amd64.lib -Wl,--no-as-needed -ltorchtrt -Wl,-rpath,$ORIGIN/lib -lpthread -ldl -lutil -lrt -lm -Xlinker -export-dynamic -D_GLIBCXX_USE_CXX11_ABI=0
LINK : warning LNK4044: unrecognized option '/Wl,--no-as-needed'; ignored
LINK : warning LNK4044: unrecognized option '/ltorchtrt'; ignored
LINK : warning LNK4044: unrecognized option '/Wl,-rpath,$ORIGIN/lib'; ignored
LINK : warning LNK4044: unrecognized option '/lpthread'; ignored
LINK : warning LNK4044: unrecognized option '/ldl'; ignored
LINK : warning LNK4044: unrecognized option '/lutil'; ignored
LINK : warning LNK4044: unrecognized option '/lrt'; ignored
LINK : warning LNK4044: unrecognized option '/lm'; ignored
LINK : warning LNK4044: unrecognized option '/Xlinker'; ignored
LINK : warning LNK4044: unrecognized option '/export-dynamic'; ignored
LINK : warning LNK4044: unrecognized option '/D_GLIBCXX_USE_CXX11_ABI=0'; ignored
LINK : fatal error LNK1181: cannot open input file 'torchtrt.lib'
error: command 'C:\\Program Files (x86)\\Microsoft Visual Studio\\2019\\Community\\VC\\Tools\\MSVC\\14.29.30133\\bin\\HostX86\\x64\\link.exe' failed with exit code 1181

All I know is that the linker is trying to link against a static library, torchtrt.lib, while all I have is a dynamic library, torch_tensorrt.dll.

@narendasan
Collaborator

narendasan commented Feb 20, 2022

That's really cool that you got Windows compilation working! So really all you need to move forward with your specific use case is just linking/DL_OPEN libtorchtrt_runtime in your app, which should be available from just the C++ compilation. So Python is not strictly required. I suspect that for Python API compilation we need a new set of flags, equivalent to what we have for Linux here:
https://github.com/NVIDIA/Torch-TensorRT/blob/4fd886d08ce77323995b5bf6a21a0d0e8dde8d42/py/setup.py#L231 that would get swapped in for people building for Windows. Not sure what those flags would be for MSVC to specify dll over lib.

@andreabonvini
Author

I actually have no libtorchtrt_runtime file. This is the folder tree of bazel-out/x64_windows-opt/bin/cpp and bazel-out/x64_windows-opt/bin/core/runtime (I can send the full bazel-out.zip if you think it could help).

C:\TORCH-TENSORRT\BAZEL-OUT\X64_WINDOWS-OPT\BIN\CPP
│   torch_tensorrt.lo.lib
│   torch_tensorrt.lo.lib-2.params
│
├───lib
│       torch_tensorrt.dll
│       torch_tensorrt.dll-2.params
│       torch_tensorrt.dll.gen.empty.def
│       torch_tensorrt.dll.if.exp
│       torch_tensorrt.dll.if.lib
│
├───_objs
│   └───torch_tensorrt
│           compile_spec.obj
│           logging.obj
│           ptq.obj
│           torch_tensorrt.obj
│           types.obj
│
└───_virtual_includes
    └───torch_tensorrt
        └───torch_tensorrt
                logging.h
                macros.h
                ptq.h
                torch_tensorrt.h
C:\TORCH-TENSORRT\BAZEL-OUT\X64_WINDOWS-OPT\BIN\CORE\RUNTIME
│   include.args
│   include.tar
│   runtime.lo.lib
│   runtime.lo.lib-2.params
│
└───_objs
    └───runtime
            CudaDevice.obj
            DeviceList.obj
            register_trt_op.obj
            runtime.obj
            TRTEngine.obj

Moreover, I'm not sure I understand how linking against libtorchtrt_runtime would solve my problem (given that my C++ app crashes in model = torch::jit::load(MODEL_PATH, device);)

@narendasan
Collaborator

Moreover, I'm not sure I understand how linking against libtorchtrt_runtime would solve my problem (given that my C++ app crashes in model = torch::jit::load(MODEL_PATH, device);)

I suspect that the reason a compiled module throws an error on load is that you need the LibTorch runtime extension, which adds support for deserializing and running Torch-TensorRT compiled modules. The lightest way to do this is to link libtorchtrt_runtime into your application, which simply loads the runtime extension.

Probably what you need to do to add the torchtrt_runtime.dll target is to modify //cpp/lib/BUILD and add the following target, similar to torch_tensorrt.dll:

cc_binary(
    name = "torchtrt_runtime.dll",
    srcs = [],
    linkshared = True,
    linkstatic = True,
    deps = [
        "//core/runtime:runtime",
        "//core/plugins:torch_tensorrt_plugins"
    ],
)

@jonahclarsen

@andreabonvini did you end up solving this issue? I am facing a similar problem now.

@yuriishutkin

torch_tensorrt.dll is actually the name of the generated library, as narendasan said.
torch_tensorrt.dll.if.lib is the import library to use when you want to link it with the rest of your application.

But for me that was not the end of the story: after I built the torch_tensorrt library, it turned out to conflict with the installed torch library. It just crashed with an exception somewhere inside torch. I suppose that's because torch exposes C++ in its interface, and my compiler version differs from the one that was used to build torch.

So a solution could be to match the compiler version that torch was built with, or to build torch from source.

Alternatively, you can switch to WSL and install the prebuilt torch_tensorrt package, or use a ready-made container from here: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch.

@andreabonvini
Author

@jonahclarsen Hi! Unfortunately not, I think I was kinda able to generate torchtrt_runtime.dll but something was wrong when I tried to link it to my program. I haven't had the time to continue investigating recently, but for sure I will retry in the following two months. I will 100% follow this thread during my spare time though.

@jonahclarsen

@jonahclarsen Hi! Unfortunately not, I think I was kinda able to generate torchtrt_runtime.dll but something was wrong when I tried to link it to my program. I haven't had the time to continue investigating recently, but for sure I will retry in the following two months. I will 100% follow this thread during my spare time though.

Okay, too bad! Hopefully we can figure it all out soon; I am highly motivated to get this into my LibTorch Windows program.

@jonahclarsen

@yuriishutkin When I tried linking my program against torch_tensorrt.dll.if.lib, I still got 'unresolved external symbol' linker errors, even just using the Input() function that isn't in any namespace. Are you saying that file was enough for you to successfully link your program? Were you able to use namespaces like torchscript?

@yuriishutkin

yuriishutkin commented May 2, 2022

@yuriishutkin When I tried linking my program against torch_tensorrt.dll.if.lib, I still got 'unresolved external symbol' linker errors, even just using the Input() function that isn't in any namespace. Are you saying that file was enough for you to successfully link your program? Were you able to use namespaces like torchscript?

Right, I've added the runtime and plugin sources to the same library. I also had problems with exporting symbols, because MSVC does not export all symbols by default the way GCC does. If you also use MSVC, you need to specify the exported symbols manually, e.g. in an export (.def) file.

For me the following worked:

 cpp/lib/BUILD | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/cpp/lib/BUILD b/cpp/lib/BUILD
index e6d50613..58102867 100644
--- a/cpp/lib/BUILD
+++ b/cpp/lib/BUILD
@@ -38,5 +38,10 @@ cc_binary(
     linkstatic = True,
     deps = [
         "//cpp:torch_tensorrt",
+        "//core/runtime:runtime",
+        "//core/plugins:torch_tensorrt_plugins"
     ],
+	win_def_file = "exports.def"
 )
+
+

exports.def is in the attached archive; place it next to cpp/lib/BUILD. Yours may differ depending on the version of the library you are using. Just add all unresolved externals to the list.
exports.zip
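Collecting the unresolved externals by hand can be tedious, so here is a minimal sketch of how the list could be pulled out of MSVC linker output and rendered as a module-definition file. The helper names are hypothetical, and the regular expression assumes the usual LNK2019/LNK2001 message layout. Only symbols that are actually defined in the library being built belong in the resulting list.

```python
import re

# Matches either a decorated C++ name in parentheses after the quoted
# signature, or a plain (C-linkage) symbol name.
_UNRESOLVED = re.compile(
    r'unresolved external symbol '
    r'(?:"[^"]*" \((?P<decorated>[^)\s]+)\)|(?P<plain>\S+))'
)


def symbols_from_link_log(log_text):
    """Return the unique unresolved symbol names found in a linker log."""
    symbols = []
    for match in _UNRESOLVED.finditer(log_text):
        name = match.group("decorated") or match.group("plain")
        # References through an import thunk carry an __imp_ prefix.
        if name.startswith("__imp_"):
            name = name[len("__imp_"):]
        if name not in symbols:
            symbols.append(name)
    return symbols


def render_def_file(symbols):
    """Render a module-definition file with one exported symbol per line."""
    return "\n".join(["EXPORTS"] + [f"    {s}" for s in symbols]) + "\n"
```

The output of `render_def_file(symbols_from_link_log(log))` could then be reviewed and saved as exports.def.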

@jonahclarsen

@yuriishutkin Okay, I went another route, by adding __declspec(dllexport) to every unresolved external, I've detailed this in #1014. However, I am still getting this error related to a function defined in nvinfer_plugins, and I have yet to find a way to resolve it:

Creating library bazel-out/x64_windows-opt/bin/core/plugins/torch_tensorrt_plugins.if.lib and object bazel-out/x64_windows-opt/bin/core/plugins/torch_tensorrt_plugins.if.exp
register_plugins.obj : error LNK2019: unresolved external symbol initLibNvInferPlugins referenced in function "public: __cdecl torch_tensorrt::core::plugins::impl::TorchTRTPluginRegistry::TorchTRTPluginRegistry(void)" (??0TorchTRTPluginRegistry@impl@plugins@core@torch_tensorrt@@QEAA@XZ)

Would you be willing to share your entire WORKSPACE and cpp/lib/BUILD files, or ideally even your entire project that was able to successfully compile the .lib file?

@yuriishutkin

yuriishutkin commented May 3, 2022

@jonahclarsen Sure, please take a look.

https://github.com/yuriishutkin/Torch-TensorRT/tree/windows

I run this in the py directory:
python setup.py install

It builds torch_tensorrt.dll + torch_tensorrt.dll.if.lib and then links it to the _C lib. The only thing I do manually is copy bazel-out\x64_windows-opt\bin\cpp\lib\torch_tensorrt.dll.if.lib to py\torch_tensorrt\lib\ because Bazel does not copy this file automatically.

But once again, for me the resulting _C lib does not load successfully in Python because of an exception inside torch.
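The manual copy step described above could be scripted. The following is a minimal sketch using the two paths quoted in this comment; the function name `copy_import_lib` and the idea of calling it from the build are assumptions, not something the fork is known to do.

```python
import shutil
from pathlib import Path


def copy_import_lib(repo_root):
    """Copy the Bazel-generated import library next to the Python package.

    Source and destination follow the paths mentioned in the thread,
    relative to the repository root.
    """
    repo_root = Path(repo_root)
    src = (repo_root / "bazel-out" / "x64_windows-opt" / "bin" / "cpp"
           / "lib" / "torch_tensorrt.dll.if.lib")
    dst_dir = repo_root / "py" / "torch_tensorrt" / "lib"
    if not src.is_file():
        raise FileNotFoundError(f"import library not found: {src}")
    dst_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy2(src, dst_dir))
```

Calling `copy_import_lib(r"C:\Torch-TensorRT")` after the Bazel build and before the extension is linked would replace the manual step.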

@ncomly-nvidia added the channel: windows bugs, questions, & RFEs around Windows label May 3, 2022
@narendasan self-assigned this May 18, 2022
@github-actions

This issue has not seen activity for 90 days, Remove stale label or comment or this will be closed in 10 days
