[speedster] _dl_check_map_versions assertion error with optimize_model and ONNX compilers #346

Open · trent-s opened this issue Jun 20, 2023 · 3 comments


trent-s commented Jun 20, 2023

Hi. Thank you for your useful work on Speedster. I would like to report an assertion error I encountered while following the PyTorch quickstart documentation.

First off, I used the container documented at https://docs.nebuly.com/Speedster/installation/#optional-download-docker-images-with-frameworks-and-optimizers
Then I followed the PyTorch quick start code documented at https://github.com/nebuly-ai/nebuly/tree/main/optimization/speedster

Test code from the above URL:

import torch
import torchvision.models as models
from speedster import optimize_model

# ResNet-50 with randomly initialized weights
model = models.resnet50()

# 100 samples of ((input_tensor,), label) pairs used for benchmarking/calibration
input_data = [((torch.randn(1, 3, 256, 256),), torch.tensor([0])) for _ in range(100)]

optimized_model = optimize_model(
    model,
    input_data=input_data,
    optimization_time="constrained",
    metric_drop_ths=0.05,  # tolerate up to a 5% drop in the accuracy metric
)

This log output demonstrates the assertion error:

Container image Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

https://developer.nvidia.com/tensorrt

Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES.  All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

To install Python sample dependencies, run /opt/tensorrt/python/python_setup.sh

To install the open-source samples corresponding to this TensorRT release version
run /opt/tensorrt/install_opensource.sh.  To build the open source parsers,
plugins, and samples for current top-of-tree on master or a different branch,
run /opt/tensorrt/install_opensource.sh -b <branch>
See https://github.com/NVIDIA/TensorRT for more information.

root@dc82462885fe:/# python
Python 3.8.10 (default, Mar 13 2023, 10:26:41)
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch

>>> import torchvision.models as models
>>> from speedster import optimize_model
2023-06-20 01:35:55.736961: I tensorflow/core/util/port.cc:110] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2023-06-20 01:35:55.795719: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-06-20 01:35:57.801761: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1956] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
>>>
>>> model = models.resnet50()
>>> input_data = [((torch.randn(1, 3, 256, 256), ), torch.tensor([0])) for _ in range(100)]
>>>
>>> optimized_model = optimize_model(
...     model,
...     input_data=input_data,
...     optimization_time="constrained",
...     metric_drop_ths=0.05
... )
2023-06-20 01:36:18 | INFO     | Running Speedster on GPU:0

2023-06-20 01:36:22 | INFO     | Benchmark performance of original model
2023-06-20 01:36:23 | INFO     | Original model latency: 0.0042613792419433595 sec/iter
============= Diagnostic Run torch.onnx.export version 2.0.0+cu118 =============
verbose: False, log level: Level.ERROR
======================= 0 NONE 0 NOTE 0 WARNING 0 ERROR ========================

2023-06-20 01:36:24 | INFO     | [1/2] Running PyTorch Optimization Pipeline
2023-06-20 01:36:24 | INFO     | Optimizing with PytorchBackendCompiler and q_type: None.
2023-06-20 01:36:34 | INFO     | Optimized model latency: 0.0032427310943603516 sec/iter
2023-06-20 01:36:34 | INFO     | Optimizing with PytorchBackendCompiler and q_type: QuantizationType.HALF.
2023-06-20 01:36:45 | INFO     | Optimized model latency: 0.004563808441162109 sec/iter
2023-06-20 01:36:45 | INFO     | Optimizing with PyTorchApacheTVMCompiler and q_type: None.
2023-06-20 01:43:58 | INFO     | Optimized model latency: 0.007601022720336914 sec/iter
2023-06-20 01:43:58 | INFO     | Optimizing with PyTorchApacheTVMCompiler and q_type: QuantizationType.HALF.
2023-06-20 01:53:28 | INFO     | Optimized model latency: 0.008110284805297852 sec/iter
2023-06-20 01:53:28 | INFO     | Optimizing with PyTorchApacheTVMCompiler and q_type: QuantizationType.DYNAMIC.
2023-06-20 02:02:03 | WARNING  | The optimized model will be discarded due to poor results obtained with the given metric.
2023-06-20 02:02:03 | INFO     | [2/2] Running ONNX Optimization Pipeline
2023-06-20 02:02:03 | INFO     | Optimizing with ONNXCompiler and q_type: None.
Inconsistency detected by ld.so: dl-version.c: 205: _dl_check_map_versions: Assertion `needed != NULL' failed!

I have found that it seems to work fine if I skip the ONNX-related compilers, e.g. by adding ignore_compilers=["onnx_tensor_rt", "onnx_tvm", "onnxruntime", "tensor_rt", "tvm"] to optimize_model as shown below:

import torch
import torchvision.models as models
from speedster import optimize_model

model = models.resnet50()
input_data = [((torch.randn(1, 3, 256, 256),), torch.tensor([0])) for _ in range(100)]

optimized_model = optimize_model(
    model,
    input_data=input_data,
    optimization_time="constrained",
    metric_drop_ths=0.05,
    # Skip the ONNX-based backends (and TVM/TensorRT) to avoid the ld.so crash
    ignore_compilers=["onnx_tensor_rt", "onnx_tvm", "onnxruntime", "tensor_rt", "tvm"],
)

The successful log output looks like this:

root@dc82462885fe:/# python
Python 3.8.10 (default, Mar 13 2023, 10:26:41)
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch

>>> import torchvision.models as models
>>> from speedster import optimize_model
2023-06-20 02:10:52.634197: I tensorflow/core/util/port.cc:110] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2023-06-20 02:10:52.685749: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-06-20 02:10:54.678411: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1956] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
>>>
>>> model = models.resnet50()
>>> input_data = [((torch.randn(1, 3, 256, 256), ), torch.tensor([0])) for _ in range(100)]
>>>
>>> optimized_model = optimize_model(
...     model,
...     input_data=input_data,
...     optimization_time="constrained",
...     metric_drop_ths=0.05,
...     ignore_compilers=["onnx_tensor_rt","onnx_tvm","onnxruntime","tensor_rt", "tvm"],
... )
2023-06-20 02:11:01 | INFO     | Running Speedster on GPU:0
2023-06-20 02:11:05 | INFO     | Benchmark performance of original model
2023-06-20 02:11:06 | INFO     | Original model latency: 0.004358108043670654 sec/iter
============= Diagnostic Run torch.onnx.export version 2.0.0+cu118 =============
verbose: False, log level: Level.ERROR
======================= 0 NONE 0 NOTE 0 WARNING 0 ERROR ========================

2023-06-20 02:11:07 | INFO     | [1/2] Running PyTorch Optimization Pipeline
2023-06-20 02:11:07 | INFO     | Optimizing with PytorchBackendCompiler and q_type: None.
2023-06-20 02:11:17 | INFO     | Optimized model latency: 0.0031163692474365234 sec/iter
2023-06-20 02:11:17 | INFO     | Optimizing with PytorchBackendCompiler and q_type: QuantizationType.HALF.
2023-06-20 02:11:27 | INFO     | Optimized model latency: 0.003615856170654297 sec/iter
2023-06-20 02:11:27 | INFO     | [2/2] Running ONNX Optimization Pipeline

[Speedster results on Tesla V100-PCIE-32GB]
┏━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Metric      ┃ Original Model   ┃ Optimized Model   ┃ Improvement   ┃
┣━━━━━━━━━━━━━╋━━━━━━━━━━━━━━━━━━╋━━━━━━━━━━━━━━━━━━━╋━━━━━━━━━━━━━━━┫
┃ backend     ┃ PYTORCH          ┃ TorchScript       ┃               ┃
┃ latency     ┃ 0.0044 sec/batch ┃ 0.0031 sec/batch  ┃ 1.40x         ┃
┃ throughput  ┃ 229.46 data/sec  ┃ 320.89 data/sec   ┃ 1.40x         ┃
┃ model size  ┃ 102.56 MB        ┃ 102.63 MB         ┃ 0%            ┃
┃ metric drop ┃                  ┃ 0                 ┃               ┃
┃ techniques  ┃                  ┃ fp32              ┃               ┃
┗━━━━━━━━━━━━━┻━━━━━━━━━━━━━━━━━━┻━━━━━━━━━━━━━━━━━━━┻━━━━━━━━━━━━━━━┛

Max speed-up with your input parameters is 1.40x. If you want to get a faster optimized model, see the following link for some suggestions: https://docs.nebuly.com/Speedster/advanced_options/#acceleration-suggestions
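
For completeness, a minimal sketch of one way to trade accuracy for more speed-up (optimization_time="unconstrained" is a documented optimize_model option, but the threshold value here is just an example, not taken from the docs):

optimized_model = optimize_model(
    model,
    input_data=input_data,
    optimization_time="unconstrained",  # no time budget: search all available techniques
    metric_drop_ths=0.1,                # example value: tolerate up to a 10% metric drop
)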

Thank you for your work on this!

valeriosofi (Collaborator) commented

Hello @trent-s, thank you for this report! I will update the Docker image as soon as I can; in the meantime you can install Speedster using the Quick installation steps that you can find here: installation
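
For reference, a minimal sketch of those quick installation steps (the command and flags below are from memory of the docs and may differ between versions):

# Install the Speedster package, then let its auto-installer pull in
# the deep learning compilers for the chosen framework (assumed flags)
pip install speedster
python -m nebullvm.installers.auto_installer --frameworks torch --compilers all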


trent-s commented Jun 21, 2023

Thank you @valeriosofi for your kind and timely response.

I followed the Quick installation steps as you suggested, but still got similar results.
Just FYI, I am posting the latest log here.

root@66f17ec15c70:/# python
Python 3.8.10 (default, Mar 13 2023, 10:26:41)
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch

>>> import torchvision.models as models
>>> from speedster import optimize_model
2023-06-21 02:33:42.382076: I tensorflow/core/util/port.cc:110] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2023-06-21 02:33:42.432123: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-06-21 02:33:44.244064: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1956] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
>>>
>>> model = models.resnet50()
>>> input_data = [((torch.randn(1, 3, 256, 256), ), torch.tensor([0])) for _ in range(100)]
>>>
>>> optimized_model = optimize_model(
...     model,
...     input_data=input_data,
...     optimization_time="constrained",
...     metric_drop_ths=0.05
... )
2023-06-21 02:33:53 | INFO     | Running Speedster on GPU:0
2023-06-21 02:33:59 | INFO     | Benchmark performance of original model
2023-06-21 02:33:59 | INFO     | Original model latency: 0.004133939743041992 sec/iter
============= Diagnostic Run torch.onnx.export version 2.0.0+cu118 =============
verbose: False, log level: Level.ERROR
======================= 0 NONE 0 NOTE 0 WARNING 0 ERROR ========================

2023-06-21 02:34:02 | INFO     | [1/2] Running PyTorch Optimization Pipeline
2023-06-21 02:34:02 | INFO     | Optimizing with PytorchBackendCompiler and q_type: None.
2023-06-21 02:34:12 | INFO     | Optimized model latency: 0.003423929214477539 sec/iter
2023-06-21 02:34:12 | INFO     | Optimizing with PytorchBackendCompiler and q_type: QuantizationType.HALF.
2023-06-21 02:34:23 | INFO     | Optimized model latency: 0.004581451416015625 sec/iter
2023-06-21 02:34:23 | INFO     | Optimizing with PyTorchApacheTVMCompiler and q_type: None.
2023-06-21 02:41:47 | INFO     | Optimized model latency: 0.007593631744384766 sec/iter
2023-06-21 02:41:47 | INFO     | Optimizing with PyTorchApacheTVMCompiler and q_type: QuantizationType.HALF.
2023-06-21 02:50:06 | INFO     | Optimized model latency: 0.007908344268798828 sec/iter
2023-06-21 02:50:06 | INFO     | Optimizing with PyTorchApacheTVMCompiler and q_type: QuantizationType.DYNAMIC.
2023-06-21 02:58:06 | WARNING  | The optimized model will be discarded due to poor results obtained with the given metric.
2023-06-21 02:58:06 | INFO     | [2/2] Running ONNX Optimization Pipeline
2023-06-21 02:58:06 | INFO     | Optimizing with ONNXCompiler and q_type: None.
Inconsistency detected by ld.so: dl-version.c: 205: _dl_check_map_versions: Assertion `needed != NULL' failed!
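
In case it helps with debugging: this ld.so assertion usually points to a symbol-versioning mismatch among the shared libraries loaded into one process. A rough sketch of how one might narrow down the trigger, assuming the ONNX Runtime import is what pulls in the conflicting library:

# Check whether the bare import already reproduces the loader crash
python -c "import onnxruntime"

# LD_DEBUG=libs is a standard glibc facility; it logs the library search and
# load order, so the library loaded just before the assertion can be identified
LD_DEBUG=libs python -c "import onnxruntime" 2>&1 | tail -n 40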

DavidAdamczyk commented

@valeriosofi, I would like to ask you if there is any progress on this issue.
