Container build fails for 3D-UNet-99 #78

WarrenSchultz · 2024-06-19T17:45:46Z

Running the command for ResNet50 works correctly:
cm run script --tags=run-mlperf,inference,_performance-only,_full --division=open --category=edge --device=cuda --model=resnet50 --precision=float32 --implementation=nvidia --backend=tensorrt --scenario=Offline --execution_mode=valid --power=no --adr.python.version_min=3.8 --clean --compliance=no --quiet --time --docker --docker_cache=no

But the same command for 3d-unet-99 fails:
cm run script --tags=run-mlperf,inference,_performance-only,_full --division=open --category=edge --device=cuda --model=3d-unet-99 --precision=float32 --implementation=nvidia --backend=tensorrt --scenario=Offline --execution_mode=valid --power=no --adr.python.version_min=3.8 --clean --compliance=no --quiet --time --docker --docker_cache=no

Error log:
```
Loading TensorRT plugin from build/plugins/conv3D3X3X3C1K32Plugin/libconv3D3X3X3C1K32Plugin.so
Process Process-1:
Traceback (most recent call last):
  File "/usr/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/usr/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/cmuser/CM/repos/local/cache/be4b540d34434756/repo/closed/NVIDIA/code/actionhandler/base.py", line 189, in subprocess_target
    return self.action_handler.handle()
  File "/home/cmuser/CM/repos/local/cache/be4b540d34434756/repo/closed/NVIDIA/code/actionhandler/generate_engines.py", line 176, in handle
    total_engine_build_time += self.build_engine(job)
  File "/home/cmuser/CM/repos/local/cache/be4b540d34434756/repo/closed/NVIDIA/code/actionhandler/generate_engines.py", line 159, in build_engine
    builder = get_benchmark(job.config)
  File "/home/cmuser/CM/repos/local/cache/be4b540d34434756/repo/closed/NVIDIA/code/__init__.py", line 83, in get_benchmark
    cls = get_cls(G_BENCHMARK_CLASS_MAP[benchmark])
  File "/home/cmuser/CM/repos/local/cache/be4b540d34434756/repo/closed/NVIDIA/code/__init__.py", line 66, in get_cls
    return getattr(import_module(module_loc.module_path), module_loc.cls_name)
  File "/usr/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 848, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/home/cmuser/CM/repos/local/cache/be4b540d34434756/repo/closed/NVIDIA/code/3d-unet/tensorrt/3d-unet.py", line 25, in <module>
    import onnx
ModuleNotFoundError: No module named 'onnx'
[2024-06-19 10:30:07,499 generate_engines.py:173 INFO] Building engines for 3d-unet benchmark in Offline scenario...
Loading TensorRT plugin from build/plugins/pixelShuffle3DPlugin/libpixelshuffle3dplugin.so
Loading TensorRT plugin from build/plugins/conv3D1X1X1K4Plugin/libconv3D1X1X1K4Plugin.so
Loading TensorRT plugin from build/plugins/conv3D3X3X3C1K32Plugin/libconv3D3X3X3C1K32Plugin.so
Process Process-2:
Traceback (most recent call last):
  File "/usr/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/usr/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/cmuser/CM/repos/local/cache/be4b540d34434756/repo/closed/NVIDIA/code/actionhandler/base.py", line 189, in subprocess_target
    return self.action_handler.handle()
  File "/home/cmuser/CM/repos/local/cache/be4b540d34434756/repo/closed/NVIDIA/code/actionhandler/generate_engines.py", line 176, in handle
    total_engine_build_time += self.build_engine(job)
  File "/home/cmuser/CM/repos/local/cache/be4b540d34434756/repo/closed/NVIDIA/code/actionhandler/generate_engines.py", line 159, in build_engine
    builder = get_benchmark(job.config)
  File "/home/cmuser/CM/repos/local/cache/be4b540d34434756/repo/closed/NVIDIA/code/__init__.py", line 83, in get_benchmark
    cls = get_cls(G_BENCHMARK_CLASS_MAP[benchmark])
  File "/home/cmuser/CM/repos/local/cache/be4b540d34434756/repo/closed/NVIDIA/code/__init__.py", line 66, in get_cls
    return getattr(import_module(module_loc.module_path), module_loc.cls_name)
  File "/usr/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 848, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/home/cmuser/CM/repos/local/cache/be4b540d34434756/repo/closed/NVIDIA/code/3d-unet/tensorrt/3d-unet.py", line 25, in <module>
    import onnx
ModuleNotFoundError: No module named 'onnx'
Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/cmuser/CM/repos/local/cache/be4b540d34434756/repo/closed/NVIDIA/code/main.py", line 231, in <module>
    main(main_args, DETECTED_SYSTEM)
  File "/home/cmuser/CM/repos/local/cache/be4b540d34434756/repo/closed/NVIDIA/code/main.py", line 144, in main
    dispatch_action(main_args, config_dict, workload_setting)
  File "/home/cmuser/CM/repos/local/cache/be4b540d34434756/repo/closed/NVIDIA/code/main.py", line 202, in dispatch_action
    handler.run()
  File "/home/cmuser/CM/repos/local/cache/be4b540d34434756/repo/closed/NVIDIA/code/actionhandler/base.py", line 82, in run
    self.handle_failure()
  File "/home/cmuser/CM/repos/local/cache/be4b540d34434756/repo/closed/NVIDIA/code/actionhandler/base.py", line 186, in handle_failure
    self.action_handler.handle_failure()
  File "/home/cmuser/CM/repos/local/cache/be4b540d34434756/repo/closed/NVIDIA/code/actionhandler/generate_engines.py", line 184, in handle_failure
    raise RuntimeError("Building engines failed!")
RuntimeError: Building engines failed!
make: *** [Makefile:37: generate_engines] Error 1

CM error: Portable CM script failed (name = app-mlperf-inference-nvidia, return code = 256)
```
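The traceback shows why only 3d-unet is affected: `get_cls()` imports each benchmark's module on demand (`import_module(module_loc.module_path)`), and `3d-unet.py` runs `import onnx` at module level, so the missing package surfaces only when the 3d-unet builder is requested. The sketch below mimics that pattern under stated assumptions: the map entries are hypothetical (`json` stands in for a benchmark whose dependencies are installed, `hypothetical_missing_module` for one that is not), and it is not NVIDIA's actual code.

```python
"""Minimal sketch of lazy, per-benchmark class loading.

Assumption: only the shape of get_cls() from the traceback is mimicked
(import_module(module_path) followed by getattr(..., cls_name)); the map
entries below are hypothetical stand-ins, not NVIDIA's code.
"""
from importlib import import_module
from typing import NamedTuple


class ModuleLocation(NamedTuple):
    module_path: str
    cls_name: str


# Hypothetical benchmark -> class map. "json"/"JSONDecoder" stands in for a
# builder whose imports all resolve; the 3d-unet entry points at a module
# that cannot be imported, similar to 3d-unet.py failing on `import onnx`.
BENCHMARK_CLASS_MAP = {
    "resnet50": ModuleLocation("json", "JSONDecoder"),
    "3d-unet": ModuleLocation("hypothetical_missing_module", "UNet3DBuilder"),
}


def get_cls(loc: ModuleLocation):
    # The module is imported only when this benchmark is requested, so a
    # missing dependency shows up only for the benchmark that needs it.
    return getattr(import_module(loc.module_path), loc.cls_name)


def try_load(benchmark: str) -> str:
    try:
        cls = get_cls(BENCHMARK_CLASS_MAP[benchmark])
        return f"{benchmark}: loaded {cls.__name__}"
    except ModuleNotFoundError as exc:
        return f"{benchmark}: {type(exc).__name__}: {exc}"


if __name__ == "__main__":
    for name in BENCHMARK_CLASS_MAP:
        print(try_load(name))
```

Because nothing is imported until a benchmark is selected, a container can pass the ResNet50 run while still lacking a package that only 3d-unet needs.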

However, running 3d-unet-99 within the container built for ResNet50 works correctly.
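One way to catch this before a long engine build is to check, inside the freshly built container, that the modules a benchmark imports can be resolved. This is a minimal sketch assuming a hand-maintained, hypothetical `REQUIRED_MODULES` map; only `onnx` for 3d-unet comes from the log above. It uses `typing.List` rather than `list[str]` because the container runs Python 3.8, where subscripting built-in `list` in an annotation raises at definition time.

```python
"""Preflight sketch: verify a benchmark's Python imports resolve before
starting engine generation.

Assumption: REQUIRED_MODULES is hypothetical; only "onnx" for 3d-unet is
taken from the traceback (3d-unet.py, line 25: `import onnx`).
"""
import importlib.util
import sys
from typing import Dict, List

# Hypothetical per-benchmark dependency lists (top-level module names only).
REQUIRED_MODULES: Dict[str, List[str]] = {
    "3d-unet": ["onnx"],
}


def missing_modules(benchmark: str,
                    required: Dict[str, List[str]] = REQUIRED_MODULES) -> List[str]:
    """Return the required top-level modules that cannot be found."""
    # find_spec() locates a module without executing it; None means not found.
    return [name for name in required.get(benchmark, [])
            if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    benchmark = sys.argv[1] if len(sys.argv) > 1 else "3d-unet"
    missing = missing_modules(benchmark)
    if missing:
        print(f"{benchmark}: missing Python modules: {', '.join(missing)}")
        sys.exit(1)
    print(f"{benchmark}: all required Python modules found")
```

Run it with the container's interpreter, e.g. `python3 check_deps.py 3d-unet` (the file name is arbitrary); a non-zero exit status means a listed dependency is missing.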


arjunsuresh · 2024-06-20T23:21:00Z

Thanks for reporting this. The problem should be fixed now. We typically launch one Docker image for the NVIDIA implementation and run all the benchmarks in it, so we missed this issue for 3d-unet.
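Since a shared image can hide a dependency that only one model needs, one way to exercise each model's own container build is to launch the same command once per model. This sketch only assembles the argument lists from the flags used in this issue (only `--model` differs between the two commands); the model list and the loop are illustrative, not a CM feature.

```python
"""Sketch: build one `cm run script` invocation per model so each model
gets its own container build instead of sharing one image.

Assumption: the flags are copied from the two commands in this issue; the
model list and the per-model loop are illustrative, not part of CM.
"""
import shlex
from typing import List

# Flags shared by both commands in this issue, split around --model so the
# generated command keeps the original argument order.
FLAGS_BEFORE_MODEL = [
    "--tags=run-mlperf,inference,_performance-only,_full",
    "--division=open",
    "--category=edge",
    "--device=cuda",
]
FLAGS_AFTER_MODEL = [
    "--precision=float32",
    "--implementation=nvidia",
    "--backend=tensorrt",
    "--scenario=Offline",
    "--execution_mode=valid",
    "--power=no",
    "--adr.python.version_min=3.8",
    "--clean",
    "--compliance=no",
    "--quiet",
    "--time",
    "--docker",
    "--docker_cache=no",
]


def cm_command(model: str) -> List[str]:
    """Return the argv for one model; only --model differs between runs."""
    return ["cm", "run", "script", *FLAGS_BEFORE_MODEL,
            f"--model={model}", *FLAGS_AFTER_MODEL]


if __name__ == "__main__":
    for model in ["resnet50", "3d-unet-99"]:
        # Print a shell-ready line (shlex.join needs Python 3.8+).
        print(shlex.join(cm_command(model)))
```

Printing keeps the sketch side-effect free; passing `cm_command(model)` to `subprocess.run(..., check=True)` would launch each build in turn.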

WarrenSchultz · 2024-06-25T04:22:42Z

Seems to be working now, thanks!

arjunsuresh added the bug (Something isn't working) label Jun 20, 2024

arjunsuresh self-assigned this Jun 20, 2024

WarrenSchultz closed this as completed Jun 25, 2024

arjunsuresh added a commit that referenced this issue Jul 1, 2024

Merge pull request #78 from anandhu-eng/fixGithubActionError (098785e): Code change for rclone and gdown