Docker build failing for amd - LLAMA 2 70B #401

Closed
anandhu-eng opened this issue Oct 21, 2024 · 7 comments
Assignees
Labels
bug Something isn't working mlperf-inference

Comments

@anandhu-eng
Contributor

INFO:root:* cm run script "run docker container"
Traceback (most recent call last):
  File "/home/anandhu/.local/bin/cm", line 8, in <module>
    sys.exit(run())
             ^^^^^
  File "/home/anandhu/.local/lib/python3.12/site-packages/cmind/cli.py", line 37, in run
    r = cm.access(argv, out='con')
        ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/anandhu/.local/lib/python3.12/site-packages/cmind/core.py", line 602, in access
    r = action_addr(i)
        ^^^^^^^^^^^^^^
  File "/home/anandhu/CM/repos/anandhu-eng@cm4mlops/automation/script/module.py", line 212, in run
    r = self._run(i)
        ^^^^^^^^^^^^
  File "/home/anandhu/CM/repos/anandhu-eng@cm4mlops/automation/script/module.py", line 1474, in _run
    r = customize_code.preprocess(ii)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/anandhu/CM/repos/anandhu-eng@cm4mlops/script/run-mlperf-inference-app/customize.py", line 243, in preprocess
    r = cm.access(ii)
        ^^^^^^^^^^^^^
  File "/home/anandhu/.local/lib/python3.12/site-packages/cmind/core.py", line 758, in access
    return cm.access(i)
           ^^^^^^^^^^^^
  File "/home/anandhu/.local/lib/python3.12/site-packages/cmind/core.py", line 602, in access
    r = action_addr(i)
        ^^^^^^^^^^^^^^
  File "/home/anandhu/CM/repos/anandhu-eng@cm4mlops/automation/script/module.py", line 4093, in docker
    return utils.call_internal_module(self, __file__, 'module_misc', 'docker', i)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/anandhu/.local/lib/python3.12/site-packages/cmind/utils.py", line 1631, in call_internal_module
    return getattr(tmp_module, module_func)(i)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/anandhu/CM/repos/anandhu-eng@cm4mlops/automation/script/module_misc.py", line 2095, in docker
    r = self_module.cmind.access(cm_docker_input)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/anandhu/.local/lib/python3.12/site-packages/cmind/core.py", line 602, in access
    r = action_addr(i)
        ^^^^^^^^^^^^^^
  File "/home/anandhu/CM/repos/anandhu-eng@cm4mlops/automation/script/module.py", line 212, in run
    r = self._run(i)
        ^^^^^^^^^^^^
  File "/home/anandhu/CM/repos/anandhu-eng@cm4mlops/automation/script/module.py", line 1474, in _run
    r = customize_code.preprocess(ii)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/anandhu/CM/repos/anandhu-eng@cm4mlops/script/run-docker-container/customize.py", line 43, in preprocess
    DOCKER_CONTAINER = docker_image_repo + "/" + docker_image_name + ":" + docker_image_tag
                       ~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~
TypeError: can only concatenate str (not "NoneType") to str
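The traceback ends in a plain string concatenation where one of the three parts is `None`. As a minimal sketch only (the helper name `build_image_ref` is hypothetical and is not taken from cm4mlops), the container reference could be assembled behind a guard that reports which field is missing instead of raising a bare `TypeError`:

```python
def build_image_ref(repo, name, tag):
    """Join repo, name and tag into 'repo/name:tag'.

    Hypothetical helper for illustration; raises ValueError naming
    every field that is None or empty.
    """
    parts = {"repo": repo, "name": name, "tag": tag}
    missing = [key for key, value in parts.items() if not value]
    if missing:
        raise ValueError("missing Docker image field(s): " + ", ".join(missing))
    return f"{repo}/{name}:{tag}"
```

With this shape, a missing tag would surface as `ValueError: missing Docker image field(s): tag`, which points at the unset value rather than at the concatenation.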
@arjunsuresh
Contributor

@anandhu-eng this is fixed now, right?

@anandhu-eng
Contributor Author

Hi @arjunsuresh, I tried to run it now, but I got the following error:

cm run script --tags=run-mlperf,inference,_find-performance,_full,_r4.1    --model=llama2-70b-99    --implementation=amd    --framework=pytorch    --category=datacenter    --scenario=Offline    --execution_mode=test    --device=cuda     --docker --quiet    --test_query_count=50 --env.CM_MLPERF_MODEL_LLAMA2_70B_DOWNLOAD_TO_HOST='yes'
INFO:root:* cm run script "run-mlperf inference _find-performance _full _r4.1"
INFO:root:  * cm run script "get mlcommons inference src"
INFO:root:       ! load /home/anandhu/CM/repos/local/cache/9efb0b6eb31d4e7e/cm-cached-state.json
INFO:root:  * cm run script "install pip-package for-cmind-python _package.tabulate"
INFO:root:       ! load /home/anandhu/CM/repos/local/cache/a2fa268cf0da4e5f/cm-cached-state.json
INFO:root:  * cm run script "get mlperf inference utils"
INFO:root:    * cm run script "get mlperf inference src"
INFO:root:         ! load /home/anandhu/CM/repos/local/cache/9efb0b6eb31d4e7e/cm-cached-state.json
INFO:root:         ! call "postprocess" from /home/anandhu/CM/repos/anandhu-eng@cm4mlops/script/get-mlperf-inference-utils/customize.py
Using MLCommons Inference source from /home/anandhu/CM/repos/local/cache/c451e090cdb24951/inference

Running loadgen scenario: Offline and mode: performance
INFO:root:* cm run script "build dockerfile"
cm pull repo && cm run script --tags=app,mlperf,inference,generic,_amd,_llama2-70b-99,_pytorch,_cuda,_test,_r4.1_default,_offline --quiet=true --env.CM_MLPERF_MODEL_LLAMA2_70B_DOWNLOAD_TO_HOST=yes --env.CM_QUIET=yes --env.CM_MLPERF_IMPLEMENTATION=amd --env.CM_MLPERF_MODEL=llama2-70b-99 --env.CM_MLPERF_RUN_STYLE=test --env.CM_MLPERF_SKIP_SUBMISSION_GENERATION=False --env.CM_MLPERF_SUBMISSION_SYSTEM_TYPE=datacenter --env.CM_MLPERF_DEVICE=cuda --env.CM_MLPERF_USE_DOCKER=True --env.CM_MLPERF_BACKEND=pytorch --env.CM_MLPERF_LOADGEN_SCENARIO=Offline --env.CM_TEST_QUERY_COUNT=50 --env.CM_MLPERF_FIND_PERFORMANCE_MODE=yes --env.CM_MLPERF_LOADGEN_ALL_MODES=no --env.CM_MLPERF_LOADGEN_MODE=performance --env.CM_MLPERF_RESULT_PUSH_TO_GITHUB=False --env.CM_MLPERF_SUBMISSION_GENERATION_STYLE=full --env.CM_MLPERF_INFERENCE_VERSION=4.1 --env.CM_RUN_MLPERF_INFERENCE_APP_DEFAULTS=r4.1_default --env.CM_MLPERF_LAST_RELEASE=v4.1 --env.CM_MODEL=llama2-70b-99 --env.CM_MLPERF_LOADGEN_COMPLIANCE=no --env.CM_MLPERF_LOADGEN_EXTRA_OPTIONS= --env.CM_MLPERF_LOADGEN_SCENARIOS,=Offline --env.CM_MLPERF_LOADGEN_MODES,=performance --env.OUTPUT_BASE_DIR=/home/anandhu/CM/repos --env.CM_OUTPUT_FOLDER_NAME=test_results --add_deps_recursive.coco2014-original.tags=_full --add_deps_recursive.coco2014-preprocessed.tags=_full --add_deps_recursive.imagenet-original.tags=_full --add_deps_recursive.imagenet-preprocessed.tags=_full --add_deps_recursive.openimages-original.tags=_full --add_deps_recursive.openimages-preprocessed.tags=_full --add_deps_recursive.openorca-original.tags=_full --add_deps_recursive.openorca-preprocessed.tags=_full --add_deps_recursive.get-mlperf-inference-results-dir.tags=_version.r4_1 --add_deps_recursive.get-mlperf-inference-submission-dir.tags=_version.r4_1 --add_deps_recursive.mlperf-inference-nvidia-scratch-space.tags=_version.r4_1 --v=False --print_env=False --print_deps=False --dump_version_info=True --quiet
Dockerfile written at /home/anandhu/CM/repos/anandhu-eng@cm4mlops/script/app-mlperf-inference/dockerfiles/pytorch:rocm6.1.2_ubuntu20.04_py3.9_pytorch_staging.Dockerfile

Dockerfile generated at /home/anandhu/CM/repos/anandhu-eng@cm4mlops/script/app-mlperf-inference/dockerfiles/pytorch:rocm6.1.2_ubuntu20.04_py3.9_pytorch_staging.Dockerfile
INFO:root:* cm run script "get docker"
INFO:root:     ! load /home/anandhu/CM/repos/local/cache/c28f6fb1b7884706/cm-cached-state.json
INFO:root:* cm run script "get mlperf inference submission dir local _version.r4_1"
INFO:root:     ! load /home/anandhu/CM/repos/local/cache/09af3aede75c4983/cm-cached-state.json
INFO:root:* cm run script "get nvidia-docker"
INFO:root:     ! load /home/anandhu/CM/repos/local/cache/3ba5606de4784db5/cm-cached-state.json

CM command line regenerated to be used inside Docker:

cm run script --tags=app,mlperf,inference,generic,_amd,_llama2-70b-99,_pytorch,_cuda,_test,_r4.1_default,_offline --quiet=true --env.CM_MLPERF_MODEL_LLAMA2_70B_DOWNLOAD_TO_HOST=yes --env.CM_QUIET=yes --env.CM_MLPERF_IMPLEMENTATION=amd --env.CM_MLPERF_MODEL=llama2-70b-99 --env.CM_MLPERF_RUN_STYLE=test --env.CM_MLPERF_SKIP_SUBMISSION_GENERATION=False --env.CM_MLPERF_SUBMISSION_SYSTEM_TYPE=datacenter --env.CM_MLPERF_DEVICE=cuda --env.CM_MLPERF_USE_DOCKER=True --env.CM_MLPERF_BACKEND=pytorch --env.CM_MLPERF_LOADGEN_SCENARIO=Offline --env.CM_TEST_QUERY_COUNT=50 --env.CM_MLPERF_FIND_PERFORMANCE_MODE=yes --env.CM_MLPERF_LOADGEN_ALL_MODES=no --env.CM_MLPERF_LOADGEN_MODE=performance --env.CM_MLPERF_RESULT_PUSH_TO_GITHUB=False --env.CM_MLPERF_SUBMISSION_GENERATION_STYLE=full --env.CM_MLPERF_INFERENCE_VERSION=4.1 --env.CM_RUN_MLPERF_INFERENCE_APP_DEFAULTS=r4.1_default --env.CM_MLPERF_LAST_RELEASE=v4.1 --env.CM_TMP_CURRENT_PATH=/home/anandhu/CM/repos --env.CM_TMP_PIP_VERSION_STRING= --env.CM_MODEL=llama2-70b-99 --env.CM_MLPERF_LOADGEN_COMPLIANCE=no --env.CM_MLPERF_LOADGEN_EXTRA_OPTIONS= --env.CM_MLPERF_LOADGEN_SCENARIOS,=Offline --env.CM_MLPERF_LOADGEN_MODES,=performance --env.OUTPUT_BASE_DIR=/home/anandhu/CM/repos --env.CM_OUTPUT_FOLDER_NAME=test_results --add_deps_recursive.coco2014-original.tags=_full --add_deps_recursive.coco2014-preprocessed.tags=_full --add_deps_recursive.imagenet-original.tags=_full --add_deps_recursive.imagenet-preprocessed.tags=_full --add_deps_recursive.openimages-original.tags=_full --add_deps_recursive.openimages-preprocessed.tags=_full --add_deps_recursive.openorca-original.tags=_full --add_deps_recursive.openorca-preprocessed.tags=_full --add_deps_recursive.get-mlperf-inference-results-dir.tags=_version.r4_1 --add_deps_recursive.get-mlperf-inference-submission-dir.tags=_version.r4_1 --add_deps_recursive.mlperf-inference-nvidia-scratch-space.tags=_version.r4_1 --v=False --print_env=False --print_deps=False --dump_version_info=True  
--env.OUTPUT_BASE_DIR=/cm-mount/home/anandhu/CM/repos  --env.CM_MLPERF_INFERENCE_SUBMISSION_DIR=/home/cmuser/CM/repos/local/cache/09af3aede75c4983/mlperf-inference-submission  --docker_run_deps 


INFO:root:* cm run script "run docker container"

Checking existing Docker container:

  docker ps --filter "ancestor=local/cm-script-app-mlperf-inference-generic--amd--llama2-70b-99--pytorch--cuda--test--r4.1-default--offline:rocm/pytorch-rocm6.1.2ubuntu20.04py3.9pytorchstaging-latest"  2> /dev/null


Checking Docker images:

  docker images -q local/cm-script-app-mlperf-inference-generic--amd--llama2-70b-99--pytorch--cuda--test--r4.1-default--offline:rocm/pytorch-rocm6.1.2ubuntu20.04py3.9pytorchstaging-latest 2> /dev/null

INFO:root:  * cm run script "build docker image"
/home/anandhu/CM/repos/anandhu-eng@cm4mlops/script/build-docker-image/customize.py:57: SyntaxWarning: invalid escape sequence '\$'
  dockerfile_path = "\${CM_DOCKERFILE_WITH_PATH}"
================================================
CM generated the following Docker build command:

docker build  --build-arg GID=\" $(id -g $USER) \" --build-arg UID=\" $(id -u $USER) \" -f "/home/anandhu/CM/repos/anandhu-eng@cm4mlops/script/app-mlperf-inference/dockerfiles/pytorch:rocm6.1.2_ubuntu20.04_py3.9_pytorch_staging.Dockerfile" -t "local/cm-script-app-mlperf-inference-generic--amd--llama2-70b-99--pytorch--cuda--test--r4.1-default--offline:rocm/pytorch-rocm6.1.2ubuntu20.04py3.9pytorchstaging-latest" .

INFO:root:         ! cd /home/anandhu/CM/repos/anandhu-eng@cm4mlops/script/app-mlperf-inference/dockerfiles
INFO:root:         ! call /home/anandhu/CM/repos/anandhu-eng@cm4mlops/script/build-docker-image/run.sh from tmp-run.sh
[+] Building 0.0s (0/0)                                          docker:default
ERROR: invalid tag "local/cm-script-app-mlperf-inference-generic--amd--llama2-70b-99--pytorch--cuda--test--r4.1-default--offline:rocm/pytorch-rocm6.1.2ubuntu20.04py3.9pytorchstaging-latest": invalid reference format

CM error: Portable CM script failed (name = build-docker-image, return code = 256)


^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Note that it is often a portability issue of a third-party tool or a native script
wrapped and unified by this CM script (automation recipe). Please re-run
this script with --repro flag and report this issue with the original
command line, cm-repro directory and full log here:

https://github.com/mlcommons/cm4mlops/issues

The CM concept is to collaboratively fix such issues inside portable CM scripts
to make existing tools and native scripts more portable, interoperable
and deterministic. Thank you!
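The `invalid reference format` error above is consistent with the tag part of the image reference (everything after the last `:`) containing a `/`, which Docker does not allow in a tag. The sketch below checks a tag against Docker's documented tag rule (letters, digits, `_`, `.`, `-`; must not start with `.` or `-`; at most 128 characters) and shows one possible way to sanitize it. The function names and the `latest` fallback are assumptions for illustration, not the fix that was applied in cm4mlops.

```python
import re

# Docker tag rule: first char is a letter, digit or underscore;
# the rest may also include '.' and '-'; 128 characters at most.
TAG_PATTERN = re.compile(r"[A-Za-z0-9_][A-Za-z0-9_.-]{0,127}")


def is_valid_tag(tag):
    """Return True if the whole string is a valid Docker tag."""
    return TAG_PATTERN.fullmatch(tag) is not None


def sanitize_tag(tag):
    """Replace disallowed characters with '-' and trim to a valid tag.

    Falls back to 'latest' (an assumption) if nothing usable remains.
    """
    cleaned = re.sub(r"[^A-Za-z0-9_.-]", "-", tag)
    cleaned = cleaned.lstrip(".-")[:128]
    return cleaned or "latest"


bad_tag = "rocm/pytorch-rocm6.1.2ubuntu20.04py3.9pytorchstaging-latest"
print(is_valid_tag(bad_tag))                # False: '/' is not allowed
print(is_valid_tag(sanitize_tag(bad_tag)))  # True
```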

@arjunsuresh
Contributor

@anandhu-eng Is it happening now?

@anandhu-eng
Contributor Author

Hi @arjunsuresh, no, it's not happening now. But I found that we have not handled the case where Llama 2 70B is downloaded to the host.

@anandhu-eng
Contributor Author

Hi @arjunsuresh, I have tried to implement download-to-host for Llama 2 70B in this PR, but it fails at the quantization stage:

cd /home/anandhu/CM/repos/local/cache/72d6fbd1154e466e/repo/closed/AMD/code/llama2-70b-99.9/tools/quark-0.1.0+a9827f5-mlperf/examples/torch/language_modeling/
/home/anandhu/CM/repos/local/cache/c5f60ba1e1b144af/bertthreading/bin/python3 quantize_quark.py --model_dir /home/anandhu/CM/repos/local/cache/4116af2beb99410c/repo     --output_dir /home/anandhu/CM/repos/local/cache/6425984c51d84248     --quant_scheme w_fp8_a_fp8_o_fp8     --dataset /home/anandhu/CM/repos/local/cache/9a2be95304424ce0/open_orca/open_orca_gpt4_tokenized_llama.calibration_1000.pkl.gz     --num_calib_data 1000     --model_export vllm_adopted_safetensors     --no_weight_matrix_merge 

[QUARK-INFO]: C++ kernel compilation check start.

[QUARK-INFO]: C++ kernel build directory /home/anandhu/.cache/torch_extensions/py310_cu121/kernel_ext
/home/anandhu/CM/repos/local/cache/c5f60ba1e1b144af/bertthreading/lib/python3.10/site-packages/torch/utils/cpp_extension.py:1965: UserWarning: TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation. 
If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST'].
  warnings.warn(

[QUARK-INFO]: C++ kernel compilation is already complete. Ending the C++ kernel compilation check. Total time: 0.0688 seconds
/home/anandhu/CM/repos/local/cache/c5f60ba1e1b144af/bertthreading/lib/python3.10/site-packages/quark/torch/kernel/hw_emulation/hw_emulation_interface.py:75: FutureWarning: `torch.library.impl_abstract` was renamed to `torch.library.register_fake`. Please use that instead; we will remove `torch.library.impl_abstract` in a future version of PyTorch.
  @impl_abstract("quant_scope::quant_dequant_fp8_e4m3_with_scale")
/home/anandhu/CM/repos/local/cache/c5f60ba1e1b144af/bertthreading/lib/python3.10/site-packages/quark/torch/kernel/hw_emulation/hw_emulation_interface.py:105: FutureWarning: `torch.library.impl_abstract` was renamed to `torch.library.register_fake`. Please use that instead; we will remove `torch.library.impl_abstract` in a future version of PyTorch.
  @impl_abstract("quant_scope::fake_quantize_per_tensor_affine")
/home/anandhu/CM/repos/local/cache/c5f60ba1e1b144af/bertthreading/lib/python3.10/site-packages/quark/torch/kernel/hw_emulation/hw_emulation_interface.py:133: FutureWarning: `torch.library.impl_abstract` was renamed to `torch.library.register_fake`. Please use that instead; we will remove `torch.library.impl_abstract` in a future version of PyTorch.
  @impl_abstract("quant_scope::fake_quantize_per_channel_affine")

Loading model ...
/home/anandhu/CM/repos/local/cache/c5f60ba1e1b144af/bertthreading/lib/python3.10/site-packages/compressed_tensors/quantization/quant_args.py:224: UserWarning: No observer is used for dynamic quantization, setting to None
  warnings.warn(
Initializing tokenizer from /home/anandhu/CM/repos/local/cache/4116af2beb99410c/repo

Loading dataset ...
Loaded 1000 samples from /home/anandhu/CM/repos/local/cache/9a2be95304424ce0/open_orca/open_orca_gpt4_tokenized_llama.calibration_1000.pkl.gz

[QUARK-INFO]: Configuration checking start.

[QUARK-INFO]: Configuration checking end. The configuration is effective. This is static quantization.

[QUARK-INFO]: In-place OPs replacement start.
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 343/343 [00:01<00:00, 173.44it/s]

[QUARK-INFO]: In-place OPs replacement end.

[QUARK-INFO]: Calibration start.
  0%|                                                                                                                                        | 0/1000 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/home/anandhu/CM/repos/local/cache/72d6fbd1154e466e/repo/closed/AMD/code/llama2-70b-99.9/tools/quark-0.1.0+a9827f5-mlperf/examples/torch/language_modeling/quantize_quark.py", line 339, in <module>
    main(args)
  File "/home/anandhu/CM/repos/local/cache/72d6fbd1154e466e/repo/closed/AMD/code/llama2-70b-99.9/tools/quark-0.1.0+a9827f5-mlperf/examples/torch/language_modeling/quantize_quark.py", line 218, in main
    model = quantizer.quantize_model(model, calib_dataloader)
  File "/home/anandhu/CM/repos/local/cache/c5f60ba1e1b144af/bertthreading/lib/python3.10/site-packages/quark/torch/quantization/api.py", line 158, in quantize_model
    self.model(**data)
  File "/home/anandhu/CM/repos/local/cache/c5f60ba1e1b144af/bertthreading/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/anandhu/CM/repos/local/cache/c5f60ba1e1b144af/bertthreading/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/anandhu/CM/repos/local/cache/c5f60ba1e1b144af/bertthreading/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 1189, in forward
    outputs = self.model(
  File "/home/anandhu/CM/repos/local/cache/c5f60ba1e1b144af/bertthreading/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/anandhu/CM/repos/local/cache/c5f60ba1e1b144af/bertthreading/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/anandhu/CM/repos/local/cache/c5f60ba1e1b144af/bertthreading/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 1000, in forward
    layer_outputs = decoder_layer(
  File "/home/anandhu/CM/repos/local/cache/c5f60ba1e1b144af/bertthreading/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/anandhu/CM/repos/local/cache/c5f60ba1e1b144af/bertthreading/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/anandhu/CM/repos/local/cache/c5f60ba1e1b144af/bertthreading/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 729, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
  File "/home/anandhu/CM/repos/local/cache/c5f60ba1e1b144af/bertthreading/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/anandhu/CM/repos/local/cache/c5f60ba1e1b144af/bertthreading/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/anandhu/CM/repos/local/cache/c5f60ba1e1b144af/bertthreading/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 612, in forward
    query_states = self.q_proj(hidden_states)
  File "/home/anandhu/CM/repos/local/cache/c5f60ba1e1b144af/bertthreading/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/anandhu/CM/repos/local/cache/c5f60ba1e1b144af/bertthreading/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/anandhu/CM/repos/local/cache/c5f60ba1e1b144af/bertthreading/lib/python3.10/site-packages/quark/torch/quantization/nn/modules/quantize_linear.py", line 31, in forward
    quant_weight = self._weight_quantizer(self.weight) if self._weight_quantizer else self.weight
  File "/home/anandhu/CM/repos/local/cache/c5f60ba1e1b144af/bertthreading/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/anandhu/CM/repos/local/cache/c5f60ba1e1b144af/bertthreading/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/anandhu/CM/repos/local/cache/c5f60ba1e1b144af/bertthreading/lib/python3.10/site-packages/quark/torch/quantization/tensor_quantize.py", line 220, in forward
    self.observer(X.detach())
  File "/home/anandhu/CM/repos/local/cache/c5f60ba1e1b144af/bertthreading/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/anandhu/CM/repos/local/cache/c5f60ba1e1b144af/bertthreading/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/anandhu/CM/repos/local/cache/c5f60ba1e1b144af/bertthreading/lib/python3.10/site-packages/quark/torch/quantization/observer/observer.py", line 157, in forward
    min_val_cur, max_val_cur = torch.aminmax(x)
RuntimeError: "aminmax_all_cuda" not implemented for 'Float8_e4m3fn'

CM error: Portable CM script failed (name = get-ml-model-llama2, return code = 256)


^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Note that it is often a portability issue of a third-party tool or a native script
wrapped and unified by this CM script (automation recipe). Please re-run
this script with --repro flag and report this issue with the original
command line, cm-repro directory and full log here:

https://github.com/mlcommons/cm4mlops/issues

The CM concept is to collaboratively fix such issues inside portable CM scripts
to make existing tools and native scripts more portable, interoperable
and deterministic. Thank you!

Maybe this is due to the absence of an AMD GPU?

@arjunsuresh
Contributor

@anandhu-eng, which torch version is this? We don't need an AMD GPU for quantization if we use torch_cuda. I have fixed this dependency and added a GitHub Action for it now.
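To answer the torch version question, a small script can report the installed build. This is a sketch under assumptions: the function name `describe_torch_env` is made up for illustration, and it only reads attributes that recent PyTorch releases expose (`torch.__version__`, `torch.version.cuda`, `torch.version.hip`), using `getattr` so it does not fail on builds that lack them.

```python
import importlib.util


def describe_torch_env():
    """Return a dict describing the installed PyTorch build, if any."""
    if importlib.util.find_spec("torch") is None:
        return {"installed": False}

    import torch

    return {
        "installed": True,
        "version": torch.__version__,
        # CUDA toolkit version the wheel was built against (None on CPU/ROCm builds).
        "cuda_build": getattr(torch.version, "cuda", None),
        # ROCm/HIP version the wheel was built against (None on CUDA/CPU builds).
        "hip_build": getattr(torch.version, "hip", None),
        # Whether this release defines the FP8 dtype named in the error message.
        "has_float8_e4m3fn": hasattr(torch, "float8_e4m3fn"),
    }


if __name__ == "__main__":
    for key, value in describe_torch_env().items():
        print(f"{key}: {value}")
```

Running it with the same Python interpreter that launched `quantize_quark.py` shows which wheel is actually in use there.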

@arjunsuresh
Contributor

Closing this now as the quantization part is now working fine.
