[Bug] rocm57 flow nightly crashes #2144

Closed
Sing-Li opened this issue Apr 16, 2024 · 3 comments
Labels
bug Confirmed bugs

Comments

@Sing-Li
Contributor

Sing-Li commented Apr 16, 2024

🐛 Bug

When using the ROCm 5.7 nightly to run serve or chat, the JIT crashes on the first run, after the weights are downloaded and before the md5-named lib is output.

To Reproduce

Steps to reproduce the behavior:

  1. Install the latest rocm57 nightly.
  2. Clear out any cache, then run serve or chat on any model that is known to be supported and working.
  3. The flow crashes with the following output:
[04:10:24] /workspace/tvm/src/target/parsers/aprofile.cc:97: Warning: Cannot parse target features. LLVM was not compiled with support for Arm(R)-based targets.
[2024-04-15 04:10:26] INFO auto_device.py:85: Not found device: cuda:0
[2024-04-15 04:10:28] INFO auto_device.py:76: Found device: rocm:0
[2024-04-15 04:10:28] INFO auto_device.py:76: Found device: rocm:1
[2024-04-15 04:10:28] INFO auto_device.py:76: Found device: rocm:2
[2024-04-15 04:10:28] INFO auto_device.py:76: Found device: rocm:3
[2024-04-15 04:10:29] INFO auto_device.py:85: Not found device: metal:0
[2024-04-15 04:10:30] INFO auto_device.py:85: Not found device: vulkan:0
[2024-04-15 04:10:31] INFO auto_device.py:85: Not found device: opencl:0
[2024-04-15 04:10:31] INFO auto_device.py:33: Using device: rocm:0
[2024-04-15 04:10:31] INFO chat_module.py:362: Downloading model from HuggingFace: HF://mlc-ai/gemma-2b-it-q4f16_1-MLC
[2024-04-15 04:10:31] INFO download.py:131: Weights already downloaded: /root/.cache/mlc_llm/model_weights/mlc-ai/gemma-2b-it-q4f16_1-MLC
[2024-04-15 04:10:31] INFO chat_module.py:781: Model lib not found. Now compiling model lib on device...
[2024-04-15 04:10:32] INFO jit.py:35: MLC_JIT_POLICY = ON. Can be one of: ON, OFF, REDO, READONLY
[2024-04-15 04:10:32] INFO jit.py:94: Compiling using commands below:
[2024-04-15 04:10:32] INFO jit.py:95: /usr/bin/python3 -m mlc_llm compile /root/.cache/mlc_llm/model_weights/mlc-ai/gemma-2b-it-q4f16_1-MLC --opt 'flashinfer=1;cublas_gemm=1;faster_transformer=1;cudagraph=0;cutlass=1;ipc_allreduce_strategy=NONE' --overrides 'context_window_size=8192;prefill_chunk_size=1024;tensor_parallel_shards=1' --device rocm:0 --output /tmp/tmpzldeenzs/lib.so
[04:10:32] /workspace/tvm/src/target/parsers/aprofile.cc:97: Warning: Cannot parse target features. LLVM was not compiled with support for Arm(R)-based targets.
[04:10:32] /workspace/tvm/src/target/parsers/aprofile.cc:97: Warning: Cannot parse target features. LLVM was not compiled with support for Arm(R)-based targets.
[04:10:32] /workspace/tvm/src/target/parsers/aprofile.cc:97: Warning: Cannot parse target features. LLVM was not compiled with support for Arm(R)-based targets.
[04:10:32] /workspace/tvm/src/target/parsers/aprofile.cc:97: Warning: Cannot parse target features. LLVM was not compiled with support for Arm(R)-based targets.
[04:10:32] /workspace/tvm/src/target/parsers/aprofile.cc:97: Warning: Cannot parse target features. LLVM was not compiled with support for Arm(R)-based targets.
[04:10:32] /workspace/tvm/src/target/parsers/aprofile.cc:97: Warning: Cannot parse target features. LLVM was not compiled with support for Arm(R)-based targets.
[04:10:32] /workspace/tvm/src/target/parsers/aprofile.cc:97: Warning: Cannot parse target features. LLVM was not compiled with support for Arm(R)-based targets.
[04:10:32] /workspace/tvm/src/target/parsers/aprofile.cc:97: Warning: Cannot parse target features. LLVM was not compiled with support for Arm(R)-based targets.
[04:10:32] /workspace/tvm/src/target/parsers/aprofile.cc:97: Warning: Cannot parse target features. LLVM was not compiled with support for Arm(R)-based targets.
[2024-04-15 04:10:33] INFO auto_config.py:69: Found model configuration: /root/.cache/mlc_llm/model_weights/mlc-ai/gemma-2b-it-q4f16_1-MLC/mlc-chat-config.json
[2024-04-15 04:10:33] INFO auto_target.py:84: Detecting target device: rocm:0
[2024-04-15 04:10:33] INFO auto_target.py:86: Found target: {"thread_warp_size": 64, "mtriple": "amdgcn-amd-amdhsa-hcc", "max_threads_per_block": 1024, "max_num_threads": 256, "kind": "rocm", "max_shared_memory_per_block": 65536, "tag": "", "mcpu": "gfx908", "keys": ["rocm", "gpu"]}
[2024-04-15 04:10:33] INFO auto_target.py:103: Found host LLVM triple: x86_64-unknown-linux-gnu
[2024-04-15 04:10:33] INFO auto_target.py:104: Found host LLVM CPU: skylake-avx512
[2024-04-15 04:10:33] INFO auto_config.py:153: Found model type: gemma. Use `--model-type` to override.
Compiling with arguments:
  --config          GemmaConfig(hidden_size=2048, hidden_act='gelu', intermediate_size=16384, attention_bias=False, num_attention_heads=8, num_key_value_heads=1, head_dim=256, num_hidden_layers=18, rms_norm_eps=1e-06, vocab_size=256000, position_embedding_base=10000.0, context_window_size=8192, prefill_chunk_size=1024, tensor_parallel_shards=1, max_batch_size=80, kwargs={})
  --quantization    GroupQuantize(name='q4f16_1', kind='group-quant', group_size=32, quantize_dtype='int4', storage_dtype='uint32', model_dtype='float16', linear_weight_layout='NK', quantize_embedding=True, quantize_final_fc=True, num_elem_per_storage=8, num_storage_per_group=4, max_int_value=7)
  --model-type      gemma
  --target          {"thread_warp_size": 64, "host": {"mtriple": "x86_64-unknown-linux-gnu", "tag": "", "kind": "llvm", "mcpu": "skylake-avx512", "keys": ["cpu"]}, "mtriple": "amdgcn-amd-amdhsa-hcc", "max_threads_per_block": 1024, "max_num_threads": 256, "kind": "rocm", "max_shared_memory_per_block": 65536, "tag": "", "mcpu": "gfx908", "keys": ["rocm", "gpu"]}
  --opt             flashinfer=0;cublas_gemm=0;faster_transformer=0;cudagraph=0;cutlass=0;ipc_allreduce_strategy=NONE
  --system-lib-prefix ""
  --output          /tmp/tmpzldeenzs/lib.so
  --overrides       context_window_size=8192;sliding_window_size=None;prefill_chunk_size=1024;attention_sink_size=None;max_batch_size=None;tensor_parallel_shards=1
[2024-04-15 04:10:33] INFO config.py:106: Overriding context_window_size from 8192 to 8192
[2024-04-15 04:10:33] INFO config.py:106: Overriding prefill_chunk_size from 1024 to 1024
[2024-04-15 04:10:33] INFO config.py:106: Overriding tensor_parallel_shards from 1 to 1
[2024-04-15 04:10:33] INFO compile.py:137: Creating model from: GemmaConfig(hidden_size=2048, hidden_act='gelu', intermediate_size=16384, attention_bias=False, num_attention_heads=8, num_key_value_heads=1, head_dim=256, num_hidden_layers=18, rms_norm_eps=1e-06, vocab_size=256000, position_embedding_base=10000.0, context_window_size=8192, prefill_chunk_size=1024, tensor_parallel_shards=1, max_batch_size=80, kwargs={})
[2024-04-15 04:10:33] INFO compile.py:156: Exporting the model to TVM Unity compiler
[2024-04-15 04:10:34] INFO compile.py:162: Running optimizations using TVM Unity
[2024-04-15 04:10:34] INFO compile.py:176: Registering metadata: {'model_type': 'gemma', 'quantization': 'q4f16_1', 'context_window_size': 8192, 'sliding_window_size': -1, 'attention_sink_size': -1, 'prefill_chunk_size': 1024, 'tensor_parallel_shards': 1, 'kv_cache_bytes': 0}
[2024-04-15 04:10:35] INFO pipeline.py:50: Running TVM Relax graph-level optimizations
[2024-04-15 04:10:45] INFO pipeline.py:50: Lowering to TVM TIR kernels
[2024-04-15 04:10:46] INFO pipeline.py:50: Running TVM TIR-level optimizations
[2024-04-15 04:10:50] INFO pipeline.py:50: Running TVM Dlight low-level optimizations
[2024-04-15 04:10:52] INFO pipeline.py:50: Lowering to VM bytecode
[2024-04-15 04:10:53] INFO estimate_memory_usage.py:57: [Memory usage] Function `alloc_embedding_tensor`: 4.00 MB
[2024-04-15 04:10:53] INFO estimate_memory_usage.py:57: [Memory usage] Function `batch_decode`: 10.31 MB
[2024-04-15 04:10:53] INFO estimate_memory_usage.py:57: [Memory usage] Function `batch_prefill`: 132.00 MB
[2024-04-15 04:10:53] INFO estimate_memory_usage.py:57: [Memory usage] Function `batch_verify`: 132.00 MB
[2024-04-15 04:10:53] INFO estimate_memory_usage.py:57: [Memory usage] Function `create_tir_paged_kv_cache`: 0.00 MB
[2024-04-15 04:10:53] INFO estimate_memory_usage.py:57: [Memory usage] Function `decode`: 0.13 MB
[2024-04-15 04:10:53] INFO estimate_memory_usage.py:57: [Memory usage] Function `embed`: 4.00 MB
[2024-04-15 04:10:53] INFO estimate_memory_usage.py:57: [Memory usage] Function `prefill`: 132.00 MB
[2024-04-15 04:10:53] INFO estimate_memory_usage.py:57: [Memory usage] Function `softmax_with_temperature`: 0.00 MB
[2024-04-15 04:10:54] INFO pipeline.py:50: Compiling external modules
[2024-04-15 04:10:54] INFO pipeline.py:50: Compilation complete! Exporting to disk
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.10/dist-packages/mlc_llm/__main__.py", line 52, in <module>
    main()
  File "/usr/local/lib/python3.10/dist-packages/mlc_llm/__main__.py", line 25, in main
    cli.main(sys.argv[2:])
  File "/usr/local/lib/python3.10/dist-packages/mlc_llm/cli/compile.py", line 128, in main
    compile(
  File "/usr/local/lib/python3.10/dist-packages/mlc_llm/interface/compile.py", line 234, in compile
    _compile(args, model_config)
  File "/usr/local/lib/python3.10/dist-packages/mlc_llm/interface/compile.py", line 179, in _compile
    args.build_func(
  File "/usr/local/lib/python3.10/dist-packages/mlc_llm/support/auto_target.py", line 266, in build
    relax.build(
  File "/usr/local/lib/python3.10/dist-packages/tvm/relax/vm_build.py", line 341, in build
    return _vmlink(
  File "/usr/local/lib/python3.10/dist-packages/tvm/relax/vm_build.py", line 247, in _vmlink
    lib = tvm.build(
  File "/usr/local/lib/python3.10/dist-packages/tvm/driver/build_module.py", line 297, in build
    rt_mod_host = _driver_ffi.tir_to_runtime(annotated_mods, target_host)
  File "tvm/_ffi/_cython/./packed_func.pxi", line 332, in tvm._ffi._cy3.core.PackedFuncBase.__call__
  File "tvm/_ffi/_cython/./packed_func.pxi", line 263, in tvm._ffi._cy3.core.FuncCall
  File "tvm/_ffi/_cython/./packed_func.pxi", line 252, in tvm._ffi._cy3.core.FuncCall3
  File "tvm/_ffi/_cython/./base.pxi", line 182, in tvm._ffi._cy3.core.CHECK_CALL
  File "/usr/local/lib/python3.10/dist-packages/tvm/_ffi/base.py", line 481, in raise_last_ffi_error
    raise py_err
  File "tvm/_ffi/_cython/./packed_func.pxi", line 56, in tvm._ffi._cy3.core.tvm_callback
  File "/usr/local/lib/python3.10/dist-packages/tvm/contrib/rocm.py", line 120, in callback_rocm_link
    rocm_link(tmp_obj, tmp_cobj)
  File "/usr/local/lib/python3.10/dist-packages/tvm/contrib/rocm.py", line 85, in rocm_link
    lld if lld is not None else find_lld()[0],
  File "/usr/local/lib/python3.10/dist-packages/tvm/contrib/rocm.py", line 59, in find_lld
    raise RuntimeError("cannot find ld.lld, candidates are: " + str(lld_list))
RuntimeError: cannot find ld.lld, candidates are: ['ld.lld-17.0', 'ld.lld-17', 'ld.lld', '/opt/rocm/llvm/bin']
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/mlc_llm/chat_module.py", line 772, in __init__
    self.model_lib_path = _get_lib_module_path(
  File "/usr/local/lib/python3.10/dist-packages/mlc_llm/chat_module.py", line 591, in _get_lib_module_path
    raise FileNotFoundError(err_msg)
FileNotFoundError: Cannot find the model library that corresponds to `None`.
`None` is either provided in the `chat_config` you passed in, or specified in /root/.cache/mlc_llm/model_weights/mlc-ai/gemma-2b-it-q4f16_1-MLC/mlc-chat-config.json.
We searched over the following possible paths: 
- None-rocm.so
- dist/prebuilt/lib/None-rocm.so
- dist/HF://mlc-ai/gemma-2b-it-q4f16_1-MLC/None-rocm.so
- /root/.cache/mlc_llm/model_weights/mlc-ai/gemma-2b-it-q4f16_1-MLC/None-rocm.so
- /root/.cache/mlc_llm/model_weights/mlc-ai/None-rocm.so
If you would like to directly specify the model library path, you may consider passing in the `ChatModule.model_lib_path` parameter.
Please checkout https://github.com/mlc-ai/notebooks/blob/main/mlc-llm/tutorial_chat_module_getting_started.ipynb for an example on how to load a model.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/bin/mlc_llm", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/dist-packages/mlc_llm/__main__.py", line 37, in main
    cli.main(sys.argv[2:])
  File "/usr/local/lib/python3.10/dist-packages/mlc_llm/cli/chat.py", line 41, in main
    chat(
  File "/usr/local/lib/python3.10/dist-packages/mlc_llm/interface/chat.py", line 133, in chat
    cm = ChatModule(model, device, chat_config=config, model_lib_path=model_lib_path)
  File "/usr/local/lib/python3.10/dist-packages/mlc_llm/chat_module.py", line 785, in __init__
    jit.jit(
  File "/usr/local/lib/python3.10/dist-packages/mlc_llm/interface/jit.py", line 123, in jit
    _run_jit(
  File "/usr/local/lib/python3.10/dist-packages/mlc_llm/interface/jit.py", line 96, in _run_jit
    subprocess.run(cmd, check=True)
  File "/usr/lib/python3.10/subprocess.py", line 526, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['/usr/bin/python3', '-m', 'mlc_llm', 'compile', '/root/.cache/mlc_llm/
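
Reading the log: it contains three tracebacks. The first comes from the compile subprocess and ends in the actual cause, RuntimeError: cannot find ld.lld. The second (FileNotFoundError) and third (CalledProcessError) are raised in the parent process, which only sees that the child exited with a failure. The following is a minimal sketch of that parent/child relationship, not MLC-LLM code. The helper name run_child and the use of capture_output are assumptions made for illustration; the traceback above shows only subprocess.run(cmd, check=True).

```python
import subprocess
import sys


def run_child(code):
    """Run a Python snippet in a child process and report how it ended.

    Returns (exit_code, last_stderr_line). Hypothetical helper for illustration.
    """
    cmd = [sys.executable, "-c", code]
    try:
        # check=True turns a non-zero exit status into CalledProcessError,
        # which is the exception at the bottom of the log above.
        subprocess.run(cmd, check=True, capture_output=True, text=True)
    except subprocess.CalledProcessError as err:
        # The parent only knows the exit status; the root cause is whatever
        # the child wrote to stderr before exiting.
        last_line = err.stderr.strip().splitlines()[-1]
        return err.returncode, last_line
    return 0, ""


if __name__ == "__main__":
    # Stand-in for the failing compile step; the message is shortened.
    print(run_child("raise RuntimeError('cannot find ld.lld')"))
```

When triaging a log like this one, the first traceback is therefore the one to read; the later ones are consequences of it.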

Expected behavior

A simple invocation of the flow should work, as it does with the NVIDIA CUDA 12.2 nightly build.

Environment

  • Platform (e.g. WebGPU/Vulkan/IOS/Android/CUDA): ROCm 5.7
  • Operating system (e.g. Ubuntu/Windows/MacOS/...): Linux 22.04 LTS
  • Device (e.g. iPhone 12 Pro, PC+RTX 3090, ...) mi-25
  • How you installed MLC-LLM (conda, source): nightly rocm57
  • How you installed TVM-Unity (pip, source): nightly rocm57
  • Python version (e.g. 3.10): 3.10
  • GPU driver version (if applicable):
  • CUDA/cuDNN version (if applicable):
  • TVM Unity Hash Tag (python -c "import tvm; print('\n'.join(f'{k}: {v}' for k, v in tvm.support.libinfo().items()))", applicable if you compile models):
  • Any other relevant information:
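
The TVM Unity Hash Tag field above is empty. A small sketch of how that information could be collected, based on the one-liner in the template, is shown here. The function name tvm_libinfo_lines and the fallback for a missing TVM install are assumptions added for illustration.

```python
def tvm_libinfo_lines():
    """Return TVM build info as 'key: value' lines, or [] if TVM is not importable."""
    try:
        import tvm  # shipped with the nightly wheel; may be absent elsewhere
    except ImportError:
        return []
    # Same call as the one-liner in the issue template.
    return [f"{k}: {v}" for k, v in tvm.support.libinfo().items()]


if __name__ == "__main__":
    lines = tvm_libinfo_lines()
    print("\n".join(lines) if lines else "tvm is not importable in this environment")
```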

Additional context

Likely caused by the issue reported in mlc-ai/relax#316.

@Sing-Li Sing-Li added the bug Confirmed bugs label Apr 16, 2024
@LeshengJin
Contributor

LeshengJin commented Apr 16, 2024

Thanks @Sing-Li for the report! Would you please install lld via conda install conda-forge::lld? Let me know if it works.
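
To check whether the suggestion took effect, one option is to look for the same linker names that the RuntimeError in the log lists. The sketch below is a standalone diagnostic and is not TVM's own lookup code. The candidate names are copied from the error message; the function name locate_lld and the extra check for ld.lld inside /opt/rocm/llvm/bin are assumptions.

```python
import os
import shutil

# Candidate names copied from the RuntimeError in the log.
LLD_NAMES = ["ld.lld-17.0", "ld.lld-17", "ld.lld"]
# The error also lists this directory; looking for ld.lld inside it is an assumption.
ROCM_LLVM_BIN = "/opt/rocm/llvm/bin"


def locate_lld(names=LLD_NAMES, extra_dirs=(ROCM_LLVM_BIN,), path=None):
    """Return every ld.lld executable found, in search order."""
    found = []
    for name in names:
        # path=None makes shutil.which search the current PATH.
        hit = shutil.which(name, path=path)
        if hit:
            found.append(hit)
    for directory in extra_dirs:
        candidate = os.path.join(directory, "ld.lld")
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            found.append(candidate)
    return found


if __name__ == "__main__":
    hits = locate_lld()
    if hits:
        print("ld.lld found:", hits)
    else:
        print("ld.lld not found on PATH or in", ROCM_LLVM_BIN)
```

If the list is still empty after the conda install, the environment's bin directory may not be on PATH for the Python process that runs the compile step.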

@MasterJH5574
Member

@Sing-Li could you please try @LeshengJin's suggestion? I don't have the issue when testing on our AMD GPUs.

@tqchen
Contributor

tqchen commented Apr 21, 2024

This should be fixed by apache/tvm#16907.

@tqchen tqchen closed this as completed Apr 24, 2024