GPTQ-for-Llama broken on AMD #3754

Closed
lufixSch opened this issue Aug 30, 2023 · 7 comments
Labels: bug (Something isn't working)

Comments

@lufixSch

Describe the bug

The update to requirements.txt and the new gptq_for_llama import in the GPTQ_loader module seem to break the AMD installation.

When running the installation as described in the README.md, the GPTQ-for-LLaMa kernel test fails:

$ CUDA_VISIBLE_DEVICES=0 python test_kernel.py
Traceback (most recent call last):
  File "/media/Linux DATA/AI/LLM/WebUI/repositories/GPTQ-for-LLaMa/test_kernel.py", line 4, in <module>
    import quant_cuda
ImportError: libcudart.so.11.0: cannot open shared object file: No such file or directory

The reason seems to be line 53 of requirements.txt:

https://github.com/jllllll/GPTQ-for-LLaMa-CUDA/releases/download/0.1.0/gptq_for_llama-0.1.0+cu117-cp310-cp310-linux_x86_64.whl; platform_system == "Linux" and platform_machine == "x86_64"
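
The wheel above is built against CUDA 11.7, so its quant_cuda extension links against libcudart, which does not exist in a ROCm environment. For reference, a quick way to confirm which runtime the installed PyTorch build actually targets (a diagnostic sketch, not part of the repository):

# Diagnostic sketch: on a ROCm install, torch.version.hip is set and
# torch.version.cuda is None, so a CUDA-only quant_cuda wheel cannot work there.
import torch

print(torch.__version__)            # e.g. "2.0.1+rocm5.4.2" on a ROCm build
print("HIP:", torch.version.hip)    # ROCm/HIP version string, or None on a CUDA build
print("CUDA:", torch.version.cuda)  # CUDA version string, or None on a ROCm build
print("GPU visible:", torch.cuda.is_available())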

When removing the wheel line from requirements.txt, the GPTQ-for-LLaMa test passes, but loading the model fails because of the reworked imports in GPTQ_loader.
Reverting to the old imports makes it work again (a fallback that keeps both setups working is sketched after the snippet below):

sys.path.insert(0, str(Path("repositories/GPTQ-for-LLaMa")))

try:
    import llama_inference_offload
except ImportError:
    logger.error("Failed to load GPTQ-for-LLaMa")
    logger.error(
        "See https://github.com/oobabooga/text-generation-webui/blob/main/docs/GPTQ-models-(4-bit-mode).md"
    )
    sys.exit(-1)

try:
    from modelutils import find_layers
except ImportError:
    from utils import find_layers

try:
    from quant import make_quant

    is_triton = False
except ImportError:
    import quant

    is_triton = True
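
For reference, a minimal sketch of how both setups could coexist: try the pip-installed package first and fall back to the local repositories/GPTQ-for-LLaMa checkout that the ROCm install relies on. The gptq_for_llama.* module paths are an assumption on my side and would need to match whatever the wheel actually exposes; the triton branch is omitted for brevity.

# Sketch (assumption: the wheel exposes the same modules under a
# gptq_for_llama package; the local checkout exposes them at top level,
# as in the old code above).
import sys
from pathlib import Path

try:
    # pip-installed wheel (NVIDIA path)
    from gptq_for_llama import llama_inference_offload
    from gptq_for_llama.modelutils import find_layers
    from gptq_for_llama.quant import make_quant
except ImportError:
    # local checkout, e.g. the ROCm fork cloned into repositories/
    sys.path.insert(0, str(Path("repositories/GPTQ-for-LLaMa")))
    import llama_inference_offload
    from modelutils import find_layers
    from quant import make_quant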

Is there an existing issue for this?

  • I have searched the existing issues

Reproduction

Install the text-generation-webui on an AMD device as described in the README.md, with the ROCm installation from https://rentry.org/eq3hg.

Screenshot

No response

Logs

Traceback (most recent call last):
  File "/media/Linux DATA/AI/LLM/WebUI/repositories/GPTQ-for-LLaMa/test_kernel.py", line 4, in <module>
    import quant_cuda
ImportError: libcudart.so.11.0: cannot open shared object file: No such file or directory

System Info

Operating System: Manjaro Linux 
KDE Plasma Version: 5.27.7
KDE Frameworks Version: 5.109.0
Qt Version: 5.15.10
Kernel Version: 6.1.49-1-MANJARO (64-bit)
Graphics Platform: X11
Processors: 20 × 13th Gen Intel® Core™ i5-13500
Memory: 31.1 GiB of RAM
Graphics Processor: AMD Radeon RX 6750 XT
Manufacturer: Micro-Star International Co., Ltd.
Product Name: MS-7D98
System Version: 1.0
lufixSch added the bug label on Aug 30, 2023
@lufixSch (Author)

Just noticed that the output of the model is now total gibberish. I am not sure if this is related or not.

[Screenshot: model output is gibberish]

@oobabooga (Owner)

The rentry instructions are severely outdated and a GPTQ-for-LLaMa wheel is currently only included for compatibility with older NVIDIA GPUs. If AutoGPTQ works for AMD, it should be preferred.

I don't know much about AMD, but I have created and pinned an issue where hopefully people can share setup information: #3759

@lufixSch (Author)

Thanks for the feedback. I tried AutoGPTQ and it seems to work.

However, if I install the wheel from https://github.com/PanQiWei/AutoGPTQ/releases/download/v0.4.2/auto_gptq-0.4.2+rocm5.4.2-cp310-cp310-linux_x86_64.whl, it is much slower than GPTQ-for-LLaMa (there is a warning that ExLlama is missing).

When I build it from source, it is as fast as expected, but the output is gibberish again.

I never had this issue before. Could this still be a problem with my GPTQ setup, or is it an unrelated problem?

Thanks for creating the thread; I think it will be very helpful.

@oobabooga (Owner)

Gibberish output is usually a sign of using a model with desc_act=True (also called "act order") and groupsize > 0 while not checking the triton option. Last time I checked, act order + groupsize requires triton.
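
One way to check which flags a downloaded model actually uses is to read the quantize_config.json that ships with GPTQ models (the path below is just an example):

# Inspect the quantization flags of a downloaded GPTQ model
# (path is an example; adjust it to your models directory).
import json

with open("models/TheBloke_Llama-2-13B-chat-GPTQ/quantize_config.json") as f:
    cfg = json.load(f)

print("desc_act:", cfg.get("desc_act"))      # True means "act order"
print("group_size:", cfg.get("group_size"))  # -1 means no grouping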

@lufixSch (Author)

I don't think that causes the problem. I used the main version of https://huggingface.co/TheBloke/Llama-2-13B-chat-GPTQ and in the documentation it says desc_act=False.

As Triton is not currently supported on AMD (as far as I know), I am not able to test it with the triton option checked.

@lufixSch (Author) commented Sep 2, 2023

It is getting worse xD
After I reinstalled ROCm, deleted my venv, and installed all Python dependencies again, I am now unable to get results with any model.
As before, I am able to load models (GPTQ or Transformers), but as soon as I try to generate text, the whole program crashes with a segmentation fault.

[1]    58417 segmentation fault (core dumped)  python server.py
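
To narrow down whether the crash comes from the ROCm/PyTorch stack itself or from the webui's model loaders, a minimal generation test outside the webui might help (the model below is just a small example, not one from this issue):

# Minimal generation test outside the webui (diagnostic sketch).
# If this also segfaults, the problem is in the ROCm/PyTorch/transformers
# stack rather than in text-generation-webui.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "facebook/opt-125m"  # small model, only meant to exercise the GPU path
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda")

inputs = tok("Hello, my name is", return_tensors="pt").to("cuda")
out = model.generate(**inputs, max_new_tokens=20)
print(tok.decode(out[0], skip_special_tokens=True))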

@lufixSch (Author) commented Sep 9, 2023

I was able to solve the problem by reinstalling everything (including a complete reinstall of ROCm).
I have no idea what caused it, but I will close this anyway. Thanks for the help!

lufixSch closed this as completed on Sep 9, 2023