Add QuIP# support #4803
Conversation
The installation procedure is almost identical to the one for GPTQ-for-LLaMa; maybe @jllllll can come to the rescue and create wheels for this one as well.
Is there any minimum architecture requirement for this (for example, AWQ quantization requires Ampere or better on Nvidia cards)? I'm trying to figure out whether it will work on cards like the P40, which uses the Compute 6.1 architecture.
I don't know. Most of these custom CUDA kernels require Ampere cards, but my old fork of GPTQ-for-LLaMa has a custom kernel and works on Pascal cards. I guess it depends on the operations performed. Maybe @tsengalb99 can tell us what the requirements are.
Hi - a few things:
BTW our models should work with HF's AutoTokenizer. We have multiple places in our code where we just call AutoTokenizer and everything works fine.
Thanks for the reply @tsengalb99. Updates and eventual breaking changes are expected, and I'll make sure to update the code in this PR accordingly over time. About CUDA graphs and the HF
That doesn't work with a local copy of relaxml/Llama-2-70b-E8P-2Bit:
The problem is that the tokenizer files are not present in the repository. This can be easily fixed by uploading the tokenizer files here (or any other copy of the default Llama tokenizer) to that repository.
You need to extract the base model string (e.g. meta-llama/Llama-2-7b-hf) which
I had seen this, but this repository is based on loading from local copies of HF repositories stored in a local folder. This is very secondary and I wouldn't worry about it.
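For reference, a minimal sketch of the tokenizer-lookup approach described above (extract the base model string from the quantized checkpoint, then load that model's tokenizer). The assumption that the base model id is stored under "_name_or_path" in config.json is mine and may not match quip-sharp's actual config layout:

```python
import json
from pathlib import Path

from transformers import AutoTokenizer


def load_tokenizer_for_quip_model(model_dir: str):
    """Load the tokenizer of the base model a QuIP# checkpoint was quantized from.

    Assumes the checkpoint's config.json records the original model id under
    "_name_or_path" (e.g. "meta-llama/Llama-2-7b-hf"); adjust the key if the
    actual quip-sharp config uses a different field.
    """
    config = json.loads((Path(model_dir) / "config.json").read_text())
    base_model = config.get("_name_or_path")
    if base_model is None:
        raise ValueError("Could not find the base model string in config.json")
    return AutoTokenizer.from_pretrained(base_model)


# Example (hypothetical local path):
# tokenizer = load_tokenizer_for_quip_model("models/Llama-2-70b-E8P-2Bit")
```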
Hi, another QuIP# author here. Depending on what you're interested in, we also have quantized 4-bit models in our huggingface repo (ex: relaxml/Llama-2-70b-chat-HI-4Bit-Packed) that have much smaller degradation from the fp16 model. We expect to have fast inference with these 4-bit models approximately by the end of the week; our current forward pass code is a slower, naive implementation of the codebook for this specific 4-bit quantization.
That's great to hear @jerry-chee, thanks for the information.
I spent a while trying to create GitHub Actions wheels for quip-sharp here and failed, so I gave up and instead just added an error message instructing the user to install manually. I also removed the usage of a default Llama tokenizer as this causes issues such as Cornell-RelaxML/quip-sharp#6. It would be good if the repositories were updated to include the corresponding tokenizer files -- every GPTQ, AWQ, and EXL2 repository on HF contains these. Hopefully the interest in quip-sharp will increase and someone will soon be able to find a solution to the CUDA graphs issue for better performance. I am personally already happy with the 8 tokens/second I am getting for 70b models.
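The fallback described above presumably amounts to a guarded import with a helpful message. A purely illustrative sketch follows; the module name quiptools_cuda is an assumption, and the real loader code in this PR may differ:

```python
import importlib.util


def ensure_quip_sharp_installed():
    # Check whether the QuIP# CUDA extension is importable; "quiptools_cuda"
    # is an assumed module name, adjust it to whatever quip-sharp actually exports.
    if importlib.util.find_spec("quiptools_cuda") is None:
        raise ImportError(
            "QuIP# support requires quip-sharp. Please install it manually from "
            "https://github.com/Cornell-RelaxML/quip-sharp"
        )
```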
Interesting, we can take a look at that later as a very low priority thing.
We will try to do that some time in the next few weeks.
I filed a ticket with huggingface huggingface/transformers#27837 and it's on their todo list. We have faster kernels in the pipeline so the speed will increase from those alone.
@tsengalb99 To make Pascal work fast, like your 1060, requires upcasting to FP32 math. Pascal also has no tensor cores and lacks fast atomicAdd, but there are functions for the latter that can be used in its place and they are reasonable. Compute 6.1 also has dp4a instructions that can be used to speed things up. Why would anyone bother? The P40 is prolific and is the only other 24GB card besides the 3090 with that much RAM. On top of that, it's $200. Otherwise people are stuck with janky 7b and 13b models, which are useful as simple tools and that's about it. If the goal is to run larger models, I think Pascal support is a good thing to have.
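To illustrate the upcasting idea in the comment above (this is not QuIP#'s actual kernel, just the general workaround), a minimal PyTorch sketch that keeps weights stored in FP16 but does the math in FP32:

```python
import torch


def matmul_fp32_upcast(x: torch.Tensor, w: torch.Tensor) -> torch.Tensor:
    """Multiply FP16 tensors by upcasting to FP32 for the math.

    On consumer Pascal cards (e.g. the P40, compute 6.1) native FP16 math is
    very slow, so storing weights in half precision but accumulating in float
    is the usual workaround.
    """
    return (x.float() @ w.float()).to(x.dtype)


x = torch.randn(1, 4096, dtype=torch.float16, device="cuda")
w = torch.randn(4096, 4096, dtype=torch.float16, device="cuda")
y = matmul_fp32_upcast(x, w)  # FP16 output, FP32 accumulation
```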
I'm getting this error after doing python setup.py install. I use Win11 with a 3090 Ti, and I have NVCC and Visual Studio 2022. @oobabooga any idea?
So I ran cmd_windows, copied and pasted the first command to install Quip manually, and it gave me an error. |
Can you paste the error? |
On WSL with Ubuntu LTS, quiptools-cuda compiled with CUDA 11.8, not 12.1. Edit: 9.2 GB of VRAM used for Llama-1-30b-E8P-2Bit at ~4.70 tokens/s on a 3060 12GB; it's bonkers.
I tried it but still get this error:
You still have CUDA 12.1 installed. At compilation, you might see this warning instead:
If nothing works, search for the text-gen-install folder in your WSL home directory, back up your files, delete the text-gen-install folder, and start fresh with CUDA 11.8 installed.
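If it helps with the 11.8 vs 12.1 confusion, here is a small sketch that prints which CUDA toolkit PyTorch was built against and which nvcc is on the PATH, so the two can be matched up before compiling:

```python
import shutil
import subprocess

import torch

# CUDA version that the installed PyTorch wheel was built against
print("torch:", torch.__version__, "| built with CUDA:", torch.version.cuda)

# CUDA version of the nvcc that will be used to compile quiptools-cuda
nvcc = shutil.which("nvcc")
if nvcc is None:
    print("nvcc not found on PATH")
else:
    print(subprocess.run([nvcc, "--version"], capture_output=True, text=True).stdout)
```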
Doesn't work on Windows 10 for me, here are my specs:
Here's my error:
I think we just need to download a pre-compiled wheel and use it instead of building it @BadisG |
Do we have such a wheel yet, @iChristGit?
Not yet, sadly. I also want to run it natively on Windows 11. It's the same errors; I suppose someone with a native Linux build could compile and upload a wheel, just like with the old GPTQ.
This can be fixed by disabling Ninja (see the sketch after this comment for one way to do that).
But then you'll also probably get this:
The Internet suggests we need the
Note that oobabooga already attempted to make wheels, so for Windows we might just need to wait for that to succeed, or for the QuIP# devs to give some pointers or fix their setup script.
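For anyone experimenting with the Windows build, disabling Ninja for a torch CUDA extension usually looks something like the setup.py below. This is a generic sketch, not quip-sharp's actual build script; the extension name and source list are placeholders:

```python
from setuptools import setup
from torch.utils.cpp_extension import BuildExtension, CUDAExtension

setup(
    name="quiptools_cuda",  # placeholder name
    ext_modules=[
        CUDAExtension(
            name="quiptools_cuda",
            sources=["quiptools/quiptools.cu"],  # placeholder source list
        )
    ],
    # use_ninja=False falls back to the slower distutils build, which sometimes
    # behaves better with MSVC on Windows than the Ninja backend does.
    cmdclass={"build_ext": BuildExtension.with_options(use_ninja=False)},
)
```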
Man, I keep hoping that Quip will work out of the box with each new iteration of the webui, but so far, still no luck. It's still asking to install Quip manually.
It takes time. When GPTQ was first released I was pulling my hair out with each error trying to compile it; now it's a one-click install.
hahaha yeah I know. Same with Exllama 2. It wouldn't work at all when it was released. |
Sorry we (the QuIP# team) can't be of much help here since we don't have any access to Windows machines with NVIDIA GPUs. We're hoping to package quiptools into a wheel in the future when it becomes more mature, but as of now since QuIP# is a WIP the install process is a bit more involved (but hopefully not too involved). |
Okay... who is gonna quantize Mixtral-8x7B? And what VRAM/RAM requirements would that have? |
It has already been quantized by TheBloke.
I thought TheBloke still doesn't provide QuIP# quantizations?
You are right, I was thinking you wanted any kind of quant, not QuIP#.
@iChristGit As long as it still doesn't work on Windows, I don't see the incentive for it...
Oh wait, what? It doesn't work on Windows yet? That would explain a lot for me, because I haven't been able to get it running yet...
Yep, it's hard to figure out how to compile it on Windows, but on Linux it's easy, from what people say.
Yep... you can run WSL on Windows in the meantime, maybe.
As of the latest commits, QuIP# is marked as only available on Linux. Does this mean it's not possible to make it work on Windows at all? @oobabooga
I have tried to compile it for Windows using GitHub Actions and it fails with some vague errors. I think that there is something in the QuIP# code itself that prevents it from compiling on Windows.
Okay, I am on Windows WSL (Ubuntu) now, but I get this error when I try to install:
error: can't create or remove files in install directory
The following error occurred while trying to add or remove files in the
Is there any technical reason for it not working on Windows, or is it just "this is too new, and nobody really tried"? If someone bumped into a roadblock, it might be good to document it (some dependency not compiling?). For the few who managed to run it, is it really as good as the perplexity claims make it seem?
"I have tried to compile it for Windows using GitHub actions and it fails with some vague errors. I think that there is something in the quip# code itself that prevents it from compiling on Windows." A comment from ooba a couple weeks back, still same issue it wont compile on windows. |
Ok, I tried it for a bit now; the thing that hangs is the package
However,
That was as far as I got because I have no idea what to do next and Python befuddles me. |
@CamiloMM, I solved that problem by installing the current cuda-toolkit from the nvidia website (I'm on Linux Mint). @Nicoolodion2, it is a permission issue; I had to add
In any case, even though quip-sharp is right there, oobabooga still doesn't find it for me. I can't get past the error:
QuIP# is a novel quantization method. Its 2-bit performance is better than anything previously available.
Repository: https://github.com/Cornell-RelaxML/quip-sharp
Blog post: https://cornell-relaxml.github.io/quip-sharp/
Installation
The installation is currently manual, but later I will add it to the one-click installer.
You need to have a C++ compiler (like g++) and nvcc available in your environment for the command above.

4) Download my tokenizer (I'm using it as a placeholder for now, as the model above doesn't include a tokenizer):

Perplexity
On a small test that I have been running since the beginning of this year to compare different quantizations:
It's the same test as in the first table in this blog post, so the numbers are directly comparable.
This is the first time I have seen a quantized 70b model that fits in an RTX 3090 perform better than a q4_K_M 30b model, which is especially important nowadays since Meta never released a Llama-2 30b base model.
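For anyone wanting to run a comparable check themselves, here is a bare-bones sliding-window perplexity sketch with transformers. The model path, evaluation text, and window size are placeholders (and loading a QuIP# checkpoint requires quip-sharp's own loader), so the numbers it produces won't match the test above:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "meta-llama/Llama-2-7b-hf"  # placeholder; QuIP# checkpoints need quip-sharp's loader
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path, torch_dtype=torch.float16, device_map="auto"
)

text = open("test.txt").read()  # placeholder evaluation text
ids = tokenizer(text, return_tensors="pt").input_ids.to(model.device)

window = 2048
nlls = []
for start in range(0, ids.size(1) - window, window):
    chunk = ids[:, start:start + window]
    with torch.no_grad():
        # HF causal LMs shift labels internally, so labels=chunk yields the
        # mean next-token negative log-likelihood over the chunk.
        nlls.append(model(chunk, labels=chunk).loss)

print("perplexity:", torch.exp(torch.stack(nlls).mean()).item())
```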
Performance
I can get to 3042 context with 24GB VRAM. It generates at around 8 tokens/second when the context is small and 6 tokens/second when it is large.
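To reproduce rough tokens/second numbers like the ones above, one simple (if crude) approach is to time generate() directly. The prompt and generation length below are arbitrary placeholders, and `model`/`tokenizer` are assumed to be already loaded (e.g. as in the perplexity sketch above):

```python
import time

import torch

prompt = "Write a short story about a robot learning to paint."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

torch.cuda.synchronize()
start = time.time()
output = model.generate(**inputs, max_new_tokens=200, do_sample=False)
torch.cuda.synchronize()

new_tokens = output.shape[1] - inputs.input_ids.shape[1]
print(f"{new_tokens / (time.time() - start):.2f} tokens/second")
```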