[RFC] Extend bitsandbytes to support Intel hardware platforms #894
Comments
Update: submitted PR #898 for the first step of the plan above.
@jianan-gu Thanks for this high-quality and well-written analysis / design document! Tim and I will look into this, as well as the PR, and come back to you soon. Unfortunately, both images come back with a 404. I bet this is because they're from a non-public repo, so they would show up correctly for you (with access). Would you be so kind as to make the images available to us?
Hi @Titus-von-Koeller, thanks for your reply and the reminder; I have reattached the images.
@jianan-gu I've talked with Tim about this and we're definitely going forward with this integration. The design also looks good, but the questions around supporting different hardware warrant a deeper look. Tim is quite busy these days and currently at NeurIPS, so that might delay things a bit.
Submitted PR jianan-gu#3 for step 2 (the CPU part) of the above plan.
You were submitting the PR against @jianan-gu's personal branch, not the bitsandbytes mainline?
Yes. Because his PR is not merged, we cannot use the mainline as the base. |
We now have our PR to enable NF4 on CPU/XPU here: jianan-gu#4 for step 3 of our plan:
Since our PRs depend on each other, we submitted them in our own repos instead of against the mainline. We will rebase these PRs onto the mainline when everything is ready.
Motivation
The current bitsandbytes library is bound to CUDA platforms. However, we are seeing rapidly growing demand to run large language models (LLMs) on more platforms, such as Intel® CPUs and GPUs ("xpu" is the device tag for Intel GPUs in PyTorch). We therefore aim to extend Intel® CPU and GPU ecosystem support and optimizations to bitsandbytes and offer the same scope of low-precision computation features (8-bit and 4-bit) as CUDA.
Approach
To provide the 8-bit and 4-bit features on Intel platforms, we propose two major changes:
Device abstraction
We will generalize the CUDA-specific device setup and initialization in bitsandbytes to also cover Intel CPU/GPU, providing a common device abstraction for general devices (there will be no changes to the CUDA path).
Note that there is also no API or usage change for Hugging Face users when using bitsandbytes on different devices.
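To make that concrete, here is a minimal sketch of device-transparent dispatch. The registry (`_BACKENDS`, `register`) and the standalone `quantize_4bit` wrapper are hypothetical stand-ins for illustration, not bitsandbytes internals: the user-facing function keeps its signature, and the device of the input selects the implementation.

```python
# Illustrative sketch only: `_BACKENDS`, `register`, and this standalone
# `quantize_4bit` are hypothetical stand-ins, not bitsandbytes code.
_BACKENDS = {}  # device type -> implementation


def register(device_type, impl):
    _BACKENDS[device_type] = impl


def quantize_4bit(tensor, device_type):
    # In the real library the device type would come from tensor.device.type,
    # so callers never change their code when switching devices.
    impl = _BACKENDS.get(device_type)
    if impl is None:
        raise NotImplementedError(f"quantize_4bit not implemented for {device_type!r}")
    return impl(tensor)


# The existing CUDA path and the new CPU/XPU paths register side by side.
register("cuda", lambda t: ("cuda-4bit", t))
register("cpu", lambda t: ("cpu-4bit", t))
```

Hugging Face integrations keep calling the same entry point; only the registered backends change underneath.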
Lightweight integration for Intel CPU/GPU
We will do a lightweight and simple integration to enable the low-precision computation features, both 8-bit and 4-bit. We don't plan to add native backend code for Intel CPU and GPU in the first step. Instead, we will employ PyTorch 2.x compilation (torch.compile) and Intel® Extension for PyTorch to enable those features.
For example:
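As a rough illustration of the kind of kernel involved, here is a plain-Python absmax 4-bit quantize/dequantize round trip; in the proposed integration the same math would be expressed as ordinary PyTorch tensor ops and fused by torch.compile / Intel® Extension for PyTorch rather than served by hand-written CUDA kernels. The function names below are illustrative, not the bitsandbytes API.

```python
# Illustrative absmax 4-bit quantize/dequantize in plain Python. In the
# proposed integration this math would be written with PyTorch tensor ops
# and optimized by torch.compile / Intel Extension for PyTorch.

def quantize_4bit_absmax(values):
    """Scale values into the signed 4-bit range [-7, 7] by their absmax."""
    absmax = max(abs(v) for v in values) or 1.0
    scale = absmax / 7.0
    codes = [max(-7, min(7, round(v / scale))) for v in values]
    return codes, scale


def dequantize_4bit_absmax(codes, scale):
    """Recover approximate float values from 4-bit codes and the scale."""
    return [c * scale for c in codes]


codes, scale = quantize_4bit_absmax([0.5, -1.0, 0.25, 0.875])
restored = dequantize_4bit_absmax(codes, scale)
```

The absmax element round-trips exactly; all other elements are recovered up to the quantization step, which is the usual accuracy/size trade-off of block-wise 4-bit schemes.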
Design
(1) Reorganize device_setup to support multiple devices
Intel CPU or GPU
CUDA
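A possible shape for the reorganized device setup is a single selection helper that probes available devices at init time; the function name and fallback order here are an assumption for illustration, not the actual bitsandbytes module layout.

```python
# Hypothetical sketch of device selection during setup; `select_device_backend`
# is an illustrative name, not a bitsandbytes function.
import importlib.util


def select_device_backend():
    """Return the device tag bitsandbytes should initialize for."""
    if importlib.util.find_spec("torch") is None:
        return "cpu"  # fall back when PyTorch itself is absent
    import torch
    if torch.cuda.is_available():
        return "cuda"  # existing CUDA path, unchanged
    if hasattr(torch, "xpu") and torch.xpu.is_available():
        return "xpu"   # Intel GPU via Intel Extension for PyTorch
    return "cpu"       # Intel (or any other) CPU path
```

CUDA is probed first so existing users see no behavior change; Intel devices are only selected when CUDA is unavailable.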
(2) Device backend abstraction with key kernel interfaces
Key functions that are used in mainstream 8bits and 4bits:
- F.igemmlt
- F.double_quant
- F.mm_dequant
- F.transform
- F.extract_outliers
- F.quantize_4bit
- F.dequantize_4bit
To extend the support of the above functions on Intel CPU/GPU (CUDA remains the same), we propose the following designs:
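One way to realize such a design is an abstract per-device backend whose methods mirror the key functions above; the class and the simplified parameter lists are a hypothetical sketch under that assumption, not the final bitsandbytes layout.

```python
# Hypothetical backend interface; method names mirror the F.* functions
# listed above, but the class structure and the (simplified) signatures
# are an illustrative sketch, not bitsandbytes code.
from abc import ABC, abstractmethod


class DeviceBackend(ABC):
    # --- 8-bit path ---
    @abstractmethod
    def igemmlt(self, A, B, SA, SB): ...

    @abstractmethod
    def double_quant(self, A, threshold=0.0): ...

    @abstractmethod
    def mm_dequant(self, A, quant_state): ...

    @abstractmethod
    def transform(self, A, to_order): ...

    @abstractmethod
    def extract_outliers(self, A, SA, idx): ...

    # --- 4-bit path ---
    @abstractmethod
    def quantize_4bit(self, A, blocksize=64, quant_type="nf4"): ...

    @abstractmethod
    def dequantize_4bit(self, A, quant_state): ...
```

A CUDA backend would wrap the existing kernels unchanged, while CPU/XPU backends implement the same methods with compiled PyTorch ops.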
PR plans:
1. Add options to initialize Intel CPU/GPU devices, with no implementations yet; CUDA remains the same.
2. Add implementations of the 8-bit functions for Intel CPU/GPU devices.
3. Add implementations of the 4-bit functions for Intel CPU/GPU devices.
Additional content
In addition, we will propose a PR to Transformers upstream to extend the usage of the bitsandbytes API across multiple devices.
Transformers changes